
Reverting expm to use opnorm(A, 1) #579

Open · wants to merge 1 commit into master

Conversation

@nlw0 commented Dec 27, 2018

In my latest tests, using the original line with `opnorm` is faster and prevents allocations.
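
For context, the change is a one-line revert in src/expm.jl: computing the matrix 1-norm via `opnorm(A, 1)` instead of an explicit reduction. A minimal sketch of the two alternatives being compared, with illustrative variable names that are not quoted from the actual diff:

```julia
using StaticArrays, LinearAlgebra

A = @SMatrix rand(3, 3)

# The two ways of computing the matrix 1-norm (maximum absolute column sum)
# discussed in this PR; the `nA_*` names are illustrative only.
nA_reduction = maximum(sum(abs.(A); dims=Val(1)))  # explicit column-sum reduction
nA_opnorm    = opnorm(A, 1)                        # the reverted-to form
nA_reduction ≈ nA_opnorm  # true
```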
@nlw0 mentioned this pull request on Dec 27, 2018
@coveralls

Coverage Status

Coverage decreased (-0.03%) to 65.942% when pulling b4f6f26 on nlw0:master into 786b6f0 on JuliaArrays:master.

@codecov-io commented Dec 28, 2018

Codecov Report

Merging #579 into master will decrease coverage by 0.03%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master     #579      +/-   ##
==========================================
- Coverage   65.97%   65.94%   -0.04%     
==========================================
  Files          38       38              
  Lines        2898     2898              
==========================================
- Hits         1912     1911       -1     
- Misses        986      987       +1
Impacted Files Coverage Δ
src/expm.jl 37.17% <0%> (-1.29%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 786b6f0...b4f6f26.

@andyferris (Member)

If we care about performance here, it is probably fastest if we overload `LinearAlgebra.opnorm1` and call that?
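
A minimal sketch of what such an overload might look like, assuming `LinearAlgebra.opnorm1` (the internal helper for the maximum absolute column sum, which `opnorm(A, 1)` forwards to) is the intended extension point; the loop bounds are compile-time constants for a `StaticMatrix`, so the compiler is free to unroll them:

```julia
using StaticArrays, LinearAlgebra

# Hypothetical specialization of LinearAlgebra's internal opnorm1 helper
# (maximum absolute column sum) for statically sized matrices.
@inline function LinearAlgebra.opnorm1(A::StaticMatrix{M,N,T}) where {M,N,T}
    nrm = float(real(zero(T)))
    @inbounds for j in 1:N            # M and N are known at compile time
        colsum = float(real(zero(T)))
        for i in 1:M
            colsum += abs(A[i, j])
        end
        nrm = max(nrm, colsum)
    end
    return nrm
end

A = @SMatrix rand(4, 4)
opnorm(A, 1)  # the generic opnorm(A, 1) should now hit the method above
```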

@nlw0 (Author) commented Jan 2, 2019

I'm a newbie; what exactly does that mean? From what I understand, `opnorm(A, 1)` is just an `if` that calls `opnorm1`, and I would hope all of this gets optimized away. Even if it doesn't, I'm already happy to have reached 0 allocations and a noticeable speed improvement, and all of that by reverting a change and making the code like the original again. This was just the result of optimizing my application and hunting down the very last allocation I was getting. Do you think further investigation could be profitable?

@andyferris (Member)

Right - thanks for the contribution, and of course any improvement is helpful :) What would be great to see in PRs like this one is a quick before-and-after benchmark.

But in the eternal quest for perfection, I was just wondering if we can make something faster than this method for statically sized arrays. It seems we can determine `has_offset_axes` at compile time, and also unroll the loops, and perhaps these changes would be a bit faster. Happy to deal with that in separate PRs if you don't want to dig into it now.

@andyferris changed the title from "Revertig expm to use opnorm(A, 1)" to "Reverting expm to use opnorm(A, 1)" on Jan 2, 2019
@nlw0 (Author) commented Jan 2, 2019

I ran a more careful benchmark here, and the patch is actually arguable, because the current code is best for 3x3 matrices. It's only in the 4x4 case that `opnorm` was better:

I'm not sure what to do. I don't really need this function in my project anymore, but it seems like a good opportunity to make a small contribution here and to learn a lot in the process. What would you say is the Minimum Viable Patch for this issue?

current code (786b6f0)

julia> n=3
3

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  
  minimum time:     234.000 ns (0.00% GC)
  median time:      249.000 ns (0.00% GC)
  mean time:        253.140 ns (0.00% GC)
  maximum time:     13.319 μs (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> n=4
4

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  336 bytes
  allocs estimate:  3
  
  minimum time:     442.000 ns (0.00% GC)
  median time:      504.000 ns (0.00% GC)
  mean time:        1.852 μs (0.00% GC)
  maximum time:     12.600 ms (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> A = @SArray rand(3,3);

julia> @btime exp($A);
  177.013 ns (0 allocations: 0 bytes)

julia> A = @SArray rand(4,4);

julia> @btime exp($A);
  404.617 ns (3 allocations: 336 bytes)

this patch (opnorm(A,1))

julia> n=3
3

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  
  minimum time:     260.000 ns (0.00% GC)
  median time:      283.000 ns (0.00% GC)
  mean time:        290.936 ns (0.00% GC)
  maximum time:     13.842 μs (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> n=4
4

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  
  minimum time:     419.000 ns (0.00% GC)
  median time:      513.000 ns (0.00% GC)
  mean time:        520.145 ns (0.00% GC)
  maximum time:     13.834 μs (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> A = @SArray rand(3,3);

julia> @btime exp($A);
  190.372 ns (0 allocations: 0 bytes)

julia> A = @SArray rand(4,4);

julia> @btime exp($A);
  329.917 ns (0 allocations: 0 bytes)

@tkoolen (Contributor) commented Jan 2, 2019

I took the liberty of reformatting your code to use code blocks instead of quotes. That way you get a monospace font, and you don't accidentally notify random users when you use a macro (sorry, @sarray 🙂).

@c42f (Member) commented Jul 31, 2019

Sorry for the delay. I think it might be best to address the underlying issue if possible and make sure that `sum(A; dims=Val(1))` doesn't allocate.

Alternatively, implement an (unrolled?) `opnorm` or `opnorm1` for StaticArrays.
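
As a rough illustration of the first alternative, a 1-norm built from StaticArrays' statically sized reductions could look like the hypothetical helper below (`onenorm` is an assumed name); whether the `dims=Val(1)` reduction compiles down to an allocation-free, unrolled loop is exactly the question raised above, and is easy to check with BenchmarkTools:

```julia
using StaticArrays, BenchmarkTools

# Hypothetical helper: maximum absolute column sum via statically sized
# reductions. If sum(...; dims=Val(1)) is allocation-free, so is this.
onenorm(A::StaticMatrix) = maximum(sum(abs.(A); dims=Val(1)))

A = @SMatrix rand(4, 4)
@btime onenorm($A)  # check both the timing and the reported allocations
```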

@c42f added the performance (runtime performance) label on Jul 31, 2019
@c42f (Member) commented Aug 1, 2019

> Alternatively, implement an (unrolled?) `opnorm` or `opnorm1` for StaticArrays

On further thought I think this would be best, and that it should be done in addition to the changes in this PR: this operation is indeed `opnorm`, and it would be nice to have that higher level of abstraction.

In local testing, it looks like this change so far makes `exp` for 3x3 matrices about 20% slower, which is not ideal. On the plus side, it does remove all allocations, and it doesn't have any real speed penalty for 4x4 and 5x5 matrices.
