Reverting expm to use opnorm(A, 1) #579
base: master
Conversation
In my latest tests, using the original line with `opnorm` is faster and prevents allocations.
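The exact lines changed by this PR are not shown in the thread, but the idea can be sketched as follows: `opnorm(A, 1)` computes the induced matrix 1-norm (the maximum absolute column sum), which an allocating hand-written reduction would also give. This is an illustrative comparison only, not the package's actual code:

```julia
using LinearAlgebra, StaticArrays

A = @SArray rand(3, 3)

# opnorm(A, 1) is the induced matrix 1-norm,
# i.e. the maximum absolute column sum.
n1 = opnorm(A, 1)

# A hypothetical equivalent written out by hand; abs.(A) and the
# dims=1 reduction may allocate for generic array types.
n2 = maximum(sum(abs.(A); dims=1))

n1 ≈ n2  # true: both compute the same quantity
```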
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #579      +/-  ##
==========================================
- Coverage   65.97%   65.94%   -0.04%
==========================================
  Files          38       38
  Lines        2898     2898
==========================================
- Hits         1912     1911       -1
- Misses        986      987       +1
```

Continue to review full report at Codecov.
If we care about performance here, it is probably fastest if we overload …
I'm a newbie; what exactly does that mean? From what I understand …
Right - thanks for the contribution, and of course any improvement is helpful :) What would be great to see in PRs like this one is a quick before-and-after benchmark. But in the eternal quest for perfection, I was just wondering if we can make something faster than this method for statically sized arrays. It seems we can determine …
I ran a more careful benchmark here, and the patch is actually debatable, because the current code is the best for 3x3 matrices; it's only in the 4x4 case that the patch wins. I'm not sure what to do. I don't really require this function in my project anymore, but it seems a good opportunity to make a small contribution here, and to learn a lot in the process. What would you say is the Minimum Viable Patch for this issue?

current code (786b6f0)

```julia
julia> n = 3
3

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     234.000 ns (0.00% GC)
  median time:      249.000 ns (0.00% GC)
  mean time:        253.140 ns (0.00% GC)
  maximum time:     13.319 μs (0.00% GC)
  samples:          10000
  evals/sample:     1

julia> n = 4
4

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  336 bytes
  allocs estimate:  3
  minimum time:     442.000 ns (0.00% GC)
  median time:      504.000 ns (0.00% GC)
  mean time:        1.852 μs (0.00% GC)
  maximum time:     12.600 ms (0.00% GC)
  samples:          10000
  evals/sample:     1

julia> A = @SArray rand(3,3);

julia> @btime exp($A);
  177.013 ns (0 allocations: 0 bytes)

julia> A = @SArray rand(4,4);

julia> @btime exp($A);
  404.617 ns (3 allocations: 336 bytes)
```

this patch (opnorm(A,1))

```julia
julia> n = 3
3

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     260.000 ns (0.00% GC)
  median time:      283.000 ns (0.00% GC)
  mean time:        290.936 ns (0.00% GC)
  maximum time:     13.842 μs (0.00% GC)
  samples:          10000
  evals/sample:     1

julia> n = 4
4

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     419.000 ns (0.00% GC)
  median time:      513.000 ns (0.00% GC)
  mean time:        520.145 ns (0.00% GC)
  maximum time:     13.834 μs (0.00% GC)
  samples:          10000
  evals/sample:     1

julia> A = @SArray rand(3,3);

julia> @btime exp($A);
  190.372 ns (0 allocations: 0 bytes)

julia> A = @SArray rand(4,4);

julia> @btime exp($A);
  329.917 ns (0 allocations: 0 bytes)
```
I took the liberty of reformatting your code to use code blocks instead of quotes. That way you get a monospace font, and you don't accidentally notify random users when you use a macro (sorry, @sarray 🙂).
Sorry for the delay. I think it might be best to address the underlying issue if possible and make sure that … Alternatively, implement an (unrolled?) …
On further thought I think this would be best, and to do that in addition to the changes in this PR: this operation is indeed …

In local testing, it looks like this change so far makes exp for 3x3 matrices about 20% slower, which is not ideal. On the plus side, it does remove all allocations, which is nice, and doesn't have any real speed penalty for 4x4 and 5x5 matrices.
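An unrolled 1-norm specialized for static matrices, as suggested above, might look something like the sketch below. The name `opnorm1` and its body are hypothetical illustrations (not StaticArrays' actual implementation); the point is that the column count comes from the type via `Size`, so the compiler can unroll the reduction without allocating:

```julia
using LinearAlgebra, StaticArrays

# Hypothetical unrolled 1-norm for static matrices (illustration only).
# The number of columns Size(A)[2] is known from the type, so the
# per-column absolute sums can be computed without heap allocation.
@inline function opnorm1(A::StaticMatrix)
    # maximum absolute column sum = induced matrix 1-norm
    return maximum(ntuple(j -> sum(abs, A[:, j]), Size(A)[2]))
end

A = @SArray rand(4, 4)
opnorm1(A) ≈ opnorm(A, 1)  # true: agrees with the generic opnorm
```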