
Reverting expm to use opnorm(A, 1) #579

Open · wants to merge 1 commit into master

Conversation

@nlw0 commented Dec 27, 2018

In my latest tests, using the original line with `opnorm` is faster and prevents allocations.
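
For context, the change is a one-line revert in src/expm.jl: computing the matrix 1-norm via `opnorm(A, 1)` instead of an explicit reduction. A minimal sketch of the two alternatives being compared, with illustrative variable names that are not quoted from the actual diff:

```julia
using StaticArrays, LinearAlgebra

A = @SMatrix rand(3, 3)

# The two ways of computing the matrix 1-norm (maximum absolute column sum)
# discussed in this PR; the `nA_*` names are illustrative only.
nA_reduction = maximum(sum(abs.(A); dims=Val(1)))  # explicit column-sum reduction
nA_opnorm    = opnorm(A, 1)                        # the reverted-to form
nA_reduction ≈ nA_opnorm  # true
```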
@nlw0 mentioned this pull request on Dec 27, 2018
@coveralls

Coverage Status

Coverage decreased (-0.03%) to 65.942% when pulling b4f6f26 on nlw0:master into 786b6f0 on JuliaArrays:master.

@codecov-io commented Dec 28, 2018

Codecov Report

Merging #579 into master will decrease coverage by 0.03%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master     #579      +/-   ##
==========================================
- Coverage   65.97%   65.94%   -0.04%     
==========================================
  Files          38       38              
  Lines        2898     2898              
==========================================
- Hits         1912     1911       -1     
- Misses        986      987       +1
Impacted Files Coverage Δ
src/expm.jl 37.17% <0%> (-1.29%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 786b6f0...b4f6f26.

@andyferris (Member)

If we care about performance here, it is probably fastest if we overload `LinearAlgebra.opnorm1` and call that?
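
A minimal sketch of what such an overload might look like, assuming `LinearAlgebra.opnorm1` (the internal helper for the maximum absolute column sum, which `opnorm(A, 1)` forwards to) is the intended extension point; the loop bounds are compile-time constants for a `StaticMatrix`, so the compiler is free to unroll them:

```julia
using StaticArrays, LinearAlgebra

# Hypothetical specialization of LinearAlgebra's internal opnorm1 helper
# (maximum absolute column sum) for statically sized matrices.
@inline function LinearAlgebra.opnorm1(A::StaticMatrix{M,N,T}) where {M,N,T}
    nrm = float(real(zero(T)))
    @inbounds for j in 1:N            # M and N are known at compile time
        colsum = float(real(zero(T)))
        for i in 1:M
            colsum += abs(A[i, j])
        end
        nrm = max(nrm, colsum)
    end
    return nrm
end

A = @SMatrix rand(4, 4)
opnorm(A, 1)  # the generic opnorm(A, 1) should now hit the method above
```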

@nlw0 (Author) commented Jan 2, 2019

I'm a newbie; what exactly does that mean? From what I understand, `opnorm(A, 1)` is just an `if` that calls `opnorm1`, and I would hope all of this gets optimized away. Even if it doesn't, I'm already happy to have reached 0 allocations and a noticeable speed improvement, and all of that by reverting a change and making the code like the original again. This was just the result of optimizing my application and hunting down the very last allocation I was getting. Do you think further investigation could be profitable?

@andyferris (Member)

Right - thanks for the contribution, and of course any improvement is helpful :) What would be great to see in PRs like this one is a quick before-and-after benchmark.

But in the eternal quest for perfection, I was just wondering if we can make something faster than this method for statically sized arrays. It seems we can determine `has_offset_axes` at compile time, and also unroll the loops, and perhaps these changes would be a bit faster. Happy to deal with that in separate PRs if you don't want to dig into it now.

@andyferris changed the title from "Revertig expm to use opnorm(A, 1)" to "Reverting expm to use opnorm(A, 1)" on Jan 2, 2019
@nlw0 (Author) commented Jan 2, 2019

I ran a more careful benchmark here, and the patch is actually arguable, because the current code is best for 3x3 matrices. It's only in the 4x4 case that `opnorm` was better:

I'm not sure what to do. I don't really need this function in my project anymore, but it seems like a good opportunity to make a small contribution here and to learn a lot in the process. What would you say is the Minimum Viable Patch for this issue?

current code (786b6f0)

julia> n=3
3

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  
  minimum time:     234.000 ns (0.00% GC)
  median time:      249.000 ns (0.00% GC)
  mean time:        253.140 ns (0.00% GC)
  maximum time:     13.319 μs (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> n=4
4

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  336 bytes
  allocs estimate:  3
  
  minimum time:     442.000 ns (0.00% GC)
  median time:      504.000 ns (0.00% GC)
  mean time:        1.852 μs (0.00% GC)
  maximum time:     12.600 ms (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> A = @SArray rand(3,3);

julia> @btime exp($A);
  177.013 ns (0 allocations: 0 bytes)

julia> A = @SArray rand(4,4);

julia> @btime exp($A);
  404.617 ns (3 allocations: 336 bytes)

this patch (opnorm(A,1))

julia> n=3
3

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  
  minimum time:     260.000 ns (0.00% GC)
  median time:      283.000 ns (0.00% GC)
  mean time:        290.936 ns (0.00% GC)
  maximum time:     13.842 μs (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> n=4
4

julia> b = @benchmarkable exp(y) setup=(y = @SArray rand($n,$n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  
  minimum time:     419.000 ns (0.00% GC)
  median time:      513.000 ns (0.00% GC)
  mean time:        520.145 ns (0.00% GC)
  maximum time:     13.834 μs (0.00% GC)
  
  samples:          10000
  evals/sample:     1

julia> A = @SArray rand(3,3);

julia> @btime exp($A);
  190.372 ns (0 allocations: 0 bytes)

julia> A = @SArray rand(4,4);

julia> @btime exp($A);
  329.917 ns (0 allocations: 0 bytes)

@tkoolen (Contributor) commented Jan 2, 2019

I took the liberty of reformatting your code to use code blocks instead of quotes. That way you get a monospace font, and you don't accidentally notify random users when you use a macro (sorry, @sarray 🙂).

@c42f (Member) commented Jul 31, 2019

Sorry for the delay. I think it might be best to address the underlying issue if possible and make sure that `sum(A; dims=Val(1))` doesn't allocate.

Alternatively, implement an (unrolled?) `opnorm` or `opnorm1` for StaticArrays.
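
As a rough illustration of the first alternative, a 1-norm built from StaticArrays' statically sized reductions could look like the hypothetical helper below (`onenorm` is an assumed name); whether the `dims=Val(1)` reduction compiles down to an allocation-free, unrolled loop is exactly the question raised above, and is easy to check with BenchmarkTools:

```julia
using StaticArrays, BenchmarkTools

# Hypothetical helper: maximum absolute column sum via statically sized
# reductions. If sum(...; dims=Val(1)) is allocation-free, so is this.
onenorm(A::StaticMatrix) = maximum(sum(abs.(A); dims=Val(1)))

A = @SMatrix rand(4, 4)
@btime onenorm($A)  # check both the timing and the reported allocations
```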

@c42f added the performance (runtime performance) label on Jul 31, 2019
@c42f (Member) commented Aug 1, 2019

> Alternatively, implement an (unrolled?) `opnorm` or `opnorm1` for StaticArrays

On further thought I think this would be best, and that it should be done in addition to the changes in this PR: this operation is indeed `opnorm`, and it would be nice to have that higher level of abstraction.

In local testing, it looks like this change so far makes `exp` for 3x3 matrices about 20% slower, which is not ideal. On the plus side, it does remove all allocations, and it doesn't have any real speed penalty for 4x4 and 5x5 matrices.
