[MNT] - Simulations refactors #290
Conversation
I looked into np.vectorize to learn how to speed up my own code, and noticed this in the numpy doc notes:

> The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

I tested this out, and a for loop was faster than using np.vectorize. It was also more memory efficient, since the 2d array of cosines isn't required in the loop I tested.
```python
%load_ext line_profiler

import numpy as np

from neurodsp.sim.aperiodic import sim_knee
from neurodsp.utils.data import create_times, compute_nsamples

# Settings
n_seconds = 10
fs = 1000
chi1 = -1
chi2 = -2
knee = 100

times = create_times(n_seconds, fs)
n_samples = compute_nsamples(n_seconds, fs)

# Create frequencies for the power spectrum and drop the DC component
# These frequencies are used to create the cosines to sum
freqs = np.linspace(0, fs / 2, num=int(n_samples // 2 + 1), endpoint=True)
freqs = freqs[1:]
```
```python
%%timeit
np.random.seed(0)

# Sub-function for computing the amplitude coefficient of each cosine
def _coscoef(freq):
    return np.sqrt(1 / (freq ** -chi1 * (freq ** (-chi2 - chi1) + knee)))

# Define & vectorize sub-function for computing the set of cosines, adding a random phase shift
def _create_cosines(freq):
    return _coscoef(freq) * np.cos(2 * np.pi * freq * times + 2 * np.pi * np.random.rand())

vect_create_cosines = np.vectorize(_create_cosines, signature='()->(n)')

# Create the set of cosines and sum them into a signal
cosines = vect_create_cosines(freqs)
sig = np.sum(cosines, axis=0)
```
644 ms ± 3.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```python
%%timeit
np.random.seed(0)

# Running-sum loop: never materializes the 2d array of cosines
sig = np.zeros(n_samples)
for f in freqs:
    sig += np.sqrt(1 / (f ** -chi1 * (f ** (-chi2 - chi1) + knee))) \
           * np.cos(2 * np.pi * f * times + 2 * np.pi * np.random.rand())
```
581 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The latter method is also more memory efficient, since it doesn't store a large 2d array of cosines. When I run the np.vectorize block with a 100 second signal and an fs of 1000, I get:

```
MemoryError: Unable to allocate 37.3 GiB for an array with shape (50000, 100000) and data type float64
```
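That failed allocation is exactly the full matrix of cosines. A quick back-of-the-envelope check, with the shape taken from the error message above:

```python
# The vectorized approach materializes a (n_freqs, n_samples) float64 array
# before summing. Checking the size reported in the MemoryError above:
n_freqs, n_samples = 50_000, 100_000   # shape from the error message
n_bytes = n_freqs * n_samples * 8      # 8 bytes per float64
gib = n_bytes / 2**30                  # bytes -> GiB

print(f"{gib:.1f} GiB")                # ~37.3 GiB, matching the error
```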
As for variable names, I do think we should choose either `exponent` or `chi`. I'm used to `exponent`, since that's what fooof used, but I don't have a strong opinion either way. I'll also give renaming `sim_peak_oscillation` some thought - I'm not sure about a better name either.
@ryanhammonds - awesome deep dive! My original understanding was that vectorize makes functions able to be applied across arrays, in effectively a loop. I had a quick poke around with what you suggested for sim_knee, and comparing all three implementations (for an intermediate, 15 second signal), we're getting about a 100% speed up compared to original! Which is great! Thanks for digging further into this! Practically, I think the main benefit is simply removing extra loops (doing it all in one), which as you mention also has memory benefits! I think the code also ends up looking nice and clean. Next steps:
For naming: we're agreed on consolidating on `exponent` - I think it makes sense to choose that over `chi`.
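On the "doing it all in one" point above: the same computation can also be expressed with plain NumPy broadcasting instead of np.vectorize. This is only a sketch with small, made-up settings (not the benchmark settings from this thread), and note it still materializes the full 2d array, so the running-sum loop remains the memory-friendly option for long signals:

```python
import numpy as np

# Illustrative settings only, not the benchmark settings from the thread
fs, n_seconds = 250, 2
chi1, chi2, knee = -1, -2, 100

n_samples = int(n_seconds * fs)
times = np.arange(n_samples) / fs
freqs = np.linspace(0, fs / 2, num=n_samples // 2 + 1)[1:]

# Fix the random phase shifts up front so both versions see the same ones
rng = np.random.default_rng(0)
phases = 2 * np.pi * rng.random(len(freqs))

# Amplitude coefficients from the target power spectrum
coefs = np.sqrt(1 / (freqs ** -chi1 * (freqs ** (-chi2 - chi1) + knee)))

# Broadcast (n_freqs, 1) against (n_samples,) to build all cosines at once,
# then collapse with a single sum. This still allocates the full 2d array.
cosines = coefs[:, None] * np.cos(2 * np.pi * freqs[:, None] * times + phases[:, None])
sig = cosines.sum(axis=0)

# Cross-check against the equivalent running-sum loop
sig_loop = np.zeros(n_samples)
for coef, freq, phase in zip(coefs, freqs, phases):
    sig_loop += coef * np.cos(2 * np.pi * freq * times + phase)

assert np.allclose(sig, sig_loop)
```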
@TomDonoghue I updated this, and also converted the knee parameter to knee frequency, from solving for the FWHM in the Lorentzian. I think this should be analogous to solving it in the single-exponent case.
I'm going to tag in @rdgao for a quick consult here (hopefully he doesn't mind :)) @ Richard - we've been refactoring some of the simulation functions (most of which you can ignore). One update we would like to make is to update the sim_knee function to take in the "knee frequency", as opposed to the value of the knee parameter - meaning we need the conversion function between the knee parameter and the knee frequency, for the case with two exponents. So, the formulation of the overall spectrum should be the double-exponent + knee form implemented in the sim_knee code. Could you check Ryan's comment above for how to convert the parameter in this case?
wait, just to make sure I understand correctly - this is for the case where there's two exponents? Did we (or I??) ever work out the sim equations for the 2-exponent case with knee? It's this: `P(f) = 1 / (f**-chi1 * (f**(-chi2 - chi1) + knee))`?
@rdgao - yes, this is for double exponent! You and/or Eric did figure out this equation a while ago! We do have a simulation function for this, from Eric Lybrand, which simulates a double-exponent + knee signal using a sum of sinusoids approach. I messed up the equation I wrote in my comment above (oops - not sure what happened there), but what you wrote is what we are doing (with an added square root?). The code for this equation is the coefficient computation above: `np.sqrt(1 / (f ** -chi1 * (f ** (-chi2 - chi1) + knee)))`. The thing we were trying to figure out here is the mapping between knee parameter & knee frequency. In the single-exp Lorentzian, we convert knee frequency from the knee parameter as `knee_freq = knee ** (1 / exponent)`.
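For reference, a sketch of where that single-exponent conversion comes from (offset omitted; the notation here is mine, with `k` the knee parameter and `chi` the exponent):

```latex
P(f) = \frac{1}{k + f^{\chi}},
\qquad
\text{knee where } f_{\mathrm{knee}}^{\chi} = k
\;\Rightarrow\;
f_{\mathrm{knee}} = k^{1/\chi}
```

That is, the knee frequency is the point where the two terms in the denominator are equal, which is where the spectrum transitions away from its low-frequency plateau.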
okay, I somehow have zero memory of this, but at a quick glance, I think this might be a bit more complicated, though Ryan's derivations look good, at least in terms of the algebra. I'm a bit occupied in the next week or so for Cosyne, but I'll take a look after I come back and maybe play around with some sims just to check (of course you guys can do this as well). I'm actually not sure what the "right" answer for the knee_freq is when there's two slopes, because it's not so easy to define where the tapering off happens when there's no flat plateau.
@rdgao - there's no particular rush here, this can definitely wait for after Cosyne! I played around with this, and the current candidate version, and it seemed close, but maybe not quite right - I need to dig in a little more. Practically speaking, I'm going to suggest we split out the knee conversion from the rest of this PR, which does other (mostly unrelated) refactors of the simulations. I've opened #301 to continue the knee discussion, from which we can coordinate what to do next for that. Otherwise, I've reverted a couple lines of code from this PR, so that the rest of the PR can be merged, and we can use a new PR for the knee stuff. @ryanhammonds - does this splitting up of updates make sense to you? If so, as far as I'm aware, this PR is now good to merge, right? Let me know if you have any other thoughts on what is currently in this PR - if it looks good to you, I'll merge it in!
@TomDonoghue Splitting the knee freq update into a separate PR sounds good to me, since that will need further review / checking. I went back through the current diff, and everything makes sense / looks good!
This PR does some refactors and tweaks of some of the newer simulation functions.

Naming

An open question (nothing changed yet) is if we want to do any naming updates. We variously use `exponent` (or `exp`), and now in these newer functions, `chi`, to refer to the aperiodic exponent. I think I would vote we consolidate on using the term `exponent` consistently for this. The main counterpoint I can think of is that `exponent` is a pretty generic name, and in some cases it could be a little unclear. There is also `sim_peak_oscillation`, which seems to imply a slightly different thing from this being a "combined" signal. I'm not sure I have a better suggestion though... Thoughts?
Optimizations

For `sim_knee` and `sim_peak_oscillation`, there were a bunch of embedded loops that seemed optimizable - the updates here are to use sub-functions where they could be used and, where appropriate, to vectorize functions for applying across the vectors. Combining both also removed some interim computations of arrays. I also think the code might be a little clearer.

Optimization updates:
- `sim_knee`: speedup of ~20-25% for shorter signals (10s), and ~100% for longer signals (60s)
- `sim_peak_oscillation`: speedup of ~20-25% for shorter signals (10s), and ~100% for longer signals (60s)

All the actual computations are the same code, and I validated that the new versions give the same results.
@ryanhammonds - in terms of review, I'm fairly confident about the updates here, so they shouldn't need a wild amount of further testing for the code changes. Let me know if you have any thoughts on naming, and I'd also be curious to hear if you have any other optimization ideas.