Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] - Simulations refactors #290

Merged
merged 14 commits into from
Mar 18, 2022
Merged

[MNT] - Simulations refactors #290

merged 14 commits into from
Mar 18, 2022

Conversation

TomDonoghue
Copy link
Member

@TomDonoghue TomDonoghue commented Jan 27, 2022

This PR does some refactors and tweaks of some of the newer simulation functions.

Naming

An open question (nothing changed yet) is if we want to do any naming updates.

  • In particular, across the module we use both exponent (or exp), and now in these newer functions, chi to refer to the aperiodic exponent. I think I would vote we consolidate on using the term exponent consistently for this. The main counterpoint I can think of is that exponent is a pretty generic name, and in some cases it could be a little unclear.
  • While we're on the topic, I'm not 100% sold on sim_peak_oscillation, which seems to imply a slightly different thing from this being a "combined" signal. I'm not sure I have a better suggestion though...

Thoughts?

Optimizations

For sim_knee and sim_peak_oscillation, there where a bunch of embedded loops that seemed optimizable - the updates here being to use sub-functions where they could be used and where appropriate, vectorize functions for applying across the vectors. Combining both also removed some interim computations of arrays. I also think the code might be a little clearer.

Optimization updates:

  • sim_knee: speedup of ~20-25% faster of shorter signals (10s), and ~100% faster for longer signals (60s)
  • sim_peak_oscillation: speedup of ~20-25% faster of shorter signals (10s), and ~100% faster for longer signals (60s)

All the actually computations are the same code, and I validated that the new versions give the same results

@ryanhammonds - in terms of review, I'm fairly confident about the updates here, so they shouldn't need a wild amount of further testing for the code changes. Let me know if you have any thoughts on naming, and I'd also be curious to hear if you have any other optimization ideas.

@TomDonoghue TomDonoghue changed the title [WIP/MNT] - Simulations refactors [MNT] - Simulations refactors Jan 27, 2022
@TomDonoghue TomDonoghue added the 2.2 Updates to go into a 2.2.0 release label Jan 27, 2022
Copy link
Member

@ryanhammonds ryanhammonds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into np.vectorize to learn how to speed up my own code and noticed this was in the numpy doc notes:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

I tested this out and a for loop was faster than using np.vectorize. It was also more memory efficient since the 2d cosines isn't required in the loop I tested.

%load_ext line_profiler

import numpy as np
from neurodsp.sim.aperiodic import sim_knee
from neurodsp.utils.data import create_times, compute_nsamples

# Settings
n_seconds=10
fs=1000
chi1=-1
chi2=-2
knee=100

times = create_times(n_seconds, fs)
n_samples = compute_nsamples(n_seconds, fs)

# Create frequencies for the power spectrum and drop the DC component
#   These frequencies are used to create the cosines to sum
freqs = np.linspace(0, fs / 2, num=int(n_samples // 2 + 1), endpoint=True)
freqs = freqs[1:]
%%timeit
np.random.seed(0)

def _coscoef(freq):
    return np.sqrt(1 / (freq ** -chi1 * (freq ** (-chi2 - chi1) + knee)))

# Define & vectorize sub-function for computing the set of cosines, adding a random phase shift
def _create_cosines(freq):
    return _coscoef(freq) * np.cos(2 * np.pi * freq * times + 2 * np.pi * np.random.rand())

vect_create_cosines = np.vectorize(_create_cosines, signature='()->(n)')

# Create the set of cosines
cosines = vect_create_cosines(freqs)

sig = np.sum(cosines, axis=0)

644 ms ± 3.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
np.random.seed(0)

sig = np.zeros(n_samples)

for f in freqs:
    sig += np.sqrt(1 / (f ** -chi1 * (f ** (-chi2 - chi1) + knee))) \
        * np.cos(2 * np.pi * f * times + 2 * np.pi * np.random.rand())

581 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The latter method is also more memory efficient since it doesn't store a large 2d array of cosines. When I run the np.vectorize block with a 100 second signal and 1000 fs, I get:

MemoryError: Unable to allocate 37.3 GiB for an array with shape (50000, 100000) and data type float64

As for variable names, I do think we should choose either exponent or chi. I'm use to it as exponent since that's what fooof used. But I don't have a strong opinion either way. I'll also give renaming sim_peak_oscillation some thought. I'm not sure about a better name either.

@TomDonoghue
Copy link
Member Author

TomDonoghue commented Feb 1, 2022

@ryanhammonds - awesome deep dive! My original understanding was that vectorize makes functions able to be applied across arrays in effectively a loop (sorta like apply_along) - but that it offered some performance gains in doing so - I think I misunderstood the performance aspects! I like the syntax, but I guess it's basically just syntactic sugar.

I had a quick poke around with what you suggested for sim_knee, and comparing all three implementations (for an intermediate, 15 second signal), I get:

  • original (main branch): average 2.03 s
  • my update (current sim branch): average 1.54 s
  • your update (from comment): average 1.08 s

Which means were getting to about a 100% speed up compared to original! Which is great! Thanks for digging further into this!

Practically, I think the main benefit is simply removing extra loops (doing it all in one), which as you mention also has memory benefits! I think the code also ends up looking nice and clean.

Next steps:

  • can you add a commit adding your update to sim_knee
  • do you think there's an equivalent further update for sim_peak_oscillation? Can you check and add if so?
  • you can also add in the knee fix update from [FIX] Sim knee update #284

For naming: we're agreed on consolidating exponent - I think it makes sense to chose that over chi, as it's already built in (no breaking changes to sim_powerlaw, etc). I'll push a commit for that change now. I'll keep thinking on sim_peak_oscillation too - we can update if we think of a better name.

@ryanhammonds
Copy link
Member

ryanhammonds commented Feb 2, 2022

@TomDonoghue I updated sim_knee and sim_peak_oscillation to use for loops.

I also converted the knee parameter to knee frequency in sim_knee. The math was a bit unwieldy for me, but I think I sorted it out. The knee is defined as:

knee_freq = knee ** (1 / (2*exponent1 + exponent2))

from solving for the fwhm in the Lorenztian:

L(freq) = 1 / (freq**(exponent1) * freq**(exponent2 + exponent1) + knee)
L(knee_freq) = f(0) / 2

1 / (knee_freq**exponent1 * knee_freq**(exponent2 + exponent1)) + knee = 1 / (2 * knee)
knee_freq**exponent1 * knee_freq**(exponent2 + exponent1) + knee = 2 * knee
knee_freq**(2*exponent1 + exponent2) = knee
knee_freq = knee ** (1 / (2*exponent1 + exponent2))

I think this should be the analogous to solving knee_freq = knee**(1/exponent) in the single exponent model.

@TomDonoghue
Copy link
Member Author

I'm going to tag in @rdgao for a quick consult here (hopefully he doesn't mind :))

@ Richard - we've been refactoring some of the simulation functions (most of which you can ignore). One update we would like to make is to update the sim_knee function to take in the "knee frequency" as opposed to the value of the knee parameter - meaning we need the conversion function between the knee parameter and the knee frequency, for the case with two exponents.

So, the formulation of the overall spectrum should be:
P(f) = 1 / (f**exponent1 * (f**exponent2 + knee))
But rather than take in knee, we would like to take in knee_freq.

Could you check Ryan's comment above for how to convert the parameter in this case?

@rdgao
Copy link
Contributor

rdgao commented Mar 13, 2022

wait just to make sure I understand correctly, this is for the case where there's two exponents?

Did we (or I??) ever work out the sim equations for the 2-exponent case with knee?

It's this: L(freq) = 1 / (freq**(exponent1) * freq**(exponent2 + exponent1) + knee)?

@TomDonoghue
Copy link
Member Author

@rdgao - yes, this is for double exponent! You and/or Eric did figure out this equation a while ago! We do have a simulation function for this, from Eric Lybrand, which simulates a double-exponent + knee signal using a sum of sinusoids approach.

I missed up the equation I wrote in my comment above (oops - not sure what happened there), but what you wrote is what we are doing (with an added square root?). The code for this equation is sim/aperiodic line 179 on main, or line 183 on the sim branch for this PR.

The thing we were trying to figure out here, is the mapping between knee parameter & knee frequency.
In the formula:
L(freq) = 1 / (freq**(exponent1) * freq**(exponent2 + exponent1) + knee)
the knee parameter is not the knee frequency right, it's the "parameter" value, equivalent to what we fit in specparam?

In the single-exp Lorentzian, we convert knee-freq from knee parameter as: knee_freq = knee ** (1./exponent).
Do you know what the equivalent conversion to compute the knee frequency for the double exponential? Ryan's comment above has a potential answer, but we just wanted to double check. As well as this being a useful conversion to have, we were planning to update the sim func to be able to specify knee frequency directly.

@rdgao
Copy link
Contributor

rdgao commented Mar 15, 2022

okay I somehow have zero memory of this, but at a quick glance, I think this might be a bit more complicated, though Ryan's derivations look good, at least in terms of the algebra.

I'm a bit occupied in the next week or so for cosyne, but I'll take a look after I come back and maybe play around with some sims just to check (ofcourse you guys can do this as well). I'm actually not sure what the "right" answer for the knee_freq is when there's two slopes, bc it's not so easy to define where the tapering off happens when there's no flat plateau

@TomDonoghue
Copy link
Member Author

@rdgao - there's no particular rush here, this can definitely wait for after CoSyne! I played around with this, and the current candidate version, and it seemed close, but maybe not quite right - I need to dig in a little more.

Practically speaking, I'm going to suggest we split out the knee conversion from the rest of this PR, which does other (mostly unrelated) refactors of the simulations. I've opened #301 to continue the knee discussion, from which we can coordinate what to do next for that. Otherwise, I've reverted a couple lines of code from this PR, so that the rest of the PR can be merged, and we can use a new PR for the knee stuff.

@ryanhammonds - does this splitting up of updates make sense to you? If so, as far as I'm aware, this PR is now good to merge, right? Let me know if you have any other thoughts on what is currently in this PR - if it looks good to you, I'll merge it in!

@ryanhammonds
Copy link
Member

@TomDonoghue Splitting the knee freq update in a different PR sounds good to me, since that will need further review / checking. I went back through the current diff and everything makes sense / looks good!

@TomDonoghue TomDonoghue merged commit b6d9ee7 into main Mar 18, 2022
@TomDonoghue TomDonoghue deleted the sim branch March 18, 2022 01:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.2 Updates to go into a 2.2.0 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants