Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Zero padding + Welch's PSD #318

Merged
merged 8 commits into from
May 14, 2024
Merged

[ENH] Zero padding + Welch's PSD #318

merged 8 commits into from
May 14, 2024

Conversation

ryanhammonds
Copy link
Member

@ryanhammonds ryanhammonds commented Aug 17, 2023

Adds ability to zeropad windows used in Welch's psd via a npad kwarg. This can result in smoother spectra. Working on timescales led me to think autoregressive PSD was a smoother method, but it really was the process of zero padding prior to taking the fft that lead to a smooth spectra. Similarily, zero padding the signal prior to fft is similar to zero padding either AR or ACF coefficients prior to fft'ing.

Here's an example using timescales (orange is a zeropadded welch's spectrum), but it also works for powerlaws.

Screenshot 2023-09-02 at 2 23 25 PM
import numpy as np
import matplotlib.pyplot as plt
from neurodsp.spectral import compute_spectrum_welch
from neurodsp.sim import sim_synaptic_current

# Simulate timescale
np.random.seed(0)

n_seconds = 5
fs = 1000
sig = sim_synaptic_current(n_seconds, fs, tau_d=0.01)

# Default Welch
f, p = compute_spectrum_welch(sig, fs, f_range=(1, 200))
plt.loglog(f, p, alpha=.5, label='Default Welch')

# Smoothed Welch
nperseg = 25
noverlap = 0
nfft = 2000

f, p = compute_spectrum_welch(sig, fs, nperseg=nperseg, noverlap=0, nfft=nfft, f_range=(1, 200))
plt.loglog(f, p * 20, label='Zero Padded (Smoothed) Spectrum')
plt.legend();

@codecov-commenter

This comment was marked as resolved.

@ryanhammonds
Copy link
Member Author

The latest adds an fast_len bool argument, which uses scipy.fft.next_fast_len to compute spectra faster. This doesn't really change compute time with the default nperseg and noverlap when n_seconds=1 and fs=1000. It's most notable when nperseg is prime. This has also been implemented to be compatible with the new npad option. Here are examples that speeds up compute_spectrum by 2-3x:

import numpy as np
from neurodsp.spectral import compute_spectrum
from neurodsp.utils import normalize_sig
from timescales.sim import sim_branching
from timescales.conversions import convert_knee

# Settings
np.random.seed(0)
fs = 1000
tau = convert_knee(20)

nperseg = 99
npad = 1000
nwindows = 10
nsamples = ((2*npad) + nperseg) * nwindows # prime -> slow 

# Simulate
sig = normalize_sig(sim_branching(nsamples/fs, fs, tau, 1000), 0, 1)

With npad:

%%timeit
f, p = compute_spectrum(sig, fs, nperseg=nperseg, noverlap=0, npad=npad, f_range=(1, 200))

11.1 ms ± 344 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
f, p = compute_spectrum(sig, fs, nperseg=nperseg, noverlap=0, npad=npad, f_range=(1, 200), fast_len=True)

5.15 ms ± 59.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Without npad:

%%timeit
compute_spectrum(sig, fs, nperseg=2099, noverlap=0, f_range=(1, 200))

646 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit
compute_spectrum(sig, fs, nperseg=2099, noverlap=0, f_range=(1, 200), fast_len=True)

206 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

@TomDonoghue
Copy link
Member

This looks awesome! I think this is a very useful

I've done a quick scan of the code, and it all looks good, but before a deeper dive, I wanted to check / ask about how best to specify padding. As I currently understand, npad is defined as padded samples per side - so for something like nperseg=1000, npad=500, then the total FFT length would be 2000, right? I feel like the user is perhaps more likely to care what the total FFT length is (which is the important aspect for things like frequency resolution) than what the pad length is. An alternative specification would be to specify something like total_fft_length, and if total_fft_length > nperseg then it gets padded to meet the length (so in this setup, nperseg is the number of data points per segment, total_fft_length is the number values per fft, and the total amount of padding is total_fft_length - nperseg).

This would also make our API broadly consistent with how MNE specifies this (see n_fft arg here).

Thoughts on tweaking the input values for this? I'm hoping it wouldn't be too much of an annoying change if we want to go for this.

@voytek
Copy link
Contributor

voytek commented Aug 25, 2023

Yeah, in playing with this my thoughts were similar to what Tom says above. As a user I not only want to know how long my total zero-padded signal is (with or without the 2**x rounding), but I would also love to have the option to spit out the frequency resolution, as well as see a visualization of the window function overlaid with a segment of data where the zeros have been added. I wish we could hire Eric L. back for a bit, because my inclination would be to de-mean and de-trend the window, then zero-pad, then apply a windowing function, but I believe from a DSP perspective this isn't the right way to do it. I believe you're supposed to window then zero-pad. But the window-then-pad approach still kills a ton of the data at the edge of your window, whereas the preprocess-then-pad-then-window seems like it'd allow for more of your data window to contribute to your spectrum.

@ryanhammonds
Copy link
Member Author

@TomDonoghue I integrated your feedback and removed npad arg and added nfft, which controls the total window size: nfft - nperseg == npad.

@voytek, the freq res is now equal to fs/nfft. Here is a snippet to plot windows:

import matplotlib.pyplot as plt
from neurodsp.spectral.utils import trim_spectrum, window_pad

# PSD
nperseg = 25
nfft = 500
npad = nfft - nperseg
noverlap = 0

# Window
sig_pad, nperseg, noverlap = window_pad(sig, nperseg, noverlap, npad, fast_len=False)
sig_windowed = sig_pad.reshape(-1, nperseg)

# Plot first window
plt.plot(sig_windowed[0])
Screenshot 2023-09-02 at 2 55 51 PM

Demeaning windows before padding creates an odd psd. If windows are demeaned, then recombined back into a 1d signal, power at low freqs should be reduced since the slow structure is messed up from demeaning short windows. But I'm not sure why this is the case when computing the spectrogram of windows separately then averaging. Understanding how zero-padding works at the edges would be useful!

Screenshot 2023-09-02 at 3 08 09 PM

@voytek
Copy link
Contributor

voytek commented Sep 12, 2023

I mean, this looks good to me but we need to do much more in-depth testing on padding!

@TomDonoghue TomDonoghue added the 2.3 Updates to go into a 2.3.0. label Sep 26, 2023
@TomDonoghue
Copy link
Member

Okay - coming back to this:

  • I think we all agree / like the padding option, fast computation, and new API for doing this?
  • It seems like there are some possible things to explore with padding, but that can be beyond / separate to this PR? Am I right in thinking this PR is basically ready to merge (no known issues), and future work could perhaps figure out more about padding - or is there something we need to understand more about the whole padding / demeaning thing before we merge this?

@ryanhammonds
Copy link
Member Author

I think this should be good to go unless you see anything that's odd

@TomDonoghue
Copy link
Member

I did some minor direct edits to merge to main and do a bit of linting - otherwise since this all seems good to go, I'm going to merge!

@TomDonoghue TomDonoghue merged commit b52f552 into main May 14, 2024
7 checks passed
@TomDonoghue TomDonoghue deleted the zeropad branch May 14, 2024 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.3 Updates to go into a 2.3.0.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants