Surge.fft() and numpy.fft.fft() seem to produce different results #94

SreehariRamMohan · 2019-01-23T18:28:30Z

I am attempting to plot a periodogram in Swift of a signal using Surge.

Code:

var fft_mat = Surge.pow(Surge.fft(signal), 2)
var const_mult = 2.0/Double(signal.count)
for var i in 0..<fft_mat.count { //multiply fft_mat by const_mult
    fft_mat[i] = Float(const_mult) * fft_mat[i]
}
var pgram = fft_mat

Plotting pgram yields the following results

However, after loading and creating the exact same periodogram in Python I get a very different periodogram.

Code:
pgram = (2.0/len(signal)) * numpy.power(numpy.fft.fft(signal), 2)

Since I am using the exact same method to plot the periodogram (and the same data as well), I was wondering if there are some differences in the implementation of the Surge fft and numpy fft which might cause this issue?

The text was updated successfully, but these errors were encountered:

alejandro-isaza · 2019-01-28T04:24:58Z

Looks like they are using different scales. How are you computing the x-axis? Maybe try removing pow. Can you share the contents of signal?

SreehariRamMohan · 2019-01-28T04:31:12Z

Yeah. Here is the dummy data used to create the above signals/graphs above.

Column 1, "RightLegAngleSignal", is what I used to construct the periodogram in Swift as well as numpy/python.

The last column, "pgram", is the periodogram which Surge outputted (First graph above).

If you run the python code on Column 1, "RightLegAngleSignal", you will get the second graph.

Thanks so much

all-signals.csv.zip

alejandro-isaza · 2019-01-28T04:33:19Z

I see there are a ton of nans, did you try filtering those out?

SreehariRamMohan · 2019-01-28T04:45:48Z

Yes. In both Swift and Python, I filtered them out.

alejandro-isaza · 2019-01-28T04:48:03Z

I get a different result, are you on the latest version?

SreehariRamMohan · 2019-01-28T04:49:38Z

I'm on Surge 2.0.0. Are both your Python and Swift periodograms matching?

alejandro-isaza · 2019-01-28T04:51:00Z

Try 2.2.0

SreehariRamMohan · 2019-01-28T05:22:27Z

Thanks I'll try that right now

SreehariRamMohan · 2019-01-28T05:41:33Z

Ok I just ran the app again (using Surge 2.2.0) and generated a new CSV, Swift Periodogram, and Numpy Periodogram.

Data attached below:

Numpy Periodogram

Swift Periodogram

They are still different (but have some similarities in shape). Do you think this has something to do with the symmetry of the FFT around the center point? Also it also looks like the scales are different, do you know why that is?

Thanks so much
all-signals.csv.zip

gburlet · 2019-08-05T18:01:23Z

DSP guy here. I've managed to match Surge's FFT and numpy's FFT. They are not equivalent as it stands. Here are the differences:

Surge's FFT actually calculates the power spectrum with vDSP_zvmags and then reverses that operation later on by taking the square root and scaling. If you remove the subsequent line of code with vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] then you get the power spectrum.
Numpy has no normalization. In other libraries, sometimes I have seen a scaling coefficient on the magnitudes, sometimes not. Surge's line vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] scales the magnitudes by 2/N where N is the window size.
Surge's FFT doesn't handle non powers of 2 for window size. It's probable that the person commenting above didn't choose a power of 2 and that's why the frequency response graph looks of similar shape but scaled and translated.
Surge's FFT doesn't have a windowing function at all. Common choices are hamming or hann windows. Very rarely are FFTs done with a boxcar (rectangular) window.
Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

I rewrote the Surge FFT code to window with a hamming window and return the power spectrum (no scaling) to match the librosa one-liner: S = np.abs(librosa.core.stft(x, n_fft=w, hop_length=w, window=signal.hamming, center=False)) ** 2. Under the hood, librosa uses Numpy's FFT.

I tested on 1 frame of the same audio wav file within Python and on iOS using surge and got basically the same results (except for some floating point precision differences).

public func powerSpectrum(_ input: [Float]) -> [Float] {
        // apply hamming window
        // window has slightly different precision than scipy.window.hamming (float vs double)
        var win = [Float](repeating: 0, count: input.count)
        vDSP_hamm_window(&win, vDSP_Length(input.count), 0)
        let input_windowed = mul(input, win)
        
        var real = [Float](input_windowed)
        var imaginary = [Float](repeating: 0.0, count: input_windowed.count)
        var splitComplex = DSPSplitComplex(realp: &real, imagp: &imaginary)
        
        let length = vDSP_Length(floor(log2(Float(input_windowed.count))))
        let radix = FFTRadix(kFFTRadix2)
        let weights = vDSP_create_fftsetup(length, radix)
        withUnsafeMutablePointer(to: &splitComplex) { splitComplex in
            vDSP_fft_zip(weights!, splitComplex, 1, length, FFTDirection(FFT_FORWARD))
        }
        
        // zvmags yields power spectrum: |S|^2
        var magnitudes = [Float](repeating: 0.0, count: input_windowed.count)
        withUnsafePointer(to: &splitComplex) { splitComplex in
            magnitudes.withUnsafeMutableBufferPointer { magnitudes in
                vDSP_zvmags(splitComplex, 1, magnitudes.baseAddress!, 1, vDSP_Length(input_windowed.count))
            }
        }
        
        vDSP_destroy_fftsetup(weights)
        
        return magnitudes
    }

Not sure what the actionable items are for the Surge maintainer(s) to patch up the FFT function. It's not necessarily wrong, I think it just needs some more parameters so people can match output from other libraries. Possibly adding a window parameter to the Surge FFT function as well as an option to specify whether to scale the result. In my opinion letting people scale the result outside of this function is probably best.

tomekid · 2020-05-14T10:02:09Z

Hey thanks for writing that. What does the mul(input,win) do? does it multiply the two arrays? hows does that work?

gburlet · 2020-05-14T15:27:06Z

Element-wise multiplication

regexident · 2020-05-14T19:00:07Z

I'm personally not familiar enough with the FFT, so …

@gburlet would you consider your modified implementation a bug fix or an alternative to our current one? Or put differently: would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

Either way I'd be more than happy to merge a PR addressing the issue appropriately. 🙂

gburlet · 2020-05-14T19:35:41Z

would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

I would say the Surge FFT algorithm returns a scaled amplitude spectrum (something like np.abs(S) where you take the absolute value of the complex components in the FFT) so I wouldn't really call it an FFT, which typically returns the complex components e.g., scipy rfft. I would say your standard Surge user doesn't really care because most people in audio end up wanting the amplitude spectrum anyways so they don't have to deal with complex numbers.

My personal recommendation would be to try to match output of heavy-use libraries like numpy, scipy for compatibility between experimental systems (done off device, e.g., machine learning experimentation) and production systems (e.g., prediction on device).

Specific Recommendations:

I would remove the scaling of 2/N. People scale FFTs in different ways and nobody likes looking through source code to see how something is scaled to reverse the operation and add their own scaling. Removing scaling would also match NumPy. Keep the sqrt though, so you don't have the power spectrum.
Allowing the user to specify a window function (hamming, hann, etc.) is critical and this is something that Surge lacks. Nobody runs FFTs with a rectangular window due to bad spectral leakage. In fact, by default a lot of other FFT libraries use a hamming window by default and therefore default output in Surge can be quite different.
Enforce window size power of 2 or snap to nearest power of 2: all FFT libraries I've encountered require the input window size size or number of points in the FFT to be a power of 2. DFTs usually allow non powers of 2, but with FFTs this is usually a requirement due to the butterfly algorithm (divide and conquer strategy). This is the issue the OP ran into without knowing why it was happening.

regexident · 2020-05-15T08:34:34Z

This is great, thanks a lot @gburlet!

brett-eugenelabs · 2020-10-24T06:28:16Z

@gburlet Do you have the source for the mul() in the following line?
let input_windowed = mul(input, win)

gburlet · 2020-10-24T18:09:31Z

Hi @brett-eugenelabs, it's element-wise multiplication: same function as vDSP_vmul in Accelerate framework or Surge.elmul in Surge. Hope that helps 👍

abokhalel2 · 2021-06-15T14:17:37Z

@gburlet is there a possibility in your example to make hop_length and window_size different numbers? I'm trying to find a way to make them match the results of torch.stft or librosa.core.stft when hop_length and window_size being different numbers.

This is my torch.stft example in case needed.

import torch
import torchaudio

win_length = 4096
hop_length = 1024
win = torch.hann_window(win_length)
wav, sr = torchaudio.load('audio_example.wav')
stft = torch.stft(wav, win_length, hop_length=hop_length, window=win)

gburlet · 2021-07-07T00:04:42Z

Hi @abokhalel2, my example is just for the FFT on a single window, not an STFT. To get the STFT you would sliding window the samples and send it to the function above. Typically both the window and hop size are powers of 2 with the hop being some division of the window size. Hope that helps.

vade · 2023-01-12T18:05:23Z

I have a perhaps naive question for @gburlet - (im not a DSP guy at all so forgive me if this is a newbie q) - why do you fill only the real components of of the DSP Split Complex with the source audio frame?

Looking at Apples Mel sample code on the vDSP page, they bind the incoming time domain audio signal via vDSP_ctoz into a split complex, filling both the real and imaginary components.

With their implementation, for a FFT size of 400, you'd get 200 complex samples back, or 200 |S|^2 magnitude values.

With your implementation, you'd get 400 magnitude values back? Am I understanding that correctly?

And if so, what are the differences in the FFT output when populating the Split Complex with only real components, vs running vDSP_ctoz?

Thank you in advance. This thread has been INCREDIBLY helpful in trying to match Torches STFT code to produce a Log Mel Spectrogram. Im not there yet, but im getting closest thanks to this thread.

gburlet · 2023-01-12T18:12:43Z

@vade I unfortunately don't have time / brain bandwidth to jump into this in more detail but see above, this point specifically, which may help you understand:

Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

vade · 2023-01-12T18:44:36Z

Hey @gburlet - no worries, Your input here has been super helpful. I appreciate the prompt reply! thank you!

gburlet · 2023-01-12T18:56:27Z

@vade I see you're contributing to an OpenAI whisper CoreML project. I like that.

Find my email online and I'll share my Accelerate vDSP Mel Spectrogram code written in swift privately. It took a while to make it match librosa output. Not sure if it will match Torches STFT code but might give you another reference point. Also, from what I remember, it's really fast.

vade · 2023-01-12T18:56:59Z

Much obliged! For real!

annoybot · 2023-02-10T20:32:37Z

@gburlet I'm trying to use your powerSpectrum() code above and I'm getting the error:

No exact matches in call to global function 'mul'

At the line:

let input_windowed = mul(input, win)

Am I missing something? What kind of multiplication is this element wise, or ...? Sorry if this question seems obtuse.

gburlet · 2023-02-10T20:34:50Z

@annoybot see above, it has been answered

annoybot · 2023-02-10T20:39:59Z

Ah yes, indeed it has been answered, thanks. 😬

annoybot · 2023-02-10T20:54:20Z

@gburlet, I hope the following question is not too off topic for this thread.

Do you think there would be an advantage in using Welch's method over a plain FFT when processing sound data from an instrument?

I was wondering if it would help eliminate noise?

alejandro-isaza added the question label Jan 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surge.fft() and numpy.fft.fft() seem to produce different results #94

Surge.fft() and numpy.fft.fft() seem to produce different results #94

SreehariRamMohan commented Jan 23, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

gburlet commented Aug 5, 2019 •

edited

tomekid commented May 14, 2020

gburlet commented May 14, 2020

regexident commented May 14, 2020

gburlet commented May 14, 2020

regexident commented May 15, 2020

brett-eugenelabs commented Oct 24, 2020

gburlet commented Oct 24, 2020

abokhalel2 commented Jun 15, 2021

gburlet commented Jul 7, 2021

vade commented Jan 12, 2023 •

edited

gburlet commented Jan 12, 2023

vade commented Jan 12, 2023

gburlet commented Jan 12, 2023

vade commented Jan 12, 2023

annoybot commented Feb 10, 2023 •

edited

gburlet commented Feb 10, 2023

annoybot commented Feb 10, 2023 •

edited

annoybot commented Feb 10, 2023

Surge.fft() and numpy.fft.fft() seem to produce different results #94

Surge.fft() and numpy.fft.fft() seem to produce different results #94

Comments

SreehariRamMohan commented Jan 23, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

alejandro-isaza commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

SreehariRamMohan commented Jan 28, 2019

gburlet commented Aug 5, 2019 • edited

tomekid commented May 14, 2020

gburlet commented May 14, 2020

regexident commented May 14, 2020

gburlet commented May 14, 2020

regexident commented May 15, 2020

brett-eugenelabs commented Oct 24, 2020

gburlet commented Oct 24, 2020

abokhalel2 commented Jun 15, 2021

gburlet commented Jul 7, 2021

vade commented Jan 12, 2023 • edited

gburlet commented Jan 12, 2023

vade commented Jan 12, 2023

gburlet commented Jan 12, 2023

vade commented Jan 12, 2023

annoybot commented Feb 10, 2023 • edited

gburlet commented Feb 10, 2023

annoybot commented Feb 10, 2023 • edited

annoybot commented Feb 10, 2023

gburlet commented Aug 5, 2019 •

edited

vade commented Jan 12, 2023 •

edited

annoybot commented Feb 10, 2023 •

edited

annoybot commented Feb 10, 2023 •

edited