Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surge.fft() and numpy.fft.fft() seem to produce different results #94

Open
SreehariRamMohan opened this issue Jan 23, 2019 · 28 comments
Open
Labels

Comments

@SreehariRamMohan
Copy link

I am attempting to plot a periodogram in Swift of a signal using Surge.

Code:

var fft_mat = Surge.pow(Surge.fft(signal), 2)
var const_mult = 2.0/Double(signal.count)
for var i in 0..<fft_mat.count { //multiply fft_mat by const_mult
    fft_mat[i] = Float(const_mult) * fft_mat[i]
}
var pgram = fft_mat

Plotting pgram yields the following results

ios-periodogram

However, after loading and creating the exact same periodogram in Python I get a very different periodogram.

Code:
pgram = (2.0/len(signal)) * numpy.power(numpy.fft.fft(signal), 2)

pure-numpy-periodogram

Since I am using the exact same method to plot the periodogram (and the same data as well), I was wondering if there are some differences in the implementation of the Surge fft and numpy fft which might cause this issue?

@alejandro-isaza
Copy link
Collaborator

Looks like they are using different scales. How are you computing the x-axis? Maybe try removing pow. Can you share the contents of signal?

@SreehariRamMohan
Copy link
Author

Yeah. Here is the dummy data used to create the above signals/graphs above.

Column 1, "RightLegAngleSignal", is what I used to construct the periodogram in Swift as well as numpy/python.

The last column, "pgram", is the periodogram which Surge outputted (First graph above).

If you run the python code on Column 1, "RightLegAngleSignal", you will get the second graph.

Thanks so much

all-signals.csv.zip

@alejandro-isaza
Copy link
Collaborator

I see there are a ton of nans, did you try filtering those out?

@SreehariRamMohan
Copy link
Author

Yes. In both Swift and Python, I filtered them out.

@alejandro-isaza
Copy link
Collaborator

I get a different result, are you on the latest version?

@SreehariRamMohan
Copy link
Author

I'm on Surge 2.0.0. Are both your Python and Swift periodograms matching?

@alejandro-isaza
Copy link
Collaborator

Try 2.2.0

@SreehariRamMohan
Copy link
Author

Thanks I'll try that right now

@SreehariRamMohan
Copy link
Author

Ok I just ran the app again (using Surge 2.2.0) and generated a new CSV, Swift Periodogram, and Numpy Periodogram.

Data attached below:

Numpy Periodogram
pure-numpy-periodogram

Swift Periodogram
ios-periodogram

They are still different (but have some similarities in shape). Do you think this has something to do with the symmetry of the FFT around the center point? Also it also looks like the scales are different, do you know why that is?

Thanks so much
all-signals.csv.zip

@gburlet
Copy link

gburlet commented Aug 5, 2019

DSP guy here. I've managed to match Surge's FFT and numpy's FFT. They are not equivalent as it stands. Here are the differences:

  1. Surge's FFT actually calculates the power spectrum with vDSP_zvmags and then reverses that operation later on by taking the square root and scaling. If you remove the subsequent line of code with vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] then you get the power spectrum.
  2. Numpy has no normalization. In other libraries, sometimes I have seen a scaling coefficient on the magnitudes, sometimes not. Surge's line vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] scales the magnitudes by 2/N where N is the window size.
  3. Surge's FFT doesn't handle non powers of 2 for window size. It's probable that the person commenting above didn't choose a power of 2 and that's why the frequency response graph looks of similar shape but scaled and translated.
  4. Surge's FFT doesn't have a windowing function at all. Common choices are hamming or hann windows. Very rarely are FFTs done with a boxcar (rectangular) window.
  5. Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

I rewrote the Surge FFT code to window with a hamming window and return the power spectrum (no scaling) to match the librosa one-liner: S = np.abs(librosa.core.stft(x, n_fft=w, hop_length=w, window=signal.hamming, center=False)) ** 2. Under the hood, librosa uses Numpy's FFT.

I tested on 1 frame of the same audio wav file within Python and on iOS using surge and got basically the same results (except for some floating point precision differences).

public func powerSpectrum(_ input: [Float]) -> [Float] {
        // apply hamming window
        // window has slightly different precision than scipy.window.hamming (float vs double)
        var win = [Float](repeating: 0, count: input.count)
        vDSP_hamm_window(&win, vDSP_Length(input.count), 0)
        let input_windowed = mul(input, win)
        
        var real = [Float](input_windowed)
        var imaginary = [Float](repeating: 0.0, count: input_windowed.count)
        var splitComplex = DSPSplitComplex(realp: &real, imagp: &imaginary)
        
        let length = vDSP_Length(floor(log2(Float(input_windowed.count))))
        let radix = FFTRadix(kFFTRadix2)
        let weights = vDSP_create_fftsetup(length, radix)
        withUnsafeMutablePointer(to: &splitComplex) { splitComplex in
            vDSP_fft_zip(weights!, splitComplex, 1, length, FFTDirection(FFT_FORWARD))
        }
        
        // zvmags yields power spectrum: |S|^2
        var magnitudes = [Float](repeating: 0.0, count: input_windowed.count)
        withUnsafePointer(to: &splitComplex) { splitComplex in
            magnitudes.withUnsafeMutableBufferPointer { magnitudes in
                vDSP_zvmags(splitComplex, 1, magnitudes.baseAddress!, 1, vDSP_Length(input_windowed.count))
            }
        }
        
        vDSP_destroy_fftsetup(weights)
        
        return magnitudes
    }

Not sure what the actionable items are for the Surge maintainer(s) to patch up the FFT function. It's not necessarily wrong, I think it just needs some more parameters so people can match output from other libraries. Possibly adding a window parameter to the Surge FFT function as well as an option to specify whether to scale the result. In my opinion letting people scale the result outside of this function is probably best.

@tomekid
Copy link

tomekid commented May 14, 2020

Hey thanks for writing that. What does the mul(input,win) do? does it multiply the two arrays? hows does that work?

@gburlet
Copy link

gburlet commented May 14, 2020

Element-wise multiplication

@regexident
Copy link
Collaborator

I'm personally not familiar enough with the FFT, so …

@gburlet would you consider your modified implementation a bug fix or an alternative to our current one? Or put differently: would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

Either way I'd be more than happy to merge a PR addressing the issue appropriately. 🙂

@gburlet
Copy link

gburlet commented May 14, 2020

would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

I would say the Surge FFT algorithm returns a scaled amplitude spectrum (something like np.abs(S) where you take the absolute value of the complex components in the FFT) so I wouldn't really call it an FFT, which typically returns the complex components e.g., scipy rfft. I would say your standard Surge user doesn't really care because most people in audio end up wanting the amplitude spectrum anyways so they don't have to deal with complex numbers.

My personal recommendation would be to try to match output of heavy-use libraries like numpy, scipy for compatibility between experimental systems (done off device, e.g., machine learning experimentation) and production systems (e.g., prediction on device).

Specific Recommendations:

  • I would remove the scaling of 2/N. People scale FFTs in different ways and nobody likes looking through source code to see how something is scaled to reverse the operation and add their own scaling. Removing scaling would also match NumPy. Keep the sqrt though, so you don't have the power spectrum.
  • Allowing the user to specify a window function (hamming, hann, etc.) is critical and this is something that Surge lacks. Nobody runs FFTs with a rectangular window due to bad spectral leakage. In fact, by default a lot of other FFT libraries use a hamming window by default and therefore default output in Surge can be quite different.
  • Enforce window size power of 2 or snap to nearest power of 2: all FFT libraries I've encountered require the input window size size or number of points in the FFT to be a power of 2. DFTs usually allow non powers of 2, but with FFTs this is usually a requirement due to the butterfly algorithm (divide and conquer strategy). This is the issue the OP ran into without knowing why it was happening.

@regexident
Copy link
Collaborator

This is great, thanks a lot @gburlet!

@brett-eugenelabs
Copy link

@gburlet Do you have the source for the mul() in the following line?
let input_windowed = mul(input, win)

@gburlet
Copy link

gburlet commented Oct 24, 2020

Hi @brett-eugenelabs, it's element-wise multiplication: same function as vDSP_vmul in Accelerate framework or Surge.elmul in Surge. Hope that helps 👍

@abokhalel2
Copy link

@gburlet is there a possibility in your example to make hop_length and window_size different numbers? I'm trying to find a way to make them match the results of torch.stft or librosa.core.stft when hop_length and window_size being different numbers.

This is my torch.stft example in case needed.

import torch
import torchaudio

win_length = 4096
hop_length = 1024
win = torch.hann_window(win_length)
wav, sr = torchaudio.load('audio_example.wav')
stft = torch.stft(wav, win_length, hop_length=hop_length, window=win)

@gburlet
Copy link

gburlet commented Jul 7, 2021

Hi @abokhalel2, my example is just for the FFT on a single window, not an STFT. To get the STFT you would sliding window the samples and send it to the function above. Typically both the window and hop size are powers of 2 with the hop being some division of the window size. Hope that helps.

@vade
Copy link

vade commented Jan 12, 2023

I have a perhaps naive question for @gburlet - (im not a DSP guy at all so forgive me if this is a newbie q) - why do you fill only the real components of of the DSP Split Complex with the source audio frame?

Looking at Apples Mel sample code on the vDSP page, they bind the incoming time domain audio signal via vDSP_ctoz into a split complex, filling both the real and imaginary components.

With their implementation, for a FFT size of 400, you'd get 200 complex samples back, or 200 |S|^2 magnitude values.

With your implementation, you'd get 400 magnitude values back? Am I understanding that correctly?

And if so, what are the differences in the FFT output when populating the Split Complex with only real components, vs running vDSP_ctoz?

Thank you in advance. This thread has been INCREDIBLY helpful in trying to match Torches STFT code to produce a Log Mel Spectrogram. Im not there yet, but im getting closest thanks to this thread.

@gburlet
Copy link

gburlet commented Jan 12, 2023

@vade I unfortunately don't have time / brain bandwidth to jump into this in more detail but see above, this point specifically, which may help you understand:

Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

@vade
Copy link

vade commented Jan 12, 2023

Hey @gburlet - no worries, Your input here has been super helpful. I appreciate the prompt reply! thank you!

@gburlet
Copy link

gburlet commented Jan 12, 2023

@vade I see you're contributing to an OpenAI whisper CoreML project. I like that.

Find my email online and I'll share my Accelerate vDSP Mel Spectrogram code written in swift privately. It took a while to make it match librosa output. Not sure if it will match Torches STFT code but might give you another reference point. Also, from what I remember, it's really fast.

@vade
Copy link

vade commented Jan 12, 2023

Much obliged! For real!

@annoybot
Copy link

annoybot commented Feb 10, 2023

@gburlet I'm trying to use your powerSpectrum() code above and I'm getting the error:

No exact matches in call to global function 'mul'

At the line:

let input_windowed = mul(input, win)

Am I missing something? What kind of multiplication is this element wise, or ...? Sorry if this question seems obtuse.

@gburlet
Copy link

gburlet commented Feb 10, 2023

@annoybot see above, it has been answered

@annoybot
Copy link

annoybot commented Feb 10, 2023

Ah yes, indeed it has been answered, thanks. 😬

@annoybot
Copy link

@gburlet, I hope the following question is not too off topic for this thread.

Do you think there would be an advantage in using Welch's method over a plain FFT when processing sound data from an instrument?

I was wondering if it would help eliminate noise?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants