STFT Reconstruction from Mel Spectrograms #57

tasercake · 2020-10-27T07:07:21Z

I've been playing around with trying to reconstruct an STFT spectrogram from a Mel spectrogram (derived using the MelSpectrogram class) and wondered if you might be interested in incorporating something of this sort into nnAudio.

I've created a Colab notebook to demonstrate my results. The reconstruction quality is currently slightly inferior to librosa's, but the reconstruction itself is orders of magnitude faster. I tried my hand at some hyperparameter tuning, but judging by the values used by Torchaudio and Librosa, it seems that many more iterations (and a much lower learning rate?) are needed to achieve optimal reconstruction quality, and I don't have the compute resources to run a hyperparameter search at that scale. I've included some quick quality/speed comparisons in the notebook.

My implementation is based on Librosa's mel_to_stft and TorchAudio's InverseMelScale.

If this is something you might be interested in adding to nnAudio, I'd be happy to open a pull request for further review.
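For readers without access to the notebook, the gradient-descent idea described above can be sketched roughly as follows. This is not the notebook's code: it uses NumPy rather than PyTorch, and the helper names (`mel_filterbank`, `mel_to_stft_gd`), the HTK-style triangular filterbank, the step size, and the iteration count are all assumptions made for illustration.

```python
import numpy as np

def mel_filterbank(sr=22050, n_fft=512, n_mels=40):
    """Triangular mel filters (HTK mel scale), shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    rising = (freqs[None, :] - edges[:-2, None]) / (edges[1:-1, None] - edges[:-2, None])
    falling = (edges[2:, None] - freqs[None, :]) / (edges[2:, None] - edges[1:-1, None])
    return np.maximum(0.0, np.minimum(rising, falling))

def mel_to_stft_gd(mel_spec, fb, n_iter=500, lr=None):
    """Minimise 0.5 * ||fb @ S - mel_spec||^2 over S >= 0 by projected gradient descent."""
    if lr is None:
        # Step size 1/L, where L is the squared spectral norm of the filterbank.
        lr = 1.0 / np.linalg.norm(fb, 2) ** 2
    S = np.zeros((fb.shape[1], mel_spec.shape[1]))
    for _ in range(n_iter):
        residual = fb @ S - mel_spec
        # Gradient step, then clamp to keep the magnitudes non-negative.
        S = np.maximum(S - lr * (fb.T @ residual), 0.0)
    return S

# Toy round trip on random "magnitudes" (hypothetical data, not real audio).
fb = mel_filterbank()
rng = np.random.default_rng(0)
stft_true = rng.random((fb.shape[1], 8))
mel = fb @ stft_true
stft_est = mel_to_stft_gd(mel, fb)
print("mel-domain residual:", np.linalg.norm(fb @ stft_est - mel))
```

The problem is underdetermined (40 mel bands against 257 frequency bins here), so a small mel-domain residual does not imply that `stft_est` matches `stft_true`.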

KinWaiCheuk · 2020-10-27T15:12:41Z

Hi tasercake, I have also been looking at this for the past few days. Gradient descent does not work well in this case, since we are dealing with a sparse matrix (the Mel filter banks). It will take forever for gradient descent to find the right solution.

To make it work better and faster, you need non-negative least squares (NNLS) instead. There is no existing NNLS function in PyTorch, so you would need to build your own NNLS on top of the L-BFGS-B algorithm. Someone has already implemented L-BFGS-B in PyTorch, and you might want to use it to build the PyTorch version of NNLS:
https://github.com/hjmshi/PyTorch-LBFGS

I will push a better version of Griffin-Lim in a few days. My existing Griffin-Lim is also based on gradient descent and is likewise not as good as the librosa result; the new version will be as good as librosa, since it will be a direct clone of it. It might come in handy when you implement the InverseMelScale, since you would just need to add mel_to_stft to this new version of Griffin-Lim to finish the InverseMelScale.

And yes, this feature would be very useful, and I have been wanting to implement it. Thanks for your help.
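As a point of reference, the NNLS idea mentioned in this comment can be sketched with SciPy's bounded L-BFGS-B solver. This is a CPU-only illustration under stated assumptions, not a PyTorch implementation: the function name `nnls_lbfgsb` is hypothetical, and the small matrix `A` is a hand-made stand-in for a mel filterbank.

```python
import numpy as np
from scipy.optimize import minimize

def nnls_lbfgsb(A, B):
    """Solve min_X 0.5 * ||A @ X - B||^2 subject to X >= 0 using L-BFGS-B bounds."""
    n, t = A.shape[1], B.shape[1]

    def objective(x_flat):
        X = x_flat.reshape(n, t)
        residual = A @ X - B
        value = 0.5 * np.sum(residual ** 2)
        grad = A.T @ residual
        return value, grad.ravel()

    # Start from the unconstrained least-squares solution, clipped to be feasible.
    x0 = np.clip(np.linalg.lstsq(A, B, rcond=None)[0], 0.0, None)
    result = minimize(objective, x0.ravel(), jac=True, method="L-BFGS-B",
                      bounds=[(0.0, None)] * (n * t))
    return result.x.reshape(n, t)

# Toy stand-in for a mel filterbank: 3 overlapping triangular bands over 6 bins.
A = np.array([[1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.5, 1.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.5, 1.0, 0.5]])
rng = np.random.default_rng(0)
X_true = rng.random((6, 4))   # hypothetical non-negative "STFT magnitudes"
B = A @ X_true                # corresponding "mel spectrogram"
X_est = nnls_lbfgsb(A, B)
print("residual:", np.linalg.norm(A @ X_est - B))
```

Note that the solver has to be run again for every new spectrogram; it yields a solution for one input rather than a reusable inverse matrix.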

KinWaiCheuk · 2020-10-28T03:29:09Z

Hi tasercake, on second thought, I don't think NNLS is the right way to go, since it does not give us the inverse matrix. We would therefore need to keep calling this function over and over again to estimate the STFT. So it seems your approach is better.

Now, my idea is to estimate the inverse matrix for the mel filter banks (probably using your approach), use it for the mel_to_stft conversion, and then use Griffin-Lim to get the audio back. This STFT reconstruction should therefore ultimately be able to integrate with our existing Griffin_Lim.

f0k · 2021-05-10T08:57:07Z

Note that if you get the normalization correct, you also get quite decent results by just transposing the mel filterbank (in-the-wild example: https://github.com/bkvogel/griffin_lim/blob/master/run_demo.py#L110). It's also possible to use the pseudoinverse of the mel filterbank, but I found this often introduces audible artifacts.

KinWaiCheuk · 2021-05-10T09:18:57Z

> you also get quite decent results by just transposing the mel filterbank

That is interesting to know. I had thought of that before, but it seemed too good to be true, so in the end I didn't try it. I will give it a try when I have time, but pull requests are welcome.

KinWaiCheuk added the enhancement New feature or request label Oct 27, 2020

KinWaiCheuk mentioned this issue Apr 13, 2021

Is Gammatone invertible? #89

Open
