STFT Reconstruction from Mel Spectrograms #57

Open
tasercake opened this issue Oct 27, 2020 · 4 comments
Labels
enhancement New feature or request


@tasercake
Contributor

tasercake commented Oct 27, 2020

I've been playing around with trying to reconstruct an STFT spectrogram from a Mel spectrogram (derived using the MelSpectrogram class) and wondered if you might be interested in incorporating something of this sort into nnAudio.

I've created a Colab notebook to demonstrate my results. The reconstruction quality is currently slightly inferior to librosa's, but it is orders of magnitude faster. I tried some hyperparameter tuning, but judging by the values used by TorchAudio and librosa, many more iterations (and a much lower learning rate?) seem to be needed to reach optimal reconstruction quality, and I don't have the compute resources to run a hyperparameter search for that. I've included some quick quality/speed comparisons in the Colab notebook.

My implementation is based on Librosa's mel_to_stft and TorchAudio's InverseMelScale.
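For readers without access to the notebook, here is a minimal sketch of the gradient-based idea. It assumes NumPy in place of PyTorch and a hypothetical `mel_filterbank` helper that builds HTK-style triangular filters; it is not the notebook's code. It minimizes 0.5 * ||fb @ spec - mel||^2 by projected gradient descent with a fixed step of 1/L, clamping negative values after every step (TorchAudio's InverseMelScale, as far as we know, also clamps after each optimizer step, but uses SGD rather than this fixed step).

```python
import numpy as np


def mel_filterbank(sr, n_fft, n_mels):
    """Hypothetical helper: HTK-style triangular mel filters with peak value 1."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - left) / (center - left)
    falling = (right - freqs) / (right - center)
    return np.maximum(0.0, np.minimum(rising, falling))  # (n_mels, n_fft // 2 + 1)


def inverse_mel_gd(mel, fb, n_iter=1000):
    """Estimate a non-negative STFT magnitude `spec` such that fb @ spec approximates mel."""
    step = 1.0 / np.linalg.norm(fb, ord=2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    spec = np.zeros((fb.shape[1], mel.shape[1]))
    for _ in range(n_iter):
        grad = fb.T @ (fb @ spec - mel)             # gradient of 0.5 * ||fb @ spec - mel||^2
        spec = np.maximum(spec - step * grad, 0.0)  # project back onto spec >= 0
    return spec


rng = np.random.default_rng(0)
fb = mel_filterbank(sr=16000, n_fft=512, n_mels=40)
true_spec = rng.random((fb.shape[1], 10))  # random stand-in for an STFT magnitude
mel = fb @ true_spec
est = inverse_mel_gd(mel, fb)
print("relative mel error:", np.linalg.norm(fb @ est - mel) / np.linalg.norm(mel))
```

Only the mel projection is matched: 40 bands cannot pin down 257 frequency bins, so `est` generally differs from `true_spec`. The narrow low-frequency filters converge slowly under plain gradient steps, which fits the need for many iterations described above.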

If this is something you might be interested in adding to nnAudio, I'd be happy to open a pull request for further review.

@KinWaiCheuk
Owner

Hi tasercake, I have also been looking into this for the past few days. Gradient descent does not work well here because we are dealing with a sparse matrix (the Mel filter bank); it will take forever for gradient descent to find the right solution.

To make it work better and faster, you need non-negative least squares (NNLS) instead. There is no existing NNLS function in PyTorch, so you would need to build your own NNLS in PyTorch using the L-BFGS-B algorithm. Someone has already implemented L-BFGS-B in PyTorch; you might want to use it to build a PyTorch version of NNLS:
https://github.com/hjmshi/PyTorch-LBFGS
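A minimal sketch of the NNLS formulation, assuming SciPy's L-BFGS-B (through `scipy.optimize.minimize` with non-negativity bounds) as a CPU stand-in for a PyTorch L-BFGS-B; to our understanding, librosa's `mel_to_stft` solves the same kind of bounded least-squares problem. `nnls_lbfgsb` and `mel_filterbank` are hypothetical helpers, not library functions, and the pseudoinverse-based starting point is our own choice.

```python
import numpy as np
from scipy.optimize import minimize


def mel_filterbank(sr, n_fft, n_mels):
    """Hypothetical helper: HTK-style triangular mel filters with peak value 1."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - left) / (center - left)
    falling = (right - freqs) / (right - center)
    return np.maximum(0.0, np.minimum(rising, falling))  # (n_mels, n_fft // 2 + 1)


def nnls_lbfgsb(A, B, max_iter=500):
    """Hypothetical helper: min_X 0.5 * ||A @ X - B||_F^2 subject to X >= 0."""
    n, k = A.shape[1], B.shape[1]

    def objective(x_flat):
        residual = A @ x_flat.reshape(n, k) - B
        # Return the loss and its gradient together (jac=True below)
        return 0.5 * np.sum(residual ** 2), (A.T @ residual).ravel()

    # Initial guess: pseudoinverse solution with negative values clipped
    x0 = np.maximum(np.linalg.pinv(A) @ B, 0.0).ravel()
    result = minimize(objective, x0, jac=True, method="L-BFGS-B",
                      bounds=[(0.0, None)] * (n * k), options={"maxiter": max_iter})
    return result.x.reshape(n, k)


rng = np.random.default_rng(0)
fb = mel_filterbank(sr=16000, n_fft=512, n_mels=40)
true_spec = rng.random((fb.shape[1], 10))
mel = fb @ true_spec
est = nnls_lbfgsb(fb, mel)
print("relative mel error:", np.linalg.norm(fb @ est - mel) / np.linalg.norm(mel))
```

The bounds keep every iterate non-negative, so no clamping step is needed; the cost is that the solver has to run again for every new spectrogram.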

I will push a better version of Griffin-Lim in a few days. My existing Griffin-Lim is also based on gradient descent and is not as good as librosa's; the new version will be as good as librosa's, since it will be a direct clone of it. It might come in handy when you implement InverseMelScale, since you would just need to add mel_to_stft to this new Griffin-Lim to complete the InverseMelScale.
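For reference, a minimal NumPy/SciPy sketch of the Griffin-Lim iteration with the momentum ("fast Griffin-Lim") update that librosa also offers. `griffin_lim` is a hypothetical helper, not nnAudio's or librosa's implementation; the STFT settings (SciPy's default Hann window, `nperseg=512`, `noverlap=384`) are assumptions, and `momentum=0.99` is the default we believe librosa uses.

```python
import numpy as np
from scipy.signal import stft, istft


def griffin_lim(mag, n_samples, nperseg=512, noverlap=384, n_iter=32, momentum=0.99, seed=0):
    """Estimate a signal whose STFT magnitude matches `mag` (freqs x frames)."""
    rng = np.random.default_rng(seed)
    angles = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    rebuilt = np.zeros(mag.shape, dtype=complex)
    for _ in range(n_iter):
        tprev = rebuilt
        # Inverse STFT with the current phase estimate, cropped to the target length
        _, signal = istft(mag * angles, nperseg=nperseg, noverlap=noverlap)
        signal = signal[:n_samples]
        # Re-analyse the signal to obtain a consistent complex spectrogram
        _, _, rebuilt = stft(signal, nperseg=nperseg, noverlap=noverlap)
        # Momentum step, then keep only the phase
        angles = rebuilt - (momentum / (1.0 + momentum)) * tprev
        angles = angles / (np.abs(angles) + 1e-16)
    _, signal = istft(mag * angles, nperseg=nperseg, noverlap=noverlap)
    return signal[:n_samples]


sr = 16000
t = np.arange(sr // 2) / sr
x = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1320.0 * t)
_, _, Z = stft(x, nperseg=512, noverlap=384)
mag = np.abs(Z)


def inconsistency(y):
    """Relative distance between the STFT magnitude of y and the target magnitude."""
    _, _, Zy = stft(y, nperseg=512, noverlap=384)
    return np.linalg.norm(np.abs(Zy) - mag) / np.linalg.norm(mag)


y_random = griffin_lim(mag, len(x), n_iter=0)  # random phase, no iterations
y_gl = griffin_lim(mag, len(x), n_iter=50)
print(inconsistency(y_random), inconsistency(y_gl))
```

Feeding the magnitude produced by a mel-to-STFT step into `mag` gives the full mel-to-audio pipeline described in this comment.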

And yes, this feature would be very useful, and I have been wanting to implement it. Thanks for your help.

KinWaiCheuk added the enhancement (New feature or request) label on Oct 27, 2020
@KinWaiCheuk
Owner

KinWaiCheuk commented Oct 28, 2020

Hi tasercake, on second thought, I don't think NNLS is the right way to go, since it does not give us an inverse matrix. We would need to call the solver over and over again to estimate the STFT for every input, so your approach seems better.

My idea now is to estimate an inverse matrix for the mel filter banks (probably using your approach), use it for the mel_to_stft conversion, and then use Griffin-Lim to get the audio back. That way, this STFT reconstruction can ultimately be integrated with our existing Griffin_Lim.
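A minimal sketch of the "estimate the inverse once, reuse it" idea. For illustration it swaps in the Moore-Penrose pseudoinverse as the inverse estimate, which is our choice and not the gradient-based estimate proposed above; `mel_filterbank` and `mel_to_stft_matrix` are hypothetical helpers. Once the matrix exists, each conversion is one matrix product plus clamping, with no per-input optimization.

```python
import numpy as np


def mel_filterbank(sr, n_fft, n_mels):
    """Hypothetical helper: HTK-style triangular mel filters with peak value 1."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - left) / (center - left)
    falling = (right - freqs) / (right - center)
    return np.maximum(0.0, np.minimum(rising, falling))  # (n_mels, n_fft // 2 + 1)


fb = mel_filterbank(sr=16000, n_fft=512, n_mels=40)
inv_fb = np.linalg.pinv(fb)  # computed once, shape (n_freqs, n_mels)


def mel_to_stft_matrix(mel, inv_fb):
    """Hypothetical helper: map mel frames (n_mels, n_frames) to non-negative magnitudes."""
    return np.maximum(inv_fb @ mel, 0.0)


rng = np.random.default_rng(0)
mel = fb @ rng.random((fb.shape[1], 10))
spec = mel_to_stft_matrix(mel, inv_fb)

# Before clamping, the reused matrix reproduces the minimum-norm least-squares
# solution that a fresh solve for this batch would return.
per_call = np.linalg.lstsq(fb, mel, rcond=None)[0]
print(np.allclose(inv_fb @ mel, per_call))
```

The clamping makes the result differ from a true NNLS solution; that is the trade-off for replacing an iterative solver with a single reusable matrix.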

@f0k

f0k commented May 10, 2021

Note that if you get the normalization correct, you also get quite decent results by just transposing the mel filterbank (in-the-wild example: https://github.com/bkvogel/griffin_lim/blob/master/run_demo.py#L110). It's also possible to use the pseudoinverse of the mel filterbank, but I found this often introduces audible artifacts.
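One way to read "get the normalization correct", sketched here as our own assumption (we have not checked it against the linked demo): divide each mel band by the total weight of its filter so the band becomes a weighted average, then spread it back with the transposed filterbank, normalized so that the weights reaching each frequency bin sum to 1. With that choice, a flat spectrum is recovered exactly wherever the filters have support. `mel_filterbank` and `transpose_inverse` are hypothetical helpers.

```python
import numpy as np


def mel_filterbank(sr, n_fft, n_mels):
    """Hypothetical helper: HTK-style triangular mel filters with peak value 1."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - left) / (center - left)
    falling = (right - freqs) / (right - center)
    return np.maximum(0.0, np.minimum(rising, falling))  # (n_mels, n_fft // 2 + 1)


def transpose_inverse(fb, eps=1e-12):
    """Hypothetical helper: normalized transpose of fb, shape (n_freqs, n_mels)."""
    band_weight = np.maximum(fb.sum(axis=1), eps)  # total weight of each mel filter
    bin_weight = np.maximum(fb.sum(axis=0), eps)   # total weight reaching each bin
    return fb.T / bin_weight[:, None] / band_weight[None, :]


fb = mel_filterbank(sr=16000, n_fft=512, n_mels=40)
inv_t = transpose_inverse(fb)
flat = np.ones((fb.shape[1], 3))
recovered = inv_t @ (fb @ flat)
covered = fb.sum(axis=0) > 1e-6  # bins touched by at least one filter
print(np.allclose(recovered[covered], 1.0))
```

Unlike the pseudoinverse, this matrix has only non-negative entries, so it cannot produce negative magnitudes; the price is that detail within each band is smoothed out.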

@KinWaiCheuk
Owner

> you also get quite decent results by just transposing the mel filterbank

That is interesting to know. I had thought of that before, but it seemed too good to be true, so I never tried it. I'll give it a try when I have time, but pull requests are welcome.
