Spectrograms not updating well at low frequency bins #115

Open

arachid1 opened this issue Jan 21, 2022 · 3 comments

arachid1 commented Jan 21, 2022

Hello, thanks for putting together a really useful library!

I'm working on a pneumonia-detection problem. My dataset is heavily imbalanced, with 2000+ non-pneumonia cases and only 142 pneumonia cases, so I decided to stick with 142 cases of each label to keep the dataset balanced.

I am trying to apply the STFT layer in the following model:

[image: model architecture]

with the following parameters:

self.spec_layer = Spectrogram.STFT(n_fft=256, hop_length=128, sr=8000, trainable=True, output_format="Magnitude")
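As a quick sanity check of my own (not part of the model code), those parameters should give spectrograms of roughly the following shape for my 5-second, 8 kHz clips:

import torch
from nnAudio import Spectrogram

spec_layer = Spectrogram.STFT(n_fft=256, hop_length=128, sr=8000,
                              trainable=True, output_format="Magnitude")
x = torch.randn(1, 8000 * 5)   # one 5-second clip at 8 kHz
print(spec_layer(x).shape)     # expected roughly (1, 129, 313): 129 frequency bins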

Now, I'm observing some modifications of the spectrograms as the model trains, but it seems like the trained spectrogram mainly gets updated at the higher frequency bins. It should be the low-frequency bins that inform the network's decision-making, since lung sounds lie in the 0-4000 Hz range and I sample at 8000 Hz. Here is a spectrogram of a pneumonia sample before training:

[image: outputs__orig_index_9_label_1]

and here is its updated version at epochs 10, 50, and 150, respectively:

[image: outputs___9_label_1_epoch_10]
[image: outputs___9_label_1_epoch_50]
[image: outputs___9_label_1_epoch_140]

Since the changes are really hard to see, I generate a difference map (= trained spectrogram at a given epoch − original untrained spectrogram). Here are the difference maps at epochs 10, 50, and 150, respectively:

[image: diff___9_label_1_epoch_10]
[image: diff___9_label_1_epoch_50]
[image: diff___9_label_1_epoch_140]

It's difficult to see, but there are some slight modifications in the lower frequency bins 0-24; they are just small, and there is barely any change in bins 0-12.
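For reference, here is how those bin indices map to frequencies given my STFT parameters above (a quick check of my own):

sr, n_fft = 8000, 256
hz_per_bin = sr / n_fft   # 31.25 Hz per frequency bin
print(f"bins 0-24 span roughly 0-{24 * hz_per_bin:.0f} Hz")  # 0-750 Hz
print(f"bins 0-12 span roughly 0-{12 * hz_per_bin:.0f} Hz")  # 0-375 Hz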

Some of the training parameters are:

parameters.lr = 1e-4
parameters.n_epochs = 150
parameters.batch_size = 32
parameters.audio_length = 5

I use nnAudio == 0.2.6.

KinWaiCheuk (Owner) commented Jan 24, 2022

Hi @arachid1. Thanks for using nnAudio! The STFT kernel updates are gradient-driven (the gradients are calculated from the loss function you use). Just like with other neural network layers in PyTorch, it is sometimes quite difficult to understand why the model decided to update some parameters but not others.

However, since you already have the prior knowledge that most of the important information is in the low-frequency regions, is it possible for you to set trainable=True and then freeze all the high-frequency bins? If you are using PyTorch Lightning, a simple example would look like this:

import torch
import torch.nn as nn
import torch.optim as optim
from nnAudio import Spectrogram
import pytorch_lightning as pl
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# toy data: 1000 "clips" of 44100 samples each, 10 classes
X, Y = make_blobs(1000, 44100, centers=10, cluster_std=10)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

trainset = torch.utils.data.TensorDataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train))
trainloader = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True, num_workers=2)

class Model(pl.LightningModule):
    def __init__(self):
        super(Model, self).__init__()
        # default n_fft=2048 -> 1025 frequency bins
        self.stft_layer = Spectrogram.STFT(trainable=True, output_format='Magnitude')
        # 1025 bins x 87 frames for 44100 samples at the default hop length
        self.classifier = nn.Linear(1025*87, 10)

    def on_after_backward(self):
        # freeze bins 20-1024
        # i.e. only bins 0-19 get updated
        self.stft_layer.wsin.grad[20:] = 0
        self.stft_layer.wcos.grad[20:] = 0

    def forward(self, x):
        x = self.stft_layer(x)
        x = self.classifier(x.flatten(1))
        return x

    def training_step(self, batch, batch_idx):
        pred = self(batch[0])
        return torch.nn.functional.cross_entropy(pred, batch[1])

    def configure_optimizers(self):
        r"""Configure optimizer."""
        return optim.Adam(self.parameters())

model = Model()

# clone the weights, otherwise this tensor shares storage with the live
# parameters and the comparison below would always be True
original_weight = model.stft_layer.wsin.detach().cpu().clone()

trainer = pl.Trainer(max_epochs=2, gpus=1)

trainer.fit(model, trainloader)

changed_weight = model.stft_layer.wsin.detach().cpu().clone()

# check if bins 0-19 are still the same after training
print(torch.equal(original_weight[:20], changed_weight[:20]))
# It should return False

# check if bins 20-1024 are still the same after training
print(torch.equal(original_weight[20:], changed_weight[20:]))
# It should return True

If you are using plain PyTorch, you can set the gradients for the higher bins to 0 after loss.backward() and before optimizer.step().
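A minimal sketch of that plain-PyTorch variant (the names model, stft_layer, optimizer, and trainloader are illustrative, defined as in the Lightning example above):

import torch

# `model`, `optimizer`, and `trainloader` are assumed to exist already
for x, y in trainloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # zero the gradients of bins 20 and above, so only bins 0-19 update
    model.stft_layer.wsin.grad[20:] = 0
    model.stft_layer.wcos.grad[20:] = 0
    optimizer.step()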

This is just my suggestion; I am not sure if it works or not. But if it does, I think it is worth an update to nnAudio to let users control which STFT bins are trained. Please feel free to update me with the latest results!

arachid1 (Author) commented Jan 27, 2022

Thanks for your suggestion!

Unfortunately, it doesn't seem to make a great deal of difference, as illustrated by the same spectrogram before and after training below:

[image: outputs__orig_index_7_label_0]
[image: outputs__orig_index_7_label_tensor(0, device='cuda:0')]

But it does work, it could be a very useful tool, and it helped me investigate a lot.

  • The more upper-frequency bins I freeze → the more the lower-frequency bins vary (that's expected, even if it's not enough for now) → the MORE the kernels change! I visualize the kernel changes when freezing all bins above bin 10, 25, 50, and 100, respectively:

[image: kernels_10]
[image: kernels_25]
[image: kernels_50]
[image: kernels_100]

and here are the respective weights:

[image: weights_10_wsin_last_epoch]
[image: weights_25_wsin_last_epoch]
[image: weights_50_wsin_last_epoch]
[image: weights_100_wsin_last_epoch]

My guess is that the kernels extract a lot more from the lower frequencies when running the STFT because my data is richer in those frequencies, so if those regions backpropagate more, it makes sense that the kernels change more too. If you have more thoughts on this, please let me know.

It still doesn't explain why the higher frequencies change so much; that could be due to many things, but this is a great step towards explainable ML.

Another way to frame the odd finding:

  • It seems to struggle with the higher values on the spectrograms. They do not vary much, as illustrated in my difference maps. On the other hand, the lower values change a lot more. Per the example in my first post, the initial spectrogram has low negative values at the high-frequency bins, such as -8. After training, backpropagation pushes them higher, to -4 for example. So in my difference map (trained − original), we get -4 − (-8) = 4, which explains the red/high values in the higher-frequency bins of the DIFFERENCE map. I tried normalizing the audio and the spectrograms in different ways, and it doesn't change anything either...

KinWaiCheuk (Owner) commented
Thanks for your update! Regarding why the higher bins backpropagate more than the lower bins despite the richer information in the lower bins, I also have no idea.
But it reminds me of a paper that our team saw a few weeks ago, about restricted vs. unrestricted trainable front-ends:
https://arxiv.org/pdf/2109.02774.pdf

Although they focused on Mel spectrograms rather than the STFT, I think their idea still applies to our case.

They found that unrestricted kernel training is harmful to model performance. nnAudio is unrestricted, since we do not impose any constraints on how the kernel parameters update, whereas in their paper they imposed shape constraints (triangular, rectangular, or Gaussian) on the kernels.

So to control the (40, 241), or (num_mels, n_fft//2+1), mel filter basis, they use only 80 parameters (40 bandwidths and 40 center frequencies) to control the triangular filter locations, as opposed to nnAudio, which uses 40*241 parameters to fully control everything. I think that if we provide too many degrees of freedom, backprop might not be able to do a great job.
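For illustration, here is a minimal sketch of such a shape-constrained basis (my own toy code, not the paper's implementation and not part of nnAudio): each of the 40 filters is a triangle controlled by just two trainable scalars, its center bin and its bandwidth.

import torch
import torch.nn as nn

class TriangularFilterbank(nn.Module):
    # Toy sketch of a shape-constrained mel basis: num_mels triangles over
    # num_bins STFT bins, trained via only 2 * num_mels parameters.
    def __init__(self, num_mels=40, num_bins=241):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(0, num_bins - 1, num_mels))  # 40 params
        self.bandwidths = nn.Parameter(torch.full((num_mels,), 8.0))            # 40 params
        self.register_buffer("bins", torch.arange(num_bins).float())

    def forward(self, spec):  # spec: (batch, num_bins, time)
        # each filter is 1 at its center and decays linearly to 0 at +/- bandwidth
        dist = (self.bins[None, :] - self.centers[:, None]).abs()
        basis = (1 - dist / self.bandwidths[:, None].clamp(min=1e-3)).clamp(min=0)
        return basis @ spec   # (batch, num_mels, time)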

I think we might also want to apply some sort of constraint to the STFT kernels to get a better result? I haven't tried this idea yet, but I agree that understanding this is a great step towards explainable ML! I am looking forward to seeing your new findings!
