This repo contains speech enhancement (SE) tutorials, with a focus on the time-frequency domain. You can use it to experiment with a variety of speech enhancement techniques.
- 2024.05.15 Upload code
- Upload baseline code
- Upload performance rank table
- Update performance rank table
- Add some explanations
- Add some analysis tools
- Add current DNN-based SE models
- Update references
This repo is tested with Ubuntu 22.04, PyTorch 2.0.1, Python 3.10, and CUDA 11.7. Install the package dependencies with:
pip install -r requirements.txt
- Install the necessary libraries.
- Download the VoiceBank+DEMAND database, or prepare your own database, and place it in the '../Dataset/' folder.
├── 📦 SE_Tutorials
│ └── 📂 models
│ └── 📂 ref
│ └── ...
│ └── ED_FNN.py
│ └── ED_CNN.py
│ └── options.py
│ └── train_interface.py
│ └── ...
└── 📦 Dataset
    └── 📂 VBD (or ...)
        ├── 📂 train
        │   ├── clean
        │   └── noisy
        └── 📂 test
            ├── clean
            └── noisy
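The layout above can be sanity-checked before training. The following is a minimal sketch, not part of this repo: `pair_clean_noisy` is a hypothetical helper, and it assumes that every noisy file has a clean counterpart with the same file name and a `.wav` extension.

```python
from pathlib import Path


def pair_clean_noisy(split_dir, pattern="*.wav"):
    """Return sorted (clean_path, noisy_path) pairs for one split (train or test).

    Assumption: matching clean and noisy files share the same file name.
    """
    split_dir = Path(split_dir)
    clean_dir = split_dir / "clean"
    noisy_dir = split_dir / "noisy"

    clean_names = {p.name for p in clean_dir.glob(pattern)}
    noisy_names = {p.name for p in noisy_dir.glob(pattern)}

    # Any name present in only one of the two folders breaks the pairing.
    unmatched = clean_names ^ noisy_names
    if unmatched:
        raise FileNotFoundError(
            f"{len(unmatched)} file(s) lack a clean/noisy counterpart, "
            f"e.g. {sorted(unmatched)[0]}"
        )

    return [(clean_dir / name, noisy_dir / name) for name in sorted(clean_names)]
```

For example, `pair_clean_noisy("../Dataset/VBD/train")` would return the training pairs if the folder follows the tree above.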
- To adjust any parameter settings, edit options.py.
We have prepared a .ipynb notebook that you can run directly.
The following techniques are available in this repo:
- noisy database generation
- normalization
- compression
- domain
- joint loss function
- perceptual loss function
- adversarial training
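As an illustration of one item in this list, here is a minimal sketch of spectral compression, assuming that "compression" refers to power-law compression of the STFT magnitude with the phase left unchanged. The function names and the exponent 0.3 are illustrative assumptions, not this repo's API. The sketch uses only the standard library; the same operation applies elementwise to a complex spectrogram tensor.

```python
import cmath


def compress_spectrum(spec, power=0.3):
    """Power-law compress the magnitude of each complex bin, keeping its phase.

    `power` is an assumed example value, not a setting taken from this repo.
    """
    return [cmath.rect(abs(x) ** power, cmath.phase(x)) for x in spec]


def decompress_spectrum(spec, power=0.3):
    """Invert compress_spectrum by raising each magnitude to 1/power."""
    return [cmath.rect(abs(x) ** (1.0 / power), cmath.phase(x)) for x in spec]
```

Compression reduces the dynamic range of the spectrum, so low-energy bins contribute more to a magnitude-based loss than they would otherwise. Decompression is needed before the inverse STFT.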
The scores in the table below are the values reported in each model's paper.
Model | Params (M) | Causality | PESQ | CSIG | CBAK | COVL | STOI | SSNR | Year | Input | Code |
---|---|---|---|---|---|---|---|---|---|---|---|
Noisy | - | - | 1.97 | 3.35 | 2.44 | 2.63 | 0.91 | 1.68 | - | - | - |
SEGAN | 97.47 | ✗ | 2.16 | 3.48 | 2.94 | 2.80 | 0.92 | 7.73 | 2017 | Time | ✗ |
MetricGAN | - | ✗ | 2.86 | 3.99 | 3.18 | 3.42 | - | - | 2019 | Magnitude | ✓ |
PHASEN | 0.92 | ✗ | 2.99 | 4.21 | 3.55 | 3.62 | - | 10.08 | 2020 | Magnitude+Phase | ✗ |
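For readers unfamiliar with the SSNR column, the following is a rough sketch of a segmental SNR computation. It is not the evaluation code behind the table: the frame length, hop size, and the clamping range of -10 dB to 35 dB are commonly used values that are assumed here, and individual papers may use different settings.

```python
import math


def segmental_snr(clean, enhanced, frame_len=512, hop=256,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Mean per-frame SNR in dB, with each frame clamped to [min_db, max_db].

    All default values are assumptions chosen for illustration.
    """
    n = min(len(clean), len(enhanced))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        noise_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp so that silent or perfectly matched frames do not dominate.
        frame_snrs.append(min(max(snr_db, min_db), max_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

Both inputs are plain sequences of samples at the same sampling rate and are assumed to be time-aligned.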
Please get in touch with us if you have any questions or suggestions.
E-mail: allmindfine@yonsei.ac.kr