WAVLab
Affiliated with the Language Technologies Institute (LTI), Carnegie Mellon University.
This is Watanabe’s Audio and Voice (WAV) Lab at the Language Technologies Institute of Carnegie Mellon University. Our research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing.
End-of-semester presentation, 05.07.2025
Selected publications
- [ASR] OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning. In Proceedings of Interspeech, 2025.
- [SSL] Towards Robust Speech Representation Learning for Thousands of Languages. In Proceedings of EMNLP, 2024.
- [ASR] EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios. In Proceedings of Interspeech, 2024.
- [SS] TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
- [ASR] End-to-End Speech Recognition: A Survey. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
- [ASR & SSL] ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. In Proceedings of Interspeech, 2023.
- [ASR & SLU & MT] Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. In Proceedings of the International Conference on Machine Learning (ICML), 2022.
- [SD] A Review of Speaker Diarization: Recent Advances with Deep Learning. Computer Speech & Language, 2022.
- [SE] Conditional Diffusion Probabilistic Model for Speech Enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
- [SLU] ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.