WAVLab | 2025 Papers

Summarization EMNLP

Summarizing Speech: A Comprehensive Survey

Fabian Retkowski, Maike Züfle, Andreas Sudmann, Dinah Pfau, Shinji Watanabe, Jan Niehues, and Alexander Waibel

In Proceedings of EMNLP 2025
ASR APSIPA

Phoneme-grapheme Dictionary-based Prompting for Robust Proper Noun Recognition in Japanese ASR

Ryuga Sugano, Hiroaki Sato, Asahi Sakuma, Tadashi Kumano, Yoshihiko Kawai, and Shinji Watanabe

In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2025
SLU&Dialogue ASRU

AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks

Leander Maben, Gayathri Lakshmy, Srijith Radhakrishnan, Siddhant Arora, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
Evaluation ASRU

VERSA-v2: A Modular and Scalable Toolkit for Speech and Audio Evaluation with Expanded Metrics, Visualization, and LLM Integration

Jiatong Shi, Bo-Hao Su, Shikhar Bharadwaj, Yiwen Zhao, Shih-Heng Wang, Jionghao Han, Haoran Wang, Wei Wang, Wenhao Feng, Yuxun Tang, Nezih Topaloğlu, Siddhant Arora, Jinchuan Tian, William Chen, Hye-jin Shim, Wangyou Zhang, Wen-Chin Huang, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
SSL ASRU

Evaluating Self-Supervised Speech Models via Text-based LLMs

Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
LID ASRU

Geolocation-Aware Robust Spoken Language Identification

Qingzheng Wang, Hye-jin Shim, Jiancheng Sun, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
Music ASRU

Robust Training of Singing Voice Synthesis Using Prior and Posterior Uncertainty

Yiwen Zhao, Jiatong Shi, Yuxun Tang, William Chen, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
ASR&Diarization ASRU

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder

Muhammad Shakeel, Yui Sudo, Yifan Peng, Chyi-Jiunn Lin, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
Tokenizer ASRU

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

Jiatong Shi, Haoran Wang, William Chen, Chenda Li, Wangyou Zhang, Jinchuan Tian, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
Compression ASRU

SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR

Pu Wang, Shinji Watanabe, and Hugo Van hamme

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
SE ASRU

Less is More: Data Curation Matters in Scaling Speech Enhancement

Chenda Li, Wangyou Zhang, Wei Wang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Yihui Fu, Marvin Sach, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, and Yanmin Qian

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
ASR ASRU

Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting

Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
SE ASRU

URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition

Jiahe Wang, Chenda Li, Wei Wang, Wangyou Zhang, Samuele Cornell, Marvin Sach, Robin Scheibler, Kohei Saijo, Yihui Fu, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, and Yanmin Qian

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
SE&Evaluation ASRU

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment

Wei Wang, Wangyou Zhang, Chenda Li, Jiatong Shi, Shinji Watanabe, and Yanmin Qian

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
Dialogue ASRU

Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training

Sathvik Udupa, Shinji Watanabe, Petr Schwarz, and Jan Cernocky

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
Speech-LLM ASRU

Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM

Jiatong Shi, Chunlei Zhang, Jinchuan Tian, Junrui Ni, Hao Zhang, Shinji Watanabe, and Dong Yu

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
Audio WASPAA

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder

Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, and Shinji Watanabe

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025
Audio WASPAA

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation

Holger Bovbjerg, Jan Østergaard, Jesper Jensen, Shinji Watanabe, and Zheng-Hua Tan

In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025
Music&Evaluation ISMIR

Aligning Text-to-Music Evaluation with Human Preferences

Yichen Huang, Zachary Novack, Koichi Saito, Jiatong Shi, Shinji Watanabe, Yuki Mitsufuji, John Thickstun, and Chris Donahue

In Proceedings of ISMIR 2025
TTS&Dataset Interspeech

The text-to-speech in the wild (TITW) dataset

Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas Evans, Joon Son Chung, Shinnosuke Takamichi, and Shinji Watanabe

In Proceedings of Interspeech 2025
ASR Interspeech

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

Qingzheng Wang, Jiancheng Sun, Yifan Peng, and Shinji Watanabe

In Proceedings of Interspeech 2025
Dataset&Dialogue Interspeech

Scalable Spontaneous Speech Dataset (SSSD): Crowdsourcing Data Collection to Promote Dialogue Research

Zaid Sheikh, Shuichiro Shimizu, Siddhant Arora, Jiatong Shi, Samuele Cornell, Xinjian Li, and Shinji Watanabe

In Proceedings of Interspeech 2025
SLU&Dialogue Interspeech

A Chain-of-Thought Reasoning Approach to E2E Spoken Dialogue Systems with an Open-Source Toolkit

Siddhant Arora, Jinchuan Tian, Hayato Futami, Jee-weon Jung, Jiatong Shi, Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe

In Proceedings of Interspeech 2025
Dataset Interspeech

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

Brian Yan, Injy Hamed, Shuichiro Shimizu, Vasista Lodagala, William Chen, Olga Iakovenko, Bashar Talafha, Amir Hussein, Alexander Polok, Kalvin Chang, Dominik Klement, Sara Althubaiti, Puyuan Peng, Matthew Wiesner, Thamar Solorio, Ahmed Ali, Sanjeev Khudanpur, Shinji Watanabe, Chih-Chen Chen, Zhen Wu, Karim Benharrak, Anuj Diwan, Samuele Cornell, Eunjung Yeo, Kwanghee Choi, Carlos Carvalho, and Karen Rosero

In Proceedings of Interspeech 2025
ASR Interspeech

Exploring Linear Variant Transformers and k-NN Memory Inference for Long-Form ASR

Carlos Ferreira Carvalho, Jinchuan Tian, William Chen, Yifan Peng, Alberto Abad, and Shinji Watanabe

In Proceedings of Interspeech 2025
AV Interspeech

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition

Ming Gao, Shilong Wu, Hang Chen, Jun Du, Chin-Hui Lee, Shinji Watanabe, Jingdong Chen, Sabato Marco Siniscalchi, and Odette Scharenborg

In Proceedings of Interspeech 2025
Evaluation Interspeech

Uni-VERSA: Versatile Evaluation of Speech with a Unified Framework

Jiatong Shi, Hye-Jin Shim, and Shinji Watanabe

In Proceedings of Interspeech 2025
S2ST Interspeech

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs

Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Yuki Ito, Hassan Shahmohammadi, Siddhant Arora, and Shinji Watanabe

In Proceedings of Interspeech 2025
SE Interspeech

Interspeech 2025 URGENT Speech Enhancement Challenge

Kohei Saijo, Wangyou Zhang, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Yihui Fu, Wei Wang, Tim Fingscheidt, and Shinji Watanabe

In Proceedings of Interspeech 2025
Summarization Interspeech

Pick and Summarize: Integrating Extractive and Abstractive Speech Summarization

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Ryo Fukuda, William Chen, and Shinji Watanabe

In Proceedings of Interspeech 2025
SE Interspeech

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge

Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian

In Proceedings of Interspeech 2025
Compression Interspeech

Context-Driven Dynamic Pruning for Large Multi-Modal Foundation Model

Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, Markus Müller, Nathan Susanj, Jing Liu, and Shinji Watanabe

In Proceedings of Interspeech 2025
Speech-LLM Interspeech

OpusLM: A Family of Open Unified Speech Language Models

Jinchuan Tian, William Chen, Yifan Peng, Jiatong Shi, Siddhant Arora, Shikhar Bharadwaj, Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue, Chao-Han Huck Yang, and Shinji Watanabe

In Proceedings of Interspeech 2025
Health Interspeech

Explainable Depression Detection using Masked Hard Instance Mining

Patawee Prakrankamanant, Shinji Watanabe, and Ekapol Chuangsuwanich

In Proceedings of Interspeech 2025
ASR Interspeech

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, and Shinji Watanabe

In Proceedings of Interspeech 2025
ASR Interspeech

The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties

William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, and Shinji Watanabe

In Proceedings of Interspeech 2025
Tokenizer Interspeech

On-device Streaming Discrete Speech Units

Kwanghee Choi, Masao Someki, Emma Strubell, and Shinji Watanabe

In Proceedings of Interspeech 2025
Dataset Interspeech

GALAXY: A Large-Scale Open-Domain Dataset for Multimodal Learning

Yihan Wu, Yichen Lu, Yijing Chen, Jiaqi Song, William Chen, Ruihua Song, and Shinji Watanabe

In Proceedings of Interspeech 2025
Tokenizer Interspeech

Differentiable K-means for Fully-optimized Discrete Token-based ASR

Kentaro Onda, Yosuke Kashiwagi, Emiru Tsunoo, Hayato Futami, and Shinji Watanabe

In Proceedings of Interspeech 2025
ASR Interspeech

DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition

Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Chyi-Jiunn Lin, and Shinji Watanabe

In Proceedings of Interspeech 2025
SSL Interspeech

DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective

Hyung-gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen, Shinji Watanabe, and Ahmed Hussen Abdelaziz

In Proceedings of Interspeech 2025
Speech-LLM ACL

SIQ: Exterminating Speech Intelligence Quotient Cross Cognitive Levels in Voice Understanding Large Language Models

Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, and Sadao Kurohashi

In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2025
ASR&ST ICML

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, and Shinji Watanabe

In Proceedings of the International Conference on Machine Learning (ICML) 2025
SLU&Dialogue NAACL

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems

Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
Evaluation NAACL

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Jiatong Shi, Hye-jin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Safar Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
Speech-LLM NAACL

ESPnet-SpeechLM: An Open Speech Language Model Toolkit

Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han Huck Yang, Graham Neubig, and Shinji Watanabe

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
Pronunciation NAACL

Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment

Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, and David R Mortensen

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
Speech-LLM NAACL

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Yifan Peng, Krishna C Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, and Boris Ginsburg

In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
Evaluation ICLR

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter-Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Cheng-Hsiu Hsieh, Yi-Cheng Lin, Chee-En Yu, I-Hsiang Chiu, Heitor R. Guimarães, Jionghao Han, Tzu-Quan Lin, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Chun Wei Chen, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Chi An Fu, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsi-Che Lin, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Ankita Pasad, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chao-Han Huck Yang, Chieh-Chi Yang, Jia Qi Yip, Shao-Xiang Yuan, Vahid Noroozi, Zhehuai Chen, Haibin Wu, Karen Livescu, David Harwath, Shinji Watanabe, and Hung-yi Lee

In Proceedings of the International Conference on Learning Representations (ICLR) 2025
Dialogue ICLR

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, and Shinji Watanabe

In Proceedings of the International Conference on Learning Representations (ICLR) 2025
Compression ICLR

Context-aware Dynamic Pruning for Speech Foundation Models

Masao Someki, Yifan Peng, Siddhant Arora, Markus Müller, Athanasios Mouchtaris, Grant Strimel, Jing Liu, and Shinji Watanabe

In Proceedings of the International Conference on Learning Representations (ICLR) 2025
Speaker ICASSP

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels

Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, and Barry-John Theobald

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
ASR ICASSP

Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking

Brian Yan, Vineel Pratap, Shinji Watanabe, and Michael Auli

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
ASR ICASSP

Hypothesis Clustering and Merging: MultiTalker Speech Recognition with Speaker Token Estimation

Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
TTS ICASSP

Preference Alignment Improves Language Model-Based TTS

Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, and Dong Yu

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
SSL ICASSP

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Li-Wei Chen, Zakaria Aldeneh, Takuya Higuchi, He Bai, Ahmed Hussen Abdelaziz, Shinji Watanabe, Alexander Rudnicky, Tatiana Likhomanenko, and Barry-John Theobald

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
Speech-Text ICASSP

Bridging Speech and Text Foundation Models with ReShape Attention

Takatomo Kano, Atsunori Ogawa, Marc Delcroix, William Chen, Ryo Fukuda, Kohei Matsuura, Takanori Ashihara, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
SSL&ASR ICASSP

Investigation of Spatial Self-Supervised Learning and Its Application to Target Speaker Speech Recognition

Yoshiaki Bando, Samuele Cornell, Satoru Fukayama, and Shinji Watanabe

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
AVSR AAAI

Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization

Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, and Shinji Watanabe

In Proceedings of the AAAI Conference on Artificial Intelligence 2025