1. Summarization EMNLP
    Summarizing Speech: A Comprehensive Survey
    Fabian Retkowski, Maike Züfle, Andreas Sudmann, Dinah Pfau, Shinji Watanabe, Jan Niehues, and Alexander Waibel
    In Proceedings of EMNLP 2025
  2. ASR APSIPA
    Phoneme-grapheme Dictionary-based Prompting for Robust Proper Noun Recognition in Japanese ASR
    Ryuga Sugano, Hiroaki Sato, Asahi Sakuma, Tadashi Kumano, Yoshihiko Kawai, and Shinji Watanabe
    In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2025
  3. SLU&Dialogue ASRU
    AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks
    Leander Maben, Gayathri Lakshmy, Srijith Radhakrishnan, Siddhant Arora, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  4. Evaluation ASRU
    VERSA-v2: A Modular and Scalable Toolkit for Speech and Audio Evaluation with Expanded Metrics, Visualization, and LLM Integration
    Jiatong Shi, Bo-Hao Su, Shikhar Bharadwaj, Yiwen Zhao, Shih-Heng Wang, Jionghao Han, Haoran Wang, Wei Wang, Wenhao Feng, Yuxun Tang, Nezih Topaloğlu, Siddhant Arora, Jinchuan Tian, William Chen, Hye-jin Shim, Wangyou Zhang, Wen-Chin Huang, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  5. SSL ASRU
    Evaluating Self-Supervised Speech Models via Text-based LLMs
    Takashi Maekaku, Keita Goto, Jinchuan Tian, Yusuke Shinohara, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  6. LID ASRU
    Geolocation-Aware Robust Spoken Language Identification
    Qingzheng Wang, Hye-jin Shim, Jiancheng Sun, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  7. Music ASRU
    Robust Training of Singing Voice Synthesis Using Prior and Posterior Uncertainty
    Yiwen Zhao, Jiatong Shi, Yuxun Tang, William Chen, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  8. ASR&Diarization ASRU
    Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
    Muhammad Shakeel, Yui Sudo, Yifan Peng, Chyi-Jiunn Lin, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  9. Tokenizer ASRU
    PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
    Jiatong Shi, Haoran Wang, William Chen, Chenda Li, Wangyou Zhang, Jinchuan Tian, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  10. Compression ASRU
    SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR
    Pu Wang, Shinji Watanabe, and Hugo Van hamme
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  11. SE ASRU
    Less is More: Data Curation Matters in Scaling Speech Enhancement
    Chenda Li, Wangyou Zhang, Wei Wang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Yihui Fu, Marvin Sach, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, and Yanmin Qian
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  12. ASR ASRU
    Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
    Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  13. SE ASRU
    URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition
    Jiahe Wang, Chenda Li, Wei Wang, Wangyou Zhang, Samuele Cornell, Marvin Sach, Robin Scheibler, Kohei Saijo, Yihui Fu, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, and Yanmin Qian
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  14. SE&Evaluation ASRU
    Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
    Wei Wang, Wangyou Zhang, Chenda Li, Jiatong Shi, Shinji Watanabe, and Yanmin Qian
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  15. Dialogue ASRU
    Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
    Sathvik Udupa, Shinji Watanabe, Petr Schwarz, and Jan Cernocky
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  16. Speech-LLM ASRU
    Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM
    Jiatong Shi, Chunlei Zhang, Jinchuan Tian, Junrui Ni, Hao Zhang, Shinji Watanabe, and Dong Yu
    In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025
  17. Audio WASPAA
    OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
    Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, and Shinji Watanabe
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025
  18. Audio WASPAA
    Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
    Holger Bovbjerg, Jan Østergaard, Jesper Jensen, Shinji Watanabe, and Zheng-Hua Tan
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025
  19. Music&Evaluation ISMIR
    Aligning Text-to-Music Evaluation with Human Preferences
    Yichen Huang, Zachary Novack, Koichi Saito, Jiatong Shi, Shinji Watanabe, Yuki Mitsufuji, John Thickstun, and Chris Donahue
    In Proceedings of ISMIR 2025
  20. TTS&Dataset Interspeech
    The text-to-speech in the wild (TITW) dataset
    Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas Evans, Joon Son Chung, Shinnosuke Takamichi, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  21. ASR Interspeech
    Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
    Qingzheng Wang, Jiancheng Sun, Yifan Peng, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  22. Dataset&Dialogue Interspeech
    Scalable Spontaneous Speech Dataset (SSSD): Crowdsourcing Data Collection to Promote Dialogue Research
    Zaid Sheikh, Shuichiro Shimizu, Siddhant Arora, Jiatong Shi, Samuele Cornell, Xinjian Li, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  23. SLU&Dialogue Interspeech
    A Chain-of-Thought Reasoning Approach to E2E Spoken Dialogue Systems with an Open-Source Toolkit
    Siddhant Arora, Jinchuan Tian, Hayato Futami, Jee-weon Jung, Jiatong Shi, Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  24. Dataset Interspeech
    CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
    Brian Yan, Injy Hamed, Shuichiro Shimizu, Vasista Lodagala, William Chen, Olga Iakovenko, Bashar Talafha, Amir Hussein, Alexander Polok, Kalvin Chang, Dominik Klement, Sara Althubaiti, Puyuan Peng, Matthew Wiesner, Thamar Solorio, Ahmed Ali, Sanjeev Khudanpur, Shinji Watanabe, Chih-Chen Chen, Zhen Wu, Karim Benharrak, Anuj Diwan, Samuele Cornell, Eunjung Yeo, Kwanghee Choi, Carlos Carvalho, and Karen Rosero
    In Proceedings of Interspeech 2025
  25. ASR Interspeech
    Exploring Linear Variant Transformers and k-NN Memory Inference for Long-Form ASR
    Carlos Ferreira Carvalho, Jinchuan Tian, William Chen, Yifan Peng, Alberto Abad, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  26. AV Interspeech
    The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
    Ming Gao, Shilong Wu, Hang Chen, Jun Du, Chin-Hui Lee, Shinji Watanabe, Jingdong Chen, Sabato Marco Siniscalchi, and Odette Scharenborg
    In Proceedings of Interspeech 2025
  27. Evaluation Interspeech
    Uni-VERSA: Versatile Evaluation of Speech with a Unified Framework
    Jiatong Shi, Hye-jin Shim, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  28. S2ST Interspeech
    Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
    Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Yuki Ito, Hassan Shahmohammadi, Siddhant Arora, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  29. SE Interspeech
    Interspeech 2025 URGENT Speech Enhancement Challenge
    Kohei Saijo, Wangyou Zhang, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Yihui Fu, Wei Wang, Tim Fingscheidt, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  30. Summarization Interspeech
    Pick and Summarize: Integrating Extractive and Abstractive Speech Summarization
    Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Ryo Fukuda, William Chen, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  31. SE Interspeech
    Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
    Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian
    In Proceedings of Interspeech 2025
  32. Compression Interspeech
    Context-Driven Dynamic Pruning for Large Multi-Modal Foundation Model
    Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, Markus Müller, Nathan Susanj, Jing Liu, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  33. Speech-LLM Interspeech
    OpusLM: A Family of Open Unified Speech Language Models
    Jinchuan Tian, William Chen, Yifan Peng, Jiatong Shi, Siddhant Arora, Shikhar Bharadwaj, Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue, Chao-Han Huck Yang, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  34. Health Interspeech
    Explainable Depression Detection using Masked Hard Instance Mining
    Patawee Prakrankamanant, Shinji Watanabe, and Ekapol Chuangsuwanich
    In Proceedings of Interspeech 2025
  35. ASR Interspeech
    OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
    Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  36. ASR Interspeech
    The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties
    William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  37. Tokenizer Interspeech
    On-device Streaming Discrete Speech Units
    Kwanghee Choi, Masao Someki, Emma Strubell, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  38. Dataset Interspeech
    GALAXY: A Large-Scale Open-Domain Dataset for Multimodal Learning
    Yihan Wu, Yichen Lu, Yijing Chen, Jiaqi Song, William Chen, Ruihua Song, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  39. Tokenizer Interspeech
    Differentiable K-means for Fully-optimized Discrete Token-based ASR
    Kentaro Onda, Yosuke Kashiwagi, Emiru Tsunoo, Hayato Futami, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  40. ASR Interspeech
    DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition
    Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Chyi-Jiunn Lin, and Shinji Watanabe
    In Proceedings of Interspeech 2025
  41. SSL Interspeech
    DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective
    Hyung-gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen, Shinji Watanabe, and Ahmed Hussen Abdelaziz
    In Proceedings of Interspeech 2025
  42. Speech-LLM ACL
    SIQ: Exterminating Speech Intelligence Quotient Cross Cognitive Levels in Voice Understanding Large Language Models
    Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, and Sadao Kurohashi
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics 2025
  43. ASR&ST ICML
    OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
    William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, and Shinji Watanabe
    In Proceedings of the International Conference on Machine Learning (ICML) 2025
  44. SLU&Dialogue NAACL
    ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
    Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
  45. Evaluation NAACL
    VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
    Jiatong Shi, Hye-jin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Safar Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
  46. Speech-LLM NAACL
    ESPnet-SpeechLM: An Open Speech Language Model Toolkit
    Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han Huck Yang, Graham Neubig, and Shinji Watanabe
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
  47. Pronunciation NAACL
    Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment
    Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, and David R Mortensen
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
  48. Speech-LLM NAACL
    VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
    Yifan Peng, Krishna C Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, and Boris Ginsburg
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2025
  49. Evaluation ICLR
    Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
    Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter-Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Cheng-Hsiu Hsieh, Yi-Cheng Lin, Chee-En Yu, I-Hsiang Chiu, Heitor R. Guimarães, Jionghao Han, Tzu-Quan Lin, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Chun Wei Chen, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Chi An Fu, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsi-Che Lin, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Ankita Pasad, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chao-Han Huck Yang, Chieh-Chi Yang, Jia Qi Yip, Shao-Xiang Yuan, Vahid Noroozi, Zhehuai Chen, Haibin Wu, Karen Livescu, David Harwath, Shinji Watanabe, and Hung-yi Lee
    In Proceedings of the International Conference on Learning Representations (ICLR) 2025
  50. Dialogue ICLR
    Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
    Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, and Shinji Watanabe
    In Proceedings of the International Conference on Learning Representations (ICLR) 2025
  51. Compression ICLR
    Context-aware Dynamic Pruning for Speech Foundation Models
    Masao Someki, Yifan Peng, Siddhant Arora, Markus Müller, Athanasios Mouchtaris, Grant Strimel, Jing Liu, and Shinji Watanabe
    In Proceedings of the International Conference on Learning Representations (ICLR) 2025
  52. Speaker ICASSP
    Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
    Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, and Barry-John Theobald
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  53. ASR ICASSP
    Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
    Brian Yan, Vineel Pratap, Shinji Watanabe, and Michael Auli
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  54. ASR ICASSP
    Hypothesis Clustering and Merging: MultiTalker Speech Recognition with Speaker Token Estimation
    Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  55. TTS ICASSP
    Preference Alignment Improves Language Model-Based TTS
    Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, and Dong Yu
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  56. SSL ICASSP
    Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
    Li-Wei Chen, Zakaria Aldeneh, Takuya Higuchi, He Bai, Ahmed Hussen Abdelaziz, Shinji Watanabe, Alexander Rudnicky, Tatiana Likhomanenko, and Barry-John Theobald
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  57. Speech-Text ICASSP
    Bridging Speech and Text Foundation Models with ReShape Attention
    Takatomo Kano, Atsunori Ogawa, Marc Delcroix, William Chen, Ryo Fukuda, Kohei Matsuura, Takanori Ashihara, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  58. SSL&ASR ICASSP
    Investigation of Spatial Self-Supervised Learning and Its Application to Target Speaker Speech Recognition
    Yoshiaki Bando, Samuele Cornell, Satoru Fukayama, and Shinji Watanabe
    In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  59. AVSR AAAI
    Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
    Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, and Shinji Watanabe
    In Proceedings of the AAAI Conference on Artificial Intelligence 2025