VERSA: A Comprehensive Speech and Audio Evaluation Toolkit
The WAVLab team is excited to announce the public release of VERSA (Versatile Evaluation of Speech and Audio), our comprehensive toolkit designed to revolutionize how researchers and developers evaluate speech and audio quality.
Audio quality assessment has long been fragmented across numerous specialized metrics, each requiring different setups, dependencies, and formats. This fragmentation creates significant barriers for researchers and practitioners alike: every metric means another setup to learn, another set of dependencies to manage, and another input and output format to juggle.
VERSA solves these problems by providing a unified framework that brings together over 80 evaluation metrics under a single, easy-to-use interface.
VERSA provides seamless access to more than 80 evaluation and profiling metrics, with roughly ten times as many configurable variants, covering speech, audio, and music.
git clone https://github.com/wavlab-speech/versa.git
cd versa
pip install .
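To confirm the installation succeeded, a quick import check should work (this assumes the package installs under the versa name, which matches the repository layout):

python -c "import versa; print('VERSA is ready')"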
For metrics requiring additional dependencies, the tools directory in the repository provides convenient installers.
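For example, an installer can be run directly from that directory (the script name below is a placeholder, not an actual file; check tools/ for the installers that ship with the release):

cd tools
bash install_some_metric.sh  # hypothetical name; pick the installer for the metric you need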
Evaluating speech quality is as simple as:
python versa/bin/scorer.py \
--score_config egs/speech.yaml \
--gt /path/to/reference/audio \
--pred /path/to/generated/audio \
--output_file results \
--io dir
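The file passed via --score_config is a YAML description of which metrics to run. As a rough sketch of the idea (the metric names and parameters below are assumptions; consult the shipped egs/speech.yaml for the authoritative list):

# Sketch of a score config: a list of metric entries (names and params assumed)
- name: pesq        # assumed key for a perceptual quality metric
- name: stoi        # assumed key for an intelligibility metric
- name: mcd_f0      # assumed key for mel-cepstral distortion / F0 measures
  f0min: 40         # illustrative parameter values
  f0max: 800

Swapping in a different config lets you change the metric suite without touching the command itself.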
VERSA enables researchers and developers to run dozens of metrics over a set of generated audio with a single command and a single config file, and to compare systems on exactly the same footing; see the sketch below.
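For instance, comparing two systems against the same references is a short shell loop over the command above (the system names and output paths are illustrative):

for system in baseline proposed; do
  python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt /path/to/reference/audio \
    --pred outputs/${system} \
    --output_file results_${system} \
    --io dir
done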
Want to see VERSA in action? Try our interactive demo.
VERSA is open-source and community-driven. We welcome contributions and feedback through the GitHub repository.
If you find VERSA useful in your research, please cite our papers:
NAACL 2025 Demo Paper:
Shi, J., Shim, H., Tian, J., Arora, S., Wu, H., Petermann, D., Yip, J. Q., Zhang, Y., Tang, Y., Zhang, W., Alharthi, D. S., Huang, Y., Saito, K., Han, J., Zhao, Y., Donahue, C., & Watanabe, S. (2025). VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music. Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics – System Demonstration Track. OpenReview
SLT 2024 Paper:
Shi, J., Tian, J., Wu, Y., Jung, J., Yip, J. Q., Masuyama, Y., Chen, W., Wu, Y., Tang, Y., Baali, M., Alharthi, D., Zhang, D., Deng, R., Srivastava, T., Wu, H., Liu, A., Raj, B., Jin, Q., Song, R., & Watanabe, S. (2024). ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech. 2024 IEEE Spoken Language Technology Workshop (SLT), 562-569. DOI: 10.1109/SLT61566.2024.10832289
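For convenience, here is a BibTeX rendering of the two entries above (the keys and abbreviated author lists are our own; double-check against the official records before citing):

@inproceedings{shi2025versa,
  title     = {{VERSA}: A Versatile Evaluation Toolkit for Speech, Audio, and Music},
  author    = {Shi, J. and others},
  booktitle = {Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics: System Demonstration Track},
  year      = {2025}
}

@inproceedings{shi2024espnetcodec,
  title     = {{ESPnet-Codec}: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech},
  author    = {Shi, J. and others},
  booktitle = {2024 IEEE Spoken Language Technology Workshop (SLT)},
  pages     = {562--569},
  year      = {2024},
  doi       = {10.1109/SLT61566.2024.10832289}
}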
We’re committed to continuously improving VERSA with new metrics, enhanced usability, and expanded documentation. Stay tuned for upcoming features and don’t hesitate to reach out with suggestions!