Course Logistics

  • Instructor: Shinji Watanabe
  • TAs: Jiatong Shi, Siddhant Arora
  • Time: MW 3:30PM – 4:50PM
  • Location: GHC 4211
  • Discussion: Piazza

Grading

  • Grading policies
    • Student presentation
    • Assignments
    • Term project
  • We will use Gradescope.

Syllabus

  • This is a tentative schedule.
  • The slides will be uploaded to Piazza right before each lecture.
  • The videos will be uploaded to Piazza at irregular intervals after each lecture because of the editing process.

Date  Lecture                                 Topics                               Slides/Videos
1/16  Course overview                         Course explanation and introduction
1/23  Speech processing overview
1/25  Speech recognition part I
1/30  ESPnet tutorial I
2/1   ESPnet tutorial II
2/6   Speech recognition part II
2/8   SSL models for speech recognition
2/13  Speaker Recognition
2/15  Speaker Diarization
2/20  Language model
2/22  Database, Data preparation
2/27  Multi-speaker ASR
3/1   Midterm project event
3/13  Multilingual speech recognition
3/15  Speech translation
3/20  Speech/audio classification
3/22  Spoken language understanding
3/27  Single-channel speech enhancement
3/29  Multi-channel speech enhancement
4/3   Text to speech (text2mel)
4/5   Text to speech (vocoder, joint model)
4/10  System I: speech-to-speech translation
4/12  System II: spoken dialog system
4/17  Guest lecture
4/19  Guest lecture
4/24  Project event I
4/26  Project event II

Assignments

Assignments will be announced during the course.