Speech Processing (11-492/11-692/18-495)
Course Logistics
- Instructor: Shinji Watanabe
- TAs: Jiatong Shi, Siddhant Arora
- Time: MW 3:30PM – 4:50PM
- Location: GHC 4211
- Discussion: Piazza
Grading
- Grading policies
- Student presentation
- Assignments
- Term Project
- We will use Gradescope
Syllabus
- This is a tentative schedule.
- The slides will be uploaded to Piazza right before each lecture.
- The videos will be uploaded to Piazza at irregular intervals after each lecture, because they must be edited first.
| Date | Lecture | Topics | Slides/Videos |
|---|---|---|---|
| 1/16 | Course overview | Course explanation and introduction | |
| 1/23 | Speech processing overview | ||
| 1/25 | Speech recognition part I | ||
| 1/30 | ESPnet tutorial I | ||
| 2/1 | ESPnet tutorial II | ||
| 2/6 | Speech recognition part II | ||
| 2/8 | SSL models for speech recognition | ||
| 2/13 | Speaker Recognition | ||
| 2/15 | Speaker Diarization | ||
| 2/20 | Language model | ||
| 2/22 | Database, Data preparation | ||
| 2/27 | Multi-speaker ASR | ||
| 3/1 | Midterm project event | ||
| 3/13 | Multilingual speech recognition | ||
| 3/15 | Speech translation | ||
| 3/20 | Speech/audio classification | ||
| 3/22 | Spoken language understanding | ||
| 3/27 | Single-channel speech enhancement | ||
| 3/29 | Multi-channel speech enhancement | ||
| 4/3 | Text to speech (text2mel) | ||
| 4/5 | Text to speech (vocoder, joint model) | ||
| 4/10 | System I: speech-to-speech translation | ||
| 4/12 | System II: spoken dialog system | ||
| 4/17 | Guest lecture | ||
| 4/19 | Guest lecture | ||
| 4/24 | Project event I | ||
| 4/26 | Project event II | | |
Assignments
Will be announced during the course