Speech Recognizer analysis
For this assignment, you will work in pairs/groups to compare speech recognizers along one of the following dimensions:
- Diachronic: Changes within one recognizer from 2000 to 2006.
- Synchronic: Differences across recognizers in one time frame (2006 preferred).
- Application/language specific: Differences between the general descriptions (e.g. as described in the journal papers) and the particular application or language.
Guidelines for Presentations
1. Overview
- Task (dimension of comparison)
- Companies/groups
2. 2-3 interesting challenges/approaches/accomplishments described in the papers
3. Evaluation results and the contribution of the things you've described to the results (positive or negative)
Topic/paper selection process
I'll send you a message on Latte. Respond with your top 3 choices of topic from the list of 8 below. Each topic has some suggested papers, but you can also look at others on the list (the IEEE papers are very complete and the NIST results slides summarize performance). You can also do additional research, but it is not required. There is a lot of information here.
You will have time in class to meet briefly to get started (date TBD). Make sure you've read the papers assigned to you and come up with a list of interesting similarities or differences among the papers. You may use any of the other papers for background if you think it will be helpful. It's important you're ready to discuss the papers in class.
Your group will find a set of 2-3 interesting things you'd like to present on, focusing on the differences over time, between recognizers, in language, etc., and create a plan for a ~10 minute presentation (you won't have time to complete it, just to discuss the points and divvy up the work). Make sure to include a high-level description (e.g. whose recognizer it is, what the domain is, etc.). And your names! Also list the papers you used. Submit the final set of slides on Latte before class on the first day of presentations (see schedule).
NOTE: This is an exercise in reading papers you won't completely understand. They were written by teams of individuals, each specializing in just one aspect of the systems. Find places where the techniques and results are understandable and interesting. Try to "get through" the areas that are hard and focus on the areas you understand more.
- 1. BBN Diachronic (BN): BBN 2000, BBN_2004_BN, BBN-LIMSI-iEEE06
- 2. Conversational vs. Broadcast News: BBN_2004_CT, BBN_2004_BN, BBN-LIMSI-iEEE06, NIST_rt04f_stt_results
  - BBN 2004 Broadcast News
  - BBN 2004 Conversational Telephony
  - BBN 2006 IEEE Journal Paper
  - NIST 2004 Speech Recognition Results
- 3. IBM Diachronic: IBM2000, IBM 2004, IBM_ieee_2006
- 4. Synchronic across major players: BBN_2004, IBM_2004, SRI_2004, NIST_rt04f_stt_results
- 5. Language (Chinese): BBN_2007_Mandarin, IBM 2007 Arabic, SRI-IEEE06, NIST_rt04f_stt_results
- 6. Language (Arabic): BBN_2005_Arabic, SRI-IEEE06, NIST_rt04f_stt_results
- 7. Application (Meeting recorder): SRI_ICSI_2007_meeting_Recorder, IBM_2007_lecture_meeting, RT07Results-v08
- 8. Application (Term detection): IBM STD 06sigir07, Queensland phonetic search, STD06-NIST-English_Results
Layout of the papers in time and topic
| | BBN | IBM | SRI | NIST |
|---|---|---|---|---|
| 2000 | The 2000 BBN Byblos LVCSR System | Recent Improvements in Speech Recognition Performance | The SRI March 2000 Hub-5 Conversational Speech Transcription System | NIST 2004 STT |
| 2004 | The BBN RT04 Broadcast News Transcription System; The 2004 BBN/LIMSI English CT Speech Recognition System | The IBM Conversational Telephony System for Rich Transcription | SRI's 2004 Broadcast News Speech to Text System | NIST 2004 STT |
| 2006 | Advances in the Transcription …within the combined EARS BBN/LIMSI system | Advances in Speech Transcription at IBM under the DARPA EARS program | Recent Innovations in Speech-to-Text Transcription at SRI-ICSI-UW | NIST 2004 STT |
| Languages | Progress in the BBN 2007 Mandarin Speech to Text System; BBN 2005 Arabic | IBM 2007 Arabic | SRI 2006 IEEE Journal Paper | NIST 2004 STT |
| Apps: Meeting Recorder | | The IBM Rich Transcription 2007 ... for Lecture Meetings | The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System | NIST 2007 Meeting Recorder Results |
| Apps: Spoken Term Detection | | IBM Spoken Term Detection (STD) 2006 | Queensland Spoken Term Detection 2006 | NIST 2006 STD Results |
Links to all papers:
- NIST 2004 Speech Recognition Results
- NIST 2004 Meeting Results Evaluation
- NIST 2007 Meeting Recorder Results
- NIST 2006 STD Results
- BBN 2000
- BBN 2004 Broadcast News
- BBN 2004 Conversational Telephony
- BBN 2005 Arabic
- BBN 2007 Mandarin
- BBN 2006 IEEE Journal Paper
- IBM 2000
- IBM 2004
- IBM 2007 Arabic
- IBM 2007 Lecture Meetings
- IBM 2006 IEEE Journal Paper
- IBM Spoken Term Detection (STD) 2006
- Queensland Spoken Term Detection 2006
- SRI 2000
- SRI 2004
- SRI 2007 Lecture Meetings
- SRI 2006 IEEE Journal Paper