Speech Recognizer analysis

For this assignment, you will work in pairs/groups to compare speech recognizers along one of the following dimensions:

  • Diachronic: Changes within one recognizer from 2000 to 2006.
  • Synchronic: Differences across recognizers in one time frame (2006 preferred).
  • Application/language specific: Differences between the general descriptions (e.g. as described in the journal papers) and the particular application or language.

Guidelines for Presentations

1. Overview

  • Task (dimension of comparison)
  • Companies/groups

2. 2-3 interesting challenges/approaches/accomplishments described in the papers

3. Evaluation results and the contribution of the things you've described to the results (positive or negative)

Topic/paper selection process

I'll send you a message on Latte. Respond with your top 3 choices of topic from the list of 8 below. Each topic has some suggested papers, but you can also look at others on the list (the IEEE papers are very complete and the NIST results slides summarize performance). You can also do additional research, but it is not required. There is a lot of information here.

You will have time in class to meet briefly to get started (date TBD). Make sure you've read the papers assigned to you and come up with a list of interesting similarities or differences among the papers. You may use any of the other papers for background if you think it will be helpful. It's important you're ready to discuss the papers in class.

Your group will find a set of 2-3 interesting things you'd like to present on, focusing on the differences over time, between recognizers, in language, etc., and create a plan for a ~10 minute presentation (you won't have time to complete it, just to discuss the points and divvy up the work). Make sure to include a high-level description (e.g. whose recognizer it is, what the domain is, etc.). And your names! Also list the papers you used. Submit the final set of slides on Latte before class on the first day of presentations (see schedule).

NOTE: This is an exercise in reading papers you won't completely understand. They were written by teams of individuals, each specializing in just one aspect of the systems. Find places where the techniques and results are understandable and interesting. Try to "get through" the areas that are hard and focus on the areas you understand more.

Layout of the papers in time and topic

Papers are listed by row (year, language, or application), then by column (BBN, IBM, SRI, NIST).

2000
  • BBN: The 2000 BBN Byblos LVCSR system
  • IBM: Recent Improvements in Speech Recognition Performance
  • SRI: The SRI March 2000 Hub-5 Conversational Speech Transcription System
  • NIST: NIST 2004 STT

2004
  • BBN: The BBN RT04 Broadcast News Transcription System
  • BBN: The 2004 BBN/LIMSI English CT Speech Recognition System
  • IBM: The IBM Conversational Telephony System for Rich Transcription
  • SRI: SRI’s 2004 Broadcast News Speech to Text System
  • NIST: NIST 2004 STT

2006
  • BBN: Advances in the Transcription …within the combined EARS BBN/LIMSI system
  • IBM: Advances in Speech Transcription at IBM under the DARPA EARS program
  • SRI: Recent Innovations in Speech-to-Text Transcription at SRI-ICSI-UW
  • NIST: NIST 2004 STT

Languages
  • BBN: Progress in the BBN 2007 Mandarin Speech to Text System
  • BBN: BBN 2005 Arabic
  • IBM: IBM 2007 Arabic
  • SRI: SRI 2006 IEEE Journal Paper
  • NIST: NIST 2004 STT

Apps: Meeting Recorder
  • IBM: The IBM Rich Transcription 2007 ... for Lecture Meetings
  • SRI: The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System
  • NIST: NIST 2007 Meeting Recorder Results

Apps: Spoken Term Detection
  • IBM: IBM Spoken Term Detection (STD) 2006
  • Queensland Spoken Term Detection 2006
  • NIST: NIST 2006 STD Results

Links to all papers: