Due Dates for 2014 still under construction

Submitting Assignments:

When submitting assignments

  • Zip all files together
  • Include your name in the zip file name
  • Submit through Latte

Late submission rules

  • You can only get an extension by asking for permission before the due date
  • No permission will be granted after the due date except for cases of dire emergency
  • While I accept all reasonable excuses, don't overuse the privilege
  • Waiting until the 11th hour to start an assignment is not a reasonable excuse

Links to Assignments

Assignment 1: Create and test a speech grammar on the AT&T Mashup

Language Modeling Paper Reviews: Presentation on October 21

Assignment Three: Language Models and Perplexity. Due October 28

Speech Recognizer Analysis and Presentations

Final Project: Build a speech application for submission to the AVIOS Student Contest

Assignment 1: Create and test a speech grammar on the AT&T Mashup

Part One: Create and test a grammar for ordering Pizza

Basic installation and testing: Due 9/12 (nothing to submit)

Grades will be based BOTH on the elegance (reusability and perspicuity) of your grammars and on their actual performance.

Step 1: Create your own batch test for a pizza ordering grammar

Write a grammar for a one-sentence pizza order (this is just to work on grammar and recognition, not dialog yet). Use the Dominos menu (https://order.dominos.com/en/pages/order/menu.jsp#/menu/category/all/), but limit it to pizzas, not drinks, salads, etc.

Use the WBNF grammar format described in the AT&T Mashup Guide. Start with a small grammar, with just a couple of options, in order to test the format.
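WBNF itself is documented in the Mashup Guide (available once you register in Step 2). Purely as an illustration of the general shape of such a grammar, here is a toy pizza grammar in JSGF syntax, which (as noted in the later assignments) is roughly the same format apart from the header; the rules and coverage here are made up, and yours should follow the Dominos menu:

    #JSGF V1.0;
    grammar pizza;

    // Toy one-sentence pizza order; expand the rules to cover the real menu.
    public <order> = [i would like | i'd like] <count> <size> <topping> (pizza | pizzas) [please];
    <count>   = a | one | two | three;
    <size>    = small | medium | large;
    <topping> = cheese | pepperoni | mushroom | extra cheese;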

Create your own test set by writing out 10 sentences that are "in grammar" and recording them as wav files. The file names should be in the form YOURNAME_001, YOURNAME_002, ... Double-check the recordings against the transcripts to make sure they are accurate (you can change the transcription rather than re-record; just make sure the results are in grammar). The transcription should have one utterance per line with the file name in parens at the end, as follows. The format of the file names and the "reference" transcription will be important for later parts of the assignment.

I'd like two large cheese pizzas with extra cheese (YOURNAME_001)

NOTE: The files need to be 8 kHz, so you may need to downsample them to work on the Mashup. SoX is a good choice for those who like command-line interfaces. Many other audio apps also allow you to downsample.
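If you would rather not use a separate tool, here is a minimal downsampling sketch using only the Python standard library; it assumes 16-bit PCM wav input (note that the audioop module is deprecated in recent Python versions):

    # Downsample a wav file to 8 kHz mono using only the standard library.
    import audioop
    import wave

    def downsample_to_8k(src_path, dst_path):
        with wave.open(src_path, "rb") as src:
            params = src.getparams()
            frames = src.readframes(params.nframes)
        if params.nchannels == 2:  # mix stereo down to mono first
            frames = audioop.tomono(frames, params.sampwidth, 0.5, 0.5)
        frames, _ = audioop.ratecv(frames, params.sampwidth, 1,
                                   params.framerate, 8000, None)
        with wave.open(dst_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(params.sampwidth)
            dst.setframerate(8000)
            dst.writeframes(frames)

    downsample_to_8k("YOURNAME_001.wav", "YOURNAME_001_8k.wav")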

Step 2: Test the grammar on the AT&T Mashup

Sign up to be a developer for the AT&T Mashup. Here's the link (NOTE: use this link and DO NOT sign up to be an AT&T Developer):

https://service.research.att.com/smm/register.jsp

You can get the latest Mashup documentation once you have signed up.

Upload your grammar to the Mashup. Create a program (in any language you want) that loops through your directory of wav files and sends them to the Mashup. (HINT: Start small. Upload a small grammar for something like "One large cheese pizza please" and put just one wav file in the directory before you tackle the rest of the menu.) Here's a link to a sample test script for the actual call to the Mashup; you need to parameterize the filenames and put in your own grammar name and uuid. NOTE: Not all computers come with wget. Figure out how to get it if you don't have it already.
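Here is a rough sketch of that loop in Python. The URL and parameter names below are placeholders, not the real Mashup API; take the actual call, grammar name, and uuid from the sample test script:

    # Loop over the test wav files and send each one to the Mashup via wget.
    # MASHUP_URL, "grammar", and "uuid" are placeholders: copy the real
    # call from the sample test script on Latte.
    import glob
    import subprocess

    MASHUP_URL = "https://service.research.att.com/..."  # from the sample script
    GRAMMAR = "pizza"          # your grammar name
    UUID = "your-uuid-here"    # your uuid

    with open("mashup_output.txt", "w") as out:
        for wav in sorted(glob.glob("audio/*.wav")):
            url = "%s?grammar=%s&uuid=%s" % (MASHUP_URL, GRAMMAR, UUID)
            result = subprocess.check_output(
                ["wget", "-q", "-O", "-", "--post-file", wav, url])
            out.write("%s\t%s\n" % (wav, result.decode("utf-8", "replace").strip()))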

Use your own files for an initial test. We will provide an additional test soon.

Some notes:
- If you run into an issue and you solve it, please share the solution with the class on the Latte forum so everyone isn't spinning the same wheels.
- If you run into an issue and can't solve it, ask on the forum. I'll have a contact at AT&T for us to use, but I don't want that person bombarded with the same questions over and over. Help each other first.

Step 3: Score the results to get Word Error Rate (WER), Insertions, Deletions, etc.

Download SCLite (links below)

http://www.itl.nist.gov/iad/mig/tools/ (Speech Recognition Scoring Toolkit)

http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm

Write a script that takes the output of the Mashup and turns it into the ".trn" format used by SCLite. Using that and the transcription file you created, run your results through SCLite. Look at the results, make any grammar changes that would improve performance, and run it again.
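Here is a conversion sketch, assuming the tab-separated "filename, hypothesis" lines written by the batch-test sketch above; adapt the parsing to whatever your Mashup output actually looks like:

    # Convert batch-test output into SCLite's .trn format:
    # one hypothesis per line, with the utterance id in parens at the end.
    import os

    with open("mashup_output.txt") as src, open("hyp.trn", "w") as hyp:
        for line in src:
            wav, text = line.rstrip("\n").split("\t", 1)
            utt_id = os.path.splitext(os.path.basename(wav))[0]  # e.g. YOURNAME_001
            hyp.write("%s (%s)\n" % (text.lower(), utt_id))

    # Then score against your reference transcription, for example:
    #   sclite -r ref.trn trn -h hyp.trn trn -i rm -o sum pralign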

Submit your script, grammar, and sclite results (sys and pra files). Zip the files, put your name in the zip file name, and submit under "Baseline_Mashup_results" on Latte.

Step 4: Improve results and test on a broader test set: due Sept 19

A new set of audio files and a reference file will be made available here. Run this set through the Mashup and the results through SCLite. This will produce an error rate for each "speaker" in the set. Keep this as your baseline.

Review the results and make changes to your grammar to improve the results. You can iterate on this until you are satisfied.

Submit on September 19th:

1. The sclite files for your final results.

2. A comparison of the baseline results and final results to show how your grammar changes improved performance.

3. A brief analysis of the remaining errors, discussing what you think the causes of the errors are (e.g., accents, noise, confusable words, etc.)

Assignment 1b: Moving to Statistical Language Models. Due October 3.

This next assignment will use statistical language models. The first part uses your finite state grammar to generate training data. Next, you will also be asked to collect/create training data. We will use this data directly in the Mashup, as David Thomson showed you, and run other kinds of LM creation and evaluation tools.

Part One: Evaluate a statistical grammar based on sentences generated from your grammar.

Use the sentence generator (which you will write, using Cory's code as a starting point; see Latte) to generate sentences from your finite state grammar.

For the Generator, you are to produce a program that will generate sentences from your grammar (lots of sentences, capturing as much variability as possible).

You may work in pairs (trios at the most), and you can use as much of the attached code as you want, except that the final code must go from the BNF format (roughly the same as JSGF) to sentences, not go through GRXML.

I am attaching two versions of Cory's sentence generation code, from 2013 and 2014. Only the earlier one starts from JSGF, but he may have improved his code, so I'm attaching both.

Your goal for this part of the assignment is to get sentences from your grammars. Your learning goal is to read someone else's code and reformulate it to do what you want (a *very* important skill in industry).
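For orientation, a toy random-expansion generator is sketched below. The dict stands in for the parsed grammar; the real work, and the point of the assignment, is the parser that builds such a structure from your BNF file:

    # Toy sentence generator: random expansion of a BNF-style grammar.
    # The hard part of the assignment (parsing the BNF file into this
    # structure) is omitted here.
    import random

    GRAMMAR = {
        "<order>":   [["i'd like", "<count>", "<size>", "<topping>", "pizzas"]],
        "<count>":   [["two"], ["three"], ["ten"]],
        "<size>":    [["small"], ["medium"], ["large"]],
        "<topping>": [["cheese"], ["pepperoni"], ["extra cheese"]],
    }

    def expand(symbol):
        if symbol not in GRAMMAR:  # terminal: emit it as-is
            return symbol
        choice = random.choice(GRAMMAR[symbol])  # pick one expansion at random
        return " ".join(expand(s) for s in choice)

    for _ in range(1000):
        print(expand("<order>"))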

Part Two:

In this part, you will find out how well the sentences you produced match the test set, using both perplexity and word error rate. You should use the original test set to get a first cut at this performance. We're looking into whether a new "true test" set will be available, but in either case you'll need to submit your "dev test" numbers.

  • Create the training data
    • Generate 1000 sentences (to start)
    • Check the format required by the Mashup and make sure your data is in the right format for each tool.
  • Word error rate
    • Now take that set of sentences and use them to generate a statistical language model in the Mashup and run the test through.
    • Compute the word error rate, etc., as you did before using SCLite.
  • What to submit:
    • A description of what you did, including how many sentences you used
    • A comparison with your previous grammar-based results and why you think they were better/worse

Part Three: There's no data like more data. (John will do this. You just need to submit 1000 sentences by TBD)

  • We will make the combined training data from all of you available in a single training set
  • Use this new training set to do one more perplexity run and one more Mashup run.
  • Turn in your perplexity and WER results (which theoretically should be the same for all of you.)
  • NOTE: For this part, you do not need to change anything. Just run the same model building, perplexity, and Mashup calls you ran on your individual sets.

Assignment Two: Pocketsphinx

Part One: Testing your grammar on Pocketsphinx

In this assignment you will do exactly what you did for the Mashup, using Pocketsphinx, which is available on SourceForge (and maybe elsewhere). Download it, unpack it, and follow the instructions in the INSTALL file.

NOTE: You'll just be doing the grammar part now. We'll build statistical grammars for Pocketsphinx when we get to the language modeling section of the course.

  • Use the same grammar, changing it to JSGF format (I think just the header needs to change)
  • Submit the .sys and .pra files, and your grammar

NOTE: Pocketsphinx is a lot slower than the Mashup. Leave yourself some time to run it.
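Here is a batch-decoding sketch, assuming pocketsphinx_continuous is built and on your PATH; the -jsgf and -infile flags are from the 0.8-era command line, so check pocketsphinx_continuous -help against your build:

    # Decode each wav file with Pocketsphinx using the JSGF grammar.
    import glob
    import subprocess

    with open("ps_output.txt", "w") as out:
        for wav in sorted(glob.glob("audio/*.wav")):
            result = subprocess.check_output(
                ["pocketsphinx_continuous", "-jsgf", "pizza.jsgf", "-infile", wav],
                stderr=subprocess.DEVNULL)  # the logging goes to stderr
            out.write("%s\t%s\n" % (wav, result.decode("utf-8", "replace").strip()))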

Part Two: Testing a statistical grammar on Pocketsphinx

Using the same sentences you used for the AT&T grammar, create a statistical language model for Pocketsphinx. Instructions are on the Pocketsphinx wiki (http://cmusphinx.sourceforge.net/wiki/tutoriallm). There is a web service, but I'm not sure how well it works.

HOLD OFF on this until we understand how to better use statistical grammars in Pocketsphinx.

Assignment Three: Language Models and Perplexity. Due October 28

For this assignment, you are going to use an off-the-shelf, open source toolkit from CMU (the CMU-Cambridge toolkit) to build the models and compute the perplexity. (There is an alternative from SRI, the SRILM toolkit (http://www.speech.sri.com/projects/srilm/), which some students have found easier, but try the CMU one first.) There is a paper on the CMU-Cambridge toolkit, and documentation on typical usage (the documentation overall is pretty good).
  • Data for this assignment is on Google Docs:
    • DataForF2014.zip contains 3 sets of human-generated sentences, each in .text and .snor formats (.text has the <s> and </s> markers, .snor has the initials of whoever wrote it).
    • F2013_all_setnences.txt contains one file with all the sentences generated by students last year
    • S2014AllGenSents_byGram.zip contains generated sentences grouped according to the grammar that generated them
    • S2014AllGenSents.train is a single file containing all the sentences generated in the Summer 2014 class
  • Step 1: Produce a baseline perplexity (a pipeline sketch follows this list)
    • Use the 1000 sentences you generated to create the model
    • Use the existing test set to determine the perplexity on that model
  • Step 2: Try to improve the model by lowering the perplexity
    • Build a model with the human-generated sentences (not your test set) and use it to filter the automatically generated sentences by picking sentences or sets of sentences that have lower perplexity
    • Take the filtered sentences, build another model, and test your original test set to see if you've lowered perplexity.
    • Repeat. Note you can also adjust the LM parameters, use different backoff methods, generate more sentences, etc.
  • Step 3: I'll give you a new test set to run, both extrinsic in the Mashup with your filtered set of sentences and intrinsic with perplexity in the CMU toolkit with whatever parameter changes you want (and maybe in Pocketsphinx if there's an easy way to do it).
  • Submit
    • Baseline perplexity of the test set (80 sents) against a model built from your generated sentences.
    • Try to improve the perplexity at least three different ways
    • A description of each thing you did, e.g. filtering the sentences (could be multiple ways), creating new data, changing parameters, etc., and the resulting effect on perplexity
    • The final improved perplexity on the original test set
    • The perplexity of your improved model on the new test set
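Here is a sketch of the baseline run (Step 1). The command names and flags are taken from the CMU-Cambridge toolkit documentation; check them against your installed version and substitute your own file names:

    # Build a language model from the generated sentences and compute the
    # perplexity of the test set by chaining the CMU-Cambridge tools.
    import subprocess

    def run(cmd, infile, outfile=None):
        with open(infile) as i:
            if outfile is None:
                subprocess.check_call(cmd, stdin=i)
            else:
                with open(outfile, "w") as o:
                    subprocess.check_call(cmd, stdin=i, stdout=o)

    run(["text2wfreq"], "gen.txt", "gen.wfreq")     # word frequency counts
    run(["wfreq2vocab"], "gen.wfreq", "gen.vocab")  # vocabulary file
    run(["text2idngram", "-vocab", "gen.vocab"], "gen.txt", "gen.idngram")
    subprocess.check_call(["idngram2lm", "-idngram", "gen.idngram",
                           "-vocab", "gen.vocab", "-binary", "gen.binlm"])

    # evallm reads commands on stdin; ask it for the test-set perplexity.
    with open("evallm.in", "w") as f:
        f.write("perplexity -text test.text\nquit\n")
    run(["evallm", "-binary", "gen.binlm"], "evallm.in")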

Language Modeling Paper Reviews: Presentation on October 21

Work in pairs. Select a paper from the list below.

  • Determine what problem the authors are trying to address.
  • Find 1-2 other papers that are trying to solve the same problem or using a similar approach
    • preferably a newer paper from different authors
    • papers from the reference list or from the same authors are acceptable if you can't find others
  • Create a short (3-5 slide) presentation that covers the following:
    • What core aspect of speech recognition is the paper addressing?
    • What specific techniques is the paper describing?
    • What is the intuition or motivation for the approach?
    • How was the technique evaluated?
    • What did you find that was particularly interesting? (can be positive or negative)

Papers to choose from

Final Project: Build a speech application for submission to the AVIOS Student Contest

Commercial tools have been made available through AVIOS (Applied Voice Input Output Society): http://www.avios.org/studenttools.htm as part of a student contest. (Extra credit will be given to students who submit their applications to the AVIOS contest.) You may work alone or in pairs for this assignment. No more than three to a team.

Tool and Topic Discussion: Due Friday October 3:

  • Come up with 3 possible applications
  • You can do this in groups of 2-3 or individually (you can then form groups based on the ideas)
  • Review the tools and take notes on: (NOTE: The toolset is still being revised as of 9/7. I'll update this part when it's set)
    • What part of the problem do they solve?
    • Can you use it to build a full speech application? On what platforms?
    • How do you specify the dialog/control structure?
    • How do you define grammars and prompts?
    • What are its stated advantages? Do you see any disadvantages?
    • What questions are unanswered?
    • NOTE: You are not expected to fully test the toolset, just evaluate what they present, including any demos.

Proposal: Due October 17:

Design: Due November 7:

  • Pick a toolset
  • Describe your application
  • Describe what resources you'll need (grammars, prompts, etc.)
  • Describe what back-end data, etc., you need and where you expect to get it
  • Indicate what issues you are running up against and might need help with
  • Submission: Same procedure. Add to or extend the existing document set.

MVP: Running prototype (Minimal Viable Product) of application: Due Dec 1. Submit the following:

  • Description of the application's functionality (does not have to be fully implemented)
  • Instructions for how to test the application (indicate its limits)
  • Start your website. See below in Contest Details for what to include. Make sure your "ApplicationOverview" file has a link to the website.

Contest Details

  • Review the contest rules: http://avios.org/resources/contestrules2014.pdf
  • Submit the "Intent to Submit" form (this is late, but I'll send my apologies)
  • When you actually submit (after the 17th), your instructions need to be accessible on a website. You can put up your own, or you can put pages up on the class website by sending them to me and I'll make sure they are linked.
  • If you're going to use the class website, use the following URL on the "Submit" form: http://www.cs.brandeis.edu/~cs136a/Fall2013_speech_apps.html
  • This page has all of the groups listed, and I'll link whatever pages you provide to the title of the application.
  • The website should describe the purpose of the application and include:
    • a. Instructions for installing the application if it must be installed by contest judges
    • b. Instructions for accessing and using the application
    • c. Any credentials, such as user ids, needed to access the application
    • d. Any limitations that users need to be aware of
    • e. Additional information, such as a description of the technologies used, evaluations, or user comments (optional)

Final application and presentations: Due December 15th.

Your presentations/demonstrations should be about 15 minutes and include the following:

  • Description of the purpose of the application and its functionality
  • "Boxology" of the major components and how they interact
  • Additional information on any components you think are particularly interesting
  • Highlights of how it meets the contest criteria of robustness, usefulness, technical superiority, user friendliness, innovation, and creativity
  • Instructions for how to test the application (same as contest rules above)
  • Demonstrate the application. Show the functionality, how it works best, and some of its limitations.
  • Discuss future work.
  • Submit to the AVIOS contest right away!