Due Dates for 2014 still under construction

Submitting Assignments:

When submitting assignments

  • Zip all files together
  • Include your name in the zip file name
  • Submit through Latte

Late submission rules

  • You can only get an extension by asking for permission before the due date
  • No permission will be granted after the due date except for cases of dire emergency
  • While I accept all reasonable excuses, don't overuse the privilege
  • Waiting until the 11th hour to start an assignment is not a reasonable excuse

Links to Assignments

Assignment 1: Create and test a speech grammar on the AT&T Mashup

Language Modeling Paper Reviews: Presentation on October 21

Assignment Three: Language Models and Perplexity. Due October 28

Speech Recognizer Analysis and Presentations

Final Project: Build a speech application for submission to the AVIOS Student Contest

Assignment 1: Create and test a speech grammar on the AT&T Mashup

Part One: Create and test a grammar for ordering Pizza

Basic installation and testing: Due 9/12 (nothing to submit)

Grades will be based BOTH on the elegance (reusability and perspicuity) of your grammars and on their actual performance.

Step 1: Create your own batch test for a pizza ordering grammar

Write a grammar for a one-sentence pizza order (this is just to work on grammar and recognition, not dialog yet). Use the Dominos menu (https://order.dominos.com/en/pages/order/menu.jsp#/menu/category/all/), but limit it to pizzas, not drinks, salads, etc.

Use the WBNF grammar format described in the AT&T Mashup Guide. Start with a small grammar, with just a couple of options, in order to test the format.
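WBNF itself is documented in the Mashup Guide (available once you register in Step 2). Purely as an illustration of the general shape of such a grammar, here is a toy pizza grammar in JSGF syntax, which (as noted in the later assignments) is roughly the same format apart from the header; the rules and coverage here are made up, and yours should follow the Dominos menu:

    #JSGF V1.0;
    grammar pizza;

    // Toy one-sentence pizza order; expand the rules to cover the real menu.
    public <order> = [i would like | i'd like] <count> <size> <topping> (pizza | pizzas) [please];
    <count>   = a | one | two | three;
    <size>    = small | medium | large;
    <topping> = cheese | pepperoni | mushroom | extra cheese;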

Create your own test set by writing out 10 sentences that are "in grammar" and recording them as wav files. The file names should be in the form YOURNAME_001, YOURNAME_002, ... Double-check the recordings against the transcripts to make sure they are accurate (you can change the transcription rather than re-record; just make sure the results are in grammar). The transcription should have one utterance per line with the file name in parens at the end, as follows. The format of the file names and the "reference" transcription will be important for later parts of the assignment.

I'd like two large cheese pizzas with extra cheese (YOURNAME_001)

NOTE: The files need to be 8 kHz, so you may need to downsample them to work on the Mashup. SoX is a good choice for those who like command-line interfaces. Many other audio apps also allow you to downsample.
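If you would rather not use a separate tool, here is a minimal downsampling sketch using only the Python standard library; it assumes 16-bit PCM wav input (note that the audioop module is deprecated in recent Python versions):

    # Downsample a wav file to 8 kHz mono using only the standard library.
    import audioop
    import wave

    def downsample_to_8k(src_path, dst_path):
        with wave.open(src_path, "rb") as src:
            params = src.getparams()
            frames = src.readframes(params.nframes)
        if params.nchannels == 2:  # mix stereo down to mono first
            frames = audioop.tomono(frames, params.sampwidth, 0.5, 0.5)
        frames, _ = audioop.ratecv(frames, params.sampwidth, 1,
                                   params.framerate, 8000, None)
        with wave.open(dst_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(params.sampwidth)
            dst.setframerate(8000)
            dst.writeframes(frames)

    downsample_to_8k("YOURNAME_001.wav", "YOURNAME_001_8k.wav")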

Step 2: Test the grammar on the AT&T Mashup

Sign up to be a developer for the AT&T Mashup. Here's the link (NOTE: use this link and DO NOT sign up to be an AT&T Developer):

https://service.research.att.com/smm/register.jsp

You can get the latest Mashup documentation once you have signed up.

Upload your grammar to the Mashup. Create a program (in any language you want) that loops through your directory of wav files and sends them to the Mashup. (HINT: Start small. Upload a small grammar for something like "One large cheese pizza please" and put just one wav file in the directory before you tackle the rest of the menu.) Here's a link to a sample test script for the actual call to the Mashup; you need to parameterize the filenames and put in your own grammar name and uuid. NOTE: Not all computers come with wget. Figure out how to get it if you don't have it already.
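Here is a rough sketch of that loop in Python. The URL and parameter names below are placeholders, not the real Mashup API; take the actual call, grammar name, and uuid from the sample test script:

    # Loop over the test wav files and send each one to the Mashup via wget.
    # MASHUP_URL, "grammar", and "uuid" are placeholders: copy the real
    # call from the sample test script on Latte.
    import glob
    import subprocess

    MASHUP_URL = "https://service.research.att.com/..."  # from the sample script
    GRAMMAR = "pizza"          # your grammar name
    UUID = "your-uuid-here"    # your uuid

    with open("mashup_output.txt", "w") as out:
        for wav in sorted(glob.glob("audio/*.wav")):
            url = "%s?grammar=%s&uuid=%s" % (MASHUP_URL, GRAMMAR, UUID)
            result = subprocess.check_output(
                ["wget", "-q", "-O", "-", "--post-file", wav, url])
            out.write("%s\t%s\n" % (wav, result.decode("utf-8", "replace").strip()))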

Use your own files for an initial test. We will provide an additional test soon.

Some notes:
- If you run into an issue and you solve it, please share the solution with the class on the Latte forum so everyone isn't spinning the same wheels.
- If you run into an issue and can't solve it, ask on the forum. I'll have a contact at AT&T for us to use, but I don't want that person bombarded with the same questions over and over. Help each other first.

Step 3: Score the results to get Word Error Rate (WER), Insertions, Deletions, etc.

Download SCLite (links below)

http://www.itl.nist.gov/iad/mig/tools/ (Speech Recognition Scoring Toolkit)

http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm

Write a script that takes the output of the Mashup and turns it into the ".trn" format used by SCLite. Using that and the transcription file you created, run your results through SCLite. Look at the results, make any grammar changes that would improve performance, and run it again.
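Here is a conversion sketch, assuming the tab-separated "filename, hypothesis" lines written by the batch-test sketch above; adapt the parsing to whatever your Mashup output actually looks like:

    # Convert batch-test output into SCLite's .trn format:
    # one hypothesis per line, with the utterance id in parens at the end.
    import os

    with open("mashup_output.txt") as src, open("hyp.trn", "w") as hyp:
        for line in src:
            wav, text = line.rstrip("\n").split("\t", 1)
            utt_id = os.path.splitext(os.path.basename(wav))[0]  # e.g. YOURNAME_001
            hyp.write("%s (%s)\n" % (text.lower(), utt_id))

    # Then score against your reference transcription, for example:
    #   sclite -r ref.trn trn -h hyp.trn trn -i rm -o sum pralign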

Submit your script, grammar, and sclite results (sys and pra files). Zip the files, put your name in the zip file name, and submit under "Baseline_Mashup_results" on Latte.

Step 4: Improve results and test on a broader test set: due Sept 19

A new set of audio files and a reference file will be made available here. Run this set through the Mashup and the results through SCLite. This will produce an error rate for each "speaker" in the set. Keep this as your baseline.

Review the results and make changes to your grammar to improve the results. You can iterate on this until you are satisfied.

Submit on September 19th:

1. The sclite files for your final results.

2. A comparison of the baseline results and final results to show how your grammar changes improved performance.

3. A brief analysis of the remaining errors, discussing what you think the causes of the errors are (e.g., accents, noise, confusable words, etc.)

Assignment 1b: Moving to Statistical Language Models. Due October 3.

This next assignment will use statistical language models. The first part uses your finite state grammar to generate training data. Next, you will also be asked to collect/create training data. We will use this data directly in the Mashup, as David Thomson showed you, and run other kinds of LM creation and evaluation tools.

Part One: Evaluate a statistical grammar based on sentences generated from your grammar.

Use the sentence generator (which you will write, using Cory's code as a starting point; see Latte) to generate sentences from your finite state grammar.

For the Generator, you are to produce a program that will generate sentences from your grammar (lots of sentences, capturing as much variability as possible).

You may work in pairs (trios at the most), and you can use as much of the attached code as you want, except that the final code must go from the BNF format (roughly the same as JSGF) to sentences, not go through GRXML.

I am attaching two versions of Cory's sentence generation code, from 2013 and 2014. Only the earlier one starts from JSGF, but he may have improved his code, so I'm attaching both.

Your goal for this part of the assignment is to get sentences from your grammars. Your learning goal is to read someone else's code and reformulate it to do what you want (a *very* important skill in industry).
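For orientation, a toy random-expansion generator is sketched below. The dict stands in for the parsed grammar; the real work, and the point of the assignment, is the parser that builds such a structure from your BNF file:

    # Toy sentence generator: random expansion of a BNF-style grammar.
    # The hard part of the assignment (parsing the BNF file into this
    # structure) is omitted here.
    import random

    GRAMMAR = {
        "<order>":   [["i'd like", "<count>", "<size>", "<topping>", "pizzas"]],
        "<count>":   [["two"], ["three"], ["ten"]],
        "<size>":    [["small"], ["medium"], ["large"]],
        "<topping>": [["cheese"], ["pepperoni"], ["extra cheese"]],
    }

    def expand(symbol):
        if symbol not in GRAMMAR:  # terminal: emit it as-is
            return symbol
        choice = random.choice(GRAMMAR[symbol])  # pick one expansion at random
        return " ".join(expand(s) for s in choice)

    for _ in range(1000):
        print(expand("<order>"))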

Part Two:

In this part, you will find out how well the sentences you produced match the test set, using both perplexity and word error rate. You should use the original test set to get a first cut at this performance. We're looking into whether a new "true test" set will be available, but in either case you'll need to submit your "dev test" numbers.

  • Create the training data
    • Generate 1000 sentences (to start)
    • Check the format required by the Mashup and make sure your data is in the right format for each tool.
  • Word error rate
    • Now take that set of sentences and use them to generate a statistical language model in the Mashup and run the test through.
    • Compute the word error rate, etc., as you did before using SCLite.
  • What to submit:
    • A description of what you did, including how many sentences you used
    • A comparison with your previous grammar-based results and why you think they were better/worse

Part Three: There's no data like more data. (John will do this. You just need to submit 1000 sentences by TBD)

  • We will make the combined training data from all of you available in a single training set
  • Use this new training set to do one more perplexity run and one more Mashup run.
  • Turn in your perplexity and WER results (which theoretically should be the same for all of you.)
  • NOTE: For this part, you do not need to change anything. Just run the same model building, perplexity, and Mashup calls you ran on your individual sets.

Assignment Two: Pocketsphinx

Part One: Testing your grammar on Pocketsphinx

In this assignment you will do exactly what you did for the Mashup, using Pocketsphinx, which is available on SourceForge (and maybe elsewhere). Download it, unpack it, and follow the instructions in the INSTALL file.

NOTE: You'll just be doing the grammar part now. We'll build statistical grammars for Pocketsphinx when we get to the language modeling section of the course.

  • Use the same grammar, changing it to JSGF format (I think just the header needs to change)
  • Submit the .sys and .pra files, and your grammar

NOTE: Pocketsphinx is a lot slower than the Mashup. Leave yourself some time to run it.
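Here is a batch-decoding sketch, assuming pocketsphinx_continuous is built and on your PATH; the -jsgf and -infile flags are from the 0.8-era command line, so check pocketsphinx_continuous -help against your build:

    # Decode each wav file with Pocketsphinx using the JSGF grammar.
    import glob
    import subprocess

    with open("ps_output.txt", "w") as out:
        for wav in sorted(glob.glob("audio/*.wav")):
            result = subprocess.check_output(
                ["pocketsphinx_continuous", "-jsgf", "pizza.jsgf", "-infile", wav],
                stderr=subprocess.DEVNULL)  # the logging goes to stderr
            out.write("%s\t%s\n" % (wav, result.decode("utf-8", "replace").strip()))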

Part Two: Testing a statistical grammar on Pocketsphinx

Using the same sentences you used for the AT&T grammar, create a statistical language model for Pocketsphinx. Instructions are on the Pocketsphinx wiki (http://cmusphinx.sourceforge.net/wiki/tutoriallm). There is a web service, but I'm not sure how well it works.

HOLD OFF on this until we understand how to better use statistical grammars in Pocketsphinx.

Assignment Three: Language Models and Perplexity. Due October 28

For this assignment, you are going to use an off-the-shelf, open source toolkit from CMU (the CMU-Cambridge toolkit) to build the models and compute the perplexity. (There is an alternative from SRI, the SRILM toolkit (http://www.speech.sri.com/projects/srilm/), which some students have found easier, but try the CMU one first.) There is a paper on the CMU-Cambridge toolkit, and documentation on typical usage (the documentation overall is pretty good).
  • Data for this assignment is on Google Docs:
    • DataForF2014.zip contains 3 sets of human-generated sentences, each in .text and .snor formats (.text has the <s> and </s> markers, .snor has the initials of whoever wrote it).
    • F2013_all_setnences.txt contains one file with all the sentences generated by students last year
    • S2014AllGenSents_byGram.zip contains generated sentences grouped according to the grammar that generated them
    • S2014AllGenSents.train is a single file containing all the sentences generated in the Summer 2014 class
  • Step 1: Produce a baseline perplexity (a pipeline sketch follows this list)
    • Use the 1000 sentences you generated to create the model
    • Use the existing test set to determine the perplexity on that model
  • Step 2: Try to improve the model by lowering the perplexity
    • Build a model with the human-generated sentences (not your test set) and use it to filter the automatically generated sentences by picking sentences or sets of sentences that have lower perplexity
    • Take the filtered sentences, build another model, and test your original test set to see if you've lowered perplexity.
    • Repeat. Note you can also adjust the LM parameters, use different backoff methods, generate more sentences, etc.
  • Step 3: I'll give you a new test set to run, both extrinsic in the Mashup with your filtered set of sentences and intrinsic with perplexity in the CMU toolkit with whatever parameter changes you want (and maybe in Pocketsphinx if there's an easy way to do it).
  • Submit
    • Baseline perplexity of the test set (80 sents) against a model built from your generated sentences.
    • Try to improve the perplexity at least three different ways
    • A description of each thing you did, e.g. filtering the sentences (could be multiple ways), creating new data, changing parameters, etc., and the resulting effect on perplexity
    • The final improved perplexity on the original test set
    • The perplexity of your improved model on the new test set
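Here is a sketch of the baseline run (Step 1). The command names and flags are taken from the CMU-Cambridge toolkit documentation; check them against your installed version and substitute your own file names:

    # Build a language model from the generated sentences and compute the
    # perplexity of the test set by chaining the CMU-Cambridge tools.
    import subprocess

    def run(cmd, infile, outfile=None):
        with open(infile) as i:
            if outfile is None:
                subprocess.check_call(cmd, stdin=i)
            else:
                with open(outfile, "w") as o:
                    subprocess.check_call(cmd, stdin=i, stdout=o)

    run(["text2wfreq"], "gen.txt", "gen.wfreq")     # word frequency counts
    run(["wfreq2vocab"], "gen.wfreq", "gen.vocab")  # vocabulary file
    run(["text2idngram", "-vocab", "gen.vocab"], "gen.txt", "gen.idngram")
    subprocess.check_call(["idngram2lm", "-idngram", "gen.idngram",
                           "-vocab", "gen.vocab", "-binary", "gen.binlm"])

    # evallm reads commands on stdin; ask it for the test-set perplexity.
    with open("evallm.in", "w") as f:
        f.write("perplexity -text test.text\nquit\n")
    run(["evallm", "-binary", "gen.binlm"], "evallm.in")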

Language Modeling Paper Reviews: Presentation on October 21

Work in pairs. Select a paper from the list below.

  • Determine what problem the authors are trying to address.
  • Find 1-2 other papers that are trying to solve the same problem or using a similar approach
    • preferably a newer paper from different authors
    • papers from the reference list or from the same authors are acceptable if you can't find others
  • Create a short (3-5 slide) presentation that covers the following:
    • What core aspect of speech recognition is the paper addressing?
    • What specific techniques is the paper describing?
    • What is the intuition or motivation for the approach?
    • How was the technique evaluated?
    • What did you find that was particularly interesting? (can be positive or negative)

Papers to choose from

Final Project: Build a speech application for submission to the AVIOS Student Contest

Commercial tools have been made available through AVIOS (Applied Voice Input Output Society): http://www.avios.org/studenttools.htm as part of a student contest. (Extra credit will be given to students who submit their applications to the AVIOS contest.) You may work alone or in pairs for this assignment. No more than three to a team.

Tool and Topic Discussion: Due Friday October 3:

  • Come up with 3 possible applications
  • You can do this in groups of 2-3 or individually (you can then form groups based on the ideas)
  • Review the tools and take notes on: (NOTE: The toolset is still being revised as of 9/7. I'll update this part when it's set)
    • What part of the problem do they solve?
    • Can you use it to build a full speech application? On what platforms?
    • How do you specify the dialog/control structure?
    • How do you define grammars and prompts?
    • What are its stated advantages? Do you see any disadvantages?
    • What questions are unanswered?
    • NOTE: You are not expected to fully test the toolset, just evaluate what they present, including any demos.

Proposal: Due October 17:

Design: Due November 7:

  • Pick a toolset
  • Describe your application
  • Describe what resources you'll need (grammars, prompts, etc.)
  • Describe what back-end data, etc., you need and where you expect to get it
  • Indicate what issues you are running up against and might need help with
  • Submission: Same procedure. Add to or extend the existing document set.

MVP: Running prototype (Minimal Viable Product) of application: Due Dec 1. Submit the following:

  • Description of the application's functionality (does not have to be fully implemented)
  • Instructions for how to test the application (indicate its limits)
  • Start your website. See below in Contest Details for what to include. Make sure your "ApplicationOverview" file has a link to the website.

Contest Details

  • Review the contest rules: http://avios.org/resources/contestrules2014.pdf
  • Submit the "Intent to Submit" form (this is late, but I'll send my apologies)
  • When you actually submit (after the 17th), your instructions need to be accessible on a website. You can put up your own, or you can put pages up on the class website by sending them to me and I'll make sure they are linked.
  • If you're going to use the class website, use the following URL on the "Submit" form: http://www.cs.brandeis.edu/~cs136a/Fall2013_speech_apps.html
  • This page has all of the groups listed, and I'll link whatever pages you provide to the title of the application.
  • The website should describe the purpose of the application and include:
    • a. Instructions for installing the application if it must be installed by contest judges
    • b. Instructions for accessing and using the application
    • c. Any credentials, such as user ids, needed to access the application
    • d. Any limitations that users need to be aware of
    • e. Additional information, such as a description of the technologies used, evaluations, or user comments (optional)

Final application and presentations: Due December 15th.

Your presentations/demonstrations should be about 15 minutes and include the following:

  • Description of the purpose of the application and its functionality
  • "Boxology" of the major components and how they interact
  • Additional information on any components you think are particularly interesting
  • Highlights of how it meets the contest criteria of robustness, usefulness, technical superiority, user friendliness, innovation, and creativity
  • Instructions for how to test the application (same as contest rules above)
  • Demonstrate the application. Show the functionality, how it works best, and some of its limitations.
  • Discuss future work.
  • Submit to the AVIOS contest right away!