JBS: Summer 2015

GOAL: Evaluate the performance of 3 speech recognizers

AT&T Mashup Speech Server
CMU Pocketsphinx
Google Chrome

Additional tools you'll need:

SoX: The Swiss Army knife of audio tools (or some other audio analysis tool)
SCLite: Speech evaluation software produced by NIST and used by speech research labs worldwide

FINAL SUBMISSION DUE MONDAY JUNE 22nd end of the day

Grades will be based BOTH on the elegance (reusability and perspicuity) of your grammars and on the actual performance.

Submit a zip file on latte with:

For each test (mashup 1 & 2, pocket sphinx and chrome) include the grammar file (where applicable), the .sys and .pra files. Make sure the file names or directory structure indicate which test each belongs to.
A document summarizing and analyzing the results:

with a chart showing sum/avg line of the sys file for baseline mashup, improved mashup, pocket sphinx and chrome

e.g.: Test # Snt # Wrd Corr Sub Del Ins Err S.Err

For your two mashup runs, discuss two examples from the .pra files where performance improved and the changes you made to get that improvement. Then discuss two examples of errors in the improved run and what you might change in the grammar to fix them.

Compare the best mashup run and chrome. Discuss examples where each had problems.
Say something constructive about pocket sphinx (this may be the most challenging part).

NOTE: If you did not get pocket sphinx working, email me with a detailed description of what is blocking you and your grammar. If it's not a known, fixable problem, I'll send you back results you can use in your writeup.

Steps:

Test 1:
- Record and transcribe a test set of 10 sentences for ordering pizza. Submit.
- Create a finite state grammar for ordering pizza. Run your audio through the AT&T Mashup.
- Evaluate the performance using the NIST software on your audio and the audio test set I give you.
- Improve your grammar and try again.
Test 2:
- Install Pocketsphinx on your computer
- Alter your grammar file, etc. to fit the format and naming conventions for Pocketsphinx
- Run a regression test
- Evaluate performance
Test 3:
- Alter the audio format to work on Google Chrome
- Run a test
- Evaluate performance

Data Preparation:

Record 10 sentences, each a one-sentence pizza order. Use the Dominos menu, but limit it to pizzas, not other things like drinks. https://order.dominos.com/en/pages/order/menu.jsp#/menu/category/all/

The first sentence should be very simple ("I'd like to order a large cheese pizza") so you can run a test of all the plumbing.
Name the files with the following format:
- <yourname>_1.wav, <yourname>_2.wav ...
The recordings should be .wav files at 16 kHz (standard for microphone recordings). For these tests you will need one set of audio at 16 kHz and one set at 8 kHz. The unix tool "sox" is one way to downsample audio, but there are other options as well. Check out the Audio Analysis tools page on Google Docs.
Put the two sets of .wav files (8 kHz and 16 kHz) in separate, clearly marked directories.
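
A minimal sketch of the downsampling and a sanity check on the two directories, assuming sox is installed and on your PATH. The function names and the directory names in the usage note are hypothetical; only the sox invocation (`-r` before the output file sets the output rate) and Python's standard `wave` module are relied on.

```python
import wave
from pathlib import Path


def sox_downsample_command(src, dst, rate_hz=8000):
    """Build the sox command line that resamples src to rate_hz and writes dst."""
    # Format options placed before a file name apply to that file, so "-r"
    # here sets the sample rate of the output file.
    return ["sox", str(src), "-r", str(rate_hz), str(dst)]


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def files_with_wrong_rate(directory, expected_hz):
    """List the .wav files in `directory` whose sample rate is not expected_hz."""
    return sorted(
        p.name
        for p in Path(directory).glob("*.wav")
        if wav_sample_rate(p) != expected_hz
    )
```

To run the conversion, pass the command to `subprocess.run(sox_downsample_command("wav16k/alice_1.wav", "wav8k/alice_1.wav"), check=True)`. Afterwards, `files_with_wrong_rate("wav8k", 8000)` should return an empty list.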

Create a transcript file named <yourname>test.ref in the following format, making sure that the string in the parentheses is the same name as the audio file (minus the .wav extension). Double check the recordings against the transcripts to make sure they are accurate.
- I'd like to order a large cheese pizza (<yourname>_1)
- Give me a pepperoni pizza with peppers and onions (<yourname>_2)
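
Since a mismatch between the transcript ids and the audio file names breaks scoring later, it can help to check the .ref file automatically. The following is a sketch only: the function names are hypothetical, and it assumes each transcript line is the words followed by the utterance id in parentheses, as in the two examples above.

```python
import re
from pathlib import Path

# One transcript line: the words, then the utterance id in parentheses.
REF_LINE = re.compile(r"^(?P<text>.*\S)\s*\((?P<utt_id>[^()\s]+)\)\s*$")


def parse_ref_line(line):
    """Split one transcript line into (text, utterance_id)."""
    match = REF_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid transcript line: {line!r}")
    return match.group("text"), match.group("utt_id")


def missing_audio(ref_lines, wav_dir):
    """Return the utterance ids in the transcript that have no matching .wav file."""
    wav_ids = {p.stem for p in Path(wav_dir).glob("*.wav")}
    missing = []
    for line in ref_lines:
        if not line.strip():
            continue  # ignore blank lines
        _, utt_id = parse_ref_line(line)
        if utt_id not in wav_ids:
            missing.append(utt_id)
    return missing
```

For example, `missing_audio(open("alice_test.ref"), "wav16k")` (hypothetical file and directory names) returns an empty list when every transcript line has its audio file.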

Test 1: Create your own batch test for a pizza ordering grammar on the AT&T Mashup

Goal: Write a grammar for a one sentence pizza order. Use the Dominos menu, but limit it to pizzas, not drinks or other items. https://order.dominos.com/en/pages/order/menu.jsp#/menu/category/all/

Step 1: Write a grammar

To start, create a very simple pizza grammar (something like "I'd like a large cheese pizza"). Use the WBNF grammar format described in the AT&T Mashup Guide. This format is very similar to JSGF, which you'll use later, except for the header and start symbol. You can find info on JSGF at http://www.w3.org/TR/jsgf/.

Once that works, expand the grammar to accept any Dominos pizza order (of just pizzas).

Step 2: Run it on the AT&T Mashup

Sign up to be a developer for the AT&T Mashup. Here's the link (NOTE: use this link and DO NOT sign up to be an AT&T Developer). Here is the Mashup Documentation (it's also available on the site once you have signed up).

https://service.research.att.com/smm/register.jsp

Upload your grammar to the Mashup and compile it. If there are errors in the grammar, they will show up in the log window. Create a script that loops through your directory of wav files and sends them to the Mashup. (HINT: Start small. Upload a small grammar for something like the order "One large cheese pizza" and put just one wav file in the directory before you tackle the rest of the menu.) Sample scripts, etc. are in the class dropbox.

Sample script to call the mashup with one audio file: call_reco.sh You need to parameterize the filenames and put in your own grammar name and uuid.

Sample script to loop through a directory of .wav files and call the shell script on each one: test_through_mashup.pl. If you don't have/know Perl, you can write it in anything.

Run the Mashup on all 10 files, so you will have 10 .emma results.
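
If you would rather not use Perl, the loop can be sketched in Python as follows. This is not the class's test_through_mashup.pl. It ASSUMES you have parameterized call_reco.sh to take the wav path as its first argument and the path for the .emma result as its second, so adjust the argument list to however you actually parameterize the script.

```python
import subprocess
from pathlib import Path


def run_batch(wav_dir, script="./call_reco.sh", runner=subprocess.run):
    """Call `script` once for every .wav file in wav_dir, in sorted order.

    ASSUMPTION: the script accepts <wav path> <emma output path>.
    Returns the list of .emma paths that the script was asked to write.
    """
    outputs = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        emma = wav.with_suffix(".emma")  # alice_1.wav -> alice_1.emma
        # check=True stops the batch at the first failing call.
        runner([script, str(wav), str(emma)], check=True)
        outputs.append(emma)
    return outputs
```

The `runner` parameter exists only so the loop can be dry-run without contacting the server, for example `run_batch("wav8k", runner=lambda cmd, check: print(cmd))`.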

More sample files:

Sample grammar file: NewsCC.wbnf
Sample output of the Mashup: C_help_46.emma
Sample transcription file: NewsCC_recotests_3-28.ref
Some notes:
If you run into an issue and solve it, please share it with the class on Piazza so everyone isn't stuck on the same problem.
If you run into an issue and can't solve it, ask on the forum. I'll have a contact at AT&T for us to use, but I don't want that person bombarded by the same questions over and over. Help each other first.
The AT&T research group was recently sold to Interactions. If you can't reach the Mashup (and you had succeeded in the past) let me know since they will be moving the servers at some point. I'll try to keep you all posted.

Step 3: Evaluate the grammar

Download SCLite (links below):

http://www.itl.nist.gov/iad/mig/tools/ (Speech Recognition Scoring Toolkit)
http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm

Write a script that takes the output of the Mashup (EMMA format, with one file per utterance) and turns it into the "trn" format used by SCLite (one file for all results, in the same format as the .ref transcription file). Name that file with the extension ".hyp" (usually the base name is the same as the .ref file). The .ref and .hyp files should have the same number of lines, one for each audio file.
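
One possible shape for that conversion script is sketched here. It ASSUMES the recognized words are in the `emma:tokens` attribute of the first `emma:interpretation` element, which is where the W3C EMMA 1.0 standard puts them. Open a real .emma file from the Mashup (for example the sample C_help_46.emma) and adjust if its layout differs. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

EMMA_NS = "http://www.w3.org/2003/04/emma"  # W3C EMMA 1.0 namespace


def emma_to_text(emma_xml):
    """Return the recognized word string from one EMMA document (str or bytes).

    ASSUMPTION: the best hypothesis is the first <emma:interpretation>
    element that carries an emma:tokens attribute.
    """
    root = ET.fromstring(emma_xml)
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        tokens = interp.get(f"{{{EMMA_NS}}}tokens")
        if tokens is not None:
            return " ".join(tokens.split())  # normalize whitespace
    return ""  # no recognition result for this utterance


def write_hyp(emma_dir, hyp_path):
    """Write one trn line, 'words (utterance_id)', per .emma file; return the line count."""
    lines = []
    for emma_file in sorted(Path(emma_dir).glob("*.emma")):
        text = emma_to_text(emma_file.read_bytes())
        # The id in parentheses must match the id used in the .ref file.
        lines.append(f"{text} ({emma_file.stem})")
    Path(hyp_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

An utterance with no result still gets a line (just the id in parentheses), which keeps the .hyp file the same length as the .ref file.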

Run SCLite on your .ref and .hyp files. Look at the results and make any grammar changes that would improve performance and run it again.

sclite -r yourname_test.ref -h yourname_test.hyp -i rm -O results_dir/ -o all
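
To help read the Corr, Sub, Del, Ins and Err columns in the .sys file, here is a minimal sketch of how such counts arise from a word-level edit-distance alignment. It is an illustration, not sclite's algorithm: sclite uses its own alignment weights, so its split between substitutions, deletions and insertions can differ on some sentences. The function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Count correct, substituted, deleted and inserted words for one sentence."""
    r, h = ref.lower().split(), hyp.lower().split()
    # cost[i][j] = fewest edits turning the first i ref words into the first j hyp words
    cost = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        cost[i][0] = i  # delete every ref word
    for j in range(len(h) + 1):
        cost[0][j] = j  # insert every hyp word
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            diag = cost[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Walk back from the corner and label each step of one best alignment.
    counts = {"corr": 0, "sub": 0, "del": 0, "ins": 0}
    i, j = len(r), len(h)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (r[i - 1] != h[j - 1]):
            counts["corr" if r[i - 1] == h[j - 1] else "sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["del"] += 1
            i -= 1
        else:
            counts["ins"] += 1
            j -= 1
    return counts


def word_error_rate(counts):
    """Err = 100 * (Sub + Del + Ins) / number of reference words."""
    n_ref = counts["corr"] + counts["sub"] + counts["del"]
    if n_ref == 0:
        raise ValueError("reference has no words")
    return 100.0 * (counts["sub"] + counts["del"] + counts["ins"]) / n_ref
```

Because insertions count as errors but do not add reference words, Err can exceed 100% when the recognizer outputs many extra words.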

Submit your grammar, and sclite results (sys and pra files).

Step 4: Creating and testing a statistical grammar on the Mashup.

The Mashup will take a file of sentences and compile it into a statistical grammar.

Use the file All_sentences.train (on the Google Drive): upload it into the Mashup, compile it, and run the test through. Follow the same procedure as before to pull the results from the emma files into a .hyp file and score it with sclite.

Compare the errors to your handwritten grammar. Where is it better? Where is it worse?

Test 2: Create and test a speech grammar in Pocketsphinx

Step 1: Set up Pocketsphinx

Turn your WBNF grammar into a JSGF grammar
Install Pocketsphinx on your computer
There may be multiple ways to run an off-line test in Pocketsphinx. Here's the process I used:
In pocketsphinx/test/regression/, use test-hub4.cards.sh as a model to create your own regression test
Create a directory pocketsphinx/test/data and put your grammar, audio and transcription there.
Create a ".fileid" file to list the test audio and put it in your data directory. NOTE: Again, start with a small test set (1 file, then your 10 files)
Run your version of the test-hub4.cards.sh script
Log files and results will be in the same directory as the script
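
For the JSGF conversion at the start of this step, one way to keep the grammar reusable is to generate it from word lists. The sketch below is illustrative only: the rule names, the opening phrases and the function name are hypothetical, and the word lists you pass in should come from the Dominos menu. It also assumes every word (including "i'd") is in the Pocketsphinx pronunciation dictionary you use, so check that before running.

```python
def build_pizza_jsgf(sizes, toppings, grammar_name="pizza"):
    """Return the text of a small JSGF grammar for a one-sentence pizza order."""

    def alternatives(words):
        # JSGF separates alternatives with "|".
        return " | ".join(words)

    return "\n".join([
        "#JSGF V1.0;",                # required JSGF header
        f"grammar {grammar_name};",
        "",
        # The single public rule is the start symbol; [ ] marks optional parts.
        "public <order> = [<opening>] <article> [<size>] <topping> pizza [with <extras>];",
        "<opening> = i'd like [to order] | give me | can i have;",
        "<article> = a | an | one;",
        f"<size> = {alternatives(sizes)};",
        f"<topping> = {alternatives(toppings)};",
        # ( )* repeats the group zero or more times.
        "<extras> = <topping> (and <topping>)*;",
        "",
    ])
```

For example, `build_pizza_jsgf(["small", "medium", "large"], ["cheese", "pepperoni", "peppers", "onions"])` yields a grammar that covers both sample transcript sentences from the data preparation section. Keeping the word lists as data means the same lists can feed the WBNF version, where only the header and start symbol differ.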

Step 2: Run

Run the full test set
Run SCLite on the .hyp file

Submit:

.sys and .pra files

NOTE: Also submit your 20 sentences (wav files and reference file). Make sure they match and are in the right format. Due Monday, June 30th.

Test 3: Run your audio through Google Chrome

http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/

CS115 Speech Recognition Analysis Assignments