Tools for Multimodal Development

Expanding Debbie Dahl's 2014 SpeechTek presentation on Tools for MultiModal Development

DRAFT

AVIOS: Applied Voice Input/Output Society

avios.org
Student Contest 2014-15
Mobile Voice Conference Presentations 2008 - 2014
AVIOS list of tools for speech application development (original)

Application Development links

There are many free and open source components available for building every part of a multimodal application. Understand your requirements and evaluate carefully!

Speech Recognizers

Free

Microsoft Windows Speech Recognition 8.0
Android Speech Recognition API
Web Speech API for Chrome. Intro: The new JavaScript Web Speech API makes it easy to add speech recognition to your web pages. This API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later.
iSpeech (for mobile)
- The iSpeech API allows developers to implement Text-To-Speech (TTS) and Automated Voice Recognition (ASR) in any Internet-enabled application. They have SDKs for various mobile, computer, and web platforms:
- Mobile
- Desktop/Server
- Web
Developers can build programs that use iSpeech services by using the correct iSpeech SDKs.
- Note: while it says "Free" and there is no cost to sign up as a developer, there is also a price-per-use and price-per-download model, and it isn't completely clear what is free and what costs money.
Speech Mashup Guide More on the Mashup from Thomas

Brandeis is in the Apple iOS University Developer Program, which provides the Apple Developer Library. There is a lot of information available, from development videos to sample code. Let me know if you're going to do this, since I need to give you logins and register your devices.

Open source

Sphinx-3 (C/C++), Sphinx-4 (Java)
PocketSphinx (for embedded systems) and PocketSphinx.js (Javascript)
Kaldi
- Kaldi is developed at Johns Hopkins University. It is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend.
OpenEars (uses PocketSphinx)
- OpenEars makes it simple for you to add speech recognition and synthesized speech/TTS to your iPhone app quickly and easily. It doesn't use the network and there are no hidden costs or accounts to set up. If you have more specific app requirements, the OpenEars Plugin Platform lets you drag and drop advanced functionality into your app when you're ready. It lets you easily implement round-trip English and Spanish language speech recognition and English text-to-speech on the iPhone, iPod and iPad. It uses the open source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries, and it is free to use in an iPhone, iPad or iPod app.

Low cost for development

AT&T Mobile Developer program
Nuance Mobile Developer program (NDEV)

Speech Synthesis

Speech Technology and Speech Recognition: AT&T Labs

http://www.youtube.com/watch?v=V0uwydE0HaA&feature=youtube_gdata

Open source

Festival
DFKI Mary (supports SSML and EmotionML)
Flite (Festival Lite, small footprint)
eSpeak (formant synthesis)

Free

OpenEars (iPhone)
Google (Android and Chrome)

Audio Analysis

Analyze your audio with the "Swiss Army Knife" of audio analysis programs

http://sox.sourceforge.net/

Wave Surfer: http://www.speech.kth.se/wavesurfer/
Transcriber 1.5.1 http://sourceforge.net/projects/trans/files/transcriber/
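
As a rough illustration of the kind of numbers these tools report (sample rate, duration, peak and RMS level), here is a minimal sketch using only the Python standard library. It is not how SoX or WaveSurfer work internally. The function names `make_sine_wav` and `wav_stats` are made up for this example, and the sketch assumes mono or interleaved 16-bit PCM audio.

```python
import io
import math
import struct
import wave


def make_sine_wav(freq_hz=440.0, seconds=0.5, rate=16000, amplitude=0.5):
    """Return the bytes of a mono 16-bit PCM WAV file holding a sine tone."""
    n_frames = int(seconds * rate)
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n_frames, *samples))
    return buf.getvalue()


def wav_stats(wav_bytes):
    """Report basic properties of 16-bit PCM WAV data (hypothetical helper)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = w.getnchannels()
        rate = w.getframerate()
        n_frames = w.getnframes()
        raw = w.readframes(n_frames)
    if n_frames == 0:
        raise ValueError("no audio frames to analyze")
    samples = struct.unpack("<%dh" % (n_frames * channels), raw)
    # Levels are scaled so that 1.0 is full scale for 16-bit audio.
    peak = max(abs(s) for s in samples) / 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    return {
        "channels": channels,
        "sample_rate": rate,
        "duration_s": n_frames / rate,
        "peak": peak,
        "rms": rms,
    }


if __name__ == "__main__":
    print(wav_stats(make_sine_wav()))
```

For real recordings, replace the generated tone with the bytes of a WAV file read from disk. The dedicated tools above handle many more formats and also give you spectrograms, pitch tracks and transcription support.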

Other Speech Processing

EmoVoice (open source emotion recognition from voice)
ALIZE (open source speaker recognition and diarization)
MSR Identity Toolbox (open source speaker recognition)

Natural Language Understanding

Open source

Stanford CoreNLP tools (Java)
- Stanford CoreNLP provides a set of natural language analysis tools which can take raw text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, and mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, etc. Stanford CoreNLP is an integrated framework. Its goal is to make it very easy to apply a bunch of linguistic analysis tools to a piece of text. Starting from plain text, you can run all the tools on it with just two lines of code. It is designed to be highly flexible and extensible. With a single option you can change which tools should be enabled and which should be disabled. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications. It includes the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, the sentiment analysis, and the bootstrapped pattern learning tools.
Apache OpenNLP (Java)
- The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
NLTK (Python)
- NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum.
LingPipe (Java)
- LingPipe is a tool kit for processing text using computational linguistics. LingPipe is used to do tasks like:
  Find the names of people, organizations or locations in news
  Automatically classify Twitter search results into categories
  Suggest correct spellings of queries

Free (with conditions)

Wit.AI
<get diagram>

Dialog

AIML (Artificial Intelligence Markup Language) http://www.alicebot.org/aiml.html
- AIML (Artificial Intelligence Markup Language) is an XML-compliant language that's easy to learn, and makes it possible for you to begin customizing an Alicebot or creating one from scratch within minutes.
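
To give a feel for what an AIML file looks like, here is a small sketch: a three-category AIML document and a toy matcher written with the Python standard library. This is not an AIML interpreter and is not how Alicebot works internally. It only handles literal words and the `*` wildcard, and it ignores `<star/>`, `<srai>`, `<that>`, topics, and the real matching priorities. The bot name "DemoBot" and the helper names are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# A tiny AIML document: each <category> pairs an input <pattern>
# with a response <template>.
AIML_DOC = """<aiml version="1.0.1">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there!</template>
  </category>
  <category>
    <pattern>WHAT IS YOUR NAME</pattern>
    <template>My name is DemoBot.</template>
  </category>
  <category>
    <pattern>*</pattern>
    <template>I do not have an answer for that yet.</template>
  </category>
</aiml>"""


def normalize(text):
    """Uppercase the input and drop punctuation before matching."""
    return " ".join(re.findall(r"[A-Z0-9]+", text.upper()))


def pattern_to_regex(pattern):
    """Turn an AIML-style pattern into a regex ('*' = one or more words)."""
    parts = [
        r"\S+(?: \S+)*" if tok == "*" else re.escape(tok)
        for tok in pattern.split()
    ]
    return re.compile("^" + " ".join(parts) + "$")


def load_categories(aiml_text):
    """Read (pattern, template) pairs; wildcard-free patterns are tried first."""
    root = ET.fromstring(aiml_text)
    cats = [
        (cat.findtext("pattern").strip(), cat.findtext("template").strip())
        for cat in root.iter("category")
    ]
    cats.sort(key=lambda c: "*" in c[0])  # simplification of AIML priorities
    return [(pattern_to_regex(p), t) for p, t in cats]


def respond(categories, user_input):
    """Return the template of the first matching category, or None."""
    text = normalize(user_input)
    for regex, template in categories:
        if regex.match(text):
            return template
    return None


if __name__ == "__main__":
    cats = load_categories(AIML_DOC)
    for line in ["Hello!", "What is your name?", "Tell me a joke"]:
        print(line, "->", respond(cats, line))
```

Customizing a bot mostly means adding more categories like the ones above; a real interpreter takes care of the matching.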
OpenDIAL https://code.google.com/p/opendial/
- OpenDial is a Java-based, domain-independent software toolkit for the development of robust and adaptive dialogue systems. Dialogue understanding, management and generation are expressed in OpenDial through probabilistic rules encoded in a simple XML format.
JVoiceXML http://jvoicexml.sourceforge.net
- A free VoiceXML interpreter for Java with an open architecture for custom extensions. Demo implementation platforms support Java APIs such as JSAPI and JTAPI.
Apache Commons SCXML http://commons.apache.org/proper/commons-scxml/
- State Chart XML (SCXML) is currently a Working Draft specification published by the World Wide Web Consortium (W3C). SCXML provides a generic state-machine based execution environment based on Harel State Tables. SCXML is a candidate for the control language within multiple markup languages coming out of the W3C (see the latest Working Draft for details). Commons SCXML is an implementation aimed at creating and maintaining a Java SCXML engine capable of executing a state machine defined using a SCXML document, while abstracting out the environment interfaces.
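
To make the state-machine idea concrete, here is a minimal sketch of a dialog flow written as an SCXML document and stepped through with a few lines of Python. It is not Commons SCXML (which is a Java library) and it covers only a small subset of the language: flat states, single-target transitions, exact event names, and final states. Nested and parallel states, the data model, and executable content are ignored. The state and event names (`greeting`, `asr.result`, and so on) are invented for this example.

```python
import xml.etree.ElementTree as ET

NS = {"sc": "http://www.w3.org/2005/07/scxml"}

# A toy dialog flow: greet, listen, respond, finish.
SCXML_DOC = """<scxml xmlns="http://www.w3.org/2005/07/scxml"
       version="1.0" initial="greeting">
  <state id="greeting">
    <transition event="user.spoke" target="listening"/>
  </state>
  <state id="listening">
    <transition event="asr.result" target="responding"/>
    <transition event="asr.nomatch" target="greeting"/>
  </state>
  <state id="responding">
    <transition event="tts.done" target="done"/>
  </state>
  <final id="done"/>
</scxml>"""


def load_machine(scxml_text):
    """Return (initial state, transition table, set of final state ids)."""
    root = ET.fromstring(scxml_text)
    table = {}
    for state in root.findall("sc:state", NS):
        table[state.get("id")] = {
            t.get("event"): t.get("target")
            for t in state.findall("sc:transition", NS)
        }
    finals = {f.get("id") for f in root.findall("sc:final", NS)}
    return root.get("initial"), table, finals


def run(scxml_text, events):
    """Feed events to the machine and return the list of states visited."""
    current, table, finals = load_machine(scxml_text)
    visited = [current]
    for event in events:
        if current in finals:
            break  # a final state ends the run
        target = table.get(current, {}).get(event)
        if target is not None:  # events with no transition are ignored
            current = target
            visited.append(current)
    return visited


if __name__ == "__main__":
    print(run(SCXML_DOC, ["user.spoke", "asr.result", "tts.done"]))
```

Keeping the dialog flow in a document like this, separate from the code that executes it, is what lets the same engine run different applications.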

Knowledge/Ontologies <get diagram>

Wolfram Alpha developer (personal/experimental)

Meanings: WordNet

Natural Language Generation Software

- Allow more variation than templates
- Easier to maintain
- More complex to implement
Downloadable systems: http://aclweb.org/aclwiki/index.php?title=Downloadable_NLG_systems

Development Tools

Dialog (and more generally graph) layout

yEd

IDEs

Eclipse
NetBeans

Audio

Audacity (audio recording and editing)
ffmpeg (command-line audio processing)

yEd Graphical Layout Example: Personal Assistant App <get diagram>

Development Environments

iOS
Android
Windows
Open Web Platform (HTML 5, etc.)
AppInventor (Android only)
Cross-platform development environments (Adobe PhoneGap/Apache Cordova, Appcelerator)

Open Standards

Often royalty-free (W3C standards)

Some Standards
Control
Human input

Phonetics

The ARPABET
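
As a small illustration of ARPABET in use, the sketch below stores a handful of words with ARPABET transcriptions written in the style of the CMU Pronouncing Dictionary, where vowels carry a stress digit (0, 1 or 2). The five entries are hand-picked for the example and `TOY_LEXICON`, `phonemes` and `syllable_count` are made-up names; a real application would load a full pronunciation dictionary instead.

```python
# Hand-picked entries in CMU Pronouncing Dictionary style (assumed for
# illustration; check a real dictionary before relying on them).
TOY_LEXICON = {
    "speech": "S P IY1 CH",
    "hello": "HH AH0 L OW1",
    "cat": "K AE1 T",
    "voice": "V OY1 S",
    "tools": "T UW1 L Z",
}


def phonemes(word, keep_stress=False):
    """Return the ARPABET phoneme list for a word in the toy lexicon."""
    phones = TOY_LEXICON[word.lower()].split()
    if keep_stress:
        return phones
    # Stress digits appear only on vowels; strip them to get bare symbols.
    return [p.rstrip("012") for p in phones]


def syllable_count(word):
    """Count syllables as the number of stress-marked (vowel) phonemes."""
    return sum(1 for p in TOY_LEXICON[word.lower()].split() if p[-1].isdigit())


if __name__ == "__main__":
    for w in TOY_LEXICON:
        print(w, phonemes(w), syllable_count(w))
```

Because the symbols are plain ASCII, transcriptions like these are easy to store, compare and feed to recognizers or synthesizers that accept phoneme-level input.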

Interesting Articles

Project Ouch - Outing Unfortunate Characteristics of HMMs (Used for Speech Recognition): http://www.icsi.berkeley.edu/icsi/projects/speech/ouch

http://voiceinthemachine.com/2012/07/03/whats-wrong-with-speech-recognition/