Timo Baumann

Tutorial Part 3: An Incremental Babelfish

This section will show you how to build a full end-to-end incremental system consisting of a speech input pipeline (see part 1), a speech output pipeline (see part 2), and the brain that sits in between.

Look at your Eclipse project for https://github.com/timobaumann/SimplisticBabelfish/ which you checked out in the previous tutorial. To make things easier, there is a very simple implementation for translating texts, based on a table-lookup in babelfish/Translator.

Tasks:

Setup the Pipeline

Our system will consist of several modules: speech input, speech translation, and speech output. These modules will be connected via the InproTK configuration system. Take a look at the file configs/iu-config.xml. The input module is called "currentASRHypothesis" (regardless of whether you use SimpleReco for speech recognition or SimpleText for textual input) and modules that should receive input from its right buffer are listed in <propertylist name="hypChangeListeners">. As you can see, the configuration already has the translationModule listed, which is then further specified. A synthesis module is set up as a listener to the translation module.
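Schematically, the wiring described above looks something like the following fragment. This is a sketch in the Sphinx-4-style configuration format that InproTK uses; the exact component names and types (other than "currentASRHypothesis", "hypChangeListeners", and the translation module) are placeholders, so check configs/iu-config.xml for the real definitions:

```xml
<config>
  <!-- the input module; its right-buffer listeners are declared here -->
  <component name="currentASRHypothesis" type="...">
    <propertylist name="hypChangeListeners">
      <item>translationModule</item>
    </propertylist>
  </component>

  <!-- the translation module, which in turn feeds the synthesis module -->
  <component name="translationModule" type="babelfish.PartialTranslationModule">
    <propertylist name="hypChangeListeners">
      <item>synthesisModule</item>
    </propertylist>
  </component>
</config>
```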

Use the -c switch to use this configuration with SimpleText or SimpleReco; in addition, use the -O switch which sets up an audio-out to your speakers:
SimpleText -O -c file:configs/config.xml
SimpleReco -O -c file:configs/config.xml

Implement a basic TranslationModule

babelfish/PartialTranslationModule gives an outline of what your translation module needs to accomplish. Even though your translation module outputs individual words, it is easiest to wrap them in PhraseIUs, as these are the expected input of the SynthesisModule.
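Conceptually, the module consumes word edits from its left buffer and emits phrase edits on its right buffer. The following self-contained sketch mimics that logic with a plain map standing in for babelfish/Translator and a hypothetical `Edit` record standing in for InproTK's EditMessage/EditType; the real module would subclass IUModule and operate on WordIU/PhraseIU objects instead of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for InproTK's EditType and EditMessage.
enum EditType { ADD, REVOKE, COMMIT }
record Edit(EditType type, String payload) {}

public class BasicTranslation {
    // Toy table lookup in the spirit of babelfish/Translator.
    static final Map<String, String> TABLE = Map.of(
            "nimm", "take", "bitte", "please", "das", "the", "buch", "book");

    /** Turn each incoming word edit into a phrase edit carrying the translation. */
    static List<Edit> translate(List<Edit> wordEdits) {
        List<Edit> phraseEdits = new ArrayList<>();
        for (Edit e : wordEdits) {
            // Unknown words are passed through untranslated.
            String t = TABLE.getOrDefault(e.payload().toLowerCase(), e.payload());
            phraseEdits.add(new Edit(e.type(), t));
        }
        return phraseEdits;
    }

    public static void main(String[] args) {
        System.out.println(translate(List.of(
                new Edit(EditType.ADD, "nimm"),
                new Edit(EditType.ADD, "bitte"))));
    }
}
```

Note that this naive version blindly forwards REVOKE edits as well, which is exactly the behavior the next task asks you to fix.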

Test your module with SimpleText -- what happens when you revoke words that have already been spoken?

Improve TranslationModule

Phrases that are already being spoken cannot be revoked (it is simply impossible to take back what is already out in the world for everybody to hear), so revokes coming from the ASR input must be handled differently. You can check whether a phrase has started being uttered by calling phrase.isUpcoming(). For example, your module could always lag one word behind the actual recognition, which would also (to some degree) cover word reordering between languages (see below). Another idea is to apologize (e.g. by uttering "sorry") when a word's translation has already been spoken (i.e. cannot be revoked anymore) and the original word is revoked.
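The decision rule behind the "apologize" strategy can be sketched as follows. The `OutPhrase` record and its `isUpcoming` flag are hypothetical stand-ins for a PhraseIU and its isUpcoming() method; the real module would inspect the actual IU in its right buffer:

```java
// Hypothetical stand-in for a phrase in the output buffer; in InproTK you
// would query isUpcoming() on the actual PhraseIU instead.
record OutPhrase(String text, boolean isUpcoming) {}

public class RevokeGuard {
    /**
     * React to the ASR revoking the word behind 'phrase': revoke the phrase
     * if it has not started being uttered yet; otherwise apologize, since
     * speech that is already out in the world cannot be taken back.
     */
    static String onRevoke(OutPhrase phrase) {
        if (phrase.isUpcoming()) {
            return "revoke " + phrase.text();
        }
        return "say sorry";  // e.g. utter "sorry" and carry on
    }

    public static void main(String[] args) {
        System.out.println(onRevoke(new OutPhrase("take", true)));
        System.out.println(onRevoke(new OutPhrase("take", false)));
    }
}
```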

Extend the Translator

Currently, Translator receives and outputs individual words, which allows only for word-by-word translations. Word-by-word translations are boring, as they never incur revokes (a future word never changes the translation of previous words).

Extend Translator so that it is fed with all words known so far and occasionally produces revokes when different word orderings in German and English call for them. For example:

Nimm --> take
Nimm bitte --> please take
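The example above can be realized as a diff between successive full translations: keep the common prefix of the previous and the new output, revoke the rest (last word first), then add the new words. The sketch below is self-contained; the tiny `translate` table is made up for illustration, and in the real system the edits would be EditMessages carrying PhraseIUs:

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalDiff {
    // Toy prefix translations illustrating German/English reordering
    // (table contents are made up for illustration).
    static List<String> translate(List<String> germanPrefix) {
        if (germanPrefix.equals(List.of("Nimm"))) return List.of("take");
        if (germanPrefix.equals(List.of("Nimm", "bitte"))) return List.of("please", "take");
        return germanPrefix;  // fall back to passing words through
    }

    /** Diff the new translation against what was already output. */
    static List<String> edits(List<String> oldOut, List<String> newOut) {
        int common = 0;
        while (common < oldOut.size() && common < newOut.size()
                && oldOut.get(common).equals(newOut.get(common))) common++;
        List<String> edits = new ArrayList<>();
        // Revoke everything past the common prefix, last word first.
        for (int i = oldOut.size() - 1; i >= common; i--)
            edits.add("revoke(" + oldOut.get(i) + ")");
        // Then add the new words.
        for (int i = common; i < newOut.size(); i++)
            edits.add("add(" + newOut.get(i) + ")");
        return edits;
    }

    public static void main(String[] args) {
        List<String> first = translate(List.of("Nimm"));
        List<String> second = translate(List.of("Nimm", "bitte"));
        // "take" must be revoked and re-added after "please":
        System.out.println(edits(first, second));
    }
}
```

Combined with the isUpcoming() guard from the previous task, this yields a module that revokes freely while the output is still upcoming and degrades gracefully once it is not.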

Finally, how about writing an adapter for a real translation API (such as the Microsoft translation API)? That would radically improve translation quality, especially if the translation works well with incremental input.

What's still missing:

The babelfish does not implement any turn-taking; that is, its processing is purely reactive and it never decides on its own to start or stop speaking. There is some limited support for turn-taking in InproTK's FloorTracking module, which allows reactions (via time-outs) relative to voice activity (start/end of speech) combined with prosodic analysis.


Related

Wiki: Tutorial