/dicResKit : Dictation Resource Kit

Using the Dictation Resource Kit

http://www.brains-N-brawn.com/dicResKit 9/3/2009 casey chesnut


Introduction

the Dictation Resource Kit (DRK) is an MS tool for creating custom language models. a custom language model lets you add your own words to SAPI for use with speech recognition (dictation) and speech synthesis, e.g. medical / legal / klingon / other industry-specific terms. these language models can be used by the Speech Recognition feature of the OS as well as by custom applications, on both Vista and Windows 7. this article provides an overview of working with the DRK.

there are 3 steps involved with creating a custom language model : Normalization, Generation, and Compilation. as a (poor) example, this article will show how to create a language model for music genres.

Installation

the DRK must be installed. for now it only supports a limited number of languages.

1) Normalization

first, we need a txt file of the words that will be used as input.

NgramGenre.txt - this file was generated by reading the ID3 tags from a bunch of mp3 files. notice that this list only has unique entries. if you want the end result to put more weight on the most popular genres, keep the duplicate entries instead of removing them.

the normalization process will now parse this txt file and replace abbreviations, symbols, numbers, etc. with their expanded word form. e.g. '0' will become 'zero' and '&' will become 'and'
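to make the idea concrete, here is a toy sketch of that kind of expansion. it is NOT the DRK's algorithm: the class name ToyNormalizer and its replacement table are made up for illustration, and it expands digits one character at a time, which the real normalizer presumably handles far better.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy normalizer: NOT the DRK's implementation, just the idea of expanding
// symbols and digits into their spoken word form.
public static class ToyNormalizer
{
    // Hypothetical replacement table; the real DRK rules are much richer.
    private static readonly Dictionary<char, string> Expansions = new Dictionary<char, string>
    {
        { '&', "and" },
        { '0', "zero" }, { '1', "one" }, { '2', "two" }, { '3', "three" }, { '4', "four" },
        { '5', "five" }, { '6', "six" }, { '7', "seven" }, { '8', "eight" }, { '9', "nine" }
    };

    public static string Normalize(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            string word;
            if (Expansions.TryGetValue(c, out word))
            {
                // Pad with spaces so the expansion becomes its own token.
                sb.Append(' ').Append(word).Append(' ');
            }
            else
            {
                sb.Append(c);
            }
        }
        // Collapse the runs of whitespace introduced by the padding.
        string[] tokens = sb.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", tokens);
    }
}
```

with this sketch, ToyNormalizer.Normalize("R&B") returns "R and B" and ToyNormalizer.Normalize("drum & bass") returns "drum and bass".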

the DRK is driven by an XML input file

Normalize.xml - options

the options file has to specify the 'corpus' input file. it can also specify error and output logs.

NormCorpusIn.txt - corpus input file. it contains file pairs for the non-normalized input and the output path for normalized output. this example only has one input and one output, but you can list more input and output pairs.

running the normalization process will result in these 3 output files.

NgramGenreNormalized.txt - this is the output file with the normalized words.

NormLog.txt - empty

NormErrLog.txt - empty

NOTE normalization is optional. also, i wish the text normalization function were built into System.Speech and Microsoft.Speech so that my custom apps could call it directly. this would allow my apps to perform normalization while the underlying data is being gathered, and to store the result directly in the database for searching against. instead, i have to build the database, run the normalization process, and then write more custom code to map the normalization result back to the dataset for searching.

e.g. (wished-for API, it does not exist) string normalizedText = System.Speech.StringUtil.Normalize("non-normalized text");
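the 'map the result back' step could look roughly like the sketch below. it assumes the normalized output file has exactly one line per input line, in the same order, which i have not verified for every input. the class name NormalizationMap is made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Sketch: pair each normalized line with the original line at the same index.
// Assumption: the input and output files are line-aligned.
public static class NormalizationMap
{
    public static Dictionary<string, List<string>> Build(IList<string> original, IList<string> normalized)
    {
        if (original.Count != normalized.Count)
            throw new ArgumentException("line counts differ; the files are not aligned");

        // Several original spellings can normalize to the same text,
        // so each key maps to a list of originals.
        var map = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < original.Count; i++)
        {
            List<string> sources;
            if (!map.TryGetValue(normalized[i], out sources))
            {
                sources = new List<string>();
                map[normalized[i]] = sources;
            }
            sources.Add(original[i]);
        }
        return map;
    }
}
```

it could be fed with File.ReadAllLines("NgramGenre.txt") and File.ReadAllLines("NgramGenreNormalized.txt"), and then a recognized phrase can be looked up to find the original database value(s).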

2) Generation

this step will generate the statistical language model from the list of normalized words. you will need another XML file to drive the process.

GenLM.xml - options

as with Normalization, the options file must specify the corpus input file. it must also specify an SLM output file (statistical language model in binary ARPA format). you can also specify an ArpaLM file (text based statistical language model). NOTE there are other options that you can specify, such as a list of words that must be included in or excluded from the resulting vocabulary.

running the generation process with the config file above will result in 4 output files.

GenLMSlm.slm - binary ARPA format

GenLMArpalm.txt - text ARPA format

GenLMLog.txt - output log

GenLMErrLog.txt - empty
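for readers who have not seen a text ARPA file, here is a rough sketch that builds a unigram-only model in the general ARPA text layout. this is not how the DRK computes its model: it uses plain maximum-likelihood estimates with no smoothing, no sentence markers and no higher-order n-grams, and the class name ToyArpa is made up. it does show why duplicate corpus entries matter: a word that appears more often gets a higher (less negative) log10 probability.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Sketch of a unigram-only model in the general ARPA text layout.
// Real tools add smoothing, <s> / </s> markers and backoff weights.
public static class ToyArpa
{
    public static string UnigramModel(IEnumerable<string> lines)
    {
        // Count every token; duplicate corpus entries raise a word's count.
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (string line in lines)
        {
            foreach (string token in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                int n;
                counts.TryGetValue(token, out n); // n stays 0 for a new token
                counts[token] = n + 1;
            }
        }

        double total = counts.Values.Sum();
        var sb = new StringBuilder();
        sb.Append("\\data\\\n");
        sb.Append("ngram 1=" + counts.Count + "\n\n");
        sb.Append("\\1-grams:\n");
        foreach (var pair in counts)
        {
            // ARPA files store base-10 log probabilities.
            double logProb = Math.Log10(pair.Value / total);
            sb.Append(logProb.ToString("F4", CultureInfo.InvariantCulture) + "\t" + pair.Key + "\n");
        }
        sb.Append("\n\\end\\\n");
        return sb.ToString();
    }
}
```

for the input lines "rock", "rock", "jazz", "blues" this gives 3 unigrams, where rock (2 of 4 tokens) gets -0.3010 while jazz and blues each get -0.6021.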

NOTE being able to generate a statistical language model in (binary and text) ARPA formats is cool. at this point, i wish the DRK allowed us to compile this result directly into an n-gram .cfg file which could be used as a System.Speech (desktop) or Microsoft.Speech (server) grammar. these grammar files could also be referenced by command-and-control grammars.

e.g. Grammar g = new Grammar("myN-Gram.cfg");

3) Compilation

this step will compile the statistical language model into a format that is usable by System.Speech. so we need another XML options file.

CompLM.xml

the input is the .slm file produced by the Generation step. you can optionally specify a base language model (e.g. English), which combines the newly created industry-specific language model with the default language model. it can also take a dictionary file as input.

Dictionary.txt - optional Dictionary file for specifying word pronunciations and capitalization.

running the options file above will result in 5 output files.

CompLMDictInfo.txt - this file contains the pronunciations used by the language model. you will only get this output file if you are not using a base model.

CompLMLog.txt - output log

CompLMErrLog.txt - error log. this shows that some of the input words did not generate a pronunciation

Genre.dlm - binary language model format used by SAPI

Genre.ngr - binary n-gram format used by SAPI

Registration

the final step is to register the language model (.dlm and .ngr files).

Register.txt - adds registry keys to set up the language model as a dictation topic.

NOTE i wish registry keys were not needed because it requires admin privileges.

Speech Recognition (OS)

now that the language model is created and installed, we can use it.

if you are using 'Speech Recognition' for the OS, you can select your custom topic from Speech Recognition - Dictation Topic - 'select your topic'. you would probably only do this if your custom language model also included the base language model. NOTE the pic below does not show any custom topics.

Custom Code

you can also use the compiled language model in your own custom applications.

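// requires a reference to System.Speech.dll and: using System.Speech.Recognition;
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
sre.SetInputToDefaultAudioDevice();
// RecognizeCompleted expects EventHandler<RecognizeCompletedEventArgs>, not a plain EventHandler
sre.RecognizeCompleted += new EventHandler<RecognizeCompletedEventArgs>(sre_RecognizeCompleted);
// other dictation topics
//string topic = "grammar:dictation";
//string topic = "grammar:dictation#spelling";
//string topic = "grammar:dictation#HowDoI";
//string topic = "grammar:dictation#URL";
//string topic = "grammar:dictation#Pronunciation";
// the custom topic registered in the Registration step
string topic = "grammar:dictation#Genre";
DictationGrammar dg = new DictationGrammar(topic);
sre.LoadGrammar(dg);
// recognizes a single utterance asynchronously, then raises RecognizeCompleted
sre.RecognizeAsync();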
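the snippet refers to a sre_RecognizeCompleted handler that is not shown. a minimal sketch of one could look like this. the class name RecognitionHandlers and the Describe helper are made up for illustration; the event args and result properties (Error, Cancelled, Result, Text, Confidence) come from System.Speech.Recognition, so this only builds on Windows with a reference to System.Speech.dll.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

public static class RecognitionHandlers
{
    // Pure helper so the formatting can be checked without a microphone.
    public static string Describe(string text, float confidence)
    {
        if (string.IsNullOrEmpty(text))
            return "nothing recognized";
        return text + " (confidence " + confidence.ToString("F2", CultureInfo.InvariantCulture) + ")";
    }

    // Matches EventHandler<RecognizeCompletedEventArgs>.
    public static void sre_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
    {
        if (e.Error != null)
        {
            Console.WriteLine("recognition failed: " + e.Error.Message);
            return;
        }
        if (e.Cancelled)
        {
            Console.WriteLine("recognition cancelled");
            return;
        }
        // Result is null when no phrase was recognized (e.g. silence or a timeout).
        if (e.Result == null)
        {
            Console.WriteLine(Describe(null, 0f));
            return;
        }
        Console.WriteLine(Describe(e.Result.Text, e.Result.Confidence));
    }
}
```

a low confidence value for words that are in your corpus is a hint that the custom topic was not actually loaded, or that the words did not get a pronunciation during Compilation.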

Helper Library

for my own purposes, creating the options XML files got annoying, so i created a C# library to make the DRK a little easier to use. this allows me to automate re-generation of the language models periodically as the underlying data changes. it has this signature :

DrkUtil.GenerateNgram(string inputCorpusFilePath, string tokenId, bool normalize, bool register);

and is called like this (the two bool values here are just an example) :

DrkUtil.GenerateNgram("input_corpus_file_path.txt", "TokenId", true, true);

Conclusion

the DRK is a useful tool that lets us create custom language models to be used by the OS or within our own custom applications. of course i would like to see it extended to be a little more user friendly (e.g. a .NET library), to support more languages, and to create n-gram .cfg files.

Source

C# source code for the helper library. you will also need to install the DRK itself.

Update

none planned

Future

probably some UCMA articles.