Speech Recognition and the Alphabet

From TeleFlow

Recognition of the Alphabet

Reliable recognition of individual spoken letters is beyond the capacity of any current speech recognition engine, and probably of future engines as well. Here is why: think of the problems people have spelling to each other. The sounds Bee, Dee, Pee, and Tee are particularly easy to confuse. This has been a problem for humans since the dawn of spelling, and after a few disasters in battle caused by miscommunicated simple instructions, it is the reason the phonetic alphabet was created.

Alpha, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliet, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Whiskey, X-ray, Yankee, Zulu

This list was created by studying speech (linguistics) and determining which words would be most easily recognized by speakers of any language. If we had all learned these words in school, speech recognition would use this approach for gathering letters. Unfortunately, callers don't all share this vocabulary, and consequently there is no easy method for a computer to recognize the alphabet.

Workarounds

There are four ways you may be able to gather letters in a voice application, but by the very nature of the problem, none of them is particularly elegant.

Database Lookup

This approach can make a system appear to understand spoken letters. If you have a database of valid codes, you can use it to narrow down the possibilities.

"Please speak the letters of the code now:"

At this point the speaker may say "A Zee G". Using an algorithm that substitutes sound-alike letters (such as the Bee, Dee, Pee, and Tee group above), we may be able to expand the recognized AZG into a set of candidates (AZG, APG, ECG, ETG, etc.). If your database lists APG and ETG, you might ask:

"Did you say 'APG'?"

From there the speaker would say yes or no, and you could continue to the next question. For this approach, you must have a clean database and a method to select a list of codes that could be specific to the caller.
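The narrowing step can be sketched in a few lines of Python. This is a minimal sketch assuming a hypothetical table of sound-alike letter groups (`CONFUSABLE_GROUPS`); the groups are illustrative, chosen so that A/E and the Bee/Dee/Pee/Tee family are interchangeable as in the example above, and a real system would need to tune them. The names `sounds_like` and `candidate_codes` are hypothetical and are not TeleFlow steps.

```python
# Hypothetical groups of letters that sound alike over a phone line.
# These groupings are illustrative assumptions, not measured data.
CONFUSABLE_GROUPS = ["BCDEGPTVZ", "AJK", "AE", "MN", "FSX"]


def sounds_like(letter: str) -> set[str]:
    """Return the letter plus every letter that shares a group with it."""
    letter = letter.upper()
    alikes = {letter}
    for group in CONFUSABLE_GROUPS:
        if letter in group:
            alikes.update(group)
    return alikes


def candidate_codes(heard: str, database: list[str]) -> list[str]:
    """Return database codes that could plausibly be what the caller said.

    A code matches when it has the same length as the recognized string
    and every position holds a letter that sounds like the recognized one.
    This gives the same result as expanding every combination (AZG, APG,
    ECG, ...) and keeping those in the database, without building the set.
    """
    heard = heard.upper()
    options = [sounds_like(ch) for ch in heard]
    return [
        code
        for code in database
        if len(code) == len(heard)
        and all(ch in opts for ch, opts in zip(code.upper(), options))
    ]


if __name__ == "__main__":
    codes = ["APG", "ETG", "XYZ", "AZGQ"]
    for code in candidate_codes("AZG", codes):
        print(f"Did you say '{code}'?")
```

With this sample database, a recognized "AZG" produces confirmation prompts for APG and ETG. Checking each database code position by position avoids generating the full candidate set, which grows quickly with code length (4 × 9 × 9 = 324 combinations even for this three-letter example).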

Continuous Alphabet

We don't recommend this, but another way to make the alphabet easier to recognize is to have callers speak each letter as a group of letters. For example:

"A" would be spoken "Aay, Bee, Cee" ("A,B,C")
"G" would be spoken "Gee, Ach, Iee" ("G,H,I")

The first letter of the group is the intended letter. Some people can do this; many cannot.
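The redundancy in a letter group is what makes it easier to recognize: every letter in the group points back to the same starting letter. One way a decoder might exploit this is a simple vote, sketched below. This is a hypothetical illustration, not a TeleFlow feature; the function name and the strict-majority rule are assumptions.

```python
import string
from collections import Counter
from typing import Optional

ALPHABET = string.ascii_uppercase


def decode_letter_group(heard: list[str]) -> Optional[str]:
    """Recover the intended letter from a spoken group such as A, B, C.

    The letter recognized at position i implies a starting letter i places
    earlier in the alphabet. Each position votes for its implied start, so
    one misheard letter can be outvoted by the others.
    """
    votes = Counter()
    for position, letter in enumerate(heard):
        index = ALPHABET.find(letter.upper())
        if index < 0:
            continue  # not a letter; ignore it
        start = index - position
        if start >= 0:
            votes[ALPHABET[start]] += 1
    if not votes:
        return None
    letter, count = votes.most_common(1)[0]
    # Require a strict majority; otherwise return None so the caller is re-prompted.
    return letter if count * 2 > len(heard) else None


print(decode_letter_group(["P", "H", "I"]))  # "Pee" misheard for "Gee"
```

Here "Pee, Aitch, Eye" still decodes to G, because H and I both vote for G. A group where no start letter wins a majority returns `None`, which a voice application could treat as a cue to ask again.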

Touch-Tone Text Messaging

Now that text messaging is a common way to enter letters, why not ask your callers to do the same using the TeleFlow Get Letter step? It can be implemented so that the caller presses touch tones to spell, just like text messaging. This works very nicely, and the method has become familiar to many people.
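The text-messaging ("multi-tap") idea can be sketched as follows. This is not how the Get Letter step is implemented; it is a minimal illustration using the standard telephone keypad letter layout. How a real system separates one letter's presses from the next (a pause, a confirm key, and so on) is not described here, so the sketch assumes presses arrive already grouped into runs of the same key, and that extra presses wrap around to the first letter.

```python
# Standard telephone keypad letter layout.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


def decode_multitap(presses: list[str]) -> str:
    """Turn runs of repeated key presses into letters, text-message style.

    Each item is one run of the same key: "666" means three presses of 6,
    which selects O. Pressing past the last letter wraps around
    (an assumption of this sketch).
    """
    letters = []
    for run in presses:
        if not run or len(set(run)) != 1 or run[0] not in KEYPAD:
            raise ValueError(f"invalid key run: {run!r}")
        choices = KEYPAD[run[0]]
        letters.append(choices[(len(run) - 1) % len(choices)])
    return "".join(letters)


print(decode_multitap(["5", "666", "33"]))  # JOE
```

Unlike speech, each key run maps to exactly one letter, which is why this approach avoids the Bee/Dee/Pee/Tee problem entirely.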

Recommended: Transcription Method

This is hands down the best method to ensure your customers don't end up swearing at the voice application. It's quick and efficient, and because a person transcribes the recording, accuracy is very high. Have callers record their name and then spell it:

"Please speak your name, and then spell it now:"

The caller might then say:

"Joe Bloggs, Jay, Oh, Eee, Bee, Ell, Oh, Gee, Gee, Ess"

At the end of the day, a receptionist could spend a few minutes, at about 10 seconds per name, typing the names in through a special GUI application. Please contact us for additional information.

Recognition of Names and Places

If you are expecting certain names and places to be spoken, you can define the sounds of those words using the Speech Recognition Grammar Specification (SRGS). When defining pronunciations, it helps to use LumenVox phonemes rather than respelling the word to approximate an alternate pronunciation. Here's an example using "Goderich", when pronounced as "God-Rick":

(goderich | "{G AO D R IH K}")

Note the syntax: phonetic pronunciations are enclosed in double quotes and curly braces. The list of phonemes is in the help file installed with your version of LumenVox. If you installed LumenVox in the default location, you'll find this file at

C:\Program Files\Lumenvox\SRE\LumenVox_Engine.chm

You can find the phoneme section in the Contents, under "Programmers Guide", 2nd from the bottom.

In addition, you should trim the pronunciation list. When a list of alternate pronunciations grows beyond a certain size, it ends up confusing LumenVox (and the programmer!) more than it helps. I have no idea what the "optimal" size is; I'm betting it depends on the word or phrase you're trying to recognize. For Goderich, I would recommend deleting all but the most basic alternate pronunciations, and even deleting the plain spelling "Goderich" itself (I doubt it is ever pronounced as it's spelled), then replacing the list with alternate pronunciations constructed from LumenVox phonemes.

This is a slower way to build up your grammar, but it yields more accurate results. We've had to do this on our own system for a few of the pronunciations. The rule for my name looks like this:

((Tim | Tem | Tam) [Forner])

And this is our SRGS rule for "Support Department":

(("{S AH P AO R T | S AX P AO R T}") [department])
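The rules above use parentheses for grouping, `|` for alternatives, square brackets for an optional item (as in SRGS ABNF), and double-quoted phoneme strings. To see exactly which phrases a rule accepts, which is handy when deciding what to trim, here is a small hypothetical expander for the notation as it appears on this page. It is our own sketch, not the TeleFlow or LumenVox parser: it treats each quoted phoneme string as one opaque token and does not handle the rest of SRGS.

```python
import re
from itertools import product

# Tokens: quoted phoneme strings, grouping symbols, "|", and plain words.
TOKEN_RE = re.compile(r'"[^"]*"|[()\[\]|]|[^\s()\[\]|"]+')


def expand(rule: str) -> set[tuple[str, ...]]:
    """Return every token sequence the rule accepts."""
    tokens = TOKEN_RE.findall(rule)
    pos = 0

    def parse_alternatives() -> set[tuple[str, ...]]:
        # alternatives := sequence ("|" sequence)*
        nonlocal pos
        results = parse_sequence()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            results |= parse_sequence()
        return results

    def parse_sequence() -> set[tuple[str, ...]]:
        # sequence := item*  (every combination of the items, in order)
        nonlocal pos
        results = {()}
        while pos < len(tokens) and tokens[pos] not in ("|", ")", "]"):
            item = parse_item()
            results = {a + b for a, b in product(results, item)}
        return results

    def parse_item() -> set[tuple[str, ...]]:
        # item := "(" alternatives ")" | "[" alternatives "]" | word
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ("(", "["):
            closing = ")" if tok == "(" else "]"
            inner = parse_alternatives()
            if pos >= len(tokens) or tokens[pos] != closing:
                raise ValueError(f"expected {closing!r}")
            pos += 1
            # Square brackets make the group optional: add the empty phrase.
            return (inner | {()}) if tok == "[" else inner
        return {(tok,)}

    result = parse_alternatives()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


for phrase in sorted(expand("((Tim | Tem | Tam) [Forner])")):
    print(" ".join(phrase))
```

The name rule expands to six phrases ("Tam", "Tam Forner", "Tem", and so on). The count multiplies with every alternative list and optional word, which is one reason a long pronunciation list quickly becomes hard to reason about.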

The alternate pronunciations of my name bring me to another point. Speech recognition engines don't "hear" quite the same way we do. You wouldn't think that "Tam" would be a necessary alternate pronunciation of my name. However, until we added it, certain callers could not get accurate recognition. To my ear, even with their accent, the word sounded more like "Tim" than "Tam", but adding "Tam" was the trick that made it work. That's the nature of speech recognition. With Touch-Tones, a seven is always a seven, but in SR, sometimes "Tim" is "Tam"!

The process of updating the SRGS grammars is called training, and it can be an exhausting effort without tools to help. TeleFlow SpeechTrainer was created to make this process easier.