 Post subject: Call placement degrades quality of Text To Speech engine
PostPosted: Wed Jun 25, 2008 11:07 am 
Joined: Wed Mar 12, 2008 1:14 pm
Posts: 59
Location: Atlanta, GA
We are using a Neospeech engine (Kate 8), which sounds fine when played directly on a workstation, but for the duration of a call, the audio clearly degrades in quality. It sounds a little garbled, and sometimes the message is simply unintelligible to the call recipient. We are playing a WAV file that is 8 kHz, 16-bit, etc., according to the specification in a previous forum thread. Are there any recommendations for maximizing TTS engine quality over a phone call? Do you have any recommended TTS engines that you've observed still maintain quality over the duration of a call?

Thanks in advance,
Susan

_________________
Susan Burkley
Input Technologies
Atlanta, GA ~ GMT -4, Eastern Daylight Time

Those who believe in telekinetics, raise my hand.
- Kurt Vonnegut

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Wed Jun 25, 2008 11:13 am 
Joined: Wed Mar 12, 2008 1:14 pm
Posts: 59
Location: Atlanta, GA
...the engine should not be a smiley...but 'Kate 8' ....

:)

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Thu Jun 26, 2008 9:14 am 
Joined: Wed Mar 19, 2003 4:28 pm
Posts: 510
Location: Canada
I'm a little confused by your post. You mention TTS being the problem, and then that you are playing a WAV file according to the TeleFlow standard WAV file format... which is it?

If you are using the Speak Text step, does the text to speak change from call to call? If so, can you capture a log showing the text that the step is speaking from a call where it couldn't be understood? Does that particular text sound OK in Simulator?

Using the "Speak Text" step in a call should have no appreciable effect on the audio (it converts the text using the engine in exactly the same way, and it converts it to WAV audio of the same format for NMS or Simulator), so something else must be amiss.

Besides using TTS, are you playing .WAV or .VCE files in your application?

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Thu Jun 26, 2008 9:34 am 
Joined: Wed Mar 12, 2008 1:14 pm
Posts: 59
Location: Atlanta, GA
We use the TTS engine to record a wav file, then we play the wav file for the recipient. When we play the wav file outside of TeleFlow, it sounds acceptable, but when we play it through TeleFlow for an outbound call, the quality is diminished.

We are not using the Speak Text step - we are using the Play File step.

Thanks,
Susan

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Thu Jun 26, 2008 6:27 pm 
Joined: Wed Mar 19, 2003 4:28 pm
Posts: 510
Location: Canada
Since it doesn't make sense that a single file behaves differently from call to call, I have a few questions/suggestions:

- Is there one individual WAV file you have created using TTS that is always played? If not, what is the process by which the files are created and/or introduced into the application?

- If there is just one file, why not record it with a microphone and use a canned audio file instead? At the very least, it would be interesting to know whether that audio file worked the same or differently.

- If you have the system call you on a land-line (not using a speaker-phone), can you recreate the behavior you describe? (I.e., does the audio file get garbled/degraded?)

- Are the people receiving these calls and experiencing problems using speaker-phones? Cells? Is it at all possible that their phones are cutting in/out and the system is getting the blame?

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Thu Jun 26, 2008 6:42 pm 
Joined: Wed Mar 12, 2008 1:14 pm
Posts: 59
Location: Atlanta, GA
The behaviour is consistent across all calls and all phone types.

The WAV generated is a combination of a message with medications and is generated using a webservice. Since there are thousands of medications, with varying combinations of messages, it didn't make sense to record every variant.

Yes, this behaviour can be recreated.

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Fri Jun 27, 2008 5:15 am 
Joined: Wed Mar 12, 2008 1:14 pm
Posts: 59
Location: Atlanta, GA
Also, we've been working with the Microsoft Speech SDK with success outside of TeleFlow, but the files are at a different kHz. We feel that a major part of the degradation occurs when we reduce it to 8. But of course if we try to push the file through at a higher kHz, then it won't work (even though it worked in Simulator...). Are we really restricted to 8 kHz, or are there other options that we can try to improve WAV quality?

Thanks,
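
For anyone who does have to reduce an existing higher-rate file to 8 kHz, part of the loss can come from how the reduction is done: simply dropping samples lets high-frequency content alias into the audible range. Below is a minimal sketch in Python using only the standard library. It assumes 16-bit mono PCM input whose rate is a whole multiple of the target (for example 16 kHz to 8 kHz). The function name `downsample_mono16` is made up for illustration, and block averaging is only a crude stand-in for a proper low-pass filter; it is not what TeleFlow or the Speech SDK does internally. A 22.05 kHz source is not a whole multiple of 8 kHz and would need a real resampler.

```python
import array
import sys
import wave


def downsample_mono16(src, dst, target_rate_hz=8000):
    """Reduce a 16-bit mono PCM WAV to target_rate_hz by block averaging.

    src and dst may be paths or binary file-like objects.
    Returns the number of frames written.
    """
    with wave.open(src, "rb") as wav_in:
        if wav_in.getnchannels() != 1 or wav_in.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM input")
        source_rate = wav_in.getframerate()
        if source_rate % target_rate_hz != 0:
            raise ValueError("source rate must be a whole multiple of the target")
        factor = source_rate // target_rate_hz
        samples = array.array("h")
        samples.frombytes(wav_in.readframes(wav_in.getnframes()))

    # WAV data is little-endian; swap if this machine is big-endian.
    if sys.byteorder == "big":
        samples.byteswap()

    # Average each block of `factor` samples (crude low-pass), keeping one
    # output sample per block. Any leftover partial block is dropped.
    usable = len(samples) - len(samples) % factor
    averaged = array.array(
        "h",
        (sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)),
    )

    if sys.byteorder == "big":
        averaged.byteswap()

    with wave.open(dst, "wb") as wav_out:
        wav_out.setnchannels(1)
        wav_out.setsampwidth(2)
        wav_out.setframerate(target_rate_hz)
        wav_out.writeframes(averaged.tobytes())
    return len(averaged)
```

Generating the audio at 8 kHz in the first place, where the engine supports it, avoids this conversion step altogether.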

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Fri Jun 27, 2008 10:10 am 
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
What audio format are you converting from? Can the M$ Speech SDK support 8 KHz? Could you send us a sample of the original file and the converted one? Email them to support{at}engenic{dot}com.

Also, how does this relate to the NeoSpeech TTS issue? Are you still having problems using NeoSpeech, or have you worked that out now?

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Fri Jun 27, 2008 10:22 am 
Joined: Wed Mar 12, 2008 1:14 pm
Posts: 59
Location: Atlanta, GA
We are not converting WAVs, we are CREATING them using the Speech SDK and the Neospeech TTS engine. The Speech SDK does support 8kHz, but higher kHz sound better (I think the default is 22? - I don't recall). I will send you samples.

Thanks,
Susan

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Sat Jun 28, 2008 4:05 pm 
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
The higher sampling rates sound better on audio devices such as CD players and your PC. However, they don't sound very good over traditional telephony networks. The network and the telephones on it are only able to reproduce sound at a particular rate. There's a bit of a debate over what the optimal audio quality is. There are two camps: one contends it's 8 kHz at 16 bits per sample, the other 11 kHz at 8 bits. TeleFlow follows NMS Communications' lead, which is why we standardized on 8 kHz (in fact, at the time, NMS did not even support 11 kHz 8-bit; all the people in that camp apparently worked at Dialogic).

You should optimize your quality for the device that will be used for playback. In this case that is the phone, and you'll have your best success at 8 kHz 16-bit mono.
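
A quick way to confirm that a generated file actually matches that target before handing it to a Play File step is to read its header. The following is a rough sketch using Python's standard `wave` module; the function names (`describe_wav`, `format_problems`, `make_silent_wav`) are invented for illustration and are not part of TeleFlow, NMS, or any TTS SDK.

```python
import io
import wave

# Target format named in this thread: 8 kHz, 16-bit, mono.
TARGET_RATE_HZ = 8000
TARGET_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
TARGET_CHANNELS = 1


def describe_wav(source):
    """Return (channels, sample_width_bytes, frame_rate_hz) of a PCM WAV.

    source may be a path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def format_problems(source):
    """List the ways a WAV differs from the 8 kHz 16-bit mono target."""
    channels, width, rate = describe_wav(source)
    problems = []
    if rate != TARGET_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE_HZ} Hz")
    if width != TARGET_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bits, expected 16 bits")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems


def make_silent_wav(rate_hz, channels=1, width=2, frames=800):
    """Build a short in-memory WAV, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (frames * channels * width))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for problem in format_problems(make_silent_wav(22050, channels=2)):
        print(problem)
```

An empty list from `format_problems` means the header matches the target; it says nothing about how the audio was produced or how it will sound on a handset.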

Out of curiosity... why are you using two TTS engines? Does your application mix the two in the same call? If so, that may cause some problems as well. TTS engines have odd "accents". Mixing two engines in the same call would likely sound doubly-odd to the called party.

 Post subject: Re: Call placement degrades quality of Text To Speech engine
PostPosted: Mon Jun 30, 2008 1:15 pm 
Joined: Wed Mar 12, 2008 1:14 pm
Posts: 59
Location: Atlanta, GA
Thanks for that 'clarification', Tim. ;) We are only using one TTS engine - but we have two servers...

Take care,
Susan
