Future Proof Ideas since 2005, by Erwin van Lun

Speech technology a little more real

IBM Research has registered a new patent for a technique that makes speech synthesis sound a little more natural. Listen to a demo. It automatically adds coughs, stops and random pauses. The additions are so subtle they're barely noticeable. According to IBM, the voices are almost impossible to tell apart from real voices.

The voice is made even more convincing because it has been programmed to include verbal tics such as “ums”, “ers” and sighs.

Computer experts at IBM have developed the technology for use in telephone helplines, satellite navigation systems and even cameras or iPods.

It is so sophisticated that the devices will be able to pause for effect or cough to attract the users’ attention, spelling an end to the irritating monotone voices that have become a part of everyday modern life.

Andy Aaron, of IBM’s Thomas J Watson research group speech team, said: “These sounds can be incredibly subtle, even unnoticeable, but have a profound psychological effect. It can be extremely reassuring to have a more attentive-sounding voice.

“When you are on the telephone on an automated service helping you fix your computer or buy insurance, this could make the difference between being a happy customer or hanging up and cancelling a service.”

The new technology, called “generating paralinguistic phenomena via markup in text-to-speech synthesis”, has only recently been patented.
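To make "paralinguistic phenomena via markup" a little more concrete, here is a minimal sketch of how marked-up text could be split into spoken text and non-word sounds before it reaches a synthesiser. The tag names (`<cough/>`, `<sigh/>`, `<um/>`, `<pause ms="..."/>`), the default pause length and the `parse_markup` function are all assumptions made up for this illustration. They are not IBM's actual markup or method.

```python
import re
from typing import List, Tuple

# Hypothetical tag set, invented for illustration only.
TAG = re.compile(r'<(cough|sigh|um|pause)(?:\s+ms="(\d+)")?\s*/>')

DEFAULT_PAUSE_MS = "250"  # assumed default, not taken from the patent


def parse_markup(text: str) -> List[Tuple[str, str]]:
    """Split marked-up text into (kind, value) events in reading order.

    kind is "speak" for ordinary text, "sound" for a non-word sound
    such as a cough, and "pause" for a silence given in milliseconds.
    """
    events: List[Tuple[str, str]] = []
    pos = 0
    for match in TAG.finditer(text):
        # Anything between the previous tag and this one is spoken text.
        spoken = text[pos:match.start()].strip()
        if spoken:
            events.append(("speak", spoken))
        name, ms = match.group(1), match.group(2)
        if name == "pause":
            events.append(("pause", ms or DEFAULT_PAUSE_MS))
        else:
            events.append(("sound", name))
        pos = match.end()
    # Text after the last tag is spoken too.
    tail = text[pos:].strip()
    if tail:
        events.append(("speak", tail))
    return events


if __name__ == "__main__":
    line = 'Well, <um/> let me check. <pause ms="400"/> Your order has shipped.'
    for event in parse_markup(line):
        print(event)
```

A real system would hand each event to a synthesiser or play a recorded sound. The sketch only shows the idea that the non-word sounds sit in the text as markup and are pulled out at the right position.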

Mr Aaron said: “We are almost at the point where the voice is indistinguishable from a human, but that is not our goal. We don’t want to fool anybody.”

The software will even be able to react to a situation, telling us to “shhh” if it is being interrupted or coughing to gain attention.

It will also include an algorithm that can “learn” to add expressions at the correct point in a sentence.

Mark Gretton, from the satellite navigation manufacturer TomTom, said: “There is definitely scope for using non-word prompts to remind stressed-out drivers to take a turn, or simply pay more attention.”

Future Vision by Erwin Van Lun on this article

Image synthesis and speech synthesis are the easiest parts of building artificial life in a virtual world. Recognising objects, people and animals is a little more complex, and there are plenty of layers of meaning in that too. Being able to understand complex human communication and the abstract concepts we talk about is a whole step further. Many steps further. But it’ll come too. Once we understand this properly, machines will be able to talk about the most complicated subjects in any language, in any accent. It’s coming.
