The voice is made even more convincing because it has been programmed to include verbal tics such as “ums”, “ers” and sighs.
Computer experts at IBM have invented the technology to be used on telephone helplines, satellite navigation systems and even on cameras or iPods.
It is so sophisticated that the devices will be able to pause for effect or cough to attract the users’ attention, spelling an end to the irritating monotone voices that have become a part of everyday modern life.
Andy Aaron, of IBM’s Thomas J Watson research group speech team, said: “These sounds can be incredibly subtle, even unnoticeable, but have a profound psychological effect. It can be extremely reassuring to have a more attentive-sounding voice.
“When you are on the telephone on an automated service helping you fix your computer or buy insurance, this could make the difference between being a happy customer or hanging up and cancelling a service.”
The new technology, called “generating paralinguistic phenomena via markup in text-to-speech synthesis”, has only recently been patented.
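The name points to the general idea: the text handed to the synthesiser carries markup saying where a non-word sound belongs. The article does not describe IBM's actual markup, so the following is only a minimal sketch under assumptions. The tag names (`<cough/>`, `<um/>` and so on) and the function `split_marked_up_text` are hypothetical, chosen purely for illustration.

```python
import re

# Hypothetical tag names, invented for this sketch; the real markup
# used in the patented system is not given in the article.
SOUND_TAGS = {"um", "er", "sigh", "cough", "shhh", "pause"}

# Matches self-closing tags such as <cough/> or <um />.
TAG_PATTERN = re.compile(r"<(\w+)\s*/>")


def split_marked_up_text(text):
    """Split text into ("speech", words) and ("sound", name) segments."""
    segments = []
    position = 0
    for match in TAG_PATTERN.finditer(text):
        # Plain text before the tag is ordinary speech.
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append(("speech", spoken))
        # Known tags become sound segments; unknown tags are dropped.
        name = match.group(1).lower()
        if name in SOUND_TAGS:
            segments.append(("sound", name))
        position = match.end()
    # Any text after the last tag is also speech.
    tail = text[position:].strip()
    if tail:
        segments.append(("speech", tail))
    return segments


if __name__ == "__main__":
    for kind, value in split_marked_up_text(
        "<cough/> Excuse me, <um/> your call is next."
    ):
        print(kind, value)
```

A real system would pass each segment on to a speech engine or a recorded sound. Here the segments are simply listed, to show how a single line of marked-up text could mix words with coughs and hesitations.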
Mr Aaron said: “We are almost at the point where the voice is indistinguishable from a human, but that is not our goal. We don’t want to fool anybody.”
The software will even be able to react to a situation, telling us to “shhh” if it is being interrupted or coughing to gain attention.
It will also include an algorithm that can “learn” to add expressions at the correct point in a sentence.
Mark Gretton, from the satellite navigation manufacturer TomTom, said: “There is definitely scope for using non-word prompts to remind stressed-out drivers to take a turn, or simply pay more attention.”
Future Vision by Erwin Van Lun on this article
Image synthesis and speech synthesis are the easiest parts of building artificial life in a virtual world. Recognising objects, people and animals is a little more complex, and there is plenty of meaning to interpret there too. Being able to understand complex human communication and the abstract concepts we talk about is a whole step further. Many steps further. But it’ll come too. Once we understand this properly, we can hold automated conversations about the most complicated subjects in any language, in any accent. That’s coming. It’s coming.