Is voice becoming the new text again?

CNN

On a recent episode of the TV show 'Modern Family,' a character named Mitchell gets in his car and does something that's frustratingly familiar for early adopters of technology:

He tries to operate the machine by talking to it.

'CD player: next track,' he says.

'Say a command,' a robotic voice responds.

'CD player: NEXT. TRACK,' he says, clearly annoyed.

'Air conditioner on.'

'Dammit!'

The idea that people should be able to talk to computers, and that the computers should understand what we're saying, has been coming in and out of vogue since the 1970s. The technology never really went mainstream, though, and to this day, it's often talked about as a joke.

In recent months, however -- despite the pop-culture parodies and the increasing popularity of the text message -- researchers say voice-activated technologies have entered a renaissance of sorts.

The technological resurgence is happening in part because of smartphones, those handheld devices with tiny keyboards or awkward touchscreens that some big-fingered adults would rather yell at than type on.

So why not just take those frustrations and transfer them into navigation commands and text messages?

Increasingly, that seems to be what's happening.

Mobile voice-recognition technology now allows people to send text messages to friends by talking instead of typing; to scan through transcriptions of voice mail instead of taking time to listen to them all; to tell their phones what they're looking for on the Web; and, soon, to post to Twitter from their cars by speaking, allowing drivers to keep their eyes on the road.

'It's now possible to pick up your phone and press a single button and say, 'I want the Yelp.com review of the Capital Grille in Burlington, Massachusetts. Period,' ' said Vlad Sejnoha, chief speech scientist at Nuance Communications, a major producer of voice-to-text software.

Phones should know by now exactly what Web link to find, he said, and users should get a result without ever typing.

A number of phone apps, from ShoutOut to Dragon and Vlingo, now translate speech into text messages and e-mails.

Additionally, Bing and Google both have mobile applications that let people search the Web by talking.

The voice-recognition software is getting better, too.

The longer computers listen to us talk, the better they can predict what we're going to say and understand how we say things, researchers said. Some believe that computers are getting almost as good at listening as we are.

'If you compare us to human performance, we are rapidly closing the gap,' said David Nahamoo, IBM's chief technology officer for voice research.

The technology works by listening to a voice, translating it into digital data and then anticipating what sorts of sounds or words will come next. That's different from early models of voice-recognition technology, which tried to understand every sound and used huge amounts of computing power as a result, he said.

Now, it's more of a guessing game. Each voice-recognition program has a number of equations that analyze speech and use statistics to decide what noises match up to what letters.
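
To give a rough feel for that statistical guessing, here is a minimal sketch in Python. It is not how any of the products named in this article work. It is a toy "bigram" counter that assumes a tiny made-up set of training phrases, and the names `CORPUS`, `train_bigrams` and `pick_word` are invented for illustration. When a sound could be one of several similar words, it picks the word most often seen after the previous one.

```python
from collections import Counter, defaultdict

# Toy training text (an assumption for illustration); a real system
# would learn from vastly more speech and text.
CORPUS = (
    "play the next track . play the next song . "
    "skip the next track . turn the air conditioner on ."
).split()


def train_bigrams(words):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def pick_word(counts, prev, candidates):
    """Choose the candidate seen most often after `prev`.

    If several candidates tie, the first one listed wins.
    """
    return max(candidates, key=lambda w: counts[prev][w])


counts = train_bigrams(CORPUS)

# Suppose the sound heard after "next" could be "truck" or "track".
best = pick_word(counts, "next", ["truck", "track"])
print(best)
```

Because "track" follows "next" twice in the toy phrases and "truck" never does, the sketch settles on "track". Real recognizers combine this kind of word statistics with models of the sounds themselves.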

Tech blog: The man who teaches computers to listen

Every year, the accuracy of these programs improves, said Bill Meisel, an independent consultant who has been working in the voice-recognition industry since the early 1980s.

In a recent comparison test of four programs, Meisel found that technologies that translate voice into text are roughly 80 to 90 percent accurate. That's good enough for many common functions, like transcribing voice mail, he said.

'All the systems were almost perfect with phone numbers,' he said.

Still, a number of technological hurdles remain.

One, especially for voice recognition on the go, is background noise. A phone listening to a person on a bus, for example, can hear street noise and other conversations in addition to the person who is trying to give a voice command. It's difficult for voice-recognition software to differentiate between all of those noises.

New hardware may help address that issue. Google's Nexus One phone comes fitted with two microphones: one that records a voice and another that records interference noise and then subtracts it from the voice file, making it easier for the phone to determine what noise is human and what isn't.
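
The subtraction idea can be sketched in a few lines of Python. This is an idealized illustration, not a description of the Nexus One's actual processing: it assumes the second microphone hears exactly the same noise as the first and none of the voice, which real hardware cannot guarantee. The function name `subtract_noise` and the made-up signals are hypothetical.

```python
import math


def subtract_noise(primary, reference):
    """Subtract the noise-only signal from the voice-plus-noise signal,
    one sample at a time."""
    return [p - r for p, r in zip(primary, reference)]


n = 8
# Stand-ins for real audio (assumptions for illustration only).
voice = [math.sin(2 * math.pi * i / n) for i in range(n)]
noise = [0.5 * ((-1) ** i) for i in range(n)]

# Microphone 1 picks up the voice plus the noise.
primary = [v + x for v, x in zip(voice, noise)]
# Microphone 2 picks up only the noise (the idealized assumption).
reference = noise

cleaned = subtract_noise(primary, reference)

# Largest difference between the cleaned signal and the original voice.
error = max(abs(c - v) for c, v in zip(cleaned, voice))
print(error < 1e-9)
```

In practice the two microphones hear slightly different versions of the noise, so real systems have to estimate how much to subtract rather than subtracting sample for sample.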

Another problem is the fact that no two people speak alike.

Even if we're saying the same words, we tend to pronounce them in different ways. And, often, even if we're asked to say the same sentence twice, we might add different inflections or sounds that can throw computers off.

It's 'the whole thing of 'I say toMAYto and you say toMAHto,' ' said Nahamoo, who is Iranian. 'I come from a foreign country, and some of the phonetic nuances that a native person learns, I don't learn and I can't reproduce.

'They all add up essentially to make me sound different.'

Over time, computers are getting better at recognizing those differences, he said, especially when an accent is fairly common. He said that is one of the major achievements of voice technology since the '70s.

To be understood by computers, it's more important to speak clearly and consistently than to have a perfectly neutral accent, he said.

Another issue: Not all phones have the computing power to handle voice recognition, said Tuong Nguyen, a principal analyst at Gartner, the research firm.

'The biggest limitation that I see right now ... is processing power,' he said. 'It is fairly intense, so you do need a better, higher-end phone to do it. And then a lot of people speak with accents or colloquialisms or different languages or stuff like that, which provides some challenges as well.

'But overall, I'm pretty positive about the technology.'

Nguyen said it's especially handy when he's driving. In that situation, typing isn't a safe alternative.

Meisel, the consultant, said voice may be the new way we interact with computers.

We're already able to 'have a conversation' with the technology to some degree, he said.
