Wednesday, August 09, 2006

Babelfish

http://www.economist.com/science/tq/displayStory.cfm?story_id=7001819

"Translation software: The science-fiction dream of a machine that understands any language is getting slowly closer.

IT IS arguably the most useful gadget in the space-farer's toolkit. In “The Hitchhiker's Guide to the Galaxy”, Douglas Adams depicted it as a “small, yellow and leech-like” fish, called a Babel fish, that you stick in your ear. In “Star Trek”, meanwhile, it is known simply as the Universal Language Translator. But whatever you call it, there is no doubting the practical value of a device that is capable of translating any language into another.
Remarkably, however, such devices are now on the verge of becoming a reality, thanks to new “statistical machine translation” software. Unlike previous approaches to machine translation, which relied upon rules identified by linguists which then had to be tediously hand-coded into software, this new method requires absolutely no linguistic knowledge or expert understanding of a language in order to translate it. And last month researchers at Carnegie Mellon University (CMU) in Pittsburgh began work on a machine that they hope will be able to learn a new language simply by getting foreign speakers to talk into it and perhaps, eventually, by watching television.
Within the next few years there will be an explosion in translation technologies, says Alex Waibel, director of the International Centre for Advanced Communication Technology, which is based jointly at the University of Karlsruhe in Germany and at CMU. He predicts there will be real-time automatic dubbing, which will let people watch foreign films or television programmes in their native languages, and search engines that will enable users to trawl through multilingual archives of documents, videos and audio files. And, eventually, there may even be electronic devices that work like Babel fish, whispering translations in your ear as someone speaks to you in a foreign tongue.
This may sound fanciful, but already a system has been developed that can translate speeches or lectures from one language into another, in real time and regardless of the subject matter. The system required no programming of grammatical rules or syntax. Instead it was given a vast number of speeches, and their accurate translations (performed by humans) into a second language, for statistical analysis. One of the reasons it works so well is that these speeches came from the United Nations and the European Parliament, where a broad range of topics are discussed. “The linguistic knowledge is automatically extracted from these huge data resources,” says Dr Waibel.

“Most of the time, the languages that translation researchers deal with in their laboratories are so unfamiliar that they may as well be alien.”

Statistical translation encompasses a range of techniques, but what they all have in common is the use of statistical analysis, rather than rigid rules, to convert text from one language into another. Most systems start with a large bilingual corpus of text. By analysing the frequency with which clusters of words appear in close proximity in the two languages, it is possible to work out which words correspond to each other in the two languages. This approach offers much greater flexibility than rule-based systems, since it translates languages based on how they are actually used, rather than relying on rigid grammatical rules which may not always be observed, and often have exceptions.
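
The co-occurrence idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-sentence English-German toy corpus, the function names, and the use of the Dice coefficient as the association score. It is not the method used by any of the systems mentioned in the article, which train far more elaborate models on vastly larger corpora.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus of aligned (English, German) sentence pairs.
PARALLEL_CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
    ("a house", "ein haus"),
]


def dice_scores(corpus):
    """Score each (source, target) word pair by how often the two words
    turn up in the same aligned sentence pair (Dice coefficient)."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in corpus:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        pair_counts.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_counts[s] + tgt_counts[t])
        for (s, t), n in pair_counts.items()
    }


def best_translations(corpus):
    """Pick, for each source word, the target word with the highest score."""
    best = {}
    for (s, t), score in dice_scores(corpus).items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


if __name__ == "__main__":
    for word, translation in sorted(best_translations(PARALLEL_CORPUS).items()):
        print(word, "->", translation)
```

No grammar or dictionary is supplied: "house" is paired with "haus" only because the two words always appear in the same sentence pairs, whereas "house" and "das" appear together only some of the time.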
Examples abound of the ridiculous results produced by rule-based systems, which are unable to cope in the face of similes, ambiguities or bad grammar. In one example, a sentence written in Arabic meaning “The White House confirmed the existence of a new bin Laden tape” was translated using a standard rule-based translator and became “Alpine white new presence tape registered for coffee confirms Laden.” So it is hardly surprising that researchers in the field have migrated towards statistical translation in the past few years, says Dr Waibel.

Now you're speaking my language

The statistical approach, which starts off without any linguistic knowledge of a language, might seem a strange way of doing things, but it is actually remarkably similar to the way humans attempt to translate languages, says Shou-de Lin, a machine-translation expert who was until recently a researcher at the University of Southern California's Information Sciences Institute (ISI). “It looks at the script and bunches symbols together,” he explains, much as a human mind might try to solve the problem. But in order for this approach to work, the voracious translation systems must be fed with huge numbers of training texts. This prompted Franz Och, Google's machine-translation expert, to boast recently that the search-engine giant would probably have a key role in the future of machine translation, since it has such a huge repository of text..."


robotsplace.com