Next: 4. Applications of Machine Up: Machine Translation in Practice Previous: 2. Common Misunderstandings about Contents

3. Machine Translation Roentgenized

3.1 Strengths and Weaknesses of Computers in General

To get a first idea of how computers might be applied to translation, one should look at the strengths and weaknesses of computers in general. From these we may be able to conclude what will be easy and what will be difficult to achieve with computers.

Computers are usually synonymous with very fast calculation. Today's computers easily handle several hundred million operations (such as adding two numbers) per second. Furthermore, computers have a very high bandwidth, that is, they can handle huge amounts of incoming and outgoing data in a short time (e.g. around 70 megabytes per second, or around 18,000 sheets of typed paper)⁵.
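The bandwidth comparison can be checked with a little arithmetic. The sketch below only restates the two figures given in the text; the constant and function names are made up for illustration, and it assumes that one megabyte means 10^6 bytes.

```python
# Figures taken from the text; the names are illustrative only.
BYTES_PER_SECOND = 70 * 10**6   # "around 70 megabytes per second" (assuming 1 MB = 10^6 bytes)
SHEETS_PER_SECOND = 18_000      # "around 18,000 sheets of typed paper"


def bytes_per_sheet(bytes_per_second=BYTES_PER_SECOND,
                    sheets_per_second=SHEETS_PER_SECOND):
    """Return how many bytes one typed sheet corresponds to under these figures."""
    return bytes_per_second / sheets_per_second


if __name__ == "__main__":
    print(f"{bytes_per_sheet():.0f} bytes per sheet")
```

Under these assumptions a sheet corresponds to a little under 4,000 bytes, which is plausible for a densely typed page at one byte per character.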

Although these numbers might look impressive, computers have their shortcomings too. First, computers lack creativity. There are examples of "computer art", but these are usually based on mathematics and could be reproduced by humans as well (though probably only with an immense amount of time). Second, up to now computers have never understood the data they are processing. There has always been a human programming the computer, and the machine is always limited to what it has been programmed to do. Computers sometimes look intelligent, but in fact they are absolutely dumb. It is enormously difficult to make computers perform tasks which humans consider very simple.

From those properties we can conclude that computers are good at tasks that

are highly repetitive
are "stupid"
do not need any creativity
involve huge amounts of calculations (also repetitive).

3.2 Well-known Problems of MT

Machine translation faces several well-known problems which should not be glossed over here. They are fundamental and pose difficulties for human translators as well (see also [Riedel, Schwarze 2000]).

Syntactical ambiguities:

The structure of a sentence often depends on semantics, not only on the type of words.

Example: Flying planes can be dangerous. -> Is the correct grouping (Flying planes), i.e. planes that fly, or (Flying) (planes), i.e. the act of flying planes?
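To make the ambiguity concrete, here is a minimal sketch of a toy parser that enumerates every parse tree of the example sentence. The grammar, the category names (ADJ, GER, PRED, ...) and the function parse are invented for this illustration and are not taken from any real MT system. The only point is that a purely syntactic grammar accepts both readings and cannot choose between them.

```python
from functools import lru_cache

# Toy lexicon (an assumption for illustration): "flying" is listed both as an
# adjective (planes that fly) and as a gerund (the act of flying planes).
LEXICON = {
    "flying": {"ADJ", "GER"},
    "planes": {"N"},
    "can": {"MODAL"},
    "be": {"COP"},
    "dangerous": {"ADJ"},
}

# Toy binary rules of the form (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "ADJ", "N"),     # reading 1: adjective + noun
    ("NP", "GER", "N"),     # reading 2: gerund + object
    ("VP", "MODAL", "PRED"),
    ("PRED", "COP", "ADJ"),
]


def parse(words, goal="S"):
    """Return all parse trees of `words` rooted in `goal` as nested tuples."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def trees(symbol, start, end):
        found = []
        # A single word can be covered directly by a lexical category.
        if end - start == 1 and symbol in LEXICON.get(words[start], ()):
            found.append((symbol, words[start]))
        # Otherwise try every rule and every split point of the span.
        for parent, left, right in RULES:
            if parent != symbol:
                continue
            for split in range(start + 1, end):
                for l in trees(left, start, split):
                    for r in trees(right, split, end):
                        found.append((parent, l, r))
        return tuple(found)

    return list(trees(goal, 0, len(words)))


if __name__ == "__main__":
    for tree in parse("Flying planes can be dangerous".split()):
        print(tree)
```

With this toy grammar the sentence receives two trees that differ only in the category assigned to "flying". Deciding which one is meant requires semantic or contextual knowledge that the grammar does not contain.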

Polysemy:

Polysemous words are words which have several similar meanings. They are difficult to translate since an appropriate word in the target language has to be found for the meaning intended.

Homonymy:

Homonyms are several independent words which "share" the same form (spelling or pronunciation). They are difficult to translate since their meaning often depends on the context.
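The dependence on context can be illustrated with a deliberately naive sketch. The word "bank", its two German translations, the cue words and the function translate_homonym are all assumptions chosen for this example. The function simply picks the sense whose cue words overlap most with the rest of the sentence, which is far cruder than what a real MT system would need.

```python
import re

# Hypothetical sense inventory: each sense of the English homonym "bank"
# has a German translation and a few hand-picked context cue words.
SENSES = {
    "bank": [
        {"translation": "Bank", "cues": {"money", "account", "loan"}},
        {"translation": "Ufer", "cues": {"river", "water", "fishing"}},
    ],
}


def translate_homonym(word, sentence):
    """Pick the translation whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {word}
    # max() returns the first sense on a tie, so a sentence without any cue
    # word silently falls back to the first entry. That is a weakness of
    # this sketch, not a feature.
    best = max(SENSES[word], key=lambda sense: len(sense["cues"] & context))
    return best["translation"]


if __name__ == "__main__":
    print(translate_homonym("bank", "She opened an account at the bank."))
    print(translate_homonym("bank", "They sat on the bank of the river."))
```

The sketch works only as long as a cue word happens to appear in the same sentence. As soon as the decisive context lies in an earlier sentence, or in general world knowledge, this kind of lookup fails.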

Referential Ambiguity:

Pronouns refer to certain words, but it is often not clear to which. References may cross sentence boundaries and depend heavily on context and semantics. In addition, the grammatical gender of the word referred to may differ from language to language, so the pronoun has to be adjusted accordingly.

Metaphors and symbols:

Both metaphors and symbols depend on the underlying culture and sometimes cannot be translated. There might exist equivalent expressions in the target language which can be located using idiomatic dictionaries.

Synonyms:

There are often several words with almost the same meaning, which makes it very difficult to choose the right translation since the choice depends on context, style and semantics. The differences are often very subtle.

Fuzzy Hedges:

Vague words, terms and expressions like "in a sense" and the German "irgendwie" ("somehow") are called fuzzy hedges. Such expressions are language-dependent and difficult to translate.

New developments:

As society and technology progress, new words, terms and expressions are introduced. Words may be used in new contexts, new slang may appear, or marketing may give simple phrases completely new meanings.

Example: The German phrase "Ich bin drin" ("I'm in") became associated with being online after AOL ran advertisements featuring Boris Becker in Germany.

3.3 What Computers Probably Will Never Be Able to Translate

Communication of meaning is only one among many functions of language. Language is a social phenomenon. Computers rarely "know" about society and they will therefore have problems with translating utterances for:

demonstrating one's class to the person one is speaking or writing to;
simply venting one's emotions, with no real communication intended;
establishing non-hostile intent with strangers, or simply passing time with them;
telling jokes;
engaging in non-communication by intentional or accidental ambiguity, sometimes also called 'telling lies';
two or more of the above (including communication) at once.

[Gross 1992, p. 110]

3.4 More Linguistic Problems

There are even more problems which sound simple but are extremely complicated to solve in MT. They are all connected to context: It "is virtually impossible to separate the formulation of even the simplest sentence in any language from the audience to whom it is addressed" [Gross 1992, p. 110]. At first glance this is surprising; at second glance it becomes obvious.

Further, there are translation tasks which are very complicated for human translators, e.g. stage plays, lyrics, advertising, titles of books, newspaper headlines, poetry etc. "A joke in language A must also become a joke in language B, even if it isn't" [Gross 1992, p. 111].

Translation is almost always associated with understanding the text to translate. Computers do not understand. They simply process.

Apart from MT problems there are real-world problems which have not yet been solved. Some cross-cultural issues cannot be translated, or only with great difficulty. Chinese medicine, for example, has several branches completely unknown to Western people. The branches themselves are perfectly logical and consistent in their own terms and have their own explanations and methods for observation, measurement and diagnosis. But the specialized vocabulary is very hard to explain to non-specialists, let alone to translate into a foreign language. (Imagine a fully automatic MT system. It would need to know for whom the text to be translated is intended. If the text were intended for laypeople, the system would have to add an explanation of Chinese medicine in general. The other way around, if there were a special English vocabulary for that particular branch of Chinese medicine, the MT system would have to figure out what the English text was about and then use the specialized Chinese vocabulary.)

Footnotes

... paper)⁵: Again, research in AI and robotics has shown that the human brain handles large amounts of "data" in passing -- think of all the visual, auditory, tactile, olfactory and gustatory impressions gathered and processed every second. Many different fields are being researched, each of them concerned with a basic ability of the brain: face, character and voice recognition, planning, navigation, knowledge representation, learning, etc. No spectacular breakthroughs are expected soon.

Tino Schwarze, 2001