A translation client recently called me with a specific request. They needed to respond to emails from around the world in English. They were using online machine translation from French to come up with draft messages that they then revised. But the English of some of their staff was not much better than machine quality. They wanted me to do a final revision of the emails and I agreed. But as a service to the client, I also sent them a link to an article entitled "Writing for machine translation".
The article in question makes the point that "There is little difference between writing a document for MT and writing a document for human translation. If a human cannot understand what you are trying to say, a computer will fail miserably." The subheadings in the article give a good summary of tips for more effective machine translation:
- Keep the structure of your sentence clear, simple and direct;
- If possible, avoid colloquialisms, idiomatic expressions and slang;
- Spell correctly;
- Avoid ambiguity and vague references;
- Use standard, formal language;
- Use the definite article even when you don’t want to.
These may seem like fairly innocent recommendations, but I have noticed that this approach can arouse the ire of some translators. For example, a post appeared in a translators’ forum ridiculing the Canadian Government for issuing guidelines to simplify writing in order to improve machine-translated copy. The guidelines apparently suggested, among other things, using shorter sentences. The writer of the post warned that the machine translations would only be superficially revised and that some great gaffes could be expected.
A good example of how overconfidence in machine translation can lead to embarrassing gaffes is mentioned in this article: Google ’embarrassingly’ mistranslates Malaysian gov’t website. Among the translation gems on the Defense Ministry website, we have "clothes that poke eye" for "revealing clothes" in a dress code. Or this piece of Malaysian history: "After the withdrawal of British army, the Malaysian Government take drastic measures to increase the level of any national security threat." The Malaysian Government is now using human translators.
Simplifying the writing style will not eliminate problems in machine translation; it will only reduce the number of errors. Unlike humans, present-day machines cannot use their understanding of context to resolve ambiguities. But let’s not throw out the baby with the bathwater and reject guidelines for simplifying our writing style.
The idea of reducing the length of some sentences to produce better machine translation fits very well with a longstanding campaign for Plain English. It should be seen as a positive move in favour of writing clarity rather than an insult to writers and translators.
Plain English is a movement promoting clear writing, particularly in government communication with citizens. In the article How to Write in Plain English, the very first admonition is to write short sentences. But it’s not about "patronising or oversimplified" writing. "Most experts," the article states, "would agree that clear writing should have an average sentence length of 15 to 20 words. This does not mean making every sentence the same length. Be punchy. Vary your writing by mixing short sentences (like the last one) with longer ones (like this one). Follow the basic principle of sticking to one main idea in a sentence, plus perhaps one other related point. You should soon be able to keep to the average sentence length – used by top journalists and authors – quite easily." There will likely be less ambiguity in a shorter sentence, and thus better machine translation results.
Want to know how long your sentences are? You can use an online tool based on the Plain English philosophy: Drivel Defence for Text. Just paste in your text and it will count the number of words in each sentence and suggest simpler alternatives for some overused longer words. I tried it on this text. The alternative word suggestions were not very useful, but the identification of long sentences was interesting. On my first run, my average sentence length was 22.5 words. After some editing, it is down to 17.87 words, and I feel that the process improved the clarity of the piece.
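If you are curious how such a counter works, here is a minimal Python sketch of the same idea: split the text into rough sentences, count the words in each, and report the average. This is not the Drivel Defence code, just an illustration; the naive sentence splitter and the 20-word threshold for flagging long sentences are my own assumptions.

```python
import re

def sentence_lengths(text):
    """Split text into rough sentences and count the words in each.

    The splitter is deliberately naive (it splits on ., ! and ?), so
    abbreviations like "e.g." will be miscounted; a real tool handles
    such cases more carefully.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def report(text, limit=20):
    """Print the sentence count, average length, and how many run long."""
    lengths = sentence_lengths(text)
    if not lengths:
        print("No sentences found.")
        return
    average = sum(lengths) / len(lengths)
    long_ones = sum(1 for n in lengths if n > limit)
    print(f"Sentences: {len(lengths)}")
    print(f"Average length: {average:.2f} words")
    print(f"Sentences over {limit} words: {long_ones}")

if __name__ == "__main__":
    report("Be punchy. Vary your writing by mixing short sentences "
           "with longer ones, and stick to one main idea per sentence.")
```

Running it on the two-sentence sample prints an average of 9.00 words, comfortably under the 15-to-20-word range the Plain English article recommends.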