Tuesday 14 June 2011

Nullius in verba

In a question session after a very interesting keynote speech on terminology at a recent conference, someone asked a question about term bases in machine translation. I made the point that machine translation has been around for a very long time, and that its vogue today is mainly due to Google Translate, a tool that does not implement linguistic resources such as grammatical rules and term bases.
During the networking event of the same day, someone who considers himself a great expert on translation technology came up to me to tell me that I was quite wrong. A Google executive supposedly told this expert at a conference in Australia that Google Translate uses terminology.
Of course, this statement could have several interpretations:
1. you can obviously feed glossaries to Google Translate as two files, one in the source language and the other in the target language, and tell Google Translate that they're translations of each other;
2. you can compare statistical analysis of parallel texts with statistical terminology extraction programs.
What you can't do is assume that, like some other machine translation tools, Google's purely statistical algorithm has a terminological loop that compares the texts it is translating to more complex terminological information (such as information on field, genre, grammar, use etc.) or grammatical information.
In fact, the person who contradicted me, and who apparently held this exact view, failed to provide any context. Can we assume that the reported statement was a marketing ploy, i.e. it's better to be all things to all people, if you can? I suspect so. In any case, my answer at the time was that we would just have to agree to disagree. My answer today is the motto of the Royal Society: nullius in verba, i.e. take no man's word for it.
And by the way, Google is so unconcerned by debates between the statistically-oriented and the linguistically-oriented schools of machine translation that they don't even attend the professional conferences in this field (see "Why the Machine Translation Crowd hates Google"). Why should they? What they have realized is that their huge resources of data - from web sites to books - along with their resources in terms of computing power (thousands of servers around the world) put them in a unique position to do statistical machine translation. In other words, machine translation for them is a way to generate another revenue stream from their data and their infrastructure.
So, take no man's word for it.
Actually, to return to the subject of marketing, taking someone's word for something when that person works for a software company is highly risky. Even within the same company. Years ago, I worked for a CAD/CAM software publisher. They set out to develop and market a state-of-the-art design tool to try to take the lead in the industry. At a meeting where the development team presented the features of the product, they used animation to show what Designer, the new product, could do. The marketing team was impressed. So that's Designer, they said. Yes, answered the development team. The trouble is that the two teams did not mean the same thing: the marketing team thought that they had seen the finished product in action; the development team actually meant that it was their as yet unreached goal. This misunderstanding led the marketers to oversell an as yet non-existent product. A serious mistake that they paid a very high price for in the end.
So, really, in technical areas, take no man's word for it.

Attention to language always pays

When I was an undergraduate in university, one of the favourite subjects in the media was the "drug sub-culture". One of my friends used to refer to it as the "drab slob culture". And how drab and slovenly it all was.

Unfortunately, the drab slob culture has become the norm rather than the exception, and some of its adherents even govern us. Fighting this trend in my own little corner - and desperately clinging to a belief in the cyclical nature of society - I try to defend correct grammar and usage, and have an excellent example today of how just thinking about these things can save you a lot of money and hassle.
Excellent because it's topical! The IMF has just been the victim of a major cyber attack: http://nyti.ms/l8D4AV.

According to Reuters and the BBC, a cyber security expert considers the infiltration to have been a targeted attack. The purpose of the spyware installed on the IMF's network by this attack was apparently to give a nation state a "digital insider presence".

The New York Times adds that the attack probably took the form of "spear phishing". In this type of attack, a targeted person receives an e-mail inviting him or her, under false pretenses, to click on a link to "malware" (malicious software), which is then installed on the victim's computer system.

No matter how good our spam and virus filters are, we have all received phishing e-mail. So you know as well as I do that these e-mails are usually written in the language of a borderline illiterate. The mistakes are frequent and enormous. Here's an example that audaciously combines English and French:
"In short, her knowledge of grace doubled, and with it her novels. I guess I'm ashamed to admit it, but I came over here
Le coeur qui crie l'amour,pour cette femme tout les jours,...
celle qui a notre coeur,et notre bonheur ...!"

Up to the end of the first clause, there is no major problem, except that one might wonder if the "grace" mentioned was actually spiritual grace. After all, how many people are particularly concerned about that today? From the behaviour of the majority of people around us, obviously not many.

After that first clause, though, things really begin to deteriorate. How can a novel double? In length? If so, you'd have to say so. In sales? Then you'd have to say that the sales of her (sic) novels doubled. Naturally, the second clause of the second sentence should be, "but I went over there". If "come" was used in the vulgar sense, it is ungrammatical to complete it with expressions like "over here". Because this adverbial points to a destination, it suggests movement towards this destination. So, we naturally interpret the verb as being used as a verb of motion, not a sexual reaction. You may not have thought so, but even the erotic demands correct grammar.

The French is as ungrammatical as the English: we need quotation marks around "l'amour" (because it's apparently what was said), spaces after the commas, "tous les jours" etc.

All this is to say that a little awareness of language goes a long way. When you see an illiterate text like this, how can you think for a second that it deserves your attention? Delete the e-mail right away.

If all the staff of the IMF had complied with this simple policy without exception, the fund would have saved itself a fortune in identifying the problem, estimating the scope of the damage and repairing it. Perhaps the New York Times report will prove false, and we'll find out that a language-insensitive employee did not fall victim to the temptations of a phishing scam and so introduce the spyware into the IMF network ... that the malware got there in some other way. But if it proves to be true, I'll be laughing all the way to the bank (a bank in a country that hasn't been bailed out by the IMF).

Wednesday 8 June 2011

Some tools never die

In a notice from a translation agency looking for a DE-EN translator today, I found the following requirement for translation of a PowerPoint presentation: Trados version 6.5 or above (excluding version 2009) & TagEditor.

Excluding SDL Trados Studio 2009? As if Studio 2009 or Kilgray's MemoQ didn't handle translation of TTX files brilliantly ... much better than TagEditor in fact. All you have to do is create the TTX in TagEditor, and in the case of MemoQ, pre-segment it in Workbench. With either tool, your output file will be a TTX, and the agency won't be able to tell which tool you used.

Let's not forget either that Trados 6.5 is no longer supported and is 8 years old: it was released in 2003. Has translation technology made no progress in 8 years?

Even if you consider using the still supported SDL Trados Suite 2007, you're still stuck in a time warp of 4 years.

How retrograde and technophobic can translation agencies get?

The style manual that should always be at hand

The Chicago Manual of Style has some excellent tips in its recent 16th edition Questions and Answers (http://bit.ly/iKD21w).

You might, for instance, find their take interesting on repetition of molecules with several variants, or repetition of any entity with numbered variants. The specific example they deal with – interleukin – comes from their questioner. There are, in fact, seventeen variants given in the Wikipedia entry, and the abbreviations conform to the following rule: IL-1, IL-2 and so on. Obviously, too many abbreviations can be confusing. What rule should the copy editor follow? The Chicago Manual team suggest the eminently practical rule of giving the full name in any sentence dealing with several variants, following this by the corresponding abbreviation, and using abbreviations only for the rest of the series. This gives, for example, "interleukin 1 (IL-1), IL-5, and IL-7".
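For anyone who wants to automate this in an editing script, the rule can be sketched in a few lines of Python. This is only an illustration: the function name `format_series` and its parameters are my own invention, not anything published by the Chicago Manual team, and it assumes the variants are simply numbered and joined with a serial comma.

```python
def format_series(full_name, abbreviation, variants):
    """Spell out the first variant with its abbreviation, then abbreviate the rest.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if not variants:
        return ""
    # First item: full name followed by the abbreviation in parentheses.
    first = f"{full_name} {variants[0]} ({abbreviation}-{variants[0]})"
    # Remaining items: abbreviation only.
    items = [first] + [f"{abbreviation}-{v}" for v in variants[1:]]
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    # Three or more items: serial comma before "and".
    return ", ".join(items[:-1]) + f", and {items[-1]}"


print(format_series("interleukin", "IL", [1, 5, 7]))
# prints: interleukin 1 (IL-1), IL-5, and IL-7
```

If you decide to apply the rule per paragraph rather than per sentence, the same function would simply be called once per paragraph instead.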

As publications manager, you could make the decision whether to allow this rule to apply to larger units such as a paragraph or even a short document. Whatever you decide, you should publish it in your organization style guide.

Dare I correct the manual? Only in the most collaborative spirit. In this issue, one questioner asks about the recurrent headache of whether to repeat units of measurement or percentage signs in expressions like "60 to 65% of subjects ..." Or should it be "60% to 65% of subjects ..." instead? The first form is the correct one according to the Chicago team, except if the abbreviation or symbol is "closed up to the number", i.e. 25%–41%. I presume they meant, "close up to the number" as in, "adjacent with no space in between". It's reassuring that we can all slip up occasionally, isn't it?

You can subscribe to the e-mail newsletter free of charge. The e-mail provides a link to the infinitely more readable HTML page, and you can also submit a question or browse previous Q&A topics. A reasonable annual fee grants you access to the full online version of the manual. Whether you're American or not (I'm not), I recommend it highly.

Tuesday 7 June 2011

PyRoom: sometimes simple solutions are best

Writing requires concentration, and while you may not need to take as radical a step as the young writers at the Iowa Writers' Workshop described in "Tradition trumps Twitter at Iowa Writers' Workshop", you need to find situations and tools conducive to concentrating on what you're writing.

Georges Simenon, the creator of Inspector Maigret, used to shut himself away with a good stock of sharpened pencils and paper so that he didn't even run the risk of interrupting the flow of his writing to sharpen a pencil. PyRoom (http://bit.ly/jv74EQ) can be your digital equivalent of Simenon's technique. Because it's a fullscreen editor without buttons, widgets, formatting options, menus and with only the minimum in terms of dialog windows, you can shut yourself off from your operating system's rich graphical environment and focus on writing and writing only. The screenshot below shows you just what I mean.

I've been using it myself for a year or so whenever I have any writing task that's not obviously straightforward. This can be in any field - technical, financial, creative - and can involve as little as one or two paragraphs. Every time, PyRoom allows me to collect my thoughts and get them down on paper (so to speak), no matter which language I'm writing in.

Saving your text and leaving PyRoom require standard keyboard shortcuts: Ctrl+S and Alt+F4 respectively. When your draft is underway, you can continue in PyRoom or switch to another application to format it.
Developed by Florian Heinle, PyRoom is only available for Linux at the moment, although a Windows version is planned. For the impatient, WriteRoom on the Mac and DarkRoom for Windows are similar environments that are already available. Or install PyRoom on a Linux virtual machine running on your host.

Give it a go: there's nothing like a blank black screen for focusing the mind.