Monday 31 October 2011

Change growing apace?

Today is Halloween, and to celebrate, fellow blogger Miguel Llorens has reposted his blog entry of a year ago, featuring his public answer to a rather manipulative e-mail from Didier Hélin, VP of Lionbridge (LIOX): http://bit.ly/s4klbn. Hélin's e-mail was meant to persuade translation suppliers used by Lionbridge to accept a 5% reduction in their rates on the basis of the economic climate. Llorens outed Hélin by replying publicly through his blog. Quite appropriate, considering that Hélin's piece was one of those no-reply jobs.

The anniversary re-post shows great timing: all around the world, people are participating in the Occupy * movements, and at the same time, the figures on the pay rises among the top 1% of earners are coming out showing enormous increases with no justification in their performance, while the rest of us mortals - faced with inflation - are constantly losing ground.

Perhaps the days of translators taking things lying down are coming to a close: bad publicity may be better than none at all, but it can't help a company like Lionbridge if their corporate customers begin to wonder about the motivation and loyalty of their independent suppliers.

So, dear reader, since Hélin's outing, have you seen more negotiation between equals in the translation marketplace? Let us know!

Sunday 2 October 2011

Word 2010 Annoyances / 1

If you've bought a new Windows laptop or desktop recently, chances are that it has a trial version of Office 2010 pre-installed. So, you may already be using this suite, in particular Word 2010, or will be using it quite soon.

What are Word 2010's inherent annoyances?

If you're a translator, there's one huge annoyance that you will run into every time you open a file that a customer has sent you: Protected View.

The Microsoft people have come to the conclusion that many users simply download large numbers of files from the Internet, have no antivirus software, and open the files without thinking that they might contain virus-ridden macros. So, they've decided that the safest bet is to treat this as the universal use case, and to allow any file downloaded from the Internet to be opened only in Protected View, i.e. read-only mode.

If you are a translator and receive documents, for example newsletters, with a certain amount of boilerplate text that your customer doesn't even want you to include in your statistics at a discount rate for 100% matches or context matches, you probably already select all the text to be ignored and use character formatting to hide it so that your translation environment doesn't take it into account. This, of course, means editing the file.

You may also receive Word documents generated from PDF files (in other words, your customer didn't have the source file). These require a considerable amount of editing before you can translate them (see Kevin Lossner's post on this subject: http://bit.ly/nJmqH8).

For these two use cases, the Protected View default setting will drive you crazy, at least initially, because it means yet another extra step: you have to click the "Enable Editing" button in the bar at the top of the document and wait a fraction of a second until Word allows you to do what you want.


Luckily, you can disable Protected View once and for all. Doing this isn't recommended by Microsoft, but how often do your customers send you files with viruses? It's not really in keeping with their profile, and Microsoft's typical user isn't in keeping with yours either.

Here's how to do it:
  1. Click File | Options.
  2. Click Trust Center in the left sidebar.

  3. Click the Trust Center Settings button in the main window.

  4. In the Trust Center dialog box, select Protected View in the left sidebar, then clear the check box "Enable Protected View for files originating from the Internet" to disable Protected View for files that have been downloaded from the Internet.

  5. Click OK.
Now you're set to optimize any Word file that a customer sends you without ever having to remember again to allow editing.
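
If you have several machines to set up, the same option can probably be switched off through the Windows registry instead of clicking through the dialog boxes. What follows is only a sketch under stated assumptions: the function name protected_view_reg is my own invention, and the registry key and the value name DisableInternetFilesInPV are what I understand Office 2010 (internal version 14.0) to use, so check them against Microsoft's documentation before importing anything. The script changes nothing on your system; it only produces the text of a .reg file that you can inspect first.

```python
def protected_view_reg(office_version="14.0", app="Word"):
    """Return the text of a .reg file that turns off Protected View
    for files downloaded from the Internet.

    Assumptions (verify before use): Office 2010 is version 14.0, and
    the setting lives in the value DisableInternetFilesInPV under the
    application's Security\\ProtectedView key for the current user.
    """
    key = "\\".join([
        "HKEY_CURRENT_USER", "Software", "Microsoft", "Office",
        office_version, app, "Security", "ProtectedView",
    ])
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # 1 = do not open Internet files in Protected View (assumed name)
        '"DisableInternetFilesInPV"=dword:00000001',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the text so that it can be reviewed and saved as a .reg file.
    print(protected_view_reg())
```

Saving the printed text as a .reg file and double-clicking it should have the same effect as steps 1 to 5 above, but only if the assumptions hold for your version of Office.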

This is only one of the annoyances for translators in Word 2010. The next one I'll be presenting involves the Save As feature. What others have you come across? Send me a brief description in a comment on this post, or on Twitter: @dmydmy. I'll write a complete description here and, if possible, provide a workaround too.


Thursday 22 September 2011

From one thought to another

Isn't it funny how some ideas never die? Especially when hacks (in the old and "honourable" sense of the word) have to churn out a certain number of lines of copy per day. I'm amazed to read that, for some, the fact that Windows 8 may prevent you from installing Linux-Windows dual-boot setups is apparently interesting news: http://zd.net/oBlwiE.

Surely dual-booting is a 90s subject if ever there was one. After all, with reasonable hardware resources, you can easily run a Linux virtual machine on Windows or vice versa. That way, you can switch from one OS to the other without rebooting.

And might what goes for hacks not be true for translation software marketers, gurus, mavens and such as well? You've got to keep that copy flowing. So, why not resuscitate that old saw, MT, machine translation? There's been an awful lot of talk about it lately, and the subject's as old as the hills: it's been around since the early 1950s, which in terms of technology is like the Jurassic era.

To vaccinate ourselves against the seductive hype of marketers and gurus, we'd do well to remember that MT is a part of Artificial Intelligence. Now I ask you, when was the last time you heard someone talk seriously about that?

Tuesday 14 June 2011

Nullius in verba

In a question session after a very interesting keynote speech on terminology at a recent conference, someone asked a question about term bases in machine translation. I made the point that machine translation has been around for a very long time, and that its vogue today is mainly due to Google Translate, a tool that does not implement linguistic resources such as grammatical rules and term bases.
During the networking event of the same day, someone who considers himself a great expert on translation technology came up to me to tell me that I was quite wrong. A Google executive supposedly told this expert at a conference in Australia that Google Translate uses terminology.
Of course, this statement could have several interpretations:
1. you can obviously feed glossaries to Google Translate as two files, one in the source language and the other in the target language, and tell Google Translate that they're translations of each other;
2. you can compare statistical analysis of parallel texts with statistical terminology extraction programs.
What you can't do is assume that, like some other machine translation tools, Google's purely statistical algorithm has a terminological loop that compares the texts it is translating to more complex terminological information (such as information on field, genre, grammar, use etc.) or grammatical information.
In fact, the person who contradicted me, and who apparently held this exact view, failed to provide any context. Can we assume that the reported statement was a marketing ploy, i.e. it's better to be all things to all people, if you can? I suspect so. In any case, my answer at the time was that we would just have to agree to disagree. My answer today is the motto of the Royal Society: nullius in verba, i.e. take no man's word for it.
And by the way, Google is so unconcerned by debates between the statistically-oriented and the linguistically-oriented schools of machine translation, that they don't even attend the professional conferences in this field (see "Why the Machine Translation Crowd hates Google"). Why should they? What they have realized is that their huge resources of data - from web sites to books - along with their resources in terms of computing power (thousands of servers around the world), put them in a unique position to do statistical machine translation. In other words, machine translation for them is a way to generate another revenue stream from their data and their infrastructure.
So, take no man's word for it.
Actually, to return to the subject of marketing, taking someone's word for something when that person works for a software company is highly risky. Even within the same company. Years ago, I worked for a CAD/CAM software publisher. They set out to develop and market a state-of-the-art design tool to try to take the lead in the industry. At a meeting where the development team presented the features of the product, they used animation to show what Designer, the new product, could do. The marketing team was impressed. So that's Designer, they said. Yes, answered the development team. The trouble is that the two teams did not mean the same thing: the marketing team thought that they had seen the finished product in action; the development team actually meant that it was their as yet unreached goal. This misunderstanding led the marketers to oversell an as yet non-existent product. A serious mistake that they paid a very high price for in the end.
So, really, in technical areas, take no man's word for it.

Attention to language always pays

When I was an undergraduate in university, one of the favourite subjects in the media was the "drug sub-culture". One of my friends used to refer to it as the "drab slob culture". And how drab and slovenly it all was.

Unfortunately, the drab slob culture has become the norm rather than the exception, and some of its adherents even govern us. Fighting this trend in my own little corner - and desperately clinging to a belief in the cyclical nature of society - I try to defend correct grammar and usage, and have an excellent example today of how just thinking about these things can save you a lot of money and hassle.
Excellent because it's topical! The IMF has just been the victim of a major cyber attack: http://nyti.ms/l8D4AV.

According to Reuters and the BBC, a cyber security expert considers the infiltration to have been a targeted attack. The purpose of the spyware installed on the IMF's network by this attack was apparently to give a nation state a "digital insider presence".

The New York Times adds that the attack probably took the form of "spear phishing". In this type of attack, a targeted person receives an e-mail inviting him or her, under false pretenses, to click on a link to "malware", malicious software, which is then installed on the victim's computer system.

No matter how good our spam and virus filters are, we have all received phishing e-mail. So, you know as well as I do that these e-mails are usually written in the language of a borderline illiterate. The mistakes are frequent and enormous. Here's an example that audaciously combines English and French:
"In short, her knowledge of grace doubled, and with it her novels. I guess I'm ashamed to admit it, but I came over here
Le coeur qui crie l'amour,pour cette femme tout les jours,...
celle qui a notre coeur,et notre bonheur ...!"

Up to the end of the first clause, there is no major problem, except that one might wonder if the "grace" mentioned was actually spiritual grace. After all, how many people are particularly concerned about that today? From the behaviour of the majority of people around us, obviously not many.

After that first clause, though, things really begin to deteriorate. How can a novel double? In length? If so, you'd have to say so. In sales? Then you'd have to say that the sales of her (sic) novels doubled. Naturally, the second clause of the second sentence should be, "but I went over there". If "come" was used in the vulgar sense, it is ungrammatical to complete it with expressions like "over here". Because this adverbial points to a destination, it suggests movement towards this destination. So, we naturally interpret the verb as being used as a verb of motion, not a sexual reaction. You may not have thought so, but even the erotic demands correct grammar.

The French is as ungrammatical as the English: we need quotation marks around "l'amour" (because it's apparently what was said), spaces after the commas, "tous les jours" etc.

All this is to say that a little awareness of language goes a long way. When you see an illiterate text like this, how can you think for a second that it deserves your attention? Delete the e-mail right away.

If all the staff of the IMF had complied with this simple policy without exception, the fund would have saved itself a fortune in identifying the problem, estimating the scope of the damage and repairing it. Perhaps the New York Times report will prove false, and we'll find out that a language-insensitive employee did not fall victim to the temptations of a phishing scam and so introduce the spyware into the IMF network ... that the malware got there in some other way. But if it proves to be true, I'll be laughing all the way to the bank (a bank in a country that hasn't been bailed out by the IMF).

Wednesday 8 June 2011

Some tools never die

In a notice from a translation agency looking for a DE-EN translator today, I found the following requirement for translation of a PowerPoint presentation: Trados version 6.5 or above (excluding version 2009) & TagEditor.

Excluding SDL Trados Studio 2009? As if Studio 2009 or Kilgray's MemoQ didn't handle translation of TTX files brilliantly ... much better than TagEditor in fact. All you have to do is create the TTX in TagEditor, and in the case of MemoQ, pre-segment it in Workbench. With either tool, your output file will be a TTX, and the agency won't be able to tell which tool you used.

Let's not forget either that Trados 6.5 is no longer supported and is 8 years old: it was released in 2003. Has translation technology made no progress in 8 years?

Even if you consider using the still supported SDL Trados Suite 2007, you're still stuck in a time warp of 4 years.

How retrograde and technophobic can translation agencies get?

The style manual that should always be at hand

The Chicago Manual of Style has some excellent tips in its recent 16th edition Questions and Answers (http://bit.ly/iKD21w).

You might, for instance, find their take interesting on repetition of molecules with several variants, or repetition of any entity with numbered variants. The specific example they deal with – interleukin – comes from their questioner. There are, in fact, seventeen variants given in the Wikipedia entry, and the abbreviations conform to the following rule: IL-1, IL-2 and so on. Obviously, too many abbreviations can be confusing. What rule should the copy editor follow? The Chicago Manual team suggest the eminently practical rule of giving the full name in any sentence dealing with several variants, following this by the corresponding abbreviation, and using abbreviations only for the rest of the series. This gives, for example, "interleukin 1 (IL-1), IL-5, and IL-7".
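
For anyone who generates or checks such lists automatically, the rule is simple enough to sketch in a few lines of Python. This is my own illustration, not anything published by the Chicago team: the function name abbreviate_series and its parameters are hypothetical, and the sketch assumes that every variant is written as the abbreviation, a hyphen and the variant number.

```python
def abbreviate_series(full_name, abbreviation, variants):
    """Spell out the first variant with its abbreviation in brackets,
    then use the abbreviation alone for the rest of the series."""
    if not variants:
        return ""
    first = variants[0]
    items = [f"{full_name} {first} ({abbreviation}-{first})"]
    items += [f"{abbreviation}-{v}" for v in variants[1:]]
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return " and ".join(items)
    # Three or more items: serial comma before "and", as in the example.
    return ", ".join(items[:-1]) + ", and " + items[-1]


if __name__ == "__main__":
    print(abbreviate_series("interleukin", "IL", [1, 5, 7]))
    # prints: interleukin 1 (IL-1), IL-5, and IL-7
```

The serial comma before "and" follows the example quoted above; a house style that omits it would need a one-line change.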

As publications manager, you could make the decision whether to allow this rule to apply to larger units such as a paragraph or even a short document. Whatever you decide, you should publish it in your organization style guide.

Dare I correct the manual? Only in the most collaborative spirit. In this issue, one questioner asks about the recurrent headache of whether to repeat units of measurement or percentage signs in expressions like "60 to 65% of subjects ..." Or should it be "60% to 65% of subjects ..." instead? The first form is the correct one according to the Chicago team, except if the abbreviation or symbol is "closed up to the number", i.e. 25%–41%. I presume they meant, "close up to the number" as in, "adjacent with no space in between". It's reassuring that we can all slip up occasionally, isn't it?

You can subscribe to the e-mail newsletter free of charge. The e-mail provides a link to the infinitely more readable HTML page, and you can also submit a question or browse previous Q&A topics. A reasonable annual fee grants you access to the full online version of the manual. Whether you're American or not (I'm not), I recommend it highly.