Monday, 31 October 2011

Change growing apace?

Today is Halloween, and to celebrate, fellow blogger Miguel Llorens has republished his post from a year ago featuring his public answer to a rather manipulative e-mail from Didier Hélin, a vice-president at Lionbridge (LIOX): http://bit.ly/s4klbn. Hélin's e-mail set out to persuade the translation suppliers used by Lionbridge to accept a 5% reduction in their rates on the basis of the economic climate. Llorens outed Hélin with a public reply on his blog, which seems quite appropriate considering that Hélin's message was one of those no-reply jobs.

The anniversary re-post is well timed: all around the world, people are joining the various Occupy movements, just as the figures on pay rises among the top 1% of earners are coming out, showing enormous increases with no justification in performance, while the rest of us mortals - faced with inflation - constantly lose ground.

Perhaps the days of translators taking things lying down are coming to a close: bad publicity may be better than none at all, but it can't help a company like Lionbridge if their corporate customers begin to wonder about the motivation and loyalty of their independent suppliers.

So, dear reader, since Hélin's outing, have you seen more negotiation between equals in the translation marketplace? Let us know!

Sunday, 2 October 2011

Word 2010 Annoyances / 1

If you've bought a new Windows laptop or desktop recently, chances are that it has a trial version of Office 2010 pre-installed. So, you may already be using this suite, in particular Word 2010, or will be using it quite soon.

What are Word 2010's inherent annoyances?

If you're a translator, there's one huge annoyance that you will run into every time you open a file that a customer has sent you: Protected View.

The Microsoft people have concluded that many users simply download large numbers of files from the Internet, have no antivirus software, and open the files without considering that they might contain virus-ridden macros. So they've decided that the safest bet is to treat this as the universal use case and to open any file downloaded from the Internet in Protected View, i.e. read-only mode.

If you receive documents, newsletters for example, containing boilerplate text that your customer doesn't even want counted in your statistics at a discounted rate for 100% or context matches, you probably already select the text to be ignored and hide it with character formatting so that your translation environment leaves it out of account. This, of course, means editing the file.
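If you find yourself hiding the same boilerplate on every newsletter, the step can be scripted. Here is a minimal sketch using the python-docx library, assuming the boilerplate paragraphs can be recognised by a marker string; the marker and the file names are hypothetical.

```python
# Minimal sketch: hide marked boilerplate so a CAT tool that excludes hidden
# text will ignore it. The marker string and file names are hypothetical.
from docx import Document

MARKER = "DO NOT TRANSLATE"  # whatever your customer uses to flag boilerplate

doc = Document("newsletter.docx")
for paragraph in doc.paragraphs:
    if MARKER in paragraph.text:
        for run in paragraph.runs:
            run.font.hidden = True  # character-level "hidden" formatting
doc.save("newsletter_prepared.docx")
```

Check in your own translation environment that hidden text really is excluded from the analysis before relying on something like this.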

You may also receive Word documents generated from PDF files (in other words, your customer didn't have the source file). These require a considerable amount of editing before you can translate them (see Kevin Lossner's post on this subject: http://bit.ly/nJmqH8).

For these two use cases, the Protected View default setting will drive you crazy, at least initially, because it means yet another extra step: as shown below, you have to click the "Enable Editing" button and wait a fraction of a second until Word lets you do what you want.


Luckily, you can disable Protected View once and for all. Microsoft doesn't recommend doing this, but how often do your customers send you files with viruses? It's not really in keeping with their profile, nor does Microsoft's picture of the general user match yours.

Here's how to do it:
  1. Click File | Options.
  2. Click Trust Center in the left sidebar.

  3. Click the Trust Center Settings button in the main window.

  4. In the Trust Center dialog box, select Protected View in the left sidebar, then clear the option that enables Protected View for files originating from the Internet.

  5. Click OK.
Now you're set to optimize any Word file that a customer sends you without ever again having to remember to enable editing.
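If you administer several machines, the same setting can also be scripted. The sketch below uses Python's winreg module; the registry path and value name are assumptions based on the usual Office 2010 layout (version 14.0), so verify them against your own installation before relying on them.

```python
# Sketch only: flip the Word 2010 Protected View setting for Internet files
# via the registry. The key path and value name are assumed; check them
# against your own machine before use (Office 2010 = version 14.0).
import winreg

KEY_PATH = r"Software\Microsoft\Office\14.0\Word\Security\ProtectedView"

with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    # 1 = do not open files originating from the Internet in Protected View
    winreg.SetValueEx(key, "DisableInternetFilesInPV", 0, winreg.REG_DWORD, 1)
```

The Trust Center route described above remains the simpler option if you only have one machine to change.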

This is only one of Word 2010's annoyances for translators. The next one I'll be presenting involves the Save As feature. What others have you come across? Send me a brief description in a comment on this post or on Twitter: @dmydmy. I'll write up a complete description here and, if possible, provide a workaround too.


Thursday, 22 September 2011

From one thought to another

Isn't it funny how some ideas never die? Especially when hacks (in the old and "honourable" sense of the word) have to churn out a certain number of lines of copy per day. I'm amazed to read that, for some, the possibility that Windows 8 may prevent you from installing a Linux-Windows dual-boot setup is apparently interesting news: http://zd.net/oBlwiE.

Surely dual-booting is a 90s subject if ever there was one. After all, with reasonable hardware resources, you can easily run a Linux virtual machine on Windows or vice versa. That way, you can switch from one OS to the other without rebooting.

And might what goes for hacks not also be true of translation software marketers, gurus, mavens and the like? You've got to keep that copy flowing. So why not resuscitate that old saw, MT, machine translation? There's been an awful lot of talk about it lately, and the subject is as old as the hills: it's been around since the early 1950s, which in terms of technology is like the Jurassic era.

To vaccinate ourselves against the seductive hype of marketers and gurus, we'd do well to remember that MT is a part of Artificial Intelligence. Now I ask you, when was the last time you heard someone talk seriously about that?

Tuesday, 14 June 2011

Attention to language always pays

When I was an undergraduate in university, one of the favourite subjects in the media was the "drug sub-culture". One of my friends used to refer to it as the "drab slob culture". And how drab and slovenly it all was.

Unfortunately, the drab slob culture has become the norm rather than the exception, and some of its adherents even govern us. Fighting this trend in my own little corner - and desperately clinging to a belief in the cyclical nature of society - I try to defend correct grammar and usage, and today I have an excellent example of how just thinking about these things can save you a lot of money and hassle. Excellent because it's topical: the IMF has just been the victim of a major cyber attack (http://nyti.ms/l8D4AV).

According to Reuters and the BBC, a cyber security expert considers the infiltration to have been a targeted attack. The purpose of the spyware installed on the IMF's network by this attack was apparently to give a nation state a "digital insider presence".

The New York Times adds that the attack probably took the form of "spear phishing". In this type of attack, a targeted person receives an e-mail inviting him or her, under false pretenses, to click on a link to "malware", malicious software, which is then installed on the victim's computer system.

No matter how good our spam and virus filters are, we have all received phishing e-mail. So you know as well as I do that these e-mails are usually written in the language of a borderline illiterate. The mistakes are frequent and enormous. Here's an example that audaciously combines English and French:
"In short, her knowledge of grace doubled, and with it her novels. I guess I'm ashamed to admit it, but I came over here
Le coeur qui crie l'amour,pour cette femme tout les jours,...
celle qui a notre coeur,et notre bonheur ...!"

Up to the end of the first clause, there is no major problem, except that one might wonder if the "grace" mentioned was actually spiritual grace. After all, how many people are particularly concerned about that today? From the behaviour of the majority of people around us, obviously not many.

After that first clause, though, things really begin to deteriorate. How can a novel double? In length? If so, you'd have to say so. In sales? Then you'd have to say that the sales of her (sic) novels doubled. Naturally, the second clause of the second sentence should be "but I went over there". If "come" were used in the vulgar sense, it would be ungrammatical to complete it with an expression like "over here": because this adverbial points to a destination, it suggests movement towards that destination, so we naturally interpret the verb as a verb of motion, not as describing a sexual reaction. You may not have thought so, but even the erotic demands correct grammar.

The French is as ungrammatical as the English: we need quotation marks around "l'amour" (because it's apparently what was said), spaces after the commas, "tous les jours" etc.

All this is to say that a little awareness of language goes a long way. When you see an illiterate text like this, how can you think for a second that it deserves your attention? Delete the e-mail right away.

If all the staff of the IMF had complied with this simple policy without exception, the fund would have saved itself a fortune in identifying the problem, estimating the scope of the damage and repairing it. Perhaps the New York Times report will prove false, and we'll find out that no language-insensitive employee fell victim to the temptations of a phishing scam and introduced the spyware into the IMF network ... that the malware got there in some other way. But if it proves to be true, I'll be laughing all the way to the bank (a bank in a country that hasn't been bailed out by the IMF).

Wednesday, 8 June 2011

Some tools never die

In a notice from a translation agency looking for a DE-EN translator today, I found the following requirement for the translation of a PowerPoint presentation: Trados version 6.5 or above (excluding version 2009) & TagEditor.

Excluding SDL Trados Studio 2009? As if Studio 2009 or Kilgray's MemoQ didn't handle translation of TTX files brilliantly ... much better than TagEditor in fact. All you have to do is create the TTX in TagEditor, and in the case of MemoQ, pre-segment it in Workbench. With either tool, your output file will be a TTX, and the agency won't be able to tell which tool you used.

Let's not forget either that Trados 6.5 is no longer supported and is 8 years old: it was released in 2003. Has translation technology made no progress in 8 years?

Even if you opt for the still-supported SDL Trados Suite 2007, you're stuck in a four-year time warp.

How retrograde and technophobic can translation agencies get?

The style manual that should always be at hand

The Chicago Manual of Style has some excellent tips in the Questions and Answers for its recent 16th edition (http://bit.ly/iKD21w).

You might, for instance, find their take interesting on naming molecules that have several numbered variants, or indeed any entity with numbered variants. The specific example they deal with, interleukin, comes from their questioner. There are, in fact, seventeen variants given in the Wikipedia entry, and the abbreviations follow the pattern IL-1, IL-2 and so on. Obviously, too many abbreviations can be confusing. What rule should the copy editor follow? The Chicago Manual team suggest the eminently practical rule of spelling out the first variant in any sentence dealing with several of them, following it with the corresponding abbreviation, and then abbreviating the rest of the series. This gives, for example, "interleukin 1 (IL-1), IL-5, and IL-7".

As publications manager, you could decide whether to extend this rule to larger units such as a paragraph or even a short document. Whatever you decide, you should publish it in your organization's style guide.
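For what it's worth, the rule is mechanical enough to encode. Here is a small illustrative Python helper, entirely hypothetical and not part of any published style checker, that produces series in the form Chicago suggests:

```python
def format_series(full_name, abbreviation, numbers):
    """Spell out the first variant with its abbreviation, abbreviate the rest.

    format_series("interleukin", "IL", [1, 5, 7])
    -> 'interleukin 1 (IL-1), IL-5, and IL-7'
    """
    first = f"{full_name} {numbers[0]} ({abbreviation}-{numbers[0]})"
    rest = [f"{abbreviation}-{n}" for n in numbers[1:]]
    items = [first] + rest
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + ", and " + items[-1]


print(format_series("interleukin", "IL", [1, 5, 7]))
# interleukin 1 (IL-1), IL-5, and IL-7
```

A helper like this would at least keep a long series consistent across a document, which is where copy editors usually slip.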

Dare I correct the manual? Only in the most collaborative spirit. In this issue, one questioner asks about the recurrent headache of whether to repeat units of measurement or percentage signs in expressions like "60 to 65% of subjects ...". Or should it be "60% to 65% of subjects ..." instead? The first form is the correct one according to the Chicago team, except if the abbreviation or symbol is "closed up to the number", i.e. 25%–41%. I presume they meant "close up to the number", as in "adjacent with no space in between". It's reassuring that we can all slip up occasionally, isn't it?

You can subscribe to the e-mail newsletter free of charge. The e-mail provides a link to the infinitely more readable HTML page, and you can also submit a question or browse previous Q&A topics. A reasonable annual fee grants you access to the full online version of the manual. Whether you're American or not (I'm not), I recommend it highly.

Tuesday, 7 June 2011

PyRoom: sometimes simple solutions are best

Writing requires concentration, and while you may not need to take a step as radical as the young writers at the Iowa Writers' Workshop described in "Tradition trumps Twitter at Iowa Writers' Workshop", you do need to find situations and tools conducive to concentrating on what you're writing.

Georges Simenon, the creator of Inspector Maigret, used to shut himself away with a good stock of sharpened pencils and paper so that he didn't even run the risk of interrupting the flow of his writing to sharpen a pencil. PyRoom (http://bit.ly/jv74EQ) can be your digital equivalent of Simenon's technique. Because it's a fullscreen editor without buttons, widgets, formatting options or menus, and with only a bare minimum of dialog windows, you can shut yourself off from your operating system's rich graphical environment and focus on writing and writing only. The screenshot below shows you just what I mean.

I've been using it myself for a year or so whenever I have a writing task that's not obviously straightforward. This can be in any field - technical, financial, creative - and can involve as little as one or two paragraphs. Every time, PyRoom lets me collect my thoughts and get them down on paper (so to speak), no matter which language I'm writing in.

Saving your text and leaving PyRoom use standard keyboard shortcuts: Ctrl+S and Alt+F4 respectively. Once your draft is underway, you can continue in PyRoom or switch to another application to format it.
Developed by Florian Heinle, PyRoom is only available for Linux at the moment, although a Windows version is planned. For the impatient, WriteRoom on the Mac and DarkRoom for Windows are similar environments that are already available. Or install PyRoom on a Linux virtual machine running on your host.

Give it a go: there's nothing like a blank black screen for focusing the mind.

Friday, 27 May 2011

The Utility of Tweeting

I never thought I'd be writing under this title, but tweeting definitely has its uses. I was recently at a conference in Budapest where several members of the audience were tweeting from their cell phones. At the same time, the conference organizers were showing the tweet thread on the screen behind the panel. Great idea. Unfortunately, the tweeters mainly limited themselves to reporting to the outside world some key points being made by the panel.
This is where I got frustrated. I had several substantive comments to make on various things the panel were saying. For example, during the discussion on the subject "Has Translation Technology a Future?", one of the panelists wondered aloud whether technical writers had tool standards and how they used them, only to have another tell her he was certain they did. As someone who also works as a technical writer, it's frustrating to know that, aside from ISO 9000, they don't, and that that standard is too sketchy to be of any use. As for authoring itself, the best technical writers have is an architecture, DITA, and approaches like Information Mapping. That would have made a perfect tweet, and it would have livened up the debate.
Maybe next time I'll be equipped.

Thursday, 26 May 2011

SDL LiveContent 2011: Death of Technical Documentation as we know it

This catchy SDL marketing release title (see http://is.gd/qc8yhG) has cost me a certain amount of time.

First of all, reading the release whetted my appetite for more information, so I followed the SDL link to read the white paper and a case study. Frankly, by the end, except for one brief phrase hidden in a mass of verbiage, I wasn't much wiser. Is LiveContent a tool, a framework, a platform, software as a service, or what? In terms of the technical documentation you would write and publish with it, is it single sourcing, a structured authoring environment, minimalism, a content value chain, or what? Is it web-based, client-server, a content management system, component content management, or what?

I quickly learned that SDL considers it to be nothing less than a paradigm shift. I wonder how many products have made that claim in the last twenty years? The main white paper was perhaps the most informative of the three documents. However, in SDL's depiction of the supposed paradigm shift, the publishing of technical information before the shift took place only in silos, each impervious to the others. These silos are presented very convincingly, as if this view of things were the only one that could possibly be true. The trouble is, it isn't the only one. It's an exaggeration of just one tendency out of many, and one that I doubt ever existed in such an extreme form. We learn, for example, that product marketing, product managers, technical writers and support personnel wrote, respectively, specifications, design information, product documentation and support information, all for varying publics.

This does not correspond to my own experience at all. Technical documentation can come under the remit of the CTO or of product marketing, or can be split up under the aegis of the various product managers. Support identifies documentation errors and gaps and communicates them to the technical writing team, which corrects them or extends the documentation. Nothing like setting up straw men to prove your point, is there?

Obviously, publishing technical information has been gradually moving in the direction sketched out in the white paper for years. So, although there are certainly trends in technical documentation, the existence of a paradigm shift is far from obvious. Certainly, any shift, whether a radical one like a "paradigm shift" or a gradual one, relies on far more than - as suggested in the SDL marketing material - tools.

Tools alone, without methods and procedures, mean nothing. They allow us to implement our methods and procedures, but nothing more. And trying to use them without solid methods and procedures that are reviewed on a regular basis can lead to catastrophe.

Given the industry trend and the time scale over which it's taking place, it will come as no surprise that SDL LiveContent is only one solution among many. There are also Bluestream, Componize for Alfresco, DITA Exchange built on top of SharePoint, and Ixiasoft's TEXTML Server and DITA CMS. Interestingly, each solution takes a different approach. This could well mean that one of them offers tools similar to the ones you already use, thereby reducing the learning curve.

In this context, what is LiveContent's USP? Identifying and explaining the USP from a technical standpoint should be the goal of any white paper, and the case studies provided along with it should support this message clearly. Not to do so and, at the same time, to make radical unsupported claims is to oversell. A very risky tactic indeed. Since SDL acquired Tridion a few years ago, they have had the potential to be a leader in this area. If they have indeed realized this potential, let them communicate their leadership to us clearly.

Wednesday, 11 May 2011

Nullius in verba

In a question session after a very interesting keynote speech on terminology at a recent conference, someone asked about term bases in machine translation. I made the point that machine translation has been around for a very long time, and that its vogue today is mainly due to Google Translate, a tool that does not implement linguistic resources such as grammatical rules and term bases.

During the networking event later that day, someone who considers himself a great expert on translation technology came up to tell me that I was quite wrong: a Google executive had supposedly told him at a conference in Australia that Google Translate uses terminology. Of course, this statement could have several interpretations:
1. you can obviously feed glossaries to Google Translate as two files, one in the source language and the other in the target language, and tell Google Translate that they're translations of each other;
2. you can compare statistical analysis of parallel texts with statistical terminology extraction programs.
What you can't do is assume that Google's purely statistical algorithm has, like some other machine translation tools, a terminological loop that compares the texts it is translating against richer terminological information (on field, genre, grammar, usage and so on) or against grammatical information.

In fact, the person who contradicted me, and who apparently held this exact view, failed to provide any context. Can we assume that the reported statement was a marketing ploy, i.e. that it's better to be all things to all people if you can? I suspect so. In any case, my answer at the time was that we would just have to agree to disagree. My answer today is the motto of the Royal Society: nullius in verba, i.e. take no man's word for it.

And by the way, Google is so unconcerned by the debate between the statistically oriented and the linguistically oriented schools of machine translation that they don't even attend the professional conferences in this field (see "Why the Machine Translation Crowd hates Google"). Why should they? What they have realized is that their huge resources of data - from web sites to books - along with their computing power (thousands of servers around the world), put them in a unique position to do statistical machine translation. In other words, machine translation is, for them, a way to generate another revenue stream from their data and their infrastructure. So, take no man's word for it.

Actually, to return to the subject of marketing, taking someone's word for something when that person works for a software company is highly risky, even within the same company. Years ago, I worked for a CAD/CAM software publisher. They set out to develop and market a state-of-the-art design tool to try to take the lead in the industry. At a meeting where the development team presented the features of the product, they used animation to show what Designer, the new product, could do. The marketing team was impressed. "So that's Designer," they said. "Yes," answered the development team. The trouble is that the two teams didn't mean the same thing: the marketing team thought they had seen the finished product in action; the development team meant that this was their as yet unreached goal. The misunderstanding led the marketers to oversell a product that didn't yet exist, a serious mistake that they paid a very high price for in the end. So, really, in technical areas, take no man's word for it.

Wednesday, 20 April 2011

Thoughts on the future of translation technology

A panel discussion took place recently at MemoQFest on the subject "Has Translation Technology a Future?".
What struck me about the panel's remarks was that:
1. they were primarily about tools; even the discussions of standards such as TMX and XLIFF were tool-oriented;
2. no one dealt with the important fact that statistical reporting differs from tool to tool, even for translation data supplied in the same standard format.
Here are my thoughts on these two issues:
1. The tool-oriented discussion. Even though the title of the discussion leads us first of all to tools, it is, I think, an error to consider tools outside the methods-tools-procedures relationship. The future of the technology we use depends on how we use the tools according to methods and procedures, not just on the tools themselves and the functionality they provide. And as far as standards are concerned, what about the method- and procedure-oriented ones that exist for our industry, such as EN 15038? How will the tools evolve along with these in the future? In particular, how are we going to use the tools in our costing methods and procedures? This brings us to my next point.
2. Statistics. The commercial model of our industry has been based entirely on word counts - and specifically on segment match percentages - for 20 years now. It is a paradigm. The obvious fact is that a statistical algorithm is needed to produce these figures from aligned source- and target-language files, and discounts for fuzzy matches, context matches and the like are only possible if the persistent data from the iterations of whatever algorithm is being used are stored in a format that allows us to use them again. So we not only need open or standard exchange formats, we also need to know what criteria the various algorithms use, because they don't produce comparable results.
However, the real problem with rates based purely on segment matching and word-count statistics is that they don't necessarily represent the cost of a translation. Statistical data are only one aspect of the actual cost, which is better represented by the time taken for each stage in the translation process, because other factors can double or triple it: subject field, quality of writing (is it, for example, clear?), the availability of relevant terminological data, document sensitivity (is the document internal or external, marketing or reporting?) and so on. Translation cannot be reduced to Adam Smith's pin factory. Nor can it be reduced to human resource management, as last year's keynote speaker seemed to think. It is more like agribusiness, where you have everything from factory farming to organic farming, from fast food to slow food, from cafeterias to gourmet restaurants, from hypermarkets (...) to small specialty shops.
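To make the comparability point in (2) concrete, here is a minimal sketch of how two equally defensible similarity measures score the same segment pair differently. Both measures are purely illustrative; neither is any vendor's actual fuzzy-match algorithm.

```python
# Illustration only: two plausible ways to compute a "match percentage"
# for the same translation-memory segment give different figures.
from difflib import SequenceMatcher

def char_match(a, b):
    """Character-based similarity, as difflib computes it."""
    return 100 * SequenceMatcher(None, a, b).ratio()

def word_match(a, b):
    """Word-based similarity over tokenised segments."""
    return 100 * SequenceMatcher(None, a.split(), b.split()).ratio()

tm_segment = "Click the Trust Center Settings button in the main window."
new_segment = "Click the Trust Center Settings button in the dialog window."

print(f"character-based match: {char_match(tm_segment, new_segment):.0f}%")
print(f"word-based match:      {word_match(tm_segment, new_segment):.0f}%")
```

The two figures differ, so a fuzzy-match band in one tool is not automatically the same discount band in another, which is exactly why the criteria behind the statistics need to be known.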
I think that the current paradigm is running out of steam. More and more translators and LSPs are realizing that it is an abstraction that is no longer entirely satisfactory.
Finally, given that Daniel Brockmann, one of the driving forces behind this paradigm (which has allowed our industry to grow rapidly ...), attended MemoQFest, wouldn't his participation in this discussion have made it more representative? Perhaps he didn't want to take part. If so, too bad. I, for one, am looking forward to a future discussion where all the actual issues are dealt with.