Friday, September 18, 2015


There's a difference between computer-assisted translation (CAT), also called machine-assisted translation, and machine translation.

CAT Tools: SDL Trados, WordFast

Translation Management Systems: XTRF,

On machine translation:

Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel.[12] He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.[13] Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.
Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.[citation needed]
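To make the "shallow" idea concrete, here is a minimal sketch of my own, not taken from the quoted article. It scores each sense of an ambiguous word by how often the neighbouring words were seen with that sense. The co-occurrence counts, the sense labels, and the function name `guess_sense` are all made up for illustration; a real system would learn the counts from a large corpus.

```python
from collections import Counter

# Hypothetical co-occurrence counts: how often each context word appeared
# near each sense of "bank" in some imagined training corpus.
SENSE_CONTEXT_COUNTS = {
    "bank/finance": Counter({"money": 8, "loan": 5, "account": 6, "deposit": 4}),
    "bank/river": Counter({"river": 7, "water": 5, "fishing": 3, "shore": 2}),
}


def guess_sense(sentence, target, window=3):
    """Pick the sense whose known context words best match the target's neighbours."""
    tokens = sentence.lower().split()
    position = tokens.index(target)
    # Look only at the words immediately around the ambiguous word.
    before = tokens[max(0, position - window):position]
    after = tokens[position + 1:position + 1 + window]
    neighbours = before + after
    # A Counter returns 0 for unseen words, so unknown neighbours add nothing.
    scores = {
        sense: sum(counts[word] for word in neighbours)
        for sense, counts in SENSE_CONTEXT_COUNTS.items()
    }
    return max(scores, key=scores.get), scores


sense, scores = guess_sense("she opened an account at the bank to deposit money", "bank")
print(sense, scores)
```

Nothing here understands the text. The guess is only as good as the counts, which is why this kind of approach fails on cases like the prisoner-of-war camp example quoted below.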
Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:
Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.[14]

The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.

Machine translation seems to work well when the original document is written in a "controlled language", but that requires a certain style of writing:

"The Limits of Machine Translation"

When you translate a law, a job application, a fire emergency instruction, a military order, or a medical prescription, you do not want the translation to be “fairly clear” and “almost accurate,” but clear and accurate. As two more recent researchers dryly commented, “a 95% system in the worst case produces a translated text analogous to a jar of cookies, only 5% of which are poisoned” (Carbonell and Tomita, 1987, p. 69).

Thursday, September 17, 2015


"Every minute of every day over 200 million emails are sent, nearly 600 websites are created, and some 48 hours of video are uploaded on YouTube. Most of that content is multilingual – nearly three-quarters of internet content is in languages other than English."

Translation is big business, and it’s growing bigger every day. The worldwide language services market has grown at an annual rate of 8% for the past 3 years. Estimates for 2014 put the world market at nearly US$40 billion, according to research by Common Sense Advisory. The Centre for Next Generation Localisation reports that localization is the 4th fastest-growing industry in the United States.

The Globalization and Localization Association is comprised of members worldwide who specialize in localization, translation, internationalization, and globalization. Every day they help companies, non-profit organizations, and governments communicate effectively to global audiences. They do this by making sure the content of their clients’ communications is culturally sensitive and presented in languages that their audiences understand.

Internationalization (known as "i18n" for short):  Its main purpose is to make sure that the source content is ready to go into multiple languages. This means i18n occurs at the beginning of content and product development, not after the content is ready for translation.

Localization (sometimes referred to as “l10n”): The aim of localization is to give a product the look and feel of having been created for the target market and to eliminate, or at least minimize, local sensitivities.

Until the 1990s, translators used the same tools they had been using for the previous 2000 years. The means of writing had changed, but for the most part, manual steps were required for almost everything. The same could be said for those who requested translations—the client.
Note that the quality issues of MT blur the drafting/revising boundary. One top search engine recently produced the following translation for me from Arabic: “O uterine poor and God will not lose anything to you said Amen like emoticon Admin: Hafid utsav.” This requires more than revising.


So with a 100% match, the TM program essentially produces the draft (66% of the work). The work of the translator in revising that draft would therefore be the remaining 33%. So a rate of 33% of the no-match rate is, on its face, what the market has been paying translators for this work. The value of the efficiency gain would seem to be the word count multiplied by 66% of the conventional per-word rate. That’s one theory, anyway.

Divvying up the gain

So how would the value of the efficiency gain be divided? When TM was novel, clients sometimes offered nothing for a 100% match. In the same vein, some translators gave no discount for TM. Agencies sometimes did both: charging the client full price while asking translators to provide 100% matches for free.

That was then. The atomized nature of the translation market may impede the flow of information, but word does get out. Translators and agencies compete by offering a share of those efficiency gains to clients, and arbitrage drives prices to clients down. Over- or under-discounting freelance translators notice drops in their bottom line. And agencies who send texts to translators with 100% matches removed and non-matching segments arrayed with the underlying flow missing find translators demanding higher rates.

In a market without other disruptions, such as oversupply or undersupply of translators, these efficiency gains might be shared equally, each party getting a third. 
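The arithmetic in the passages above can be sketched in a few lines. This is my own illustration, not the author's: the 1,000-word count and the 0.12 per-word rate are invented, and naming the three parties as client, agency, and translator is my reading of "each party getting a third".

```python
# Share of the work the quoted text attributes to drafting (roughly two thirds).
DRAFT_SHARE = 0.66


def efficiency_gain(matched_words, rate_per_word, draft_share=DRAFT_SHARE):
    """Value of the drafting work that the TM does on 100% matches."""
    return matched_words * rate_per_word * draft_share


def split_equally(gain, parties=("client", "agency", "translator")):
    """Divide the gain evenly between the named parties."""
    return {party: gain / len(parties) for party in parties}


# Hypothetical job: 1,000 words of 100% matches at 0.12 per word.
gain = efficiency_gain(matched_words=1000, rate_per_word=0.12)
shares = split_equally(gain)
print(f"gain: {gain:.2f}")
for party, share in shares.items():
    print(f"{party}: {share:.2f}")
```

With these made-up numbers the gain comes to 79.20, or 26.40 for each party. Any oversupply or undersupply of translators would shift the split away from equal thirds.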


"A translator working on a complete, non-repetitive text without TM—I’ll call this bespoke translation—translates differently. Rotely translating the same sentence the same way twice (or more) degrades readability, encouraging readers to skip text, so a bespoke translator is aware of rhythm, pacing, and logic. 

TM pushes in the opposite direction: it loves one-to-one correspondence and handling of segments in isolation. And many texts—procedure-heavy manuals are an obvious example—benefit from describing the same task identically each time. Indeed, such language is not intended to be linearly read; it is to be consumed when needed."

"TM programs have increasing awareness of context. A group of 100% matches occurring in the same sequence as a previous source is likely to be usable as is. In this case, the value of the translator’s revising task drops further because the value of the drafting performed by the TM has risen beyond 66%. The percentage hits bottom when a client genuinely wants the translator or agency to do absolutely nothing with a 100% match. The compensation to the translator is now a nuisance charge to visually process and make translate/no-translate decisions for every such segment, tasks that make demands on working memory, a limited resource. (See Neil Betteridge’s 2009 translation of Torkel Klingberg’s The Overflowing Brain: Information Overload and the Limits of Working Memory (Oxford: Oxford University Press).)"

And yet. Good clients actually pay more, not less. As localization professional Anna Schuster reports, firms that provide the most useful, extensive TMs—developed over years by double-checking high-quality translations contributed by skilled translators—discount the least. They need top quality, so they value experienced professional translators. Such clients may produce style guides, references, glossaries. They know translators read closely and encourage translators to report issues.  They believe bilingually sensitive revising has an edge over monolingual editing that can justify the cost.

About Agile Software Development process:
(as opposed to waterfall models)
About "Agile Localization":

interesting point on "reducing translation waste"

Rule 4 – “Reduce, Reuse and Recycle”

Localization can generate a lot of waste if not planned properly. So, it is key to become “green” in order to become more “agile”.


It is clear that reducing the localization effort will have a positive impact on a team’s agility. This could be achieved in 2 ways: by validating the localization scope and by reducing the translation waste generated during the localization process.
  • Reducing Localization Scope
The Localization Manager’s job is to ensure the company localizes the right product and content into the right language set. At Adobe, we have had situations in which we were localizing too much content. For example, using Adobe’s Digital Marketing Suite, we discovered that Russian customers prefer reading Development documentation (such as API descriptions) in English rather than in Russian. We were able to save a lot of time and cost by removing this component from our localization requirements.
Similarly, through market research, we discovered that most Middle-Eastern Creative Suite customers prefer to use an English user interface with Arabic or Hebrew documentation. This combination makes English content such as videos and tutorials more accessible to them.
In short, tracking web analytics and engaging with customers, power users, pre-release testers and geos constitute a great way to validate the localization requirements and improve agility.
  • Reducing Localization Waste
Once the localization requirements are confirmed, it is key to limit the translation waste generated during the localization process. This obviously impacts the translators’ work but also the bandwidth of the localization staff.
Sweep Away Waste! An effective way to reduce localization waste is by understanding its root cause. At Adobe, we categorize all localization defects through a common set of keywords, which provides us with a good picture of the issues faced across products. We can then develop solutions to reduce, if not eliminate, these defects.
Localization waste sometimes originates from English strings (assuming English is the source language). Indeed, translations created before English strings get finalized will need to be revisited and will likely generate some waste.
In the agile world, we can’t afford that extra time, so it is important to validate the English content before handing it off to the translators. Doing something as simple as spell checking can help to reduce a lot of localization waste. In a product such as InDesign, about 3% of the English user interface strings are updated once they get reviewed for spelling and grammatical mistakes. For a product that is localized into 25 languages, this represents a waste equivalent to 75% of a single language scope!
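The 75% figure follows from simple multiplication: each corrected English string has to be retranslated once per target language. A small sketch of that calculation, with a function name of my own choosing:

```python
def wasted_scope(changed_fraction, language_count):
    """Rework caused by late source changes, in units of one language's full scope."""
    return changed_fraction * language_count


# 3% of the English strings change after review; the product ships in 25 languages.
print(f"{wasted_scope(0.03, 25):.0%}")  # prints 75%
```

The same function shows why the waste grows with the language set: at 40 languages, the same 3% of late changes would cost more than one full language scope.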
Also, many of the software localization testing activities are necessary because localization is happening out of context. Solving that problem can tremendously speed up the localization process. In an ideal world, localization should be a product feature that allows translators to translate the user interface in-context. Facebook did a great job in this area by enabling translators (in this case its user community) to translate and provide feedback within the application itself. Alternatively, translators should be provided context information through builds, screenshots or meta-data information (e.g. developer comments, feature name, expected delivery time, etc.).
To reduce waste, it is also recommended that localizers develop glossaries, style guides and tools that leverage previous localizations.
Ultimately, it’s critical for translators to validate their work as they translate. That way, activities down the production line can be eliminated or reduced, which makes the entire process more agile.


Reuse when it makes sense! Reusing strings can sometimes be a source of challenging defects in software localization, so it has to be handled carefully. For example, the English string “none” could be translated as “aucun” or “aucune” in French based on the gender of the noun to which it refers.
That said, reusing strings – in the same context – could also help to improve agility, since these strings won’t need to be translated multiple times.
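One common way to reuse strings safely is to key each translation on the source string plus a context label, so that "none" is shared only where the context is the same. The sketch below is my own illustration with a made-up catalogue and context names; it is not how Adobe's tooling works. The same idea appears in gettext as message contexts (`msgctxt`).

```python
# Hypothetical French catalogue keyed by (context, source string).
FRENCH = {
    ("filter", "none"): "aucun",    # refers to the masculine noun "filtre"
    ("texture", "none"): "aucune",  # refers to the feminine noun "texture"
}


def translate(context, source, catalogue=FRENCH):
    """Return the translation for this context, or the source string if none exists."""
    return catalogue.get((context, source), source)


print(translate("filter", "none"))
print(translate("texture", "none"))
print(translate("layer", "none"))  # no entry yet, so it falls back to English
```

Falling back to the source string keeps a missing context visible in testing, rather than silently reusing a translation with the wrong gender.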
An area where Adobe has experienced positive results with reducing and reusing English content is in our instructional content. In documentation, Adobe relies on Acrolinx to control the quality of the English (source) content. Authors need to use a certain authoring style (e.g. shorter sentences) and are encouraged to leverage existing paragraphs (e.g. legal disclaimers). This improves consistency in the English documentation and has the great benefit of reducing the localization workload too.
Similarly, DITA (read Reduce, Reuse and Recycle: Developing a Reuse Strategy for DITA) and Content Management Systems such as Adobe Experience Manager (formerly known as Day CQ) are designed to reuse/share content across multiple channels and publications.


Recycling is the process of transforming existing materials (or waste) so that they can be reused, sometimes for a totally different purpose. Creating polar fleeces from used plastic bottles or insulating walls using old denim jeans are classic examples of recycling.
Such transformations can apply to translations too. Translators don’t need to translate every sentence from scratch. Translation technologies such as Translation Memories and Machine Translation engines can help translators recycle previous translations and speed up the translation process. At Adobe, we have experienced dramatic productivity gains when we used these technologies. In general, a translator supported by these technologies will deliver in an hour what other translators would deliver in a day. These are impressive gains that contribute to localization agility too.
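As a rough picture of how a translation memory recycles earlier work, here is a minimal sketch of a fuzzy lookup. It is my own illustration. The memory contents and the 0.75 threshold are invented, and Python's `difflib.SequenceMatcher` stands in for whatever matching algorithm a real TM tool uses.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously translated English -> French segments.
MEMORY = {
    "Click Save to store your changes.": "Cliquez sur Enregistrer pour stocker vos modifications.",
    "Select a file to open.": "Sélectionnez un fichier à ouvrir.",
}


def best_match(segment, memory=MEMORY, threshold=0.75):
    """Return (stored source, stored translation, score) for the closest segment, or None."""
    best_source, best_score = None, 0.0
    for source in memory:
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they diverge.
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None  # nothing close enough: translate from scratch
    return best_source, memory[best_source], best_score


print(best_match("Click Save to store your changes."))  # exact (100%) match
print(best_match("Click Save to store your change."))   # fuzzy match, needs revising
print(best_match("Quarterly revenue rose."))            # no usable match
```

An exact match can often be used as is, while a fuzzy match gives the translator a draft to revise. That is where the productivity gain comes from.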