Lost in translation: Facebook’s royal translation error

Photo by Jon Russell, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

On 28 July 2020, Thai PBS, a widely followed media outlet, posted a live stream of the candle-lighting ceremony celebrating His Majesty the King’s birthday on its Facebook page.

The caption of the live stream said:

“[Live] Candle-lighting ceremony to celebrate the birthday of HM the King on July 28, 2020 at 6.45 PM”

However, a glitch in Facebook’s translation tool changed the words “King’s birthday” to “King’s Memorial Day” in the Thai translation.

Expecting words celebrating the King’s birthday, many people were furious to see language used to commemorate a death instead, with some even calling for the resignation of Thai PBS’s executives.

Consequently, Thai PBS published a statement the next day, passing the blame to the social media giant. Facebook swiftly admitted fault, issued a “profound apology” to the Thai people, and temporarily turned off auto-translation from English to Thai while the tool was fixed.

So, how did Facebook get the translation so wrong? 

Translation AI cannot speak royal Thai

Facebook’s translation AI learns language from experience. In the case of Thai, the tool has not yet learned enough royal vocabulary to understand the “royal version” of the word “birthday” (วันเฉลิมพระชนมพรรษา).

So, it picked another royal word most closely associated with “candle-lighting ceremony,” which is “Memorial Day.” This translates more literally as a death anniversary (วันคล้ายวันสวรรคต).

Though this specific instance seems like a simple translation faux pas, the more significant issue is understanding how Facebook’s AI made this mistake and how the AI actually “learns from experience.”

Facebook has long used AI to translate users’ posts, and the tool has become more sophisticated over the years thanks to investment in new technology.

In 2017, Facebook upgraded from a simple dictionary-like tool that translated posts word for word to a more sophisticated AI tool that considers the context of a post before translating it.

The AI uses what are known as “long short-term memory” (LSTM) neural networks, which aim to partially replicate the human brain’s short-term memory, albeit with the advantages of being a machine.
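
For readers curious about what a “long short-term memory” unit actually computes, here is a minimal sketch of one step of a textbook LSTM cell, reduced to single numbers instead of vectors. It is not Facebook’s code. The function name `lstm_step`, the weight layout, and the example weights are all assumptions made for illustration. The point is that the cell carries a memory value `c` from word to word, and three “gates” decide how much of it to forget, update, and reveal.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, weights):
    """One step of a scalar LSTM cell (textbook form, hypothetical weights).

    x       -- current input (for example, a number standing in for a word)
    h_prev  -- previous output of the cell
    c_prev  -- previous memory (the "cell state")
    weights -- dict: gate name -> (input weight, recurrent weight, bias)
    """
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))       # how much old memory to keep
    write = sigmoid(pre_activation("input"))         # how much new content to store
    reveal = sigmoid(pre_activation("output"))       # how much memory to expose
    candidate = math.tanh(pre_activation("candidate"))  # proposed new content

    c = forget * c_prev + write * candidate  # updated memory
    h = reveal * math.tanh(c)                # output passed to the next step
    return h, c

# Illustrative weights only; a real model learns these from data.
EXAMPLE_WEIGHTS = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.8, 0.2, 0.0),
    "output": (0.6, 0.3, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:  # a toy "sentence" of three inputs
    h, c = lstm_step(x, h, c, EXAMPLE_WEIGHTS)
    print(f"input={x:+.2f}  memory={c:+.4f}  output={h:+.4f}")
```

Because the memory `c` is carried forward at every step, what the cell outputs for a later word can depend on words it saw earlier, which is what lets such a model consider context.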

How the AI works, in a nutshell: it keeps a data bank of sentences in the source language and their translations, with each matched pair called a “word pair.” The memory of each word pair is updated over time based on new data collected as users interact on, and with, the platform.

To date, Facebook has been able to leverage its massive user base to collect billions of word pairs, now spanning over 2,000 translation directions. This collection of word pairs allows the AI translation tool to refine its translations, making the output read more naturally than word-for-word translations.

Unavoidably, some words in the Thai language still have no straightforward translation, and this is where the tool runs into a problem. When there is no precise contextual translation from Thai to English, Facebook’s AI defaults to choosing the word (or words) with the closest alignment based on historical inputs.

28 July is when this method revealed its flaws. Since ordinary Thai people rarely use royal language in everyday interactions, there are only minuscule samples of royal Thai vocabulary for the AI to learn from. So, the tool did not know these words yet, and instead of turning itself off to prevent a disaster, it displayed a very untimely and controversial mistranslation.
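
The “closest alignment” behavior described above can be sketched with a toy example. This is only an illustration under loud assumptions: the real system is a neural network, while this sketch uses a tiny hand-made lookup table (`MEMORY`), a simple word-overlap score (`jaccard`), and a hypothetical function, `closest_translation`. None of these names or data come from Facebook. The sketch shows two things: why a tool with no example of a royal birthday can land on the death-anniversary term, and how a minimum-score cutoff would let it decline to translate instead.

```python
# Toy "translation memory": context words seen in training -> Thai term.
# Every entry is an illustrative assumption, not Facebook's data.
MEMORY = [
    # Royal death anniversary, often seen alongside candle-lighting ceremonies
    ({"candle-lighting", "ceremony", "king", "remembrance"}, "วันคล้ายวันสวรรคต"),
    # Everyday (non-royal) word for birthday
    ({"birthday", "cake", "party"}, "วันเกิด"),
]

def jaccard(a, b):
    """Word-overlap score between two sets: shared words / all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closest_translation(context, memory, min_score=0.0):
    """Return the stored term whose context best overlaps `context`.

    Returns None when even the best match scores below `min_score`,
    i.e. the tool abstains rather than guessing.
    """
    best_score, best_term = max(
        ((jaccard(context, seen), term) for seen, term in memory),
        key=lambda scored: scored[0],
    )
    return best_term if best_score >= min_score else None

caption = {"candle-lighting", "ceremony", "king", "birthday"}

# No cutoff: the death-anniversary entry wins with a score of 3/5 = 0.6,
# beating the everyday birthday entry at 1/6.
print(closest_translation(caption, MEMORY))                 # วันคล้ายวันสวรรคต

# With a cutoff of 0.8, the best match is not good enough, so nothing is shown.
print(closest_translation(caption, MEMORY, min_score=0.8))  # None
```

The design choice worth noticing is the cutoff: always returning the nearest match guarantees some output, but it also guarantees a confident-looking answer when the tool has effectively never seen the word.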

Thai is a tough nut to crack

In fairness, Thai is a difficult language to translate directly for a few reasons, and this applies to computers and humans (native Thai speakers) alike.

First, in complete Thai sentences, individual words are not separated by spaces. So, it is understandable that the AI struggles to identify the correct words when there is little context. A simplified example is the phrase “ตากลม,” which can mean either “round eyes” or “drying in the wind.” Like so many others, this phrase cannot be easily distinguished even by human readers without seeing other words in the same sentence, mainly because of the lack of spaces between words.
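
The ambiguity of “ตากลม” can be reproduced in a few lines. The sketch below assumes a four-word toy dictionary (`TOY_DICTIONARY`) and a hypothetical helper, `segmentations`, that lists every way to cut a string into dictionary words. Real Thai word segmenters are far more elaborate, but they face the same fork.

```python
# Toy dictionary: an assumption for illustration, not a real Thai lexicon.
TOY_DICTIONARY = {
    "ตา": "eye",
    "กลม": "round",
    "ตาก": "to dry",
    "ลม": "wind",
}

def segmentations(text, dictionary):
    """Return every way to split `text` into words found in `dictionary`."""
    if not text:
        return [[]]  # one way to split the empty string: no words at all
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in dictionary:
            # Keep this word, then split whatever remains.
            for rest in segmentations(text[end:], dictionary):
                results.append([prefix] + rest)
    return results

for words in segmentations("ตากลม", TOY_DICTIONARY):
    glosses = [TOY_DICTIONARY[word] for word in words]
    print(" + ".join(words), "->", " + ".join(glosses))
# ตา + กลม -> eye + round
# ตาก + ลม -> to dry + wind
```

Both splits are valid according to the dictionary, so nothing in the five characters themselves says which one the writer meant. Only the surrounding sentence can settle it.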

Second, the Thai language often uses spaces, rather than punctuation such as commas and periods, to break sentences and to separate items in a list. This creates an additional challenge for the AI because identifying separate sentences frequently requires an understanding of entire paragraphs.

Lastly, the volume of data available for AI training in the Thai language is quite small compared to other languages translated by AI on Facebook. The most extensive data set available for the Thai language machine learning tool contains one million word pairs, which is tiny compared to the 40 million word pairs available in French.

Now, add some rarely used Thai royal vocabulary into the mix. It should now make more sense why Thai is one of the most challenging languages for this tool to translate accurately without any human oversight.

Google Translate users are probably familiar with this as well. Passages automatically translated into Thai usually appear stilted and, at times, incoherent.

Next-gen translation AI

Last month, the results of a new generation of AI called GPT-3 started to surface. The software is being developed by OpenAI, a research lab co-founded by Elon Musk. It is the third iteration of the lab’s machine learning model specialized in natural language processing.

To date, the results are impressive. The new AI translates most phrases fluently, and its “wow factor” is that it can also write essays, poems, and even computer code on its own.

The rapid growth of this tech is possible because GPT-3 is trained on a data set that is orders of magnitude larger than its predecessor’s.

To illustrate the scale, the whole of English Wikipedia accounts for only 0.6% of the total data the AI learns from, and it was designed to train on text from across the internet.

Hopefully, this means the new generation of AI will be capable enough to tackle the complexities and nuances of translating the Thai language, royal and otherwise.

This article was originally published at https://thisrupt.co/current-affairs/facebook-royal-translation-error/ on August 14, 2020 and authored by Kanop Soponvijit, the owner of this blog.