Natural reading

TL;DR

Google Translate is a helpful tool, but the results can be misleading… especially before 2016

Have you ever used Google Translate? When we have a basic understanding of a foreign language but have trouble with a particular sentence, Google Translate may hint at the structure of that sentence. The result it churns out can serve as a reference from which we piece together the real meaning. Yet we all know that machine-generated text can't be taken as ground truth; our own discretion is still required.

After the introduction of neural network technology in 2016, the translations suggested by Google Translate (let's call it GT for short, since it will come up repeatedly in this article) became ever more precise and human-like. However, it still yields errors from time to time, especially when:

  • A sentence is taken out of its context
  • A particular concept doesn’t exist in the target language
  • The same sentence is to be translated into multiple target languages

Here's an example of a sentence that could make GT go awry, taken from a manga I have been reading for a while. The context is that the invincible main character was asked by her friends about self-defense moves. She said:

護身ったって、そんなカラまれるコトないからな

This sentence can be roughly translated as "Self-defense moves, huh? Well… I've never been caught up in such a situation (that I'd need self-defense), though." But what GT yields is "Even if you defend yourself, you can't get caught like that." (as of Apr. 30th, 2023) Japanese is highly context-dependent: to fully understand a sentence, we have to consider when it is spoken and WHY it is spoken. That is, we have to infer the intention of the speaker. GT lacks information about the context and the ability to conjure one, thus yielding bizarre translations like this one.

Things get even more absurd when we ask GT to translate the same sentence into yet another language, say Traditional Chinese. Here it goes: "即使你為自己辯護,你也不能這樣被抓住", which basically means "Even if you defend yourself, you can't get caught like that." But the sense of the word "defend" in the Chinese translation is "to act as attorney for" rather than "to ward off danger or attack" as in the original sentence. Apparently, when I asked GT to translate a Japanese sentence into Chinese, it first translated the sentence into English, then translated the English sentence into Chinese, losing track of the original meaning, about which GT had no idea in the first place.

GT finds the statistical relationship between words within the same language or even across languages, but it has no idea about what this relationship means.

That's why we sometimes find GT results hilarious. In fact, before 2016, the results generated by the old version of GT were almost always hilarious. The sentences were passable and the meaning was more or less conveyed, but the wording and arrangement were just outright weird. While reading the sentences, we could almost hear a robot speaking a language it could barely grasp. It was just awkward.

In my Master’s degree study, I looked into the manifestation of this awkwardness in our brains.

Readability: an emergent property of texts

We might all agree that texts generated by GT are not particularly readable, but it's not easy to articulate what factors contribute to the "readability" of a text. While I'm writing this article, a "Readability analysis" panel to the right of my WordPress editor keeps track of the readability of my writing. It suggests that the following features may lead to low readability:

  • Containing a lot of hard or unusual words
  • Containing a lot of sentences in passive voice
  • Containing a lot of long sentences/paragraphs/sections
  • Lacking variability in the way sentences begin
  • Not using enough transition words
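
Most of the features above are surface statistics that can be computed directly from text. As a rough illustration (not the actual algorithm behind the WordPress panel; the function name, thresholds, and the vowel-group syllable heuristic are my own simplifications), a sketch might look like this:

```python
import re

def surface_readability_features(text):
    """Crude surface features of the kind readability tools track.

    Illustrative sketch only: real readability analyzers use
    dictionaries, syllable counters, and tuned thresholds.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    # Crude syllable estimate: count groups of consecutive vowels.
    def syllables(word):
        return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

    return {
        # Long sentences hurt readability.
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # "Hard" words approximated as words with 3+ syllables.
        "hard_word_ratio": sum(syllables(w) >= 3 for w in words)
                           / max(len(words), 1),
        # Fraction of sentences longer than 20 words.
        "long_sentence_ratio": sum(
            len(re.findall(r"[A-Za-z']+", s)) > 20 for s in sentences
        ) / max(len(sentences), 1),
    }
```

Note that each feature is a property of the whole text, not of any single word, which is exactly the point made below.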

Obviously, none of these is a problem of any single word or sentence. A rare word like "floccinaucinihilipilification" is just that, rare. We wouldn't say that the word itself has low readability. The same roughly goes for sentences. A single sentence in passive voice? No problem. A single long sentence? Probably fine. But if a long sentence is also convoluted in clause structure, we'd say it's not very readable. Like this example sentence I found while searching for "long and convoluted sentences":

The life story to be told of any creative worker is therefore by its very nature, by its diversion of purpose and its qualified success, by its grotesque transitions from sublimation to base necessity and its pervasive stress towards flight, a comedy.

Note that the words in this sentence are not outrageously hard, but the way they are strung together makes for a complicated sentence. So, basically,

Readability is an “emergent” property that lies not in any single element, but in how these elements are combined.

Therefore, to study the neural effect of readability, I couldn't use individual words or sentences as my experimental material. Instead, I adopted whole articles of the sort people encounter in everyday life. In other words, I conducted my experiment with "natural stimuli" (more about natural stimuli can be found here).

Reading, as naturally as possible, during a functional magnetic resonance imaging (fMRI) scan

In this experiment, I looked not only at the effect of readability (resulting from different sources of translation) but also at the genre of the texts: fiction vs. news. Consequently, each of my participants read four types of articles:

  • Human-translated fiction
  • Machine-translated fiction
  • Human-translated news
  • Machine-translated news

The articles were chosen from well-established English publications with reliable official Traditional Chinese translations: fictional articles were selected from Reader's Digest, and news articles from The New York Times. The original English versions were fed to the then pre-neural-network GT to generate the machine-translated versions (FYI, this study was done during 2014–15).

Here are some examples of the English version, the human-translated version, and the machine-translated version of a fictional article and a news article. For simplicity, I only show the first paragraph of each article. If you happen to know Traditional Chinese, you can experience firsthand the difference between human and machine translations.

Fictional article (English): “Where Frogs Quack”

Fictional article (Human-translated): “青蛙都是「呱呱呱」?”

Fictional article (Machine-translated): “青蛙都是「呱呱呱」?”

News article (English): “Creatures That Hide in the Ocean”

News article (Human-translated): “海洋動物藏身於無掩蔽處”

News article (Machine-translated): “海洋動物藏身於無掩蔽處”

During the experiment, participants silently read such passages during an fMRI scan. Although the articles were natural reading materials, I didn't want participants to read completely naturally, which would involve a lot of regressions (re-reading previous sentences) and might result in drastically different reading times across participants. Therefore, I designed a scrolling reading interface that constrained participants' field of vision to the current line plus or minus two lines.

An example of the interface used to present reading material to participants. Shown in this example is a machine-translated fictional article.

This design ensured that participants were generally reading the same part of the article at the same time, enabling the analysis adopted for this experiment.
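
The core of that constrained view is simple windowing logic. The sketch below is my own minimal reconstruction of the idea (the function name and parameters are hypothetical, not the actual experiment software): given the line currently being read, only that line and its two neighbors on each side are visible.

```python
def visible_window(lines, current_index, span=2):
    """Return the lines visible at one scroll position.

    Mimics the constrained-view reading interface described above:
    the current line plus/minus `span` lines, clipped at the
    beginning and end of the article. Illustrative sketch only.
    """
    start = max(current_index - span, 0)
    end = min(current_index + span + 1, len(lines))
    return lines[start:end]
```

As the text scrolls, `current_index` advances at a controlled pace, so every participant sees (roughly) the same five-line window at the same time.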

Inter-subject correlation revealed the neural difference between translation styles and genres

If a bunch of people experienced the same sequence of events at the same pace, at least somewhere in their brains, the neural response time-course should be very similar. For example, when different people listen to the same song, the time-course (or waveform) in their primary auditory cortex should be almost the same — thus highly correlated. That’s because the primary auditory cortex more or less faithfully represents incoming audio signals, that is, the song.

This is the idea underlying the “inter-subject correlation (ISC)” analysis.

In so-called “higher-order” regions in the brain, the neural time-courses are not replications of the perceived events. Instead, they may track the feelings, interpretations, or predictions for the events a person is experiencing. Because feelings, interpretations, and predictions can vary across individuals, how correlated the time-courses are between people may reflect how universal the response to an experience is. If a story is very explicit and obviously happy, we may observe high inter-subject correlation in these higher-order regions. On the contrary, if a story is ambiguous, the inter-subject correlation value can be low. More about inter-subject correlation and related analysis methods can be found here.
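
The core computation can be sketched in a few lines. Below is a minimal leave-one-out ISC for a single brain region, assuming the data are already preprocessed and temporally aligned; this is an illustration of the idea, not my actual analysis pipeline (the function name and array layout are my own for this example; toolboxes such as BrainIAK provide full implementations):

```python
import numpy as np

def inter_subject_correlation(timecourses):
    """Leave-one-out inter-subject correlation for one region.

    `timecourses` is an (n_subjects, n_timepoints) array, e.g. the
    fMRI signal from one voxel or region for every participant.
    For each subject, correlate their time-course with the average
    of everyone else's, then average the resulting r values.
    """
    timecourses = np.asarray(timecourses, dtype=float)
    n_subjects = timecourses.shape[0]
    rs = []
    for i in range(n_subjects):
        # Average time-course of all subjects except subject i.
        others = np.delete(timecourses, i, axis=0).mean(axis=0)
        rs.append(np.corrcoef(timecourses[i], others)[0, 1])
    return float(np.mean(rs))
```

If everyone's signal follows the stimulus identically, ISC approaches 1; if responses are idiosyncratic (as one might expect for an ambiguous story), the correlations hover near 0.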

I applied this inter-subject correlation analysis to the neuroimaging data collected during natural reading. My hypothesis was that there should be a difference in the degree of inter-subject correlation between translation styles, and between genres.

And such a difference I found.

Human-translated articles yielded greater synchrony between readers than machine-translated articles. This effect resided in the right hemisphere, in a region called the "precuneus", which has been associated with social cognition, i.e., the ability to imagine other people's mental states.

This is the right hemisphere, facing us with its medial surface: we are viewing the brain from its left-hand side with the left hemisphere rendered transparent. The more yellow the color, the higher the inter-subject correlation for human-translated articles relative to machine-translated articles.

So I probably found the neural/mental effect of the low readability of machine translations: they give people a hard time understanding the mental states of the characters involved, or forming thoughts of their own about the events described. Even when the articles are really simple, people just don't understand them in the same way as one another.

As for the difference between genres, I found greater inter-subject correlation for fictional articles than for news articles. This again accords with common wisdom, though this effect lies not in the precuneus but mostly in the primary visual cortex (which faithfully reflects the visual input) and the anterior temporal cortex (temporal pole), which is usually associated with the integration of meaning during reading. As for what this indicates, interested readers are welcome to read my published work.

Revisiting this study, done in 2014, after so many years (it's 2022 as I write this), I find much to improve in the design and analysis methods. But to my knowledge, this work is still among the first to apply "natural stimuli" and "inter-subject correlation" techniques to the study of reading. Thus, I'm still proud of it, and thankful to my Master's thesis advisor, Dr. Fa-Hsuan Lin, for letting me play with ideas previously unheard of in the lab.

Related scientific publications

  • Lin, F.-H., Liu, Y.-F., Lee, H.-J., Chang, C. H. C., Jaaskelainen, I. P., Yeh, J.-N., & Kuo, W.-J. (2019). Differential brain mechanisms during reading human vs. machine translated fiction and news texts. Scientific reports
  • Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject Synchronization of Cortical Activity During Natural Vision. Science
  • Hasson, U., Malach, R., & Heeger, D. J. (2010). Reliability of cortical activity during natural stimulation. Trends in Cognitive Sciences
  • Honey, C. J., Thompson, C. R., Lerner, Y., & Hasson, U. (2012). Not Lost in Translation: Neural Responses Shared Across Languages. Journal of Neuroscience
  • Bedny, M., Pascual-Leone, A., & Saxe, R. R. (2009). Growing up blind does not change the neural bases of Theory of Mind. Proceedings of the National Academy of Sciences