One of the complaints I often heard, long before I started working at Google as a linguist on Natural Language Processing, was that results on Google Translate for a number of African languages were terrible.
As an early adopter of translation technology, I agreed, because I had run into many of the same problems.
At the time, the internet never seemed suitably adjusted to my language, Yorùbá, the way it was to other, more prominent ones. Google Translate (and before it Babel Fish) did better at translating text from French or German or Spanish into English, and vice versa, than it did with Yorùbá or Hausa or Swahili. It began to seem as if there was a conspiracy to deny African and other minority languages the same pride of place in the new electronic universe.
There were other things that drove home the point over time. When I was a student at the University of Ìbàdàn in the early 2000s, there was no way to type Yorùbá text in Microsoft Word, at least not in any way that could combine the diacritics placed above and below the vowels, a fundamental part of writing Yorùbá correctly. As a tone language forced into the limited frame of the Latin script, Yorùbá has had to accept a compromise: adapting itself to tools not created to deal with its peculiarities. Unicode was of no help. So a surname like mine, Ọlátúbọ̀sún, would have had to be written as either Olátúbòsún (which is not the same name, since o and ọ are different vowels in Yorùbá, contrasted in their horizontal tongue direction) or Ọlatubọsun, which uses the right vowels but is toneless and thus without meaning. A few software packages could solve the problem, but they were either expensive or messed up the font.
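To make the three spellings of the surname concrete, here is a minimal Python sketch using the standard `unicodedata` module. It assumes present-day Unicode combining marks (dot below U+0323, acute U+0301, grave U+0300). The helper name `strip_marks` is hypothetical and is not taken from any of the tools mentioned in this piece. It shows only that dropping the under-dots or the tone marks yields a different string each time; it says nothing about how any particular word processor or font handled the input.

```python
import unicodedata

DOT_BELOW = "\u0323"  # under-dot: distinguishes the vowel in "ọ" from "o"
ACUTE = "\u0301"      # high tone mark
GRAVE = "\u0300"      # low tone mark


def strip_marks(text, marks):
    """Remove the given combining marks, then recompose what is left.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


# The full name, built from base letters plus combining marks.
name = unicodedata.normalize(
    "NFC", "O\u0323la\u0301tu\u0301bo\u0323\u0300su\u0301n"
)

# Tones kept, under-dots lost: the wrong vowels.
without_underdots = strip_marks(name, {DOT_BELOW})

# Under-dots kept, tones lost: the right vowels, but toneless.
without_tones = strip_marks(name, {ACUTE, GRAVE})

for spelling in (name, without_underdots, without_tones):
    print(spelling)
```

The three printed strings are all distinct, which is the point of the paragraph: each shortcut produces a different word rather than a rough version of the same one.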
This problem is part of why it has been hard to find properly written and formatted Yorùbá texts on the internet. The means to write the language were absent, and a generation raised on that dearth of resources never became competent in writing it. (The dearth, in any case, predates the internet: it goes back to the early eighties, when the economic downturn in Nigeria ensured that all the publishing houses that had specialised in Yorùbá and other indigenous-language publications folded up.) So when the internet came along, ostensible speakers of the many local languages around the country could write properly only in English and other European languages. Today, to drive home the point, about 55% of the web is in English and Africa is almost totally absent, mostly because of the lack of sufficient tools to write its languages. (On Twitter today, for example, each diacritic on a typed Yorùbá word is counted as one character, rather than as an appendage, thus reducing how much space one has to express oneself.)
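The counting issue in the parenthesis above can be illustrated with a short Python sketch. This is not a description of how Twitter counts characters; it only shows, under the assumption that a counter works on Unicode code points, how the number of stored code points can exceed the number of letters a reader sees. `visible_letters` is a hypothetical helper and a rough approximation: it treats every non-combining code point as one letter, which works for this example but is not full grapheme segmentation.

```python
import unicodedata

# The surname from earlier in this piece, written as base letters plus marks.
word = "O\u0323la\u0301tu\u0301bo\u0323\u0300su\u0301n"


def code_points(text, form):
    """Number of code points after normalizing to the given form."""
    return len(unicodedata.normalize(form, text))


def visible_letters(text):
    """Approximate letter count: code points that are not combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return sum(1 for ch in decomposed if unicodedata.combining(ch) == 0)


print(visible_letters(word))     # 10 letters on the page
print(code_points(word, "NFC"))  # 11: one vowel still needs a separate tone mark
print(code_points(word, "NFD"))  # 16: every diacritic stored as its own code point
```

A ten-letter name can therefore cost anywhere from eleven to sixteen units against a limit, depending on how the text is stored and counted.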
One of the things we did when I started work on a multimedia dictionary of names at YorubaName.com in 2015 was release free tone-marking software for Mac and Windows, which became the first (and probably still the only) free tool of its kind for Igbo and Yorùbá. When I later began working at Google, I also helped set up a mobile version through the Gboard app. I have done a few more things in that direction over the years.
When I worked on the Journalistic Style Guide for BBC Yorùbá and Igbo, which began broadcasting in Nigeria shortly afterwards, one of the things I wrote into it was the insistence that the Service write these languages in the appropriate orthography, not just for the benefit of users, many of whom might be new learners, but also for the benefit of future technology. Siri, Google Home, and Amazon Echo don’t exist in many of our languages because, among other reasons (some connected to commercial motivations), the work it would take to create them is complicated by the absence of good, usable online corpora in these languages.
Google Translate is not run by linguists but by engineers and neural networks, which use millions of texts taken from the internet to find patterns. So, ironically, the poor quality of the translations the machine produces is related to the poor input it gets from speakers/users of these languages online.
And so, when linguists insist that more people who speak Yorùbá (or Igbo, Edo, Hausa, Fulfulde, etc.) use their languages more on the internet, where most of us now spend much of our time, we are trying to revitalise the languages. Speaking them to our children is, of course, one of the major ways to keep the languages alive. But using them online, and making them adaptable to modern technologies in many different ways, is of equal (if not greater) importance. If all of our business will be conducted using technology in the future, then whichever language is not present in these technologies is effectively on its way to destruction.
- Kọ́lá Túbọ̀sún, founder of YorubaName.com, is a linguist, scholar, and creative writer. In 2016, he was awarded the Premio Ostana, a prize for mother tongue writing and advocacy given by Chambra D’Oc in Italy, becoming the first African to be so honoured. He lives in Lagos.