Finding an optimised keyboard layout for Malagasy [ongoing]

The keyboard layout used by most Malagasy speakers puts anyone who wants to write in Malagasy at a huge disadvantage. It is impossible to write quickly in the language without straining the hand muscles. A typical Malagasy sentence is quite often longer than its French equivalent because of word length. Depending on the text sample, it may be anywhere from 7% longer (compare the first 10 verses of Chapter 1 of the Gospel of John) to 20% longer for more complex texts. A text that took 10 hours to write in French will take 11 to 14 hours in Malagasy. At the scale of a company, or even a country, that is a huge waste of time, mostly due to a legacy that has lost all its relevance, as keyboards do not have the same constraints as typewriters.
To tell you my story: since I got my Samsung tablet, I’ve almost never used the default Samsung keyboard. So what do I write my text messages with? I use my own keyboard layout; I’ll show you why and how.

A quick review of Malagasy usage

Before I get to the point, let’s see what my fellow Malagasy citizens type their Malagasy text with:

Fig. 1: AZERTY keyboard, made by the French as an imitation of the American QWERTY

This, ladies and gentlemen, is the layout currently used and known by most of the 24 million people in Madagascar. Needless to say, their fellow citizens who have emigrated to France also use it.
The problem is that this layout is not suitable for Malagasy. At all.
Fig. 2: Heat map on an AZERTY keyboard used to type in Malagasy.

The heat map above was generated from the Malagasy version of the Rainilaiarivony Wikipedia article. As a Wikimedia contributor, I had the pleasure of typing it… on an AZERTY keyboard. It was a real pain, and it felt like putting in a lot of effort only to end up with less than the English version I was translating from.
Fig. 3: On an AZERTY keyboard, when typing in Malagasy, your left pinky travels A LOT
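
The raw numbers behind a heat map like the one in Fig. 2 are just per-key hit counts. Here is a minimal sketch of that counting step in Python; the function name `key_hits` and the short sample sentence are mine, chosen for illustration, and this is not the code the heat map tool actually runs.

```python
from collections import Counter


def key_hits(text: str) -> Counter:
    """Count how often each letter key is hit, ignoring case,
    digits, punctuation and whitespace."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


# Assumption: a single short Malagasy sentence stands in for the
# full Wikipedia article used in the post.
sample = "Raha ny fantatro ankehitriny dia mbola tsy tanteraka"
hits = key_hits(sample)

# Print the five hottest keys, most frequent first.
for key, count in hits.most_common(5):
    print(key, count)
```

Feeding a whole article into `key_hits` and colouring each key by its count is enough to reproduce the general shape of such a map.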

That is also felt by my fellow citizens, many of whom have picked up bad writing habits such as SMS-style abbreviations. That habit is sometimes taken so far that an inexperienced reader may find a text written in that SMS style difficult or even impossible to read.
Even though most people browse the Web in French or English far more often than in Malagasy, using the QWERTY/AZERTY layouts is a pain, even if this is all we have, and even if this is what most people will know. Even if it’ll never have the success of the traditional layouts, I’ll give my two cents for a layout optimised for the Malagasy language.

Solutions

To offset this strong disadvantage in typing speed for Malagasy, I’d been using the German Neo keyboard layout. This was already a good alternative to the QWERTY, which I’d been using for 4 years, but it was still sub-optimal, as my left pinky rests on a letter that is never used in Malagasy, my mother tongue.

Fig. 4: German Neo Layout (see: neo-layout.org)

While looking for a solution to my problem I discovered patorjk.com. From a given text, this website calculates which keys are hit most often while the text is typed. From those keys’ positions, a rating is given. One third of that rating comes from the distance your fingers have moved, one third from how you use your fingers, and one third from how often you have to switch fingers and hands while typing. The higher the rating, the less your hands have to travel to type the text; so mechanically you’d be less tired typing the text on an optimal keyboard than on a standardised one.
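
To make the one-third weighting concrete, here is a minimal sketch in Python. It only shows the equal-weight combination described above. The function name `layout_rating`, the 0-100 scale and the sample sub-scores are assumptions of mine; how the website derives each sub-score is not modelled.

```python
def layout_rating(distance_score: float, finger_score: float,
                  switch_score: float) -> float:
    """Combine three sub-scores with equal weights (one third each).

    Assumption: each sub-score is on the same 0-100 scale, higher
    being better. The real tool's scale may differ.
    """
    for score in (distance_score, finger_score, switch_score):
        if not 0 <= score <= 100:
            raise ValueError("sub-scores are expected in the 0-100 range")
    return (distance_score + finger_score + switch_score) / 3


# Hypothetical sub-scores for two imaginary layouts.
print(layout_rating(40, 55, 60))
print(layout_rating(70, 65, 75))
```

With equal weights, a layout cannot compensate for a long travel distance with good hand alternation alone: each of the three aspects moves the final rating by the same amount.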
So for our Rainilaiarivony text, here are the ratings for the keyboards:
Fig. 5: Layout ratings

The loser here is clearly the AZERTY, used by most of my fellow citizens. The standardised Dvoraks are good candidates for typing Malagasy, and maybe we should consider those keyboards since they are widely supported in modern operating systems.
Here is what the programmer Dvorak looks like:
Fig. 6: Programmer Dvorak Keyboard

Setting the Malagasy Optimised Layout

First version (7 November 2016)

The Dvorak score was impressive at first sight, but the Dvorak was not the optimal layout for Malagasy. The one the algorithm found optimal was the following:

Fig. 7: Algorithmically generated layout from patorjk.com (some keys’ positions have been frozen for practicality)

That layout looks pretty decent, but the keys are arranged a bit messily. On the basis of that keyboard, the German Neo and the arrangement of a number of standard ergonomic keyboards, I came up with the following layout:
Fig. 8: My own keyboard (the Malagasy Keyboard)

I’ve rerun the analysis of the same Rainilaiarivony article on that keyboard and a couple of others. Here are the ratings:
Fig. 9: Ratings of the Malagasy keyboard layout on the basis of the Malagasy version of the Rainilaiarivony article

Well, to say the least, it looks like I’ve done much better than what the algorithm managed to find. I’m pretty sure the layout I’ve designed is not very far from the perfect Malagasy-optimised Dvorak. Let’s go further into the report and look at the row usage comparison.
Fig. 10: Row usage comparison.

Yes, the AZERTY is an absolute typist’s horror when it comes to Malagasy.
The home row use rate of our Malagasy keyboard is not very far from that of the optimal/personalised layout generated by the algorithm.
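
Row usage is easy to estimate yourself. The sketch below, in Python, computes the share of letter hits that fall on the top, home and bottom rows of three standard layouts. The function name `row_usage` and the one-sentence sample are assumptions for illustration; punctuation keys are left out, and a single sentence is far too small to be representative.

```python
from collections import Counter

# Letter rows (top, home, bottom) of three standard layouts.
# Punctuation keys are deliberately left out.
LAYOUTS = {
    "AZERTY": ("azertyuiop", "qsdfghjklm", "wxcvbn"),
    "QWERTY": ("qwertyuiop", "asdfghjkl", "zxcvbnm"),
    "Dvorak": ("pyfgcrl", "aoeuidhtns", "qjkxbmwvz"),
}


def row_usage(text: str, rows: tuple[str, str, str]) -> dict[str, float]:
    """Return the share of letter hits that fall on each row."""
    hits = Counter(ch for ch in text.lower() if ch.isalpha())
    per_row = [sum(hits[ch] for ch in row) for row in rows]
    total = sum(per_row)
    names = ("top", "home", "bottom")
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: n / total for name, n in zip(names, per_row)}


# Assumption: one Malagasy sentence as a stand-in for a real corpus.
sample = "Raha ny fantatro ankehitriny dia mbola tsy tanteraka"
for name, rows in LAYOUTS.items():
    usage = row_usage(sample, rows)
    print(f"{name}: home row {usage['home']:.0%}")
```

Much of the difference comes from the letter A, by far the most frequent letter in the sample: it sits on the top row in AZERTY and on the home row in QWERTY and Dvorak.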

Version 2 (13 November 2016)

Fig. 11: Hot keys on the second attempt.

Well, after a few days of testing the keyboard layout from the first attempt, I felt that some re-tuning of the optimised keyboard was mandatory. That meant moving some keys to get the hot ones (the ones I have to hit most to type my text) right under my index finger and my right middle finger. Since the left-hand fingers almost always type vowels, I’ve kept them on the home row as much as possible, unless you want to type some foreign words, in which case you’ll have some gymnastics to do.
Fig. 12: Finger usage of various keyboards.

As shown in Fig. 12, the total number of hits in the Rainilaiarivony article is distributed as follows: ~53% for the left hand and ~47% for the right hand. This excludes the thumb hitting the spacebar.
Fig. 13: Second attempt’s rating.

We’re getting better. Though the article is the same, I’ve switched to selecting the article from its HTML form. Since working on the same article over and over again may introduce some bias, I’ve tried using some text samples from the Sarasara Tsy Ambaka.
I took quite a large text sample (containing ~260,000 characters). It took a while to process, but it removes much of the bias related to the Rainilaiarivony article. The results still make our Malagasy optimised keyboard the best-rated of the layouts compared for the Malagasy language (cf. Fig. 14).
Fig. 14: Layout ratings comparison.

I have to note that the calculated optimised layout gets closer and closer to the one I’ve designed, at least for the home row. Have a look:
Fig. 15: The calculated layout. Looks a bit familiar, right?

As of this second version, we have a fairly optimised layout for the Malagasy language, i.e. you’ll gradually type faster as your hand muscles get used to the new layout. Even for typing other languages such as French, this layout surpasses the AZERTY, as the latter was initially designed to avoid the jamming of typewriters.

My conclusions

I can never say it enough: the AZERTY keyboard is the absolute worst keyboard to type Malagasy with. Even the QWERTY does better. The Dvorak is a pretty good candidate for a widespread “more ergonomic” layout thanks to its presence in all modern widespread operating systems, but there is better.
Even though the French have designed the BÉPO layout for their language, it has failed to replace the omnipresent, inherited and slow AZERTY layout. There is only one person I know who uses it on a daily basis. We also have to bear in mind that BÉPO has been around since 2008, while the Klavie Malagasy (“Malagasy Keyboard”) has only just been written about, on 7 November 2016. Heavy as it is, the legacy left by AZERTY is highly likely to remain in use in Madagascar for decades, as long as keyboard typing exists, even though we know full well that the AZERTY layout is totally unsuitable for writing French, let alone Malagasy.
Right now I’m typing this article in English on a QWERTY keyboard. I’m planning to translate it to Malagasy as it gets more complete in order to reach more of the target audience.
I’ve already implemented that layout on my tablet so I’ve got all the time I need to adapt my fingers from the old Neo layout to the new Klavie Malagasy.

Updates

v2.1 as of 19 December 2017

A PDF file containing the test corpus is now attached. A slightly better version has been proposed in the comments (thanks Ian!); even though it scores lower than v2.0, it has the really awesome idea of putting the T on the home row.
To better track all the changes, the project now has its own repository on GitHub. Long live open source!

Resources

Five ways to enrich Wiktionary

Since 2010, I’ve been contributing to the Malagasy Wiktionary.
It has become a habit now: every month, every week, every day, and almost every morning and evening, I turn on the web browser to check what’s going on on Wiktionary, and what I can do to add further content.
Some days, I get so interested in adding some pieces of information that I feel like writing a program to add them within the next few hours.
And some days, I don’t feel like contributing, and then I just look at the recent changes to check whether pages have been vandalised in my absence, or whether some pages have been fixed by other users.
Still, there are several ways to contribute to Wiktionary. Here are five of them:
(1) Write pages manually. This is the most basic yet most tedious work to do. This is how everyone starts, and this is how most of us will probably contribute for the next 30 years. In 2045, Wiktionary or even Wikipedia in its current form will probably become obsolete or be self-editing.
Before this happens, you’ve got to put in a lot of work. Still, you can increase your efficiency by learning to write code, then:
(2) Write a program that writes pages that you may need to fix. Simple: for the last three years, I’ve been concentrating on how to do this. But as time passes a lot of pages get created, and even with a low error rate, you end up with thousands of pages of potentially wrong information. OK, but you also end up with even more pages of correct information. Coupled with a synonym dictionary and advanced NLP, you can have it write definitions of words that can’t be translated directly into the target language.
(3) Write a program that reads newspapers to find the words to be created. With a very complete dictionary it gets difficult to find missing words. You won’t have the will to read dozens of newspaper articles every day, so have a program read them and find all the missing words for you. After that, write a program to detect all compound words and add them to Wiktionary if you feel like it. The next level of this kind of program would be an almost-real-time word scraper which analyses a text flow from e.g. Twitter and lists all missing words at the end of the day.
Learning to code is one thing, but knowing which pieces of information to add is another. Whenever you have an idea, or interesting lexicographic datasets before your eyes, get coding and add those bits of information to Wiktionary. Do so in compliance with copyright laws.
(4) Navigate through dictionaries and add exotic words. Passionate about word etymology? Are you learning a language? Do the words not exist in Wiktionary? Feel free to add them. Always do so in compliance with copyright laws. A compilation of several dictionaries and definitions may count as original work, but never copy word definitions verbatim. I did this once and almost got sued after a complaint from a copyright owner. If you feel you’re good enough at AI and NLP, write a program to reformulate and translate the sentences.
Code is strong, code is powerful. It takes a lot of time to write good code. It takes a lot of time to become good at coding, and not everyone feels like learning it. So what to do?
(5) Contribute to your native-language Wiktionary. English aside, Wiktionary is written in 170 different languages. A huge number of them have fewer than 100,000 pages. Malagasy, my native tongue, has 3.75 million, thanks only to my efforts in trying to create the biggest Malagasy dictionary that has ever existed. If your native language is English, get interested in other languages and add new words in them, be it on the English Wiktionary or elsewhere. What, you are not passionate about languages? Add obscure English slang terms then.
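
The missing-word finder from item (3) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `missing_words`, the tokenising regular expression and the tiny `known` set are hypothetical, and a real run would load the dictionary's full word list instead.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by an apostrophe or
# a hyphen (as in Malagasy "amin'ny" or "an-doha").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def missing_words(text: str, known: set[str]) -> Counter:
    """Count the words of `text` that are absent from the `known` set."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return Counter(w for w in words if w not in known)


# Hypothetical mini-dictionary; a real one would hold every entry title.
known = {"ny", "teny", "sy"}
article = "Ny teny malagasy sy ny teny vaovao"
for word, count in missing_words(article, known).most_common():
    print(word, count)
```

Sorting candidates by frequency puts the words that appear most often in the press at the top of the to-do list, which is usually where a human review is best spent.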

African language Wikimedia projects summary

A few months ago I wrote an article which summarises my history on the Malagasy Wiktionary, and more generally my history on Malagasy language Wikimedia projects.
I am back to write a short summary of the current progress of African-language WMF projects. In this article you’ll learn about the current state of African-language projects and their trend.
In terms of community size, the biggest African-language community is the Afrikaans Wikipedia community, followed by the Egyptian Arabic and Swahili communities.
Looking closer at the statistics, the award goes to the Afrikaans Wikipedia community, which has 7 to 8 very active contributors (performing more than 100 edits per month).
The Egyptian Arabic Wikipedia community counts 2 to 3 very active contributors, which is big for an African language but very small compared to the Standard Arabic community, which counts more than twenty times as many (83 very active users in June 2013), most of them Egyptian contributors.
As for Swahili, the number of very active users is one to two; over a 2-year period it averages 1. But the number of active users (i.e. those making more than 5 edits per month) is 9 on average, which is a fine thing for a language spoken in countries where internet access is quite difficult.
These numbers are averages over July 2011 to June 2013, which smooths out short-term variations.
In terms of raw article count, the biggest African-language Wikimedia projects are the Malagasy Wiktionary (which currently counts 2.5 million articles, smaller only than the English one and bigger than the French!), the Malagasy Wikipedia (40,000+ articles) and the Yoruba Wikipedia (30,000+ articles), followed by the Afrikaans and Swahili Wikipedias (27,000+ and 25,000+ articles respectively).
The Malagasy Wiktionary became very big for reasons you can read here; the Malagasy Wikipedia is big thanks to geography articles (~20,000 articles) and celestial objects (~8,000 articles); the Yoruba Wikipedia is made big by articles about people and also celestial objects (~15,000 objects).
Many Wikimedians who consult the statistics should know that the number of content pages does not determine the quality or the comprehensiveness of an encyclopedia. Judging wikis by article count is like judging a book by its cover, and many readers and critics know that looking at the cover is not enough to judge a novel. Here, by raw size, the Malagasy language dominates in the two biggest projects (Wikipedia and Wiktionary), but that doesn’t mean it has a very active community.
To judge the quality, comprehensiveness and completeness of the articles of such wikis, it is better to dive into this kind of statistics, where scores are given according to the presence or absence of vital articles and the size (number of characters) of those articles, if they exist. Such statistics are better than article count and page depth, which can be inflated by the use of bots and the generation of tons of non-article pages (talk pages, subpages, redirects…).
According to the List of Wikipedias by sample of articles, the best-scored African-language Wikipedia is the Afrikaans Wikipedia, which ranks 58th, then the Swahili Wikipedia (79th), followed by the Egyptian Arabic, Yoruba and Somali Wikipedias. The Malagasy Wikipedia is quite far behind and ranks 155th, higher only than the Lingala (161st), Wolof (175th) and Shona (187th) Wikipedias, which have fewer than 5,000 articles. This means that article count is only the cover of the book, and some effort has to be made to make the Malagasy Wikipedia more comprehensive.
What about the trend?
Less than a year ago, some Wikipedias found a way to grow in number of articles thanks to species databases. The first ones I saw grow this way are the Winaray and Cebuano Wikipedias. The Winaray Wikipedia gained 100,000 articles primarily thanks to low-quality geography stubs (consisting of one or two sentences), and secondarily thanks to articles about animal and plant species, bringing it to 510,000 articles. Cebuano has grown more than tenfold in article count within the last 50 weeks, from 40,000 to more than 500,000 articles. This mania for creating articles about species has spread to the Swedish and Dutch Wikipedias; the Dutch one has recently surpassed the German Wikipedia, and in response the German Wikipedia seems to have boycotted it by deleting the link to the Dutch Wikipedia from its main page.
Now let’s write about the growth trend of African language Wikimedia projects. First off, let’s talk about Wikipedias, then Wiktionaries and finally other «minor» Wikimedia projects.

Wikipedia language edition   Current article count   Growth (in 300 days) (1)
Malagasy                     40,619                  +2,415
Yoruba                       30,624                  +582
Afrikaans                    27,801                  +3,928
Swahili                      25,368                  +1,232
Amharic                      12,722                  +1,015
Egyptian Arabic              10,764                  +1,939
Somali                       2,830                   +383
Lingala                      2,035                   +118
Kinyarwanda                  1,816                   +7
Kabyle                       1,517                   +778
Wolof                        1,172                   +49
Kongo                        826                     +135
Northern Sotho               688
Igbo                         739                     +44
Zulu                         586                     +22
Setswana                     496                     -1
Bambara                      392                     +6
Siswati                      368                     +6
Ewe                          302                     +12
Hausa                        291                     +17
Oromo                        276                     +36
Tigrinya                     259                     +2
Tsonga                       250                     +7
Sango                        204                     +17
Kirundi                      192                     +8
Sesotho                      189                     +44
Akan                         179                     +17
Fulfulde                     166                     +12
Luganda                      166                     -2
Twi                          157                     +12
Chamorro                     157                     +6
Xhosa                        151                     +10

(1) Calculated following this site, data retrieved on 26 July 2013.
On Wikipedia, growth is slow compared to languages spoken in developed countries, where Internet access is easy and inexpensive for the ordinary citizen. The African-language Wikipedia with the biggest community grows at approximately 5,000 articles per year, which is fairly high compared to Swahili, whose growth is about three times lower. If the current trend continues, the Afrikaans Wikipedia will surpass the Yoruba Wikipedia next year, and the Malagasy Wikipedia in the next 2 years, as the two current biggest Wikipedias are stagnating in article growth.
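
For readers who want to redo the projection, here is a minimal sketch in Python that extrapolates the 300-day growth figures linearly. The function name `days_to_overtake` is mine, and the constant-growth assumption is a strong simplification: real growth is rarely that steady.

```python
from typing import Optional


def days_to_overtake(chaser: int, chaser_growth: int,
                     leader: int, leader_growth: int,
                     window_days: int = 300) -> Optional[float]:
    """Days until `chaser` catches `leader`, assuming both keep the
    growth observed over the last `window_days` (linear extrapolation).
    Returns None if the chaser is not growing faster than the leader."""
    gap = leader - chaser
    if gap <= 0:
        return 0.0
    closing_rate = (chaser_growth - leader_growth) / window_days
    if closing_rate <= 0:
        return None
    return gap / closing_rate


# Article counts and 300-day growth taken from the table in this post.
print(days_to_overtake(27_801, 3_928, 30_624, 582))    # Afrikaans vs Yoruba
print(days_to_overtake(27_801, 3_928, 40_619, 2_415))  # Afrikaans vs Malagasy
print(days_to_overtake(27_801, 3_928, 40_619, 0))      # Malagasy not growing at all
```

The Yoruba crossover comes out well under a year either way. The Malagasy one is very sensitive to the assumption: it only lands in the range of a few years if the Malagasy Wikipedia is treated as fully stagnating, and takes much longer if its own recent growth is counted.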
On smaller Wikipedias, the trend is positive, though slow. All open Wikipedias have more than 100 articles.
Among African-language Wiktionaries, the biggest is the Malagasy Wiktionary, whose growth is sustained by Bot-Jagwar. Owned by myself, Bot-Jagwar runs from the cloud, so it works regardless of the health of my computer and my internet connection. Thanks to it, the Malagasy Wiktionary gains 300 to 500 content pages daily. Automation eases many things in many ways, but automated processes can fail. So I have to keep an eye not only on the source code but also on the entries generated by that source code.
African-language Wikipedias are slowly but surely gaining articles as time passes. There seems to be a moratorium on closing African-language Wikipedias, and this is fine because languages mainly spoken in developing countries need time to develop a community. Furthermore, the official language in these countries, especially African ones, is very often not the local language.

Kurzweil Curve showing growth of computing power. It shows that all human brains can be simulated by 2050. What about having billions of “virtual” contributors on Wikipedia in 2050? Source (kraxinglogic.com)

An increase in bot-made articles (which nowadays constitute 20% of articles created on Wikipedia) may indicate that in the near future, perhaps in 25 or 30 years, a bot will be able to write articles as humans do. This is because Ray Kurzweil predicts that simulating the human brain will be possible in twelve years, and because the computing power of today’s computers was that of supercomputers in the 1990s.
What about me? Well, it’s been a while since my last big article on the Malagasy Wikipedia. And according to the list of Wikipedias by sample of articles, several hundred of the articles needed in all Wikipedias are missing, so my first goal for Wikipedia is to fill these gaps, slowly but surely. I prefer contributing about geography, but as I am the only contributor to the wiki, I have to fill gaps a bit everywhere: biography, chemistry, sports, etc. At that pace, I can barely create three or four articles per day, but I can complete the list of 1,000 articles that every Wikipedia should have within the year.
It’s been a while since the last time I blogged in Malagasy, so this article will be followed by a Malagasy-language article. Perhaps a translation of this one, perhaps a new one.
Useful resources
To read further about what’s mentioned here.

  1. The law of Accelerating Returns by Kurzweil
  2. http://www.wikistatistics.net for all statistics about Wikimedia projects


Search on Google using Python scripts

What about a free, unlimited Google API? In the past, Google provided such a thing, but it is now definitely deprecated (due to abuse?). The new Search API costs money ($5 per 1,000 queries), and the free API is limited to 100 queries per day. Without any money, you won’t get far. After learning that, I dropped the project… until I started contributing to Wiktionary!

Extracting words from Malagasy daily newspapers for the Malagasy Wiktionary wasn’t actually an easy thing to program. The first version of the script can only parse RSS feeds, and is very slow compared to what I am used to, because it loads approx. 400,000 words at each launch.
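
For illustration, the RSS-parsing step can be sketched with Python's standard library alone. This is not the Bot-Jagwar code: the function name `rss_item_texts` and the sample feed below are made up, and the sketch assumes a plain RSS 2.0 feed with `<item>`, `<title>` and `<description>` elements.

```python
import xml.etree.ElementTree as ET


def rss_item_texts(rss_xml: str) -> list[str]:
    """Return 'title description' text for every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    texts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        texts.append(f"{title} {description}".strip())
    return texts


# Made-up feed for the example; a real script would download it first.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Gazety ohatra</title>
<item><title>Lohateny voalohany</title><description>Vaovao fohy</description></item>
<item><title>Lohateny faharoa</title></item>
</channel></rss>"""

for text in rss_item_texts(SAMPLE_FEED):
    print(text)
```

The extracted text can then be split into words and checked against the known word list. Loading that list once into a `set` keeps each lookup cheap, even if the initial load of some 400,000 words remains the slow part.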
While doing that work, I noticed that plenty of words are actually compound words. This gave me an idea: check beforehand on Google Search whether a word exists or not. From the 1,300 roots contained in the Malagasy Wiktionary, I can potentially make 1.7 million words by combining two of them, 2.2 billion with three, and about 2.8 trillion with four. That is enormous, and even at full speed I will never be able to look them all up: at 5 queries per second (the fastest rate I’ve ever had) it would take 4 days with two roots, 14 years with three and 177 centuries (17,700 years) with four. This is the first reason why I decided to try hacking Google Search to see whether a word combination has already been used.
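
The arithmetic above is easy to reproduce. The short Python sketch below counts ordered combinations with repetition allowed (1,300 to the power n) and converts them into search time at 5 queries per second; the function names are mine, and a year is taken as 365.25 days.

```python
ROOTS = 1_300            # roots in the Malagasy Wiktionary, per the text
QUERIES_PER_SECOND = 5   # fastest rate reported in the text
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def candidates(roots: int, n: int) -> int:
    """Number of ordered n-root combinations, repetition allowed."""
    return roots ** n


def search_days(roots: int, n: int, rate: float = QUERIES_PER_SECOND) -> float:
    """Days needed to query every n-root combination at `rate` per second."""
    return candidates(roots, n) / rate / SECONDS_PER_DAY


for n in (2, 3, 4):
    days = search_days(ROOTS, n)
    print(f"{n} roots: {candidates(ROOTS, n):,} candidates, "
          f"{days:,.1f} days (~{days / DAYS_PER_YEAR:,.1f} years)")
```

The four-root duration comes out slightly above the figure quoted in the text, which starts from the rounded 2.8 trillion rather than the exact power.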

First, I looked at the page source, and it is very, very complicated to understand. I even think that the page was generated by a bot, as the HTML tag names are not written in a human language. I also tried to use the URL, but it is actually very, very long, with parameters that look more like hashes and keys (?), and that cannot be traced back as they don’t explicitly appear in the main page form. At first sight, this kind of project is likely to fail…

I found on the Web a post describing how to use Google Search without any API. But there was a problem: the discussion is almost three years old, and since then the search engine has visibly been changed: it is quite probable that a Google employee reported that discussion, leading the company to take adequate measures. When I ran the downloaded script, nothing worked: no results were returned for any search. I am still keeping an eye on the downloaded script, and I am trying to find something that can solve this problem. At least the script saved me from spending hours and hours reinventing a (square) wheel.

Once this problem is solved, at least temporarily, the source code will be released on SourceForge: Bot-Jagwar. It will rapidly become outdated, so if there are people willing to update the script, they’ll be welcome :).

The KDE desktop in Malagasy

As far as I know at present, the translation of the KDE desktop into Malagasy is still not complete. I looked at it on the official KDE localization website, and the languages I found dominating there are European ones: Portuguese, English, French… So I decided to translate the desktop into Malagasy myself. I know there is still a lot of work to be done, and that time is getting short. Even so, it is better to move things forward than to leave them where they are and wait for them to be done by others who may never come.

As for the software I use for the translation, I use Lokalize. It is the one I find easy to use.

An introduction to Kriollatino language

Kriollatino is an agglutinative language, which means that it uses a system of affixes to derive words, rather than using unrelated words as English often does. There is no grammatical case inflection, but particles do the same work.

Kriollatino uses a modified Latin alphabet of 35 letters (37 if we include the whole 26-letter classical Latin alphabet), including á, í, ú, ù, é, ė, ó, þ, đ, ś and ć. Each letter represents one sound, but changes can be made to ease pronunciation.

So, here are a few simple sentences to start speaking the language:

Benveno! : Welcome!

Hao : to say hello, in the general case. Followed by a proper or common noun.
Bondio : another way of greeting. Can also be used for farewells.
Bonvivo : to wish someone a long life. Used when leaving someone for a long time or forever.
Mi apele Johano. E tu, tu apele ki ayo? : My name is Johano. And you, what is your name?

To learn more words in Kriollatino, here is a participative multilingual dictionary in which words of any language are translated and/or explained in Kriollatino. Just to give you an idea of the current status of the language: at this time the dictionary has only translated a few English words (word list).

Kriollatino, a constructed language: chapter one

This idea is not new; it has been brewing in my head for a long time: creating a new language. For me personally, this constructed language, which bears the name “Kriollatino”, was not meant to be an easy-to-learn international language, even though that is among the things I wish for it. The real reason for creating Kriollatino was for it to be the official language of a large country in the Pacific Ocean. That country is, of course, invented as well. The first rough dictionary made for the language dates back about five years; Kriollatino was already its name at that time. It was a language somewhat similar to Spanish but with its own identity, and even harder to learn than Spanish, so, because I had no time to finish it and fill in its dictionary, I dropped the idea and turned to other things.

Even so, the idea was not abandoned entirely: I kept simplifying the grammar concerning verbs (abolishing irregular verbs) and common nouns (giving nouns a single gender and two numbers), and adding new prefixes and suffixes that give the language its own identity.

Judging mainly by its vocabulary, the language looks like Esperanto. But when you read somewhat longer definitions and longer texts written in the language, you see that it is completely different.

At present I am still writing the dictionary and the grammar, which are not finished yet because the shape of the language changes often as new ideas come up…