[Mediawiki-i18n] An update on localisation in MediaWiki (2008)

Siebrand Mazeland s.mazeland at xs4all.nl
Thu Jan 1 15:54:40 UTC 2009

This is a follow-up to the e-mail I sent on 31 December 2007[1].

First things first, because not everyone reads e-mails completely:
* MediaWiki localisation (that is, the translation of English source messages
  into other languages) depends on you! If you speak a language other than
  English, care about this and like translating, go to http://translatewiki.net,
  become a user and start contributing translations for MediaWiki messages and
  extension messages. When your localisation is complete, keep coming back
  regularly to re-complete it and do quality control. Thank you in advance for
  all your contributions and effort.
* The i18n and L10n area of MediaWiki requires continuous effort. If this area
  of FOSS interests you, please do not hesitate to offer your
  skills to further MediaWiki's i18n and L10n capabilities[5,6].

All statistics are based on MediaWiki 1.14 alpha, SVN version r45277
(1 January 2009). Comparisons are to MediaWiki 1.12 alpha, SVN version
(31 December 2007).

* Localisation or L10n - the process of adapting the software as closely
  as possible to a specific locale (in scope)
* Internationalisation or i18n - the process of ensuring that an application
  is capable of adapting to local requirements (out of scope)

MediaWiki has a user interface (UI) definition for 348 languages (up from last
year's count). Of those, at least 26 language codes are duplicates and/or serve
a purpose for usability[2]. Reporting on them, however, is not relevant, so
MediaWiki in its current state supports 322 languages (up from 302). To be
able to generate statistics on localisation, a MessagesXx.php file must be
present in languages/messages. There are currently 326 such files (up from
262), of which 27 are redirects from the duplicates/usability group or are
empty[3]. So MediaWiki has an active in-product localisation for 299 languages
(up from 236).

The MediaWiki core product recognises several collections of localisable
content (three of which are defined in messageTypes.inc):
* 'normal' messages that can be localised (2168 - up 26% from 1726)
* optional messages that can be localised, which usually only happens for
  languages not using a Latin script (173 - up 7% from 161)
* ignored messages that should not be localised (149 - up 49% from 100)
* namespace names and namespace aliases (17 - no change)
* magic words (132 - up 10% from 120)
* special page names (86 - up 13% from 76)
* other (directionality, date formats, separators, book store lists, link
  trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting covers
the normal messages only.

MediaWiki is more than just the core product. On
http://www.mediawiki.org/wiki/Category:All_extensions some 1200 extensions
(up 60% from 750) have some kind of documentation. This analysis is limited
to the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk.
The source code repository contains give or take 370 extensions (up 61% from
230). Of those 370 extensions, about 300 contain messages that can be shown
in the UI in some use case (debugging excluded). Out of those 300, about 35
have an exotic implementation for localisation support, no localisation at all
(just English text in the code), or are outdated, broken or obsolete.
Compared to last year, when there were about 5 different 'standard'
implementations of i18n in extensions, the situation has changed a lot. The
vast majority of extensions now make use of $wgExtensionMessagesFiles and
wfLoadExtensionMessages. Currently some 6,000 messages for extensions can be
localised in a consistent way (up 200% from 2,000).
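The pattern behind $wgExtensionMessagesFiles and wfLoadExtensionMessages is: register where an extension's messages live, load them only when the extension needs them, and fall back to English for anything untranslated. The Python sketch below is a toy model of that idea under our own assumptions. The class, method and extension names are hypothetical; it is not MediaWiki's PHP API, and real MediaWiki can fall back through other languages before reaching English, which this model skips.

```python
from typing import Dict

# Hypothetical stand-in for an extension's i18n file:
# language code -> {message key: text}
MessageTable = Dict[str, Dict[str, str]]


class MessageRegistry:
    """Toy model of registering extension message files and loading them lazily.

    register() plays the role of an entry in $wgExtensionMessagesFiles and
    load() the role of wfLoadExtensionMessages. All names are hypothetical.
    """

    def __init__(self) -> None:
        self._registered: Dict[str, MessageTable] = {}  # extension -> table not yet loaded
        self._messages: Dict[str, Dict[str, str]] = {}  # language -> merged messages

    def register(self, extension: str, table: MessageTable) -> None:
        self._registered[extension] = table

    def load(self, extension: str) -> None:
        # Merge the extension's messages on first use; later calls do nothing.
        table = self._registered.pop(extension, None)
        if table is None:
            return
        for lang, messages in table.items():
            self._messages.setdefault(lang, {}).update(messages)

    def get(self, key: str, lang: str) -> str:
        # Simplification: fall back straight to English, then to a placeholder.
        for code in (lang, "en"):
            text = self._messages.get(code, {}).get(key)
            if text is not None:
                return text
        return f"<{key}>"


registry = MessageRegistry()
registry.register("DemoExtension", {
    "en": {"demoext-desc": "Adds a demo feature"},
    "nl": {"demoext-desc": "Voegt een demonstratiefunctie toe"},
})
registry.load("DemoExtension")
print(registry.get("demoext-desc", "nl"))  # Voegt een demonstratiefunctie toe
print(registry.get("demoext-desc", "fi"))  # Adds a demo feature
```

Keeping registration separate from loading is what lets hundreds of extensions be installed without every message file being read on every request.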

==MediaWiki localisation in practice==
The ways to do MediaWiki localisation have not changed a lot in the past year.
Still, the changes that have taken place have had a profound impact on the way
MediaWiki is localised and on the volume of localisation.
* in local wikis: we have scavenged all Wikimedia wikis for existing
  translations, imported those in the base product and tried to recruit
  translators to audit and extend the centralised localisation. This project
  has been very successful, and aside from a few exceptions, local wikis now
  customise and no longer do base localisation.
* through Bugzilla/SVN: A user of MediaWiki submits patches for core
  and/or extensions. These users are mostly part of a wiki community that is
  part of Wikimedia. Compared to last year, this group of localisation
  contributors has shrunk. The maintainer for German, for example, started
  working in Betawiki, because he was no longer able to keep up with the
  workload. Languages that still get (very) frequent updates through
  Subversion are Danish, Hebrew, and Chinese (4 variants). The number of
  localisations maintained this way has dropped from more than 10 to 6.
  Localisation updates submitted through Bugzilla are virtually non-existent.
* through Betawiki: In the past year Betawiki has about doubled in size
  (translators, translations, supported products, traffic, etc.). 95% or more
  of the localisation volume for MediaWiki goes through this wiki, and a lot
  of development has been done on the Translate extension in the past year.
  Betawiki staff remain committed to MediaWiki i18n and L10n, and still have a
  strong belief in collaborative localisation.

==The professional amateur approach==
2008 was also the year in which MediaWiki localisation received outside
stimuli. A grant given to Stichting Open Progress[8] by Hivos[9] enabled us to
provide bounties for less-resourced languages[10], and to hold an end-of-year
translation rally[11]. Niklas Laxström[12] participated in the Finnish
Coding Project[13], which led to a more feature-rich and usable Translate
extension, allowing translators to be more productive. We (Betawiki staff and
Stichting Open Progress) intend to continue trying to get funding to improve
language support for MediaWiki and FOSS in general. If you think you can help
in any way to achieve this goal, please do not hesitate to contact any of us.

Multiple developers contribute to i18n and L10n features. Most, if not all,
features that are added to Subversion these days are audited for i18n and
L10n omissions, which are usually corrected quickly after being discovered.
Core messages and extension messages have been made more consistent. These
are ongoing processes for which we need your help and awareness[5,7].

==MediaWiki localisation statistics==
Since the end of 2007, MediaWiki localisation has no longer focused only on
translating core messages, but also on the messages that are used most often.
This resulted in a set of about 25% of the MediaWiki core messages that
really have to be translated before a language is really usable with
MediaWiki. Because software like MediaWiki is ever changing, an update of this
list will be released in the coming week[4].

Daily statistics for MediaWiki and extension localisation are created at
http://translatewiki.net/wiki/Translating:Group_statistics. Last year, some
(arbitrary) milestones were set for four collections of MediaWiki-related
messages. For the usability of MediaWiki in a particular language, the group
'core most used' is the most important: a language must reach the milestone
for this first group before MediaWiki has 'minimal support' for that language.
Reaching further milestones indicates the maturity of a localisation:
* core most used (485 messages): 98%
* core (2,168 messages): 90%
* wikimedia extensions (1,067 messages): 90%
* extensions (6,013 messages): 65%
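Each milestone is a simple threshold on the fraction of a group's messages that have been translated. The Python sketch below checks a language against the four milestones, using the group sizes and percentages listed above. The input format, function names and example counts are hypothetical; the real figures are produced by the statistics pages on translatewiki.net.

```python
from typing import Dict, List

# Group sizes and thresholds as listed in the e-mail (January 2009).
# Each entry: (group name, messages in the group, fraction that must be translated)
MILESTONES = [
    ("core most used", 485, 0.98),
    ("core", 2168, 0.90),
    ("wikimedia extensions", 1067, 0.90),
    ("extensions", 6013, 0.65),
]


def passed_milestones(translated: Dict[str, int]) -> List[str]:
    """Return the groups whose milestone a language has reached.

    `translated` maps a group name to the number of its messages that are
    translated for the language (hypothetical input format).
    """
    passed = []
    for group, size, threshold in MILESTONES:
        if translated.get(group, 0) / size >= threshold:
            passed.append(group)
    return passed


def has_minimal_support(translated: Dict[str, int]) -> bool:
    """'Minimal support' means the 'core most used' milestone is reached."""
    return "core most used" in passed_milestones(translated)


# Hypothetical language: 480 of 485 most-used messages, 1,960 of 2,168 core
# messages, and only partial coverage of the two extension groups.
example = {"core most used": 480, "core": 1960,
           "wikimedia extensions": 700, "extensions": 3000}
print(passed_milestones(example))    # ['core most used', 'core']
print(has_minimal_support(example))  # True
```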

Currently the following numbers of languages have passed the above milestones:
* core most used: 109 (33.9% of supported languages - up 132% from 47)
* core: 68 (21.1% of supported languages - up 39% from 49)
* Wikimedia extensions: 36 (11.2% of supported languages - up 260% from 10)
* extensions: 21 (6.5% of supported languages - up 200% from 7)
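The percentages above follow from simple arithmetic: each share divides the number of languages past a milestone by the 322 supported languages, and each growth figure compares that number with the count at the end of 2007. This small Python sketch reproduces the list from those inputs; the function name is ours, for illustration only.

```python
# Figures taken from the e-mail; the function name is illustrative only.
SUPPORTED_LANGUAGES = 322

# Languages past each milestone: (end of 2007, end of 2008)
PASSED = {
    "core most used": (47, 109),
    "core": (49, 68),
    "Wikimedia extensions": (10, 36),
    "extensions": (7, 21),
}


def milestone_line(group: str, before: int, now: int) -> str:
    """Format one statistics line: share of supported languages and growth."""
    share = now / SUPPORTED_LANGUAGES * 100
    growth = (now - before) / before * 100
    return (f"{group}: {now} ({share:.1f}% of supported languages"
            f" - up {growth:.0f}% from {before})")


if __name__ == "__main__":
    for group, (before, now) in PASSED.items():
        print(milestone_line(group, before, now))
```

Note that the shares use the 322 supported languages as the denominator, not the 299 languages that have a MessagesXx.php file.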

As you can see, the changes in the past year are gigantic. And these changes
have been accomplished even though many messages have been removed from, and
added to, all the message groups. Currently MediaWiki core contains 303,863
messages (up 77% from 171,261 at the end of 2007).

So... Is MediaWiki doing well on localisation? Just like last year, my
opinion is that we do a proper job, but can still do a lot better. Observing
that there are more than 250 Wikipedias that all use the Wikimedia Commons
media repository, and that 109 languages have a minimal localisation, there
is a lot of room for improvement. With the Wikimedia Foundation using Single
User Login, MediaWiki must do better.

Last year I mentioned a few example cases of languages that did very well, and
also a few languages that didn't do well. So what happened there? Well, Hindi
got a boost, but sank away as messages were added without an active
maintainer. Asturian and Extremaduran have no active maintainers. Bikol
Central is not doing too badly, and Lower Sorbian and Galician are doing
great. Languages from Asia heavily improved their MediaWiki localisation last
year. But where
are the languages from Africa? In the past year we have seen steady
contributions for Amharic and Afrikaans, some for Swahili, Wolof and Yoruba,
but all in all, just not enough to provide (native) speakers with a user
interface in their own language.

With the Wikimedia Foundation aiming to put MediaWiki to good use in many
countries, and with products like NGO-in-a-box that include MediaWiki, the
potential of MediaWiki as a tool in creating and preserving knowledge in the
languages of the world is huge. We have to tap into that potential and *you*
(yes, I mean you, if you read this far and are now reading my appeal) can
help. If you know people who are proficient in a language and like
contributing to localisation, please point them in the right direction. If you
know of organisations that can help localise MediaWiki: please approach them
and ask them to help.

We have all the tools to successfully localise MediaWiki into any of the
thousands of languages that have been classified in ISO 639-3. We only need one
person per language to make an effort and make it happen. Reaching the first
milestone (core most used) takes about six hours of work. Using Betawiki or a
gettext file, little to no technical knowledge is required.

This was the pitch, basically the same as in 2007, but with more experience
and data. Three of the four goals I set in last year's e-mail have not been
reached. I did not take into account how rapidly MediaWiki would grow, or how
quickly we could standardise the i18n implementation for extensions. Goals for
MediaWiki localisation for the end of 2009 remain largely the same as for 2008:
* core most used: 130 languages with 98% or more localised
* core: 90 languages with 90% or more localised
* wikimedia extensions: 50 languages with 90% or more localised
* extensions: 30 languages with 65% or more localised
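How far each goal is from today's numbers is again simple arithmetic. The Python sketch below subtracts the January 2009 counts from the end-of-2009 targets and, for the first milestone only, multiplies by the e-mail's estimate of about six hours of work per language. The names are ours, and the sketch assumes no language drops back below a milestone during the year (which, as the Hindi example shows, can happen).

```python
# Languages past each milestone in January 2009 and the targets for the end
# of 2009, both taken from the e-mail.
CURRENT = {"core most used": 109, "core": 68,
           "wikimedia extensions": 36, "extensions": 21}
TARGET_2009 = {"core most used": 130, "core": 90,
               "wikimedia extensions": 50, "extensions": 30}

# The e-mail estimates about six hours of work per language for the
# first milestone (core most used).
HOURS_PER_LANGUAGE_CORE_MOST_USED = 6


def languages_still_needed(current: dict, target: dict) -> dict:
    """Additional languages each group needs to reach its target.

    Assumes no language currently past a milestone falls back below it.
    """
    return {group: max(0, goal - current.get(group, 0))
            for group, goal in target.items()}


gap = languages_still_needed(CURRENT, TARGET_2009)
print(gap)  # {'core most used': 21, 'core': 22, 'wikimedia extensions': 14, 'extensions': 9}
# Rough effort for the first milestone alone.
print(gap["core most used"] * HOURS_PER_LANGUAGE_CORE_MOST_USED, "hours")  # 126 hours
```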

I would like to wish everyone involved in any aspect of MediaWiki a
wonderful 2009.


Siebrand Mazeland

[4] http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki
[5] i18n Bugzilla issues:
[6] Translate extension bugs and feature requests:
[7] http://translatewiki.net/wiki/Support
[8] http://www.openprogress.org/Stichting_Open_Progress
[9] http://www.hivos.nl/eng
[10] http://translatewiki.net/wiki/Translating:Language_project
[11] http://translatewiki.net/wiki/Betawiki:News/Newsletter_2008-12-2
[12] http://nike.fixme.fi/blag/
[13] http://www.coss.fi
