[Mediawiki-i18n] Providing the effective language of messages

Adrian Heine adrian.heine at wikimedia.de
Tue Apr 12 11:01:40 UTC 2016

Hi everyone,

As some of you might know, I'm a software developer at Wikimedia 
Deutschland, working on Wikidata. I'm currently focusing on improving 
Wikidata's support for languages that we as a team do not use on a daily 
basis. As part of this work I stumbled upon a shortcoming in MediaWiki's 
message system that, as far as I can see, prevents me from doing the 
right thing(tm). I'm asking you to verify that what I see is indeed an 
issue and that we want to fix it. I'm also interested in hearing your 
plans or goals for MediaWiki's message system so that I can align my 
implementation with them. Finally, I am hoping to find someone who is 
willing to help me fix it.

== The issue ==

On Wikidata, we regularly have content in different languages on the 
same page, and we set the HTML lang and dir attributes accordingly. For 
example, we have a table with the terms for an entity in different 
languages. For a missing term, we display a message in the UI language 
inside this table. The corresponding HTML (simplified) might look like 
this:

<div id="mw-content-text" lang="UILANG" dir="UILANG_DIR">
   <table class="entity-terms">
     <tr class="entity-terms-for-OTHERLANG1" lang="OTHERLANG1" dir="OTHERLANG1_DIR">
       <td class="entity-terms-for-OTHERLANG1-label">
         <div class="wb-empty" lang="UILANG" dir="UILANG_DIR">
           <!-- missing label message -->

This works great as long as the missing label message is available in 
the UI language. If it is not, though, the message text is taken from 
another language according to the defined language fallbacks, while the 
markup still claims the UI language. In that case, we might end up with 
something like this:

<div class="wb-empty" lang="arc" dir="rtl">No label defined</div>

That's obviously wrong: the element is marked as Aramaic (arc) and 
right-to-left, but the text inside it is English. I'd like to fix it.
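
To make the intended behaviour concrete, here is a minimal sketch in 
Python. It is not MediaWiki code: FALLBACKS, DIRECTIONS, MESSAGES, the 
message key and both function names are made up for illustration. The 
lookup returns the language the text was actually found in, and the 
markup is labeled with that language instead of the UI language.

```python
from html import escape

# Hypothetical data for illustration only; not MediaWiki's real structures.
FALLBACKS = {"arc": ["en"], "en": []}
DIRECTIONS = {"arc": "rtl", "en": "ltr"}
MESSAGES = {"en": {"missing-label": "No label defined"}, "arc": {}}


def resolve_message(key, ui_lang):
    """Return (text, effective_lang) by walking the fallback chain."""
    for lang in [ui_lang] + FALLBACKS.get(ui_lang, []):
        text = MESSAGES.get(lang, {}).get(key)
        if text is not None:
            return text, lang
    raise KeyError(key)


def render_empty_label(key, ui_lang):
    """Label the element with the language the text is really in."""
    text, lang = resolve_message(key, ui_lang)
    return '<div class="wb-empty" lang="{}" dir="{}">{}</div>'.format(
        lang, DIRECTIONS[lang], escape(text)
    )


print(render_empty_label("missing-label", "arc"))
# prints: <div class="wb-empty" lang="en" dir="ltr">No label defined</div>
```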

== Fixing it ==

To fix this, I tried to make MessageCache provide the language a 
message was taken from [1]. That's not too straightforward to begin 
with, but while working on it I realized that MessageCache is only 
responsible for following the language fallback chain for database 
translations. For file-based translations, the fallbacks are merged in 
directly by LocalisationCache, so the information is no longer available 
by the time a message is translated. I see a few ways to fix this:
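
The loss of information can be illustrated with a small Python sketch. 
This is only a stand-in for the merging described above, not the actual 
LocalisationCache implementation; FILES, FALLBACKS, the message keys and 
build_merged_cache are hypothetical names.

```python
# Hypothetical per-language message files (illustration only).
FILES = {
    "en": {"missing-label": "No label defined", "save": "Save"},
    "arc": {"save": "(Aramaic translation of Save)"},
}
FALLBACKS = {"arc": ["en"], "en": []}


def build_merged_cache(lang):
    """Merge fallback messages into one flat dict for a language."""
    merged = {}
    # Apply the most distant fallback first so closer languages win.
    for source in reversed([lang] + FALLBACKS[lang]):
        merged.update(FILES[source])
    return merged


cache = build_merged_cache("arc")
# The lookup succeeds, but the value is a bare string: nothing records
# that this particular text came from "en" rather than "arc".
print(cache["missing-label"])
```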

* Don't merge messages in LocalisationCache, but perform the fallback on 
request (possibly caching the result)
* Tag message strings in LocalisationCache with the language they are in 
(sounds expensive to me)
* Tag message strings as being a fallback in LocalisationCache (that way 
we could follow the fallback until we find a language in which the 
message string is not tagged as being a fallback)
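
As a rough sketch of how the first and third option could look, again in 
Python and with made-up names (FILES, FALLBACKS, lookup_on_request, 
build_tagged_cache, lookup_tagged) and placeholder message texts. It 
assumes that the fallback chain of each fallback language is a suffix of 
the chain of the language falling back to it.

```python
from functools import lru_cache

# Hypothetical data (illustration only).
FALLBACKS = {"gsw": ("de", "en"), "de": ("en",), "en": ()}
FILES = {
    "en": {"missing-label": "No label defined", "save": "Save"},
    "de": {"save": "(German text for Save)"},
    "gsw": {},
}


# Option 1: no merging; walk the chain on request and cache the result.
@lru_cache(maxsize=None)
def lookup_on_request(key, lang):
    for source in (lang,) + FALLBACKS[lang]:
        if key in FILES.get(source, {}):
            return FILES[source][key], source
    return None


# Option 3: keep merging, but store a "this is a fallback" flag per entry.
def build_tagged_cache(lang):
    merged = {}
    for source in reversed((lang,) + FALLBACKS[lang]):
        for key, text in FILES.get(source, {}).items():
            merged[key] = (text, source != lang)
    return merged


TAGGED = {lang: build_tagged_cache(lang) for lang in FALLBACKS}


def lookup_tagged(key, lang):
    """Follow the chain until the entry is not tagged as a fallback."""
    for source in (lang,) + FALLBACKS[lang]:
        entry = TAGGED[source].get(key)
        if entry is None:
            return None
        text, is_fallback = entry
        if not is_fallback:
            return text, source
    return None
```

In this sketch both lookups return the same (text, language) pair. The 
tagged variant stores one extra flag per message instead of a language 
code, at the price of extra cache lookups whenever a fallback was used.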

What do you think?

[1] https://gerrit.wikimedia.org/r/282133

Adrian Heine né Lang

