[Mediawiki-i18n] Providing the effective language of messages

Purodha Blissenbach purodha at blissenbach.org
Wed Apr 13 10:13:00 UTC 2016


Hi Adrian,

your diagnosis is completely right. By the way, I have been filing bugs
about this kind of mess for a few years. Things have improved only
gradually :-(

Imho, the message object needs to be able to return a direction and a
language code (a BCP 47 code, to be more precise) that reflect the true
values for fallback messages etc. Currently I do not see a real use case
for the question "Is this message a fallback message?", but I bet someone
will find one, so I suggest making that queryable, too.
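
A minimal sketch of such a message result, written in Python for brevity
rather than in MediaWiki's PHP. The class name, its fields, the message key
and the small RTL table are all hypothetical and not part of MediaWiki's
API; they only illustrate the three things that would be queryable.

```python
from dataclasses import dataclass

# Hypothetical, incomplete table of right-to-left language codes.
# A real system would take this from its per-language data.
RTL_LANGUAGES = {"ar", "arc", "fa", "he", "ur"}


@dataclass(frozen=True)
class ResolvedMessage:
    """A message string together with the language it was actually found in."""

    key: str
    text: str
    requested_language: str  # language the caller asked for
    effective_language: str  # BCP 47 code of the language the text is in

    @property
    def is_fallback(self) -> bool:
        # The text is a fallback if it did not come from the requested language.
        return self.effective_language != self.requested_language

    @property
    def direction(self) -> str:
        # Direction follows the effective language, not the requested one.
        primary = self.effective_language.split("-")[0].lower()
        return "rtl" if primary in RTL_LANGUAGES else "ltr"


# Hypothetical key; the text was requested in "arc" but only found in "en".
msg = ResolvedMessage("label-empty", "No label defined", "arc", "en")
```

With this, a caller would emit lang="en" dir="ltr" for the example above
instead of the attributes of the requested language.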

LocalisationCache should keep "is_fallback" and "has language code X" for
messages. Alternatively, for the latter, a pointer to a language object
might do as well.
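
As a rough illustration, the merge step could keep that information per
message instead of discarding it. This is a Python sketch under assumptions:
build_cache, its arguments and the sample data are made up for the example
and do not mirror LocalisationCache's real data structures.

```python
def build_cache(language, fallback_chain, messages_by_language):
    """Merge messages along the fallback chain, tagging every entry.

    Returns a dict mapping each message key to a tuple
    (text, source_language, is_fallback).
    """
    merged = {}
    for position, lang in enumerate([language] + list(fallback_chain)):
        for key, text in messages_by_language.get(lang, {}).items():
            # The first language in the chain that defines a key wins;
            # anything found after position 0 is a fallback.
            merged.setdefault(key, (text, lang, position > 0))
    return merged


# Made-up sample data: "arc" translates one key and falls back to "en".
files = {
    "arc": {"save": "(Aramaic text)"},
    "en": {"save": "Save", "label-empty": "No label defined"},
}
cache = build_cache("arc", ["en"], files)
```

Here cache["label-empty"] carries "en" as its source language and is marked
as a fallback, while cache["save"] stays tagged as "arc".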

We have no chance of producing correct HTML with mixed languages
if we do not even know what language a string is in. We must, however,
in all instances of language strings,
- either check the DOM for the current language, and enclose messages
   in a proper language wrapper if needed, or
- emit language wrappers unconditionally and have tidy clean them up.
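
The first option could look roughly like the following Python sketch. The
function name and parameters are hypothetical, and the surrounding language
and direction are passed in explicitly here rather than read from a DOM.

```python
from html import escape


def wrap_message(text, message_lang, message_dir, context_lang, context_dir):
    """Return HTML for a message, wrapped only if it differs from its context."""
    safe = escape(text)
    if message_lang == context_lang and message_dir == context_dir:
        # Same language and direction as the surrounding element: no wrapper.
        return safe
    return '<span lang="{}" dir="{}">{}</span>'.format(
        escape(message_lang), escape(message_dir), safe
    )


# An English fallback text inside an element marked as lang="arc" dir="rtl".
html_out = wrap_message("No label defined", "en", "ltr", "arc", "rtl")
```

The second option amounts to always taking the wrapping branch and leaving
the removal of redundant wrappers to a later clean-up pass.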

Purodha

On 12.04.2016 13:01, Adrian Heine wrote:
> Hi everyone,
>
> as some of you might know, I'm a software developer at Wikimedia
> Deutschland, working on Wikidata. I'm currently focusing on improving
> Wikidata's support for languages we as a team are not using on a daily
> basis. As part of my work I stumbled over a shortcoming in MediaWiki's
> message system that – as far as I see it – prevents me from doing the
> right thing(tm). I'm asking you to verify that the issue I see indeed
> is an issue and that we want to fix it. Subsequently, I'm interested
> in hearing your plans or goals for MediaWiki's message system so that
> I can align my implementation with them. Finally, I am hoping to find
> someone who is willing to help me fix it.
>
> == The issue ==
>
> On Wikidata, we regularly have content in different languages on the
> same page. We use the HTML lang and dir attributes accordingly. For
> example, we have a table with terms for an entity in different
> languages. For missing terms, we would display a message in the UI
> language within this table. The corresponding HTML (simplified) might
> look like this:
>
> <div id="mw-content-text" lang="UILANG" dir="UILANG_DIR">
>   <table class="entity-terms">
>     <tr class="entity-terms-for-OTHERLANG1" lang="OTHERLANG1" dir="OTHERLANG1_DIR">
>       <td class="entity-terms-for-OTHERLANG1-label">
>         <div class="wb-empty" lang="UILANG" dir="UILANG_DIR">
>           <!-- missing label message -->
>         </div>
>       </td>
>     </tr>
>   </table>
> </div>
>
> This works great as long as the missing label message is available in
> the UI language. If that is not the case, though, the message is
> translated according to the defined language fallbacks. In that case,
> we might end up with something like this:
>
> <div class="wb-empty" lang="arc" dir="rtl">No label defined</div>
>
> That's obviously wrong, and I'd like to fix it.
>
> == Fixing it ==
>
> For fixing this, I tried to make MessageCache provide the language a
> message was taken from [1]. That's not too straightforward to begin
> with, but while working on it I realized that MessageCache is only
> responsible for following the language fallback chain for database
> translations. For file-based translations, the fallbacks are directly
> merged in by LocalisationCache, so the information is not there
> anymore at the time of translating a message. I see some ways to fix
> this:
>
> * Don't merge messages in LocalisationCache, but perform the fallback
> on request (possibly caching the result)
> * Tag message strings in LocalisationCache with the language they are
> in (sounds expensive to me)
> * Tag message strings as being a fallback in LocalisationCache (that
> way we could follow the fallback until we find a language in which the
> message string is not tagged as being a fallback)
>
> What do you think?
>
> [1] https://gerrit.wikimedia.org/r/282133
>
> Thanks,
> --
> Adrian Heine né Lang
> SOFTWARE DEVELOPER
>
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Phone: +49 (0)30 219 158 26-0
> http://wikimedia.de
>
> Imagine a world in which every single human being can freely share
> in the sum of all knowledge. That's our commitment.
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> Registered in the register of associations of the district court
> (Amtsgericht) Berlin-Charlottenburg under number 23855 B. Recognized as
> a non-profit organization by the Finanzamt für Körperschaften I Berlin,
> tax number 27/681/51985.



