[Mediawiki-i18n] Internationalizing project names

Erik Bernhardson ebernhardson at wikimedia.org
Wed Sep 16 15:59:46 UTC 2015


On Tue, Sep 15, 2015 at 11:13 PM, Gerard Meijssen <gerard.meijssen at gmail.com> wrote:

> One question, when you search for "Ревест-Сен-Мартен", why did you not
> consider every language that uses the Cyrillic script? It is as likely to
> find something in Serbian, Macedonian, Belarusian etc ...
> Thanks,
>      GerardM
>
>
The rest of the discussion is happening on the Phabricator ticket, but I'll
answer this here. We are using a language detection algorithm that was
trained on tweets. Tweets are, on average, longer than the searches whose
language we are detecting, but it does an OK job. Trey did a great job
putting together an analysis[1] of this language detection algorithm. We
will also be using his work there to evaluate other language detection
methods, and perhaps to change what we are using in the future.

So the short of it is, we chose Russian instead of Serbian because the
machine learning algorithm said so.
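
To illustrate why the script alone cannot settle this, here is a minimal
Python sketch. It is not the detector we use; the function names and the
two letter sets are assumptions made up for this example, and the sets only
compare Russian with Serbian. It shows that the query is entirely Cyrillic
and contains no letter that is specific to either alphabet, so any choice
between the two has to come from statistics rather than from the characters
themselves.

```python
import unicodedata

# Illustrative assumption: letters found in one alphabet but not the other.
# These sets only contrast Russian with Serbian Cyrillic.
SERBIAN_ONLY = set("ђјљњћџ")
RUSSIAN_ONLY = set("ёйщъыьэюя")


def scripts_used(text):
    """Return the script names (first word of the Unicode name) of all letters."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def distinctive_letters(text):
    """Return (serbian_only, russian_only) letters that occur in the text."""
    letters = set(text.lower())
    return letters & SERBIAN_ONLY, letters & RUSSIAN_ONLY


query = "Ревест-Сен-Мартен"
print(scripts_used(query))         # every letter is Cyrillic
print(distinctive_letters(query))  # no letter specific to either alphabet
```

Both checks leave Russian and Serbian equally plausible for this query, which
is why the decision falls to the trained model.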

[1]
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation