[Mediawiki-i18n] Please view and comment CAPTCHA images in 154 languages
Federico Leva (Nemo)
nemowiki at gmail.com
Tue Apr 1 22:30:42 UTC 2014
Today I made a couple of patches that should address most of the
problems reported, as well as handle RTL languages and a multilingual
blacklist. I'm mostly using some Unicode magic that is quite well hidden
in some obscure libraries; we'll see if it works. :)
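To give an idea of what RTL detection can look like, here is a minimal
sketch using only Python's standard unicodedata module. It is an
illustration, not necessarily what the patches do; is_rtl is a
hypothetical helper name.

```python
import unicodedata

def is_rtl(text):
    """Return True if the first strongly directional character is RTL.

    Hypothetical helper: "R" (e.g. Hebrew) and "AL" (Arabic letters) are
    the strong right-to-left bidi classes, "L" is strong left-to-right.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return True
        if bidi == "L":
            return False
    # No strong character found (digits, punctuation, empty string).
    return False
```

A word for which this returns True would need its glyphs laid out right
to left when the image is drawn.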
In case it's not clear, for now I'm focusing on the *MediaWiki* side of
the matter; the Wikimedia side, i.e. where to use what and how, is
something we'll worry about when we actually have this option (or
others) available in the codebase.
A couple questions below.
P. Blissenbach, 31/03/2014 17:13:
> captchas having two lines
> of identical text [...] and accept either input.
This would need to be filed as separate enhancement request.
Shimmin, 31/03/2014 20:02:
> If you actually want the captchas to make any sense in terms of word
> combination and construction, that would be a whole different issue.
> There's inflection, rules on what happens when words are run together
> (spelling changes for one), and so on.
I suppose you're only talking about the morphological side here, right?
The current patch contains a couple of lines to handle hyphenation for
Finnish, because it was originally provided by Nikerabbit, but we're
definitely not going to build a universal grammar of univerbation in a
MediaWiki script. Unless someone comes up with a general solution, I
think we'll drop that part.
If this turns out to be confusing, I'd rather just show the two (or N)
words as separate words, what do you think? This can be done in a
separate patch; once we introduce some other security improvements, I
think the challenge of identifying where one word ends and the next
starts may be redundant.
>
> Quite a few of the l look like i in this font, which seems problematic.
This is indeed a problem with sans-serif fonts, but the broad majority
thinks they are better. We can try to pick clearer fonts, but most of
the help will come from the words being familiar to humans. I may upload
more tests with this font, though:
https://commons.wikimedia.org/wiki/File:AndBasR.pdf
> Should this be "leigh"?
Yes. If incorrect, please edit: https://en.wiktionary.org/?oldid=23059687
>
> Looks like "neuscanshoil" with a random -y added, a hangover from
> English behaviour?
Same problem as with Malayalam and others; the latest version will avoid
combining single letters with other words.
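A minimal sketch of such a filter, assuming the generator joins two
dictionary words. The names pick_pair, base_len and min_len are
hypothetical, and counting non-mark characters is only a rough
approximation of "letters" for scripts like Malayalam.

```python
import random
import unicodedata

def base_len(word):
    """Count characters that are not combining marks (categories M*)."""
    return sum(1 for ch in word
               if not unicodedata.category(ch).startswith("M"))

def pick_pair(words, min_len=2, rng=random):
    """Pick two distinct entries to combine, skipping single letters."""
    candidates = [w for w in words if base_len(w) >= min_len]
    if len(candidates) < 2:
        raise ValueError("not enough words of sufficient length")
    return tuple(rng.sample(candidates, 2))
```

With this, a stray one-letter entry like "y" can never be glued onto
another word.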
>
> [...]
> (though Aaue is a proper name) [...]
>
> Perick is also a proper name [...]
Do others think proper names are a problem? If so, they should be easy
enough to remove: they're usually tagged as such on Wiktionary.
Otherwise, they add some cheap variety to our dictionaries.
>
> The form "vaayl" is a rare grammar-induced form of an unusual word
In this case it's again a proper noun, no idea how correct or how
current: <https://en.wiktionary.org/?oldid=21902154>
>
> Hard to read, could be "hiu shee" or "niu shee"
It was "hiu": no "niu" in our dictionary. If the latter is a valid word,
you should add it to Wiktionary and then we can try to figure out
something to exclude confusable words.
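One possible way to exclude confusable words, sketched under loud
assumptions: the CONFUSABLE table is a tiny hand-made illustration (not
real Unicode confusables data), and skeleton and drop_confusable are
hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical, hand-picked groups of glyphs that are hard to tell apart
# in a distorted image; each letter maps to one representative per group.
CONFUSABLE = {"i": "l", "l": "l", "h": "n", "n": "n"}

def skeleton(word):
    """Replace every letter by the representative of its lookalike group."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in word.lower())

def drop_confusable(words):
    """Keep only words whose skeleton is shared with no other word."""
    groups = defaultdict(set)
    for w in words:
        groups[skeleton(w)].add(w)
    return [w for w in words if len(groups[skeleton(w)]) == 1]
```

If both "hiu" and "niu" were in the dictionary, they would share a
skeleton and both be dropped, so a captcha could never hinge on telling
them apart.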
Once again, the proposed approach is to rely on a mix of Unicode magic
and a self-healing (wiki) dictionary. Neither is enough alone.
>
> This one means "arctic castration" (spoiy = castration). Not obscene,
> but maybe not for everyone?
Well, it could fall under "obscene" for some definition of the word. I'm
now also blacklisting "pejorative" and "offensive" words; those who care
can try and see if their label edits survive on the wiki.
https://en.wiktionary.org/wiki/Wiktionary:Context_labels
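A minimal sketch of label-based blacklisting, assuming the dictionary
has already been extracted as (word, labels) pairs; the extraction from
Wiktionary is not shown, and BLOCKED_LABELS and allowed_words are
hypothetical names.

```python
# Labels named in this thread; the real list may well be longer.
BLOCKED_LABELS = frozenset({"obscene", "pejorative", "offensive"})

def allowed_words(entries, blocked=BLOCKED_LABELS):
    """Return the words that carry none of the blocked context labels.

    entries: iterable of (word, labels) pairs, labels being strings.
    Label comparison is case-insensitive.
    """
    return [word for word, labels in entries
            if not blocked.intersection(label.lower() for label in labels)]
```

Since the labels live on the wiki, adding or removing one there changes
the outcome the next time the dictionary is regenerated.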
Nemo