[Translators-l] Fwd: [Wikisource-l] An it.source gadget to manage diacritics

Philippe Verdy verdy_p at wanadoo.fr
Thu Aug 3 23:38:22 UTC 2017

It is worth mentioning that diacritics are not the only decomposable
characters: Korean Hangul syllables are also algorithmically decomposable,
which could be used to avoid retyping a whole syllable after 2 jamos (a
leading consonant and a vowel) or 3 jamos (a leading consonant, a vowel and
a trailing consonant) have been composed.
Also, I hope that the method recomposes the characters to NFC form once
they have been edited and the caret moves to another position.
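
To illustrate the Hangul point, here is a minimal sketch in Python (the gadget itself is JavaScript and uses .normalize(); Python's unicodedata module is used here only as a stand-in). The constants come from the Hangul syllable algorithm in the Unicode standard; the function name decompose_hangul is made up for this example.

```python
import unicodedata

# Constants of the Hangul syllable algorithm from the Unicode standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

def decompose_hangul(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its 2 or 3 jamos."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        return syllable  # not a precomposed syllable: leave it unchanged
    lead = L_BASE + index // (V_COUNT * T_COUNT)
    vowel = V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT
    trail = T_BASE + index % T_COUNT
    jamos = chr(lead) + chr(vowel)
    if trail != T_BASE:  # T_BASE itself means "no trailing consonant"
        jamos += chr(trail)
    return jamos

han = "\ud55c"  # HANGUL SYLLABLE HAN
jamos = decompose_hangul(han)

# The arithmetic agrees with the library's NFD, and NFC recomposes it.
assert jamos == unicodedata.normalize("NFD", han)
assert unicodedata.normalize("NFC", jamos) == han
```

Dropping the trailing jamo and recomposing the first two gives the syllable HA (U+D558), which is the behaviour an editor would want when only the final consonant has to be retyped.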

A simple way to do this: don't add any new button; just press
Alt+Backspace to remove only the last code point in the NFD form of a
character, and recompose the remainder immediately after the deletion. This
way the text in the editable buffer is always in NFC form; the NFD form is
only used internally and temporarily when handling the Alt+Backspace key
(which may be repeated and should be able to remove even a non-decomposable
character).
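
A rough sketch of that Alt+Backspace behaviour, again in Python rather than in the gadget's JavaScript. It assumes the buffer is already in NFC and that the caret is at the end of the text; the name alt_backspace and the choice to also return the removed code point are assumptions made for this example.

```python
import unicodedata

def alt_backspace(text: str) -> tuple[str, str]:
    """Remove the last code point of the NFD form of the last character.

    Returns (new_text, removed). The NFD form only exists inside this
    function; the remainder is recomposed to NFC before being returned.
    """
    if not text:
        return "", ""
    head, last = text[:-1], text[-1]
    decomposed = unicodedata.normalize("NFD", last)
    kept, removed = decomposed[:-1], decomposed[-1]
    # A non-decomposable character has a one-code-point NFD form, so
    # `kept` is empty and the whole character is removed.
    return head + unicodedata.normalize("NFC", kept), removed

# Repeated presses peel off one code point at a time.
step1, _ = alt_backspace("\u1ebf")   # e with circumflex and acute
step2, _ = alt_backspace(step1)      # e with circumflex
step3, _ = alt_backspace(step2)      # plain e
```

The removed code point is returned so that the caller can decide what to do with it (discard it, or keep it around for a later paste).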

Final notes:
* some characters that are composed in NFC form are not decomposable by
NFD either, because that decomposition is not allowed in NFD form.
This is the case for "overstriking" diacritics like the slash when they
occur in some canonical composition pairs, or for a few "compatibility"
diacritics that can be decomposed but that do not recompose with anything
other than a base letter: one of them will compose, while the other one
will remain after the composed character.
* And ideally, when entering any composable diacritic, Hangul vowel jamo or
Hangul trailing consonant jamo anywhere, the character(s) before it should
be checked to see whether this forms an NFC composition. In some cases
you'll need to look backward over possibly long sequences because of
canonical reordering, but never more than 254 code points: reordering can
only occur in sequences of diacritics with distinct non-zero combining
classes, and there cannot be more than 254 such classes (in fact, not even
254 classes are assigned in Unicode). Canonical reordering also never
occurs between jamos in Hangul syllables, and their composable sequences
are limited to 2 or 3 jamos. So you don't need to scan backward over large
buffers; this will still be very fast during input.
* There's no easy way to select an isolated diacritic, but the
Alt+Backspace keystroke that drops a diacritic could place it in the
clipboard, to allow pasting it somewhere else: press Alt+Backspace, then
CTRL+Z to cancel the deletion; the diacritic is now in the clipboard and
you can paste it easily anywhere else. This can be useful to fix a text
where not all diacritics have been entered (notably for Arabic or Hebrew).
It will be faster than using long palettes of letters with diacritics:
instead of palettes with precomposed characters, only isolated diacritics
and Hangul vowels or trailing consonants would be placed in the palette.
==> This would greatly improve the usability for Latin as well: we
would show only the base letters (A-Z and a-z Basic Latin could be dropped
from the palette, or hidden by default, as they are on all keyboards and
never difficult to enter, but additional letters such as the open o will
be useful; the diacritics would have more space to be selectable, starting
with the most frequent ones: acute, grave, diaeresis, circumflex,
cedilla, caron (hacek), macron, dot above, and hook; if the palette has a
setup for a particular language, it should still show its "natural"
alphabet, and then its own diacritics, before listing other rare
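
The second note above (checking what precedes a newly entered diacritic or jamo) could be sketched roughly as follows. This is only an illustration in Python, not the gadget's code: insert_mark is a hypothetical name, the buffer is assumed to be in NFC, and the mark is assumed to be typed at the end of the buffer. The backward scan stops at the nearest starter (combining class 0), so only a short tail is renormalized.

```python
import unicodedata

def insert_mark(buffer: str, mark: str) -> str:
    """Append `mark` to an NFC buffer, renormalizing only the tail."""
    start = len(buffer)
    # Walk back over combining marks (non-zero canonical combining class).
    while start > 0 and unicodedata.combining(buffer[start - 1]) != 0:
        start -= 1
    if start > 0:
        start -= 1  # include the starter the marks are attached to
    tail = buffer[start:] + mark
    return buffer[:start] + unicodedata.normalize("NFC", tail)

# Latin: e-circumflex plus a combining acute composes to U+1EBF.
viet = insert_mark("\u00ea", "\u0301")

# Hangul: the syllable HA plus a trailing jamo composes to HAN.
han = insert_mark("\ud558", "\u11ab")

# Canonical reordering: a dot below (class 220) typed after an acute
# (class 230) on a base with no precomposed form moves before the acute.
reordered = insert_mark("q\u0301", "\u0323")
```

In the last case nothing composes, but the two marks are still put in canonical order, which is why the scan has to go back past the existing acute to the base letter.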

2017-08-04 0:34 GMT+02:00 Pine W <wiki.pine at gmail.com>:

> I haven't tried this, but it sounds like translators and Language
> Engineering folks might be interested.
> Pine
> ---------- Forwarded message ----------
> From: Alex Brollo <alex.brollo at gmail.com>
> Date: Tue, Aug 1, 2017 at 12:10 AM
> Subject: [Wikisource-l] An it.source gadget to manage diacritics
> To: wikisource list <wikisource-l at lists.wikimedia.org>
> Just to let you know, some it.source contributors are using a convenient
> gadget to manage diacritics: it can delete, replace or add any of a fairly
> large list of diacritical marks on any character with a single click.
> It uses the .normalize() string method to decompose and recompose (when
> possible) Unicode characters, which allows managing diacritics alone,
> independently of the base ASCII character.
> Perhaps this gadget is "reinventing the wheel"...? Anyway, the code
> is here:
> https://it.wikisource.org/wiki/MediaWiki:Gadget-pulsanti-diacritici.js
> Alex Brollo