Hello all,

The question of case changing in Greek has come up in another thread.
Whilst the details here aren't XeTeX (or even TeX) specific, given the
interest shown by members of the list I hope I can take the opportunity
to ask about the area.

For work on LaTeX3/expl3 we've put together an approach to case changing
in XeTeX (and LuaTeX) that is not tied to a 1-1 mapping.

One of the design ideas behind the code was to allow a way to tackle
context- and language-dependent changes. At the same time, to date we
have used the Unicode docs to define case mappings. Thus the 'standard'
mappings follow those in UnicodeData.txt (1-1 lower/title/upper) and
SpecialCasing.txt (more complex cases).
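To illustrate the difference between the two data files, here is a
small sketch in Python (not the expl3 code). It relies on Python 3's
built-in str methods, which apply the full Unicode mappings; the helper
name case_forms is made up for this example.

```python
def case_forms(ch: str) -> tuple[str, str, str]:
    """Return the (lower, title, upper) forms of a single character."""
    return ch.lower(), ch.title(), ch.upper()


# U+01C6 (dz with caron digraph) has three distinct 1-1 forms,
# which is why title case is listed separately from upper case.
dz_lower, dz_title, dz_upper = case_forms("\u01c6")

# U+00DF (sharp s) has no single-character upper case form:
# its upper case mapping is two characters long.
sharp_s_upper = case_forms("\u00df")[2]
```

The first case is the kind of entry found in UnicodeData.txt; the
second is the kind that needs SpecialCasing.txt.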

Included in that 'standard' setup is the final sigma rule for Greek
text. For performance reasons that code has been set up to assume that a
sigma is final if it is followed by a space, a control sequence or a
character from the list

    ) ] } . : ; , ! ? ' "

Other potential additions are welcome as is testing of what we have
done. (There seem to be a lot of edge cases. For example, what happens
if a sigma is immediately followed by a number, say in a computational
identifier?)
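The lookahead test described above can be sketched as follows. This is
Python rather than expl3, and only an illustration of the rule as
stated: the names are invented, the 'control sequence' case has no
analogue in a plain string and is left out, and treating the end of the
input as final is my assumption.

```python
SIGMA_UPPER = "\u03a3"   # GREEK CAPITAL LETTER SIGMA
SIGMA_MEDIAL = "\u03c3"  # GREEK SMALL LETTER SIGMA
SIGMA_FINAL = "\u03c2"   # GREEK SMALL LETTER FINAL SIGMA

# A space plus the characters from the list above.
FINAL_TRIGGERS = set(" )]}.:;,!?'\"")


def lower_greek(text: str) -> str:
    """Lowercase text, picking the sigma form by one-character lookahead."""
    out = []
    for i, ch in enumerate(text):
        if ch == SIGMA_UPPER:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Assumption: end of input is treated like a following space.
            is_final = nxt == "" or nxt in FINAL_TRIGGERS
            out.append(SIGMA_FINAL if is_final else SIGMA_MEDIAL)
        else:
            out.append(ch.lower())
    return "".join(out)
```

With this rule a sigma followed by a digit stays medial, which is the
edge case mentioned above. Note also that only the following character
is inspected, so an isolated capital sigma before a space comes out as
a final sigma.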

What has not been covered at all to date is any special handling of
accents. As indicated in the other thread, it seems that the handling of
accents in Greek is non-trivial. Notably, we have an implementation
which separates out title case from upper case and has the idea of
language-dependent mappings. Thus it would be perfectly possible to have
logic such as 'Retain accents on the first letter of a word when title
casing; remove them when upper casing'. Similarly, I wonder if there are
differences in practice related to the nature of the text: modern
writing vs. historical text, etc. Again, this can be added if there is a
clear set of rules to follow.
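
As a concrete starting point for discussion, the example logic quoted
above might look like this in Python (again, not expl3). Which marks to
drop is exactly the open question: here I assume that only the tonos
(combining acute, U+0301) is removed and that the dialytika is kept.
The function names and the STRIP_ON_UPPER set are hypothetical.

```python
import unicodedata

# Assumption: only the tonos (combining acute accent) is dropped.
STRIP_ON_UPPER = {"\u0301"}


def upper_greek(text: str) -> str:
    """Uppercase text and remove the marks listed in STRIP_ON_UPPER.

    The filter is applied to the whole string, so it also affects
    non-Greek letters; a real implementation would restrict it to
    Greek text.
    """
    decomposed = unicodedata.normalize("NFD", text.upper())
    kept = "".join(c for c in decomposed if c not in STRIP_ON_UPPER)
    return unicodedata.normalize("NFC", kept)


def title_greek_word(word: str) -> str:
    """Uppercase the first letter of one word, keeping its accent.

    The rest of the word is left unchanged.
    """
    return word[:1].upper() + word[1:]
```

A polytonic or historical variant would presumably need a different
STRIP_ON_UPPER set, which is where a clear set of rules is needed.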

Detailed information is most welcome.
--
Joseph Wright

