[Bug 459991] Re: Unicode Phoencian block, 1090X, not displayed in correct direction

Phil Stone Fri, 14 May 2010 11:56:00 -0700

Let me give a few closing comments for anyone who may find this
bug/thread in the future and wonder what happened.


This was one of several bugs I found while investigating the Phoenician
Unicode block.  I set out to typeset a Bible in Phoenician, since this
was the alphabet the Bible was originally written in, and I also needed
a complete tool chain for working with this alphabet.  I eventually got
that done; see the link below.  If anyone reading this in the future
doesn't have a Launchpad account and needs the Phoenician resources I
mention below, please use the contact links on the following page.

http://www.bibletimepress.com/bibles

This OpenOffice bug was by far the smallest of the bugs I found, though
it was one of the first, since OpenOffice was an easy place to test
parts of the needed tool chain.

It turns out most Java applications cannot handle this block either,
the key one being Eclipse, because in practice almost nobody respects
the surrogate pairs used to encode these code points.  Surrogate pairs
were added after the original Java language specification was written.
Remember that Phoenician sits at U+1090X, above the 16-bit range, so
every character in it must be stored as a surrogate pair of two 16-bit
values.  Eclipse was full of bugs in syntax highlighting and editing
whenever any character encoded as a surrogate pair was entered.  Once
such a character made it onto a line in a file edited by Eclipse, that
line could no longer be safely edited.
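
To make the surrogate-pair problem concrete, here is a minimal Python
sketch (Python only for brevity; the helper names are hypothetical) that
computes the UTF-16 surrogate pair for U+10900, the first code point of
the Phoenician block, following the standard UTF-16 encoding rules, and
shows the 16-bit units a Java String would actually store for it.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    # Only supplementary code points (above U+FFFF) need a surrogate pair.
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value left after removing the BMP
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


def utf16_units(text: str) -> list[int]:
    # The sequence of 16-bit code units, which is what a Java String stores.
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


alf = chr(0x10900)  # first code point of the Phoenician block
high, low = to_surrogate_pair(0x10900)
print(f"U+10900 -> {high:04X} {low:04X}")  # U+10900 -> D802 DD00
print(len(alf), len(utf16_units(alf)))     # 1 2
```

Java's String.length() counts these same 16-bit code units, so it
reports 2 for this single letter, and any code that indexes, slices, or
highlights text one char at a time can cut such a character in half.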

I opened a bug there too, and over the months learned a lot.  The
problem is so pervasive that the Eclipse developers seem to think it
will never be solved.  I would add that it will never be solved in Java
applications, because it is not within the Java team's control to fix:
they've set surrogate pair standards that nobody follows in practice.
Early Java educational resources get this wrong, and so do most Java
programmers.

I also found that support for these values in web browsers is not good
enough for any practical use, especially with server-side fonts, and
the MS .eot file format does not handle them, at least not when using
the open source .ttf to .eot conversion tools.  It may be that Windows
cannot handle anything beyond 16-bit Unicode, though I don't know for
sure.  The system that displays the Unicode value in a box for missing
characters also does not work above 16 bits.

The default font used for Phoenician in the Unicode standard, and thus
in Ubuntu, is based on the last known historical inscription, from about
318 AD.  That is probably the worst choice that could have been made:
the language had been in use for some 1800 years before that, with a
very different, better, and much more common letter form.

Kate, the Kubuntu text editor, could not handle these Unicode values
either.

LaTeX, the typesetting program, was also unable to handle this range
well, though with some unusual, pre-alpha macro packages designed
primarily for typesetting the Koran, it came close.

I also found that this particular code block is missing several very
important values, including the most important inter-word separator, so
the block itself is defective.  Since only 5 bits, or 32 possible
values, were assigned to it, there isn't room to fix it and also include
the missing vowels.
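
The 32-value limit is simply the size of the block: it runs from
U+10900 to U+1091F, so every position in it can be numbered with 5
bits.  A quick Python check of that arithmetic (the constant names are
just illustrative):

```python
BLOCK_START = 0x10900  # first code point of the Phoenician block
BLOCK_END = 0x1091F    # last code point of the block

size = BLOCK_END - BLOCK_START + 1  # number of code points in the block
bits = (size - 1).bit_length()      # bits needed to number every slot
print(size, bits)                   # 32 5
```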

I also found that this alphabet was originally bidirectional
(boustrophedon).  There is essentially no support for that anywhere.
But that also suggested that many problems would be solved by writing it
left-to-right in most cases.  This turned out to solve a bunch of
problems, including making the language easier to learn.
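
For readers unfamiliar with the term, boustrophedon means that
successive lines run in opposite directions.  A minimal Python sketch of
the idea, assuming the text is already broken into lines and using Latin
letters as stand-ins: it reverses the character order of every second
line.  Real boustrophedon inscriptions also mirror the letter shapes on
reversed lines, which reordering characters alone cannot do; that would
be up to the font or renderer.

```python
def boustrophedon(lines: list[str]) -> list[str]:
    # Lines 0, 2, 4, ... keep their order; lines 1, 3, 5, ... are reversed,
    # so the reading direction alternates from one line to the next.
    return [line if i % 2 == 0 else line[::-1] for i, line in enumerate(lines)]


for row in boustrophedon(["ABCDE", "FGHIJ", "KLMNO"]):
    print(row)  # ABCDE, then JIHGF, then KLMNO
```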

My fix was to rebuild the Phoenician code page at 0xEF00, within the
16-bit Unicode private use area, left-to-right, with better choices of
letter placement, including the inter-word space character and
Phoenician vowels.
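
A minimal Python sketch of why moving into the private use area helps:
any code point at or below U+FFFF is a single UTF-16 code unit, so no
surrogate pair ever appears.  The straight offset mapping used here
(U+10900 + n to U+EF00 + n) and the helper names are hypothetical
simplifications for illustration only; the layout described above
reorders letters and adds characters, so it would need an explicit
lookup table instead.

```python
PHOENICIAN_START = 0x10900  # standard Unicode Phoenician block, U+10900..U+1091F
PUA_START = 0xEF00          # start of the private-use code page described above


def to_pua(text: str) -> str:
    # Hypothetical one-to-one offset mapping; other characters pass through.
    out = []
    for ch in text:
        cp = ord(ch)
        if PHOENICIAN_START <= cp <= PHOENICIAN_START + 0x1F:
            cp = PUA_START + (cp - PHOENICIAN_START)
        out.append(chr(cp))
    return "".join(out)


def utf16_length(text: str) -> int:
    # Number of 16-bit code units, i.e. what Java's String.length() reports.
    return len(text.encode("utf-16-le")) // 2


word = chr(0x10900) + chr(0x10901) + chr(0x10902)  # first three Phoenician letters
mapped = to_pua(word)
print(utf16_length(word), utf16_length(mapped))    # 6 3
```

Every mapped character lands between U+EF00 and U+EF1F, well inside the
BMP private use area (U+E000 to U+F8FF), so tools that assume one
16-bit unit per character handle it correctly.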

I have the X keyboard files needed for this block, along with various
fonts and related tools, should someone need to type or display these
characters.  The keyboard layout is designed for English-language touch
typists and can be learned in an hour.

OpenOffice works fine with this solution, as do Eclipse, LaTeX (XeTeX),
Kate, and even KMail; the .ttf to .eot format converters also work, as
do all the browsers someone might be using.  (It is still not easy to
style, though.)

This solution risks collisions with other private use area code blocks.
That has not been a problem in practice.

Thanks to everyone who looked at this hard problem.  It was the tip of
an iceberg.

Phil

-- 
Unicode Phoencian block, 1090X, not displayed in correct direction
https://bugs.launchpad.net/bugs/459991
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
