Re: Unified Canadian Syllabics

Rick McGowan Mon, 09 Feb 2004 19:33:43 -0800

Chris --

Note: I am not speaking officially, just giving my opinions.

> http://www.languagegeek.com/issues/ucas_unicode.html

Sorry, I have no opinions at all about the major questions you are asking on the above page, and we probably need to involve some experts. The source documents for the UCAS encoding are not in electronic form, and most people couldn't put their hands on any. Ken Whistler might have one around, and somewhere buried in my garage is a copy of the large format final report by the "Canadian Aboriginal Syllabics Encoding Committee" (CASEC 1994) that led to the encoding some years ago for Unicode 3.0. One operative word for the encoding is "unified", and the committee expended a great deal of effort to get buy-in and to unify unifiable things, based on shapes (and names were a problem there, of course, since the same shapes can be used for completely unrelated syllables between languages). I'm rather surprised at the extent of your comments regarding missing characters and/or suggestions for different methods of encoding things that haven't been brought up before. I do wonder how much of this *could* be accounted for by explicit unifications made by the UCAS encoding committee.

On your web pages you say:

> Please download the Aboriginal Serif Unicode font to view these pages
> properly

While it's wonderful to see someone making and distributing (for free even) such a comprehensive font for North Am languages, I personally have grave reservations about the extended use of private use (PUA) space in the font and on your web pages. It's a bit horrifying, really. The syndrome of "please use our proprietary encoding and font to view these pages" is one of the diseases that Unicode is designed to get rid of. At least some of your web pages are UTF-8 encoded, and yet they can't be viewed with other "compliant" fonts like Code2000 because you're making extensive use of PUA characters -- even for things easily found in the encoding. One example: a case where you use the PUA code point U+E292 instead of ᑎ U+144E (this occurs in the line labelled "North Slavey" on page http://www.languagegeek.com/syl/languages.html). I've attached an example below: the two lines show your example rendered with your font, above, and with Code2000, below; I'm speaking of the 5th encoded character on each line. Note that your encoding is also apparently inconsistent.

It's totally understandable that your pages would contain *some* small number of PUA sequences in UCAS-languages for which encodings are still lacking; but you make no mention of this fact anywhere: you're portraying the entire thing as if it's standard Unicode, and it's demonstrably not. In fact, it has a lot of serious errors, apparently. Casual observation shows at least several instances where you are not using a character that exists in the encoding, but are instead using a PUA entity, as in my example above.
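One way to see how much of a page depends on the PUA is simply to scan its text for private-use code points. Below is a minimal sketch in Python, assuming the page text has already been decoded from UTF-8; the function name `find_private_use` and the sample string are made up for illustration and are not taken from your pages.

```python
import unicodedata

def find_private_use(text):
    """Return (index, 'U+XXXX') for every private-use code point in text."""
    hits = []
    for i, ch in enumerate(text):
        # General category "Co" marks private-use code points.
        if unicodedata.category(ch) == "Co":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits

# Hypothetical sample: two encoded syllabics around one PUA code point.
sample = "\u1455\uE292\u144E"
print(find_private_use(sample))
```

Anything such a scan reports is text that only a font sharing the same private agreement can display as intended.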

Now, to be fair, this could be partially the result of some failure on the part of the Unicode Standard to fully explain the usage of UCAS characters; but the first sentence of the UCAS block intro should spell out the intent clearly: "The characters in this block are a unification of various local syllabaries of Canada into a single repertoire based on character appearance."

Then there are the Latin & other glyphs... The way to approach most of the pre-composed Latin glyphs that you put in the private use zone is to have proper TrueType or OpenType tables for constructing appropriate glyphs from Unicode sequences. For example, the Unicode character sequence <U+00C6, U+0303> should produce the glyph you have encoded at U+F682 in the PUA. And your font should have tables for doing that.
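To illustrate why the font tables matter for a case like this: the sequence <U+00C6, U+0303> has no precomposed equivalent in Unicode, so normalization leaves it as two code points and the combined glyph has to come from the font. A small Python check, using only the standard `unicodedata` module (the variable names are illustrative):

```python
import unicodedata

# The sequence from the example above: AE followed by a combining tilde.
seq = "\u00C6\u0303"

# NFC composes wherever a precomposed character exists; here none does,
# so the result is still two code points.
composed = unicodedata.normalize("NFC", seq)
names = [unicodedata.name(c) for c in composed]
print(len(composed), names)
```

So the data stays a base letter plus a combining mark, and it is the font's job to draw them as one glyph.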

While it is OK theoretically to put anything at all in the PUA, it is very bad practice to put *this* kind of thing into the PUA -- both the pre-composed Latin and the re-encoding of already encoded syllabics -- and this practice will definitely lead to non-interchangeable data leaking out into the world.

You really should fix this font, and the encoding of your pages, and warn your users very strongly against use of the PUA encodings in existing versions of the font. And if you provide any data converters from older versions, you should provide converters to a corrected encoding as well. I can only hope the font so far isn't widely distributed, so that damage from naive users blindly using PUA data that assume these encodings isn't much of a problem.
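A converter of the kind I mean could be as simple as a lookup table from PUA code points to standard characters or sequences. Here is a rough Python sketch; the table holds only the two correspondences mentioned in this message, and the names `PUA_TO_STANDARD` and `convert_pua` are hypothetical. A real table would have to be built from the font's actual PUA assignments, which I don't have.

```python
# Illustrative table only: the two pairs discussed in this message.
PUA_TO_STANDARD = {
    0xE292: "\u144E",        # PUA syllabic -> encoded UCAS character
    0xF682: "\u00C6\u0303",  # PUA glyph -> AE + combining tilde
}

def convert_pua(text, table=PUA_TO_STANDARD):
    """Replace known PUA code points; leave everything else untouched."""
    return text.translate(table)

print(convert_pua("\uE292") == "\u144E")
```

PUA code points missing from the table pass through unchanged, so they can still be found and reviewed afterwards.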

Note: I am not speaking officially, just giving my opinions.

Rick

ucas-ex.gif
Description: Binary data
