On 23/07/2003 03:20, Paul Nelson (TYPOGRAPHY) wrote:

Please look at the definition of CGJ and other such characters.
Understand the differences between CGJ and ZWJ/ZWNJ.

This discussion is very disturbing to me because after reading through
the L2 document register it is unclear what the difference is between
CGJ and ZWJ use.

The fact that you desire a control character to not be treated as such
greatly concerns me. This really feels like people are trying to figure
out any way to twist existing constructs to avoid fixing the
normalization weights. I am alarmed by the implications of putting
control characters in place to somehow subvert the normalization.

In an ideal world we would simply correct these values. However, it has
been strongly communicated by the UTC that this cannot be done without
jeopardizing stability agreements with IETF. Peter Constable has posted a
document in the register on this topic that suggests a duplication of
characters as a solution.

Can we please have this topic put on the agenda for the next meeting of
the UTC?

Regards,

Paul





I have been doing a little research into the defined properties of CGJ. I note that according to http://www.unicode.org/book/preview/ch03.pdf it is defined in Unicode 4.0 as a "Default Ignorable".

I am not surprised that some people are confused, because http://www.unicode.org/Public/4.0-Update/UCD-4.0.0.html#Default_Ignorable_Code_Point tells me "For more information, see UAX #29: Text Boundaries <http://www.unicode.org/reports/tr29/>.", but the string "ignorable" is not found in UAX #29. From a Google search, however, I found http://www.unicode.org/review/pr-5.html, described as "/text excerpted from the Unicode Standard/". The section number 5.22 is given, so I suppose this is from the unpublished chapter 5 of Unicode 4.0.

According to this, "Default ignorable code points are those that should be ignored by default in rendering (unless explicitly supported)... An implementation should ignore default ignorable characters in rendering whenever it does /not/ support the characters." So my suggestion that a renderer should simply ignore CGJ is far from twisting the requirements of Unicode. It is in fact a requirement of Unicode 4.0, though one that I am hardly surprised some people have missed.
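To make the properties under discussion concrete, here is a minimal sketch using Python's standard unicodedata module. It reflects whatever Unicode version ships with the interpreter, not Unicode 4.0 specifically, and the lamed/patah/hiriq sequence is only an illustrative choice of mine. It shows that CGJ (U+034F) has combining class 0, and that placing it between two Hebrew points keeps normalization from reordering them. The module does not expose the Default_Ignorable_Code_Point property, so that part is not checked here.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

# Illustrative Hebrew sequence: a base letter followed by two points.
LAMED = "\u05DC"
PATAH = "\u05B7"   # canonical combining class 17
HIRIQ = "\u05B4"   # canonical combining class 14


def describe(ch):
    """Return (name, general category, canonical combining class)."""
    return (unicodedata.name(ch), unicodedata.category(ch),
            unicodedata.combining(ch))


def nfc(text):
    """Normalize to NFC, which includes canonical reordering of marks."""
    return unicodedata.normalize("NFC", text)


plain = LAMED + PATAH + HIRIQ           # points in the intended order
with_cgj = LAMED + PATAH + CGJ + HIRIQ  # same, with CGJ between the points

print(describe(CGJ))
print(nfc(plain) == LAMED + HIRIQ + PATAH)  # True: the points are swapped
print(nfc(with_cgj) == with_cgj)            # True: CGJ keeps them in place
```

Because CGJ has combining class 0, it acts as a barrier to canonical reordering without being a visible character itself.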

How a renderer goes about ignoring a character internally is a matter for the particular implementation. John Hudson and I have suggested a mechanism for doing this with Uniscribe: treat the character internally as a normal character with a blank glyph and always ligate it with the preceding character. There may be other mechanisms which are cleaner. But in any case some such mechanism seems to be required, not just for fixing this Hebrew problem but for conformance with Unicode as a whole, so that CGJ is ignored by the renderer unless some specific behaviour is defined. In the case of rendering Hebrew, there seems to be no pressing need to define specific behaviour, as the default is at least close to what is required.
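One of the possibly cleaner mechanisms would be to drop unsupported default ignorables before glyph lookup. The following is only a rough sketch of that idea, not a description of how Uniscribe or any real renderer works. The function name, the `supported` parameter, and the small hard-coded set of code points are all assumptions of mine for illustration, and the set is deliberately incomplete.

```python
# Hypothetical, deliberately incomplete subset of the
# Default_Ignorable_Code_Point set, for illustration only.
DEFAULT_IGNORABLE = frozenset({
    0x034F,  # COMBINING GRAPHEME JOINER
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
})


def strip_unsupported_ignorables(text, supported=frozenset()):
    """Drop default ignorable characters the renderer does not support.

    `supported` holds the code points for which the (hypothetical)
    renderer defines specific behaviour; those are passed through.
    """
    return "".join(
        ch for ch in text
        if ord(ch) not in DEFAULT_IGNORABLE or ord(ch) in supported
    )


# Lamed, patah, CGJ, hiriq: the renderer sees the points without the CGJ.
sample = "\u05DC\u05B7\u034F\u05B4"
to_shape = strip_unsupported_ignorables(sample)
print([hex(ord(ch)) for ch in to_shape])
```

Filtering happens only on the copy handed to the shaping step, so the stored text still contains the CGJ and still normalizes as intended.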

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




