I have been doing a little research into the defined properties of CGJ. According to http://www.unicode.org/book/preview/ch03.pdf, it is defined in Unicode 4.0 as a "Default Ignorable". I am not surprised that some people are confused: http://www.unicode.org/Public/4.0-Update/UCD-4.0.0.html#Default_Ignorable_Code_Point tells me "For more information, see UAX #29: Text Boundaries <http://www.unicode.org/reports/tr29/>.", but the string "ignorable" is not found in UAX #29. From a Google search, however, I found http://www.unicode.org/review/pr-5.html, described as "/text excerpted from the Unicode Standard/" and carrying the section number 5.22, so I suppose this is from the unpublished chapter 5 of Unicode 4.0. According to this, "Default ignorable code points are those that should be ignored by default in rendering (unless explicitly supported)... An implementation should ignore default ignorable characters in rendering whenever it does /not/ support the characters." So my suggestion that a renderer should simply ignore CGJ is far from twisting the requirements of Unicode; it is in fact a requirement of Unicode 4.0, though one that I am hardly surprised some people have missed.

Please look at the definition of CGJ and other such characters. Understand the differences between CGJ and ZWJ/ZWNJ.
This discussion is very disturbing to me because, after reading through the L2 document register, it is unclear what the difference is between CGJ and ZWJ use.
The fact that you desire a control character to not be treated as such greatly concerns me. This really feels like people are trying to figure out any way to twist existing constructs to avoid fixing the normalization weights. I am alarmed by the implications of putting control characters in place to somehow subvert the normalization.
In an ideal world we would simply correct these values. However, it has been strongly communicated by the UTC that this cannot be done without jeopardizing stability agreements with IETF. Peter Constable has posted a document in the register on this topic that suggests a duplication of characters as a solution.
Can we please have this topic put on the agenda for the next meeting of the UTC?
Regards,
Paul
The internal process by which a particular renderer implements ignoring a glyph is a matter for a particular implementation. John Hudson and I have suggested a mechanism for doing this with Uniscribe by treating the character internally as a normal character with a blank glyph and always ligating it with the preceding character. There may be other mechanisms which are cleaner. But in any case it seems to be a requirement not just for fixing this Hebrew problem but for conformance with Unicode as a whole that some such mechanism is implemented, so that CGJ is ignored by the renderer unless some specific behaviour is defined. In the case of rendering Hebrew, there seems to be no pressing need to define specific behaviour as the default is at least close to what is required.
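
To make the idea concrete, here is a rough Python sketch of the two halves: CGJ keeps two Hebrew points in their original order under normalization, and a renderer that does not support CGJ simply drops it before choosing glyphs. The choice of lamed with patah and hiriq is only my illustrative example, the name strip_for_rendering is hypothetical, and the one-element set of ignorables is a deliberate simplification; none of this describes how Uniscribe or any real renderer works.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

# Assumption: a deliberately incomplete stand-in for the Default Ignorable
# set. A real implementation would take the full list from the UCD.
UNSUPPORTED_IGNORABLES = {CGJ}

def strip_for_rendering(text):
    """Hypothetical renderer step: drop unsupported default ignorables."""
    return "".join(ch for ch in text if ch not in UNSUPPORTED_IGNORABLES)

LAMED = "\u05DC"
PATAH = "\u05B7"  # combining class 17
HIRIQ = "\u05B4"  # combining class 14

plain = LAMED + PATAH + HIRIQ          # intended order: patah, then hiriq
joined = LAMED + PATAH + CGJ + HIRIQ   # same, with CGJ between the points

# Canonical reordering sorts the points by combining class, so the
# intended order is lost without CGJ.
reordered = unicodedata.normalize("NFC", plain)

# CGJ has combining class 0, so the points on either side of it are not
# reordered relative to each other.
preserved = unicodedata.normalize("NFC", joined)

# What the renderer would see after ignoring CGJ.
displayed = strip_for_rendering(preserved)
```

After normalization, the renderer-side string in displayed has the points in the order the author typed them, whereas reordered does not.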
-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/