At 08:49 -0700 2002-11-14, John H. Jenkins wrote:
On Wednesday, November 13, 2002, at 12:07 AM, George W Gerrity wrote:

In an effort to unify all characters and pictographs, the decision was made to unify CJK characters by suppressing most variant forms. That has turned out to be the single greatest objection from users -- especially Japanese -- and somehow we need a low-level way of indicating the target language in the context of multilingual text.

The Plane 14 tags seem appropriate for this, giving the font engine a hint as to a good choice of alternate glyphs, where available.
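
To make this concrete, here is a minimal sketch of how such a tag is
spelled out under the Plane 14 scheme (Unicode 3.1 / RFC 2482): U+E0001
LANGUAGE TAG followed by tag characters at U+E0020..U+E007E that shadow
printable ASCII. The Python below is only an illustration, not anyone's
actual implementation:

    LANGUAGE_TAG = "\U000E0001"

    def language_tag(code):
        """Encode a language code such as 'ja' as Plane 14 tag characters."""
        return LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in code)

    # A run of kanji hinted as Japanese, for any renderer that reads tags:
    text = language_tag("ja") + "\u6771\u4eac"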

A couple of points.

1) There are two kinds of variant problem arising from the Unihan unification. The objections based on them run, respectively:

Japanese readers will be forced to read Japanese text with Chinese glyphs!

and

Mr. Watanabe won't be able to insert the variant glyph for his name that he prefers into a document!

The first objection is, and always has been, a non-issue, and is the only aspect of the problem that the Plane 14 tags could hope to deal with. The issue is not a language one, but a locale one, to begin with.
Yes, although language and region can both be encoded, as in en-US or en-GB. The reason for providing such an encoding arises in multilingual texts, where one would hope that in each case the rendering is appropriate. A good example is the production of multilingual manuals, which seem to be more and more common these days. I agree that in this example, higher-level markup would do all that is necessary.

Moreover, the typical practice in Japanese typography (at least) is to use Japanese-preferred glyphs even when displaying Chinese text. Japanese users do *not* expect the text to switch back-and-forth between Chinese and Japanese glyphs as the language varies.
How do Chinese readers feel about this? They might find it objectionable to have to read Chinese text rendered with Japanese glyphs in a multilingual document.

Given this, the best solution to the problem is to use fonts aimed at the specific locale. This means that a Japanese user who goes to read her email at an Internet café in Hong Kong may see unexpected glyphs, true, but it really handles 99.99+% of the problem.

...

The second objection could not be solved by the Plane 14 tags. The two possible solutions are to encode separately every glyphic variant which someone, somewhere, sometime may find necessary to distinguish in plain text, or to use variant markers. It is the latter solution which the UTC has adopted.
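
As I understand the variant-marker approach, it amounts to following the
base ideograph with a variation selector that requests a particular
glyph, where the font supports the sequence. A sketch in the same
spirit (the particular base-plus-selector pairing below is hypothetical,
chosen purely for illustration):

    base = "\u908A"            # an ideograph with several variant forms
    selector = "\U000E0100"    # a variation selector (hypothetical pairing)
    # A font that registers this sequence shows the requested variant;
    # one that does not simply ignores the selector and shows the default.
    preferred = base + selector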

2) From a technical standpoint, the Plane 14 tags do not really lend themselves to use with the main complex-script font engines available. I don't know enough about Graphite to really speak to it, but in the case of OpenType and AAT it is true that protocols are already available to use Japanese/SC/TC/Korean/Vietnamese glyphs for a run of text. These existing protocols, however, depend on information external to the text itself.

To keep the information internal to the text, or, more accurately, internal to the glyph stream, one would have to have the ability to enter a state once a certain character (or glyph) is encountered and remain in that state indefinitely. Neither OpenType nor AAT allows this. OpenType does not use a state engine internal to the glyph stream for processing, and AAT resets the state at the beginning of each line.
How do they handle bidi?

What would have to happen is that the rendering engine would have to find these characters within the text stream, massage the text data so as to remove them and mark the text with the equivalent higher-level information, and then render the result.
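
In other words, something along these lines would have to run before
the renderer ever sees the text. A rough sketch, again assuming the RFC
2482 tag scheme (the function name and the simplified cancel-tag
handling are my own, not anyone's shipping code):

    def strip_language_tags(text):
        """Strip Plane 14 tags; return (clean_text, [(start, end, lang)])."""
        clean, runs = [], []
        lang, start = None, 0
        i = 0
        while i < len(text):
            cp = ord(text[i])
            if cp == 0xE0001:                  # LANGUAGE TAG: read the code
                if lang is not None:
                    runs.append((start, len(clean), lang))
                i += 1
                code = []
                while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
                    code.append(chr(ord(text[i]) - 0xE0000))
                    i += 1
                lang, start = "".join(code), len(clean)
            elif cp == 0xE007F:                # CANCEL TAG: close the run
                if lang is not None:
                    runs.append((start, len(clean), lang))
                lang = None
                i += 1
            else:
                clean.append(text[i])
                i += 1
        if lang is not None:
            runs.append((start, len(clean), lang))
        return "".join(clean), runs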

The problem here is that libraries such as Uniscribe and ATSUI which provide Unicode rendering do not deal with the text as a whole (at least, this is definitely true with ATSUI and is probably true with Uniscribe, although I don't know for sure). That is, the Plane 14 tag may be found in the first paragraph of the text, but when the client hands the text off to the library, it may hand off only a later portion because that's all that needs to be drawn. The library then does not have access to this information and will not render the text correctly.

This basically means that the onus is on the client to detect these tags in the text and make appropriate adjustments when it hands off the text to Uniscribe or ATSUI for rendering. As such, there is no real advantage gained by having these tags embedded directly in the text rather than in the same layer as font, point size, and other typographic preferences. Indeed, it becomes inconvenient to have them in a different layer, as it means that the client has to do *two* levels of processing to derive this information rather than just one.
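
If I follow, the client itself must make that whole-document pass up
front, so that a partial hand-off to the renderer still carries an
attribute established by a tag seen only in an earlier paragraph.
Continuing the sketch above (names again hypothetical):

    doc = language_tag("ja") + "\u6771\u4eac ... much later text"
    clean, runs = strip_language_tags(doc)
    # runs == [(0, len(clean), 'ja')]: even a slice drawn from the end
    # of the text inherits the tag set at the very start, because the
    # run table was computed over the whole backing store.
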
Thank you. This clarification of the way the renderers work is very helpful in understanding why Plane 14 tags are relatively useless, but it leaves me confused as to how the bidi algorithm can work: it certainly requires that state be kept at the rendering level.

George
