On 11/9/2011 1:18 AM, "Martin J. Dürst" wrote:
I tried to find something like a normative description of the default bidi class of unassigned code points.

In UTR #9, it says (http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types):

Unassigned characters are given strong types in the algorithm. This is an explicit exception to the general Unicode conformance requirements with respect to unassigned characters. As characters become assigned in the future, these bidirectional types may change. For assignments to character types, see DerivedBidiClass.txt [DerivedBIDI] in the [UCD].

The DerivedBidiClass.txt file, as far as I understand, is mainly a condensation of bidi classes into character ranges (rather than giving them for each codepoint independently as in UnicodeData.txt). I.e. it can at any moment be derived automatically from UnicodeData.txt, and is as such not normative.

Why is it then that the default class assignments are only given in this file (unless I have overlooked something)? And why is it that they are only given in comments?

Because the UnicodeData.txt file has no header (for historical compatibility).

Because, like the practice of putting <style> in HTML inside comments, these things (@missing) are in comments to protect older parsers.

I'm trying to create a program that takes all the bidi assignments (including default ones) and creates the data part of a bidi algorithm implementation, but I don't feel confident coding against stuff that's in comments. Any advice? Is it possible that this could be fixed (making it more normative, and putting it in a form that's easier to process automatically)?

I've confidently parsed these comments for years now.
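
For anyone wanting to do the same, here is a minimal sketch in Python of picking the @missing lines out of a file such as DerivedBidiClass.txt. It assumes the two-field form "# @missing: <first>..<last>; <value>" and assumes that, where several such lines overlap, a later line overrides an earlier one. The function names and the sample lines are made up for illustration; check them against the actual file for the Unicode version you target.

```python
import re

# Matches comment lines of the assumed form
#   "# @missing: 0000..10FFFF; Left_To_Right"
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(\S+)\s*$"
)


def parse_missing_lines(lines):
    """Return (first, last, value) tuples for every two-field @missing line."""
    defaults = []
    for line in lines:
        m = MISSING_RE.match(line)
        if m:
            defaults.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return defaults


def default_for(code_point, defaults):
    """Look up the default value; later ranges are assumed to win."""
    value = None
    for first, last, v in defaults:
        if first <= code_point <= last:
            value = v
    return value


# Illustrative input only, not copied from a specific UCD release.
sample = [
    "# @missing: 0000..10FFFF; Left_To_Right",
    "# @missing: 0590..05FF; Right_To_Left",
    "0041..005A    ; L # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
]
defaults = parse_missing_lines(sample)
```

Note that the @missing lines spell the value with its long name, while the data lines use the short alias, so a real program would also need to map one to the other.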

The one thing that's worse than parsing these comments is moving to an incompatible scheme.

That said, apparently, for some properties the default information is contained in the PropertyValueAliases.txt file, where it is inconveniently located for people who want to parse just one property, but conveniently located for those who want to assemble the whole database. (And, worse, where it adds a code-point dependency to the information in that file that wasn't there from the beginning - but at least the @missing syntax hasn't changed too much).
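
As a rough illustration of that syntax difference, the sketch below (Python) handles the three-field form assumed for PropertyValueAliases.txt, "# @missing: <first>..<last>; <property>; <value>", where the property name is an extra field because the file covers many properties. The function name is hypothetical, and a malformed hex range would raise ValueError rather than be skipped.

```python
def parse_pva_missing(line):
    """Parse an assumed three-field @missing line.

    Returns (first, last, property_name, value), or None if the line
    is not an @missing line with exactly three fields.
    """
    prefix = "# @missing:"
    if not line.startswith(prefix):
        return None
    fields = [f.strip() for f in line[len(prefix):].split(";")]
    if len(fields) != 3:
        return None
    first, sep, last = fields[0].partition("..")
    if not sep:
        return None
    return (int(first, 16), int(last, 16), fields[1], fields[2])


# Illustrative line; verify against the file for your Unicode version.
entry = parse_pva_missing("# @missing: 0000..10FFFF; Bidi_Class; Left_To_Right")
```

Filtering the result on the property name gives the single-property view; keeping everything gives the whole-database view.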

A./
