On 20/08/2003 11:23, Rick McGowan wrote:

This notice is relevant to anyone dealing with programming languages, query
specifications, regular expressions, scripting languages, and similar domains.

The Proposed Draft UTR #31: Identifier and Pattern Syntax will be discussed at
the UTC meeting next week. Part of that document (Section 4) is a proposal for
two new immutable properties, Pattern_White_Space and Pattern_Syntax. As
immutable properties, these would not ever change once they are introduced into
the standard, so it is important to get feedback on their contents beforehand.

The UTC will not be making a final determination on these properties at this
meeting, but it is important that any feedback on them is supplied as early in
the process as possible so that it can be considered thoroughly. The draft is
found at http://www.unicode.org/reports/tr31/ and feedback can be submitted as
described there.

Regards,
        Rick McGowan
        Unicode, Inc.






I'm a little concerned about the implications of counting zero-width characters such as LRM and RLM as white space. They can easily slip unnoticed into the middle of a pattern, e.g. when it is copied from a text to which these characters were added to ensure correct directionality.

I wonder if it might be better to add a new category of ignored characters: one of these characters found on its own would not count as a separator, but would be ignored, i.e. treated as part of the white space, if found adjacent to white space. Of course the details need a little more thought, e.g. whether such a character on its own actually counts as part of the pattern, but I hope you see what I am getting at.
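
To make the difference concrete, here is a rough Python sketch of the two behaviours. Everything in it is an assumption for illustration only: the function names, the `keep_in_token` switch (which stands in for the open question of whether a lone LRM or RLM is part of the pattern), the simplified white space set, and the choice to treat the start and end of the pattern like white space. None of it is taken from the draft.

```python
# Illustrative sketch only; not the algorithm from the UTR #31 draft.
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK

# Simplified white space set (assumption: the draft's real set is larger).
WHITE_SPACE = set(" \t\n\r")
IGNORABLE = {LRM, RLM}


def split_tokens(pattern, separators):
    """Split pattern into tokens at any run of separator characters."""
    tokens, current = [], []
    for ch in pattern:
        if ch in separators:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def split_draft(pattern):
    """LRM and RLM count as white space, so a stray one splits a token."""
    return split_tokens(pattern, WHITE_SPACE | IGNORABLE)


def split_proposed(pattern, keep_in_token=False):
    """LRM and RLM never separate tokens on their own.

    Next to white space they are absorbed into it. Inside a token they
    are either kept or dropped, depending on keep_in_token.
    """
    result = []
    for token in split_tokens(pattern, WHITE_SPACE):
        # Marks touching white space end up at the token edges.
        # Assumption: the pattern's start and end are treated the same way.
        token = token.strip(LRM + RLM)
        if not keep_in_token:
            token = "".join(ch for ch in token if ch not in IGNORABLE)
        if token:
            result.append(token)
    return result


if __name__ == "__main__":
    pasted = "ab" + LRM + "cd"
    print(split_draft(pasted))                  # two tokens
    print(split_proposed(pasted))               # one token, mark dropped
    print(split_proposed("ab " + LRM + "cd"))   # mark absorbed into the gap
```

With the draft behaviour a pasted-in LRM silently turns `abcd` into two tokens; with the alternative it does not, and an LRM sitting next to a real space is simply swallowed by that space.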

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




