RE: Proposed Draft UTR #31 - Syntax Characters

Marco Cimarosti Tue, 26 Aug 2003 12:07:29 +0000

I posted my feedback through the reporting forms. The text of the two posts is
attached.


(I considerably shortened the list of non-Latin punctuation marks that I
suggest excluding from identifiers, although I added two of the Hebrew
punctuation marks suggested by Kirk.)

_ Marco

Feedback on UTR#31 (draft 1): Full/Half-Width Characters.

I suggest that all compatibility characters which are labelled <wide>, <narrow> or 
<small> and whose compatibility decomposition is already in class <Pattern_Syntax> be 
added to class <Pattern_Syntax> as well.

In practice, I suggest adding the following lines to section "4.1 Proposed 
Pattern Properties":

        FE50..FE52 ; Pattern_Syntax # SMALL COMMA..SMALL FULL STOP
        FE54..FE57 ; Pattern_Syntax # SMALL SEMICOLON..SMALL EXCLAMATION MARK
        FE59..FE66 ; Pattern_Syntax # SMALL LEFT PARENTHESIS..SMALL EQUALS SIGN
        FE68..FE6B ; Pattern_Syntax # SMALL REVERSE SOLIDUS..SMALL COMMERCIAL AT
        FF01..FF0F ; Pattern_Syntax # FULLWIDTH EXCLAMATION MARK..FULLWIDTH SOLIDUS
        FF1A..FF20 ; Pattern_Syntax # FULLWIDTH COLON..FULLWIDTH COMMERCIAL AT
        FF3B..FF40 ; Pattern_Syntax # FULLWIDTH LEFT SQUARE BRACKET..FULLWIDTH GRAVE ACCENT
        FF5B..FF5E ; Pattern_Syntax # FULLWIDTH LEFT CURLY BRACKET..FULLWIDTH TILDE
        FF5F..FF61 ; Pattern_Syntax # FULLWIDTH LEFT WHITE PARENTHESIS..HALFWIDTH IDEOGRAPHIC FULL STOP
        FF64       ; Pattern_Syntax # HALFWIDTH IDEOGRAPHIC COMMA
        FFE0..FFE2 ; Pattern_Syntax # FULLWIDTH CENT SIGN..FULLWIDTH NOT SIGN
        FFE4..FFE5 ; Pattern_Syntax # FULLWIDTH BROKEN BAR..FULLWIDTH YEN SIGN
        FFE8..FFEE ; Pattern_Syntax # HALFWIDTH FORMS LIGHT VERTICAL..HALFWIDTH WHITE CIRCLE
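
The selection rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: Python's `unicodedata` module does not expose Pattern_Syntax, so `string.punctuation` (ASCII punctuation only) stands in for the base class, and the function name `width_variants` is hypothetical. The result depends on the Unicode version bundled with the interpreter and will not reproduce the list above exactly, because the proposed class contains more than ASCII punctuation.

```python
import string
import unicodedata

# Compatibility tags named in the suggestion.
WIDTH_TAGS = ("<wide>", "<narrow>", "<small>")

def width_variants(base_syntax):
    """Map each width/size variant to its decomposition target,
    keeping only targets found in base_syntax (a stand-in for
    Pattern_Syntax, which unicodedata does not provide)."""
    found = {}
    for cp in range(0x10000):  # assumption: scanning the BMP is enough here
        ch = chr(cp)
        parts = unicodedata.decomposition(ch).split()
        # A width variant decomposes as "<tag> XXXX": one tag, one target.
        if len(parts) == 2 and parts[0] in WIDTH_TAGS:
            target = chr(int(parts[1], 16))
            if target in base_syntax:
                found[ch] = target
    return found

variants = width_variants(string.punctuation)
for ch in sorted(variants)[:5]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {variants[ch]!r}")
```

With a fuller base set in place of `string.punctuation`, the same loop would also pick up variants of non-ASCII syntax characters.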

Rationale. These characters are almost identical, visually and semantically, to their 
"normal width" counterparts. Allowing such characters in identifiers means allowing 
identifiers which look identical to expressions of a totally different kind. E.g., an 
identifier such as "foo，bar" (where "，" is U+FF0C FULLWIDTH COMMA) would look 
identical to the expression "foo, bar" (identifier "foo" + comma + space + identifier 
"bar").
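
A small Python sketch of the confusion described above. The tokenizer `split_arguments` is a hypothetical, deliberately naive stand-in for a real lexer: it splits on ASCII COMMA only, so the string containing U+FF0C stays a single token even though it displays much like two identifiers separated by a comma.

```python
import unicodedata

def split_arguments(source):
    """Naive tokenizer: split on ASCII COMMA (U+002C) only."""
    return [part.strip() for part in source.split(",")]

genuine = "foo, bar"      # identifier, comma, space, identifier
spoofed = "foo\uFF0Cbar"  # one token containing U+FF0C FULLWIDTH COMMA

print(split_arguments(genuine))  # two tokens
print(split_arguments(spoofed))  # one token

# Compatibility normalization (NFKC) folds the fullwidth comma to U+002C,
# which is why the two strings are so easy to mistake for one another.
print(split_arguments(unicodedata.normalize("NFKC", spoofed)))  # two tokens
```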

Regards.
Marco Cimarosti ([EMAIL PROTECTED])

Feedback on UTR#31 (draft 1): Non-Latin Punctuation.

I suggest that a small set of non-Latin punctuation marks be added to class 
<Pattern_Syntax>. Each of the punctuation marks that I suggest including 
meets the following conditions:

1) It is very similar in shape to an ASCII-range character which is already in class 
<Pattern_Syntax>;

2) It is very similar in function to an ASCII-range character which is already in 
class <Pattern_Syntax>;

3) It is used in the modern orthography of modern languages and/or it is commonly 
available on national keyboards;

4) It is not commonly used to form words or phrases which may be used as identifiers.

In practice, I suggest adding the following lines to section "4.1 Proposed 
Pattern Properties":

        037E       ; Pattern_Syntax # GREEK QUESTION MARK
        0387       ; Pattern_Syntax # GREEK ANO TELEIA
        055C..055E ; Pattern_Syntax # ARMENIAN EXCLAMATION MARK..ARMENIAN QUESTION MARK
        0589       ; Pattern_Syntax # ARMENIAN FULL STOP
        05C0       ; Pattern_Syntax # HEBREW PUNCTUATION PASEQ
        05C3       ; Pattern_Syntax # HEBREW PUNCTUATION SOF PASUQ
        060C..060D ; Pattern_Syntax # ARABIC COMMA..ARABIC DATE SEPARATOR
        061B       ; Pattern_Syntax # ARABIC SEMICOLON
        061F       ; Pattern_Syntax # ARABIC QUESTION MARK
        066A..066C ; Pattern_Syntax # ARABIC PERCENT SIGN..ARABIC THOUSANDS SEPARATOR
        06D4       ; Pattern_Syntax # ARABIC FULL STOP
        066D       ; Pattern_Syntax # ARABIC FIVE POINTED STAR
        0964..0965 ; Pattern_Syntax # DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
        10FB       ; Pattern_Syntax # GEORGIAN PARAGRAPH SEPARATOR
        1362..1368 ; Pattern_Syntax # ETHIOPIC FULL STOP..ETHIOPIC PARAGRAPH SEPARATOR
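
Lines in this format (code point or range, semicolon, property name, `#` comment) are easy to load into a set for checking. The sketch below is one possible reader, not an official parser; the names `parse_property_lines` and `SAMPLE` are hypothetical, and `SAMPLE` repeats only three of the lines above to keep the example short.

```python
import re

# "XXXX" or "XXXX..YYYY", then ";", then the property name.
LINE = re.compile(r"^\s*([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

def parse_property_lines(text):
    """Return {property name: set of code points} for lines in text."""
    members = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # blank line or comment-only line
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        members.setdefault(match.group(3), set()).update(range(first, last + 1))
    return members

SAMPLE = """\
037E       ; Pattern_Syntax # GREEK QUESTION MARK
055C..055E ; Pattern_Syntax # ARMENIAN EXCLAMATION MARK..ARMENIAN QUESTION MARK
1362..1368 ; Pattern_Syntax # ETHIOPIC FULL STOP..ETHIOPIC PARAGRAPH SEPARATOR
"""

syntax = parse_property_lines(SAMPLE)["Pattern_Syntax"]
print(len(syntax), 0x055D in syntax)
```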

Rationale. Punctuation marks complying with conditions #1 to #3 may easily be confused 
with ASCII-range characters which are normally used in the syntax of computer 
languages and notations. Allowing such characters in identifiers would mean allowing 
identifiers which look almost identical to expressions of a totally different kind. 
E.g., an identifier such as "return;" (where ";" is U+037E GREEK QUESTION MARK), 
looks identical to expression "return;" (identifier or keyword "return" + semicolon). 
However, punctuation marks that fail condition #4 (e.g. syllable separators, 
morpheme separators, abbreviation marks, diacritic marks, apostrophes) are excluded 
from my suggestion (i.e. I suggest allowing them in identifiers) because they are 
useful to form words or phrases which may act as identifiers.
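
The "return;" case can be checked with a short Python sketch. The variable names and the helper `describe` are hypothetical. Note that U+037E has a canonical decomposition to U+003B, so even NFC normalization turns it into an ordinary semicolon; a processor that compares raw code points sees two different strings, while one that normalizes first sees the same string.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037E"

spoofed = "return" + GREEK_QUESTION_MARK  # would be a single identifier
genuine = "return;"                       # keyword followed by U+003B

def describe(text):
    """List each character as 'U+XXXX NAME' to expose lookalikes."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

print(spoofed == genuine)        # raw comparison
print(describe(spoofed)[-1])     # last character of the spoofed string
print(describe(genuine)[-1])     # last character of the genuine string
print(unicodedata.normalize("NFC", spoofed) == genuine)  # after normalization
```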

Character-by-character rationale. The following list gives each suggested 
character along with the ASCII-range character which looks similar to it (as per 
condition #1 above) and the ASCII-range character which has a similar function 
(as per condition #2).

        Code    Cnd.#1  Cnd.#2  Character name

        037E    ;       ?       GREEK QUESTION MARK
        0387    .       ;       GREEK ANO TELEIA
        055C    ~       !       ARMENIAN EXCLAMATION MARK
        055D    `       ,       ARMENIAN COMMA
        055E    ^       ?       ARMENIAN QUESTION MARK
        0589    :       .       ARMENIAN FULL STOP
        05C0    |       ;       HEBREW PUNCTUATION PASEQ
        05C3    :       .       HEBREW PUNCTUATION SOF PASUQ
        060C    ,       ,       ARABIC COMMA
        060D    ,       ,       ARABIC DATE SEPARATOR
        061B    ;       ;       ARABIC SEMICOLON
        061F    ?       ?       ARABIC QUESTION MARK
        066A    %       %       ARABIC PERCENT SIGN
        066B    ,       .       ARABIC DECIMAL SEPARATOR
        066C    ,       ,       ARABIC THOUSANDS SEPARATOR
        06D4    _       .       ARABIC FULL STOP
        066D    *       *       ARABIC FIVE POINTED STAR
        0964    |       .       DEVANAGARI DANDA
        0965    |       .       DEVANAGARI DOUBLE DANDA
        10FB    :       :       GEORGIAN PARAGRAPH SEPARATOR
        1362    :       .       ETHIOPIC FULL STOP
        1363    :       ,       ETHIOPIC COMMA
        1364    :       ;       ETHIOPIC SEMICOLON
        1365    :       :       ETHIOPIC COLON
        1366    :       :       ETHIOPIC PREFACE COLON
        1367    |       ?       ETHIOPIC QUESTION MARK
        1368    :       .       ETHIOPIC PARAGRAPH SEPARATOR
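
The "Cnd.#1" column of the table can be used as a lookup to flag suspicious characters in source text. The sketch below is an assumption-laden illustration: `LOOKALIKE` copies only a few rows of the table, and the function names `ascii_lookalike` and `confusable_marks` are hypothetical.

```python
# A few rows of the table: code point -> visually similar ASCII character.
LOOKALIKE = {
    "\u037E": ";",  # GREEK QUESTION MARK
    "\u0387": ".",  # GREEK ANO TELEIA
    "\u0589": ":",  # ARMENIAN FULL STOP
    "\u060C": ",",  # ARABIC COMMA
    "\u066A": "%",  # ARABIC PERCENT SIGN
    "\u0964": "|",  # DEVANAGARI DANDA
    "\u1362": ":",  # ETHIOPIC FULL STOP
}

def ascii_lookalike(text):
    """Replace each listed mark with the ASCII character it resembles."""
    return "".join(LOOKALIKE.get(ch, ch) for ch in text)

def confusable_marks(text):
    """Return (index, 'U+XXXX') for every listed mark found in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in LOOKALIKE]

sample = "a\u0589b"
print(ascii_lookalike(sample), confusable_marks(sample))
```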

Regards.
Marco Cimarosti ([EMAIL PROTECTED])
