[Tinycc-devel] Unicode letter escape

Samir Ribić via Tinycc-devel Fri, 05 Aug 2022 04:33:56 -0700

TCC supports the \u escape sequence inside L"" string literals, but I have
no idea how to work around the following problem. Consider this part of the
parse_escape_string function:


            case 'x':
            case 'u':
            case 'U':
                p++;
                n = 0;
                for(;;) {
                    c = *p;
                    if (c >= 'a' && c <= 'f')
                        c = c - 'a' + 10;
                    else if (c >= 'A' && c <= 'F')
                        c = c - 'A' + 10;
                    else if (isnum(c))
                        c = c - '0';
                    else
                        break;
                    n = n * 16 + c;
                    p++;
                }

It does not limit how many hexadecimal digits are read after the \u escape.
Why is this a problem? If a character written with \u is followed by one of
the letters a, b, c, d, e or f (or A-F, or a digit), that letter is consumed
as part of the code itself.
For example, L"Mogu\u0107i" displays the word "Mogući" as it should,
because U+0107 is c with acute (ć). However, L"Mogu\u0107e" displays
"Moguၾ" instead of "Moguće", because the five digits 0107e are read as
U+107E, MYANMAR LETTER SHAN FA.

Section 6.4.3 of the C99 standard, ISO/IEC 9899:1999(E), Programming
Languages -- C
<https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf>, states
that the \unnnn escape sequence takes exactly four hexadecimal digits (and
\Unnnnnnnn exactly eight), so the code above needs to be changed for 'u' and
'U'. Only \x may take an arbitrary number of digits (section 6.4.4.4).

_______________________________________________
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel