On 2022-08-05 13:32:04 +0200, Samir Ribić via Tinycc-devel wrote:
> Tcc supports \u escape sequence inside L"" but I have no idea how to
> overcome this problem:
> The code inside parse_escape_string function, in this part
>
>         case 'x':
>         case 'u':
>         case 'U':
>             p++;
>             n = 0;
>             for(;;) {
>                 c = *p;
>                 if (c >= 'a' && c <= 'f')
>                     c = c - 'a' + 10;
>                 else if (c >= 'A' && c <= 'F')
>                     c = c - 'A' + 10;
>                 else if (isnum(c))
>                     c = c - '0';
>                 else
>                     break;
>                 n = n * 16 + c;
>                 p++;
>             }
>
> does not limit the size of the hexadecimal number written after the \u
> escape code. Why is this a problem? If the text with an unicode letter is
> followed by letters a,b, c, d, e or f, it will be part of the code itself.
> For example L"Mogu\u0107i" will display the word "Mogući" as should be,
> because the code 0107 is c acute. However, the word L"Mogu\u0107e" will
> not display "Moguće" but "Moguၾ" because 107e is Myanmar Shan Fa
>
> Section 6.4.3 of C99 standard ISO/IEC 9899:1999(E) -- Programming
> Languages -- C (uchile.cl)
> <https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf> states
> that \unnnn escape sequence requires exactly four hexadecimal digits, so
> the code above needs to be changed.

And exactly 8 hexadecimal digits for \U.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
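
For illustration, the digit limits discussed in this thread (exactly 4 hexadecimal digits after \u, exactly 8 after \U, no fixed count after \x) could be enforced roughly as follows. This is only a sketch, not tcc's actual code and not a tested patch: hex_digit_value and parse_hex_escape are hypothetical names, and the function works on a plain char pointer rather than on tcc's internal parser state. It also does not check the other constraints C99 places on universal character names.

```c
#include <stddef.h>

/* Hypothetical helper (not part of tcc): value of a hex digit, or -1. */
static int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch: parse the digits that follow \x, \u or \U.
 * `kind` is the escape letter ('x', 'u' or 'U') and `p` points at the
 * first character after it.  On success the value is stored in *out and
 * a pointer just past the last consumed digit is returned.  NULL is
 * returned if \u is not followed by 4 digits, \U by 8 digits, or \x by
 * at least one digit. */
static const char *parse_hex_escape(int kind, const char *p,
                                    unsigned long *out)
{
    /* 0 means "no upper bound", which is the case for \x */
    int max_digits = (kind == 'u') ? 4 : (kind == 'U') ? 8 : 0;
    int count = 0;
    unsigned long n = 0;

    while (max_digits == 0 || count < max_digits) {
        int d = hex_digit_value((unsigned char)*p);
        if (d < 0)
            break;              /* first non-hex character ends the number */
        n = n * 16 + (unsigned long)d;
        p++;
        count++;
    }
    if (count == 0 || (max_digits != 0 && count != max_digits))
        return NULL;            /* too few digits for this escape */
    *out = n;
    return p;
}
```

With such a limit, parsing the digits of \u0107e stops after 0107 and leaves the trailing e as an ordinary character, whereas the same input after \x would still be read as 107e.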