Thanks Phillippe,

> in that file, all UTF-8 sequences with 5 bytes or more are invalid (they are not "boundary cases").

Thanks.

So the list of "impossible bytes" is longer than documented there.

Is it just a case of moving the boundary cases into the impossible bytes? Or are there impossible bytes that simply aren't in the file?
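
One way to check the list of "impossible bytes" without relying on the file is to brute-force it. This is only a rough Python sketch, assuming the modern definition of UTF-8 (code points U+0000 to U+10FFFF, surrogates excluded); the function name is my own. It encodes every valid code point and reports the byte values that never turn up:

```python
def impossible_utf8_bytes():
    """Return the byte values that never occur in any valid UTF-8 sequence.

    Assumes the modern definition: code points U+0000..U+10FFFF,
    with the surrogate range U+D800..U+DFFF excluded.
    """
    possible = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        possible.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - possible)


if __name__ == "__main__":
    print(" ".join(f"{b:02X}" for b in impossible_utf8_bytes()))
```

Under that assumption the result should be C0, C1 and F5 through FF, which includes every lead byte of a 5-byte or 6-byte sequence (F8 through FD).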


> - the file mixes UTF-8 and UTF-16

Does this file really mix UTF-8 and UTF-16? I thought it just had surrogates encoded as UTF-8. Of course, a surrogate code point should never appear in valid UTF-8.
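
To show what I mean by "surrogates encoded into UTF-8", here is a small Python sketch. The byte strings are my own examples, not taken from the file, and is_valid_utf8 is just a helper name I made up. It pushes U+D800 through the ordinary 3-byte pattern and checks that a strict decoder rejects the result, along with a 5-byte sequence:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+D800 run through the 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx.
surrogate_as_utf8 = b"\xed\xa0\x80"

# A 5-byte sequence (lead byte F8), not allowed in modern UTF-8.
five_byte_sequence = b"\xf8\x88\x80\x80\x80"

if __name__ == "__main__":
    for label, data in [("surrogate", surrogate_as_utf8),
                        ("5-byte", five_byte_sequence)]:
        print(label, data.hex(), is_valid_utf8(data))
```

Both are rejected by Python's strict decoder. The surrogate bytes only decode if you explicitly ask for the lenient "surrogatepass" error handler, which suggests they are still UTF-8-shaped bytes rather than actual UTF-16.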


--
    Theodore H. Smith - Software Developer.
    http://www.elfdata.com



