Philippe Verdy wrote:
> The idea that "if a text (without BOM) looks like valid UTF-8, then it is
> UTF-8; else it uses another legacy encoding" does not work in practice and
> also leads to too many false positives.

Can you point to actual data or cases?  I don't mean theoretical ones; I can
make up my own.
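
For reference, the heuristic being discussed can be sketched in a few lines.
This is only a minimal illustration in Python, not anyone's actual
implementation: the function name guess_encoding and the ISO-8859-1 fallback
are assumptions made for the example.

```python
def guess_encoding(data: bytes, legacy: str = "iso-8859-1") -> str:
    """Guess the encoding of BOM-less text.

    Hypothetical helper: returns "utf-8" when the bytes decode as strict
    UTF-8, otherwise the assumed legacy fallback encoding.
    """
    try:
        data.decode("utf-8")  # strict decoding; raises on any malformed sequence
    except UnicodeDecodeError:
        return legacy
    return "utf-8"


# A UTF-8 encoded name is recognised as UTF-8.
print(guess_encoding("François".encode("utf-8")))       # utf-8
# The same name in Latin-1 is not valid UTF-8, so the fallback is chosen.
print(guess_encoding("François".encode("iso-8859-1")))  # iso-8859-1
# A made-up false positive: Latin-1 "Ã©" is bytes C3 A9, also valid UTF-8 for "é".
print(guess_encoding("Ã©".encode("iso-8859-1")))        # utf-8
```

The last call is the made-up kind of false positive: it requires legacy text
whose high-bit bytes happen to form well-formed UTF-8 sequences, which says
nothing about how often that occurs in real data.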

> Some problems do exist however, with the relaxed rules for UTF-8 as it was
> defined in the IESG RFC.

Errr, relaxed?  Care to elaborate?  Are you referring to RFC 2279?
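
For context, RFC 2279 defined sequences of up to six octets, whereas RFC 3629
limits UTF-8 to four octets and to code points up to U+10FFFF. Whether that is
what "relaxed" refers to is not clear from the quoted message. The sketch below
only shows which byte sequences a strict modern decoder rejects; it assumes
Python's built-in "utf-8" codec as the reference, and is_strict_utf8 is a
hypothetical helper name.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data is well-formed under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative samples chosen for this example, not taken from real texts.
samples = {
    "five-octet form (outside RFC 3629)": b"\xf8\x88\x80\x80\x80",
    "overlong two-octet form of U+0000": b"\xc0\x80",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "U+10FFFF, the highest valid code point": b"\xf4\x8f\xbf\xbf",
}

for label, raw in samples.items():
    print(f"{label}: {is_strict_utf8(raw)}")
```

Only the last sample is accepted. A text containing any of the first three
would fail the "looks like valid UTF-8" test under a strict decoder.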

> These old texts (that are valid for this old version of the UTF-8 encoding)
> still exist now

What's particular about these old texts?

-- 
François
