Philippe Verdy wrote:
> The idea that "if a text (without BOM) looks like valid UTF-8, then it
> is UTF-8; else it uses another legacy encoding" does not work in
> practice and also leads to too many false positives.
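
To pin down what is being debated, here is a minimal sketch of that heuristic in Python. The function name guess_encoding and the windows-1252 fallback are my own assumptions for illustration; they do not come from the quoted message or from any particular implementation.

```python
def guess_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Return "utf-8" if data decodes as strict UTF-8, else the fallback.

    Hypothetical helper: the name and the default fallback are assumptions.
    """
    try:
        # Python's strict decoder rejects overlong forms, surrogates and
        # anything that is not a well-formed sequence up to U+10FFFF.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding("François".encode("utf-8")))
    print(guess_encoding("François".encode("latin-1")))
    # A hand-made ambiguous input: Latin-1 "Ã©" and UTF-8 "é" share bytes.
    print(guess_encoding(b"\xc3\xa9"))
```

The last call is the kind of made-up false positive anyone can construct: the two bytes are valid UTF-8, so the heuristic says UTF-8 even if the author meant Latin-1. A strict decoder like this one also rejects the 5- and 6-byte forms that RFC 2279 permitted, if that is what "relaxed" refers to.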

Can you point to actual data/cases? I don't mean theoretical, I can make
up my own.

> Some problems do exist however, with the relaxed rules for UTF-8 as it
> was defined in the IESG RFC.

Errr, relaxed? Care to elaborate? Are you referring to RFC 2279?

> These old texts (that are valid for this old version of the UTF-8
> encoding) still exist now

What's particular about these old texts?

--
François