It seems counterintuitive to me that, under the best practice, the two-byte sequence C0 80 should be replaced by two replacement characters, or that E0 80 80 should be replaced by three. Each sequence was legal in early Unicode versions, so it seems best to treat each one as a single sequence and replace it with a single replacement character.

What are the advantages of replacing them with multiple replacement characters?
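To make the two counting policies concrete, here is a minimal Python sketch. It assumes the well-formed byte ranges from the Unicode Standard's table of well-formed UTF-8 byte sequences (for example, E0 must be followed by A0..BF). `decode_maximal_subpart` emits one U+FFFD per maximal subpart, which is the current recommended practice. `decode_structural` is a hypothetical alternative that groups a lead byte with as many continuation bytes as its bit pattern announces, then replaces the whole run with a single U+FFFD. Both function names are illustrative and do not come from any library.

```python
REPLACEMENT = "\uFFFD"


def _lead_info(b):
    """Return (sequence length, allowed range for the 2nd byte), or None
    if b can never start a well-formed UTF-8 sequence (C0, C1, F5..FF,
    and stray continuation bytes 80..BF)."""
    if b <= 0x7F:
        return 1, None
    if 0xC2 <= b <= 0xDF:
        return 2, (0x80, 0xBF)
    if b == 0xE0:
        return 3, (0xA0, 0xBF)  # excludes overlong 3-byte forms
    if 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
        return 3, (0x80, 0xBF)
    if b == 0xED:
        return 3, (0x80, 0x9F)  # excludes surrogates
    if b == 0xF0:
        return 4, (0x90, 0xBF)  # excludes overlong 4-byte forms
    if 0xF1 <= b <= 0xF3:
        return 4, (0x80, 0xBF)
    if b == 0xF4:
        return 4, (0x80, 0x8F)  # excludes code points above U+10FFFF
    return None


def decode_maximal_subpart(data: bytes) -> str:
    """One U+FFFD per maximal subpart of an ill-formed sequence."""
    out, i, n = [], 0, len(data)
    while i < n:
        info = _lead_info(data[i])
        if info is None:
            out.append(REPLACEMENT)  # this byte can never start a sequence
            i += 1
            continue
        length, second = info
        if length == 1:
            out.append(chr(data[i]))
            i += 1
            continue
        j, ok = i + 1, True
        while j < i + length:
            lo, hi = second if j == i + 1 else (0x80, 0xBF)
            if j >= n or not lo <= data[j] <= hi:
                ok = False  # data[i:j] is the maximal subpart
                break
            j += 1
        out.append(data[i:j].decode("utf-8") if ok else REPLACEMENT)
        i = j  # resume at the first byte that did not fit
    return "".join(out)


def decode_structural(data: bytes) -> str:
    """Hypothetical policy: one U+FFFD per structurally complete run."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF7:
            length = 4
        else:
            out.append(REPLACEMENT)  # stray continuation byte or F8..FF
            i += 1
            continue
        j = i + 1
        while j < min(i + length, n) and 0x80 <= data[j] <= 0xBF:
            j += 1
        try:
            # Strict decoding rejects overlongs, surrogates, truncation.
            out.append(data[i:j].decode("utf-8"))
        except UnicodeDecodeError:
            out.append(REPLACEMENT)
        i = j
    return "".join(out)
```

With this sketch, `decode_maximal_subpart(b"\xc0\x80")` yields two U+FFFD and `decode_maximal_subpart(b"\xe0\x80\x80")` yields three, while `decode_structural` yields one for each. One visible difference between the two: the maximal-subpart version decides from Table 3-7's ranges as each byte arrives, whereas the structural version has to keep consuming continuation bytes before it can reject an overlong form.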
