Best practices for replacing UTF-8 overlongs

Karl Williamson Mon, 19 Dec 2016 15:10:50 -0800

It seems counterintuitive to me that the two-byte sequence C0 80 should be replaced by 2 replacement characters under best practices, or that E0 80 80 should also be replaced by 2. Each sequence was legal in early Unicode versions, and it seems that it would be best to treat each of them as a single sequence, replaced by a single replacement character.


What are the advantages to replacing them by multiple characters?
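
To make the two options concrete, here is a minimal Python sketch. It compares the output of Python's built-in decoder with errors="replace" for C0 80 against a hypothetical decoder, decode_grouping_overlongs, that emits one U+FFFD for any sequence that is structurally complete (a lead byte followed by the number of continuation bytes it calls for) but not valid UTF-8. The function name, the helper implied_length, and the grouping rule are assumptions made for illustration, not anything defined by the Unicode Standard.

```python
REPLACEMENT = "\ufffd"


def implied_length(lead: int) -> int:
    """Sequence length implied by a lead byte's bit pattern, or 0 if the
    byte cannot start a sequence (hypothetical helper for this sketch)."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0  # continuation byte (80..BF) or F8..FF


def decode_grouping_overlongs(data: bytes) -> str:
    """Decode UTF-8, replacing each structurally complete but invalid
    sequence with a single U+FFFD (the policy proposed in the post)."""
    out = []
    i = 0
    while i < len(data):
        n = implied_length(data[i])
        chunk = data[i:i + n]
        complete = (
            n > 0
            and len(chunk) == n
            and all(0x80 <= b <= 0xBF for b in chunk[1:])
        )
        if complete:
            try:
                out.append(chunk.decode("utf-8"))
            except UnicodeDecodeError:
                # Overlong form (this branch also catches surrogates and
                # out-of-range values): one U+FFFD for the whole sequence.
                out.append(REPLACEMENT)
            i += n
        else:
            # Stray or truncated byte: replace it alone and resynchronize.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


# Python's built-in decoder replaces each byte of C0 80 separately.
per_byte = b"\xc0\x80".decode("utf-8", errors="replace")
grouped = decode_grouping_overlongs(b"\xc0\x80")
print(len(per_byte), len(grouped))
```

One design consequence of the grouped policy is that the decoder has to consume continuation bytes that follow a lead byte which can never begin a valid sequence, such as C0.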
