What are the advantages of replacing them with multiple characters?
It seems counterintuitive to me that the two-byte sequence C0 80 should
be replaced by 2 replacement characters under best practices, or that E0
80 80 should be replaced by 3. Each sequence was legal in early
Unicode versions, and it seems that it would be best to treat each as
a single sequence, replacing it with a single replacement character.
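
To make the two policies concrete, here is a minimal Python sketch. It counts the U+FFFD characters that Python's built-in decoder emits for the sequences mentioned above, and contrasts that with a decoder that collapses each structurally complete overlong sequence into one U+FFFD. The names count_replacements, overlong_length and decode_collapsing_overlongs are hypothetical helpers written for this illustration, not part of any library, and the sketch covers only the 2-, 3- and 4-byte overlong forms (the obsolete 5- and 6-byte forms are ignored).

```python
REPLACEMENT = "\ufffd"


def count_replacements(data: bytes) -> int:
    """Count the U+FFFD characters Python's built-in decoder emits."""
    return data.decode("utf-8", errors="replace").count(REPLACEMENT)


def overlong_length(data: bytes, i: int) -> int:
    """Return the length of a complete overlong sequence at index i, else 0.

    Hypothetical helper. It recognises:
      C0/C1 + 1 continuation byte     (2-byte overlong)
      E0 + 80..9F + 1 continuation    (3-byte overlong)
      F0 + 80..8F + 2 continuations   (4-byte overlong)
    """
    lead = data[i]
    if lead in (0xC0, 0xC1):
        n = 2
    elif lead == 0xE0 and i + 1 < len(data) and data[i + 1] <= 0x9F:
        n = 3
    elif lead == 0xF0 and i + 1 < len(data) and data[i + 1] <= 0x8F:
        n = 4
    else:
        return 0
    tail = data[i + 1:i + n]
    # Every trailing byte must be a continuation byte (80..BF).
    if len(tail) == n - 1 and all(0x80 <= b <= 0xBF for b in tail):
        return n
    return 0


def decode_collapsing_overlongs(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD per complete overlong sequence.

    Everything between overlong sequences is handed to the built-in
    decoder with errors="replace", so other malformed input is treated
    exactly as Python would treat it.
    """
    out = []
    start = 0  # start of the run not yet passed to the built-in decoder
    i = 0
    while i < len(data):
        n = overlong_length(data, i)
        if n:
            out.append(data[start:i].decode("utf-8", errors="replace"))
            out.append(REPLACEMENT)
            i += n
            start = i
        else:
            i += 1
    out.append(data[start:].decode("utf-8", errors="replace"))
    return "".join(out)


if __name__ == "__main__":
    for seq in (b"\xc0\x80", b"\xe0\x80\x80"):
        print(seq.hex(" "),
              "built-in:", count_replacements(seq),
              "collapsed:", len(decode_collapsing_overlongs(seq)))
```

The practical difference is that the built-in approach needs only a table of which bytes may follow which lead byte, while the collapsing approach must keep extra rules for sequences that current Unicode no longer treats as sequences at all.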
- Best practices for replacing UTF-8 overlongs Karl Williamson