On 6/26/17 3:09 AM, Eric Grange wrote:
Alas, there is no end in sight to the pain for the Unicode decision to not
make the BOM compulsory for UTF-8.

Making it optional or non-necessary basically made every single text file
ambiguous, with non-trivial heuristics and implicit conventions required
instead, resulting in character corruptions that are neither acceptable nor
understood by users.
Making it compulsory would have made pre-Unicode *nix command-line
utilities and C string code in need of fixing, much pain, sure, but in
retrospect, this would have been a much smarter choice as everything could
have been settled in a matter of years.

But now, more than 20 years later, UTF-8 storage is still a mess, with no
end in sight :/

Perhaps the real issue wasn't in making the BOM optional, but in giving it TWO uses, by defining the same code point (U+FEFF) as a Zero Width No-Break Space as well as the Byte Order Mark. If its ONLY purpose were to allow for the optional marking of a file as being encoded in Unicode, and in which flavor, then it wouldn't have been an issue: all input routines could freely drop it after noting the encoding of the file. Since it does have another meaning, things become messy, and we are stuck trying to decide which wrong thing we should do.
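
To make the "drop it after marking the input method" idea concrete, here is a rough sketch of what such an input routine might look like. The names sniff_bom and read_text are made up for illustration, and falling back to UTF-8 when no BOM is present is my assumption, not something any standard requires.

```python
import codecs

# Check the longer signatures first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, payload) with any leading BOM removed.

    Hypothetical helper: `default` is only a guess used when no BOM is found.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return default, data


def read_text(data: bytes) -> str:
    """Decode bytes using the BOM, if any, as the encoding marker."""
    encoding, payload = sniff_bom(data)
    return payload.decode(encoding)
```

Note that this routine shows the "wrong thing" problem directly: if the author really meant the leading U+FEFF as a zero-width no-break space, it is silently thrown away, and the routine has no way to tell the two cases apart.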

--
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
