Re: [sqlite] UTF8-BOM not disregarded in CSV import

Richard Damon Mon, 26 Jun 2017 04:17:34 -0700

On 6/26/17 3:09 AM, Eric Grange wrote:

> Alas, there is no end in sight to the pain for the Unicode decision to not
> make the BOM compulsory for UTF-8.
>
> Making it optional or non-necessary basically made every single text file
> ambiguous, with non-trivial heuristics and implicit conventions required
> instead, resulting in character corruptions that are neither acceptable nor
> understood by users.
> Making it compulsory would have made pre-Unicode *nix command-line
> utilities and C string code in need of fixing, much pain, sure, but in
> retrospect, this would have been a much smarter choice as everything could
> have been settled in a matter of years.
>
> But now, more than 20 years later, UTF-8 storage is still a mess, with no
> end in sight :/

Perhaps the real issue wasn't in making the BOM mark optional, but in
giving it TWO uses, by defining the symbol as a Zero-Width Non Breaking
Space Character as well as the Byte Order Mark. If its ONLY purpose was
to allow for the optional marking of a file as being encoded with
Unicode, and with what flavor it was, then it wouldn't have been an
issue; all I/O input routines could freely drop it after marking the
input method for the file. Since it does have another meaning, things
become messy, and we are stuck with trying to decide which wrong thing
we should do.
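
To make the "drop it on input" policy concrete, here is a minimal
sketch in Python. It is not what the SQLite shell does; the function
name read_csv_dropping_bom is hypothetical, and it assumes the input is
already known to be UTF-8. It treats U+FEFF as a signature only when it
is the very first character, and leaves any later occurrence alone as a
Zero-Width Non Breaking Space.

```python
import codecs
import csv
import io


def read_csv_dropping_bom(raw: bytes) -> list[list[str]]:
    """Parse UTF-8 CSV bytes, dropping at most one leading BOM.

    Hypothetical helper for illustration only. A U+FEFF that appears
    anywhere other than the very start of the input is kept as data.
    """
    # codecs.BOM_UTF8 is the three bytes EF BB BF.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


with_bom = codecs.BOM_UTF8 + b"id,name\r\n1,a\r\n"
without_bom = b"id,name\r\n1,a\r\n"
rows_with_bom = read_csv_dropping_bom(with_bom)
rows_without_bom = read_csv_dropping_bom(without_bom)

# A U+FEFF inside a field is not at the start, so it survives.
rows_inner = read_csv_dropping_bom("x,a\ufeffb\r\n".encode("utf-8"))
```

With this policy both inputs give the same first column name, "id",
instead of one of them silently starting with an invisible character.
Python's built-in "utf-8-sig" codec makes the same choice when decoding.
The cost is the one described above: a file whose data really does begin
with a Zero-Width Non Breaking Space loses that character.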


--
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
