Re: [sqlite] UTF8-BOM not disregarded in CSV import

Peter da Silva Mon, 26 Jun 2017 05:36:13 -0700

On 6/26/17, 2:09 AM, "sqlite-users on behalf of Eric Grange" 
<sqlite-users-boun...@mailinglists.sqlite.org on behalf of egra...@glscene.org> 
wrote:
> Alas, there is no end in sight to the pain for the Unicode decision to not 
> make the BOM compulsory for UTF-8.


It’s not actually providing any “byte order” information. It’s only used for 
round-tripping conversion from other formats that actually require one. 
Therefore it is not required.
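
To make that concrete, here is a small Python sketch (my own illustration, not 
anything SQLite does) of what U+FEFF looks like in each encoding. The variable 
names are made up for the example. In UTF-16 the two byte orders give different 
bytes; in UTF-8 there is only one possible sequence, so it says nothing about 
byte order.

```python
import codecs

mark = "\ufeff"  # the code point used as the byte order mark

# UTF-16: the byte sequence depends on the byte order, which is the point.
utf16_le = mark.encode("utf-16-le")  # bytes FF FE
utf16_be = mark.encode("utf-16-be")  # bytes FE FF

# UTF-8: always the same three bytes, EF BB BF.
utf8_sig = mark.encode("utf-8")

# A reader that wants to tolerate the mark can use the "utf-8-sig" codec,
# which drops a leading EF BB BF if present and otherwise acts like "utf-8".
text = (codecs.BOM_UTF8 + b"a,b\n").decode("utf-8-sig")
```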

Perhaps it should have been called “UTF-8 mark” instead? Then it could have 
been arguably recommended.

Regardless, it is what it is.

As for distinguishing UTF-8 from something like 8859.x or CP1255, if the string 
is all-7-bit, it’s ASCII, which can be safely treated as UTF-8. If it’s not, then

1. It wouldn’t have had a UTF-8 BOM anyway, and
2. odds are very good it’s going to contain at least one byte sequence that’s 
not valid UTF-8. Then you’re falling back to guessing which 8859.x variation 
to try.
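
The check described above can be sketched in a few lines of Python. This is 
only an illustration under my own assumptions: the function name `classify` 
and its three return labels are hypothetical, and it does not attempt the 
final step of guessing which legacy code page the data is in.

```python
import codecs

def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for a byte string."""
    # A leading UTF-8 mark (EF BB BF) is optional; skip it if present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # All-7-bit data is ASCII, which is also valid UTF-8.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding: any invalid sequence raises
        return "utf-8"
    except UnicodeDecodeError:
        # Not UTF-8; the caller would have to guess a legacy encoding.
        return "other"
```

For example, `classify("café".encode("latin-1"))` lands in the `'other'` 
branch, because a lone 0xE9 byte is not a complete UTF-8 sequence.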

My call is, just use UTF-8 everywhere and if you have some program that’s 
producing 8859.x or something else from the last century... fix it. It’s not 
the UTF-8 storage that’s the mess, it’s the non-UTF-8 storage. 

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
