Mahmoud Al-Qudsi wrote:
> with `.import ……`, SQLite3 includes a BOM (UTF-8) as part of the first
> column of the first record.

The Unicode Standard 9.0 says in section 3.10:
| When represented in UTF-8, the byte order mark turns into the byte
| sequence <EF BB BF>. Its usage at the beginning of a UTF-8 data stream
| is neither required nor recommended by the Unicode Standard,

so you should not use it.

Treating this character as a zero width no-break space (U+FEFF), and
keeping it as part of the first field, is a correct interpretation of
the file.
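
As a small illustration of the two interpretations (a sketch in Python,
not SQLite's own code; the sample CSV bytes are made up for this example):

```python
import codecs

# Hypothetical CSV content that starts with the UTF-8 signature EF BB BF.
data = codecs.BOM_UTF8 + b"id,name\n1,Alice\n"

# Plain UTF-8 decoding keeps the three bytes as the character U+FEFF,
# so it becomes part of the first field of the first record.
kept = data.decode("utf-8")
first_field_kept = kept.split(",")[0]

# Python's "utf-8-sig" codec treats a leading EF BB BF as a signature
# and drops it while decoding.
stripped = data.decode("utf-8-sig")
first_field_stripped = stripped.split(",")[0]
```

Both results are defensible readings of the same bytes; the difference
is only whether the reader decides to treat the prefix as a signature.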

> IMHO, this is of particular importance since the latest versions of MS
> Excel default to “UTF-8 CSV” which includes a BOM.

Then Excel's default is wrong:
| When converting between different encoding schemes, extreme care must
| be taken in handling any initial byte order marks. For example, if one
| converted a UTF-16 byte serialization with an initial byte order mark
| to a UTF-8 byte serialization, thereby converting the byte order mark
| to <EF BB BF> in the UTF-8 form, the <EF BB BF> would now be ambiguous
| as to its status as a byte order mark (from its source) or as an
| initial zero width no-break space. If the UTF-8 byte serialization
| were then converted to UTF-16BE and the initial <EF BB BF> were
| converted to <FE FF>, the interpretation of the U+FEFF character would
| have been modified by the conversion. This would be nonconformant
| behavior according to conformance clause C7, because the change
| between byte serializations would have resulted in modification of the
| interpretation of the text. This is one reason why the use of the
| initial byte sequence <EF BB BF> as a signature on UTF-8 byte
| sequences is not recommended by the Unicode Standard.
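
The round trip described in that paragraph can be reproduced roughly as
follows (a sketch in Python; the one-letter text "a" is an arbitrary
sample, and the "naive" converter is an assumption for illustration):

```python
# A UTF-16 byte serialization with an initial (little-endian) byte order mark.
utf16 = b"\xff\xfe" + "a".encode("utf-16-le")

# A naive converter decodes code units without interpreting the BOM,
# so U+FEFF survives as an ordinary character ...
chars = utf16.decode("utf-16-le")

# ... and shows up as EF BB BF at the start of the UTF-8 form.
utf8 = chars.encode("utf-8")

# Converting on to UTF-16BE yields an initial FE FF. UTF-16BE has no
# byte order mark, so there it is a zero width no-break space.
utf16be = utf8.decode("utf-8").encode("utf-16-be")

# A BOM-aware decoder of the original bytes never produces U+FEFF at all.
bom_aware = utf16.decode("utf-16")
```

The text started out as just "a" with a byte order mark and ends up as
U+FEFF followed by "a", which is the change of interpretation that the
standard warns about.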

And Google Docs also thinks it would be a good idea to act against
this recommendation:
<https://productforums.google.com/forum/#!topic/docs/p_jCTwzuIqk>

> Would anyone be opposed to a patch to SQLite that disregarded a BOM
> when found during a csv import operation?

Well, being wrong doesn't mean that Microsoft or Google will change
their behaviour ...
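
Until then, the signature can be dropped before the data reaches
SQLite. A minimal sketch in Python, assuming a CSV file with a header
row; the function name `import_csv` and its parameters are hypothetical
and have nothing to do with the shell's `.import` implementation:

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, path, table):
    """Create `table` from the CSV header row and load the remaining rows.

    The "utf-8-sig" codec skips a leading EF BB BF if present and
    otherwise behaves like plain UTF-8.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(quote_ident(h) for h in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
        conn.executemany(
            f"INSERT INTO {quote_ident(table)} VALUES ({marks})", reader
        )
```

With this, a file exported as "UTF-8 CSV" gets a first column named
`id` rather than U+FEFF followed by `id`, and files without the
signature import unchanged.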


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
