Re: [sqlite] UTF8-BOM not disregarded in CSV import

Scott Robison Mon, 26 Jun 2017 08:08:45 -0700

On Jun 26, 2017 4:05 AM, "Rowan Worth" <row...@dug.com> wrote:

On 26 June 2017 at 16:55, Scott Robison <sc...@casaderobison.com> wrote:

> Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither
> is dialing a cell phone. Language evolves.
>

It's not descriptive in the slightest because UTF-8's byte order is
*specified by the encoding*.

I fear you may not have read my entire email or at least have missed my
point.

> I'm not advocating one way or
> another, but if a system strips U+FEFF from a text stream after using it to
> determine the encoding, surely it is reasonable to expect that for all
> supported encodings.
>

?? Are you going to strip 0xFE 0xFF from the front of my iso8859-1 encoded
stream and drop my beautiful smiley? þÿ
Different encodings demand different treatment. BOM is an artifact of
16/32-bit unicode encodings and can kindly keep its nose out of [the
relatively elegant] UTF-8.

One, I'm not going to do anything. Two, clearly I'm talking about the
three-byte UTF-8 sequence that decodes to U+FEFF. Three, you are correct about
different encodings. I was trying to move the discussion past the idea of
byte order when what we're really talking about is encoding detection.
ZWNBSP was used for encoding detection because it had a convenient property
that allowed differentiation between multiple encodings and could be safely
ignored. The fact that the Unicode folks renamed it BOM instead of TEI or
BEM or whatever doesn't mean it can't be used with other Unicode
transformations. It is neither required, recommended, nor forbidden with
UTF-8; it's up to systems exchanging data to decide how to deal with it.
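
To make the "encoding detection" reading concrete, here is a minimal sketch
of a reader that uses a leading U+FEFF to pick an encoding and then strips
it, for UTF-8 as well as UTF-16/32. The function name decode_with_bom and
the fallback parameter are hypothetical, chosen for illustration; this is
not how the SQLite shell's CSV import is implemented.

```python
import codecs

# BOM signatures mapped to codecs. The UTF-32 entries come first because the
# UTF-32 LE signature begins with the same two bytes as the UTF-16 LE one.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes of unknown Unicode encoding, sniffing a leading U+FEFF.

    If a signature is found it selects the codec and is stripped from the
    result; otherwise the (assumed) fallback encoding is used unchanged.
    """
    for signature, encoding in _SIGNATURES:
        if data.startswith(signature):
            return data[len(signature):].decode(encoding)
    return data.decode(fallback)
```

Sniffing only makes sense when the encoding is not already known. If the
stream is declared as ISO-8859-1, the caller would skip this function and
decode directly, so that b"\xfe\xff".decode("iso8859-1") still yields the
two characters þÿ rather than being mistaken for a UTF-16 signature.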
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users