Re: [sqlite] UTF8-BOM not disregarded in CSV import

Eric Grange Mon, 26 Jun 2017 04:04:27 -0700

>Easily solved by never including a superfluous BOM in UTF-8 text

And that easy option has worked beautifully for 20 years... not.


Yes, BOM is a misnomer, and yes, it "wastes" 3 bytes, but in the real world
"text files" come in a variety of encodings.
Without a BOM, you have to run a whole suite of heuristics or present the user
with choices he/she will not understand.
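For readers following the CSV-import question, here is a minimal Python sketch of BOM-based detection. It is not SQLite's own code, and `sniff_bom` and `decode_csv_bytes` are hypothetical names. It shows the trade-off described above: a recognized BOM settles the encoding outright, and only its absence forces a fallback (where a real importer would run its heuristics or ask the user).

```python
import codecs

# Known BOMs, longest first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so the 4-byte patterns must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, fallback: str = "utf-8"):
    """Return (encoding, payload) with any recognized BOM removed.

    With no BOM present, return the fallback encoding and the bytes
    unchanged; this is where heuristics (or a user prompt) would go.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return fallback, data


def decode_csv_bytes(data: bytes, fallback: str = "utf-8") -> str:
    """Decode raw CSV bytes, using the BOM (if any) to pick the codec."""
    encoding, payload = sniff_bom(data, fallback)
    return payload.decode(encoding)
```

For example, `decode_csv_bytes(b"\xef\xbb\xbfid,name\n")` yields `"id,name\n"` with no stray U+FEFF glued to the first column name. Sniffing is not foolproof, though: an ISO-8859-1 file that happens to begin with the bytes FE FF would be taken for UTF-16 BE.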

After 20 years, the choice is between doing the best in an imperfect world,
or perpetuating the issue and blaming others.


On Mon, Jun 26, 2017 at 12:05 PM, Rowan Worth <[email protected]> wrote:

> On 26 June 2017 at 16:55, Scott Robison <[email protected]> wrote:
>
> > Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither
> > is dialing a cell phone. Language evolves.
> >
>
> It's not descriptive in the slightest because UTF-8's byte order is
> *specified by the encoding*.
>
> > I'm not advocating one way or
> > another, but if a system strips U+FEFF from a text stream after using it
> > to determine the encoding, surely it is reasonable to expect that for all
> > supported encodings.
> >
>
> ?? Are you going to strip 0xFE 0xFF from the front of my iso8859-1 encoded
> stream and drop my beautiful smiley? þÿ
> Different encodings demand different treatment. BOM is an artifact of
> 16/32-bit unicode encodings and can kindly keep its nose out of [the
> relatively elegant] UTF-8.
>
> -Rowan
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
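Whether the bytes FE FF are a BOM depends on the codec used to read them. The small Python sketch below shows both readings from the quoted exchange, plus a character-level strip of U+FEFF applied after decoding, as Scott describes. `strip_leading_bom_char` is a hypothetical helper for illustration, not part of SQLite or Python's standard library.

```python
data = b"\xfe\xff"

# As ISO-8859-1, each byte is its own character: U+00FE and U+00FF ("þÿ").
latin1_text = data.decode("iso-8859-1")

# Python's generic "utf-16" codec treats FE FF as a big-endian BOM and
# consumes it, leaving an empty string.
utf16_text = data.decode("utf-16")


def strip_leading_bom_char(text: str) -> str:
    """Drop a leading U+FEFF from already-decoded text.

    This works on characters, not bytes, so Latin-1 "þÿ" (U+00FE U+00FF)
    is never mistaken for U+FEFF.
    """
    return text[1:] if text.startswith("\ufeff") else text
```

With an explicit byte order or plain UTF-8, the codec keeps the BOM as U+FEFF, so `strip_leading_bom_char(b"\xef\xbb\xbfA".decode("utf-8"))` returns `"A"`, while `strip_leading_bom_char(latin1_text)` leaves `"þÿ"` intact.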