Re: [sqlite] UTF8-BOM not disregarded in CSV import

jose isaias cabrera Mon, 26 Jun 2017 06:02:18 -0700


At the bottom...

-----Original Message-----
From: Eric Grange
Sent: Monday, June 26, 2017 3:09 AM
To: SQLite mailing list
Subject: Re: [sqlite] UTF8-BOM not disregarded in CSV import

Alas, there is no end in sight to the pain caused by the Unicode decision
not to make the BOM compulsory for UTF-8.

Making it optional or non-necessary basically made every single text file
ambiguous, with non-trivial heuristics and implicit conventions required
instead, resulting in character corruptions that are neither acceptable nor
understood by users.
Making it compulsory would have meant fixing pre-Unicode *nix command-line
utilities and C string code. Much pain, sure, but in retrospect this would
have been a much smarter choice, as everything could have been settled in a
matter of years.

But now, more than 20 years later, UTF-8 storage is still a mess, with no
end in sight :/

On Sun, Jun 25, 2017 at 9:16 PM, Cezary H. Noweta <c...@poczta.onet.pl>
wrote:

Hello,

On 2017-06-23 22:12, Mahmoud Al-Qudsi wrote:

I think you and I are on the same page here, Clemens? I abhor the
BOM, but the question is whether or not SQLite will cater to the fact
that the bigger names in the industry appear hell-bent on shoving it
in users’ documents by default.


Given that ‘.import’ and ‘.mode csv’ are “user mode” commands,

perhaps leeway can be shown in breaking with standards for the sake
of compatibility and sanity?


IMHO, this is not a good way to show leeway. The Unicode Standard has
enough bad things in it already. There is no need to turn one of its
good things into a bad one.

Should SQLite disregard one <EF BB BF> sequence, or all <EF BB BF>
sequences, or at most 2, 3, 10 ones at the beginning of a file? Such
stream can be produced by a sequence of conversions done by a mix of
conforming and ``breaking the standard for the sake of compatibility''
converters.
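
To make the question concrete, here is a minimal Python sketch (not
SQLite's code; the function name strip_leading_boms and its limit
parameter are hypothetical, chosen only for illustration). It shows how
the answer to "one, or all, or at most N?" changes what the importer sees:

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"  # the <EF BB BF> sequence

def strip_leading_boms(data: bytes, limit: Optional[int] = 1) -> bytes:
    """Drop up to `limit` leading UTF-8 BOMs; limit=None drops all of them.

    Hypothetical helper: the policy (1, N, or unlimited) is exactly the
    open question, so it is left as a parameter.
    """
    removed = 0
    while data.startswith(UTF8_BOM) and (limit is None or removed < limit):
        data = data[len(UTF8_BOM):]
        removed += 1
    return data

# A stream that passed through two BOM-adding converters.
doubled = UTF8_BOM * 2 + b"id,name\n"
print(strip_leading_boms(doubled, limit=1))     # one BOM still remains
print(strip_leading_boms(doubled, limit=None))  # no BOM remains
```

With limit=1 the second BOM survives as data; with limit=None a
legitimate leading ZWNBSP can never be imported.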

To be clear: I understand your point very well - ``let's ignore an optional
BOM at the beginning'', but I want to show that there is no limit to
such thinking. Why only one optional BOM? You have not said what this is
meant to be compatible with. The next step is to ignore N BOMs for the sake
of compatibility with breaking the standard for the sake of compatibility
with breaking the standard for the sake of... lim = \infty. I cannot see
any sanity here.

The standard says: ``Only UTF-16/32 (even not UTF-16/32LE/BE) encoding
forms can contain BOM''. Let's conform to this.

Certainly, there are no objections to extending an import's functionality
in such a way that it ignores the initial 0xFEFF. However, in its basic
form, an import should allow ZWNBSP as the first character in order to
conform to the standard.
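
One way such an opt-in extension could look, as a rough Python sketch
rather than anything in the SQLite shell: the function read_csv_rows and
its ignore_initial_bom flag are hypothetical. It relies on Python's
"utf-8" codec, which keeps a leading U+FEFF, and its "utf-8-sig" codec,
which drops a single one.

```python
import csv
import io

def read_csv_rows(raw: bytes, ignore_initial_bom: bool = False):
    """Parse UTF-8 CSV bytes into a list of rows.

    By default a leading U+FEFF is kept as ordinary data (ZWNBSP).
    With ignore_initial_bom=True, one initial U+FEFF is skipped.
    """
    codec = "utf-8-sig" if ignore_initial_bom else "utf-8"
    text = raw.decode(codec)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))

raw = b"\xef\xbb\xbfid,name\r\n1,x\r\n"
print(read_csv_rows(raw))                           # first field starts with U+FEFF
print(read_csv_rows(raw, ignore_initial_bom=True))  # first field is plain 'id'
```

The default stays conforming, and skipping the initial 0xFEFF is
something the user has to ask for.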

-- best regards

Cezary H. Noweta
_______________________________________________

I have made a decision to always include the BOM in all my text files,
whether they are UTF8, UTF16 or UTF32, little or big endian. I think all of
us should also. Just because the "Unicode Gurus" didn't think so does not
mean they are right. I had a doctor give me the wrong diagnosis. There were
just too many symptoms that looked alike, and they chose one and went with
it. The same thing happened with the Unicode Gurus: they never thought about
the problems they would be causing today. Some applications do not place a
BOM on UTF8 or UTF16 files, and then you have to go and find out which one
it is and decode the file correctly. This can all be prevented by having a
BOM. Yes, I know I am saying what everybody else is saying, but what I am
also saying is: let us all use the BOM, and also have every application we
write welcome the BOM. One last thing: every application uses internal file
information to tell whether a file can be read by the application, and
whether the application version supports that version of that file, etc.
UTF8, UTF16 and UTF32, little or big endian, should have a BOM. Thanks.
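
For what it is worth, here is a small Python sketch of how an application
could "welcome the BOM" when one is present. The function name sniff_bom
is made up for this example; the BOM constants come from Python's
standard codecs module. The UTF-32 marks are checked before the UTF-16
ones because the UTF-32 little-endian BOM (FF FE 00 00) begins with the
UTF-16 little-endian BOM (FF FE).

```python
import codecs

# Order matters: UTF-32 BOMs must be tested before UTF-16 BOMs.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(raw: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0

def decode_with_bom(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if there is one; otherwise use the fallback guess."""
    name, skip = sniff_bom(raw)
    return raw[skip:].decode(name or fallback)

sample = codecs.BOM_UTF16_BE + "id,name".encode("utf-16-be")
print(sniff_bom(sample))
print(decode_with_bom(sample))
```

Without a BOM the function can only fall back on a guess, which is the
problem I am talking about. It is still not perfect: a UTF16 little
endian file whose first character after the BOM is U+0000 would be taken
for UTF32.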

josé

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
