Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-27 Thread Scott Robison
On Tue, Jun 27, 2017 at 4:18 AM, Richard Hipp wrote: > The CSV import feature of the SQLite command-line shell expects to > find UTF-8. It does not understand other encodings, and I have no > plans to add converters for alternative encodings any time soon. > > The latest version

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-27 Thread Mahmoud Al-Qudsi
Thank you. From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> on behalf of Richard Hipp <d...@sqlite.org> Sent: Tuesday, June 27, 2017 5:18:51 AM To: SQLite mailing list Subject: Re: [sqlite] UTF8-BOM not disregarded in CSV import Th

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-27 Thread Richard Hipp
The CSV import feature of the SQLite command-line shell expects to find UTF-8. It does not understand other encodings, and I have no plans to add converters for alternative encodings any time soon. The latest version of trunk skips over a UTF-8 BOM at the beginning of the input file. -- D.
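A minimal sketch of what "skips over a UTF-8 BOM at the beginning of the input file" amounts to. This is not the shell's actual C code; the helper name `strip_utf8_bom` is hypothetical, and the input is assumed to be already in memory as bytes.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration only. Only a BOM at offset 0 is
    removed; the same three bytes later in the input are left alone.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Running a CSV file through a filter like this before `.import` would be one way to work around the problem on versions that do not skip the BOM themselves.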

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-27 Thread Cezary H. Noweta
Hello, On 2017-06-26 17:26, Scott Robison wrote: +1 FAQ quote: Q: When a BOM is used, is it only in 16-bit Unicode text? A: No, a BOM can be used as a signature no matter how the Unicode text is transformed: UTF-16, UTF-8, or UTF-32. Q: How I should deal with BOMs? A: Here are some

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-27 Thread Cezary H. Noweta
On 2017-06-26 15:01, jose isaias cabrera wrote: I have made a decision to always include the BOM in all my text files whether they are UTF8, UTF16 or UTF32 little or big endian. I think all of us should also. I'm sorry if I introduced ambiguity, but I had described SQLite's and SQLite

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Scott Robison
On Jun 26, 2017 9:02 AM, "Simon Slavin" wrote: There is no convention for "This software understands both UTF-16BE and UTF-16LE but nothing else.". If it handles any BOMs, it should handle all five. However, it can handle them by identifying, for example, UTF-32BE and
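A rough sketch of recognising "all five" BOMs (UTF-8, UTF-16LE/BE, UTF-32LE/BE). The table `BOMS` and the function `detect_bom` are hypothetical names, not anything from SQLite. The table order is an assumption that matters: the UTF-32LE signature FF FE 00 00 begins with the UTF-16LE signature FF FE, so the longer one has to be tried first.

```python
import codecs

# Longer signatures first: the UTF-32LE BOM (FF FE 00 00) starts with the
# UTF-16LE BOM (FF FE), so checking UTF-16LE first would mislabel UTF-32LE.
BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

One known ambiguity remains: UTF-16LE text whose first character after the BOM is U+0000 is byte-for-byte indistinguishable from a UTF-32LE BOM, so this sketch would report UTF-32LE for it. A tool that only accepts UTF-8 could still use such a check to reject the other four with a clear error message.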

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Peter da Silva
I didn’t mean to imply you had to scan the whole content for a BOM, but rather for illegal characters in the absence of a BOM. On 6/26/17, 10:02 AM, "sqlite-users on behalf of Simon Slavin" wrote: Folks, I’m

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Scott Robison
On Jun 26, 2017 4:05 AM, "Rowan Worth" wrote: On 26 June 2017 at 16:55, Scott Robison wrote: > Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither > is dialing a cell phone. Language evolves. > It's not descriptive in the

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Simon Slavin
Folks, I’m sorry to interrupt but I’ve just woken up to 11 posts in this thread and I see a lot of inaccurate 'facts' posted here. Rather than pick up on statements in individual posts (which would unfairly pick on some people as being less accurate than others) I’d like to post facts straight

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Peter da Silva
Just occurred to me: another problem with the BOM is that some people who are *not* writing UTF-8 are cargo-culting the BOM in anyway. So you may have to scan the whole file to see if it’s really UTF-8 anyway. You’re better off just assuming UTF-8 everywhere, generating an error (and backing
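The "assume UTF-8 and generate an error" approach can be sketched with a strict decode. The function name `first_invalid_utf8_offset` is hypothetical; the sketch assumes the whole input fits in memory, and it treats a leading BOM as ordinary valid UTF-8 rather than as proof of the encoding.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper: a strict decode scans the whole input, which is
    the check needed whether or not a BOM is present.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None
```

A file that carries a UTF-8 BOM but actually contains, say, Latin-1 bytes fails this check at the first offending byte, which illustrates why a BOM by itself does not remove the need to validate.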

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread jose isaias cabrera
At the bottom... -Original Message- From: Eric Grange Sent: Monday, June 26, 2017 3:09 AM To: SQLite mailing list Subject: Re: [sqlite] UTF8-BOM not disregarded in CSV import Alas, there is no end in sight to the pain for the Unicode decision to not make the BOM compulsory for UTF-8

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Peter da Silva
On 6/26/17, 2:09 AM, "sqlite-users on behalf of Eric Grange" wrote: > Alas, there is no end in sight to the pain for the Unicode decision to not > make the BOM compulsory for UTF-8. It’s not actually providing any

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Richard Damon
On 6/26/17 3:09 AM, Eric Grange wrote: Alas, there is no end in sight to the pain for the Unicode decision to not make the BOM compulsory for UTF-8. Making it optional or non-necessary basically made every single text file ambiguous, with non-trivial heuristics and implicit conventions required

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Eric Grange
>Easily solved by never including a superfluous BOM in UTF-8 text And that easy option has worked beautifully for 20 years... not. Yes, BOM is a misnomer, yes it "wastes" 3 bytes, but in the real world "text files" have a variety of encodings. No BOM = you have to fire a whole suite of

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Rowan Worth
On 26 June 2017 at 16:55, Scott Robison wrote: > Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither > is dialing a cell phone. Language evolves. > It's not descriptive in the slightest because UTF-8's byte order is *specified by the encoding*.

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Scott Robison
On Jun 25, 2017 1:16 PM, "Cezary H. Noweta" wrote: Certainly, there are no objections to extend an import's functionality in such a way that it ignores the initial 0xFEFF. However, an import should allow ZWNBSP as the first character, in its basic form, to be conforming to

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Scott Robison
On Jun 26, 2017 1:47 AM, "Rowan Worth" wrote: On 26 June 2017 at 15:09, Eric Grange wrote: > Alas, there is no end in sight to the pain for the Unicode decision to not > make the BOM compulsory for UTF-8. > UTF-8 is byte oriented. The very concept of byte

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Rowan Worth
On 26 June 2017 at 15:09, Eric Grange wrote: > Alas, there is no end in sight to the pain for the Unicode decision to not > make the BOM compulsory for UTF-8. > UTF-8 is byte oriented. The very concept of byte order is nonsense in this context as there is no multi-byte

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread J Decker
On Sun, Jun 25, 2017 at 12:16 PM, Cezary H. Noweta wrote: > Hello, > > > The standard says: ``Only UTF-16/32 (even not UTF-16/32LE/BE) encoding > forms can contain BOM''. Let's conform to this. > > I concur with that. Since UTF-8 is only bytes; what would a BOM even change?

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Eric Grange
Alas, there is no end in sight to the pain for the Unicode decision to not make the BOM compulsory for UTF-8. Making it optional or non-necessary basically made every single text file ambiguous, with non-trivial heuristics and implicit conventions required instead, resulting in character

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-25 Thread Cezary H. Noweta
Hello, On 2017-06-23 22:12, Mahmoud Al-Qudsi wrote: I think you and I are on the same page here, Clemens? I abhor the BOM, but the question is whether or not SQLite will cater to the fact that the bigger names in the industry appear hell-bent on shoving it in users’ documents by default.

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-23 Thread Mahmoud Al-Qudsi
” commands, perhaps leeway can be shown in breaking with standards for the sake of compatibility and sanity? Mahmoud From: Clemens Ladisch Sent: Friday, June 23, 2017 2:25 AM To: sqlite-users@mailinglists.sqlite.org Subject: Re: [sqlite] UTF8-BOM not disregarded in CSV import Mahmoud Al-Qudsi wrote

Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-23 Thread Clemens Ladisch
Mahmoud Al-Qudsi wrote: > with `.import ……`, SQLite3 includes a BOM (UTF-8) as part of the first > column of the first record. The Unicode Standard 9.0 says in section 3.10: | When represented in UTF-8, the byte order mark turns into the byte | sequence <EF BB BF>. Its usage at the beginning of a UTF-8

[sqlite] UTF8-BOM not disregarded in CSV import

2017-06-21 Thread Mahmoud Al-Qudsi
Hello all, Let me start off with my apologies if this is a documented issue; I did search the fossil tickets but did not find anything for “BOM”. As of SQLite 3.19.3, under `.mode csv` and with `.import ……`, SQLite3 includes a BOM (UTF-8) as part of the first column of the first record. IMHO,
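The reported symptom is easy to reproduce outside SQLite. The sketch below uses Python's `csv` module as a stand-in, not the SQLite shell, and the sample data and the helper `first_row` are made up for illustration. Decoding as plain `utf-8` leaves U+FEFF glued to the first field, while Python's `utf-8-sig` codec drops a leading BOM.

```python
import csv
import io

# Made-up sample: a UTF-8 BOM followed by a two-column CSV file.
raw = b"\xef\xbb\xbfid,name\n1,alice\n"


def first_row(data: bytes, encoding: str):
    """Decode data with the given codec and return the first CSV record."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


print(first_row(raw, "utf-8"))      # first field keeps the BOM character
print(first_row(raw, "utf-8-sig"))  # leading BOM is dropped by the codec
```

With plain `utf-8` the first field is `"\ufeffid"` rather than `"id"`, which matches the behaviour described for the first column of the first record.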