Alas, there is no end in sight to the pain caused by the Unicode decision not to make the BOM compulsory for UTF-8.
Making it optional or unnecessary basically made every single text file ambiguous, with non-trivial heuristics and implicit conventions required instead, resulting in character corruption that users neither accept nor understand. Making it compulsory would have required fixing pre-Unicode *nix command-line utilities and C string code: much pain, sure, but in retrospect it would have been a much smarter choice, as everything could have been settled in a matter of years.

But now, more than 20 years later, UTF-8 storage is still a mess, with no end in sight :/

On Sun, Jun 25, 2017 at 9:16 PM, Cezary H. Noweta <c...@poczta.onet.pl> wrote:

> Hello,
>
> On 2017-06-23 22:12, Mahmoud Al-Qudsi wrote:
>
>> I think you and I are on the same page here, Clemens? I abhor the
>> BOM, but the question is whether or not SQLite will cater to the fact
>> that the bigger names in the industry appear hell-bent on shoving it
>> in users’ documents by default.
>>
>> Given that ‘.import’ and ‘.mode csv’ are “user mode” commands,
>> perhaps leeway can be shown in breaking with standards for the sake
>> of compatibility and sanity?
>
> IMHO, this is not a good way to show a leeway. The Unicode Standard has
> enough bad things in itself. It is not necessary to transform a good
> Unicode's thing into a bad one.
>
> Should SQLite disregard one <EF BB BF> sequence, or all <EF BB BF>
> sequences, or at most 2, 3, 10 ones at the beginning of a file? Such
> stream can be produced by a sequence of conversions done by a mix of
> conforming and ``breaking the standard for the sake of compatibility''
> converters.
>
> To be clear: I understand your point very well - ``let's ignore optional
> BOM at the beginning'', but I want to show that there is no limit in
> such thinking. Why one optional? You have not pointed out what
> compatibility with. The next step is to ignore N BOMs for the sake of
> compatibility with breaking the standard for the sake of compatibility
> with breaking the standard for the sake of... lim = \infty. I cannot see
> any sanity here.
>
> The standard says: ``Only UTF-16/32 (even not UTF-16/32LE/BE) encoding
> forms can contain BOM''. Let's conform to this.
>
> Certainly, there are no objections to extend an import's functionality
> in such a way that it ignores the initial 0xFEFF. However, an import
> should allow ZWNBSP as the first character, in its basic form, to be
> conforming to the standard.
>
> -- best regards
>
> Cezary H. Noweta
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users