Jay...

Your post neatly articulates virtually every facet of this issue. Thank you. I wish we could get everyone to stop using CSV. I hate to look at XML, but I often wish everyone would use it instead of CSV. I would hate to see any of the SQLite core devs waste time on CSV.
Carlos

On 9/25/09, Jay A. Kreibich <j...@kreibi.ch> wrote:
> On Fri, Sep 25, 2009 at 10:24:15AM +1000, BareFeet scratched on the wall:
>
>> In reality, in the thousands of CSV files I've dealt with
>> over the years, they all follow the same standard:
>>
>> 1. Rows are delimited by a new line (return and/or line feed).
>> 2. Columns are delimited by a comma.
>> 3. "Quoted text" is treated as one value item, including any commas or
>> new lines within it.
>> 4. A double quote "" is used to put a quote within quotes.
>>
>> That's it.
>
> This is more or less the standard put forth by RFC 4180. And if this
> is all you've encountered, you're not using very many different
> applications or you're primarily dealing with numbers and simple
> strings that don't contain quotes or commas. CSV works very very
> well if you never get into the question of escapes, but details count.
>
> Reading the RFC only proves my point. SQLite v3 is older than that
> doc, and it pretty much admits the cat was out of the bag a long time
> ago. There are a ton of optional and might/may/could sections, so
> even the format they define has a lot of holes in it (e.g. headers,
> or no headers?).
>
>> Everything I've seen uses this.
>
> According to the RFC Excel doesn't use double-quotes for anything.
> You might not care about Excel, but I'm willing to bet it is one of
> the most-- if not the most-- common exporters of CSV. The question
> of getting data from Excel into SQLite shows up on the list every now
> and then.
>
>> Some don't need delimiters
>> in values, so they don't need quotes, but the encompassing
>> specification works for all cases.
>
> No, it doesn't. Working on a large website that provided CSV exports
> for a number of data sources, I've seen plenty of examples that don't
> work. Finding a common format that could be exported into a handful
> of common desktop apps was so pointless we seriously considered
> getting rid of CSV altogether, because we got tired of our users
> telling us how simple CSV was, and why couldn't we just do this one
> thing differently so it would work on *their* application.
>
>> It's not that big a deal for SQLite to support it, so it should.
>
> If it is so simple, and you know where the code is...
>
> This is, perhaps, the biggest fallacy of CSV... people think it
> is a "simple" format (it isn't), and assume that code support to
> "correctly" (whatever that is) read it is simple. It isn't. The RFC
> has a formal grammar that requires over a dozen elements to define!
>
> Most people setting out to build a CSV reader never think to use a
> full grammar and parser-- after all, it is such a "simple" format--
> and find themselves in a mess of code soon enough. Seriously, give
> it a try.
>
> Carlos's Python script (nice!) is a great example. His comment "I am
> so grateful I did not have to write a parser for CSV" is dead on.
> And, as he points out, the reason the Python module is so good is
> that it is adaptive, and really reads five or six different variants
> of CSV (something a reader can do but a writer cannot). He was
> also able to cobble it all together in a few hours or less (because
> someone else spent a few hundred hours on the CSV module), further
> proving that advanced support of this kind of thing is really outside
> of the scope of SQLite3. After all, the .import command is part of
> the shell, not part of the core library.
>
> CSV is a great quick and dirty format to move data. But it isn't
> "simple" and it isn't nearly as universal as many assume. It works
> great if you're just moving simple numbers and strings that don't
> include commas, but becomes a mess when you get into exceptions.
>
> Personally, I'd rather have the SQLite team working on core database
> features than trying to build a better CSV parser. The problem
> is non-trivial and borders on unobtainable and, as Carlos
> proved so clearly, there are better, easier, faster ways.
>
> -j
>
>
> --
> Jay A. Kreibich < J A Y @ K R E I B I.C H >
>
> "Our opponent is an alien starship packed with atomic bombs. We have
> a protractor." "I'll go home and see if I can scrounge up a ruler
> and a piece of string." --from Anathem by Neal Stephenson
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
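
A footnote to the thread: the four rules BareFeet lists, and the reader-side flexibility Jay credits to the Python module, can be sketched with Python's standard csv module. This is only an illustration under stated assumptions, not Carlos's script (which is not shown in this message). The sample data and the variable names (raw, rows, variant, alt) are made up for the example.

```python
import csv
import io

# Hypothetical sample data. The second record exercises rule 3 (a comma
# and a new line inside a quoted value) and rule 4 (a doubled quote).
raw = (
    'id,name,comment\r\n'
    '1,"Smith, John","He said ""hi""\nand left"\r\n'
)

# Read with the module's default dialect, which follows the four rules.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# Write the parsed rows back out with the default writer settings.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
out = buf.getvalue()

# A different (assumed) variant: quotes escaped with a backslash instead
# of being doubled. The reader has to be told about it explicitly.
variant = 'a,"say \\"hi\\"",c\r\n'
alt = next(
    csv.reader(
        io.StringIO(variant, newline=""),
        doublequote=False,
        escapechar="\\",
    )
)

print(rows)
print(alt)
```

The second half is the point Jay makes about readers versus writers: a reader can be configured (or can guess) per file, but a writer has to pick one variant and hope the consuming application agrees.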