At 16:36 01/08/2015, Igor wrote: ???? >There are many real problems with CSV - no need to make up imaginary ones. `---
Indeed and I'm surprised noone mentionned this from the very start of this thread: Nulls. There is no real provision in the RFC to represent Null, or distinguish Null from an empty string. I've long established my own convention to transfer data between the language I use most (AutoIt), CSV files and SQLite and I'm very happy with that. Let's look at which datatypes we have to deal with, at least for SQLite: o) string o) integer o) real o) binary o) Null <-- that's very important and most often forgoten! Within my humble convention, a CSV field is either: o) a string delimited by a parametrable delimiter (double quote by default) where embedded delimiters are escaped by doubling o) an integer, optionally signed o) a real (e.g. 3.1415926 or -5.6e-12) o) a binary in 0x0123456789abcdef form o) an empty field or a field containing Null (w/o delimiter) represents Null Fields are separated by separators (comma by default). Rows are terminated by any Unicode line termination (defined by PCRE BSR_ANY). An optional header may be present with column names as a series of string types. That deals with SQLite datatypes. I've additionnally accept datatypes from/to AutoIt to/from SQLite (some of them don't make much sense storing in a DB, but anyway): o) a field containing True or False (w/o delimiter) represents 1 or 0 as a boolean o) a field containing "<-~Default~->" represents the keyword Default o) function references are encoded as strings in the format "<-(myFunc)->" o) pointers are encoded as strings in the format "<-*0123456789"*->" o) Windows handles are encoded in the form "<-&0123456789&->" o) a C-style structure is encoded by "<-{0123456789ABCDEF}->" and the following more exotic ones (their actuel content is not stored): o) a COM object is denoted by the format "<- at objectName@->" o) an array is denoted by the format "<-[3][4]...[2]->" o) all other unrepresentable variants are denoted by "<-?variantType?->" I agree that the convention of using strings like "<- ... ->" is questionable and is merely a hint that some variable(s) should have been handled otherwise, but if one focuses only on SQLite (and Excel or such) datatypes, the format is definitely unambiguous and easily parsed by a single regexp and encoded with minimum effort. The benefit of representing datatypes unambiguously in round-trip operations should be obvious. Granted that doesn't solve the direct import from Excel CSV or some other data source using its own "standard", but I prefer having to code the few lines of AutoIt needed to grab data directly from Excel or other COM objects and format the data according to my own convention than rely on variable formats found around. Again, the issue of not providing a clear way to represent SQL Null is a big hole in the RFC (I understand it was not the purpose at the time it was issued and the fact that it was too late to enforce something better). Also the optional string delimiter makes it essentially impossible to distinguish the integer value 1234 from the string "0001234", another issue that needs addressing.