Ryan, You are quite right, it is not a perfect solution. Thank you for your detailed explanation your's is a much more complete solution. I never knew about the quotes syntax being formal. that is very good to know. Sure enough, it works just as you described:
1ºfield 1ºfield 2ºthis is a comment fieldºnumber 4ºlast field 2ºfield 1ºfield 2ºthis is a comment fieldºnumber 4ºlast field 3ºfield 1ºfield 2ºthis is a comment, has a comma in it. ºnumber 4ºlast field 4ºfield 1ºfield 2º"this is a commentº it has my 0186 in it. "ºnumber 4ºlast field 5ºfield 1ºfield 2º"this is a comment field lets quote ""something"" and see what happens. "ºnumber 4ºlast field 6ºfield 1ºfield 2ºthis is a comment fieldºnumber 4ºlast field On Friday, July 18, 2014 11:49 AM, RSmith <[email protected]> wrote: > > > >On 2014/07/18 16:53, Jonathan Leslie wrote: >> I have a similar situation about to happen. I also have the issue where one >> of the data fields is a text section that the user fills in and of course, >> he is free to put commas in this field. I believe that this will result in >> higgly-piggly (that's a technical term) when I do the import. What I do is >> I never use a CSV file as a "Comma separated Values" file but rather as a >> "Character separated Values" file, but rather use a Character, º >> (<alt>0186), a legal character but not one on the keyboard, as the separator >> character. You can set Excel to do this automatically, and it makes life a >> whole lot easier. I believe with sqlite the >.separator "º"< directive >> will allow the import correctly, and keep the commas in place in the data >> fields. > >Hi Jonathan, while this is a nice solution, it is not a perfect one. What if a >user does enter that character? (Even if not directly >linked to a keyboard key, it can happen). The very correct way to do it is to >simply use quoting, which preserves the text >specifically, whatever characters were used. A correct CSV format can hold any >data (though most readers would have difficulty >reading Chr(0) values, but that is an implementation issue). More importantly, >Excel, Open Office Calc etc, all knows exactly how to >export this correctly and without the higgly-pigglies. The only place I have >seen confusion is where you export it in a country that >uses semi-colon as a separator (yes it happens, mostly in the Netherlands) and >then try to read in another country where the >locality values are different (comma mostly). > >Mostly the problems experienced by people is that they make some home-brew CSV >importer that does not realise how to correctly read >output from a standards-based exporter such as Excel, and then try to change >things like separation or quoting methods to "fix" it >after the fact. > >The method described by the RFC for CSV is quite genius ( I think RFC2048, but >not sure now and not important, but google will know >if you need it) so let me explain a simple way of making csv files. > >The Important bits to remember is: > >1 - Header, first line of the CSV must be the header, must contain field >names, unless you do not have those or they are implicit, >such as in a log-type listing or import to existing structure- but any >standard importer will expect row 1 to contain >field-identifiers if previously unknown. > >2 - Every row must be separated by whichever Line-separation character(s) used >by the operating system - this sometimes causes >problems. Carriage-Return-Linefeed (aka CR/LF) pairs (0x0D+0x0A) are mostly >used in Windows, *nix environments prefer to use just >the Linefeed (LF) character (0x0A). The Windows-way allows other linefeeds or >carriage returns to be inside the text and not affect >line-termination, and the *nix method is more lightweight in data terms, so >both have advantages. Important to note that a >line-separator may appear inside a set of quotes which makes it simply part of >the value, and not indication of line termination - >this is the biggest problem with homegrown importers I've seen, they all try >read the file line-by-line which breaks badly when a >field contains such a CR/LF. > >3 - if a field starts with a Quote (after any leading/trailing white-space was >stripped) any and all characters up to the next quote >is part of the field value, even if they are separators (; or , or CR or LF) >or white-space characters (such as SPACE/TAB/VTAB/etc. >- these are stripped when leading/trailing field values, unless in quotes) and >when we come to the next Quote (typically " or 0x22, >but may differ between implementations), we check whether it's immediately >followed by another quote, which means simply add the >actual quote char and continue, or not, which means the end of the field. >(Note: Any text after the ending quote in a quoted field >and before the next field separator ( ; or , ) is ignored, one can use this >quirk to add some comments or directives even). > >4 - If you are writing fields, if your field contains any Quote characters, >any separators or leading/trailing white-space >characters that must not be discarded, it must be quoted. The best method for >quoting a string to be valid CSV field data is like >this pseudo code: > >QuoteChar = Char(0x22); // Or whatever is used for quoting in your system >for i = last index of value down to first index: // Go downward because >it allows inserting without messing up final index > if value[i] = QuoteChar then insert QuoteChar into value at position i; > // Duplicate any existing quotes >OutputFieldValue = QuoteChar + value + QuoteChar; // Surround the result >with quotes > > >This way whatever is used in your fields do not matter, it will be understood >by any standard-validated CSV reader. Note that quotes >may be added regardless of content, but it is preferred to not add quotes >where not needed as it can cause very significant >weight-gains to a large file, especially one with many fields per row, and >moreso if lots of fields are short or empty. > >The sqlite3.exe (or its Linux counterpart) is tested to understand the correct >formatted CSV as used in most places, not sure if it >recognizes the OS's preference for say ; vs. , separators, but it's an easy >test. If your data gets mangled, it is probably due to >the input being not well-processed, and I hope the above will help to fix that. > >Cheers, >Ryan > > > >_______________________________________________ >sqlite-users mailing list >[email protected] >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

