Hi,

BTW: I hate that TOFU posting, but Outlook doesn't allow me to do it any 
better. I'm sorry for that.

Your sample [123, 9" Nail, ", Caliper, "set, up 5"", hold", 8 ] gives currently 
the following column values (I used > and < to indicate the string bounderies, 
so that any white spaces are noticeable):

        >123< | > 9" Nail, "< | > Caliper< | > "set, up 5"", hold"< | > 8 <

Given that your input has been stripped by the leading spaces after a column 
separator, or if the column separator has been defined as >, < you currently 
get:

        >123< | >9" Nail, "< | >Caliper< | >set, up 5", hold< | >8 <

With my approach you still get:

        >123< | >9" Nail, "< | >Caliper< | >set, up 5", hold< | >8 <

You expect to get:

        >123< | >9 Nail, < | >Caliper< | >set, up 5", hold< | >8 <

But this is not what .import currently does.

To get that, .import would have to ignore every " besides when the cell is 
actually quoted.

This is my current suggestion for the code, dealing with splitting the row into 
columns and dealing with quoting (just from the editor, haven't tested it yet):

      char *z, c;
      int honorQuote = 1;
      int inQuote = 0;
      lineno++;
      azCol[0] = zLine;
      for(i=0, z=zLine; (c = *z)!=0; z++){
        if( c=='"' ){
          if( honorQuote )
          {
            inQuote = 1;
            honorQuote = 0;
          }
          else if( inQuote ){
            inQuote = 0;
            honorQuote = 1;
          }
        }
        else
          honorQuote = 0;
        if( c=='\n' ) lineno++;
        if( !inQuote && c==p->separator[0] && strncmp(z,p->separator,nSep)==0 ){
          *z = 0;
          i++;
          if( i<nCol ){
            azCol[i] = &z[nSep];
            z += nSep-1;
          }
          honorQuote = 1;
        }
      } /* end for */

Bye.

-----Ursprüngliche Nachricht-----
Von: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
Im Auftrag von RSmith
Gesendet: Mittwoch, 26. Juni 2013 14:48
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] sqlite3: .import command handles quotation incorrectly

Expanding on what Jay replied to:
> Hi,
>
> well we could discuss endlessly, what well formed CSV files are.
Not really, the rules are pretty set in stone, it isn't open to interpretation.


> Given that we cannot fix the generation of the CSV file, why not making the 
> importer a little bit smarter?
>
> The .import command already treats " as literal data, when it doesn't appear 
> at the beginning of the cell, but it requires an even number of " in the same 
> cell to don't get confused in breaking up the row into cells.
>
> All I ask for is to treat an odd number of " in a cell as literal data except 
> when the rule for dequoting applies, i. e. when the cell starts with a ".

This is impossible - How do you know where the end of a cell is if you are 
ignoring/counting Quotes???

For instance, what is the resulting fields from this CSV: [123, 9" Nail, ", 
Caliper, "set, up 5"", hold", 8 ]
In REAL CSV terms that would be:    123  |  9 Nail,  | Caliper | set, up 5", 
hold  |  8

How would you want that to be interpreted?  Once we encounter the first quote, 
how do we know at which quote the field ends? How do 
we know how many quotes are in the field? Or do we just ignore quotes 
altogether and break on every comma - in which case the above 
becomes:   123  |  9" Nail  |  "  |  Caliper  |  "set  |  up 5""  | hold"  |  8
Surely it's easy to see how this is incorrect?
How about your other suggestion about taking fields starting with quotes to be 
quoted in pairs (even numbers) but otherwise ignoring 
odd numbered quotes - how do we know at which quote to stop counting?
Going strictly by your suggestion It might be interpreted as:  123 |  9" Nail  
|  , Caliper,  |  up 5""  |  hold"  |  8  |
or maybe even as:  123  |  9" Nail  |  , Caliper, set  |  up 5""  | hold"  |  8 
 |   (the interpreter would never be sure)

The interpreter working like this is not "a little bit smarter" but rather is 
very much non-conforming.

You have a very specific case of weirdly created data - I already offered help 
to give you an interpreter specific to it and will be 
happy to, but there is no way the SQlite can be altered like this, it will read 
all other CSV files wrong - and making a case 
specifc SQLite is possible (you can alter the C code and compile your own 
easily), but all future updates will need this adjustment. 
Submitting a patch will require your version to work on all other CSV file too 
- which as desribed above - it won't.

is there no way to fix the output? Or use an in-between step of converting the 
data before import from the weird layout to proper 
CSV? (we can help with that).

Have a great day!

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to