Thanks for the test case.  Fixed at
http://www.sqlite.org/src/info/5e239ecda0


On Wed, Dec 11, 2013 at 6:37 AM, Lindsay Lawrence <thinknl...@gmail.com>wrote:

> Hi,
>
> I am currently using sqlite3 version:
>
> 3.8.1 2013-10-17 12:57:35 c78be6d786c19073b3a6730dfe3fb1be54f5657a
>
> I found an edge case bug in the sqlite3 shell when importing csv data with
> fields containing embedded quotes, CRs and LFs:
>
> When a field contains an embedded quote, and that embedded quote is
> immediately followed by an EOL then the current csv parser will consider
> the end of field to be reached and fail to import the row correctly.
>
> For example the following csv, a single data row with column headers will
> fail to import correctly.
>
> column1,column2,column3,column4
> fielddata1,fielddata2,"field ""
> data"" 3",fielddata4
>
>
> The offending code is in shell.c in function
>
> static char *csv_read_one_field(CSVReader *p);
>
> An example fix could be something like the following -- adding a counter
> flag 'cQuoteComplete' to track quotes properly.
>
> static char *csv_read_one_field(CSVReader *p){
>   int c, pc;
>   int cSep = p->cSeparator;
>   int cQuoteComplete = 0;
>   p->n = 0;
>   c = fgetc(p->in);
>   if( c==EOF || seenInterrupt ){
>     p->cTerm = EOF;
>     return 0;
>   }
>   if( c=='"' ){
>     int startLine = p->nLine;
>     int cQuote = c;
>     cQuoteComplete+=1;
>     pc = 0;
>     while( 1 ){
>       c = fgetc(p->in);
>       if( c=='\n' ) p->nLine++;
>       if( c==cQuote ){
>         cQuoteComplete+=1;
>         if( pc==cQuote ){
>           pc = 0;
>           continue;
>         }
>       }
>       if( (c==cSep && pc==cQuote)
>        || (c=='\n' && pc==cQuote)
>        || (c=='\n' && pc=='\r' && p->n>=2 && p->z[p->n-2]==cQuote)
>        || (c==EOF && pc==cQuote)
>       ){
>         if (cQuoteComplete%2 == 0) {
>           do{ p->n--; }while( p->z[p->n]!=cQuote );
>           p->cTerm = c;
>           break;
>         }
>       }
>       if( c==EOF ){
>         fprintf(stderr, "%s:%d: unterminated %c-quoted field\n",
>                 p->zFile, startLine, cQuote);
>         p->cTerm = EOF;
>         break;
>       }
>       csv_append_char(p, c);
>       pc = c;
>     }
>     if( cQuoteComplete%2 != 0 ){
>       fprintf(stderr, "%s:%d: unescaped %c character\n",
>               p->zFile, p->nLine, cQuote);
>     }
>   }else{
>     while( c!=EOF && c!=cSep && c!='\n' ){
>       csv_append_char(p, c);
>       c = fgetc(p->in);
>     }
>     if( c=='\n' ){
>       p->nLine++;
>       if( p->n>1 && p->z[p->n-1]=='\r' ) p->n--;
>     }
>     p->cTerm = c;
>   }
>   if( p->z ) p->z[p->n] = 0;
>   return p->z;
> }
>
>
> I built the sqlite3 shell from the almagamation source and tested the above
> change using my import data; about 1Gb of messy,  but rfc4180 compliant,
> CSV. It all imported cleanly.
>
> sqlite3 is a wonderful bit of software. I have been using it for some time
> now to munge and query multi-gigabyte size data sets and am very impressed
> with its performance and capabilities.
>
> Best Regards,
> Lindsay
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to