Thanks for the test case. Fixed at http://www.sqlite.org/src/info/5e239ecda0
On Wed, Dec 11, 2013 at 6:37 AM, Lindsay Lawrence <thinknl...@gmail.com>wrote: > Hi, > > I am currently using sqlite3 version: > > 3.8.1 2013-10-17 12:57:35 c78be6d786c19073b3a6730dfe3fb1be54f5657a > > I found an edge case bug in the sqlite3 shell when importing csv data with > fields containing embedded quotes, CRs and LFs: > > When a field contains an embedded quote, and that embedded quote is > immediately followed by an EOL then the current csv parser will consider > the end of field to be reached and fail to import the row correctly. > > For example the following csv, a single data row with column headers will > fail to import correctly. > > column1,column2,column3,column4 > fielddata1,fielddata2,"field "" > data"" 3",fielddata4 > > > The offending code is in shell.c in function > > static char *csv_read_one_field(CSVReader *p); > > An example fix could be something like the following -- adding a counter > flag 'cQuoteComplete' to track quotes properly. > > static char *csv_read_one_field(CSVReader *p){ > int c, pc; > int cSep = p->cSeparator; > int cQuoteComplete = 0; > p->n = 0; > c = fgetc(p->in); > if( c==EOF || seenInterrupt ){ > p->cTerm = EOF; > return 0; > } > if( c=='"' ){ > int startLine = p->nLine; > int cQuote = c; > cQuoteComplete+=1; > pc = 0; > while( 1 ){ > c = fgetc(p->in); > if( c=='\n' ) p->nLine++; > if( c==cQuote ){ > cQuoteComplete+=1; > if( pc==cQuote ){ > pc = 0; > continue; > } > } > if( (c==cSep && pc==cQuote) > || (c=='\n' && pc==cQuote) > || (c=='\n' && pc=='\r' && p->n>=2 && p->z[p->n-2]==cQuote) > || (c==EOF && pc==cQuote) > ){ > if (cQuoteComplete%2 == 0) { > do{ p->n--; }while( p->z[p->n]!=cQuote ); > p->cTerm = c; > break; > } > } > if( c==EOF ){ > fprintf(stderr, "%s:%d: unterminated %c-quoted field\n", > p->zFile, startLine, cQuote); > p->cTerm = EOF; > break; > } > csv_append_char(p, c); > pc = c; > } > if( cQuoteComplete%2 != 0 ){ > fprintf(stderr, "%s:%d: unescaped %c character\n", > p->zFile, p->nLine, cQuote); > } > }else{ > while( c!=EOF && c!=cSep && c!='\n' ){ > csv_append_char(p, c); > c = fgetc(p->in); > } > if( c=='\n' ){ > p->nLine++; > if( p->n>1 && p->z[p->n-1]=='\r' ) p->n--; > } > p->cTerm = c; > } > if( p->z ) p->z[p->n] = 0; > return p->z; > } > > > I built the sqlite3 shell from the almagamation source and tested the above > change using my import data; about 1Gb of messy, but rfc4180 compliant, > CSV. It all imported cleanly. > > sqlite3 is a wonderful bit of software. I have been using it for some time > now to munge and query multi-gigabyte size data sets and am very impressed > with its performance and capabilities. > > Best Regards, > Lindsay > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users