On 06/19/2012 04:28 AM, E. Timothy Uy wrote:
> Dear Dan,
>
> With the change from U8_NEXT to U16_NEXT, I am able to insert 一日耶羅波安出. I
> was also able to insert the rest of the data set (about 31000 more rows
> containing both traditional and simplified Chinese). Is this an ICU error?
> Seems like everything should be using U8_ in the tokenizer.

U16_NEXT is correct, as that buffer contains UTF-16 code units. Data is
converted to UTF-16 before it is tokenized, because ICU does not provide a
break iterator that operates directly on UTF-8.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
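
To illustrate why the buffer has to be read as UTF-16, here is a minimal
sketch of what a U16_NEXT-style step does. It is a hand-written stand-in, not
the ICU macro and not the SQLite tokenizer code; the names utf16_next and
utf16_count are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of ICU or SQLite: decode one code point
 * from a UTF-16 buffer and advance *i past it, roughly mirroring what a
 * U16_NEXT-style step does. An unpaired surrogate is returned as-is. */
static uint32_t utf16_next(const uint16_t *s, size_t *i, size_t length)
{
    uint32_t c = s[(*i)++];

    /* A lead surrogate followed by a trail surrogate forms one code point. */
    if (c >= 0xD800 && c <= 0xDBFF && *i < length) {
        uint32_t trail = s[*i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            (*i)++;
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return c;
}

/* Hypothetical helper: count the code points in a UTF-16 buffer. */
static size_t utf16_count(const uint16_t *s, size_t length)
{
    size_t i = 0, n = 0;

    while (i < length) {
        (void)utf16_next(s, &i, length);
        n++;
    }
    return n;
}
```

With the buffer {0x4E00, 0x65E5, 0xD840, 0xDC00}, successive calls to
utf16_next yield U+4E00 (一), U+65E5 (日) and U+20000, the last one assembled
from a surrogate pair. A UTF-8 decoder expects 8-bit units, so it cannot
interpret these 16-bit units correctly.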