On 06/19/2012 04:28 AM, E. Timothy Uy wrote:
> Dear Dan,
> 
> With the change from U8_NEXT to U16_NEXT, I am able to insert 一日耶羅波安出. I
> was also able to insert the rest of the data set (about 31000 more rows
> containing both traditional and simplified Chinese). Is this an ICU error?
> Seems like everything should be using U8_ in the tokenizer.

U16_NEXT is correct, as that buffer contains UTF-16 code units. Data
is converted to UTF-16 before it is tokenized, because ICU does not
provide a break iterator that operates directly on UTF-8.
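
To illustrate why the macro has to match the buffer's encoding, here is
a minimal sketch of the kind of decoding U16_NEXT performs on a UTF-16
buffer. It is a hand-written stand-in, not the ICU macro and not the
tokenizer's code; the names next_utf16 and count_code_points are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the behaviour of ICU's U16_NEXT:
 * read one code point from a UTF-16 buffer and advance the index.
 * A lead surrogate followed by a trail surrogate is combined into a
 * supplementary code point; an unpaired surrogate is returned as is.
 */
uint32_t next_utf16(const uint16_t *s, size_t *i, size_t length)
{
    assert(*i < length);            /* caller must not read past the end */
    uint32_t c = s[(*i)++];
    if (c >= 0xD800 && c <= 0xDBFF && *i < length) {
        uint32_t trail = s[*i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++*i;
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return c;
}

/* Count the code points in a UTF-16 buffer of `length` code units. */
size_t count_code_points(const uint16_t *s, size_t length)
{
    size_t i = 0, n = 0;
    while (i < length) {
        next_utf16(s, &i, length);
        ++n;
    }
    return n;
}
```

Each CJK character in the Basic Multilingual Plane occupies exactly one
16-bit code unit here, so the index advances by one per character. A
UTF-8 decoder expects 8-bit units with a different lead/trail layout, so
applying it to this buffer would misinterpret the data.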
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
