>> >> inserting the following into my virtual table:
>> >>
>> >> 一日耶羅波安出
>>
>> Can you post the list of codepoints in this text? Or the hex
>> of the utf-16 or utf-8 encoding of the same?
00 4E E5 65 36 80 85 7F E2 6C 89 5B FA 51
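The dump above reads as UTF-16 with little-endian byte order (two bytes per character, low byte first). A small sketch for checking that, assuming Python 3 is at hand; the variable names are only illustrative:

```python
# Hex dump copied from the message above.
hex_dump = "00 4E E5 65 36 80 85 7F E2 6C 89 5B FA 51"

# Strip the spaces and turn the hex digits into raw bytes.
raw = bytes.fromhex(hex_dump.replace(" ", ""))

# Interpret the bytes as UTF-16, little-endian, without a byte-order mark.
text = raw.decode("utf-16-le")

# List each character's code point in the usual U+XXXX notation.
code_points = [f"U+{ord(ch):04X}" for ch in text]

print(text)
print(code_points)
```

If the decoded text matches the string from the original report, the bytes sent to the list are intact and the input itself is well formed.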
No problem inserting this string here (Mac OS X 10.6.8)
sqlite> cr
On 06/19/2012 04:28 AM, E. Timothy Uy wrote:
> Dear Dan,
>
> With the change from U8_NEXT to U16_NEXT, I am able to insert 一日耶羅波安出. I
> was also able to insert the rest of the data set (about 31000 more rows
> containing both traditional and simplified Chinese). Is this an ICU error?
> Seems like
Dear Dan,
With the change from U8_NEXT to U16_NEXT, I am able to insert 一日耶羅波安出. I
was also able to insert the rest of the data set (about 31000 more rows
containing both traditional and simplified Chinese). Is this an ICU error?
Seems like everything should be using U8_ in the tokenizer.
Thank y
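U8_NEXT and U16_NEXT are ICU's macros for stepping through UTF-8 and UTF-16 buffers respectively. As a rough illustration only (in Python rather than C, and not a reproduction of the FTS tokenizer code), the sketch below shows why the iteration unit has to match the encoding of the buffer being walked:

```python
text = "一日耶羅波安出"

utf8 = text.encode("utf-8")       # what SQLite stores with encoding="UTF-8"
utf16 = text.encode("utf-16-le")  # the form ICU works with internally

utf8_bytes = len(utf8)            # 3 bytes per character here: 21
utf16_units = len(utf16) // 2     # 1 code unit per character here: 7

# Reading a UTF-16 buffer with a UTF-8 decoder does not work: the bytes
# are not valid UTF-8, so offsets and characters come out wrong or fail.
try:
    utf16.decode("utf-8")
    misread_ok = True
except UnicodeDecodeError:
    misread_ok = False

print(utf8_bytes, utf16_units, misread_ok)
```

Which buffer each macro in the tokenizer actually walks is something only the source can answer; the point here is just that a UTF-8 step over UTF-16 data (or the reverse) goes out of sync.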
I'll take a look right now. Though my first thought was if you change
U8_NEXT to U16_NEXT, wouldn't you have to change it everywhere else? I
recompiled ICU with U_CHARSET_IS_UTF8 earlier and this did not help.
On Mon, Jun 18, 2012 at 2:06 PM, Dan Kennedy wrote:
> On 06/19/2012 03:39 AM, E. Timo
On 06/19/2012 03:39 AM, E. Timothy Uy wrote:
> If anyone can unravel this mystery, it would be much appreciated. For now,
> I inserted a comma - 一日、耶羅波安出 and it works. I suspect it must be somehow
> that the sequence of bytes encodes another character, which throws the
> tokenizer out of whack or m
If anyone can unravel this mystery, it would be much appreciated. For now,
I inserted a comma - 一日、耶羅波安出 and it works. I suspect it must be somehow
that the sequence of bytes encodes another character, which throws the
tokenizer out of whack or maybe the fts4aux table.
一
19968
%E4%B8%80
日
26085
%E
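The triples above pair each character with its decimal code point and its percent-encoded UTF-8 bytes. A minimal sketch that regenerates such a listing, assuming Python 3 and its standard urllib.parse.quote (which encodes as UTF-8 by default):

```python
from urllib.parse import quote

text = "一日耶羅波安出"

# One (character, decimal code point, percent-encoded UTF-8) triple each.
rows = [(ch, ord(ch), quote(ch)) for ch in text]

for ch, decimal, percent in rows:
    print(ch, decimal, percent)
```

Each of these characters takes three bytes in UTF-8, so every percent-encoded form has three %XX groups.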
Thanks for writing back Dan. Using charCodeAt() in Javascript, I have the
following for 一日耶羅波安出:
19968
26085
32822
32645
27874
23433
20986
I tried entering subsets of the data:
一日耶羅波安出 - Error: SQL logic error or missing database <-- target
一日耶羅波安 - Ok
日耶羅波安出 - Ok
耶羅波安出 - Ok
一日耶羅波安出x - Error: SQ
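charCodeAt() returns UTF-16 code units, so its values equal the code points only when no surrogate pairs are involved. A quick sketch, assuming Python 3, that rebuilds the string from the decimal values listed above and checks for surrogates:

```python
# Decimal values reported by charCodeAt() in the message above.
decimals = [19968, 26085, 32822, 32645, 27874, 23433, 20986]

# Rebuild the string from the values.
rebuilt = "".join(chr(n) for n in decimals)

# Surrogate code units fall in the range U+D800..U+DFFF.
has_surrogates = any(0xD800 <= n <= 0xDFFF for n in decimals)

# The same values in hex, for comparison with a UTF-16 byte dump.
hex_form = [f"U+{n:04X}" for n in decimals]

print(rebuilt, has_surrogates)
print(hex_form)
```

With no surrogates present, every character is a single UTF-16 code unit, which rules out a stray supplementary-plane character in the input.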
On 06/19/2012 02:11 AM, E. Timothy Uy wrote:
> I recompiled ICU using U_CHARSET_IS_UTF8 and the error persists.
>
> On Mon, Jun 18, 2012 at 11:45 AM, E. Timothy Uy wrote:
>
>> Hopefully someone has some insight on this. I am using FTS4 with
>> tokenize=icu (and PRAGMA encoding="UTF-8"). I'm gett
I recompiled ICU using U_CHARSET_IS_UTF8 and the error persists.
On Mon, Jun 18, 2012 at 11:45 AM, E. Timothy Uy wrote:
> Hopefully someone has some insight on this. I am using FTS4 with
> tokenize=icu (and PRAGMA encoding="UTF-8"). I'm getting an error
> inserting the following into my