Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-23 Thread 邱朗
Hi Hideaki, Thanks for your reply, which made me figure out why I said the icu version does "not" support Chinese: b/c in Chinese '中文' can be tokenized as either '中文' or '中' or '文', so when I query '中文' or '中*' I get the result, but I get no result when I query '文'. The same goes for '为什么', which can be

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-22 Thread Hideaki Takahashi
Hello, the full text search index can be used to see how the text is tokenized for both FTS4 and FTS5. For FTS4, fts3tokenize can be used too.
sqlite> CREATE VIRTUAL TABLE icu_zh_cn USING fts3tokenize(icu, zh_CN);
sqlite> SELECT token, start, end, position FROM icu_zh_cn WHERE INPUT='为什么不支持中文 fts5

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
Hi, It was exactly like you said, my bad, so now I have built an icu version. BUT unfortunately it still does not support CJK, why is that?
qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ ./sqlite3
SQLite version 3.25.1 2018-09-18 20:20:44
Enter ".help" for usage hints.
Connected to a

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Jens Alfke
> On Sep 20, 2018, at 11:01 PM, 邱朗 wrote:
>
> https://www.sqlite.org/fts5.html said "
> The unicode tokenizer classifies all unicode characters as either "separator"
> or "token" characters. By default all space and punctuation characters, as
> defined by

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy
On 09/21/2018 09:44 PM, 邱朗 wrote:
I actually first used
./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config --cppflags`" LDFLAGS="`icu-config --ldflags`"
But I got the error

When you ran this configure command, is the first line of output something like the following?
bash:

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
I actually first used
./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config --cppflags`" LDFLAGS="`icu-config --ldflags`"
But I got the error
sqlite3.c:184184:10: fatal error: 'unicode/utypes.h' file not found
#include <unicode/utypes.h>
Then I added -I -L switches and if I remember correctly I used brew to

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy
On 09/21/2018 05:21 PM, 邱朗 wrote: Hi, Thanks for replying to my question. The following is the error I got when compiling sqlite-autoconf-3250100.tar.gz. The error looks similar to this old discussion http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html I am using macOS

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
Hi, Thanks for replying to my question. The following is the error I got when compiling sqlite-autoconf-3250100.tar.gz. The error looks similar to this old discussion http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html I am using macOS 10.13 & xcode 10
Undefined symbols

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy
On 09/21/2018 01:38 PM, 邱朗 wrote: I think it could be made to work, or at least, I have experience making it work with CJK based on functionality exposed via ICU. I don't know if the unicode tokenizer uses ICU or if the functionality in ICU that I used is available in the unicode tables. Not

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Scott Robison
On Fri, Sep 21, 2018 at 12:39 AM 邱朗 wrote:
>
> >I think it could be made to work, or at least, I have experience
> >making it work with CJK based on functionality exposed via ICU. I
> >don't know if the unicode tokenizer uses ICU or if the functionality
> >in ICU that I used is available in the

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
>
>I think it could be made to work, or at least, I have experience
>making it work with CJK based on functionality exposed via ICU. I
>don't know if the unicode tokenizer uses ICU or if the functionality
>in ICU that I used is available in the unicode tables. Not
>understanding any of the

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Scott Robison
On Fri, Sep 21, 2018 at 12:02 AM 邱朗 wrote:
>
> https://www.sqlite.org/fts5.html said " The unicode tokenizer classifies all
> unicode characters as either "separator" or "token" characters. By default
> all space and punctuation characters, as defined by Unicode 6.1, are
> considered

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
https://www.sqlite.org/fts5.html said " The unicode tokenizer classifies all unicode characters as either "separator" or "token" characters. By default all space and punctuation characters, as defined by Unicode 6.1, are considered separators, and all other characters as token characters... "
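To make the quoted rule concrete, here is a rough sketch in Python of a "separator or token" classifier. It is only an approximation of the sentence quoted from the docs, not SQLite's actual unicode61 code: `is_separator` and `simple_tokenize` are hypothetical names, and the category test (whitespace plus Unicode `P*`/`Z*` categories, using whatever Unicode version the Python build ships) is an assumption. It shows why a run of CJK characters with no spaces or punctuation comes out as one long token.

```python
import unicodedata
from typing import List


def is_separator(ch: str) -> bool:
    """Approximate the quoted rule: space and punctuation separate tokens."""
    category = unicodedata.category(ch)
    return ch.isspace() or category.startswith("P") or category.startswith("Z")


def simple_tokenize(text: str) -> List[str]:
    """Split text at separators; everything else is a token character."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_separator(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            # Case folding has no effect on CJK characters.
            current.append(ch.lower())
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    # The CJK run contains no separators, so it stays a single token.
    print(simple_tokenize("为什么不支持中文 fts5"))
    print(simple_tokenize("Why not, FTS5?"))
```

Under this model, a query for '中文' has nothing to match, because the only CJK token in the index is the whole run '为什么不支持中文'.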

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-20 Thread Scott Robison
On Thu, Sep 20, 2018, 8:21 PM 邱朗 wrote:
> Hi,
> I had thought Unicode61 Tokenizer can support CJK -- Chinese Japanese
> Korean I verify my sqlite supports fts5
>
> {snipped}
>
> But to my surprise it can't find any CJK word at all. Why is that ?

Based on my experience with such things, I

[sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-20 Thread 邱朗
Hi, I had thought the Unicode61 tokenizer could support CJK -- Chinese Japanese Korean. I verified that my sqlite supports fts5:
sqlite> pragma compile_options;
BUG_COMPATIBLE_20160819
COMPILER=clang-9.0.0
DEFAULT_CACHE_SIZE=2000
DEFAULT_CKPTFULLFSYNC
DEFAULT_JOURNAL_SIZE_LIMIT=32768
DEFAULT_PAGE_SIZE=4096
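
For anyone who wants to reproduce the symptom without the sqlite3 shell, the following is a small sketch using Python's standard `sqlite3` module. The table name `docs`, the column `body`, and the helper names are made up for illustration, and the script assumes the Python build links an SQLite with FTS5 compiled in (it checks and skips otherwise). With the unicode61 tokenizer, the whole CJK run should be indexed as one token, so the full run and a prefix of it should match while an inner word such as '中文' should not.

```python
import sqlite3


def fts5_available(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build can create an FTS5 table."""
    try:
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(x)")
        conn.execute("DROP TABLE fts5_probe")
        return True
    except sqlite3.OperationalError:
        return False


def count_matches(conn: sqlite3.Connection, query: str) -> int:
    """Count rows of the hypothetical 'docs' table matching an FTS5 query."""
    row = conn.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH ?", (query,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
HAVE_FTS5 = fts5_available(conn)

if HAVE_FTS5:
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='unicode61')"
    )
    conn.execute("INSERT INTO docs(body) VALUES (?)", ("为什么不支持中文 fts5",))
    # Whole run, a prefix of the run, an inner word, and the ASCII token.
    for query in ("为什么不支持中文", "为*", "中文", "fts5"):
        print(query, count_matches(conn, query))
else:
    print("This SQLite build does not include FTS5; skipping the demo.")
```

This only demonstrates unicode61; it says nothing about how an ICU-enabled build would segment the same text.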