Hello,

The full-text search index itself can be used to see how the text is tokenized, for both FTS4 (via fts4aux) and FTS5 (via fts5vocab). For FTS4, fts3tokenize can be used too.
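For anyone who would rather script this check, here is a minimal sketch in Python using the standard sqlite3 module. It assumes the SQLite linked into Python was built with FTS5; ICU is usually not available there, so only the unicode61 case is covered. The function name inspect_fts5 and the choice of probe queries are made up for illustration.

```python
import sqlite3

TEXT = "为什么不支持中文 fts5 does not seem to work for chinese"


def inspect_fts5(text, queries=("中文", "为什么*", "work")):
    """Index one document with FTS5 and report its terms and query hit counts.

    Returns (terms, hits), or None if this SQLite build lacks FTS5
    (or the fts5vocab 'instance' table type).
    """
    con = sqlite3.connect(":memory:")
    try:
        # Same tokenizer settings as the ft5_test table in this thread.
        con.execute(
            "CREATE VIRTUAL TABLE ft5_test USING fts5("
            "content, tokenize = 'porter unicode61 remove_diacritics 1')"
        )
        con.execute("INSERT INTO ft5_test VALUES (?)", (text,))

        # fts5vocab exposes the terms that were actually written to the index.
        con.execute(
            "CREATE VIRTUAL TABLE ft5_test_vocab_i "
            "USING fts5vocab(ft5_test, 'instance')"
        )
        terms = [row[0] for row in con.execute("SELECT term FROM ft5_test_vocab_i")]

        # Count matching rows for each probe query (assumed examples).
        hits = {}
        for query in queries:
            cur = con.execute(
                "SELECT count(*) FROM ft5_test WHERE ft5_test MATCH ?", (query,)
            )
            hits[query] = cur.fetchone()[0]
        return terms, hits
    except sqlite3.OperationalError:
        # Most likely cause: FTS5 is not compiled into this SQLite library.
        return None
    finally:
        con.close()


if __name__ == "__main__":
    result = inspect_fts5(TEXT)
    if result is None:
        print("FTS5 is not available in this SQLite build")
    else:
        terms, hits = result
        print("terms:", terms)
        print("hits:", hits)
```

Since unicode61 keeps the whole CJK run as a single term, a query for '中文' should find nothing here, while the prefix query '为什么*' and the plain term 'work' should each match the row.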
sqlite> CREATE VIRTUAL TABLE icu_zh_cn USING fts3tokenize(icu, zh_CN);
sqlite> SELECT token, start, end, position FROM icu_zh_cn WHERE INPUT='为什么不支持中文 fts5 does not seem to work for chinese';
为什么|0|9|0
不|9|12|1
支持|12|18|2
中文|18|24|3
fts5|25|29|4
does|30|34|5
not|35|38|6
seem|39|43|7
to|44|46|8
work|47|51|9
for|52|55|10
chinese|56|63|11

Based on the output above, the query below works and makes sense to me.

sqlite> select * from zh_text where text match '中文';
为什么不支持中文 icu does not seem to work for chinese

FTS5 + unicode61

sqlite> CREATE VIRTUAL TABLE ft5_test USING fts5(content, tokenize = 'porter unicode61 remove_diacritics 1');
sqlite> INSERT INTO ft5_test values('为什么不支持中文 fts5 does not seem to work for chinese');
sqlite> CREATE VIRTUAL TABLE ft5_test_vocab_i USING fts5vocab(ft5_test, 'instance');
sqlite> SELECT term, doc, col, offset FROM ft5_test_vocab_i;
(snip non-Chinese portion)
为什么不支持中文|1|content|0

FTS4 + ICU(zh_CN)

sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for chinese');
sqlite> CREATE VIRTUAL TABLE zh_terms USING fts4aux(zh_text);
sqlite> SELECT term, col, documents FROM zh_terms;
(snip non-Chinese portion)
不|*|1
不|0|1
中文|*|1
中文|0|1
为什么|*|1
为什么|0|1
支持|*|1
支持|0|1

Thanks,
Hideaki

On Sat, Sep 22, 2018 at 12:44 AM 邱朗 <qiulang2...@126.com> wrote:
> Hi,
>
> It was exactly like you said, my bad, so now I have built an icu version.
> BUT unfortunately it still does not support CJK, why is that ?
>
> qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ ./sqlite3
> SQLite version 3.25.1 2018-09-18 20:20:44
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
> sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for chinese');
> sqlite> select * from zh_text where text match 'work';
> 为什么不支持中文 icu does not seem to work for chinese
> sqlite> select * from zh_text where text match '中';
> sqlite>
>
> BTW, whoever hit the icu4c error it may be because you make the same
> mistake as I did. So I first run brew link icu4c, but brew refused,
> "Warning: Refusing to link macOS-provided software: icu4c", then I forgot
> to add it to my path :$
>
> If you run brew info icu4c, it will tell you that but actually I didn't
> set them and compiler still can find them
>
> For compilers to find icu4c you may need to set:
> export LDFLAGS="-L/usr/local/opt/icu4c/lib"
> export CPPFLAGS="-I/usr/local/opt/icu4c/include"
>
> Thanks,
> Qiulang
>
> At 2018-09-21 23:43:01, "Dan Kennedy" <danielk1...@gmail.com> wrote:
> >On 09/21/2018 09:44 PM, 邱朗 wrote:
> >> I actually first used ./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config --cppflags`" LDFLAGS="`icu-config --ldflags`" But I got the error
> >
> >When you ran this configure command, is the first line out output
> >something like the following?
> >
> > bash: icu-config: command not found
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users