Hello,

The full-text search index itself can be inspected to see how text was
tokenized, for both FTS4 and FTS5 (with fts4aux and fts5vocab, as shown below).
For FTS4, the fts3tokenize virtual table can be used too:

sqlite> CREATE VIRTUAL TABLE icu_zh_cn USING fts3tokenize(icu, zh_CN);
sqlite> SELECT token, start, end, position FROM icu_zh_cn WHERE INPUT='为什么不支持中文 fts5 does not seem to work for chinese';
为什么|0|9|0
不|9|12|1
支持|12|18|2
中文|18|24|3
fts5|25|29|4
does|30|34|5
not|35|38|6
seem|39|43|7
to|44|46|8
work|47|51|9
for|52|55|10
chinese|56|63|11
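To run the same fts3tokenize check from a script instead of the sqlite3 shell, here is a minimal Python sketch using the standard sqlite3 module. It assumes Python's SQLite was built with FTS3/FTS4, which most builds are. ICU support is usually not compiled in, so the sketch defaults to the built-in simple tokenizer. That tokenizer only splits on ASCII non-alphanumeric characters, so it keeps the whole Chinese run as one token, unlike the ICU output above. The helper name `tokenize` is hypothetical. Note that start/end are byte offsets into the UTF-8 input.

```python
import sqlite3

def tokenize(text, tokenizer="simple"):
    """Return (token, start, end, position) rows produced by an FTS3/FTS4 tokenizer.

    Hypothetical helper. `tokenizer` is the argument list given to fts3tokenize,
    e.g. "simple", "porter", or "icu, zh_CN" (the last only on ICU-enabled builds).
    """
    con = sqlite3.connect(":memory:")
    try:
        # fts3tokenize exposes a tokenizer as a read-only virtual table.
        con.execute(f"CREATE VIRTUAL TABLE tok USING fts3tokenize({tokenizer})")
        # The text to tokenize is supplied through the hidden 'input' column.
        return con.execute(
            "SELECT token, start, end, position FROM tok WHERE input = ?",
            (text,),
        ).fetchall()
    finally:
        con.close()

# Print in the same token|start|end|position layout as the shell session above.
for row in tokenize("为什么不支持中文 fts5 does not seem to work for chinese"):
    print("|".join(map(str, row)))
```

On an ICU-enabled build, `tokenize(text, "icu, zh_CN")` should reproduce the segmented output above.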

Based on the output above, the query below works, which makes sense to me
because ICU emits 中文 as a separate token:
sqlite> select * from zh_text where text match '中文';
为什么不支持中文 icu does not seem to work for chinese


FTS5 + unicode61
sqlite> CREATE VIRTUAL TABLE ft5_test USING fts5(content, tokenize = 'porter unicode61 remove_diacritics 1');
sqlite> INSERT INTO ft5_test values('为什么不支持中文 fts5 does not seem to work for chinese');
sqlite> CREATE VIRTUAL TABLE ft5_test_vocab_i USING fts5vocab(ft5_test, 'instance');
sqlite> SELECT term, doc, col, offset FROM ft5_test_vocab_i;
(snip non-Chinese portion)
为什么不支持中文|1|content|0
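The same FTS5 + fts5vocab inspection can be scripted. Below is a minimal Python sketch, assuming Python's SQLite has FTS5 enabled. The helpers `build_fts5`, `terms` and `match_count` are hypothetical names. The sketch also runs two MATCH queries to show the practical consequence of unicode61 keeping 为什么不支持中文 as a single term: a query for 中文 finds nothing, while a prefix query on the start of the run does match.

```python
import sqlite3

def build_fts5(text, tokenize="porter unicode61 remove_diacritics 1"):
    """Index `text` in an in-memory FTS5 table plus an 'instance' fts5vocab table."""
    con = sqlite3.connect(":memory:")
    tok = tokenize.replace("'", "''")  # escape for use inside an SQL string literal
    con.execute(f"CREATE VIRTUAL TABLE ft5_test USING fts5(content, tokenize = '{tok}')")
    con.execute("INSERT INTO ft5_test VALUES (?)", (text,))
    # 'instance' mode: one row per token occurrence, with its token offset.
    con.execute("CREATE VIRTUAL TABLE ft5_test_vocab_i USING fts5vocab(ft5_test, 'instance')")
    return con

def terms(con):
    """Return (term, offset) pairs in document order."""
    return con.execute(
        'SELECT term, "offset" FROM ft5_test_vocab_i ORDER BY "offset"'
    ).fetchall()

def match_count(con, query):
    """Count rows matching an FTS5 query string."""
    return con.execute(
        "SELECT count(*) FROM ft5_test WHERE ft5_test MATCH ?", (query,)
    ).fetchone()[0]

con = build_fts5("为什么不支持中文 fts5 does not seem to work for chinese")
print(terms(con)[0])               # ('为什么不支持中文', 0)
print(match_count(con, "中文"))     # 0: no stored term equals 中文
print(match_count(con, "为什么*"))  # 1: prefix of the single long term
con.close()
```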

FTS4 + ICU(zh_CN)
sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for chinese');
sqlite> CREATE VIRTUAL TABLE zh_terms USING fts4aux(zh_text);
sqlite> SELECT term, col, documents FROM zh_terms;
(snip non-Chinese portion)
不|*|1
不|0|1
中文|*|1
中文|0|1
为什么|*|1
为什么|0|1
支持|*|1
支持|0|1
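The fts4aux check can be scripted the same way. Here is a minimal Python sketch, assuming Python's SQLite includes FTS4. Because ICU is usually not compiled into that build, the sketch defaults to the built-in simple tokenizer. Like unicode61 in the FTS5 example above, that tokenizer leaves 为什么不支持中文 as one term. Passing `tokenizer="icu zh_CN"` on an ICU-enabled build should reproduce the segmented terms shown above. `fts4_terms` is a hypothetical helper name.

```python
import sqlite3

def fts4_terms(text, tokenizer="simple"):
    """Return (term, col, documents) rows from fts4aux for a one-row FTS4 table.

    Hypothetical helper. `tokenizer` is spliced in verbatim, e.g. "simple",
    "porter", or "icu zh_CN" (ICU-enabled builds only).
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute(f"CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize={tokenizer})")
        con.execute("INSERT INTO zh_text VALUES (?)", (text,))
        # fts4aux reports each term once with col '*' (all columns) and once
        # per column index (here only column 0).
        con.execute("CREATE VIRTUAL TABLE zh_terms USING fts4aux(zh_text)")
        return con.execute("SELECT term, col, documents FROM zh_terms").fetchall()
    finally:
        con.close()

for row in fts4_terms("为什么不支持中文 icu does not seem to work for chinese"):
    print("|".join(map(str, row)))
```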

Thanks,
Hideaki

On Sat, Sep 22, 2018 at 12:44 AM 邱朗 <qiulang2...@126.com> wrote:

> Hi,
>
>
> It was exactly as you said, my bad, so now I have built an ICU version.
> But unfortunately it still does not support CJK. Why is that?
>
>
> qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ ./sqlite3
> SQLite version 3.25.1 2018-09-18 20:20:44
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
> sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for chinese');
> sqlite> select * from zh_text where text match 'work';
> 为什么不支持中文 icu does not seem to work for chinese
> sqlite> select * from zh_text where text match '中';
> sqlite>
>
>
> BTW, if anyone else hits the icu4c error, it may be because they made the
> same mistake I did. I first ran brew link icu4c, but brew refused with
> "Warning: Refusing to link macOS-provided software: icu4c", and then I forgot
> to add it to my PATH :$
>
>
> If you run brew info icu4c, it tells you the following, but I actually didn't
> set these variables and the compiler could still find icu4c:
>
>
> For compilers to find icu4c you may need to set:
>   export LDFLAGS="-L/usr/local/opt/icu4c/lib"
>   export CPPFLAGS="-I/usr/local/opt/icu4c/include"
>
>
> Thanks,
> Qiulang
> At 2018-09-21 23:43:01, "Dan Kennedy" <danielk1...@gmail.com> wrote:
> >On 09/21/2018 09:44 PM, 邱朗 wrote:
> >> I actually first used ./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config --cppflags`" LDFLAGS="`icu-config --ldflags`" but I got the error
> >
> >When you ran this configure command, is the first line of output
> >something like the following?
> >
> >   bash: icu-config: command not found
> >
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>