A few days ago I suspected a bug in how SQLite FTS4 handles parentheses
when the ICU tokenizer is used. To rule out my slightly altered build
environment (which compiles SQLite + ICU for the iPhone) as the cause, I
reproduced the issue today with the current version of SQLite on OS X.

Steps to reproduce:

SQLite (3.8.4.3) was configured and built on OS X 10.9.2 using:
./configure CFLAGS="-DSQLITE_ENABLE_ICU \
  `/opt/local/bin/icu-config --cppflags` \
  -DSQLITE_ENABLE_FTS3_PARENTHESIS -I/usr/local/opt/icu4c/include \
  -L/usr/local/opt/icu4c/lib" \
  LDFLAGS="`/opt/local/bin/icu-config --ldflags`"
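
To confirm that both compile-time flags actually ended up in a given
SQLite library, a check along these lines may help. It is only a minimal
sketch in Python using the standard sqlite3 module; the helper name
compile_option_used is made up for illustration, and it inspects whatever
SQLite library Python is linked against, which is not necessarily the one
built above. Inside the sqlite3 shell, "PRAGMA compile_options;" lists the
same information for the shell's own library.

```python
import sqlite3


def compile_option_used(conn, name):
    """Return True if the linked SQLite library was built with `name`.

    `name` is given without the SQLITE_ prefix, e.g. "ENABLE_ICU".
    (Hypothetical helper, not part of the original report.)
    """
    row = conn.execute("SELECT sqlite_compileoption_used(?)", (name,)).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
for option in ("ENABLE_ICU", "ENABLE_FTS3_PARENTHESIS"):
    # Prints the option name followed by True or False.
    print(option, compile_option_used(conn, option))
```

If ENABLE_FTS3_PARENTHESIS is reported as False, queries with parentheses
are parsed with the standard FTS query syntax instead, so the results
below would not be comparable.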


./sqlite3 test.sqlite3



-- Test without ICU
create virtual table test using fts4(intcol, stringcol);
insert into test(intcol, stringcol) values (1, "a");
insert into test(intcol, stringcol) values (2, "b");
insert into test(intcol, stringcol) values (3, "c");

insert into test(intcol, stringcol) values (4, "c");




select * from test where test match '(intcol:1 OR intcol:2)';
1|a
2|b


=> OK

sqlite> select * from test where test match '(intcol:1 OR intcol:2) AND
stringcol:a';
1|a



=> OK

select * from test where test match '(intcol:1 OR intcol:2 OR intcol:3 OR
intcol:4) AND (stringcol:a* OR stringcol:c*)';
1|a
3|c
4|c


=> OK

drop table test;

-- Test with ICU
SELECT icu_load_collation("de_DE", "LOCALIZED");


create virtual table test using fts4(tokenize=icu LOCALIZED, intcol,
stringcol);
insert into test(intcol, stringcol) values (1, "a");
insert into test(intcol, stringcol) values (2, "b");
insert into test(intcol, stringcol) values (3, "c");
insert into test(intcol, stringcol) values (4, "c");

select * from test where test match '(intcol:1 OR intcol:2)';




=> No result. Expected to return two rows 1|a, 2|b.

select * from test where test match '(intcol:1 OR intcol:2 OR intcol:3 OR
intcol:4) AND (stringcol:a* OR stringcol:c*)';
2|b
3|c


=> Wrong result. 2|b must not be in the result, and 1|a and 4|c are missing.

sqlite> select * from test where test match 'intcol:1 OR intcol:2';
1|a
2|b


=> Result OK

However, omitting "tokenize=icu LOCALIZED" and instead adding "COLLATE
LOCALIZED" to each column in the CREATE statement returns the expected
results. The documentation doesn't mention COLLATE for FTS tables, though,
so I'm not sure whether this has any drawback, e.g. whether the ICU
tokenizer is used at all in that case.

Regards
Benjamin Stadin
