On 02/27/2012 05:59 AM, Hamish Allan wrote:
> The docs for the simple tokenizer
> (http://www.sqlite.org/fts3.html#tokenizer) say:
>
> "A term is a contiguous sequence of eligible characters, where
> eligible characters are all alphanumeric characters, the "_"
> character, and all characters with UTF codepoints greater than or
> equal to 128."
>
> If I do:
>
>   CREATE VIRTUAL TABLE test USING fts3();
>   INSERT INTO test (content) VALUES ('hello_world');
>   SELECT * FROM test WHERE content MATCH 'orld';
>   SELECT * FROM test WHERE content MATCH 'world';
>
> I get no match for the first query, because it doesn't match a term,
> but I get a match for the second, whereas according to my reading of
> the docs "world" shouldn't be a term because the underscore character
> shouldn't be considered a term break.
>
> Can anyone please help me understand this behaviour?

Documentation bug. Eligible characters are just alphanumerics and
UTF codepoints greater than or equal to 128. The underscore is not
eligible, so it acts as a term break.
Dan.
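
For anyone who wants to see the rule in action, here is a rough Python
model of it. It is only a sketch, not the actual FTS3 code: it assumes
eligible characters are ASCII letters and digits plus anything with a
code point of 128 or above, and that ASCII letters are folded to lower
case as the FTS3 docs describe. The names simple_tokenize and matches
are made up for this example, and matches only handles plain terms,
not the full MATCH query syntax.

```python
def simple_tokenize(text):
    """Approximate model of the FTS3 "simple" tokenizer (not the real code)."""
    terms = []
    current = []
    for ch in text:
        # Assumption: ASCII alphanumerics and code points >= 128 are eligible.
        eligible = ord(ch) >= 128 or (ch.isascii() and ch.isalnum())
        if eligible:
            # Fold only ASCII letters to lower case; leave other characters alone.
            current.append(ch.lower() if ord(ch) < 128 else ch)
        elif current:
            # Any other character (including "_") ends the current term.
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


def matches(content, query):
    """Hypothetical helper: true if every query term is a term of content."""
    content_terms = simple_tokenize(content)
    return all(term in content_terms for term in simple_tokenize(query))


print(simple_tokenize("hello_world"))   # ['hello', 'world']
print(matches("hello_world", "orld"))   # False
print(matches("hello_world", "world"))  # True
```

Because the underscore splits 'hello_world' into the two terms 'hello'
and 'world', a query for 'world' matches a whole term while 'orld'
does not, which is the behaviour reported above.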