We use SQLite for indexing and searching the text contents of our app using 
FTS4 (about 27k unique words, about 1 million words for the whole contents). In 
particular, we use the offsets function. Currently, after some testing, we?re 
experiencing a plenty of problems with finding the results needed. For the 
forms of the words searching we use the ?all-to-all? way, which means we have 
some standard English endings and words forms and modify the initial request so 
that all possible combinations are included and separated by OR from each 
other. I. e. if we have two forms and two words in the request (for 
simplicity), that would look like (the MATCH part): ?(word1_form1 NEAR/10 
word2_form1) OR (word1_form1 NEAR/10 word2_form2) OR (word1_form2 NEAR/10 
word2_form1) OR (word1_form2 NEAR/10 word2_form2)?. Initially, the problem 
appeared that the query returned offsets for absolutely wrong words. While 
searching for ?honest fair?, we got words like ?good?, ?sport?, ?natural? 
offsets. We moved from the system provided (iOS, OS X) SQLite (version 3.8.5, 
further called ?old SQLite?) to the latest one available on the official site 
(version 3.8.10.2, further called ?new SQLite? or just ?SQLite?), which solved 
the issue, but returned offsets for words which were not near (according to 
NEAR/10). I supposed it?s because the request was too big because of lots of 
the words forms. But, for example, this request for ?offense is rank?: select 
documents.name, contents.rowid, part, offsets(contents) from contents left join 
documents on contents.document_id = documents.id where text match '(\"offense\" 
NEAR/10 \"is\" NEAR/10 \"rank\")' order by document_id returned the correct 
results, while this one returned false positive results (0 stands for 
apostrophe so it?s not interpreted as a separate token for the simple tokenizer 
we use): select documents.name, contents.rowid, part, offsets(contents) from 
contents left join documents on contents.document_id = documents.id where text 
match '(\"offense\" NEAR/10 \"is\" NEAR/10 \"rank\") OR (\"offense\" NEAR/10 
\"is\" NEAR/10 \"rank0s\")' order by document_id The experiments revealed that 
?rank0s? can be whatever: ranks", "rankqwerty" and so on, even 
"rankqwertyvrfbgbrevwkefovmwpsrvrm" or "yugbuoiipkipnuo?. If it's removed, 
nothing found. If after that the first word is modified, the one correct result 
found. Also, a search for ?speak again? didn?t find the results at all, though 
there should be quite a lot of them. The ?database disk image is malformed? 
error was reported, but integrity check completed with no errors, the database 
was recreated and that didn?t help, meanwhile the old SQLite did find the 
results for this request in exactly that file with no error reported. Also, the 
new SQLite worked well for the same request if the offsets function was 
replaced with the snippet function (that also solved the false positive results 
problem described above). The search for ?father? returned not all results. For 
example, the first result which can manually be found in the contents table 
using SQLite Database Browser for this request: select documents.name, 
contents.rowid, part, offsets(contents) from contents left join documents on 
contents.document_id = documents.id where text match '(\"father\") OR 
(\"fathere\") OR (\"fathering\") OR (\"fatherish\") OR (\"fathers\") OR 
(\"fatheres\") OR (\"fatherian\") OR (\"fatheral\") OR (\"father0s\")' and 
document_id in (25) order by document_id missed when actually searched, the 
snippet function doesn?t help here, neither helps searching for a single word 
form. Error logging with sqlite3_config(SQLITE_CONFIG_LOG, errorLogCallback, 
NULL) was never called. 
Any ideas? 

Reply via email to