FTFY, you're welcome ;-) Now *MAYBE* someone can read it and possibly reply.
-- -- -- --???-- K e V i N On Thu, Jul 2, 2015 at 9:11 AM, <shuricksoft at ukr.net> wrote: > We use SQLite for indexing and searching the text contents of our app > using FTS4 > (about 27k unique words, about 1 million words for the whole contents). In > particular, > we use the offsets function. Currently, after some testing, we?re > experiencing a plenty > of problems with finding the results needed. > > For the forms of the words searching we use the ?all-to-all? way, which > means we have > some standard English endings and words forms and modify the initial > request so that > all possible combinations are included and separated by OR from each > other. > > I. e. if we have two forms and two words in the request (for simplicity), > that would look > like (the MATCH part): > > ?(word1_form1 NEAR/10 word2_form1) > OR (word1_form1 NEAR/10 word2_form2) > OR (word1_form2 NEAR/10 word2_form1) > OR (word1_form2 NEAR/10 word2_form2)?. > > Initially, the problem appeared that the query returned offsets for > absolutely wrong words. > While searching for ?honest fair?, we got words like ?good?, ?sport?, > ?natural? offsets. > We moved from the system provided (iOS, OS X) SQLite (version 3.8.5, > further called > ?old SQLite?) to the latest one available on the official site (version > 3.8.10.2, > further called ?new SQLite? or just ?SQLite?), which solved the issue, but > returned offsets > for words which were not near (according to NEAR/10). I supposed it?s > because the request > was too big because of lots of the words forms. But, for example, this > request > > for ?offense is rank?: > select documents.name, contents.rowid, part, offsets(contents) > from contents left join documents on contents.document_id = documents.id > where text match '(\"offense\" NEAR/10 \"is\" NEAR/10 \"rank\")' > order by document_id > > returned the correct results, while this one returned false positive > results (0 stands for > apostrophe so it?s not interpreted as a separate token for the simple > tokenizer we use): > > select documents.name, contents.rowid, part, offsets(contents) > from contents left join documents on contents.document_id = documents.id > where text match '(\"offense\" NEAR/10 \"is\" NEAR/10 \"rank\") > OR (\"offense\" NEAR/10 \"is\" NEAR/10 \"rank0s\")' > order by document_id > > The experiments revealed that ?rank0s? can be whatever: ranks", > "rankqwerty" and so on, > even "rankqwertyvrfbgbrevwkefovmwpsrvrm" or "yugbuoiipkipnuo?. If it's > removed, nothing > found. If after that the first word is modified, the one correct result > found. Also, a > search for ?speak again? didn?t find the results at all, though there > should be quite a > lot of them. > > The ?database disk image is malformed? error was reported, but integrity > check completed > with no errors, the database was recreated and that didn?t help, meanwhile > the old SQLite > did find the results for this request in exactly that file with no error > reported. > > Also, the new SQLite worked well for the same request if the offsets > function was replaced > with the snippet function (that also solved the false positive results > problem described > above). The search for ?father? returned not all results. For example, the > first result > which can manually be found in the contents table using SQLite Database > Browser for this > request: > > select documents.name, contents.rowid, part, offsets(contents) > from contents left join documents on contents.document_id = documents.id > where text match '(\"father\") > OR (\"fathere\") OR (\"fathering\") > OR (\"fatherish\") OR (\"fathers\") > OR (\"fatheres\") OR (\"fatherian\") > OR (\"fatheral\") OR (\"father0s\")' > and document_id in (25) > order by document_id > > missed when actually searched, the snippet function doesn?t help here, > neither helps > searching for a single word form. Error logging with > > sqlite3_config(SQLITE_CONFIG_LOG, errorLogCallback, NULL) > > was never called. > Any ideas? >