For our stock management I created a SQLITE 3.5.3 table, containing around 1,5M entries.

The FTS3 table is created with statement:
CREATE VIRTUAL TABLE volltext using FTS3(referenzcode, code, deut, engl, ital, sppm, rep, info)

"referenzcode" and "code" are containing part numbers with 10 or 11 digits or letters, f.e. "1401326732D" "deut", "engl" and "ital" are containing descriptions in different languages of the same part, max. 80 characters. "sppm" and "rep" are internal references with up to 6 characters, describing the part usage ("mech", "elect", ...) "info" is the biggest field, containing up to some 1000 characters but also sometimes empty. "info" may contain f.e. a digitalized product manual or installation manual text.

What is really strange is that FTS3 search phrases like
SELECT referenzcode FROM volltext where volltext match ('installation manual') are performed really fast within some milliseconds, independent to the search phrase. But in general I allow the user to enter a search word in a dedicated form field (Windows computers), the search starts after entering of each additional character.

That means:
Entering the word "installation" the search starts after entering the character "i". Then, after entering the second character "n" the search field is "in" and starts again. Then the user enters "s" and the search is interrupted and starts again with search word "ins".

To be able to find not only table entries containing "i", "in", "ins", "inst", ... there is automatically added the character "*" - in fact the search phrase is "i*", "in*", "ins"*, "inst*", ...

Now it's very interesting that a search phrase containing at least 4 characters causes a search time of max. some seconds. But as some parts in our stock are having short names like "P7" or "E5", sometimes the search phrase is only 2 characters long plus additional "*".

I found out that some character combinations are causing longer search durations than others.

The search of "E5*" f.e takes 2,7 seconds and is finding 24419 entries.
But if the search f.e. is "F1*" the search takes around 1128,5 seconds to find 77652 entries. Other search phrases "vi*" are taking 3,7s for 14803 entries or "ta*" takes 42,8s for 102189 entries.
A "very good" example is "tm*" with 0,2s for 1111 entries.

It's clear to me that the search time in some cases takes longer as in other cases. Especially I would expect that the search takes as longer as the amount of found entries is bigger.

But is there anyone who could explain me, why "F1*" takes 1128,5 seconds search time? Or "F3*" takes 202,8s? What is the reason for such a long duration?

In my opinion a short search phrase with "*" should be very fast.

By the way, an other example: The same table with same entries is existing in a standard SQLITE database, without using FTS3.

The search of every phrase, independent to length of the phrase, takes max. 55 seconds only (after first call; every further search is performed in max. 10 seconds as the table seems to be in cache then). Unfortunately this standard search is finding every entry containing the search phrase, not only the rows where the word is beginning with the search phrase.


The goal - and reason for my question - is: I tried to use fulltext search as it should be faster than the standard SQLITE search method, and in fact it is faster if the search phrase is long enough. But with short search phrases the FTS3 extension seems to run in troubles.

Is there any possibility to solve this behaviour? Today I tried every possible search phrase combination with 2 characters only, noticed the search time and decided to use FTS search only in case the search will likely take less than 30 seconds. As soon as the search phrase will take longer than 30 seconds I use the standard SQLITE3 search algorithm. That's a workaround for today, but I consider someone being here who could improve the algorithm behaviour of FTS3?

By the way, as I wrote above, the "referenzcode" entry contains mostly numbers with only some other characters. FTS3 search with digits only like "13*" or "76*" takes always more than 120 seconds, so the use of FTS 3 is never possible in that case. It makes only sense to use FTS3 if the search phrase with digits is at least 4 characters long (f.e. "1403*").

Would be interesting to read your comments about that...

Many thanks!


Best regards
Ingo



-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to