For our stock management I created a SQLITE 3.5.3 table, containing
around 1,5M entries.
The FTS3 table is created with statement:
CREATE VIRTUAL TABLE volltext using FTS3(referenzcode, code, deut, engl,
ital, sppm, rep, info)
"referenzcode" and "code" are containing part numbers with 10 or 11
digits or letters, f.e. "1401326732D"
"deut", "engl" and "ital" are containing descriptions in different
languages of the same part, max. 80 characters.
"sppm" and "rep" are internal references with up to 6 characters,
describing the part usage ("mech", "elect", ...)
"info" is the biggest field, containing up to some 1000 characters but
also sometimes empty. "info" may contain f.e. a digitalized product
manual or installation manual text.
What is really strange is that FTS3 search phrases like
SELECT referenzcode FROM volltext where volltext match ('installation
manual') are performed really fast within some milliseconds, independent
to the search phrase.
But in general I allow the user to enter a search word in a dedicated
form field (Windows computers), the search starts after entering of each
additional character.
That means:
Entering the word "installation" the search starts after entering the
character "i". Then, after entering the second character "n" the search
field is "in" and starts again. Then the user enters "s" and the search
is interrupted and starts again with search word "ins".
To be able to find not only table entries containing "i", "in", "ins",
"inst", ... there is automatically added the character "*" - in fact the
search phrase is "i*", "in*", "ins"*, "inst*", ...
Now it's very interesting that a search phrase containing at least 4
characters causes a search time of max. some seconds.
But as some parts in our stock are having short names like "P7" or "E5",
sometimes the search phrase is only 2 characters long plus additional "*".
I found out that some character combinations are causing longer search
durations than others.
The search of "E5*" f.e takes 2,7 seconds and is finding 24419 entries.
But if the search f.e. is "F1*" the search takes around 1128,5 seconds
to find 77652 entries.
Other search phrases "vi*" are taking 3,7s for 14803 entries or "ta*"
takes 42,8s for 102189 entries.
A "very good" example is "tm*" with 0,2s for 1111 entries.
It's clear to me that the search time in some cases takes longer as in
other cases. Especially I would expect that the search takes as longer
as the amount of found entries is bigger.
But is there anyone who could explain me, why "F1*" takes 1128,5 seconds
search time? Or "F3*" takes 202,8s? What is the reason for such a long
duration?
In my opinion a short search phrase with "*" should be very fast.
By the way, an other example: The same table with same entries is
existing in a standard SQLITE database, without using FTS3.
The search of every phrase, independent to length of the phrase, takes
max. 55 seconds only (after first call; every further search is
performed in max. 10 seconds as the table seems to be in cache then).
Unfortunately this standard search is finding every entry containing the
search phrase, not only the rows where the word is beginning with the
search phrase.
The goal - and reason for my question - is: I tried to use fulltext
search as it should be faster than the standard SQLITE search method,
and in fact it is faster if the search phrase is long enough. But with
short search phrases the FTS3 extension seems to run in troubles.
Is there any possibility to solve this behaviour? Today I tried every
possible search phrase combination with 2 characters only, noticed the
search time and decided to use FTS search only in case the search will
likely take less than 30 seconds. As soon as the search phrase will take
longer than 30 seconds I use the standard SQLITE3 search algorithm.
That's a workaround for today, but I consider someone being here who
could improve the algorithm behaviour of FTS3?
By the way, as I wrote above, the "referenzcode" entry contains mostly
numbers with only some other characters. FTS3 search with digits only
like "13*" or "76*" takes always more than 120 seconds, so the use of
FTS 3 is never possible in that case. It makes only sense to use FTS3 if
the search phrase with digits is at least 4 characters long (f.e. "1403*").
Would be interesting to read your comments about that...
Many thanks!
Best regards
Ingo
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------