Hi all,

I found a previous FTS5 thread and, encouraged by the comments of Dan
Kennedy, thought I would comment on the issue.

- Smaller memory footprint and more speed is always great. I'm already very
impressed with the speed, but even faster is even better of course. My
experience is that searches that produce few hits are very fast (well under a
second on a db with 10M+ records). Searches that produce many hits (tens of
thousands) are much slower: several seconds, or even minutes if there are
100,000+ hits. I can live with that, but improvement on queries with many hits
would be welcome.
- I would probably also make tokenize=unicode61 "remove_diacritics=0" the
default tokenization behaviour instead of simple, but that's a minor issue.
- In my usage, the most inconvenient limitation is that the first search term
can't be negative in queries (i.e. MATCH foo NOT bar is good but MATCH NOT bar
foo throws an error). I would also like to have negative-only queries (MATCH
NOT foo, returning all records that don't contain foo). Negative-only queries
would mostly be used in combination (INTERSECT) with a positive query on
another column. I know this is probably not a common need, but one can dream.
- Fuzzy matching would be useful as well, but obviously that's a major
feature and introducing it might well compromise performance.
- Same for in-word matching (i.e. MATCH reasonable also matching "unreasonable")
- Same for advanced matching like matching 3 out of 4 search terms if there is
no match with 4 out of 4, or ranking hits based on how close to each other
terms occur.
- For some reason, searches like SELECT * FROM ftstable WHERE col1 MATCH ?
INTERSECT SELECT * FROM ftstable WHERE col2 MATCH ? run very slowly for me.
Much slower than running the two queries separately. This may not be related to
FTS per se, and maybe the query could be written better.
- BTW, will there be full backwards compatibility? And I assume one will need
to recreate (export/reimport) existing databases with FTS5 in order to enjoy
the new features, right?

AF
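
P.S. To make the two query points above concrete, here is a minimal sketch
using Python's sqlite3 module. It assumes the underlying SQLite build has FTS4
and the unicode61 tokenizer enabled; the rows are made up for illustration. It
shows the two-column search written both with INTERSECT and as a single MATCH
with column filters, and a negative-only search emulated with docid NOT IN. I
have not measured whether either rewrite is actually faster on a large
database.

```python
import sqlite3

# Assumption: this SQLite build includes FTS4 and the unicode61 tokenizer.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE VIRTUAL TABLE ftstable USING fts4('
    'col1, col2, tokenize=unicode61 "remove_diacritics=0")'
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO ftstable(docid, col1, col2) VALUES (?, ?, ?)",
    [
        (1, "foo bar", "alpha"),
        (2, "foo baz", "beta"),
        (3, "qux", "alpha"),
        (4, "bar only", "beta"),
    ],
)


def ids(sql, params=()):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]


# The INTERSECT form from the post: one MATCH per column.
intersected = ids(
    "SELECT docid FROM ftstable WHERE col1 MATCH ? "
    "INTERSECT "
    "SELECT docid FROM ftstable WHERE col2 MATCH ? "
    "ORDER BY docid",
    ("foo", "alpha"),
)

# The same search as a single MATCH on the table, using column filters.
combined = ids(
    "SELECT docid FROM ftstable WHERE ftstable MATCH ? ORDER BY docid",
    ("col1:foo col2:alpha",),
)

# Emulated negative-only query: all rows whose col1 does not contain foo.
negative_only = ids(
    "SELECT docid FROM ftstable WHERE docid NOT IN "
    "(SELECT docid FROM ftstable WHERE col1 MATCH ?) "
    "ORDER BY docid",
    ("foo",),
)

# Negative term on col1 combined with a positive term on col2.
negative_with_positive = ids(
    "SELECT docid FROM ftstable WHERE col2 MATCH ? AND docid NOT IN "
    "(SELECT docid FROM ftstable WHERE col1 MATCH ?) "
    "ORDER BY docid",
    ("alpha", "foo"),
)

print(intersected, combined, negative_only, negative_with_positive)
```

The NOT IN form has to scan the whole table for the outer query, so it is a
workaround rather than a substitute for real negative-only MATCH support.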
Context:
"Fts5 will use less memory and be faster than fts4 (I think - initial
testing has been positive). It will also be smaller, as we can do
without a bunch of code that is used to workaround problems inherent in
the file-format.
...
The most user-visible change is the addition of an API that allows users
to write their own auxiliary (i.e. snippet(), rank(), offsets()) functions:
http://www.sqlite.org/src/artifact/064f9bf705e59d
The included snippet() and rank() functions use this API.
...
Fts5 is still in the experimental stage at the moment.
If anybody has any ideas for useful features, or knows of problems with
FTS4 that could be fixed in FTS5, don't keep them to yourself! - Dan
Kennedy"