>The Minimal-Perfect-Hash-INTERSECTION-OF-VECTORS approach might benefit
>queries against tables having several million rows. What I'm wondering (and
>lack the C skills to find out for myself) is whether SQLite's underlying
>algorithms for INTERSECT could be optimized with a minimal perfect hash
>approach. The algorithms would decide which vector of ids is the better
>candidate to be used for the MPH and which is the better candidate to be
>iterated, and then iterate over the one vector of ids, testing for the
>existence of each id in the MPH using the optimized Exists() function
>supplied by the mph library for the particular type of mph being used.
>
>The question, in scenarios where the vectors contain many items, is whether
>the overhead of creating the MPH and testing for existence is significantly
>less than the overhead of doing whatever INTERSECT is doing now when it
>intersects vectors of ids.  You have a ready-made acronym to advertise the
>speed if it turns out to be faster: MPH.  ;-)
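The iterate-vs-probe scheme described above can be sketched as follows. This is only an illustration: a plain Python set stands in for the MPH library's Exists() test (the function name and library are from the quoted post, not shown here), and the up-front cost of actually constructing a minimal perfect hash, which is the crux of the question, is not modeled.

```python
# Sketch of the intersection heuristic from the quoted post:
# build a probe structure over one vector of ids, iterate the other.
# A plain set() stands in for the minimal-perfect-hash Exists() test.

def intersect_ids(a, b):
    # Heuristic from the post: pick the better candidate for each role.
    # Here we build the probe structure over the larger vector and
    # iterate the smaller one, minimizing the number of lookups.
    if len(a) > len(b):
        a, b = b, a
    probe = set(b)                        # stand-in for MPH construction
    return [x for x in a if x in probe]   # stand-in for Exists()

print(intersect_ids([3, 1, 4, 5], [5, 9, 2, 6, 3]))  # -> [3, 5]
```

Note this sketch assumes the ids in each vector are unique (as rowids would be); real SQL INTERSECT also de-duplicates its result.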

Oops, for some reason the end of my post got deleted.  I was adding 
that the simplest naive approach, using only one index, can be an order 
of magnitude faster, depending on how selective the conditions are for 
a given set's content.  I believe the key is "know your data", for the 
same reason you'll craft a regexp differently depending on what you 
know about the target text.  Histograms can help with this.

Good MPH pun!


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users