> The Minimal-Perfect-Hash-INTERSECTION-OF-VECTORS approach might benefit
> queries against tables having several million rows. What I'm wondering
> (and lack the C skills to find out for myself) is whether SQLite's
> underlying algorithms for INTERSECT could be optimized with a minimal
> perfect hash approach. The algorithms would decide which vector of ids
> is the better candidate to be used for the MPH and which is the better
> candidate to be iterated, and then iterate over that one vector of ids,
> testing for the existence of each id in the MPH using the optimized
> Exists() function supplied by the mph library for the particular type of
> MPH being used.
>
> The question, in scenarios where the vectors contain many items, is
> whether the overhead of creating the MPH and testing for existence is
> significantly less than the overhead of doing whatever INTERSECT is
> doing now when it intersects vectors of ids. You have a ready-made
> acronym to advertise the speed if it turns out to be faster: MPH. ;-)
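The quoted proposal can be sketched in a few lines. This is a minimal illustration of the strategy only, not SQLite's actual INTERSECT implementation: a Python `set` stands in for the minimal perfect hash, and its `in` test stands in for the mph library's Exists() call; a real MPH would trade build time for a smaller, collision-free lookup structure.

```python
def intersect_ids(a, b):
    """Intersect two id vectors by hashing one and iterating the other.

    A Python set is used here as a stand-in for a minimal perfect hash;
    a real MPH library's Exists() test would play the same role.
    """
    # Decide which vector is the better hash candidate: hashing the
    # smaller one keeps the build phase cheap, then iterate the larger.
    if len(a) > len(b):
        a, b = b, a
    lookup = set(a)                        # build phase (MPH construction)
    return [x for x in b if x in lookup]   # probe phase (Exists() per id)

print(intersect_ids([1, 3, 5, 7], [2, 3, 4, 5, 6]))  # -> [3, 5]
```

The open question from the post maps directly onto the two phases above: whether `set(a)` (MPH construction) plus the per-id probes is cheaper than the merge work INTERSECT currently performs.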
Oops, for some reason the end of my post got deleted. I was adding that the simplest naive approach, using only one index, can be orders of magnitude faster, depending on how selective the conditions are for a given set of contents. I believe the key is "know your data," for the same reason you'll craft a regexp differently depending on what you know about the target text. Histograms can help with this.

Good MPH pun!

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
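The selectivity point can be made with a back-of-envelope sketch. The row counts below are invented purely for illustration; real cost depends on page layout, index depth, and how SQLite actually merges the two scans:

```python
def ids_visited(n_a, n_b, strategy):
    """Rough count of ids touched by each hypothetical plan.

    "intersect": walk both id vectors produced by the two index scans.
    "single":    walk only the most selective index and re-check the
                 other condition on each fetched row.
    """
    if strategy == "intersect":
        return n_a + n_b
    return min(n_a, n_b)

# Condition A matches 5,000,000 rows; condition B matches only 200.
print(ids_visited(5_000_000, 200, "intersect"))  # -> 5000200
print(ids_visited(5_000_000, 200, "single"))     # -> 200
```

With one highly selective condition, the single-index plan touches thousands of times fewer ids, which is why knowing your data (e.g. via histograms) matters more than the intersection machinery itself.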