Hi, as I side-project I put together an implementation of Hyperminhash as an extension for SQLite3. It provides fast, constant-memory cardinality approximation, including the union- and intersection-set.
For example, querying an in-memory table of two million (INT, INT)-rows, having no indexes, the query `SELECT COUNT(*) FROM (SELECT DISTINCT foo, bar FROM foobar)` completes in close to five seconds. The equivalent query `SELECT hyperminhash(foo, bar) FROM foobar` completes in less than 400ms and deviates from the true result by less than 0.5%. The Hyperminhash-functions shine when indexes can't be used and there are lots of unique elements, requiring sqlite to build large temporary tables to hold unique elements in traditional `DISTINCT`-queries. There is also significant memory savings, as the data structure used by Hyperminhash is just 32kb and never grows while counting unique elements. People may find this useful, the code is found at https://github.com/lukaslueg/sqlite3_hyperminhash -- Sent from: http://sqlite.1065341.n5.nabble.com/ _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users