You don't say what language you are working in. IN C++ I would just declare a "set" and put random row numbers in it until I had enough. Then use that set to build the SQL.
SQLite's random() doesn't have a seed function so you don't really get very random numbers from run-to-run and have no good way of controlling it that I can find in the docs. You want to use your language's random function if you want anything close to real randomness. Hopefully your language has a similar data structure you can use. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems ________________________________ From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Benoit Mortgat [mort...@gmail.com] Sent: Thursday, March 08, 2012 4:10 AM To: General Discussion of SQLite Database Subject: EXT :[sqlite] Efficient random sampling in a large table using builtin functions. I have a table with millions of records. When I run a query with ORDER BY random() LIMIT N; the RANDOM() function is evaluated against all rows of my result set, then sorting occurs, and as a result the query is slow. In this case the query could be rewritten as: * Generate N, random, row numbers between 1 and (SELECT COUNT(*) FROM the_table_name). Maybe using remainder operator % and builtin ABS() and RANDOM() functions can help (see below) * SELECT FROM the_table WHERE rowid IN (those random numbers) For the moment the most simple query I can think of is: SELECT * FROM my_table WHERE rowid IN ( SELECT 1 + (ABS(r.x) % c.num_rows) FROM (SELECT COUNT(*) AS num_rows FROM my_table ) AS c CROSS JOIN (SELECT random() x FROM my_table LIMIT N ) AS r ); This can however return less than N rows if by chance two random numbers have the same remainder modulo COUNT(*) FROM my_table. Note that the generation of N random numbers is quick because there is no ORDER BY involved that would require computation of as many random numbers as there are rows in the original table. This could maybe be optimized inside SQLite if those conditions are met: * random() hasn't been overridden with sqlite_create_function() nor sqlite_create_function_v2() * the user SELECTs FROM a table without joins and w/o WHERE conditions. -- Benoit Mortgat _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users