We went off on a tangent, apologies. If you have contiguous integer primary keys, you could randomly sample that range of integers, then pull the records with those keys.
Or in your external language of choice, sample the integers from 1 to the record count deterministically, select ordered by the primary key, and take the ones with the sampled offsets. (Stepping through 1 query and NOT doing a bunch of ...order by pk limit 1 offset n... queries) Making something quick in Python I might do something like: import random import sqlite3 conn = sqlite3.connect(dbFile, isolation_level = None) cur = conn.cursor() cur.execute("select count(*) from foo;") numRecords = cur.fetchone()[0] sampleSize = 10 random.seed(5) #Your deterministic seed here SampleOffsets = random.sample(range(1, numRecords + 1), sampleSize) SampleOffsets.sort() cur.execute("select * from foo order by primary_key;") currentOffset = 0 for selectedOffset in SampleOffsets: for _ in range(selectedOffset - currentOffset - 1): cur.fetchone() nextSampleRecord = cur.fetchone() currentOffset = selectedOffset doSomethingWithSample(nextSampleRecord) -----Original Message----- From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> On Behalf Of Merijn Verstraaten Sent: Thursday, November 7, 2019 2:16 PM To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org> Subject: Re: [sqlite] Deterministic random sampling via SELECT > On 7 Nov 2019, at 19:16, David Raymond <david.raym...@tomtom.com> wrote: > > Along those lines SQLite includes the reverse_unordered_selects pragma > https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects > which will flip the order it sends rows in queries that don't explicitly > specify an ordering. It's there to assist you in finding spots in your code > where you might be relying on implicit ordering when you really shouldn't be. Like the rest of this threads, this is just pointing out why the things in my initial email don't work, but I already knew that. Which is why I asked for help to see if there is a way to do what I want that *does* work. I don't care particularly about the details of "can I control the order the condition is evaluated", it's just that all reasonable ways to sample large streams that I know would require a deterministic order. If someone has a different/better idea on how to return just a random sample from a query in a repeatable way, I'm all ears. So far the only suggestion was "use some non-deterministic random sampling method and store the result", but since my samples are large and I have lots of them, this would balloon my storage by >100x and I don't have the available storage to make that work. - Merijn _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users