> On Nov 7, 2019, at 2:15 PM, Merijn Verstraaten <mer...@inconsistent.nl> wrote: > > >> On 7 Nov 2019, at 19:16, David Raymond <david.raym...@tomtom.com> wrote: >> >> Along those lines SQLite includes the reverse_unordered_selects pragma >> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects >> which will flip the order it sends rows in queries that don't explicitly >> specify an ordering. It's there to assist you in finding spots in your code >> where you might be relying on implicit ordering when you really shouldn't be. > > Like the rest of this threads, this is just pointing out why the things in my > initial email don't work, but I already knew that. Which is why I asked for > help to see if there is a way to do what I want that *does* work. I don't > care particularly about the details of "can I control the order the condition > is evaluated", it's just that all reasonable ways to sample large streams > that I know would require a deterministic order. > > If someone has a different/better idea on how to return just a random sample > from a query in a repeatable way, I'm all ears. > > So far the only suggestion was "use some non-deterministic random sampling > method and store the result", but since my samples are large and I have lots > of them, this would balloon my storage by >100x and I don't have the > available storage to make that work. > > - Merijn >
One thought would be to generate a ‘hash’ from part of the record, maybe the record ID, and select records based on that value. The simplest would be something like id%100 == 0 would get you 1% of the records. That admittedly isn’t that random. Put the ID through a linear congruential generator, something like mod(a * Id + b, c) % 100 == 0 And you will pretty well scramble the selection _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users