Re: [sqlite] Deterministic random sampling via SELECT

Richard Damon Thu, 07 Nov 2019 13:24:08 -0800

> On Nov 7, 2019, at 2:15 PM, Merijn Verstraaten <mer...@inconsistent.nl> wrote:
> 
> 
>> On 7 Nov 2019, at 19:16, David Raymond <david.raym...@tomtom.com> wrote:
>> 
>> Along those lines SQLite includes the reverse_unordered_selects pragma
>> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
>> which will flip the order it sends rows in queries that don't explicitly 
>> specify an ordering. It's there to assist you in finding spots in your code 
>> where you might be relying on implicit ordering when you really shouldn't be.
> 
> Like the rest of this threads, this is just pointing out why the things in my 
> initial email don't work, but I already knew that. Which is why I asked for 
> help to see if there is a way to do what I want that *does* work. I don't 
> care particularly about the details of "can I control the order the condition 
> is evaluated", it's just that all reasonable ways to sample large streams 
> that I know would require a deterministic order.
> 
> If someone has a different/better idea on how to return just a random sample 
> from a query in a repeatable way, I'm all ears.
> 
> So far the only suggestion was "use some non-deterministic random sampling 
> method and store the result", but since my samples are large and I have lots 
> of them, this would balloon my storage by >100x and I don't have the 
> available storage to make that work.
> 
> - Merijn
>


One thought would be to generate a ‘hash’ from part of the record, maybe the 
record ID, and select records based on that value. The simplest would be 
something like id%100 == 0 would get you 1% of the records. That admittedly 
isn’t that random.

Put the ID through a linear congruential generator, something like

mod(a * Id + b, c) % 100 == 0

And you will pretty well scramble the selection
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Deterministic random sampling via SELECT

Reply via email to