Re: random.sample with large weighted sample-sets?

2014-02-16 Thread Terry Reedy
On 2/15/2014 11:41 PM, Tim Chase wrote: I'm not coming up with the right keywords to find what I'm hunting. I'd like to randomly sample a modestly compact list with weighted distributions, so I might have data = ( (apple, 20), (orange, 50), (grape, 30), ) If you

Re: random.sample with large weighted sample-sets?

2014-02-16 Thread Tim Chase
On 2014-02-16 04:12, Terry Reedy wrote: On 2/15/2014 11:41 PM, Tim Chase wrote: data = ( (apple, 20), (orange, 50), (grape, 30), ) To Ben, yes, this was just some sample data; the original gets built from an external (i.e., client-supplied, thus the need to

Re: random.sample with large weighted sample-sets?

2014-02-16 Thread Ned Batchelder
On 2/16/14 9:22 AM, Tim Chase wrote: 3) you meant to write (10, 'apple') rather than 0. With my original example code, a 0-probability shouldn't ever show up in the sampling, where it looks like it might when using this sample code. In my particular use case, I can limit/ensure that

Re: random.sample with large weighted sample-sets?

2014-02-16 Thread duncan smith
On 16/02/14 05:08, Ben Finney wrote: Tim Chase python.l...@tim.thechases.com writes: I'm not coming up with the right keywords to find what I'm hunting. I'd like to randomly sample a modestly compact list with weighted distributions, so I might have data = ( (apple, 20), (orange,

Re: random.sample with large weighted sample-sets?

2014-02-16 Thread Peter Otten
Tim Chase wrote: On 2014-02-16 04:12, Terry Reedy wrote: On 2/15/2014 11:41 PM, Tim Chase wrote: data = ( (apple, 20), (orange, 50), (grape, 30), ) To Ben, yes, this was just some sample data; the original gets built from an external (i.e., client-supplied,

Re: random.sample with large weighted sample-sets?

2014-02-16 Thread Charles Allen
How efficient does this thing need to be? You can always just turn it into a two-dimensional sampling problem by thinking of the data as a function f(x=item), generating a random x=xr in [0,x], then generating a random y in [0,max(f(x))]. The xr is accepted if 0 < y <= max(f(xr)), or rejected (and
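A minimal sketch of the accept/reject idea described in this message, assuming the data is a sequence of (item, weight) pairs as in the original post. The function name `rejection_choice` and the `rng` parameter are made up for illustration and do not come from the thread. The test here accepts when y is below the chosen item's own weight, which is the usual form of rejection sampling and keeps zero-weight items from ever being returned.

```python
import random

# Sample data in the shape shown in the thread (item names quoted here).
data = (("apple", 20), ("orange", 50), ("grape", 30))


def rejection_choice(pairs, rng=random):
    """Pick one item with probability proportional to its weight."""
    max_weight = max(weight for _, weight in pairs)
    if max_weight <= 0:
        raise ValueError("at least one weight must be positive")
    while True:
        # Step 1: pick a candidate item uniformly (the random "x").
        item, weight = pairs[rng.randrange(len(pairs))]
        # Step 2: pick a random height y in [0, max_weight).
        y = rng.random() * max_weight
        # Step 3: accept if y falls under this item's weight, else retry.
        if y < weight:
            return item
```

The expected number of retries grows when a few weights are much larger than the rest, which is one reason to ask how efficient the sampler needs to be.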

Re: random.sample with large weighted sample-sets?

2014-02-16 Thread duncan smith
On 16/02/14 16:35, Charles Allen wrote: How efficient does this thing need to be? You can always just turn it into a two-dimensional sampling problem by thinking of the data as a function f(x=item), generating a random x=xr in [0,x], then generating a random y in [0,max(f(x))]. The xr is

Re: random.sample with large weighted sample-sets?

2014-02-16 Thread Terry Reedy
On 2/16/2014 9:22 AM, Tim Chase wrote: On 2014-02-16 04:12, Terry Reedy wrote: On 2/15/2014 11:41 PM, Tim Chase wrote: data = ( (apple, 20), (orange, 50), (grape, 30), ) If you actually start with data in this form, write the few lines needed to produce the form

Re: random.sample with large weighted sample-sets? [SOLVED]

2014-02-16 Thread Tim Chase
On 2014-02-16 14:47, Terry Reedy wrote: 2) the data has to be sorted for bisect to work cumulative sums are automatically sorted. Ah, that they were *cumulative* was the key that I missed in my understanding. It makes sense now and works like a charm. Thanks to all who offered a hand in
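The cumulative-sum point can be sketched as follows. This is an illustration under stated assumptions, not the code posted in the thread: the helper names `build_cumulative` and `weighted_choice` are hypothetical, the data is assumed to be (item, weight) pairs, and each draw is independent (sampling with replacement).

```python
import bisect
import itertools
import random

# Sample data in the shape shown in the thread (item names quoted here).
data = (("apple", 20), ("orange", 50), ("grape", 30))


def build_cumulative(pairs):
    """Return (items, cumulative_weights), skipping zero-weight entries."""
    items = []
    weights = []
    for item, weight in pairs:
        if weight > 0:
            items.append(item)
            weights.append(weight)
    # Running totals are non-decreasing, so they are already sorted
    # and can be searched with bisect without an extra sort.
    return items, list(itertools.accumulate(weights))


def weighted_choice(items, cumulative, rng=random):
    """Pick one item with probability proportional to its weight."""
    # Choose a point in [0, total) and find the first running total above it.
    point = rng.random() * cumulative[-1]
    return items[bisect.bisect_right(cumulative, point)]
```

For the sample data the running totals are [20, 70, 100], so a point below 20 maps to apple, a point from 20 up to 70 maps to orange, and the rest map to grape. In newer Python versions, `random.choices` accepts `weights` or `cum_weights` and covers the same case.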

random.sample with large weighted sample-sets?

2014-02-15 Thread Tim Chase
I'm not coming up with the right keywords to find what I'm hunting. I'd like to randomly sample a modestly compact list with weighted distributions, so I might have data = ( (apple, 20), (orange, 50), (grape, 30), ) and I'd like to random.sample() it as if it was a 100-element
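For reference, the behaviour being asked for can be sketched by literally expanding the pairs into a flat list and sampling from it. This assumes integer counts and quoted item names, and the helper name `expand` is made up for illustration. It is the naive baseline rather than a solution, since the expanded list grows with the sum of the counts.

```python
import random

# Sample data in the shape shown in the post (item names quoted here).
data = (("apple", 20), ("orange", 50), ("grape", 30))


def expand(pairs):
    """Repeat each item `count` times, giving one flat population list."""
    expanded = []
    for item, count in pairs:
        expanded.extend([item] * count)
    return expanded


population = expand(data)  # 100 elements for the sample data
picked = random.sample(population, 10)  # 10 picks without replacement
```

Python 3.9 and later also accept a `counts` argument to `random.sample`, which gives the same result without building the expanded list by hand.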

Re: random.sample with large weighted sample-sets?

2014-02-15 Thread Ben Finney
Tim Chase python.l...@tim.thechases.com writes: I'm not coming up with the right keywords to find what I'm hunting. I'd like to randomly sample a modestly compact list with weighted distributions, so I might have data = ( (apple, 20), (orange, 50), (grape, 30), )