Sampling a population

2006-06-02 Thread Brian Quinlan
This is less a Python question and more a optimization/probability question. Imaging that you have a list of objects and there frequency in a population e.g. lst = [(a, 0.01), (b, 0.05), (c, 0.50), (d, 0.30), (e, 0.04), (f, 0.10)] and you want to drawn n items from that list (duplicates allowed

Re: Sampling a population

2006-06-02 Thread Duncan Smith
Brian Quinlan wrote: > This is less a Python question and more a optimization/probability > question. Imaging that you have a list of objects and there frequency in > a population e.g. > > lst = [(a, 0.01), (b, 0.05), (c, 0.50), (d, 0.30), (e, 0.04), (f, 0.10)] > > and you want to drawn n items f

Re: Sampling a population

2006-06-02 Thread Mitja Trampus
Brian Quinlan wrote: > The fastest algorithm that I have been able to devise for doing so is: > O(n * log(len(lst))). Can anyone think or a solution with a better time > complexity? If not, is there an obviously better way to do this > (assuming n is big and the list size is small). If list is

Re: Sampling a population

2006-06-02 Thread Paddy
Brian Quinlan wrote: > This is less a Python question and more a optimization/probability > question. Imaging that you have a list of objects and there frequency in > a population e.g. > > lst = [(a, 0.01), (b, 0.05), (c, 0.50), (d, 0.30), (e, 0.04), (f, 0.10)] > > and you want to drawn n items fro

Re: Sampling a population

2006-06-02 Thread Terry Reedy
"Paddy" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Brian Quinlan wrote: >> This is less a Python question and more a optimization/probability >> question. Imaging that you have a list of objects and there frequency in >> a population e.g. >> >> lst = [(a, 0.01), (b, 0.05), (c,

Re: Sampling a population

2006-06-02 Thread Robert Kern
Brian Quinlan wrote: > The fastest algorithm that I have been able to devise for doing so is: > O(n * log(len(lst))). Can anyone think or a solution with a better time > complexity? If not, is there an obviously better way to do this > (assuming n is big and the list size is small). numpy.searc

Re: Sampling a population

2006-06-02 Thread Roger Miller
For your example, since the probabilities are all multiples of 0.01 you could make a list of 100 elements. Set one of them to a, 5 of them to b, 50 of them to c, etc. Then just randomly sample from the table (which is O(1)). Of course if the probabilities can be arbitrary floating point values the

Re: Sampling a population

2006-06-02 Thread Brian Quinlan
Robert Kern wrote: > [numpy implementation snipped] > Ed Schofield has an implementation of an algorithm by Marsaglia[1] which turns > sampling into a fast table lookup. If your probabilities have limited > precision > (2**-30 or so rather than the full double precision 2**-52 or so), then this