Re: [UM-LINUX] need to find a CS professor

Rob Wed, 26 Apr 2006 18:36:39 -0700

On Thu, Apr 27, 2006 at 12:48:13AM +0000, Alexey Toptygin wrote:
> On Wed, 26 Apr 2006, Rob wrote:
> 
> n is on order of 2^15 or bigger. m can be any value between 1 and n, I 
> need to be able to handle all cases efficciently.
> 
> >Depends on the relative sizes of n and m.  If m is comperable in size to
> >n, then you can just do a random shuffle ( takes O(n)), then pick off the
> >first m numbers.
> 
> Random shuffle in O(n)? How?


This algorithm is provably random, assuming you use a completely
random pseudo-random number generator (PRNG)

for i=0 to n-1
        array[i]=i              // init array from 1 to n
for i=0 to n-1
        j = PRNG(n-i+1)+i       // gives a random number in [1,n] inclusive
        tmp = array[j]
        array[j]=array[i]
        array[i]=tmp

Where PRNG(x) returns a positive integer mod x

The general intuition behind the proof is that that each of the elements
has a 1 in n chance of being the first element.  IF an element is not
the first element, then it has a 1 in (n-1) chance of being the second
element.  In general, if an element is not one of the first k elements,
then it has a 1/(n-k) chance of being the kth element.  You can imagine
inductively proving that this is perfectly random start from a set of
size 1, and then adding an element inductively.


> >If m is small compared to n, just have an efficient
> >set implementation (with hashes), pick random elements, and make sure
> >they don't match.  Here you can trade off memory vs time.  Pick an array
> >of size k bits (bigger k == more mem, less time), to insert a number i,
> >take a random hash of i that maps to j where j is in [0,k-1], and set
> >the jth bit to one.  Testing is just that proceedure.  If k is small,
> >you will need to implement a hash bucket to catch collisions, but that
> >is easy enough as well.
> 
> I'm not sure the results would be truly random; that would at the very 
> least depend on the hash used. Plus, I'd rather avoid non-deterministic 
> algorithms (i.e. picking a random number, realising it's been picked and 
> trying again until you get one that hasn't been).

The 'truly random' part depends on the PRNG again.  This algorithm would
look like:

set = set_init(k)
for i= 0 to m-1
        do
                j = array[PRNG[n]]
        } while set_exists(set,j)
        set_enter(set,j)

def set_init(k)
        malloc array of size k
        memset array to 0
        return array
def set_exists(set,j)
        i = hash(j)
        return array[i]==1
def set_enter(set,j)
        i = hash(j)
        array[i]=1

hash just needs to map j to something in [0,k-1] One way to do this
is to pick k such that it is a large prime, and pick some other prime
p smaller than k.  Then your hash can just be hash(x) = p^x(mod k).
Note, if this needs to be really fast, I can post some algs on quick
modular exponentiation.

Hope this helps, and sorry if my pseudo-code looks like python.  No one
is more surprised than me...


- Rob
.

Re: [UM-LINUX] need to find a CS professor

Reply via email to