On Thu, Apr 27, 2006 at 12:48:13AM +0000, Alexey Toptygin wrote:
> On Wed, 26 Apr 2006, Rob wrote:
>
> n is on order of 2^15 or bigger. m can be any value between 1 and n, I
> need to be able to handle all cases efficciently.
>
> >Depends on the relative sizes of n and m. If m is comperable in size to
> >n, then you can just do a random shuffle ( takes O(n)), then pick off the
> >first m numbers.
>
> Random shuffle in O(n)? How?
This algorithm is provably random, assuming you use a completely
random pseudo-random number generator (PRNG)
for i=0 to n-1
array[i]=i // init array from 1 to n
for i=0 to n-1
j = PRNG(n-i+1)+i // gives a random number in [1,n] inclusive
tmp = array[j]
array[j]=array[i]
array[i]=tmp
Where PRNG(x) returns a positive integer mod x
The general intuition behind the proof is that that each of the elements
has a 1 in n chance of being the first element. IF an element is not
the first element, then it has a 1 in (n-1) chance of being the second
element. In general, if an element is not one of the first k elements,
then it has a 1/(n-k) chance of being the kth element. You can imagine
inductively proving that this is perfectly random start from a set of
size 1, and then adding an element inductively.
> >If m is small compared to n, just have an efficient
> >set implementation (with hashes), pick random elements, and make sure
> >they don't match. Here you can trade off memory vs time. Pick an array
> >of size k bits (bigger k == more mem, less time), to insert a number i,
> >take a random hash of i that maps to j where j is in [0,k-1], and set
> >the jth bit to one. Testing is just that proceedure. If k is small,
> >you will need to implement a hash bucket to catch collisions, but that
> >is easy enough as well.
>
> I'm not sure the results would be truly random; that would at the very
> least depend on the hash used. Plus, I'd rather avoid non-deterministic
> algorithms (i.e. picking a random number, realising it's been picked and
> trying again until you get one that hasn't been).
The 'truly random' part depends on the PRNG again. This algorithm would
look like:
set = set_init(k)
for i= 0 to m-1
do
j = array[PRNG[n]]
} while set_exists(set,j)
set_enter(set,j)
def set_init(k)
malloc array of size k
memset array to 0
return array
def set_exists(set,j)
i = hash(j)
return array[i]==1
def set_enter(set,j)
i = hash(j)
array[i]=1
hash just needs to map j to something in [0,k-1] One way to do this
is to pick k such that it is a large prime, and pick some other prime
p smaller than k. Then your hash can just be hash(x) = p^x(mod k).
Note, if this needs to be really fast, I can post some algs on quick
modular exponentiation.
Hope this helps, and sorry if my pseudo-code looks like python. No one
is more surprised than me...
- Rob
.