Hi everyone,

I'd like to ask how Spark (or, more generally, distributed computing 
engines) handles RNGs. At a high level, there are two approaches:

  1.  Use a single RNG on the driver; each worker requests random numbers 
from that single RNG on the driver.
  2.  Use a separate RNG on each worker.

If the 2nd approach is used, may I ask how Spark seeds the RNGs on 
different workers to ensure the overall quality of the generated random numbers?
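To make the question concrete, here is a minimal sketch (plain Python, not actual Spark code) of what I imagine the 2nd approach looks like: each partition derives its own reproducible seed by mixing a base seed with the partition id through a hash, so that consecutive partition ids do not yield correlated streams. The function names here are mine, not Spark's.

```python
import hashlib
import random


def partition_rng(base_seed: int, partition_id: int) -> random.Random:
    """Derive an independent, reproducible RNG for one partition.

    Mixes the base seed with the partition id through a cryptographic
    hash so that nearby seeds (0, 1, 2, ...) do not produce correlated
    streams. Illustrative only -- I don't know whether Spark does
    exactly this internally.
    """
    mixed = hashlib.sha256(f"{base_seed}:{partition_id}".encode()).digest()
    return random.Random(int.from_bytes(mixed[:8], "big"))


# Simulate 4 workers, each drawing from its own independent stream.
streams = {}
for pid in range(4):
    rng = partition_rng(base_seed=42, partition_id=pid)
    streams[pid] = [rng.random() for _ in range(3)]
```

Re-running with the same base seed reproduces the same per-partition streams, which is what I would hope a distributed engine guarantees. Is this roughly how Spark does it, or does it use something more sophisticated?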


Best,

----

Ben Du

Personal Blog<http://www.legendu.net/> | GitHub<https://github.com/dclong/> | 
Bitbucket<https://bitbucket.org/dclong/> | Docker 
Hub<https://hub.docker.com/r/dclong/>
