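RDD.sample() doesn't guarantee the exact sample size. Without replacement, the fraction is the probability with which each element is kept, so the count only lands near the target. If you fix the random seed instead of passing System.currentTimeMillis, it will return the same result every time.

-Xiangrui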

On Wed, May 21, 2014 at 2:05 PM, glxc <r.ryan.mcc...@gmail.com> wrote:
> I have a graph and am trying to take a random sample of vertices without
> replacement, using the RDD.sample() method
>
> verts are the vertices in the graph
>
>> val verts = graph.vertices
>
> and executing this multiple times in a row
>
>> verts.sample(false, 10000.toDouble/v1.count.toDouble,
>> System.currentTimeMillis).count
>
> yields different results roughly each time (albeit +/- a small % of the
> target)
>
> why does this happen? Looked at PartitionwiseSampledRDD but can't figure it
> out
>
> Also, is there another method/technique to yield the same result each time?
> My understanding is that grabbing random indices may not be the best use of
> the RDD model
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Inconsistent-RDD-Sample-size-tp6197.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
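
For anyone finding this thread later, here is a rough sketch of the two options discussed: a fixed seed for repeatable results, and takeSample when an exact count is needed. It is illustrative only and assumes Spark 1.0 or later (where the seed is a Long). The object name SampleSizeSketch, the helper names approxSample and exactSample, the local[2] master and the parallelized range standing in for graph.vertices are all made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SampleSizeSketch {
  // sample() without replacement keeps each element independently with
  // probability `fraction`, so the count is only close to `target`.
  def approxSample[T](rdd: RDD[T], target: Int, seed: Long): RDD[T] = {
    val fraction = target.toDouble / rdd.count().toDouble
    rdd.sample(false, fraction, seed)
  }

  // takeSample() returns exactly `target` elements (when the RDD holds at
  // least that many), but as a local Array on the driver, not as an RDD.
  def exactSample[T](rdd: RDD[T], target: Int, seed: Long): Array[T] =
    rdd.takeSample(false, target, seed)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sample-size-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for graph.vertices: 100,000 vertex ids.
      val verts = sc.parallelize(0L until 100000L, 4).cache()

      // Any fixed value works; System.currentTimeMillis differs on every call.
      val seed = 42L

      val first = approxSample(verts, 10000, seed).count()
      val second = approxSample(verts, 10000, seed).count()
      println(s"sample() with a fixed seed: $first and $second")

      val exact = exactSample(verts, 10000, seed)
      println(s"takeSample() size: ${exact.length}")
    } finally {
      sc.stop()
    }
  }
}
```

With a fixed seed the two sample() counts should match as long as the underlying data and its partitioning do not change between calls, though they will still not be exactly 10000 in general. takeSample() gives the exact size at the cost of pulling the sampled elements to the driver, so it only suits samples that fit in driver memory.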