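RDD.sample() doesn't guarantee the exact sample size: without
replacement, the fraction is the probability of keeping each element,
so the count varies around the target. If you fix the random seed
instead of passing System.currentTimeMillis, it will return the same
result every time. -Xiangrui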
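
To make the point above concrete, here is a minimal sketch, not taken from the
original thread. It assumes a local SparkContext and uses a plain RDD of Longs
as a stand-in for graph.vertices; the object name, the seed value 42, and the
RDD size are all made up for illustration. It reuses one seed for sample(), and
also shows takeSample(), which returns exactly the requested number of elements
but collects them to the driver as an Array, so it only suits samples that fit
in driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SampleSizeSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for this sketch.
    val conf = new SparkConf().setAppName("sample-size-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for graph.vertices.
      val verts = sc.parallelize(1L to 1000000L).cache()

      val target = 10000
      val fraction = target.toDouble / verts.count().toDouble
      val seed = 42L // any fixed value; the point is that it does not change

      // Same RDD, same fraction, same seed: the two counts match each other,
      // although they need not equal `target` exactly.
      val first = verts.sample(false, fraction, seed).count()
      val second = verts.sample(false, fraction, seed).count()
      println(s"sample() with fixed seed: $first and $second")

      // takeSample returns exactly `target` elements when the RDD has at
      // least that many, at the cost of materializing them on the driver.
      val exact = verts.takeSample(false, target, seed)
      println(s"takeSample() size: ${exact.length}")
    } finally {
      sc.stop()
    }
  }
}
```

If the sample has to stay distributed as an RDD, sample() with a fixed seed is
the closer fit; takeSample() is the simpler choice when an exact count matters
and the sample is small.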

On Wed, May 21, 2014 at 2:05 PM, glxc <r.ryan.mcc...@gmail.com> wrote:
> I have a graph and am trying to take a random sample of vertices without
> replacement, using the RDD.sample() method
>
> verts are the vertices in the graph
>
>>  val verts = graph.vertices
>
> and executing this multiple times in a row
>
>>  verts.sample(false, 10000.toDouble/verts.count.toDouble,
>> System.currentTimeMillis).count
>
> yields different results roughly each time (albeit +/- a small % of the
> target)
>
> Why does this happen? I looked at PartitionwiseSampledRDD but can't figure it
> out
>
> Also, is there another method/technique to yield the same result each time?
> My understanding is that grabbing random indices may not be the best use of
> the RDD model
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Inconsistent-RDD-Sample-size-tp6197.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
