My new test with a 10000-element list and 16 partitions gave me this result:

[0, 523776, 0, 1572352, 2620928, 0, 3669504, 4718080, 0, 5766656, 0, 6815232, 7863808, 0, 8912384, 7532280]

Some partitions now contain data, but the data does not seem to be distributed 'equally' across the partitions: several partitions are still empty. I also ran the test three times and got the same result each time.
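For what it's worth, the numbers above can be reproduced without Spark. The following is only a sketch under assumptions that are not confirmed anywhere in this thread: that the list is range(10000), that each reported value is the sum of one partition, and that the elements are grouped into batches of 1024 before the batches (rather than the individual elements) are split across the 16 partitions. The helper names slice_bounds and partition_sums are made up for illustration and are not Spark APIs.

```python
def slice_bounds(n_items, n_slices):
    """Split n_items into n_slices contiguous (start, end) index ranges."""
    return [(i * n_items // n_slices, (i + 1) * n_items // n_slices)
            for i in range(n_slices)]


def partition_sums(data, n_slices, batch_size=1):
    """Sum each partition after slicing batches (not single elements)."""
    # Assumption: elements are grouped into consecutive batches first.
    # batch_size=1 means every element is sliced individually.
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    sums = []
    for start, end in slice_bounds(len(batches), n_slices):
        sums.append(sum(x for batch in batches[start:end] for x in batch))
    return sums


data = list(range(10000))  # assumed test input

# 10000 elements in batches of 1024 give only 10 batches for 16 partitions,
# so 6 partitions receive nothing.
batched_sums = partition_sums(data, 16, batch_size=1024)
print(batched_sums)

# Slicing the individual elements instead gives equal partition sizes.
even_sizes = [end - start for start, end in slice_bounds(len(data), 16)]
print(even_sizes)
```

Under these assumptions the first print shows the same 16 values as my test, including the zeros, and the second shows 625 elements in every partition. If that is what is happening, it would also explain why the result is identical on every run: the split is deterministic, it just operates on 10 batches instead of 10000 elements.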

2013/9/26 Mike <sp...@good-with-numbers.com>

> Horia wrote:
> > does sc.parallelize guarantee the allocation of the items to always be
> > distributed equally across the partitions?
>
> It does in fact attempt to do that, and the tests check for that, but
> there's no guarantee in its API. Of course "equally" here means +/- one
> element.
> --

--
Shangyu, Luo
Department of Computer Science
Rice University
--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best