Silly question: does sc.parallelize guarantee that the items are always
distributed evenly across the partitions?

It seems to me that, in your example, all four items ended up in the same
partition: sum() over the empty partition produces the 0, and the full one
produces the 10. Have you tried the same with many more items?
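
For what it's worth, glom() gathers each partition into a list, so you can see
exactly how parallelize spread the items. A rough sketch (assuming a pyspark
shell where sc is already defined and glom() is available in your version):

>>> rdd = sc.parallelize(range(1, 101), 4)
>>> [len(p) for p in rdd.glom().collect()]  # number of items per partition
>>> def f(iterator): yield sum(iterator)
...
>>> rdd.mapPartitions(f).collect()  # one partial sum per partition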
On Sep 26, 2013 9:01 PM, "Shangyu Luo" <lsy...@gmail.com> wrote:

> Hi,
> I am trying to test mapPartitions function in Spark Python version, but I
> got wrong result.
> More specifically, in pyspark shell:
> >>> rdd = sc.parallelize([1, 2, 3, 4], 2)
> >>> def f(iterator): yield sum(iterator)
> ...
> >>> rdd.mapPartitions(f).collect()
> The result is [0, 10], not [3, 7]
> Is there anything wrong with my code?
> Thanks!
>
>
> --
>
> Shangyu, Luo
> Department of Computer Science
> Rice University
>
>
