Hi,

I am trying to test the mapPartitions function in the Python version of Spark, but I am getting the wrong result. More specifically, in the pyspark shell:

>>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>>> def f(iterator): yield sum(iterator)
...
>>> rdd.mapPartitions(f).collect()

The result is [0, 10], not the expected [3, 7]. Is there anything wrong with my code?

Thanks!
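For reference, here is a plain-Python sketch (no Spark needed) of what I expect mapPartitions to do, assuming the two partitions come out as [1, 2] and [3, 4]:

```python
# Plain-Python model of mapPartitions: apply f to each partition's
# iterator and concatenate whatever f yields.
# Assumption: parallelize([1, 2, 3, 4], 2) splits into [1, 2] and [3, 4].
def f(iterator):
    yield sum(iterator)

partitions = [[1, 2], [3, 4]]
result = [x for part in partitions for x in f(iter(part))]
print(result)  # expected: [3, 7]
```

This is why I expected [3, 7] rather than [0, 10].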
--
Shangyu Luo
Department of Computer Science
Rice University