Thanks a lot, oubrik, I got your point. My feeling is that sum() should already work as a built-in function on iterators in Python. Anyway, I tried your approach:

    def mysum(iter):
        count = sum = 0
        for item in iter:
            count += 1
            sum += item
        return sum

    wordCountsGrouped = wordsGrouped.groupByKey().map(lambda (w, iterator): (w, mysum(iterator)))
    print wordCountsGrouped.collect()

but I get the error below. Any idea?

    TypeError: unsupported operand type(s) for +=: 'int' and 'ResultIterable'
        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Thanks,
Leonida

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Sum-elements-of-an-iterator-inside-an-RDD-tp23775p23778.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
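The TypeError says that `item` is itself a ResultIterable rather than a number. One possible explanation (an assumption, since the code that builds wordsGrouped is not shown) is that wordsGrouped already holds (word, ResultIterable) pairs, for example from an earlier groupByKey(), so calling groupByKey() again nests one iterable inside another. The sketch below reproduces that class of error without Spark: group_by_key is a hypothetical pure-Python stand-in, and plain lists play the role of ResultIterable. It also shows that Python's built-in sum() already accepts any iterable, including an iterator.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hypothetical pure-Python stand-in for RDD.groupByKey():
    turns (key, value) pairs into (key, [value, ...]) pairs, in first-seen key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


def mysum(values):
    """The loop from the post, renamed so it does not shadow the built-ins sum/iter."""
    total = 0
    for item in values:
        total += item  # raises TypeError if item is itself a collection
    return total


words = [("spark", 1), ("python", 1), ("spark", 1)]

# Grouping once: each value is a flat list of ints, so the loop works,
# and so does the built-in sum(), which accepts any iterable or iterator.
once = group_by_key(words)  # [('spark', [1, 1]), ('python', [1])]
print([(w, mysum(vs)) for w, vs in once])      # [('spark', 2), ('python', 1)]
print([(w, sum(iter(vs))) for w, vs in once])  # [('spark', 2), ('python', 1)]

# Grouping the already-grouped pairs again nests one list inside another,
# so `total += item` tries to add a list to an int.
twice = group_by_key(once)  # [('spark', [[1, 1]]), ('python', [[1]])]
try:
    mysum(twice[0][1])
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +=: 'int' and 'list'

# Summing each inner group first recovers the counts.
print([(w, sum(sum(inner) for inner in vs)) for w, vs in twice])  # [('spark', 2), ('python', 1)]
```

If wordsGrouped really does hold already-grouped pairs, dropping the second groupByKey() and calling wordsGrouped.mapValues(sum) may be enough. On plain (word, 1) pairs, reduceByKey(operator.add) is the usual alternative, since it combines values within each partition before shuffling.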