Thanks a lot oubrik, 

I got your point. My thinking is that sum() should already be a
built-in function for iterators in Python.
Anyway, I tried your approach:

def mysum(iter):
    count = sum = 0
    for item in iter:
        count += 1
        sum += item
    return sum

wordCountsGrouped = wordsGrouped.groupByKey().map(lambda (w, iterator): (w, mysum(iterator)))

print wordCountsGrouped.collect()

but I get the error below. Any idea?

TypeError: unsupported operand type(s) for +=: 'int' and 'ResultIterable'

        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
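
One possible reading of the TypeError, offered as a guess rather than a diagnosis: the items that mysum iterates over may themselves be ResultIterable objects instead of numbers. That could happen if wordsGrouped already holds grouped values, so that groupByKey() nests them a second time. The definition of wordsGrouped is not shown in this message, so this is an assumption. The Spark-free sketch below uses a hypothetical FakeResultIterable class as a stand-in (it is not the real pyspark class) and a hypothetical flat_sum helper to show how the same kind of error arises and how summing one level deeper avoids it:

```python
# Hypothetical stand-in for pyspark's ResultIterable. Assumption: the real
# class is an iterable wrapper that does not support arithmetic.
class FakeResultIterable(object):
    def __init__(self, data):
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)


def mysum(items):
    # Same logic as the mysum above, without shadowing the built-in sum.
    total = 0
    for item in items:
        total += item
    return total


def flat_sum(groups):
    # Hypothetical helper: sum the numbers inside every inner group.
    return sum(mysum(group) for group in groups)


# Values grouped once: iterating yields plain numbers.
flat = FakeResultIterable([1, 1, 1])

# Values grouped twice: iterating yields inner iterables, not numbers.
nested = FakeResultIterable([FakeResultIterable([1, 1]),
                             FakeResultIterable([1])])

print(mysum(flat))  # 3

error_message = None
try:
    mysum(nested)
except TypeError as exc:
    # Adding an iterable wrapper to an int is what raises the TypeError.
    error_message = str(exc)
    print(error_message)

print(flat_sum(nested))  # 3
```

If this reading is right, printing wordsGrouped.take(1) before the groupByKey() call should show whether the values are already iterables.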

thx
Leonida



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Sum-elements-of-an-iterator-inside-an-RDD-tp23775p23778.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.