Hi,

The lambda you pass to reduce receives as its first argument the return 
value of the previous invocation. The first time, it is invoked with:
x = ("a", 1), y = ("b", 2)
and returns 1 + 2 = 3.
The second time, it is invoked with:
x = 3, y = ("c", 3)
Now x is a plain int, so x[1] fails, which is exactly the error you are 
seeing.
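You can reproduce the same call sequence with plain Python's 
functools.reduce (a minimal sketch of the behaviour, not Spark's actual 
internals):

from functools import reduce
pairs = [("a", 1), ("b", 2), ("c", 3)]
# 1st call: x = ("a", 1), y = ("b", 2)  -> returns 1 + 2 = 3
# 2nd call: x = 3, y = ("c", 3)         -> 3[1] raises TypeError
reduce(lambda x, y: x[1] + y[1], pairs)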

There are several ways you could fix it. One way is to use a map before 
the reduce, e.g.
rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)
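The map projects out the integer values first, so the reduce lambda only 
ever sees ints. Since this is a pair RDD, an equivalent and arguably 
clearer alternative is:

rdd.values().sum()   # values() drops the keys, sum() adds the ints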

Hope that's helpful,

Chris

-----Original Message-----
From: capitnfrak...@free.fr <capitnfrak...@free.fr> 
Sent: 19 January 2022 02:41
To: user@spark.apache.org
Subject: newbie question for reduce

Hello

Could you please help me see why this simple reduce doesn't work?

>>> rdd = sc.parallelize([("a",1),("b",2),("c",3)])
>>> 
>>> rdd.reduce(lambda x,y: x[1]+y[1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce
    return reduce(f, vals)
  File "/opt/spark/python/pyspark/util.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "<stdin>", line 1, in <lambda>
TypeError: 'int' object is not subscriptable
>>> 


Spark 3.2.0

Thank you.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

