The problem is that you are reducing an RDD of tuples but producing an
int. That int then can't be combined with the remaining tuples by your
function: reduce() must return the same type as its arguments, because
the result of one call is fed back in as an argument to the next.
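
You can see it by tracing the two calls your lambda receives (partition
details aside; a minimal sketch with illustrative values):

f = lambda x, y: x[1] + y[1]
f(("a", 1), ("b", 2))   # -> 1 + 2 == 3, an int
f(3, ("c", 3))          # -> 3[1] raises TypeError: 'int' object is not subscriptable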
rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)
... would work, since map() turns the RDD into plain ints first.
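
If you really do want to reduce to a different type than the elements,
aggregate() is designed for that: it takes a zero value, a function to
fold one element into the accumulator, and a function to merge two
accumulators. A minimal sketch with the same rdd:

>>> rdd.aggregate(0, lambda acc, kv: acc + kv[1], lambda a, b: a + b)
6

For a pair RDD like this one, rdd.values().sum() does the same thing
even more directly.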

On Tue, Jan 18, 2022 at 8:41 PM <capitnfrak...@free.fr> wrote:

> Hello
>
> Please help take a look at why this simple reduce doesn't work?
>
> >>> rdd = sc.parallelize([("a",1),("b",2),("c",3)])
> >>>
> >>> rdd.reduce(lambda x,y: x[1]+y[1])
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce
>      return reduce(f, vals)
>    File "/opt/spark/python/pyspark/util.py", line 74, in wrapper
>      return f(*args, **kwargs)
>    File "<stdin>", line 1, in <lambda>
> TypeError: 'int' object is not subscriptable
> >>>
>
>
> spark 3.2.0
>
> Thank you.
>
