You could create a custom accumulator that collects values into a list, by subclassing AccumulatorParam and defining how partial lists are merged.

Some examples that could help:
https://towardsdatascience.com/custom-pyspark-accumulators-310f63ca3c8c
https://stackoverflow.com/questions/34798578/how-to-create-custom-list-accumulator-i-e-listint-int
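Roughly, along the lines of those posts, a list accumulator sketch could look like this (the try/except around the import is only so the class can be read without Spark installed; `process` in the usage comments is a hypothetical placeholder for your per-record work):

```python
try:
    from pyspark.accumulators import AccumulatorParam
except ImportError:  # stub so the merge logic can be exercised without Spark
    class AccumulatorParam:
        pass

class ListAccumulatorParam(AccumulatorParam):
    """Accumulator that merges Python lists across executors."""

    def zero(self, initial_value):
        # Starting value for each task-local copy of the accumulator.
        return []

    def addInPlace(self, v1, v2):
        # Merge two partial lists (executor contributions into the driver value).
        v1.extend(v2)
        return v1

# Driver-side usage (sketch, assuming an existing SparkContext `sc` and RDD `rdd`):
#
# errors = sc.accumulator([], ListAccumulatorParam())
#
# def do_something(x):
#     try:
#         process(x)                 # hypothetical per-record work
#     except Exception as e:
#         errors.add([str(e)])       # append this record's error message
#
# rdd.foreach(do_something)
# print(errors.value)                # the merged list, readable on the driver only
```

Note that accumulator values are only reliably readable on the driver after an action, and updates inside transformations may be re-applied on task retries.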


On Tue, Aug 3, 2021 at 1:23 PM Sachit Murarka <connectsac...@gmail.com>
wrote:

> Hi Team,
>
> We are using rdd.foreach(lambda x : do_something(x))
>
> Our use case requires collecting, in a list, the error messages raised in
> the exception block of do_something.
> Since this runs on the executors, a global list won't work here. As the
> state needs to be shared among executors, I thought of using an
> Accumulator, but the built-in accumulator supports only numeric values.
>
> Can someone please suggest how I can collect the errors from all records
> of the RDD into a list?
>
> Thanks,
> Sachit Murarka
>