You could create a custom accumulator that collects values in a list. PySpark accumulators are not limited to numeric values: you can pass your own AccumulatorParam when creating one. Some examples that could help:

https://towardsdatascience.com/custom-pyspark-accumulators-310f63ca3c8c
https://stackoverflow.com/questions/34798578/how-to-create-custom-list-accumulator-i-e-listint-int
On Tue, Aug 3, 2021 at 1:23 PM Sachit Murarka <connectsac...@gmail.com> wrote:

> Hi Team,
>
> We are using rdd.foreach(lambda x : do_something(x))
>
> Our use case requires collecting of the error messages in a list which are
> coming up in the exception block of the method do_something.
> Since this will be running on executor, a global list won't work here. As
> the state needs to be shared among various executors, I thought of using
> Accumulator,
> but the accumulator uses only Integral values.
>
> Can someone please suggest how do I collect all errors in a list which are
> coming from all records of RDD.
>
> Thanks,
> Sachit Murarka