Re: Pandas UDF for PySpark error. Big Dataset

Bryan Cutler Tue, 29 May 2018 17:26:15 -0700

Can you share some of the code used, or at least the pandas_udf plus the
stack trace?  Also, does decreasing your dataset size fix the OOM?


On Mon, May 28, 2018, 4:22 PM Traku traku <tra...@gmail.com> wrote:

> Hi.
>
> I'm trying to use the new feature but I can't use it with a big dataset
> (about 5 million rows).
>
> I tried increasing executor memory, driver memory, and the number of
> partitions, but none of these solved the problem.
>
> One of the executor tasks keeps increasing its shuffle memory until it fails.
>
> The error is generated by Arrow: unable to expand the buffer.
>
> Any idea?
>
