I'm not sure this is a memory problem. There are no OOM exceptions, and
the job only breaks when actions are run in parallel, submitted to the
scheduler by different threads.

The issue is that the BlockManager's doGetRemote function does not retry
when it is denied access to a cached block.
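
One possible application-level workaround is to retry the failed action
instead of letting the exception kill the job. This is only a sketch: the
withRetries helper, the retry count, and the back-off are placeholders,
and df and filterExpressions are the ones from the code quoted below.

// Retry an action a few times before giving up, as a stop-gap while the
// block manager itself does not retry the fetch.
def withRetries[T](attempts: Int)(action: => T): T =
  try action
  catch {
    case _: Exception if attempts > 1 =>
      Thread.sleep(1000)                 // simple fixed back-off
      withRetries(attempts - 1)(action)
  }

filterExpressions.par.foreach { expression =>
  withRetries(3) {
    df.filter(expression).count()        // re-submit the job if a fetch fails
  }
}

If the stack trace in the pastebin shows a specific exception type, the
catch clause can be narrowed to just that type.
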
On May 8, 2016 5:55 PM, "Ashish Dubey" <ashish....@gmail.com> wrote:

Brandon,

How much memory are you giving your executors? Did you check whether there
were dead executors in your application logs? Most likely you need more
memory for the executors.
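
For example, you can pass --executor-memory to spark-submit, or set it
when the SparkConf is built; a minimal sketch (the app name and the 8g
value are only placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder value; the right amount depends on your data and partitioning.
// Executor memory has to be set before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("parallel-filters")        // hypothetical app name
  .set("spark.executor.memory", "8g")
val sc = new SparkContext(conf)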

Ashish

On Sun, May 8, 2016 at 1:01 PM, Brandon White <bwwintheho...@gmail.com>
wrote:

> Hello all,
>
> I am running a Spark application which schedules multiple Spark jobs.
> Something like:
>
> val df = sqlContext.read.parquet("/path/to/file")
>
> filterExpressions.par.foreach { expression =>
>   df.filter(expression).count()
> }
>
> When the block manager fails to fetch a block, it throws an exception
> which eventually kills the job: http://pastebin.com/2ggwv68P
>
> This code works when I run it on one thread with:
>
> filterExpressions.foreach { expression =>
>   df.filter(expression).count()
> }
>
> But I really need the jobs to execute in parallel. Is there any way
> around this? It seems like a bug in the BlockManager's doGetRemote
> function. I have tried the HTTP Block Manager as well.
>
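
For what it's worth, one way to keep the jobs running in parallel while
bounding how many are submitted at once is to replace .par with an
explicit thread pool. A rough sketch, reusing df and filterExpressions
from the code above (the pool size of 4 is an arbitrary choice):

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Bounded pool so only a few Spark jobs run concurrently; 4 is arbitrary.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

// Each Future submits one job; failures surface through the Future
// instead of tearing down the whole parallel-collection traversal.
val jobs = filterExpressions.toSeq.map { expression =>
  Future(df.filter(expression).count())
}

Await.result(Future.sequence(jobs), Duration.Inf)

This does not fix the missing retry in doGetRemote, but it gives explicit
control over how many jobs, and therefore how many concurrent remote
fetches, are in flight at once.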
