I'm not quite sure how this is a memory problem. There are no OOM exceptions, and the job only breaks when actions are run in parallel, submitted to the scheduler by different threads.
The issue is that the doGetRemote function does not retry when it is denied access to a cache block.

On May 8, 2016 5:55 PM, "Ashish Dubey" <ashish....@gmail.com> wrote:

Brandon, how much memory are you giving to your executors? Did you check whether there were dead executors in your application logs? Most likely you require higher memory for executors.

Ashish

On Sun, May 8, 2016 at 1:01 PM, Brandon White <bwwintheho...@gmail.com> wrote:
> Hello all,
>
> I am running a Spark application which schedules multiple Spark jobs.
> Something like:
>
> val df = sqlContext.read.parquet("/path/to/file")
>
> filterExpressions.par.foreach { expression =>
>   df.filter(expression).count()
> }
>
> When the block manager fails to fetch a block, it throws an exception
> which eventually kills the job: http://pastebin.com/2ggwv68P
>
> This code works when I run it on one thread with:
>
> filterExpressions.foreach { expression =>
>   df.filter(expression).count()
> }
>
> But I really need the parallel execution of the jobs. Is there any way
> around this? It seems like a bug in the BlockManager's doGetRemote function.
> I have tried the HTTP Block Manager as well.
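Until the block manager retries the fetch itself, one workaround is to retry the failed action from the driver side. The sketch below is a minimal, self-contained Scala helper; the `retry` name, attempt count, and delay are illustrative assumptions, not part of any Spark API. In the original code each `df.filter(expression).count()` call would be passed as the body.

```scala
import scala.util.{Failure, Success, Try}

object RetryExample {
  // Run `body` up to `maxAttempts` times, sleeping `delayMs` between
  // failed attempts; rethrow the last error if every attempt fails.
  // Hypothetical helper -- not part of Spark.
  def retry[T](maxAttempts: Int, delayMs: Long)(body: => T): T = {
    var lastError: Throwable = null
    for (attempt <- 1 to maxAttempts) {
      Try(body) match {
        case Success(result) => return result
        case Failure(e) =>
          lastError = e
          if (attempt < maxAttempts) Thread.sleep(delayMs)
      }
    }
    throw lastError
  }

  def main(args: Array[String]): Unit = {
    // Demonstration with a flaky block that succeeds on the third call.
    var calls = 0
    val result = retry(maxAttempts = 3, delayMs = 0) {
      calls += 1
      if (calls < 3) throw new RuntimeException("simulated fetch failure")
      42
    }
    println(s"result=$result calls=$calls")
  }
}

// Usage against the original job would look like:
// filterExpressions.par.foreach { expression =>
//   retry(maxAttempts = 3, delayMs = 1000) {
//     df.filter(expression).count()
//   }
// }
```

This only papers over the missing retry in doGetRemote; if the remote executor is genuinely gone, the retries will fail too.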