Re: Urgently need help interpreting duration

2014-04-08 Thread Yana Kadiyska
Thank you -- this actually helped a lot. Strangely, it appears that the task detail view is not accurate in 0.8 -- that view shows a 425 ms duration for one of the tasks, but in the driver log I do indeed see "Finished TID 125 in 10940ms". On that "slow" worker I see the following: 14/04/08 18:06:24 IN...
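
Not part of the thread, but one way to cross-check the UI's task detail view is to pull the task durations straight out of the driver log. A minimal Scala sketch, assuming the "Finished TID <id> in <n>ms" line format quoted above and taking the log file path as its first argument:

    import scala.io.Source

    // Scan a driver log for "Finished TID <id> in <n>ms" lines and print the
    // slowest tasks first, so a 10940 ms outlier stands out next to ~425 ms ones.
    object TaskDurations {
      private val Finished = """.*Finished TID (\d+) in (\d+)\s*ms.*""".r

      def main(args: Array[String]): Unit = {
        val durations = Source.fromFile(args(0)).getLines().collect {
          case Finished(tid, ms) => tid.toLong -> ms.toLong
        }.toSeq
        durations.sortBy(-_._2).take(10).foreach { case (tid, ms) =>
          println(s"TID $tid: $ms ms")
        }
      }
    }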

Re: Urgently need help interpreting duration

2014-04-08 Thread Aaron Davidson
Also, take a look at the driver logs -- if there is overhead before the first task is launched, the driver logs would likely reveal this.

On Tue, Apr 8, 2014 at 9:21 AM, Aaron Davidson wrote:
> Off the top of my head, the most likely cause would be driver GC issues.
> You can diagnose this by e...

Re: Urgently need help interpreting duration

2014-04-08 Thread Aaron Davidson
Off the top of my head, the most likely cause would be driver GC issues. You can diagnose this by enabling GC printing at the driver, and you can fix it by increasing the amount of memory your driver program has (see http://spark.apache.org/docs/0.9.0/tuning.html#garbage-collection-tuning). The "...
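
As a rough illustration of that suggestion (not from the thread; the flags are standard HotSpot options, and the heap size, jar, and class names are placeholders), GC logging and a larger heap can be set when the driver JVM is launched, and the effective heap confirmed from inside the program:

    // Illustrative launch command (placeholder names, standard HotSpot flags):
    //   java -Xmx4g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    //        -cp my-app.jar com.example.MyDriver
    //
    // Inside the driver program, confirm the heap setting actually took effect:
    val maxHeapMb = Runtime.getRuntime.maxMemory / (1024 * 1024)
    println(s"driver max heap: $maxHeapMb MB")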

Urgently need help interpreting duration

2014-04-08 Thread Yana Kadiyska
Hi Spark users, I'm very much hoping someone can help me out. I have a strict performance requirement on a particular query. One of the stages shows great variance in duration -- from 300 ms to 10 sec. The stage is mapPartitionsWithIndex at Operator.scala:210 (running Spark 0.8). I have run the job...
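
A rough way to narrow this down (not from the thread) is to time each partition inside the suspect stage itself, which separates the compute time from scheduling and launch overhead. The sketch below assumes an existing RDD named rdd standing in for the real query's input:

    // Wrap the stage's work with per-partition timing; the println output lands
    // in each executor's stderr, one line per partition.
    val timed = rdd.mapPartitionsWithIndex { (idx, iter) =>
      val start = System.currentTimeMillis()
      val rows = iter.toVector                  // force the partition's work
      val elapsed = System.currentTimeMillis() - start
      println(s"partition $idx: ${rows.length} rows in $elapsed ms")
      rows.iterator
    }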