Thank you -- this actually helped a lot. Strangely it appears that the task
detail view is not accurate in 0.8 -- that view shows a 425ms duration for
one of the tasks, but in the driver log I do indeed see "Finished TID 125 in
10940ms".
On that "slow" worker I see the following:
14/04/08 18:06:24 IN
Also, take a look at the driver logs -- if there is overhead before the
first task is launched, the driver logs would likely reveal this.
On Tue, Apr 8, 2014 at 9:21 AM, Aaron Davidson wrote:
Off the top of my head, the most likely cause would be driver GC issues.
You can diagnose this by enabling GC logging on the driver, and you can fix
it by increasing the amount of memory your driver program has (see
http://spark.apache.org/docs/0.9.0/tuning.html#garbage-collection-tuning).
The "
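To make that concrete, here is a minimal sketch of the flags involved. The GC
flags are standard HotSpot options; the launch command and the class name
com.example.MyDriver are assumptions about how your driver is started, so
adjust for your own deployment:

```shell
# Sketch, assuming the driver is a plain JVM process you launch yourself.
# Enable GC logging on the driver with standard HotSpot flags:
export SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# Give the driver JVM a larger heap (4g here is just an example), then
# launch the driver program (class name is hypothetical):
java -Xmx4g $SPARK_JAVA_OPTS -cp "$SPARK_CLASSPATH" com.example.MyDriver
```

With those flags, full GC pauses show up directly in the driver's stdout, so
a long pause there lining up with a slow task launch would point at driver GC.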
Hi Spark users, I'm very much hoping someone can help me out.
I have a strict performance requirement on a particular query. One of
the stages shows great variance in duration -- from 300ms to 10sec.
The stage is mapPartitionsWithIndex at Operator.scala:210 (running Spark 0.8)
I have run the job