Re: Spark stages very slow to complete

Karlson Tue, 02 Jun 2015 00:19:32 -0700

Hi, the code is several hundred lines of Python. I can try to put together a minimal example as soon as I find the time, though. Any ideas until then?

Would you mind posting the code?
On 2 Jun 2015 00:53, "Karlson" <ksonsp...@siberie.de> wrote:
Hi,

In all (PySpark) Spark jobs that become somewhat more involved, I am
experiencing the issue that some stages take a very long time to complete
and sometimes don't complete at all. This clearly correlates with the size
of my input data. Looking at the stage details for one such stage, I am
wondering where Spark spends all this time. Take this table of the stage's
task metrics for example:

Metric                      Min           25th percentile  Median        75th percentile  Max
Duration                    1.4 min       1.5 min          1.7 min       1.9 min          2.3 min
Scheduler Delay             1 ms          3 ms             4 ms          5 ms             23 ms
Task Deserialization Time   1 ms          2 ms             3 ms          8 ms             22 ms
GC Time                     0 ms          0 ms             0 ms          0 ms             0 ms
Result Serialization Time   0 ms          0 ms             0 ms          0 ms             1 ms
Getting Result Time         0 ms          0 ms             0 ms          0 ms             0 ms
Input Size / Records        23.9 KB / 1   24.0 KB / 1      24.1 KB / 1   24.1 KB / 1      24.3 KB / 1
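
To put numbers on the table above, here is a minimal sketch of the arithmetic for the median column. It assumes the listed overhead metrics are the only ones reported for the stage. The variable names are illustrative and are not part of any Spark API.

```python
# Median values copied from the task metrics table above.
median_duration_min = 1.7
median_overheads_ms = {
    "scheduler_delay": 4,
    "task_deserialization": 3,
    "gc": 0,
    "result_serialization": 0,
    "getting_result": 0,
}

# Convert the task duration to milliseconds so the units match.
duration_ms = median_duration_min * 60 * 1000

# Sum of every overhead the table reports for the median task.
overhead_ms = sum(median_overheads_ms.values())

# Time not explained by any of the listed overhead metrics.
unaccounted_ms = duration_ms - overhead_ms
overhead_share = overhead_ms / duration_ms

print(f"duration:    {duration_ms:.0f} ms")
print(f"overheads:   {overhead_ms} ms ({overhead_share:.4%} of duration)")
print(f"unaccounted: {unaccounted_ms:.0f} ms")
```

The listed overheads add up to a few milliseconds per task, so they explain almost none of the duration. The table alone does not show where the remaining time goes.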

Why is the overall duration almost 2 min? Where is all this time spent
when no progress of the stages is visible? The progress bar simply displays
0 succeeded tasks for a very long time before sometimes slowly progressing.
Also, the name of the stage displayed above is `javaToPython at null:-1`,
which I find very uninformative. I don't even know which action exactly is
responsible for this stage. Does anyone experience similar issues or have
any advice for me?

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


