Yes, my input data is partitioned in a completely random manner, so each worker that produces shuffle data processes only a part of it. The way I understand it, before each stage each worker needs to distribute the correct partitions (based on hash key ranges?) to the other workers, and this is where I would expect network input/output to spike. After that the processing should occur, where I would expect CPU to spike. If you check the image I attached earlier in this thread, you can see that, for example, between 8:25 and 8:30 disk, network and CPU are almost completely idle.
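(For reference on the "hash key ranges" guess: by default, groupByKey in Spark uses a hash partitioner rather than key ranges — each key goes to partition `nonNegativeMod(key.hashCode, numPartitions)`, so every mapper can compute the same mapping independently. A minimal standalone sketch of that rule, with no Spark dependency; the object and key names are illustrative:)

```scala
// Standalone sketch (no Spark dependency) of how Spark's default
// HashPartitioner decides which reducer a key belongs to.
object HashPartitionSketch {
  // Mirrors the non-negative modulo Spark uses: hashCode can be
  // negative on the JVM, so a plain % is not enough.
  def getPartition(key: Any, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq("user-1", "user-2", "user-3", "user-4")
    val numPartitions = 3
    // Every worker computes the same mapping, so each mapper knows
    // which reducer will fetch each of its output buckets.
    keys.foreach { k =>
      println(s"$k -> partition ${getPartition(k, numPartitions)}")
    }
  }
}
```

Range partitioning (a `RangePartitioner`) is only used where explicitly requested, e.g. by sortByKey.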
I would be very happy to receive some suggestions on how to debug this, or where you think would be a good place to start looking.

Kind regards, Domen

On Fri, Mar 21, 2014 at 6:58 PM, Mayur Rustagi [via Apache Spark User List] wrote:

> In your task details I don't see a large skew in tasks, so the low CPU usage
> period occurs between stages or during stage execution.
> One possible issue: your data is 89 GB of shuffle read. If the machine
> producing the shuffle data is not the one processing it, shuffling data
> across machines may be causing the delay.
> Can you look at your network traffic during that period to check performance?
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
> On Fri, Mar 21, 2014 at 8:33 AM, sparrow wrote:
>
>> Here is the stage overview:
>> [image: Inline image 2]
>>
>> and here are the stage details for stage 0:
>> [image: Inline image 1]
>>
>> Transformations from the first stage to the second one are trivial, so that
>> should not be the bottleneck (apart from keyBy().groupByKey(), which causes
>> the shuffle write/read).
>>
>> Kind regards, Domen
>>
>> On Thu, Mar 20, 2014 at 8:38 PM, Mayur Rustagi [via Apache Spark User List] wrote:
>>
>>> I would have preferred the stage window details & aggregate task
>>> details (above the task list).
>>> Basically, when you run a job it translates to multiple stages, and each
>>> stage translates to multiple tasks (each run on a worker core).
>>> So give me a breakup like:
>>> my job is taking 16 min;
>>> 3 stages — stage 1: 5 min, stage 2: 10 min, stage 3: 1 min;
>>> and for stage 2, the aggregate task screenshot that shows the 50th,
>>> 75th & 100th percentiles.
>>> Regards
>>> Mayur
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>
>>> On Thu, Mar 20, 2014 at 9:55 AM, sparrow wrote:
>>>
>>>> This is what the web UI looks like:
>>>> [image: Inline image 1]
>>>>
>>>> I also tail all the worker logs, and these are the last entries before
>>>> the waiting begins:
>>>>
>>>> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
>>>> maxBytesInFlight: 50331648, minRequest: 10066329
>>>> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
>>>> Getting 29853 non-zero-bytes blocks out of 37714 blocks
>>>> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
>>>> Started 5 remote gets in 62 ms
>>>> [PSYoungGen: 12464967K->3767331K(10552192K)]
>>>> 36074093K->29053085K(44805696K), 0.6765460 secs] [Times: user=5.35
>>>> sys=0.02, real=0.67 secs]
>>>> [PSYoungGen: 10779466K->3203826K(9806400K)]
>>>> 35384386K->31562169K(44059904K), 0.6925730 secs] [Times: user=5.47
>>>> sys=0.00, real=0.70 secs]
>>>>
>>>> From the screenshot above you can see that tasks take ~6 minutes to
>>>> complete. The amount of time it takes the tasks to complete seems to depend
>>>> on the amount of input data: if the S3 input string captures 2.5 times less
>>>> data (less data to shuffle write and later read), the same tasks take 1
>>>> minute. Any idea how to debug what the workers are doing?
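(A note on those log lines: the `maxBytesInFlight: 50331648` value is 48 MB, which in Spark 0.8.x corresponds to the `spark.reducer.maxMbInFlight` setting — the cap on simultaneous in-flight shuffle-fetch data per reducer. To correlate the stalls with GC pauses or fetch behavior, one option is a `spark-env.sh` fragment along these lines; this is an illustrative sketch that assumes a HotSpot JVM and Spark 0.8.x, so adjust the values for your cluster:)

```shell
# spark-env.sh (Spark 0.8.x) — illustrative fragment, not a drop-in config.
# -verbose:gc & friends: log every GC pause with timestamps, so idle windows
#   in the resource graphs can be matched against (or ruled out as) GC time.
# spark.reducer.maxMbInFlight: in-flight shuffle-fetch cap per reducer;
#   48 MB is the 0.8 default (the maxBytesInFlight value seen in the log).
SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Dspark.reducer.maxMbInFlight=96"
export SPARK_JAVA_OPTS
```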
>>>>
>>>> Domen
>>>>
>>>> On Wed, Mar 19, 2014 at 5:27 PM, Mayur Rustagi [via Apache Spark User List] wrote:
>>>>
>>>>> You could have some outlier task that is preventing the next set of
>>>>> stages from launching. Can you check the stages' state in the Spark
>>>>> web UI: is any task running, or is everything halted?
>>>>> Regards
>>>>> Mayur
>>>>>
>>>>> Mayur Rustagi
>>>>> Ph: +1 (760) 203 3257
>>>>> http://www.sigmoidanalytics.com
>>>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 5:40 AM, Domen Grabec wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a cluster with 16 nodes; each node has 69 GB of RAM (50 GB
>>>>>> goes to Spark) and 8 cores, running Spark 0.8.1.
>>>>>> I have a groupByKey operation that
>>>>>> causes a wide RDD dependency, so shuffle write and shuffle read are
>>>>>> performed.
>>>>>>
>>>>>> For some reason all worker threads seem to sleep for about 3-4
>>>>>> minutes each time before performing a shuffle read and completing a
>>>>>> set of tasks. See the graphs below showing how no resources are
>>>>>> utilized in specific time windows.
>>>>>>
>>>>>> Each time 3-4 minutes pass, the next set of tasks is grabbed and
>>>>>> processed, and then another waiting period happens.
>>>>>>
>>>>>> Each task has an input of 80 MB +- 5 MB of data to shuffle read.
>>>>>>
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> Here <http://pastebin.com/UHWMdTRY> is a link to a thread dump
>>>>>> performed in the middle of the waiting period. Any idea what could
>>>>>> cause the long waits?
>>>>>>
>>>>>> Kind regards, Domen
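(One shuffle-volume note on the wide dependency above: groupByKey ships every record across the network untouched, whereas reduceByKey combines values map-side first — so if the downstream step is an aggregation rather than needing the full groups, switching can shrink the shuffle substantially. A plain-Scala sketch of the difference, with no Spark dependency; the object, data, and "sum" aggregation are illustrative assumptions about the workload:)

```scala
// Plain-Scala sketch (no Spark needed) of why groupByKey shuffles more data
// than reduceByKey when the goal is an aggregate: reduceByKey pre-combines
// values on the map side, so each mapper ships one record per key instead
// of every record it holds.
object ShuffleVolumeSketch {
  type Record = (String, Int)

  // What groupByKey ships from one mapper: every record, untouched.
  def groupByKeyShuffle(mapperOutput: Seq[Record]): Seq[Record] =
    mapperOutput

  // What reduceByKey ships: one pre-combined (here: summed) record per key.
  def reduceByKeyShuffle(mapperOutput: Seq[Record]): Seq[Record] =
    mapperOutput.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }.toSeq

  def main(args: Array[String]): Unit = {
    val mapperOutput = Seq(("a", 1), ("a", 2), ("b", 3), ("a", 4), ("b", 5))
    println(s"groupByKey ships ${groupByKeyShuffle(mapperOutput).size} records")
    println(s"reduceByKey ships ${reduceByKeyShuffle(mapperOutput).size} records")
  }
}
```

Whether this applies depends on what is done with the groups afterwards; if the full per-key value lists are genuinely needed, the shuffle volume cannot be reduced this way.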
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-worker-threads-waiting-tp2859p3090.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.