I would have preferred to see the stage window details and the aggregate
task details (shown above the task list).
Basically, when you run a job it translates into multiple stages, and each
stage translates into multiple tasks (each run on a worker core).
So a breakdown like this would help:
my job is taking 16 min
3 stages: stage 1: 5 min, stage 2: 10 min, stage 3: 1 min
For stage 2, give me the task aggregate screenshot, which shows the 50th,
75th and 100th percentiles.
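
To make the breakdown concrete, here is a minimal Python sketch of the same idea, not anything taken from Spark itself. The Stage class, the helper names, and the per-task run times are all made up for illustration; only the 5/10/1 minute split comes from the example above. It assumes the stages run one after another, and it uses a simple nearest-rank percentile, which may differ slightly from what the web UI computes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stage:
    name: str
    wall_minutes: int          # wall-clock time of the stage
    task_seconds: List[float]  # run time of each task (hypothetical numbers)


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct/100 * n)
    return ordered[rank - 1]


def task_summary(stage: Stage) -> Dict[int, float]:
    """50th, 75th and 100th percentile of task run time for one stage."""
    return {p: percentile(stage.task_seconds, p) for p in (50, 75, 100)}


# Hypothetical job: three sequential stages, with one slow task in stage 2.
job = [
    Stage("stage 1", 5, [35, 36, 38, 37]),
    Stage("stage 2", 10, [55, 60, 62, 58, 61, 59, 57, 360]),
    Stage("stage 3", 1, [12, 13, 11, 12]),
]

total_minutes = sum(s.wall_minutes for s in job)
slowest = max(job, key=lambda s: s.wall_minutes)

print(f"job: {total_minutes} min")
for s in job:
    print(f"  {s.name}: {s.wall_minutes} min")
print(f"{slowest.name} task percentiles (s): {task_summary(slowest)}")
```

If the 100th percentile is far above the 50th and 75th, as in these made-up numbers, a few outlier tasks are holding up the stage rather than all tasks being uniformly slow.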
Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Mar 20, 2014 at 9:55 AM, sparrow <do...@celtra.com> wrote:

>
> This is what the web UI looks like:
> [image: Inline image 1]
>
> I also tail all the worker logs, and these are the last entries before the
> waiting begins:
>
> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> maxBytesInFlight: 50331648, minRequest: 10066329
> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> Getting 29853 non-zero-bytes blocks out of 37714 blocks
> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> Started 5 remote gets in  62 ms
> [PSYoungGen: 12464967K->3767331K(10552192K)]
> 36074093K->29053085K(44805696K), 0.6765460 secs] [Times: user=5.35
> sys=0.02, real=0.67 secs]
> [PSYoungGen: 10779466K->3203826K(9806400K)]
> 35384386K->31562169K(44059904K), 0.6925730 secs] [Times: user=5.47
> sys=0.00, real=0.70 secs]
>
> From the screenshot above you can see that tasks take ~6 minutes to
> complete. The amount of time it takes the tasks to complete seems to depend
> on the amount of input data. If the S3 input string captures 2.5 times less
> data (less data to shuffle write and later read), the same tasks take 1
> minute. Any idea how to debug what the workers are doing?
>
> Domen
>
> On Wed, Mar 19, 2014 at 5:27 PM, Mayur Rustagi [via Apache Spark User
> List] <[hidden email]> wrote:
>
>> You could have some outlier task that is preventing the next set of
>> stages from launching. Can you check the state of the stages in the Spark
>> web UI? Is any task running, or is everything halted?
>> Regards
>> Mayur
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>>
>>
>> On Wed, Mar 19, 2014 at 5:40 AM, Domen Grabec <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> I have a cluster with 16 nodes, each with 69 GB of RAM (50 GB goes to
>>> Spark) and 8 cores, running Spark 0.8.1. I have a groupByKey operation that
>>> causes a wide RDD dependency, so a shuffle write and a shuffle read are
>>> performed.
>>>
>>> For some reason all worker threads seem to sleep for about 3-4 minutes
>>> each time they perform a shuffle read and complete a set of tasks. See
>>> the graphs below: no resources are being utilized in specific time windows.
>>>
>>> Each time 3-4 minutes pass, the next set of tasks is grabbed and
>>> processed, and then another waiting period happens.
>>>
>>> Each task has 80 MB ± 5 MB of data to shuffle read.
>>>
>>>  [image: Inline image 1]
>>>
>>> Here <http://pastebin.com/UHWMdTRY> is a link to a thread dump taken
>>> in the middle of the waiting period. Any idea what could cause the long
>>> waits?
>>>
>>> Kind regards, Domen
>>>
>>
>>
>>
>>
>
>
>
