I would have preferred the stage window details and the aggregate task details (shown above the task list). Basically, when you run a job, it translates into multiple stages, and each stage translates into multiple tasks (each run on a worker core). So I'd like a breakdown like this: my job takes 16 min across 3 stages, Stage 1: 5 min, Stage 2: 10 min, Stage 3: 1 min. Then, for Stage 2, give me the task aggregate screenshot that shows the 50th, 75th, and 100th percentiles.

Regards
Mayur
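Mayur's requested breakdown can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the stage totals restate his example, the Stage 2 task durations are made-up placeholders (not data from this thread), and `percentile` / `summarize_stage` are hypothetical helpers using the nearest-rank method, which may not match exactly how the Spark web UI computes its quantiles.

```python
import math


def percentile(durations, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not durations:
        raise ValueError("need at least one duration")
    ordered = sorted(durations)
    # Integer multiply before dividing to avoid float rounding surprises.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_stage(durations):
    """The three task-aggregate figures asked for: 50th, 75th and 100th percentile."""
    return {p: percentile(durations, p) for p in (50, 75, 100)}


# Hypothetical per-stage wall-clock totals (minutes), mirroring the example breakdown.
stage_minutes = {"Stage 1": 5, "Stage 2": 10, "Stage 3": 1}

# Hypothetical task durations (seconds) for Stage 2; placeholders, not real data.
stage2_tasks = [40, 42, 45, 47, 50, 52, 55, 60, 360, 370]

print("job total:", sum(stage_minutes.values()), "min")
print("Stage 2 task percentiles (s):", summarize_stage(stage2_tasks))
```

A 100th percentile far above the 75th, as in these placeholder numbers, is the pattern that would point to the outlier tasks suspected earlier in the thread; a flat distribution would instead suggest every task is waiting equally.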
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, Mar 20, 2014 at 9:55 AM, sparrow <do...@celtra.com> wrote:

> This is what the web UI looks like:
> [image: Inline image 1]
>
> I also tailed all the worker logs, and these are the last entries before the
> waiting begins:
>
> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, minRequest: 10066329
> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 29853 non-zero-bytes blocks out of 37714 blocks
> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 5 remote gets in 62 ms
> [PSYoungGen: 12464967K->3767331K(10552192K)] 36074093K->29053085K(44805696K), 0.6765460 secs] [Times: user=5.35 sys=0.02, real=0.67 secs]
> [PSYoungGen: 10779466K->3203826K(9806400K)] 35384386K->31562169K(44059904K), 0.6925730 secs] [Times: user=5.47 sys=0.00, real=0.70 secs]
>
> From the screenshot above you can see that tasks take ~6 minutes to
> complete. How long the tasks take seems to depend on the amount of input
> data: if the S3 input string captures 2.5 times less data (so there is less
> data to shuffle write and later read), the same tasks take 1 minute. Any
> idea how to debug what the workers are doing?
>
> Domen
>
> On Wed, Mar 19, 2014 at 5:27 PM, Mayur Rustagi [via Apache Spark User
> List] <[hidden email]> wrote:
>
>> You could have some outlier task that is preventing the next set of
>> stages from launching. Can you check the stage states in the Spark WebUI:
>> is any task running, or is everything halted?
>>
>> Regards
>> Mayur
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>> On Wed, Mar 19, 2014 at 5:40 AM, Domen Grabec <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> I have a cluster with 16 nodes; each node has 69 GB of RAM (50 GB goes to
>>> Spark) and 8 cores, running Spark 0.8.1. I have a groupByKey operation that
>>> causes a wide RDD dependency, so a shuffle write and a shuffle read are
>>> performed.
>>>
>>> For some reason, all worker threads seem to sleep for about 3-4 minutes
>>> each time they perform a shuffle read and complete a set of tasks. See the
>>> graphs below: no resources are being utilized in specific time windows.
>>>
>>> Each time 3-4 minutes pass, the next set of tasks is grabbed and
>>> processed, and then another waiting period begins.
>>>
>>> Each task has an input of 80 MB ± 5 MB of data to shuffle read.
>>>
>>> [image: Inline image 1]
>>>
>>> Here <http://pastebin.com/UHWMdTRY> is a link to a thread dump taken in
>>> the middle of the waiting period. Any idea what could cause the long
>>> waits?
>>>
>>> Kind regards, Domen
>
> ------------------------------
> View this message in context: Re: Spark worker threads waiting
> <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-worker-threads-waiting-tp2859p2938.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
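To put numbers on the two GC lines in Domen's log excerpt, here is a minimal Python sketch that parses them. It assumes the Parallel Scavenge `PSYoungGen` layout shown in the excerpt (young generation before->after(capacity), then whole heap before->after(capacity), then pause time); `GC_PATTERN` and `parse_gc` are hypothetical names, and the lines are kept truncated exactly as pasted.

```python
import re

# The two GC lines exactly as pasted in the thread (their leading "[GC ..." prefix was cut off).
GC_LINES = [
    "[PSYoungGen: 12464967K->3767331K(10552192K)] 36074093K->29053085K(44805696K), "
    "0.6765460 secs] [Times: user=5.35 sys=0.02, real=0.67 secs]",
    "[PSYoungGen: 10779466K->3203826K(9806400K)] 35384386K->31562169K(44059904K), "
    "0.6925730 secs] [Times: user=5.47 sys=0.00, real=0.70 secs]",
]

# Groups: young before, young after, young capacity,
#         heap before, heap after, heap capacity, pause seconds.
GC_PATTERN = re.compile(
    r"\[PSYoungGen: (\d+)K->(\d+)K\((\d+)K\)\] (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs"
)

KIB_PER_GIB = 1024 * 1024  # the JVM reports sizes in KiB ("K")


def parse_gc(line):
    """Return whole-heap stats for one PSYoungGen line, or None if it does not match."""
    m = GC_PATTERN.search(line)
    if m is None:
        return None
    heap_before, heap_after, heap_cap = (int(g) for g in m.group(4, 5, 6))
    return {
        "heap_after_gib": heap_after / KIB_PER_GIB,
        "heap_occupancy": heap_after / heap_cap,
        "freed_gib": (heap_before - heap_after) / KIB_PER_GIB,
        "pause_s": float(m.group(7)),
    }


for line in GC_LINES:
    stats = parse_gc(line)
    print(f"heap after GC: {stats['heap_after_gib']:.1f} GiB "
          f"({stats['heap_occupancy']:.0%} of capacity), "
          f"freed {stats['freed_gib']:.1f} GiB, pause {stats['pause_s']:.2f} s")
```

On these two lines the heap still holds roughly 28-30 GiB after each young collection, and the second collection frees less than the first. Each pause is only about 0.7 s, though, so these two collections alone do not account for a 3-4 minute stall; the thread dump remains the better place to see what the workers are blocked on.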