Hi Vijay: I am using spark-shell because I am still prototyping the steps involved. Regarding executors: I have 280 executors, and the UI shows only a few straggler tasks on each trigger. The UI does not show much time spent on GC. I suspect the delay comes from fetching data from Kafka. The number of stragglers is generally fewer than 5 out of 240, but is sometimes higher.
I will dig into it further and see whether changing partitions etc. helps, but I was wondering whether anyone else has encountered similar stragglers holding up the processing of a window trigger.

Thanks

On Friday, February 23, 2018 6:07 PM, vijay.bvp <bvpsa...@gmail.com> wrote:

> Instead of spark-shell, have you tried running it as a job?
>
> How many executors and cores are you using? Can you share the RDD graph and the event timeline from the UI? Did you find which tasks are taking more time, and was there any GC? Please look at the UI if you have not already; it can provide a lot of information.