Any chance anyone has taken a look at this? Thanks!
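To make it easier to look into, here's a minimal sketch of how my driver is structured (the app name, stream name, endpoint, region, and timeout below are placeholders, not the real values, and the real job does stateful transformations rather than the count()):

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object StreamingDriver {
  def main(args: Array[String]): Unit = {
    val batchInterval = Seconds(300)   // 5-minute batches, as in the logs below
    val timeoutMs = 60L * 60 * 1000    // placeholder timeout

    val conf = new SparkConf()
      .setAppName("kinesis-graceful-shutdown")  // placeholder app name
      .set("spark.streaming.stopGracefullyOnShutdown", "true")
    val ssc = new StreamingContext(conf, batchInterval)

    // 3 Kinesis receivers, i.e. one ReceiverInputDStream per createStream call
    val streams = (1 to 3).map { _ =>
      KinesisUtils.createStream(
        ssc,
        "my-kcl-app",                               // placeholder KCL application name
        "my-stream",                                // placeholder Kinesis stream name
        "https://kinesis.us-east-1.amazonaws.com",  // placeholder endpoint
        "us-east-1",                                // placeholder region
        InitialPositionInStream.LATEST,
        batchInterval,                              // Kinesis checkpoint interval
        StorageLevel.MEMORY_AND_DISK_2)
    }

    // stand-in for the real processing (which uses stateful transformations)
    ssc.union(streams).count().print()

    ssc.start()
    // Returns after the timeout; on JVM exit the shutdown hook then invokes
    // stop(stopGracefully=true), which is where the receivers never confirm stopping.
    ssc.awaitTerminationOrTimeout(timeoutMs)
  }
}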
On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:

> Thanks Shixiong!
>
> I'm attaching the thread dumps (I printed the Spark UI after expanding all
> the elements; hope that's fine) and the related stderr (INFO-level) executor
> logs. There are 3 of them. The thread dumps were collected at the time the
> StreamingContext was (trying to) shut down, i.e. when I saw the following
> logs in the driver's stderr:
>
> 16/02/10 15:46:25 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
> 16/02/10 15:46:25 INFO StreamingContext: Invoking stop(stopGracefully=true) from shutdown hook
> 16/02/10 15:46:25 INFO ReceiverTracker: Sent stop signal to all 3 receivers
> 16/02/10 15:46:35 INFO ReceiverTracker: Waiting for receiver job to terminate gracefully
>
> Then, from 15:50 onward, the driver started reporting logs again as if it
> were continuing to process as usual. You might find some exceptions in the
> executor logs with exactly that 15:50 timestamp.
>
> Thank you very much in advance!
>
> Roberto
>
> On Tue, Feb 9, 2016 at 6:25 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>
>> Could you do a thread dump in the executor that runs the Kinesis receiver
>> and post it? It would be great if you could provide the executor log as well.
>>
>> On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Can anybody kindly help me out a little bit here? I just verified that the
>>> problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's
>>> definitely a Kinesis-related issue, since with Spark 1.6.0 I'm able to get
>>> Streaming drivers to terminate with no issue IF I don't use Kinesis and
>>> don't open any Receivers.
>>>
>>> Thank you!
>>>
>>> Roberto
>>>
>>> On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been struggling with an issue ever since I tried to upgrade my Spark
>>>> Streaming solution from 1.4.1 to 1.5+.
>>>>
>>>> I have a Spark Streaming app that creates 3 ReceiverInputDStreams
>>>> via the KinesisUtils.createStream API.
>>>>
>>>> I terminate my app via a timeout
>>>> (StreamingContext.awaitTerminationOrTimeout(timeout)), with graceful
>>>> shutdown enabled in the SparkConf
>>>> (spark.streaming.stopGracefullyOnShutdown=true).
>>>>
>>>> I submit my Spark app on EMR in yarn-cluster mode.
>>>>
>>>> Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).
>>>>
>>>> Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0
>>>> on emr-4.3.0), I can't get the app to actually terminate. The logs tell
>>>> me it tries to, but no confirmation that the receivers stopped ever
>>>> arrives. Instead, when the timer reaches the next batch period, the
>>>> StreamingContext continues its processing for a while (then it gets
>>>> killed with a SIGTERM (signal 15); YARN's vmem and pmem kills are
>>>> disabled).
>>>>
>>>> ...
>>>> 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
>>>> 16/02/02 21:22:08 INFO StreamingContext: Invoking stop(stopGracefully=true) from shutdown hook
>>>> 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
>>>> 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to terminate gracefully
>>>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 MB)
>>>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 MB)
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO TransformedDStream: Slicing from 1454448000000 ms to 1454448300000 ms (aligned to 1454448000000 ms and 1454448300000 ms)
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 1454448300000 ms
>>>> 16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time 1454448300000 ms
>>>> 16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time 1454448300000 ms
>>>> 16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 1454448300000 ms.0 from job set of time 1454448300000 ms
>>>>
>>>> ...
>>>>
>>>> Please, this is really blocking the upgrade process to the latest Spark
>>>> versions, and I really don't know how to work around it.
>>>>
>>>> Any help would be very much appreciated.
>>>>
>>>> Thank you,
>>>>
>>>> Roberto