Any chance anyone has had a look at this?

Thanks!

On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio <
roberto.coluc...@gmail.com> wrote:

> Thanks Shixiong!
>
> I'm attaching the thread dumps (I printed the Spark UI after expanding all
> the elements, hope that's fine) and the related stderr (INFO level) executor
> logs. There are 3 of them. The thread dumps were collected at the time the
> StreamingContext was (trying to) shut down, i.e. when I saw the following
> logs in the driver's stderr:
>
> 16/02/10 15:46:25 INFO ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0
> 16/02/10 15:46:25 INFO StreamingContext: Invoking stop(stopGracefully=true) 
> from shutdown hook
> 16/02/10 15:46:25 INFO ReceiverTracker: Sent stop signal to all 3 receivers
> 16/02/10 15:46:35 INFO ReceiverTracker: Waiting for receiver job to terminate 
> gracefully
>
>
> Then, from 15:50 onwards, the driver started reporting logs again as if it
> were continuing to process as usual. You might find some exceptions in the
> executor logs right at the 15:50 timestamp.
>
> Thank you very much in advance!
>
> Roberto
>
>
>
> On Tue, Feb 9, 2016 at 6:25 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Could you do a thread dump in the executor that runs the Kinesis receiver
>> and post it? It would be great if you could provide the executor log as well.
>>
>> On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio <
>> roberto.coluc...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> can anybody kindly help me out a little bit here? I just verified that the
>>> problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's
>>> definitely a Kinesis-related issue, since with Spark 1.6.0 I'm able to get
>>> Streaming drivers to terminate with no issue IF I don't use Kinesis and
>>> don't open any Receivers.
>>>
>>> Thank you!
>>>
>>> Roberto
>>>
>>>
>>> On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <
>>> roberto.coluc...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been struggling with an issue ever since I tried to upgrade my Spark
>>>> Streaming solution from 1.4.1 to 1.5+.
>>>>
>>>> I have a Spark Streaming app which creates 3 ReceiverInputDStreams
>>>> leveraging the KinesisUtils.createStream API.
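>>>>
>>>> For context, the setup looks roughly like the sketch below (app name,
>>>> stream name, endpoint, region and batch interval are placeholders, not
>>>> the real values):
>>>>
>>>> import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
>>>> import org.apache.spark.SparkConf
>>>> import org.apache.spark.storage.StorageLevel
>>>> import org.apache.spark.streaming.{Seconds, StreamingContext}
>>>> import org.apache.spark.streaming.kinesis.KinesisUtils
>>>>
>>>> val batchInterval = Seconds(300)  // placeholder batch interval
>>>> val conf = new SparkConf().setAppName("my-streaming-app")  // placeholder name
>>>> val ssc = new StreamingContext(conf, batchInterval)
>>>>
>>>> // 3 Kinesis receivers, then a union of the resulting DStreams.
>>>> val kinesisStreams = (1 to 3).map { _ =>
>>>>   KinesisUtils.createStream(
>>>>     ssc,
>>>>     "my-kinesis-app",                          // KCL application name (placeholder)
>>>>     "my-stream",                               // Kinesis stream name (placeholder)
>>>>     "https://kinesis.us-east-1.amazonaws.com", // endpoint URL (placeholder)
>>>>     "us-east-1",                               // region (placeholder)
>>>>     InitialPositionInStream.LATEST,
>>>>     batchInterval,                             // KCL checkpoint interval
>>>>     StorageLevel.MEMORY_AND_DISK_2)
>>>> }
>>>> val unionedStream = ssc.union(kinesisStreams)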
>>>>
>>>> I used to leverage a timeout to terminate my app gracefully
>>>> (StreamingContext.awaitTerminationOrTimeout(timeout), with
>>>> spark.streaming.stopGracefullyOnShutdown=true set on the SparkConf).
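>>>>
>>>> Continuing the sketch above, the driver's main does something like the
>>>> following (the timeout value is just an example, and
>>>> spark.streaming.stopGracefullyOnShutdown=true is assumed to be set on the
>>>> SparkConf before the StreamingContext is created, e.g. via --conf at
>>>> submit time):
>>>>
>>>> ssc.start()
>>>>
>>>> // Block for a fixed amount of time; the timeout below is an example value.
>>>> val timeoutMs = 2 * 60 * 60 * 1000L  // e.g. 2 hours
>>>> val stopped = ssc.awaitTerminationOrTimeout(timeoutMs)
>>>> // If the timeout elapses (stopped == false), main simply returns and the
>>>> // JVM shutdown hook invokes stop(stopGracefully = true); that is what the
>>>> // driver log further down shows.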
>>>>
>>>> I used to submit my Spark app on EMR in yarn-cluster mode.
>>>>
>>>> Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).
>>>>
>>>> Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0
>>>> on emr-4.3.0) I can't get the app to actually terminate. The logs tell me
>>>> it tries to, but no confirmation that the receivers have stopped ever
>>>> arrives. Instead, when the timer reaches the next batch interval, the
>>>> StreamingContext continues its processing for a while (and then gets
>>>> killed with a SIGTERM 15; YARN's vmem and pmem kills are disabled).
>>>>
>>>> ...
>>>>
>>>> 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, 
>>>> exitCode: 0
>>>> 16/02/02 21:22:08 INFO StreamingContext: Invoking 
>>>> stop(stopGracefully=true) from shutdown hook
>>>> 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
>>>> 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to 
>>>> terminate gracefully
>>>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>>>> 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>>>> ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 
>>>> MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>>>> ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 
>>>> MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>>>> ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 
>>>> MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>>>> ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 
>>>> MB)
>>>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>>>> 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>>>> ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 
>>>> MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>>>> ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 
>>>> MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>>>> ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 
>>>> MB)
>>>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>>>> ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 
>>>> MB)
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 
>>>> 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 
>>>> 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO TransformedDStream: Slicing from 1454448000000 ms 
>>>> to 1454448300000 ms (aligned to 1454448000000 ms and 1454448300000 ms)
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 
>>>> 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 
>>>> 1454448300000 ms for checkpointing
>>>> 16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 1454448300000 ms
>>>> 16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time 
>>>> 1454448300000 ms
>>>> 16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time 
>>>> 1454448300000 ms
>>>> 16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 
>>>> 1454448300000 ms.0 from job set of time 1454448300000 ms
>>>>
>>>> ...
>>>>
>>>>
>>>> Please, this is really blocking the upgrade process to the latest Spark
>>>> versions, and I don't know how to work around it.
>>>>
>>>> Any help would be very much appreciated.
>>>>
>>>> Thank you,
>>>>
>>>> Roberto
>>>>
>>>>
>>>>
>>>
>>
>
