Re: Spark Streaming Checkpoint and Exactly Once Guarantee on Kafka Direct Stream

2017-06-06 Thread Tathagata Das
s
17/06/05 13:42:31 INFO JobGenerator: Batches pending processing (0 batches):
17/06/05 13:42:31 INFO JobGenerator: Batches to reschedule (10 batches): 1496684280000 ms, 1496684310000 ms, 1496684340000 ms, 1496684370000 ms, 1496

Re: Spark Streaming Checkpoint and Exactly Once Guarantee on Kafka Direct Stream

2017-06-06 Thread ALunar Beach
8 ms, 1496684310000 ms, 1496684340000 ms, 1496684370000 ms, 1496684400000 ms, 1496684430000 ms, 1496684460000 ms, 1496684490000 ms, 1496684520000 ms, 1496684550000 ms
17/06/05 13:42:31 INFO JobScheduler: Added jobs for time 1496684280000 ms
17/06/05 13:42:31 IN

Re: Spark Streaming Checkpoint and Exactly Once Guarantee on Kafka Direct Stream

2017-06-06 Thread Tathagata Das
49668452 ms, 1496684550000 ms
17/06/05 13:42:31 INFO JobScheduler: Added jobs for time 1496684280000 ms
17/06/05 13:42:31 INFO JobScheduler: Starting job streaming job 1496684280000 ms.0 from job set of time 1496684280000 ms

Fwd: Spark Streaming Checkpoint and Exactly Once Guarantee on Kafka Direct Stream

2017-06-05 Thread anbucheeralan
-Spark-Streaming-Checkpoint-and-Exactly-Once-Guarantee-on-Kafka-Direct-Stream-tp28743.html

Spark Streaming Checkpoint and Exactly Once Guarantee on Kafka Direct Stream

2017-06-05 Thread ALunar Beach
I am using Spark Streaming checkpointing with the Kafka direct stream. The job uses a 30-second batch duration, and each batch normally completes in 15-20 seconds. If the Spark application fails after a batch has completed successfully (1496684280000 ms in the log below) and then restarts, it processes that last batch again, so its output is duplicated.
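
To make the scenario concrete, here is a minimal sketch in plain Python (no Spark or Kafka involved) of what a replayed batch does to two kinds of sink. It is an illustration under assumptions, not a diagnosis of this job: the `AppendSink` and `IdempotentSink` classes, the `run_with_replay` helper, and the sample records are all hypothetical. The only values taken from the post are the 30-second batch duration and the batch time 1496684280000 ms. The Spark Streaming documentation describes output operations as at-least-once, and writing output keyed by batch time is one commonly suggested way to make a replay harmless.

```python
from typing import Dict, List, Tuple

BATCH_INTERVAL_MS = 30_000          # 30-second batch duration, as in the post
BATCH_TIME_MS = 1_496_684_280_000   # batch time taken from the log


class AppendSink:
    """Hypothetical naive sink: every write appends, so a replay is stored twice."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, str]] = []

    def write(self, batch_time_ms: int, records: List[str]) -> None:
        self.rows.extend((batch_time_ms, r) for r in records)

    def row_count(self) -> int:
        return len(self.rows)


class IdempotentSink:
    """Hypothetical sink keyed by batch time: a replay overwrites the earlier copy."""

    def __init__(self) -> None:
        self.batches: Dict[int, List[str]] = {}

    def write(self, batch_time_ms: int, records: List[str]) -> None:
        self.batches[batch_time_ms] = list(records)

    def row_count(self) -> int:
        return sum(len(records) for records in self.batches.values())


def run_with_replay(sink, batch_time_ms: int, records: List[str]) -> int:
    """Write a batch, then write it again as a restart from a checkpoint might."""
    sink.write(batch_time_ms, records)  # original run: the batch completes
    sink.write(batch_time_ms, records)  # replay of the same batch after restart
    return sink.row_count()


records = ["event-a", "event-b"]  # made-up sample records
append_count = run_with_replay(AppendSink(), BATCH_TIME_MS, records)
idempotent_count = run_with_replay(IdempotentSink(), BATCH_TIME_MS, records)
print(append_count, idempotent_count)
```

With the append-only sink the replayed batch doubles the stored rows, while the sink keyed by batch time ends up with a single copy. Whether that approach fits depends on the real output store, which the post does not describe.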