Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-24 Thread Cody Koeninger
If you haven't looked at the offset ranges in the logs for the time period in question, I'd start there.
On Jan 24, 2017 2:51 PM, "Hakan İlter" wrote:
> Sorry for the misunderstanding. When I said that, I meant there is no lag in the consumer. Kafka Manager shows each consumer's coverage and lag status.

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-24 Thread Hakan İlter
Sorry for the misunderstanding. When I said that, I meant there is no lag in the consumer. Kafka Manager shows each consumer's coverage and lag status.
On Tue, Jan 24, 2017 at 10:45 PM, Cody Koeninger wrote:
> When you said "I check the offset ranges from Kafka Manager and don't
> see any significant

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-24 Thread Cody Koeninger
When you said "I check the offset ranges from Kafka Manager and don't see any significant deltas", what were you comparing them against? The offset ranges printed in the Spark logs?
On Tue, Jan 24, 2017 at 2:11 PM, Hakan İlter wrote:
> First of all, I can both see the "Input Rate" from Spark job's s
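
The comparison asked about here can be sketched in a few lines. This is only an illustration, not anything posted in the thread: it assumes you have copied the last untilOffset per topic-partition out of the Spark logs and the latest offset per topic-partition out of Kafka Manager by hand. The function name, the topic names, and the numbers are all hypothetical.

```python
def unread_per_partition(spark_until, kafka_latest):
    """Return {(topic, partition): unread message count, or None}.

    spark_until  -- last untilOffset (exclusive) seen in the Spark logs
    kafka_latest -- latest offset reported for the same topic-partition
    None means the partition never showed up in the Spark logs at all.
    """
    report = {}
    for key, latest in kafka_latest.items():
        until = spark_until.get(key)
        report[key] = None if until is None else max(latest - until, 0)
    return report


# Hypothetical numbers, for illustration only.
spark_until = {("events", 0): 100, ("clicks", 0): 40}
kafka_latest = {("events", 0): 100, ("clicks", 0): 55, ("audit", 0): 10}

for key, unread in sorted(unread_per_partition(spark_until, kafka_latest).items()):
    print(key, "never seen in Spark logs" if unread is None else unread)
```

A partition that Kafka Manager lists but that never appears in the Spark logs would suggest the stream is not reading that topic at all, which is a different problem from lag.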

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-24 Thread Hakan İlter
First of all, I can see both the "Input Rate" on the Spark job's statistics page and the Kafka producer messages/sec in Kafka Manager. The numbers differ when I have the problem; normally they are very close. Besides, the job is an ETL job; it writes the results to Elasticsearch. An ano

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-24 Thread Cody Koeninger
I'm confused. If you don't see any difference between the offsets the job is processing and the offsets available in Kafka, then how do you know it's processing less than all of the data?
On Tue, Jan 24, 2017 at 12:35 AM, Hakan İlter wrote:
> I'm using DirectStream as one stream for all topics. I

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-23 Thread Hakan İlter
I'm using DirectStream as one stream for all topics. I check the offset ranges from Kafka Manager and don't see any significant deltas.
On Tue, Jan 24, 2017 at 4:42 AM, Cody Koeninger wrote:
> Are you using receiver-based or direct stream?
>
> Are you doing 1 stream per topic, or 1 stream for al

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-23 Thread Cody Koeninger
Are you using receiver-based or direct stream? Are you doing 1 stream per topic, or 1 stream for all topics? If you're using the direct stream, the actual topics and offset ranges should be visible in the logs, so you should be able to see more detail about what's happening (e.g. all topics are s
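
One way to use the offset ranges from the logs is to check that they are contiguous from batch to batch. The sketch below is an assumption about how one might do that by hand, not something from the thread. It relies on the fact that an offset range in the direct stream has an inclusive fromOffset and an exclusive untilOffset, so each batch should start where the previous one ended for the same topic-partition. The function name and the sample data are hypothetical.

```python
def find_offset_problems(batches):
    """Report gaps or overlaps between consecutive batches.

    batches -- list of batches in processing order; each batch is a list of
               (topic, partition, from_offset, until_offset) tuples
    Returns a list of
    (batch_index, topic, partition, kind, expected_from, actual_from).
    """
    last_until = {}
    problems = []
    for batch_index, ranges in enumerate(batches):
        for topic, partition, from_offset, until_offset in ranges:
            key = (topic, partition)
            expected = last_until.get(key)
            if expected is not None and from_offset != expected:
                # Starting later than expected skips data; earlier re-reads it.
                kind = "gap" if from_offset > expected else "overlap"
                problems.append(
                    (batch_index, topic, partition, kind, expected, from_offset)
                )
            last_until[key] = until_offset
    return problems


# Hypothetical offset ranges copied from two consecutive batches.
batches = [
    [("events", 0, 0, 10), ("clicks", 0, 0, 5)],
    [("events", 0, 10, 20), ("clicks", 0, 7, 9)],
]
for problem in find_offset_problems(batches):
    print(problem)
```

A "gap" entry would mean some offsets were never part of any batch, which is the kind of detail that consumer lag figures alone do not show.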