Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-22 Thread Cody Koeninger
Kafka RDDs need to start from a specified offset; you really don't want the executors just starting at whatever offset happened to be latest at the time they ran. If you need a way to figure out the latest offset at the time the driver starts up, you can always use a consumer to read the offsets
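A rough sketch of that suggestion, for illustration only: open a plain Kafka consumer on the driver, read the current end position of each partition, and hand those positions to ConsumerStrategies.Assign so the stream starts there regardless of what is committed for the group id. The object name LatestOffsetsSketch and the helpers latestOffsets and createStream are hypothetical. The sketch assumes String keys and values, and that kafkaParams already carries bootstrap.servers, group.id and the two deserializers. It uses assign/seekToEnd/position because those calls are available in the 0.10.0 client; this is not necessarily how the poster's job is set up.

```scala
import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Hypothetical helper object; names are illustrative, not part of Spark or Kafka.
object LatestOffsetsSketch {

  // Runs on the driver: returns the end position of every partition of `topic`
  // as seen at the moment of the call.
  def latestOffsets(kafkaParams: Map[String, Object], topic: String): Map[TopicPartition, Long] = {
    val consumer = new KafkaConsumer[String, String](kafkaParams.asJava)
    try {
      val partitions = consumer.partitionsFor(topic).asScala
        .map(info => new TopicPartition(info.topic, info.partition))
      // Manual assignment, so no consumer-group rebalance is involved.
      consumer.assign(partitions.asJava)
      // seekToEnd is lazy; position() forces the lookup of the end offset.
      consumer.seekToEnd(partitions.asJava)
      partitions.map(tp => tp -> consumer.position(tp)).toMap
    } finally {
      consumer.close()
    }
  }

  // Builds a direct stream whose starting offsets are fixed once, on the driver,
  // instead of being resolved independently by each executor.
  def createStream(
      ssc: StreamingContext,
      kafkaParams: Map[String, Object],
      topic: String): InputDStream[ConsumerRecord[String, String]] = {
    val fromOffsets = latestOffsets(kafkaParams, topic)
    KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
    )
  }
}
```

Because the offsets are captured once at driver startup, every batch is planned from the same starting point. Anything produced before that point is skipped, which is the trade-off the original question asks for.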

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-21 Thread swetha kasireddy
Hi Cody, I think Assign is used when we want to start from a specified offset. What if we want to start from the latest offset, like the one returned with "auto.offset.reset" -> "latest"? Thanks! On Mon, Aug 21, 2017 at 9:06 AM, Cody Koeninger wrote: >

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-21 Thread Cody Koeninger
Yes, you can start from specified offsets. See ConsumerStrategy, specifically Assign http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store On Tue, Aug 15, 2017 at 1:18 PM, SRK wrote: > Hi, > > How to force Spark Kafka Direct to

How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-15 Thread SRK
Hi, How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10? It seems to be processing from the latest offset stored for a group id. One way to do this is to change the group id. But it would mean that each time that we need to process the job from the