Kafka RDDs need to start from a specified offset; you really don't
want the executors just starting at whatever offset happened to be
latest at the time they ran.
If you need a way to figure out the latest offset at the time the
driver starts up, you can always use a consumer to read the offsets
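
A minimal sketch of the consumer approach mentioned above, run once on the driver. It assumes String keys and values, and that kafkaParams already holds "bootstrap.servers", the deserializers, and a "group.id"; the object and method names are made up for illustration.

```scala
import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object LatestOffsets {
  // Hypothetical helper: returns the end offset of every partition of `topic`
  // as seen at the moment the driver calls it.
  def latestOffsets(kafkaParams: Map[String, Object],
                    topic: String): Map[TopicPartition, Long] = {
    val consumer = new KafkaConsumer[String, String](kafkaParams.asJava)
    try {
      // Discover the topic's partitions from cluster metadata.
      val partitions = consumer.partitionsFor(topic).asScala
        .map(info => new TopicPartition(info.topic, info.partition))
      // Manually assign them (no group rebalance) and move to the end of each.
      consumer.assign(partitions.asJava)
      consumer.seekToEnd(partitions.asJava)
      // position() resolves the lazy seek and returns the next offset to read.
      partitions.map(tp => tp -> consumer.position(tp)).toMap
    } finally {
      consumer.close()
    }
  }
}
```

The resulting map could then be passed as the offsets argument of ConsumerStrategies.Assign, so every executor starts from the same offsets that were fixed on the driver.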
Hi Cody,
I think Assign is used if we want it to start from a specified offset.
What if we want it to start from the latest offset, like the one we would
get with "auto.offset.reset" -> "latest"?
Thanks!
On Mon, Aug 21, 2017 at 9:06 AM, Cody Koeninger wrote:
Yes, you can start from specified offsets. See ConsumerStrategy,
specifically Assign
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store
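
A rough sketch of what Assign with explicit starting offsets might look like, following the pattern in the linked integration guide. The broker address, topic name, group id, and offset values are placeholders, not taken from this thread.

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object AssignFromOffsets {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("assign-from-offsets")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder connection settings; adjust for the real cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Placeholder starting offsets, one entry per partition to consume.
    val fromOffsets = Map(
      new TopicPartition("example-topic", 0) -> 1000L,
      new TopicPartition("example-topic", 1) -> 2000L
    )

    // Assign pins the stream to exactly these partitions and starting offsets.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
    )

    stream.map(record => (record.key, record.value)).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```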
On Tue, Aug 15, 2017 at 1:18 PM, SRK wrote:
Hi,
How can I force Spark Kafka Direct to start from the latest offset when the lag
is huge in Kafka 0.10? It seems to be processing from the latest offset stored
for the group id. One way to do this is to change the group id. But it would
mean that each time that we need to process the job from the