Hello everybody,

I am trying to understand how the Kafka Direct Stream works. I'm interested in
having a production-ready Spark Streaming application that consumes a Kafka
topic, but I need to guarantee (almost) no downtime, especially during deploys
(and submits) of new versions. What seems to be the best solution is to deploy
and submit the new version without shutting down the previous one, wait for the
new application to start consuming events, and then shut down the previous one.
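
The part I'm least sure about is the hand-over itself; what I have in mind for
stopping the previous version is a graceful shutdown so its in-flight batches
finish first, roughly like this (just a sketch, the helper name is mine):

    import org.apache.spark.streaming.StreamingContext

    // Sketch: shut down the old application only after its received data
    // has been processed, so the hand-over doesn't drop in-flight batches.
    def stopOldVersion(ssc: StreamingContext): Unit = {
      // stopSparkContext = true tears the whole application down,
      // stopGracefully = true waits for pending batches to complete first.
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }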

What I would expect is that the events get distributed between the two
applications in a balanced fashion via the consumer group id, split by the
partition key that I've previously set in my Kafka producer. However, I don't
see that the Kafka Direct Stream supports this functionality.
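
For reference, this is roughly how I create the direct stream (a sketch; the
broker list, group id and topic name are placeholders), and as far as I
understand the group.id here is only metadata, it is not used to rebalance
partitions between two running applications:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sparkConf = new SparkConf().setAppName("my-streaming-app") // placeholder name
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Direct stream: Spark itself decides which offset ranges each batch
    // reads, so there is no consumer-group rebalancing between applications.
    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092,broker2:9092", // placeholder
      "group.id"             -> "my-streaming-app"           // placeholder
    )

    val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic")) // placeholder topic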

I've achieved this with the receiver-based approach (by the way, I used "kafka"
for the "offsets.storage" Kafka property [2]). However, this approach comes
with the technical difficulties described in the documentation [1] (i.e. around
exactly-once semantics).
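
Concretely, the receiver-based setup I ended up with looks roughly like this
(again a sketch; the ZooKeeper quorum, group id and topic are placeholders),
and here the old high-level consumer does rebalance partitions across the two
applications sharing the group id:

    import kafka.serializer.StringDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Receiver-based stream: the high-level consumer handles group
    // rebalancing, so two applications with the same group.id split the
    // topic's partitions between them.
    val receiverParams = Map(
      "zookeeper.connect"   -> "zk1:2181,zk2:2181", // placeholder
      "group.id"            -> "my-streaming-app",  // placeholder
      "offsets.storage"     -> "kafka",             // commit offsets to Kafka [2]
      "dual.commit.enabled" -> "false"
    )

    // Reuses the StreamingContext `ssc` from the direct-stream sketch above.
    val receiverStream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, receiverParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)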

Anyway, not even this approach seems very failsafe. Does anyone know a way to
safely deploy new versions of a streaming application of this kind without
downtime?

Thanks in advance

Mariano


[1] http://spark.apache.org/docs/latest/streaming-kafka-integration.html
[2] http://kafka.apache.org/documentation.html#oldconsumerconfigs
