Yes, the messages in memory would be lost. But say another instance
of the Kafka Spout starts on another worker: it would start a new Kafka
consumer process, and so it could read from the last committed offset.
Since no offset was committed for the lost in-memory messages when the
1st worker crashed, those same messages should be read again by the new
spout instance, no?

I don't know if that's what happens with Mattijs' implementation (I
scanned through it but didn't look at it in detail), but couldn't that
behavior be possible?
That should be exactly what happens: the spout would not have committed offsets for messages in memory that were not ack'd (or that failed, depending on the FailHandler in use). So when it crashes and another spout instance picks up, the messages 'lost in memory' will be replayed by the new instance.
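
To illustrate the commit-on-ack principle (not necessarily how Mattijs' spout is structured internally), here is a minimal sketch using the newer Java consumer API, which lets you commit an explicit per-partition offset. The broker address, group id, and topic name are placeholders, and process() stands in for emitting into the topology and waiting for the ack:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class CommitOnAck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "spout-group");             // placeholder
            props.put("enable.auto.commit", "false");         // never commit on a timer
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events")); // placeholder topic
                while (true) {
                    for (ConsumerRecord<String, String> rec :
                            consumer.poll(Duration.ofMillis(500))) {
                        process(rec); // stands in for emit + downstream ack
                        // Commit only after the message is fully processed. If the
                        // worker dies between process() and commitSync(), a new
                        // instance resumes from the last committed offset and the
                        // message is replayed: at-least-once delivery.
                        consumer.commitSync(Collections.singletonMap(
                                new TopicPartition(rec.topic(), rec.partition()),
                                new OffsetAndMetadata(rec.offset() + 1)));
                    }
                }
            }
        }

        private static void process(ConsumerRecord<String, String> rec) {
            System.out.println(rec.value());
        }
    }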

This requires the spout to *not* automatically commit offsets, as the Kafka consumer does by default (Kafka setting auto.commit.enable). That setting got lost somewhere in a recent cleanup; I just committed a fix for this.
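
For reference, with the 0.8-era high-level consumer the relevant bit is just that one config entry. A rough sketch (the ZooKeeper address and group id are placeholders):

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class SpoutConsumerFactory {
        public static ConsumerConnector create() {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "storm-spout");             // placeholder
            // By default the high-level consumer commits offsets on a timer
            // (auto.commit.interval.ms), which would mark in-memory messages
            // as consumed before the topology has ack'd them. Turn that off:
            props.put("auto.commit.enable", "false");
            return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        }
    }

With auto-commit disabled, the spout would consume the streams and call connector.commitOffsets() only once messages have been ack'd.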
