Hi Ryan,

Perhaps you could share some of your code so we can have a look? One thing
I'd check is whether you are using compacted Kafka topics. If so, and if
your keys are not unique, log compaction runs automatically in the
background and you might only see the latest value for each key.
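
For example, you can check a topic's cleanup policy with the kafka-topics
tool (assuming a local ZooKeeper on the default port; substitute your own
topic name):

    bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic your-input-topic

If the output lists cleanup.policy=compact under Configs, records with
duplicate keys can be compacted away, leaving only the latest value per key.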

Thanks
Eno
> On 18 Nov 2016, at 13:49, Ryan Slade <r...@avocet.io> wrote:
> 
> Hi
> 
> I'm trialling Kafka Streaming for a large stream processing job, however
> I'm seeing message loss even in the simplest scenarios.
> 
> I've tried to boil it down to the simplest scenario in which I see loss,
> which is the following (sketched in code after the list):
> 1. Ingest messages from an input stream (String, String)
> 2. Decode each message from JSON into a custom type
> 3. If successful, send it to a second stream (String, CustomType) and
> update an atomic counter
> 4. A foreach on the second stream updates a second atomic counter each
> time a message arrives
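> 
> Roughly, the topology looks like this (topic names, serdes and the JSON
> decode are simplified placeholders for my real code, which produces a
> CustomType):
> 
>     import java.util.Properties;
>     import java.util.concurrent.atomic.AtomicLong;
> 
>     import org.apache.kafka.common.serialization.Serdes;
>     import org.apache.kafka.streams.KafkaStreams;
>     import org.apache.kafka.streams.StreamsConfig;
>     import org.apache.kafka.streams.kstream.KStream;
>     import org.apache.kafka.streams.kstream.KStreamBuilder;
> 
>     public class LossTest {
>         static final AtomicLong sent = new AtomicLong();
>         static final AtomicLong received = new AtomicLong();
> 
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put(StreamsConfig.APPLICATION_ID_CONFIG, "loss-test");
>             props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>             props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
>                       Serdes.String().getClass().getName());
>             props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
>                       Serdes.String().getClass().getName());
> 
>             KStreamBuilder builder = new KStreamBuilder();
> 
>             // Steps 1-3: ingest (String, String), keep only messages that
>             // decode, count them, and forward them to the second topic.
>             KStream<String, String> input = builder.stream("input-topic");
>             input.filter((k, v) -> decodes(v))
>                  .mapValues(v -> { sent.incrementAndGet(); return v; })
>                  .to("second-topic");
> 
>             // Step 4: count everything arriving on the second topic.
>             builder.<String, String>stream("second-topic")
>                    .foreach((k, v) -> received.incrementAndGet());
> 
>             new KafkaStreams(builder, props).start();
>         }
> 
>         // Placeholder for the real JSON decode into CustomType.
>         static boolean decodes(String v) {
>             return v != null && v.startsWith("{");
>         }
>     }
> 
> Once the input is drained I compare the two counters.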
> 
> Since Kafka Streams provides at-least-once guarantees, I would expect the
> second stream to see at least as many messages as were sent to it from
> the first; however, I repeatedly see message loss.
> 
> I've tested multiple times, sending around 200k messages each run. I
> don't see losses on every run, perhaps 1 run in 5 with the same data.
> The losses are small, around 100 messages, but I would expect none.
> 
> I'm running version 0.10.1.0 with ZooKeeper, Kafka and the streams
> consumer all running on the same machine in order to rule out network
> packet loss.
> 
> I'm running Ubuntu 16.04 with OpenJDK.
> 
> Any advice would be greatly appreciated, as I can't move forward with
> Kafka Streams as a solution if messages are consistently lost between
> streams on the same machine.
> 
> Thanks
> Ryan
