Hi Ryan,

Perhaps you could share some of your code so we can have a look? One thing I'd check is whether you are using compacted Kafka topics. If so, and if your keys are not unique, log compaction runs automatically in the background and can remove older records with the same key, so a consumer may only see the latest value for each key.
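One quick way to rule compaction in or out is to recount what is actually retained in each topic with a plain consumer. A rough sketch (the topic name is taken as an argument, and treating a single empty poll as end-of-topic is a simplification):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Fresh group id plus "earliest" so we always read from the beginning
        props.put("group.id", "count-check-" + System.currentTimeMillis());
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(args[0]));
            long total = 0;
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(5000);
                if (records.isEmpty()) break; // simplification: assume an empty poll means end of topic
                total += records.count();
            }
            System.out.println("Records currently in " + args[0] + ": " + total);
        }
    }
}

If the count on the input topic is lower than the number of messages you produced, compaction (or retention) removed records before the application ever saw them.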
Thanks
Eno

> On 18 Nov 2016, at 13:49, Ryan Slade <r...@avocet.io> wrote:
>
> Hi
>
> I'm trialling Kafka Streams for a large stream processing job, however
> I'm seeing message loss even in the simplest scenarios.
>
> I've tried to boil it down to the simplest scenario where I see loss,
> which is the following:
> 1. Ingest messages from an input stream (String, String)
> 2. Decode each message from JSON into a custom type
> 3. If successful, send it to a second stream (String, CustomType) and
> update an atomic counter
> 4. Run a foreach on the second stream that updates a second atomic
> counter each time a message arrives
>
> I would expect that, since we have at-least-once guarantees, the second
> stream would see at least as many messages as were sent to it from the
> first, however I consistently see message loss.
>
> I've tested multiple times, sending around 200k messages each run. I
> don't see losses every time, maybe around 1 in 5 runs with the same
> data. The losses are small, around 100 messages, but I would expect
> none.
>
> I'm running version 0.10.1.0 with ZooKeeper, Kafka and the Streams
> application all running on the same machine in order to rule out
> packet loss.
>
> I'm running Ubuntu 16.04 with OpenJDK.
>
> Any advice would be greatly appreciated, as I can't move forward with
> Kafka Streams as a solution if messages are consistently lost between
> streams on the same machine.
>
> Thanks
> Ryan
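For reference, the topology described above would look roughly like the sketch below (an illustration against the 0.10.1 Streams API, not Ryan's actual code: the JSON decoding is a hypothetical stand-in, the topic names are made up, and the custom value type is simplified to String):

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class LossCheckTopology {
    static final AtomicLong sent = new AtomicLong();      // messages successfully decoded and forwarded
    static final AtomicLong received = new AtomicLong();  // messages seen on the second stream

    // Hypothetical stand-in for real JSON parsing; returns null on failure
    static String decode(String json) {
        return json != null && json.startsWith("{") ? json : null;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "loss-check");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();

        // Steps 1-3: ingest (String, String), decode, and on success
        // count and forward to the second topic
        KStream<String, String> input = builder.stream("input-topic");
        input.flatMapValues(value -> {
            String decoded = decode(value);
            if (decoded == null) {
                return Collections.<String>emptyList(); // drop undecodable messages
            }
            sent.incrementAndGet();
            return Collections.singletonList(decoded);
        }).to("decoded-topic");

        // Step 4: count everything that arrives on the second stream
        builder.<String, String>stream("decoded-topic")
               .foreach((key, value) -> received.incrementAndGet());

        // Print both counters on shutdown so the two counts can be compared
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println("sent=" + sent.get() + " received=" + received.get())));

        new KafkaStreams(builder, props).start();
    }
}

With at-least-once semantics, received should end up greater than or equal to sent once both streams have been fully consumed; a smaller value points at records disappearing from the intermediate topic (for example through compaction or retention) rather than at the counting logic itself.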