I've done this using Kafka Streams: specifically, I created a processor
and used a key-value state store (a feature of Streams) to save/check for
keys, forwarding only messages whose keys were not already in the store.
Since the state store is in memory, and backed by the local filesystem on
the node where the processor is running, you avoid the network lag you'd
have using an external store like Cassandra.  I think you'll have to use a
similar approach to dedupe -- you don't necessarily need to use Streams;
you can handle it directly in your consumer, but then you'll have to solve
a lot of problems Streams already handles, such as what happens if your
node is shut down or crashes.
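
As a rough sketch (the class name, the "dedup-store" name, and the String
key/value types here are just assumptions; the store has to be registered
on the topology when you build it), the processor can look something like
this with the Processor API:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class DedupProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, String> seenKeys;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // "dedup-store" must be added to the topology and connected to this
        // processor; Streams keeps it on the local node and backs it with a
        // changelog topic so it can be rebuilt after a restart or crash.
        this.seenKeys = (KeyValueStore<String, String>) context.getStateStore("dedup-store");
    }

    @Override
    public void process(String key, String value) {
        if (seenKeys.get(key) == null) {    // first time we see this key
            seenKeys.put(key, value);       // remember it
            context.forward(key, value);    // pass it downstream
        }
        // otherwise it's a duplicate and we drop it
    }

    @Override
    public void close() { }
}

The store itself can be created with
Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("dedup-store"),
Serdes.String(), Serdes.String()) and attached to the processor node when
the topology is built.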

On Wed, Apr 3, 2019 at 9:22 AM Vincent Maurin <vincent.maurin...@gmail.com>
wrote:

> Hi,
>
> The idempotence flag guarantees that the message is produced exactly once
> on the topic, i.e. that running your command a single time will produce
> a single message.
> It is not a uniqueness constraint on the message key; there is no such
> thing in Kafka.
>
> In Kafka, a topic contains the "history" of values for a given key. That
> means a consumer needs to consume the whole topic and keep only the
> last value for a given key.
> So uniqueness is meant to be handled on the consumer side.
> In addition, Kafka can perform log compaction to keep only the last value
> and preserve disk space (but consumers may still receive duplicates).
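>
> For illustration, a minimal consumer-side sketch of "keep only the last
> value per key" (the topic name, group id, and String deserializers here
> are just assumptions):
>
> import java.time.Duration;
> import java.util.Collections;
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.ConsumerRecord;
> import org.apache.kafka.clients.consumer.ConsumerRecords;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
>
> public class LatestValuePerKey {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "localhost:9092");
>         props.put("group.id", "latest-value-reader");  // arbitrary group id
>         props.put("auto.offset.reset", "earliest");    // read the whole topic
>         props.put("key.deserializer",
>             "org.apache.kafka.common.serialization.StringDeserializer");
>         props.put("value.deserializer",
>             "org.apache.kafka.common.serialization.StringDeserializer");
>
>         // Later records overwrite earlier ones, so the map ends up holding
>         // the last value seen for each key.
>         Map<String, String> latest = new HashMap<>();
>         try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>             consumer.subscribe(Collections.singletonList("TestTopic"));
>             for (int i = 0; i < 10; i++) {
>                 ConsumerRecords<String, String> records =
>                     consumer.poll(Duration.ofSeconds(1));
>                 for (ConsumerRecord<String, String> record : records) {
>                     latest.put(record.key(), record.value());
>                 }
>             }
>         }
>         latest.forEach((k, v) -> System.out.println(k + " -> " + v));
>     }
> }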
>
>
> Best
>
> On Wed, Apr 3, 2019 at 1:28 AM jim.me...@concept-solutions.com <
> jim.me...@concept-solutions.com> wrote:
>
> >
> >
> > On 2019/04/02 22:43:31, jim.me...@concept-solutions.com <
> > jim.me...@concept-solutions.com> wrote:
> > >
> > >
> > > On 2019/04/02 22:25:16, jim.me...@concept-solutions.com <
> > > jim.me...@concept-solutions.com> wrote:
> > > >
> > > >
> > > > On 2019/04/02 21:59:21, Hans Jespersen <h...@confluent.io> wrote:
> > > > > yes. Idempotent publish uses a unique messageID to discard
> > > > > potential duplicate messages caused by failure conditions when
> > > > > publishing.
> > > > >
> > > > > -hans
> > > > >
> > > > > > On Apr 1, 2019, at 9:49 PM, jim.me...@concept-solutions.com <
> > > > > > jim.me...@concept-solutions.com> wrote:
> > > > > >
> > > > > > Does Kafka have something that behaves like a unique key so a
> > > > > > producer can’t write the same value to a topic twice?
> > > > >
> > > >
> > > > Hi Hans,
> > > >
> > > >     Is there some documentation or an example with source code where
> > > > I can learn more about this feature and how it is implemented?
> > > >
> > > > Thanks,
> > > > Jim
> > > >
> > >
> > > By the way I tried this...
> > >  echo "key1:value1" | ~/kafka/bin/kafka-console-producer.sh
> > > --broker-list localhost:9092 --topic TestTopic --property "parse.key=true"
> > > --property "key.separator=:" --property "enable.idempotence=true" > /dev/null
> > >
> > > And... that didn't seem to do the trick - after running that command
> > > multiple times I received key1 value1 as many times as I had run the
> > > prior command.
> > >
> > > Maybe it is the way I am setting the flags...
> > > Recently I saw that someone did this...
> > > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> > > --producer-property enable.idempotence=true --request-required-acks -1
> > >
> >
> > Also... the reason for my question is that we are going to have two JMS
> > topics with nearly redundant data in them, and have their UNION written
> > to Kafka for further processing.
> >
> >
>
