OK, what you are describing is different from accidental duplicate message 
pruning, which is what the idempotent publish feature does.

You are describing a situation where multiple independent messages just happen 
to have the same contents (both key and value).

Removing those messages is an application-specific function, as you can imagine 
applications which would not want independent but identical messages to be 
removed (for example temperature sensor readings, heartbeat messages, or other 
telemetry data that has repeated but independent values).

Your best bet is to write a simple intermediate processor that implements your 
message pruning algorithm of choice and republishes (or not) to another topic 
that your consumers read from. It's a stateful app because it needs to remember 
1 or more past messages, but that can be done using the Kafka Streams Processor 
API and the embedded RocksDB state store that comes with Kafka Streams (or as a 
UDF in KSQL).
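To illustrate the pruning idea, here is a minimal, hypothetical sketch of the core deduplication logic in plain Java. It remembers the last N (key, value) pairs in an in-memory LRU map and drops any message that matches one of them; the class name `DedupFilter` and the capacity bound are my own choices for the example. In a real deployment this same check would sit inside a Kafka Streams processor with a RocksDB state store instead of an in-memory map, so the remembered messages survive restarts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of message pruning: forward a message only if its
// key+value pair has not been seen among the last `capacity` messages.
public class DedupFilter {
    private final Map<String, Boolean> seen;

    public DedupFilter(int capacity) {
        // Access-ordered LinkedHashMap acting as an LRU cache: once the
        // map grows past `capacity`, the least recently seen entry is evicted.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the message should be forwarded (not recently seen),
    // false if it duplicates a remembered key+value pair.
    public boolean shouldForward(String key, String value) {
        String id = key + "\u0000" + value; // composite identity of the message
        return seen.put(id, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DedupFilter filter = new DedupFilter(100);
        System.out.println(filter.shouldForward("key1", "value1")); // first time: true
        System.out.println(filter.shouldForward("key1", "value1")); // duplicate: false
        System.out.println(filter.shouldForward("key1", "value2")); // new value: true
    }
}
```

How far back the filter remembers (the capacity, or a time window in the Streams version) is exactly the application-specific choice described above.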

You can alternatively write your consuming apps to implement similar message 
pruning functionality themselves and avoid one extra component in the 
end-to-end architecture.

-hans

> On Apr 2, 2019, at 7:28 PM, jim.me...@concept-solutions.com 
> <jim.me...@concept-solutions.com> wrote:
> 
> 
> 
>> On 2019/04/02 22:43:31, jim.me...@concept-solutions.com 
>> <jim.me...@concept-solutions.com> wrote: 
>> 
>> 
>>> On 2019/04/02 22:25:16, jim.me...@concept-solutions.com 
>>> <jim.me...@concept-solutions.com> wrote: 
>>> 
>>> 
>>>> On 2019/04/02 21:59:21, Hans Jespersen <h...@confluent.io> wrote: 
>>>> yes. Idempotent publish uses a unique messageID to discard potential 
>>>> duplicate messages caused by failure conditions when  publishing.
>>>> 
>>>> -hans  
>>>> 
>>>>> On Apr 1, 2019, at 9:49 PM, jim.me...@concept-solutions.com 
>>>>> <jim.me...@concept-solutions.com> wrote:
>>>>> 
>>>>> Does Kafka have something that behaves like a unique key so a producer 
>>>>> can’t write the same value to a topic twice?
>>> 
>>> Hi Hans,
>>> 
>>>    Is there some documentation or an example with source code where I can 
>>> learn more about this feature and how it is implemented?
>>> 
>>> Thanks,
>>> Jim
>> 
>> By the way I tried this...
>> echo "key1:value1" | ~/kafka/bin/kafka-console-producer.sh --broker-list 
>> localhost:9092 --topic TestTopic --property "parse.key=true" --property 
>> "key.separator=:" --property "enable.idempotence=true" > /dev/null
>> 
>> And... that didn't seem to do the trick - after running that command 
>> multiple times I did receive key1 value1 for as many times as I had run the 
>> prior command.
>> 
>> Maybe it is the way I am setting the flags...
>> Recently I saw that someone did this...
>> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test 
>> --producer-property enable.idempotence=true --request-required-acks -1
> 
> Also... the reason for my question is that we are going to have two JMS 
> topics with nearly redundant data in them have the UNION written to Kafka for 
> further processing.
> 
