You don't need to run the Kafka processor only on the primary node. The
Kafka client will take care of doing the proper partition-consumer
assignment across the nodes and threads in the NiFi cluster.
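
As a rough illustration of why this is safe, here is a toy sketch of the
property a consumer group relies on: each partition has exactly one owner at
a time. This is not Kafka's actual assignor; the function name
`assign_partitions` and the node names are made up for the example.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment: every partition gets exactly one owner.

    Hypothetical helper for illustration only, not the Kafka client's algorithm.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical three-node NiFi cluster reading a six-partition topic.
nodes = ["nifi-node-1", "nifi-node-2", "nifi-node-3"]
result = assign_partitions(range(6), nodes)
```

Since no partition shows up under two nodes, running the consumer on every
node should not by itself cause the same record to be read twice.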

We're aware of many NiFi users consuming data from Kafka at very large
scale without any duplication issues.

As mentioned before, I'd recommend looking at the provenance data for all
generated flow files to really understand where there may be duplication.
I'd also look at the Kafka logs and search for rebalancing / reassignment
for consumers / partitions.
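
If it helps to narrow things down first, a minimal sketch like the following
could flag repeated payloads in a dump of a topic. It assumes you have already
exported the records as (offset, payload) pairs and that payloads are unique
at the source; `find_duplicates` is a hypothetical helper, not part of NiFi or
Kafka.

```python
from collections import defaultdict


def find_duplicates(records):
    """Return {payload: [offsets]} for every payload seen more than once.

    `records` is an iterable of (offset, payload) pairs. Assumes identical
    payloads are never legitimately produced twice by the source.
    """
    offsets_by_payload = defaultdict(list)
    for offset, payload in records:
        offsets_by_payload[payload].append(offset)
    return {
        payload: offsets
        for payload, offsets in offsets_by_payload.items()
        if len(offsets) > 1
    }


# Made-up sample data: the payload at offset 0 reappears at offset 2.
sample = [(0, '{"id": 1}'), (1, '{"id": 2}'), (2, '{"id": 1}')]
dupes = find_duplicates(sample)
```

The offsets it reports can then be matched against provenance events to see
where the second copy came from.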

Pierre

On Sat, Nov 19, 2022 at 19:33, Joe Obernberger <joseph.obernber...@gmail.com>
wrote:

> Are you by chance using a clustered NiFi?  I'm seeing duplicate messages
> if I run the consumer on multiple NiFi nodes, so I've started running the
> consumer only on the parent.  This seems to correct the issue, but leads to
> other problems.  I'd love a solution.
>
> -Joe
> On 11/16/2022 3:50 AM, Aian Cantabrana wrote:
>
> Hi Joe,
>
> Thanks for the reply. The actual flow sends data from the ConsumeAMQP
> processor to two different PublishKafka processors, one with idempotence
> enabled and the other with the default config. Each one sends the same data
> to its own topic, and comparing the two topics is how I check for
> duplicates. They seem to appear at random: sometimes in the "normal"
> processor's topic and sometimes in the "idempotence" one. I have not found
> any pattern.
>
> I will upgrade to NiFi 1.18.0 and try again.
>
> In any case, the messages are in JSON format (one JSON per flowfile), but
> since I am sending and storing them in Kafka as plain text, I am using the
> *non-record-oriented* Kafka publisher. Is PublishKafkaRecord more
> reliable? Would it be better to use it?
>
> Thanks,
>
> Aian
>
> ------------------------------
> *From: *"Joe Witt" <joe.w...@gmail.com>
> *To: *"users" <users@nifi.apache.org>
> *Sent: *Tuesday, November 15, 2022 17:31:54
> *Subject: *Re: Exacly once from NiFi to Kafka
>
> Aian,
> How can you tell there are duplicates in Kafka and are you certain that no
> duplicates exist in the source topic?
>
> Given NiFi's data provenance capabilities, you should be able to pinpoint
> a given duplicate and figure out whether it happened at the source, in
> NiFi, or elsewhere.
>
> Note that much has changed and improved since the 1.12.x line of NiFi, so
> we now have more Kafka components and record-oriented mechanisms. Still,
> I'm pretty sure that even in your version we should not be duplicating data
> unless the flow is configured in a way that causes it.
>
> Thanks
>
> On Tue, Nov 15, 2022 at 9:25 AM Aian Cantabrana <acantabr...@zylk.net>
> wrote:
>
>> Hi,
>>
>> I am having some difficulties trying to get *exactly-once* semantics
>> while ensuring data order from NiFi to Kafka. I have read the Kafka
>> documentation, and it should be quite straightforward using an idempotent
>> producer from NiFi and a Kafka topic with a single partition, but I
>> am still getting some duplicated messages in Kafka.
>>
>> NiFi version: 1.12.1
>> Kafka version: 2.7.0
>>
>> NiFi flow:
>> (Both shown queues with FIFO prioritizer)
>>
>> PublishKafka_2_6 configuration:
>>
>> As I said, the target Kafka topic has just one partition to ensure data order.
>>
>> Incoming flowfiles are small, 60-byte messages.
>>
>> I have been working on this for a while, so any suggestions are very welcome.
>>
>> Thanks in advance,
>>
>> Aian
>>
>
