Joe,

Ok, understood. Since you are confident that the messages are being fully 
consumed in both cases (original flow and duplicate flow), how can messages 
end up not reaching the PutMongoRecord processor in the original dataflow 
if the logic is the same in both dataflows (same processors and same 
connections)?
From: Joe Witt <[email protected]>
Sent: Thursday, August 15, 2024 11:29 AM
To: [email protected]
Subject: Re: [EXTERNAL] Re: ConsumeKafka_2_6 Processor issue

CAUTION: The e-mail below is from an external source. Please exercise caution 
before opening attachments, clicking links, or following guidance.

Got it - yep, focusing on the flow design itself is where I'd go for now.  
Consider every config that could, in the case of errors or unexpected data, 
allow things to leave the flow.  Provenance data can be valuable in helping 
review this.  Add steps to the flow that extract metadata that provenance 
can index, and then when you find a missing event you can likely search for it.  
There are a lot of techniques to help hunt such things down, but in the vast 
majority of cases the cause is a relationship routed to auto-terminate, or some 
other configuration that fails to ensure the data reaches its destination.
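As a concrete illustration of the provenance search suggested above: NiFi's REST API accepts a provenance query as a JSON body (POST to /nifi-api/provenance). The payload builder below is only a sketch; the attribute name "eventid" and the exact searchTerms schema are assumptions to verify against your NiFi version's REST API documentation.

```python
import json

def build_provenance_query(attribute, value, max_results=100):
    """Build a NiFi provenance query payload (assumed schema; verify
    against your NiFi version's /nifi-api/provenance documentation)."""
    return {
        "provenance": {
            "request": {
                "maxResults": max_results,
                "searchTerms": {
                    # Hypothetical indexed attribute; use whatever metadata
                    # your flow extracts (e.g. eventid/transactionid).
                    attribute: {"value": value},
                },
            }
        }
    }

# Example: search for a transaction a customer says went missing.
payload = build_provenance_query("eventid", "abc-123", max_results=50)
print(json.dumps(payload, indent=2))
```

You would POST this payload to the provenance endpoint and then poll the returned query for results; the point is that once the flow stamps each flowfile with a searchable attribute, a "missing" event can be traced to the exact relationship it took.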

Thanks

On Thu, Aug 15, 2024 at 9:20 AM Chirthani, Deepak Reddy 
<[email protected]> wrote:
Hi Joe,

Both the original and the duplicate flows are the same except for two changes: 
the Kafka group id and the target Mongo collection name. Each processor in the 
dataflow has every relationship other than success connected to a funnel. 
Therefore, I should see data if any processor is routing to a relationship 
other than success, and I don’t see that in either flow.

I agree with you that it's generally unlikely there is data loss between Kafka 
and NiFi, but in this case I can clearly see it by querying both collections 
with the same eventid/transactionid.
From: Joe Witt <[email protected]>
Sent: Thursday, August 15, 2024 11:11 AM
To: [email protected]<mailto:[email protected]>
Subject: [EXTERNAL] Re: ConsumeKafka_2_6 Processor issue


Hello

The most likely scenario at play here is that the configuration of the flow 
results in certain messages/events/flowfiles being routed to a failure path, or 
some path that does not end up in Mongo.  It is highly unlikely there is loss 
between Kafka and NiFi or between NiFi and Mongo.  The more likely scenario is 
a configuration within the NiFi flow that directs certain data, under certain 
conditions, to be thrown out.

Have you reviewed every possible relationship and how it is handled in the flow?

Thanks

On Thu, Aug 15, 2024 at 8:56 AM Chirthani, Deepak Reddy 
<[email protected]> wrote:
Hi guys,


I have a dataflow in a NiFi 3-node clustered environment reading from a Kafka 
topic and writing to a MongoDB collection, target1. We are not filtering any 
messages in the dataflow. The group id for this consumer is test1, and this 
consumer has been active for two years.

For at least a month, business customers have been reporting to us that data 
they are sure they published is missing from the target. The Kafka team even 
helped us find the message(s) on the Kafka brokers that the customers were sure 
they had published, so it is evident that the consumer is not picking them up.

Now, I set up a new dataflow in NiFi by duplicating the original dataflow, with 
two differences: a new group id, test2, for reading messages, and a new target 
collection, target2, for writing the data. This duplicate dataflow is consuming 
the expected number of messages.



Next, I changed the group id in the original dataflow from test1 to newgrouped; 
the rest of the ConsumeKafka processor configuration remains the same, 
including the offset reset, which is latest. Both the original and duplicate 
dataflows have been running for quite some time, but the issue still exists 
with the original dataflow. The duplicate dataflow keeps doing well: consuming 
the expected number of messages, parsing them, and loading the processed data 
to the target.
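One behavior worth noting about the group-id change described above: with auto.offset.reset set to latest, a brand-new group id has no committed offsets, so the consumer starts at the end of the log and never sees messages published before it first joined. A minimal sketch of that rule (a simplified single-partition model, not Kafka's actual code):

```python
def starting_offset(committed_offset, auto_offset_reset, log_end_offset):
    """Simplified model of where a Kafka consumer begins reading a
    partition. A new group id has no committed offset, so the
    auto.offset.reset policy decides the starting point."""
    if committed_offset is not None:
        return committed_offset      # resume where the group left off
    if auto_offset_reset == "latest":
        return log_end_offset        # skip everything already on the topic
    if auto_offset_reset == "earliest":
        return 0                     # replay the partition from the start
    raise ValueError("no committed offset and auto.offset.reset=none")

# A new group id (e.g. "newgrouped") with reset=latest begins at the log end,
# so the existing backlog is skipped; a committed offset always wins.
print(starting_offset(None, "latest", 5000))
print(starting_offset(4200, "latest", 5000))
```

This does not by itself explain ongoing misses after the new group is caught up, but it does mean any messages published before the renamed group first connected were never eligible for consumption.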



Please advise what could be the issue and how to resolve this.



Additional note:

NiFi version: 1.21.0

Number of concurrent threads on the ConsumeKafka processor: 2

Number of Kafka partitions on the topic: 5
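A side note on the thread/partition numbers above: 3 nodes x 2 concurrent tasks gives 6 consumers in the group, but the topic has only 5 partitions, so one consumer is always idle. A rough round-robin sketch (a simplification of the real Kafka assignor) makes this visible:

```python
def assign_partitions(num_partitions, consumers):
    """Distribute partitions over consumers round-robin (a simplified
    stand-in for Kafka's group assignor). With more consumers than
    partitions, the surplus consumers receive nothing."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 3 NiFi nodes x 2 concurrent tasks = 6 consumers, but only 5 partitions.
consumers = [f"node{n}-task{t}" for n in range(1, 4) for t in range(1, 3)]
for consumer, parts in assign_partitions(5, consumers).items():
    print(consumer, parts)
# One of the six consumers ends up with no partitions assigned.
```

An idle consumer is harmless in itself, but it is worth keeping consumer count at or below the partition count when sizing concurrent tasks.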



Thanks



The contents of this e-mail message and any attachments are intended solely for 
the addressee(s) and may contain confidential and/or legally privileged 
information. If you are not the intended recipient of this message or if this 
message has been addressed to you in error, please immediately alert the sender 
by reply e-mail and then delete this message and any attachments. If you are 
not the intended recipient, you are notified that any use, dissemination, 
distribution, copying, or storage of this message or any attachment is strictly 
prohibited.
