Asmath,

In a traditional installation, regardless of how a NiFi cluster obtains
data (Kafka, FTP, HTTP calls, TCP listening, etc.), once a node is
responsible for the data it has acknowledged receipt to the source(s).
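
To make that ordering concrete, here is a minimal sketch of the pattern
in plain kafka-clients Java (not NiFi's actual code; the group id, topic
name, and persistence step are hypothetical): the offset commit, i.e. the
"ack" back to the source, happens only after the data has been durably
written locally.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AckAfterOwnership {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-flow"); // hypothetical group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            // Auto-commit off: we decide when receipt is acknowledged.
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events")); // hypothetical topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        persistDurably(record.value());
                    }
                    // Only now "ack" to the source by committing offsets. Dying
                    // before this point means Kafka redelivers; dying after it
                    // means this node alone owns the data until it is delivered.
                    consumer.commitSync();
                }
            }
        }

        private static void persistDurably(String value) {
            // Placeholder for a durable local write (the content/flowfile
            // repositories, in NiFi terms).
        }
    }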

If that NiFi node goes offline, the data it owns is delayed. If that
node goes offline unrecoverably, the data is likely to be lost.
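
Worth noting for the Kafka case specifically: the consumer group
protocol keeps new data flowing when a node drops, because the
disconnected node's partitions are reassigned to the surviving nodes.
Only the data that node had already ack'd and stored locally waits for
it to return. A small sketch (again plain kafka-clients, same
hypothetical group and topic names) that logs those reassignments:

    import java.time.Duration;
    import java.util.Collection;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RebalanceWatcher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-flow"); // hypothetical group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events"), new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> parts) {
                        // Fired when this member loses partitions, e.g. because a
                        // previously disconnected node rejoined the group.
                        System.out.println("Revoked: " + parts);
                    }

                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                        // Fired when partitions from a departed member are
                        // reassigned here, so new records keep flowing.
                        System.out.println("Assigned: " + parts);
                    }
                });
                while (true) {
                    consumer.poll(Duration.ofSeconds(1)); // drives the group protocol
                }
            }
        }
    }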

If you're going to run in environments with more powerful storage
options, as in many Kubernetes-based deployments, there are definitely
ways to guard against the loss case to a very high degree and to ensure
only minimal data delay in the worst case.

In a Hadoop-style environment, though, the traditional model I describe
works very well, leverages appropriate RAID, and has proven highly
reliable and durable.

Thanks

On Tue, Aug 11, 2020 at 7:26 AM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Hi,
>
> We have a 3-node NiFi cluster, and for some reason Node 2 and Node 3
> were disconnected while the flow was running. ConsumeKafka was
> configured to run on all nodes, reading data and loading it into the
> database.
>
> In the above scenario, is there a possibility of data loss?
> Distributed processing in Hadoop would handle this automatically and
> assign the task to other active nodes. Will it be the same with a
> NiFi cluster?
>
> Thanks,
> Asmath
>
