Hello,

I am working on deployment automation for a NiFi server running on AWS. I am 
starting with a single instance running NiFi v1.7.1, with the plan to move to 
a cluster setup later. The NiFi server runs on an EC2 instance in an Auto 
Scaling group for self-healing, behind an Application Load Balancer and 
nginx (ALB -> nginx -> NiFi); users are authenticated using OIDC. The NiFi 
repositories are stored on a dedicated EBS volume.

I am now considering options for high availability, where the server instance 
would be re-created in another availability zone in case of an AZ failure. The 
most obvious option is to snapshot the EBS volume holding the repositories at 
regular intervals. On failover to another AZ, the instance would create a 
volume from the latest snapshot and continue running.

The snapshots will be taken frequently, say at 15-minute intervals, so they 
will be taken while the NiFi service is running. As I understand it, NiFi 
uses a write-ahead log (WAL) for data consistency in case of a NiFi service 
failure. Is it safe to assume that this will also prevent data corruption 
when snapshotting hot EBS volumes? I understand that in this scenario the 
snapshots can lag the live data by up to 15 minutes, and I can accept that 
data loss; what I want to prevent is corruption.
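For what it's worth, EBS snapshots of a running volume are crash-consistent (point-in-time, but excluding data still in the OS page cache), so the question reduces to whether NiFi's repositories recover from a crash-consistent image. The 15-minute job itself could be sketched as below, driven from cron or an EventBridge rule; the volume id and tag value are placeholders, and the only real API used is boto3's `create_snapshot`.

```python
"""Sketch of the periodic snapshot job. Building the request parameters
in a pure helper keeps the AWS call site trivial (and the helper
testable offline). Tag and description formats are assumptions."""
from datetime import datetime, timezone

def snapshot_request(volume_id, tag_value="nifi-repo"):
    """Build create_snapshot parameters for the repository volume."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    return {
        "VolumeId": volume_id,
        "Description": f"NiFi repositories {stamp}",
        "TagSpecifications": [{
            "ResourceType": "snapshot",
            # Tagged so the restore side can find it with a tag filter.
            "Tags": [{"Key": "Name", "Value": tag_value}],
        }],
    }

def take_snapshot(volume_id):
    import boto3  # lazy import: the helper above needs no AWS SDK
    ec2 = boto3.client("ec2")
    return ec2.create_snapshot(**snapshot_request(volume_id))
```

Note that `create_snapshot` returns as soon as the point-in-time capture is initiated; the copy to S3 completes in the background, so a 15-minute cadence does not require the previous snapshot to have finished.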

I am also considering AWS EFS as an alternative to EBS volumes. Data stored 
on EFS is replicated across all AZs in the region, so there would be no need 
for snapshots, but I am concerned about the higher latency of disk I/O 
operations, and perhaps also about the EFS consistency model not being a good 
fit for NiFi. Is anybody using EFS with NiFi?

https://docs.aws.amazon.com/efs/latest/ug/how-it-works.html#consistency

Best regards
Elemir
