What a substantive response!  Thank you for this info/your perspective. 

VR,

Dave 


On Feb 8, 2023, at 5:17 PM, Adam Taft <a...@adamtaft.com> wrote:


Isha,

Just some perspective from the field. I have had success with containerized NiFi and generally get along with it. That being said, I think there are a few caveats and issues you might run into going down this road.

Standalone NiFi in a container works pretty much the way you would want and expect. You do need to be careful about where you are mounting NiFi's working directories, though: content_repository, database_repository, flowfile_repository, provenance_repository, state, logs, and work. All of these directories are actively written by NiFi, and it's good to have them exported as bind mounts external to the container.

You will definitely want to bind mount the flow.xml.gz and flow.json.gz files as well, or you will lose your live dataflow configuration changes as you use NiFi. Any change to your NiFi canvas gets written into flow.xml.gz, which means you need to keep a copy of it outside of your container. There are potentially other files in the conf directory that you also want to keep around. NiFi unfortunately doesn't organize all of these directories under a single location by default, so you kind of have to reconfigure and/or bind mount a lot of different paths.
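To make that concrete, here's a minimal docker-compose sketch of the standalone setup. Treat it as an assumption to verify rather than a canonical recipe: the paths follow the official apache/nifi image convention of /opt/nifi/nifi-current, and the image tag is just an example.

# Minimal standalone sketch; assumes the apache/nifi image layout.
services:
  nifi:
    image: apache/nifi:1.19.1        # example tag, pick your own
    ports:
      - "8443:8443"
    volumes:
      # conf holds flow.xml.gz / flow.json.gz; seed it from the image
      # first, since the startup scripts expect it to be populated
      - ./conf:/opt/nifi/nifi-current/conf
      - ./content_repository:/opt/nifi/nifi-current/content_repository
      - ./database_repository:/opt/nifi/nifi-current/database_repository
      - ./flowfile_repository:/opt/nifi/nifi-current/flowfile_repository
      - ./provenance_repository:/opt/nifi/nifi-current/provenance_repository
      - ./state:/opt/nifi/nifi-current/state
      - ./logs:/opt/nifi/nifi-current/logs
      - ./work:/opt/nifi/nifi-current/work

With all of those mounted, you can delete and recreate the container (or upgrade the image) without losing your flow or your queued data.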

I have found NiFi clustering in a dockerized environment to be less desirable. The primary problem is that the definition of cluster nodes is mostly hard-coded into the nifi.properties file. Usually in a containerized environment, you want the ability to dynamically bring nodes up/down as needed (with dynamic IP/network configuration), especially in container orchestration frameworks like Kubernetes. There have been a lot of experiments, and possibly even some reasonable solutions coming out, to help with containerized clusters, but generally you're going to find you have to crack your knuckles a little bit to get this to work. If you're content with a mostly statically defined, non-elastic cluster configuration, then a clustered NiFi on Docker is possible.
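To illustrate what "statically defined" ends up looking like, here's a two-node compose sketch. The NIFI_* environment variable names are the ones I believe the official apache/nifi image's start script maps onto nifi.properties; verify them against the image version you run.

# Sketch of a statically defined (non-elastic) two-node cluster.
# Security/TLS settings omitted for brevity; real clusters need them.
services:
  zookeeper:
    image: zookeeper:3.8
  nifi-0:
    image: apache/nifi:1.19.1
    hostname: nifi-0       # node identity pinned here; this is the static part
    environment:
      NIFI_CLUSTER_IS_NODE: "true"
      NIFI_CLUSTER_NODE_PROTOCOL_PORT: "8082"
      NIFI_ZK_CONNECT_STRING: "zookeeper:2181"
      NIFI_ELECTION_MAX_WAIT: "1 min"
  nifi-1:
    image: apache/nifi:1.19.1
    hostname: nifi-1
    environment:
      NIFI_CLUSTER_IS_NODE: "true"
      NIFI_CLUSTER_NODE_PROTOCOL_PORT: "8082"
      NIFI_ZK_CONNECT_STRING: "zookeeper:2181"
      NIFI_ELECTION_MAX_WAIT: "1 min"

Adding a third node means editing the file and redeploying; that's the non-elastic part.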

As an option, if you stick with standalone deployments, what you can do instead is front your individual NiFi node instances with a load balancer. This may be a poor man's approach to load distribution, but it works reasonably well and I've seen it in action on large-volume flows. If your data source can deliver to a load balancer, then you can have the load balancer round-robin (or similar) to your underlying standalone nodes. In a container orchestration environment, you can imagine Kubernetes being able to spin containerized nodes up and down to handle demand, and managing the load balancer configuration as those nodes come up. It's all possible, but it will require some work.
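As a rough Kubernetes sketch of that pattern (all names here are hypothetical, and the ingest port assumes your flow exposes something like a ListenHTTP processor on 9090):

# Hypothetical sketch: N independent *standalone* NiFi pods behind one
# Service; the Service spreads incoming connections across them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nifi-standalone
spec:
  replicas: 3                      # scale up/down with demand
  selector:
    matchLabels:
      app: nifi-standalone
  template:
    metadata:
      labels:
        app: nifi-standalone
    spec:
      containers:
        - name: nifi
          image: apache/nifi:1.19.1
          ports:
            - containerPort: 9090  # hypothetical ListenHTTP ingest port
---
apiVersion: v1
kind: Service
metadata:
  name: nifi-ingest
spec:
  type: LoadBalancer
  selector:
    app: nifi-standalone
  ports:
    - port: 9090
      targetPort: 9090

The Service gives you a roughly round-robin distribution across independent standalone pods. Each pod still runs its own copy of the flow, which is exactly the caveat below.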

Of course, doing anything with multiple standalone nodes means that you have to propagate changes from one NiFi canvas to all your nodes manually. This is a huge pain and not really scalable. So the load balancer approach is only good if your dataflow configurations are very static and don't change day-to-day with operations.

This points at one of the core issues with containerized NiFi: what to do with the flow configuration itself. On the one hand, you kind of want to "burn in" your flow configuration into your Docker image, i.e., the flow.xml.gz and/or flow.json.gz would be included as part of the image itself. This enables your NiFi system to come up with a fully configured set of processors, ready to accept connections.

But part of the fun with NiFi is being able to make dataflow and processor configuration changes on the fly, as needed, based on operational conditions. For example, maybe you need to temporarily stop data moving to one location and have it transported to another. This "live" and dynamic way of managing NiFi is a powerful feature, but it kind of goes against the grain of a containerized or static deployment approach: new nodes coming online will not necessarily have the latest configuration changes that your operational staff added recently. NiFi Registry can somewhat help here.

Finally, to give a shout-out: you may want to consider using a dockerized MiNiFi cluster instead of traditional NiFi. MiNiFi is maybe slightly more aligned with a containerized clustering approach, as it more directly supports this concept of a "burned in" processor configuration. MiNiFi isn't really cluster-aware and each node acts independently, so nodes can be spun up or down based on demand without too much fuss, making it a bit easier solution for containerized or dynamic deployments.
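A hypothetical compose sketch of that (the apache/nifi-minifi image does exist on Docker Hub, but treat the tag and the config path as assumptions to verify against your image):

# Hypothetical MiNiFi sketch with the flow "burned in" via a
# read-only config mount; MiNiFi (Java) reads its dataflow from
# conf/config.yml. Path may differ per image version.
services:
  minifi:
    image: apache/nifi-minifi:1.19.1   # tag is illustrative
    volumes:
      - ./config.yml:/opt/minifi/minifi-current/conf/config.yml:ro

Because each node is independent, scaling is just "docker compose up --scale minifi=5" (or a replicas count in an orchestrator), with no cluster coordination to worry about.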

Hope this gives you some thoughts. There are definitely a lot of recipes and approaches to containerized NiFi, so do some searching to find one that matches what you're after. Almost any configuration can be done, based on your needs.

/Adam



On Fri, Jan 27, 2023 at 3:15 AM Isha Lamboo <isha.lam...@virtualsciences.nl> wrote:

Hi all,

I’m looking for some perspectives from people using NiFi deployed in containers (Docker or otherwise).

It seems to me that the NiFi architecture benefits from having a lot of compute resources to share for all flows, especially with large batches arriving periodically. On the other hand, it’s hard to prevent badly tuned flows from impacting others, and more and more IT operations are moving to containerized environments, so I’m exploring the options for containerized NiFi as an alternative to our current VM-based approach.

Do you deploy a few large containers similar in capacity to a VM to run all flows together, or many small ones with only a few flows on each? And do you deploy them clustered or standalone?

Thanks,

Isha
