Hi Arthur,

I'm afraid I can't answer your questions about integrating NiFi with other 
software, but for the developers who can, it might help to describe the 
functionality you intend to add by integrating NiFi into an external 
application.

The original intent of NiFi comes from its NSA roots: Handle a very large and 
unpredictable influx of data and enable analysts to deal with errors right 
there and then, in the live environment. As you've noted, this makes it an easy
tool to pick up and use quickly, but it doesn't fit well in the current 
everything-as-code-from-a-pipeline world. The NiFi contributors have done a 
great job of providing good management options including the registry clients 
and a full-featured API, but there is still tension between those two ways of 
working.

What I can tell you is that we experimented with deploying/managing flows from 
a pipeline and/or client application using the NiFi API (through the NiPyApi 
Python package for the pipeline side). This worked quite well for deployment, 
but the external operations tool we built was too cumbersome in our scenario. 
Developers much preferred to use NiFi in the originally intended way: 
Investigate and fix problems on the Canvas itself. The platform team also 
preferred not to have to add a newly found exception to our tool every week.
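To give an idea of the mechanics: NiPyApi is a wrapper around the NiFi REST 
API, and starting or stopping a whole flow from a pipeline comes down to one 
call against its process group. Here is a minimal stdlib-only sketch of that 
call; the host URL and process-group id are placeholders, not our actual setup:

```python
# Sketch: starting/stopping a whole flow (process group) through the NiFi
# REST API; this is the endpoint that NiPyApi's canvas helpers wrap.
# The host and process-group id used below are placeholders.
import json
import urllib.request

def schedule_payload(pg_id: str, running: bool) -> dict:
    # Body for PUT /nifi-api/flow/process-groups/{id}: flips every
    # component in the group to RUNNING or STOPPED in one request.
    return {"id": pg_id, "state": "RUNNING" if running else "STOPPED"}

def schedule_process_group(host: str, pg_id: str, running: bool) -> None:
    req = urllib.request.Request(
        f"{host}/nifi-api/flow/process-groups/{pg_id}",
        data=json.dumps(schedule_payload(pg_id, running)).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)  # raises urllib.error.HTTPError on failure

# Usage against a live instance (secured setups also need auth headers):
# schedule_process_group("https://nifi.example.com:8443", "1234-abcd", True)
```

Deployment itself went through the Registry in our case, but the day-to-day 
start/stop automation was essentially this shape.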

Scaling

NiFi was designed to handle many flows in a single instance/cluster, making 
optimal use of the shared resources through dynamic scheduling of processors, 
streaming large data to disk directly and load balancing across servers. Scaling
is achieved by adding more servers to the cluster or increasing resources per 
server. There is a bottleneck at the primary node for processors that must run 
on a single node, but that is also being improved.

Deploying multiple small NiFi instances works well too and I think more users 
are shifting to that model in a Kubernetes world. Typical reasons to split 
would be optimizing for workload type (streaming/low latency vs. batch 
processing), limiting the security scope, or meeting dynamic scaling 
requirements. You're 
still running a fairly large instance with a webserver and everything related, 
so NiFi itself is not a good fit for running single flows.

Single flows are what MiNiFi is best suited for. Originally designed to run on 
edge devices or workload servers to collect data locally, it provides a solid 
option for running a remotely managed flow. You get rid of the large overhead 
of the webserver and clustering stuff and instead start a lean instance (like a 
Kubernetes pod) that does only the minimum needed and can be stopped again 
after finishing. This works great for flows like IoT data that don't change 
often. The overhead is in testing and debugging/redeploying when things don't 
work as planned yet. In our case the tradeoff wasn't worth it at the time, 
especially since the data collection flows are a sideshow for the developers 
and they greatly appreciate the intuitive development and testing on the NiFi 
canvas without the extra steps for MiNiFi.
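To make that tradeoff concrete: a MiNiFi (Java) deployment replaces the canvas 
with a small per-flow config file. The sketch below is from memory, so treat 
the exact keys as approximate and check the MiNiFi documentation for your 
version; the paths, names and URL are invented:

```yaml
# Rough sketch of a minimal MiNiFi (Java) config.yml: tail a local log
# file and ship it to a central NiFi over Site-to-Site. All values here
# are placeholders.
MiNiFi Config Version: 3
Flow Controller:
  name: tail-app-log
Processors:
  - name: TailAppLog
    class: org.apache.nifi.processors.standard.TailFile
    scheduling strategy: TIMER_DRIVEN
    scheduling period: 10 sec
    Properties:
      File to Tail: /var/log/app/app.log
Connections:
  - name: TailAppLog/success
    source name: TailAppLog
    source relationship names:
      - success
    destination name: ingest
Remote Process Groups:
  - name: central-nifi
    url: https://nifi.example.com:8443/nifi
    Input Ports:
      - id: ingest-port-id
        name: ingest
```

The MiNiFi toolkit can also convert a flow designed on the NiFi canvas into 
this format, which softens the loss of the visual workflow somewhat, but the 
test/debug/redeploy loop remains.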

Our biggest deployment feeds a data lake and has many scheduled flows that 
trigger once per day or a few times at most, load a large amount of data, do 
some memory-intensive stuff and then finish again. The cluster nodes have 64 GB 
of RAM, which is not even that big, and they deal with a few hundred flows just 
fine. Dealing with contention typically means shifting some flow's schedule by 
a few minutes or telling a team to be less ambitious in transforming their GB+ 
files. The "shared everything" model does mean that we've had occasional 
incidents where developers made a flow that soft-hung the cluster by exhausting 
heap or threads.

I hope this helps you get some perspective.

Regards,

Isha


From: Derewjankin, Arthur via users <[email protected]>
Sent: Monday, 30 March 2026 09:46
To: [email protected]
Cc: Derewjankin, Arthur <[email protected]>
Subject: Question on embedding/integrating Apache NiFi with Quarkus 
applications and scalability considerations

Dear Apache NiFi Community,

I hope you are doing well.
I am working as a Business Analyst and have recently been exploring Apache 
NiFi. I really appreciate the way data flows can be modeled and visualized; it's 
been very intuitive and powerful for our use cases.
We are currently evaluating whether we can integrate NiFi into one of our 
Quarkus-based applications. We also looked into MiNiFi as a lightweight 
alternative. However, based on our initial proof of concept, it seems that 
embedding or tightly integrating NiFi (or MiNiFi) directly into an application 
might not be the intended usage pattern, as we were not able to make this 
approach work successfully.
In addition, one of our goals was to deploy each flow independently (e.g., one 
flow per deployment unit) in order to achieve a high degree of scalability and 
isolation. This also proved challenging in our experiments.
This leads to a few questions we were hoping the community could help clarify:

  1.  Is embedding NiFi (or MiNiFi) within an application such as a Quarkus 
service a supported or recommended approach?
  2.  Is deploying individual flows as separate, independently scalable units 
aligned with NiFi's design, or does this conflict with its intended usage model?
  3.  If these approaches are not recommended, could you share the reasoning 
behind these architectural decisions?
  4.  Our concern is that NiFi might become too heavyweight over time in our 
scenario:
     *   We expect a growing number of flows
     *   Many flows would run at scheduled times each day
      *   Data volumes vary significantly, from small messages to files in the GB 
range
     *   Flows would include polling from external systems, importing, and 
exporting data
Given these characteristics, would NiFi still be an appropriate choice, or 
should it rather be operated as a standalone service instead of being 
integrated into an application?
We would greatly appreciate any guidance, best practices, or architectural 
recommendations you can share based on your experience.
Thank you very much for your time and support.

Best regards,
Arthur

P.S. Resending this as I recently subscribed to the mailing list and my 
previous message may not have gone through.


Arthur Derewjankin
Business Analyst
Lucht Probst Associates GmbH
Große Gallusstraße 9
Frankfurt am Main 60311, Hessen, DE
e: [email protected]<mailto:[email protected]> | w: 
www.l-p-a.com<http://www.l-p-a.com/>
m: | p: +4969971485245
LinkedIn<https://www.linkedin.com/company/lpa-lucht-probst-associates-gmbh>

________________________________

The information contained in this e-mail is privileged and confidential, 
intended only for the use of the recipient named above. If the reader of this 
message is not the above mentioned recipient or is not acting on behalf of the 
respective recipient, please note that any saving, distribution or copying of 
this e-mail is strictly prohibited. Please notify us immediately by telephone 
at + 49-69-97 14 85-0 or e-mail if you have received such misdirected messages, 
and kindly destroy this e-mail. Thank you.
Lucht Probst Associates GmbH, Große Gallusstraße 9, 60311 Frankfurt, Germany
Managing Directors: Stefan Lucht
Vat id number : DE203779866 I commercial register entry: Amtsgericht 
Frankfurt/Main I commercial register no.: HRB 48809
All necessary information for processing your data is available here: 
https://www.l-p-a.com/privacy-policy
