Axel, I think that I can help clarify some of these things.

First of all: nifi.cluster.load.balance.host vs. nifi.cluster.load.balance.address

* The nifi.cluster.load.balance.host property is what matters.
* The nifi.cluster.load.balance.address is not a real property. NiFi has never looked at this property. However, in the first release that included load-balancing, there was a typo in which the nifi.properties file had “…address” instead of “…host”. This was later addressed.
* So if you have a value for “nifi.cluster.load.balance.address”, it does nothing and is always ignored.

Next: nifi.cluster.load.balance.host property

* nifi.cluster.load.balance.host can be either an IP address or a hostname. But if set, other nodes in the cluster MUST be able to communicate with the node using whatever value you put here. So using a value of 0.0.0.0 will not work. Also, if set, NiFi will listen for incoming connections ONLY on that hostname. So if you set it to “localhost”, for instance, no other node can connect to it, because no other host can connect to the node using “localhost”. So this needs to be an address that both the NiFi instance knows about/can bind to, and other nodes in the cluster can connect to.
* If nifi.cluster.load.balance.host is NOT set: NiFi will listen for incoming requests on all network interfaces / hostnames. It will advertise its hostname to other nodes in the cluster according to whatever is set for the “nifi.cluster.node.address” property. Meaning that other nodes in the cluster must be able to connect to this node using whatever hostname is set for the “nifi.cluster.node.address” property. If the “nifi.cluster.node.address” property is not set, it advertises its hostname as localhost - which means other nodes won’t be able to send to it. So you must specify either the “nifi.cluster.load.balance.host” property or the “nifi.cluster.node.address” property.
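
As an illustration of the rules above, here is a minimal Python sketch (not NiFi code) that reads nifi.properties-style text and reports which hostname a node would advertise for load balancing. It follows only the behaviour described in this message. The function names parse_properties and advertised_lb_host are made up for this example, and the parser handles only simple key=value lines, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines; skip blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def advertised_lb_host(props):
    """Return (host, warnings) following the rules described in this message."""
    warnings = []
    # The "...address" property is never read, so flag it if present.
    if "nifi.cluster.load.balance.address" in props:
        warnings.append("nifi.cluster.load.balance.address is ignored by NiFi")
    host = props.get("nifi.cluster.load.balance.host", "")
    if host:
        # Values that other nodes cannot use to reach this node.
        if host in ("0.0.0.0", "localhost"):
            warnings.append(f"other nodes cannot connect to {host!r}")
        return host, warnings
    # Fall back to the cluster node address when load.balance.host is unset.
    node_address = props.get("nifi.cluster.node.address", "")
    if node_address:
        return node_address, warnings
    warnings.append("neither load.balance.host nor node.address is set")
    return "localhost", warnings


# Example input modelled on a configuration quoted later in this thread.
example = """
nifi.cluster.node.address=nifi-node01.domaine.com
nifi.cluster.load.balance.address=192.168.1.11
nifi.cluster.load.balance.port=6111
"""
host, warnings = advertised_lb_host(parse_properties(example))
print("advertised host:", host)
for w in warnings:
    print("warning:", w)
```

With this input the sketch falls back to the nifi.cluster.node.address value and flags the “…address” property as ignored, which matches the explanation above.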

Finally: having to delete the state directory

If you change the “nifi.cluster.load.balance.host” or “nifi.cluster.load.balance.port” property and restart a node, you must restart all nodes in the cluster. Otherwise, the other nodes won’t be able to send to that node. So, for example, when you changed the load.balance.host from fqdn or 0.0.0.0 to the IP address - the other nodes in the cluster would stop sending. I created a JIRA [1] for that. In my testing, when I changed the hostname, the other nodes stopped sending. But restarting them got things back on track. I wasn’t able to replicate the issue after restarting all nodes.

Hope this is helpful!

-Mark

[1] https://issues.apache.org/jira/browse/NIFI-9017

On Aug 3, 2021, at 3:08 AM, Axel Schwarz <axelkop...@emailn.de> wrote:

Hey guys,

I think I found the "trick" for at least version 1.13.2 and of course I'll share it with you. I now use the following load balancing properties:

# cluster load balancing properties #
nifi.cluster.load.balance.host=192.168.1.10
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

So I use the host's IP address for balance.host instead of 0.0.0.0 or the fqdn and have no balance.address property at all. This led to partial load balancing in my case, as already mentioned. It looked like I needed to do one more step to reach the goal, and this step seems to be deleting all state management files. Through the state-management.xml config file I changed the state management directory to be outside of the NiFi installation, because the config file says "it is important, that the directory be copied over to the new version when upgrading nifi". So every time I upgraded or reinstalled NiFi during my load balancing odyssey, the state management remained completely untouched.
As soon as I changed that, by deleting the entire state management directory before reinstalling NiFi with the above-mentioned properties, load balancing was immediately working throughout the whole cluster.

I think for my flow it is not that bad to delete the state management, as I only use one stateful processor to increase a counter. And in the times I have tried this so far, I have not encountered any wrong behaviour whatsoever. But of course I can't test everything, so if any of you have important facts about deleting the state management, please let me know :)

Besides that, I now feel like this solved my problem. I'll have to keep an eye on that when updating to version 1.14.0 later on, but I think I can figure this out. So thanks for all your support! :)

--- Original Message ---
From: "Jens M. Kofoed" <jmkofoed....@gmail.com>
Date: 29.07.2021 11:08:28
To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
Subject: Re: Re: Re: No Load Balancing since 1.13.2

Hmm...
I can't remember :-( sorry

My configuration for version 1.13.2 is like this:

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node01.domaine.com
nifi.cluster.node.protocol.port=9443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3

# cluster load balancing properties #
nifi.cluster.load.balance.address=192.168.1.11
nifi.cluster.load.balance.port=6111
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

So I defined "nifi.cluster.node.address" with the hostname and not an IP address, and "nifi.cluster.load.balance.address" with the IP address of the server. And triple-check the configuration on all servers :-)

Kind Regards
Jens M. Kofoed

On Thu, Jul 29, 2021 at 10:11, Axel Schwarz <axelkop...@emailn.de> wrote:

Hey Jens,

in issue NIFI-8643 you wrote the last comment, describing exactly the same behaviour that we're experiencing now: 2 of 3 nodes were load balancing. How did you get the third node to participate in load balancing? An update to 1.14.0 does not change anything for us.

https://issues.apache.org/jira/browse/NIFI-8643?focusedCommentId=17361418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17361418

--- Original Message ---
From: "Jens M. Kofoed" <jmkofoed....@gmail.com>
Date: 28.07.2021 12:07:50
To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
Subject: Re: Re: No Load Balancing since 1.13.2

hi

I can see that you have configured nifi.cluster.load.balance.address=0.0.0.0

Have you tried to set the correct IP address?

node1: nifi.cluster.load.balance.address=192.168.1.10
node2: nifi.cluster.load.balance.address=192.168.1.11
node3: nifi.cluster.load.balance.address=192.168.1.12

regards
Jens M. Kofoed

On Wed, Jul 28, 2021 at 11:17, Axel Schwarz <axelkop...@emailn.de> wrote:

Just tried Java 11. But it still does not work. Nothing changed. :(

--- Original Message ---
From: Jorge Machado <jom...@me.com>
Date: 27.07.2021 13:08:55
To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
Subject: Re: No Load Balancing since 1.13.2

Did you try Java 11? I have a client running a similar setup to yours but with a lower NiFi version and it works fine. Maybe it is worth trying.

On 27. Jul 2021, at 12:42, Axel Schwarz <axelkop...@emailn.de> wrote:

I did indeed, but I updated from u161 to u291, as this was the newest version at that time, because I thought it could help. So the issue started under u161. But I just saw that u301 is out. I will try this as well.

--- Original Message ---
From: Pierre Villard <pierre.villard...@gmail.com>
Date: 27.07.2021 10:18:38
To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
Subject: Re: No Load Balancing since 1.13.2

Hi,

I believe the minor u291 is known to have issues (for some of its early builds). Did you upgrade the Java version recently?

Thanks,
Pierre

On Tue, Jul 27, 2021 at 08:07, Axel Schwarz <axelkop...@emailn.de> wrote:

Dear Community,

we're running a secured 3-node NiFi cluster on Java 8_u291 and Debian 7 and have been experiencing problems with load balancing since version 1.13.2. I'm fully aware of issue NIFI-8643 and tested a lot around this, but I have to say that this is not our problem. Mainly because the balance port never binds to localhost, but also because I implemented all workarounds under version 1.13.2 and have even tried version 1.14.0 by now, but load balancing still does not work.

What we experience is best described as "the primary node balances with itself"... So what it does is open the balancing connections to its own IP instead of the IPs of the other two nodes. And the other two nodes don't open balancing connections at all.

When executing "ss | grep 6342" on the primary node, this is what it looks like:

[root@nifiHost1 conf]# ss | grep 6342
tcp ESTAB 0 0 192.168.1.10:51380 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:51376 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:51378 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:51370 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:51372 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51376
tcp ESTAB 0 0 192.168.1.10:51374 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51374
tcp ESTAB 0 0 192.168.1.10:51366 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51370
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51366
tcp ESTAB 0 0 192.168.1.10:51368 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51372
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51378
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51368
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51380

Executing it on the other, non-primary nodes returns absolutely nothing.

Netstat shows the following on each server:

[root@nifiHost1 conf]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 192.168.1.10:6342 0.0.0.0:* LISTEN 10352/java

[root@nifiHost2 conf]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 192.168.1.11:6342 0.0.0.0:* LISTEN 31562/java

[root@nifiHost3 conf]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 192.168.1.12:6342 0.0.0.0:* LISTEN 31685/java

And here is what our load balancing properties look like:

# cluster load balancing properties #
nifi.cluster.load.balance.host=nifiHost1.contoso.com
nifi.cluster.load.balance.address=0.0.0.0
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

When running NiFi in version 1.12.1 on the exact same setup in the exact same environment, load balancing is working absolutely fine. There was a time when load balancing even worked in version 1.13.2. But I'm not able to reproduce this, and it just stopped working one day after some restart, without any property change whatsoever.

If any more information would be helpful, please let me know and I'll try to provide it as fast as possible.

Sent with Emailn.de <https://www.emailn.de/> - Freemail * Unlimited storage * Own online office * 24h best mail reception * Spam protection, address book
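
The "primary node balances with itself" symptom described above can also be checked mechanically. The following is a minimal Python sketch, assuming ss lines of the form "tcp ESTAB 0 0 local_ip:port peer_ip:port" as in the output above. The helper name self_connections is hypothetical, and the third sample line (a connection to another node) is invented for contrast and does not come from the cluster in this thread.

```python
def self_connections(ss_output, port):
    """Count ESTAB lines involving `port`, and how many have local IP == peer IP."""
    total = 0
    to_self = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected fields: netid, state, recv-q, send-q, local addr, peer addr
        if len(fields) < 6 or fields[1] != "ESTAB":
            continue
        local_ip, _, local_port = fields[4].rpartition(":")
        peer_ip, _, peer_port = fields[5].rpartition(":")
        if str(port) not in (local_port, peer_port):
            continue
        total += 1
        if local_ip == peer_ip:
            to_self += 1
    return total, to_self


# The first two lines mirror the ss output above; the last line is a
# made-up example of a connection to a different node.
sample = """\
tcp ESTAB 0 0 192.168.1.10:51380 192.168.1.10:6342
tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51380
tcp ESTAB 0 0 192.168.1.10:51382 192.168.1.11:6342
"""
total, to_self = self_connections(sample, 6342)
print(f"{to_self} of {total} load-balance connections stay on the node's own IP")
```

If every counted connection stays on the node's own IP, as in the ss output above, the node is only talking to itself on the load-balance port.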