Re: No Load Balancing since 1.13.2

Mark Payne Thu, 05 Aug 2021 14:08:39 -0700

Axel,

I think that I can help clarify some of these things.


First of all: nifi.cluster.load.balance.host vs. 
nifi.cluster.load.balance.address
* The nifi.cluster.load.balance.host property is what matters.

* The nifi.cluster.load.balance.address is not a real property. NiFi has never 
looked at this property. However, in the first release that included 
load-balancing, there was a typo in which the nifi.properties file had 
“…address” instead of “…host”. This was later addressed.

* So if you have a value for “nifi.cluster.load.balance.address”, it does 
nothing and is always ignored.
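
To illustrate the point above, here is a minimal Python sketch that scans the text of a nifi.properties file and reports whether the ignored “…address” key is present and whether the real “…host” key has a value. The function names and the sample text are made up for this example; only the two property names come from this thread.

```python
# Hypothetical helper, not part of NiFi: inspect nifi.properties text for the
# load-balance "host" key and the ignored "address" key.
HOST_KEY = "nifi.cluster.load.balance.host"
IGNORED_KEY = "nifi.cluster.load.balance.address"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def load_balance_key_report(text):
    """Return a list of warnings about the load-balance host/address keys."""
    props = parse_properties(text)
    warnings = []
    if IGNORED_KEY in props:
        warnings.append(IGNORED_KEY + " is set but NiFi never reads it")
    if not props.get(HOST_KEY):
        warnings.append(HOST_KEY + " has no value")
    return warnings


# Sample input (assumed for the example).
sample = """
# cluster load balancing properties #
nifi.cluster.load.balance.address=192.168.1.11
nifi.cluster.load.balance.port=6111
"""
for warning in load_balance_key_report(sample):
    print(warning)
```

A missing “…host” value is not an error by itself; it only means the node falls back to another property for the address it advertises.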



Next: nifi.cluster.load.balance.host property

* nifi.cluster.load.balance.host can be either an IP address or a hostname. But 
if set, other nodes in the cluster MUST be able to communicate with the node 
using whatever value you put here. So using a value of 0.0.0.0 will not work. 
Also, if set, NiFi will listen for incoming connections ONLY on that hostname. 
So if you set it to “localhost”, for instance, no other node can connect to it, 
because no other host can connect to the node using “localhost”. So this needs 
to be an address that both the NiFi instance knows about/can bind to, and other 
nodes in the cluster can connect to.

* If nifi.cluster.load.balance.host is NOT set: NiFi will listen for incoming 
requests on all network interfaces / hostnames. It will advertise its hostname 
to other nodes in the cluster according to whatever is set for the 
“nifi.cluster.node.address” property. Meaning that other nodes in the cluster 
must be able to connect to this node using whatever hostname is set for the 
“nifi.cluster.node.address” property. If the “nifi.cluster.node.address” 
property is not set, it advertises its hostname as localhost - which means 
other nodes won’t be able to send to it.

So you must specify either the “nifi.cluster.load.balance.host” property or the 
“nifi.cluster.node.address” property.
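
The rules above can be sketched roughly as follows in Python. This is only an illustration of the behaviour described in this mail, not NiFi code: the function name, the returned tuple and the UNREACHABLE set are assumptions made for the example (127.0.0.1 is added as the usual loopback address and is not mentioned above).

```python
# Sketch of the rules described above; names are illustrative, not NiFi code.
# Addresses that other nodes cannot use to reach this node (assumed list).
UNREACHABLE = {"0.0.0.0", "localhost", "127.0.0.1"}


def load_balance_endpoint(props):
    """Return (listens_on, advertised_as, problem) for one node's properties."""
    lb_host = props.get("nifi.cluster.load.balance.host", "").strip()
    node_address = props.get("nifi.cluster.node.address", "").strip()

    if lb_host:
        # When set, the node listens only on this name and peers use it too.
        listens_on = advertised = lb_host
    else:
        # When unset, the node listens on all interfaces and advertises
        # nifi.cluster.node.address, falling back to localhost.
        listens_on = "all interfaces"
        advertised = node_address or "localhost"

    problem = None
    if advertised in UNREACHABLE:
        problem = "other nodes cannot connect to '" + advertised + "'"
    return listens_on, advertised, problem


print(load_balance_endpoint({"nifi.cluster.load.balance.host": "192.168.1.10"}))
print(load_balance_endpoint({"nifi.cluster.node.address": "nifi-node01.domaine.com"}))
print(load_balance_endpoint({}))
```

With neither property set, the sketch reports localhost as the advertised name, which is the case that prevents other nodes from sending to this node.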



Finally: having to delete the state directory

If you change the “nifi.cluster.load.balance.host” or 
“nifi.cluster.load.balance.port” property and restart a node, you must restart 
all nodes in the cluster. Otherwise, the other nodes won’t be able to send to 
that node.
So, for example, when you changed the load.balance.host from fqdn or 0.0.0.0 to 
the IP address - the other nodes in the cluster would stop sending. I created a 
JIRA [1] for that. In my testing, when I changed the hostname, the other nodes 
stopped sending. But restarting them got things back on track. I wasn’t able to 
replicate the issue after restarting all nodes.
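
As a small illustration of that rule, the following Python sketch compares a node's old and new property values and reports whether one of the two load-balance properties changed, in which case all nodes should be restarted. The function name and the sample dictionaries are hypothetical; only the two property names come from this mail.

```python
# Illustrative check, not NiFi code: does a config change touch one of the
# properties that require restarting every node in the cluster?
RESTART_ALL_KEYS = (
    "nifi.cluster.load.balance.host",
    "nifi.cluster.load.balance.port",
)


def changed_restart_keys(old_props, new_props):
    """List the load-balance keys whose values differ between two configs."""
    return [
        key for key in RESTART_ALL_KEYS
        if old_props.get(key, "") != new_props.get(key, "")
    ]


# Sample values (assumed for the example).
old = {"nifi.cluster.load.balance.host": "0.0.0.0",
       "nifi.cluster.load.balance.port": "6342"}
new = {"nifi.cluster.load.balance.host": "192.168.1.10",
       "nifi.cluster.load.balance.port": "6342"}

changed = changed_restart_keys(old, new)
if changed:
    print("Restart all nodes; changed:", ", ".join(changed))
```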

Hope this is helpful!
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-9017


On Aug 3, 2021, at 3:08 AM, Axel Schwarz <[email protected]> wrote:

Hey guys,

I think I found the "trick" for at least version 1.13.2 and of course I'll 
share it with you.
I now use the following load balancing properties:

# cluster load balancing properties #
nifi.cluster.load.balance.host=192.168.1.10
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

So I use the host's IP address for balance.host instead of 0.0.0.0 or the FQDN 
and have no balance.address property at all.
This led to only partial load balancing in my case, as already mentioned. It 
looked like I needed one more step to reach the goal, and that step seems to be 
deleting all state management files.

Through the state-management.xml config file I changed the state management 
directory to be outside of the NiFi installation, because the config file says 
"it is important, that the directory be copied over to the new version when 
upgrading nifi". So every time I upgraded or reinstalled NiFi during my 
load balancing odyssey, the state management directory remained completely 
untouched.
As soon as I changed that, by deleting the entire state management directory 
before reinstalling NiFi with the above-mentioned properties, load balancing 
was immediately working throughout the whole cluster.


I think for my flow it is not that bad to delete the state management 
directory, as I only use one stateful processor to increase a counter. In the 
times I have tried this so far, I have not encountered any wrong behaviour 
whatsoever. But of course I can't test everything, so if any of you know 
important facts about deleting the state management directory, please let me 
know :)

Besides that, I now feel like this solved my problem. I'll have to keep an eye 
on it when updating to version 1.14.0 later on, but I think I can figure this 
out. So thanks for all your support! :)

--- Original Message ---
From: "Jens M. Kofoed" <[email protected]>
Date: 29.07.2021 11:08:28
To: [email protected], Axel Schwarz <[email protected]>
Subject: Re: Re: Re: No Load Balancing since 1.13.2

Hmm... I can't remember :-( sorry

My configuration for version 1.13.2 is like this:
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node01.domaine.com
nifi.cluster.node.protocol.port=9443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3

# cluster load balancing properties #
nifi.cluster.load.balance.address=192.168.1.11
nifi.cluster.load.balance.port=6111
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

So I defined "nifi.cluster.node.address" with the hostname and not an IP 
address, and "nifi.cluster.load.balance.address" with the IP address of the 
server.
And triple-check the configuration on all servers :-)

Kind Regards
Jens M. Kofoed


On Thu, 29 Jul 2021 at 10:11, Axel Schwarz <[email protected]> wrote:


Hey Jens,

in issue NIFI-8643 you wrote the last comment, describing exactly the same 
behaviour we're experiencing now: 2 of 3 nodes were load balancing.
How did you get the third node to participate in load balancing? An update 
to 1.14.0 does not change anything for us.

https://issues.apache.org/jira/browse/NIFI-8643?focusedCommentId=17361418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17361418



--- Original Message ---
From: "Jens M. Kofoed" <[email protected]>
Date: 28.07.2021 12:07:50
To: [email protected], Axel Schwarz <[email protected]>
Subject: Re: Re: No Load Balancing since 1.13.2

hi

I can see that you have configured
nifi.cluster.load.balance.address=0.0.0.0

Have you tried to set the correct IP address?
node1: nifi.cluster.load.balance.address=192.168.1.10
node2: nifi.cluster.load.balance.address=192.168.1.11
node3: nifi.cluster.load.balance.address=192.168.1.12

regards
Jens M. Kofoed

On Wed, 28 Jul 2021 at 11:17, Axel Schwarz <[email protected]> wrote:


Just tried Java 11. But it still does not work. Nothing changed. :(

--- Original Message ---
From: Jorge Machado <[email protected]>
Date: 27.07.2021 13:08:55
To: [email protected], Axel Schwarz <[email protected]>
Subject: Re: No Load Balancing since 1.13.2

Did you try Java 11? I have a client running a similar setup to yours but 
with a lower NiFi version, and it works fine. Maybe it is worth a try.


On 27. Jul 2021, at 12:42, Axel Schwarz <[email protected]> wrote:

I did indeed, but I updated from u161 to u291, as this was the newest 
version at that time, because I thought it could help.

So the issue started under u161. But I just saw that u301 is out. I will try 
this as well.

--- Original Message ---
From: Pierre Villard <[email protected]>
Date: 27.07.2021 10:18:38
To: [email protected], Axel Schwarz <[email protected]>
Subject: Re: No Load Balancing since 1.13.2

Hi,

I believe the minor release u291 is known to have issues (for some of its 
early builds). Did you upgrade the Java version recently?

Thanks,
Pierre

On Tue, 27 Jul 2021 at 08:07, Axel Schwarz <[email protected]> wrote:
Dear Community,

we're running a secured 3-node NiFi cluster on Java 8_u291 and Debian 7 and 
have been experiencing problems with load balancing since version 1.13.2.

I'm fully aware of issue NIFI-8643 and tested a lot around it, but I have to 
say that this is not our problem. Mainly because the balance port never binds 
to localhost, but also because I implemented all workarounds under version 
1.13.2 and have even tried version 1.14.0 by now, but load balancing still 
does not work.
What we experience is best described as "the primary node balances with 
itself"...

So what it does is open the balancing connections to its own IP instead of 
the IPs of the other two nodes. And the other two nodes don't open balancing 
connections at all.

When executing "ss | grep 6342" on the primary node, this is what it looks 
like:

[root@nifiHost1 conf]# ss | grep 6342
tcp    ESTAB      0      0      192.168.1.10:51380     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:51376     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:51378     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:51370     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:51372     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51376
tcp    ESTAB      0      0      192.168.1.10:51374     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51374
tcp    ESTAB      0      0      192.168.1.10:51366     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51370
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51366
tcp    ESTAB      0      0      192.168.1.10:51368     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51372
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51378
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51368
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51380



Executing it on the other, non-primary nodes returns absolutely nothing.
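
The "balances with itself" symptom can be checked mechanically. The following is a minimal Python sketch, assuming `ss` prints numeric addresses in the column layout shown above (netid, state, recv-q, send-q, local, peer). The function name and the two sample lines are made up for the example; it is not a NiFi tool.

```python
# Illustrative parser for `ss` lines in the layout shown above; not a NiFi tool.
def peer_hosts_on_port(ss_output, port):
    """Return the set of remote hosts this machine connects to on `port`."""
    peers = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: netid, state, recv-q, send-q, local, peer
        if len(fields) < 6 or fields[1] != "ESTAB":
            continue
        peer_host, _, peer_port = fields[5].rpartition(":")
        if peer_port == str(port):
            peers.add(peer_host)
    return peers


# Two sample lines in the same format (assumed for the example).
sample = """\
tcp    ESTAB      0      0      192.168.1.10:51380     192.168.1.10:6342
tcp    ESTAB      0      0      192.168.1.10:6342      192.168.1.10:51376
"""
own_ip = "192.168.1.10"
peers = peer_hosts_on_port(sample, 6342)
if peers == {own_ip}:
    print("Only connected to itself on port 6342")
```

On a healthy three-node cluster one would expect the returned set to contain the other two nodes' addresses instead.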

Netstat shows the following on each server:

[root@nifiHost1 conf]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 192.168.1.10:6342       0.0.0.0:*               LISTEN      10352/java

[root@nifiHost2 conf]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 192.168.1.11:6342       0.0.0.0:*               LISTEN      31562/java

[root@nifiHost3 conf]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 192.168.1.12:6342       0.0.0.0:*               LISTEN      31685/java


And here is what our load balancing properties look like:

# cluster load balancing properties #
nifi.cluster.load.balance.host=nifiHost1.contoso.com
nifi.cluster.load.balance.address=0.0.0.0
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

When running NiFi version 1.12.1 on the exact same setup in the exact same 
environment, load balancing works absolutely fine.
There was a time when load balancing even worked in version 1.13.2. But I'm 
not able to reproduce this; it just stopped working one day after some 
restart, without any property or anything else being changed.

If any more information would be helpful, please let me know and I'll try to 
provide it as fast as possible.


