Re: No Load Balancing since 1.13.2

Jens M. Kofoed Sun, 22 Aug 2021 22:10:49 -0700

Hi Mark

Just back at the office after a short holiday :-)
I have tested my setup with NiFi 1.14.0 regarding hostname vs. FQDN.
If I run "nslookup node01.domain.lan", I get the address 192.168.1.11.
If I configure nifi.cluster.load.balance.host=node01.domain.lan, netstat -l
shows the following:
tcp        0      0 localhost:6342         0.0.0.0:*               LISTEN


If I configure nifi.cluster.load.balance.host=192.168.1.11, netstat -l shows
the following:
tcp        0      0 node01.domain.lan:6342 0.0.0.0:*               LISTEN

I don't know why my result differs from yours, since nslookup returns the
correct IP for the FQDN.
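
A possible explanation, offered as an assumption and not confirmed anywhere
in this thread: nslookup queries the DNS server directly, while a process
that resolves names through the operating system's resolver will normally
consult /etc/hosts first. If /etc/hosts maps the node's own FQDN to a
loopback address, a bind on that FQDN ends up on loopback even though
nslookup answers 192.168.1.11. The minimal Python sketch below asks the
system resolver what a name resolves to; the function names are hypothetical
and only illustrate the check.

```python
import ipaddress
import socket


def resolved_addresses(host):
    """Return the sorted, de-duplicated IP addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def resolves_to_loopback(host):
    """Return True if any address the system resolver returns for host is a loopback address."""
    return any(
        # Strip a possible IPv6 zone suffix ("%eth0") before parsing.
        ipaddress.ip_address(addr.split("%")[0]).is_loopback
        for addr in resolved_addresses(host)
    )
```

Running resolves_to_loopback("node01.domain.lan") on the node itself would
show whether the local resolver disagrees with nslookup; an unresolvable name
raises socket.gaierror.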

Kind regards
Jens M. Kofoed

On Fri, 6 Aug 2021 at 15:48, Mark Payne <marka...@hotmail.com> wrote:

> Jens,
>
> You’re right - my mistake, the change from
> “nifi.cluster.load.balance.address” to “nifi.cluster.load.balance.host” was
> in 1.14.0, not early on. In 1.14.0, only nifi.cluster.load.balance.host is
> used. The documentation and properties file both used .host, but the code
> was making use of .address instead. So the code was fixed in 1.14.0 to
> match what the documentation and nifi.properties file specified.
>
> I just did some testing locally on my MacBook regarding the IP address vs.
> hostname.
> What I found is that if I use the IP address, it listens as expected.
> If I use just <hostname> (not fully qualified), interestingly, it listens
> on localhost only.
> If I run "nslookup <hostname>", I get back <hostname>.lan as the FQDN.
> If I use "<hostname>.lan" in my properties, it listens as expected.
>
> Thanks
> -Mark
>
> On Aug 6, 2021, at 12:28 AM, Jens M. Kofoed <jmkofoed....@gmail.com>
> wrote:
>
> Hi Mark
>
> In version 1.13.2 (at least) the file
> "main/nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java"
> is looking for a property called "nifi.cluster.load.balance.address" which
> has been reported in https://issues.apache.org/jira/browse/NIFI-8643 and
> fixed in version 1.14.0
>
> In version 1.14.0, the only way I can get it to work is to enter the
> IP address. If I don't specify it, or if I enter the FQDN, the load balance
> port binds to localhost, which has been reported in
> https://issues.apache.org/jira/browse/NIFI-9010
> The result from running netstat -l:
> tcp 0 0 localhost:6342 0.0.0.0:* LISTEN
>
> Kind regards
> Jens M. Kofoed
>
>
>
> On Thu, 5 Aug 2021 at 23:08, Mark Payne <marka...@hotmail.com> wrote:
>
>> Axel,
>>
>> I think that I can help clarify some of these things.
>>
>> First of all: nifi.cluster.load.balance.host vs.
>> nifi.cluster.load.balance.address
>> * The nifi.cluster.load.balance.host property is what matters.
>>
>> * The nifi.cluster.load.balance.address is not a real property. NiFi has
>> never looked at this property. However, in the first release that included
>> load-balancing, there was a typo in which the nifi.properties file had
>> “…address” instead of “…host”. This was later addressed.
>>
>> * So if you have a value for “nifi.cluster.load.balance.address”, it does
>> nothing and is always ignored.
>>
>>
>>
>> Next: nifi.cluster.load.balance.host property
>>
>> * nifi.cluster.load.balance.host can be either an IP address or a
>> hostname. But if set, other nodes in the cluster MUST be able to
>> communicate with the node using whatever value you put here. So using a
>> value of 0.0.0.0 will not work. Also, if set, NiFi will listen for incoming
>> connections ONLY on that hostname. So if you set it to “localhost”, for
>> instance, no other node can connect to it, because no other host can
>> connect to the node using “localhost”. So this needs to be an address that
>> both the NiFi instance knows about/can bind to, and other nodes in the
>> cluster can connect to.
>>
>> * If nifi.cluster.load.balance.host is NOT set: NiFi will listen for
>> incoming requests on all network interfaces / hostnames. It will advertise
>> its hostname to other nodes in the cluster according to whatever is set for
>> the “nifi.cluster.node.address” property. Meaning that other nodes in the
>> cluster must be able to connect to this node using whatever hostname is set
>> for the “nifi.cluster.node.address” property. If
>> the “nifi.cluster.node.address” property is not set, it advertises its
>> hostname as localhost - which means other nodes won’t be able to send to
>> it.
>>
>> So you must specify either the “nifi.cluster.load.balance.host” property
>> or the “nifi.cluster.node.address” property.
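
The rules described above can be summarised in a small Python sketch. This is
only a model of the behaviour as stated in this message, not NiFi's actual
code; the function names and the "all interfaces" placeholder are
assumptions made for illustration.

```python
def load_balance_endpoints(props):
    """Return (listen_on, advertised_as) according to the rules described in the message.

    props is a dict of nifi.properties keys to values (illustrative model only).
    """
    lb_host = (props.get("nifi.cluster.load.balance.host") or "").strip()
    node_address = (props.get("nifi.cluster.node.address") or "").strip()
    if lb_host:
        # If set, the node listens only on this value and peers must reach it there.
        return lb_host, lb_host
    # If not set, the node listens on all interfaces and advertises the node
    # address, falling back to localhost when that is not set either.
    return "all interfaces", node_address or "localhost"


def usable_by_peers(advertised_as):
    """Return False for values the message says other nodes cannot connect to."""
    return advertised_as not in ("localhost", "0.0.0.0")
```

For example, an empty configuration yields ("all interfaces", "localhost"),
which usable_by_peers reports as unusable, matching the conclusion that at
least one of the two properties must be specified.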
>>
>>
>>
>> Finally: having to delete the state directory
>>
>> If you change the “nifi.cluster.load.balance.host” or
>> “nifi.cluster.load.balance.port” property and restart a node, you must
>> restart all nodes in the cluster. Otherwise, the other nodes won’t be able
>> to send to that node.
>> So, for example, when you changed the load.balance.host from fqdn or
>> 0.0.0.0 to the IP address - the other nodes in the cluster would stop
>> sending. I created a JIRA [1] for that. In my testing, when I changed the
>> hostname, the other nodes stopped sending. But restarting them got things
>> back on track. I wasn’t able to replicate the issue after restarting all
>> nodes.
>>
>> Hope this is helpful!
>> -Mark
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-9017
>>
>>
>> On Aug 3, 2021, at 3:08 AM, Axel Schwarz <axelkop...@emailn.de> wrote:
>>
>> Hey guys,
>>
>> I think I found the "trick" for at least version 1.13.2 and of course
>> I'll share it with you.
>> I now use the following load balancing properties:
>>
>> # cluster load balancing properties #
>> nifi.cluster.load.balance.host=192.168.1.10
>> nifi.cluster.load.balance.port=6342
>> nifi.cluster.load.balance.connections.per.node=4
>> nifi.cluster.load.balance.max.thread.count=8
>> nifi.cluster.load.balance.comms.timeout=30 sec
>>
>> So I use the host's IP address for balance.host instead of 0.0.0.0 or the
>> FQDN, and have no balance.address property at all.
>> As already mentioned, this led to only partial load balancing in my case. It
>> looked like I needed one more step to reach the goal, and that step
>> seems to be deleting all state management files.
>>
>> Through the state-management.xml config file I had moved the state
>> management directory outside of the NiFi installation, because the
>> config file says "it is important, that the directory be copied over to the
>> new version when upgrading nifi". So every time I upgraded or
>> reinstalled NiFi during my load balancing odyssey, the state management
>> directory remained completely untouched.
>> As soon as I changed that, by deleting the entire state management
>> directory before reinstalling NiFi with the above-mentioned properties, load
>> balancing immediately worked throughout the whole cluster.
>>
>> I think for my flow it is not that bad to delete the state management
>> directory, as I only use one stateful processor to increment a counter.
>> And in the times I have tried this so far, I have not encountered any wrong
>> behaviour whatsoever. But of course I can't test everything, so if any of
>> you know something important about deleting the state management
>> directory, please let me know :)
>>
>> Besides that, I now feel like this solved my problem. I'll have to keep an
>> eye on it when updating to version 1.14.0 later on, but I think I can figure
>> that out. So thanks for all your support! :)
>>
>> --- Original Message ---
>> From: "Jens M. Kofoed" <jmkofoed....@gmail.com>
>> Date: 29.07.2021 11:08:28
>> To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
>> Subject: Re: Re: Re: No Load Balancing since 1.13.2
>>
>> Hmm... I can't remember :-( sorry
>>
>> My configuration for version 1.13.2 is like this:
>> # cluster node properties (only configure for cluster nodes) #
>> nifi.cluster.is.node=true
>> nifi.cluster.node.address=nifi-node01.domaine.com
>> nifi.cluster.node.protocol.port=9443
>> nifi.cluster.node.protocol.threads=10
>> nifi.cluster.node.protocol.max.threads=50
>> nifi.cluster.node.event.history.size=25
>> nifi.cluster.node.connection.timeout=5 sec
>> nifi.cluster.node.read.timeout=5 sec
>> nifi.cluster.node.max.concurrent.requests=100
>> nifi.cluster.firewall.file=
>> nifi.cluster.flow.election.max.wait.time=5 mins
>> nifi.cluster.flow.election.max.candidates=3
>>
>> # cluster load balancing properties #
>> nifi.cluster.load.balance.address=192.168.1.11
>> nifi.cluster.load.balance.port=6111
>> nifi.cluster.load.balance.connections.per.node=4
>> nifi.cluster.load.balance.max.thread.count=8
>> nifi.cluster.load.balance.comms.timeout=30 sec
>>
>> So I defined "nifi.cluster.node.address" with the hostname, not an IP
>> address, and "nifi.cluster.load.balance.address" with the IP address of
>> the server.
>> And triple-check the configuration on all servers :-)
>>
>> Kind Regards
>> Jens M. Kofoed
>>
>>
>> On Thu, 29 Jul 2021 at 10:11, Axel Schwarz <axelkop...@emailn.de> wrote:
>>
>> Hey Jens,
>>
>> in issue NIFI-8643 you wrote the last comment, describing exactly the same
>> behaviour we're experiencing now: 2 of 3 nodes were load balancing.
>> How did you get the third node to participate in load balancing? An update
>> to 1.14.0 does not change anything for us.
>>
>> https://issues.apache.org/jira/browse/NIFI-8643?focusedCommentId=17361418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17361418
>>
>> --- Original Message ---
>> From: "Jens M. Kofoed" <jmkofoed....@gmail.com>
>> Date: 28.07.2021 12:07:50
>> To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
>> Subject: Re: Re: No Load Balancing since 1.13.2
>>
>> Hi,
>>
>> I can see that you have configured
>> nifi.cluster.load.balance.address=0.0.0.0
>>
>> Have you tried setting the correct IP address?
>> node1: nifi.cluster.load.balance.address=192.168.1.10
>> node2: nifi.cluster.load.balance.address=192.168.1.11
>> node3: nifi.cluster.load.balance.address=192.168.1.12
>>
>> regards
>> Jens M. Kofoed
>>
>> On Wed, 28 Jul 2021 at 11:17, Axel Schwarz <axelkop...@emailn.de> wrote:
>>
>> Just tried Java 11, but it still does not work. Nothing changed. :(
>>
>> --- Original Message ---
>> From: Jorge Machado <jom...@me.com>
>> Date: 27.07.2021 13:08:55
>> To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
>> Subject: Re: No Load Balancing since 1.13.2
>>
>> Did you try Java 11? I have a client running a similar setup to yours
>> but with a lower NiFi version, and it works fine. Maybe it is worth
>> trying.
>>
>> On 27. Jul 2021, at 12:42, Axel Schwarz <axelkop...@emailn.de> wrote:
>>
>> I did indeed, but I updated from u161 to u291, as this was the newest
>> version at that time, because I thought it could help.
>> So the issue started under u161. But I just saw that u301 is out. I
>> will try this as well.
>>
>> --- Original Message ---
>> From: Pierre Villard <pierre.villard...@gmail.com>
>> Date: 27.07.2021 10:18:38
>> To: users@nifi.apache.org, Axel Schwarz <axelkop...@emailn.de>
>> Subject: Re: No Load Balancing since 1.13.2
>>
>> Hi,
>>
>> I believe the minor u291 is known to have issues (for some of its early
>> builds). Did you upgrade the Java version recently?
>>
>> Thanks,
>> Pierre
>>
>> On Tue, 27 Jul 2021 at 08:07, Axel Schwarz <axelkop...@emailn.de> wrote:
>>
>> Dear Community,
>>
>> we're running a secured 3-node NiFi cluster on Java 8_u291 and Debian 7 and
>> have been experiencing problems with load balancing since version 1.13.2.
>>
>> I'm fully aware of issue NIFI-8643 and tested a lot around this, but I have
>> to say that this is not our problem. Mainly because the balance port never
>> binds to localhost, but also because I implemented all workarounds under
>> version 1.13.2 and have even tried version 1.14.0 by now, but load balancing
>> still does not work.
>> What we experience is best described as "the primary node balances with
>> itself"...
>>
>> What it does is open the balancing connections to its own IP instead of
>> the IPs of the other two nodes. And the other two nodes don't open
>> balancing connections at all.
>>
>> When executing "ss | grep 6342" on the primary node, this is what it looks
>> like:
>>
>> [root@nifiHost1 conf]# ss | grep 6342
>> tcp    ESTAB      0      0      192.168.1.10:51380               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:51376               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:51378               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:51370               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:51372               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51376
>> tcp    ESTAB      0      0      192.168.1.10:51374               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51374
>> tcp    ESTAB      0      0      192.168.1.10:51366               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51370
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51366
>> tcp    ESTAB      0      0      192.168.1.10:51368               192.168.1.10:6342
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51372
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51378
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51368
>> tcp    ESTAB      0      0      192.168.1.10:6342                192.168.1.10:51380
>>
>> Executing it on the other, non-primary nodes returns absolutely nothing.
>>
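
The "balances with itself" symptom can be checked mechanically. The Python
sketch below is an illustration only: it assumes the six-column ss layout
shown above (Netid, State, Recv-Q, Send-Q, Local, Peer), and the function
name self_connections is hypothetical. It returns the established
connections on the load balance port whose local and peer IP are identical.

```python
def self_connections(ss_output, port=6342):
    """Return (local, peer) pairs where a node is connected to itself on the given port."""
    hits = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Expect: Netid State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 6 or fields[1] != "ESTAB":
            continue
        local, peer = fields[4], fields[5]
        local_ip, _, local_port = local.rpartition(":")
        peer_ip, _, peer_port = peer.rpartition(":")
        # Flag the line only if it involves the load balance port on either side.
        if local_ip == peer_ip and str(port) in (local_port, peer_port):
            hits.append((local, peer))
    return hits
```

Fed the output above, every line would be reported, since local and peer IP
are both 192.168.1.10 throughout; in a healthy cluster one would expect peer
addresses of the other nodes instead.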
>> Netstat shows the following on each server:
>>
>> [root@nifiHost1 conf]# netstat -tulpn
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
>> tcp        0      0 192.168.1.10:6342       0.0.0.0:*               LISTEN      10352/java
>>
>> [root@nifiHost2 conf]# netstat -tulpn
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
>> tcp        0      0 192.168.1.11:6342       0.0.0.0:*               LISTEN      31562/java
>>
>> [root@nifiHost3 conf]# netstat -tulpn
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
>> tcp        0      0 192.168.1.12:6342       0.0.0.0:*               LISTEN      31685/java
>>
>> And here is what our load balancing properties look like:
>>
>> # cluster load balancing properties #
>> nifi.cluster.load.balance.host=nifiHost1.contoso.com
>> nifi.cluster.load.balance.address=0.0.0.0
>> nifi.cluster.load.balance.port=6342
>> nifi.cluster.load.balance.connections.per.node=4
>> nifi.cluster.load.balance.max.thread.count=8
>> nifi.cluster.load.balance.comms.timeout=30 sec
>>
>> When running NiFi version 1.12.1 on the exact same setup in the exact
>> same environment, load balancing works absolutely fine.
>> There was a time when load balancing even worked in version 1.13.2.
>> But I'm not able to reproduce this; it just stopped working one day after
>> a restart, without any property or anything else having changed.
>>
>> If any more information would be helpful, please let me know and I'll
>> try to provide it as fast as possible.
>>
>> Sent with Emailn.de <http://emailn.de/> <https://www.emailn.de/> - Freemail
>>
>> * Unlimited storage
>> * Your own online office
>> * 24h best mail reception
>> * Spam protection, address book
>
