Hi guys,
We just upgraded our 8-node NiFi cluster to v1.21.0 and we are hitting an SSH
timeout issue with the PutSFTP processor. Maybe someone can help us find the
root cause of our issue or point us in the right direction. The error message
below must be related to the “Data Timeout”, as this is (in our case) the only
user-configurable timeout that is set to 30 seconds.
Error Message:
2023-04-22 14:41:45,422 ERROR [Timer-Driven Process Thread-11]
o.a.nifi.processors.standard.PutSFTP
PutSFTP[id=a73a340e-81f8-1f21-8f04-9bb06b767d7d] Unable to transfer
StandardFlowFileRecord[uuid=c8be51dc-074c-429d-8b52-077a91b4e339,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1682167305368-1016468,
container=default, section=660], offset=0,
length=441044],offset=0,name=aaa-01_detail-20230422-143800_0200.gz,size=441044]
to remote host nas-lan-01.my.net due to
org.apache.nifi.processors.standard.socket.ClientConnectException: SSH Client
connection failed [nas-lan-01.my.net:22]:
net.schmizz.sshj.transport.TransportException: Timeout expired: 30000
MILLISECONDS; routing to failure
net.schmizz.sshj.transport.TransportException: Timeout expired: 30000 MILLISECONDS
at net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:33)
at net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:27)
at net.schmizz.concurrent.Promise.retrieve(Promise.java:139)
at net.schmizz.concurrent.Event.await(Event.java:105)
at net.schmizz.sshj.transport.KeyExchanger.waitForDone(KeyExchanger.java:148)
at net.schmizz.sshj.transport.KeyExchanger.startKex(KeyExchanger.java:143)
...
Below is a Splunk graph which shows the number of error messages per hour. The
behavior changed when we switched the queue load balance strategy to “single
node”. Instead of 8 nodes, only 1 node was doing the PutSFTP. The single
remaining PutSFTP processor processed more data in a shorter timeframe, and the
errors went away (there are still some errors, but we have a lot of PutSFTP
processors in our environment and my filter was not that specific).
[Attached image: Splunk graph of the number of error messages per hour]
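Since my filter was not that specific, a narrower count could also be done
directly on the log. Below is a minimal Python sketch, assuming that each log
entry sits on a single line in the actual log file (the wrapping in the error
message above comes from the mail client) and starts with the timestamp format
shown there. The function name and the regular expression are our own
illustration and are not part of NiFi or Splunk.

```python
import re
from collections import Counter

# Assumed entry layout (taken from the error message quoted above):
# "<date> <time>,<ms> ERROR [thread] logger PutSFTP[id=<uuid>] ..."
LINE_RE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} ERROR .*?"
    r"PutSFTP\[id=(?P<proc_id>[0-9a-f-]+)\]"
)


def count_timeouts_per_hour(lines, processor_id=None):
    """Count PutSFTP 'Timeout expired' errors per (hour, processor id).

    If processor_id is given, only entries of that processor are counted.
    """
    counts = Counter()
    for line in lines:
        if "Timeout expired" not in line:
            continue
        match = LINE_RE.search(line)
        if match is None:
            # Stack-trace lines and entries of other processors have no
            # matching "timestamp ... PutSFTP[id=...]" prefix; skip them.
            continue
        if processor_id is not None and match["proc_id"] != processor_id:
            continue
        counts[(match["hour"], match["proc_id"])] += 1
    return counts
```

Called with processor_id set to the id from the error above, this would count
only the timeouts of that one processor, which would make the per-hour numbers
comparable before and after the change of the load balance strategy.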
So our question is: how can it be that the session can be initiated but no
data can be transferred? Any ideas? Is there any mechanism which reuses an
existing connection? I would assume not. The batch size is set to the default
(500) and one flow file is about 7 MB on average. The source data is fetched
from Kafka and arrives very regularly. If it were an issue on the NAS side, we
would expect it not to matter whether one NiFi node or 8 nodes do the PutSFTP,
but it clearly makes a difference when we change the load balance strategy, so
we clearly see the culprit on the NiFi side.
Cheers, Josef
