Hi guys

We just upgraded our 8-node NiFi cluster to v1.21.0 and we are hitting an SSH 
timeout issue with the PutSFTP processor.

Maybe someone can help us find the root cause of our issue or point us in the 
right direction. The error message below must be related to the “Data 
Timeout”, as this is (in our case) the only user-configurable timeout that is 
set to 30 seconds.

Error Message:
2023-04-22 14:41:45,422 ERROR [Timer-Driven Process Thread-11] o.a.nifi.processors.standard.PutSFTP PutSFTP[id=a73a340e-81f8-1f21-8f04-9bb06b767d7d] Unable to transfer StandardFlowFileRecord[uuid=c8be51dc-074c-429d-8b52-077a91b4e339,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1682167305368-1016468, container=default, section=660], offset=0, length=441044],offset=0,name=aaa-01_detail-20230422-143800_0200.gz,size=441044] to remote host nas-lan-01.my.net due to org.apache.nifi.processors.standard.socket.ClientConnectException: SSH Client connection failed [nas-lan-01.my.net:22]: net.schmizz.sshj.transport.TransportException: Timeout expired: 30000 MILLISECONDS; routing to failure
net.schmizz.sshj.transport.TransportException: Timeout expired: 30000 MILLISECONDS
      at net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:33)
      at net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:27)
      at net.schmizz.concurrent.Promise.retrieve(Promise.java:139)
      at net.schmizz.concurrent.Event.await(Event.java:105)
      at net.schmizz.sshj.transport.KeyExchanger.waitForDone(KeyExchanger.java:148)
      at net.schmizz.sshj.transport.KeyExchanger.startKex(KeyExchanger.java:143)
      ...


Below is a Splunk graph showing the number of error messages per hour. The 
behavior changed when we switched the queue load balance strategy to “single 
node”. Instead of 8 nodes, only 1 node was doing the PutSFTP; the single 
remaining PutSFTP processor processed more data in a shorter timeframe and the 
errors disappeared (there are still some errors, but we have a lot of PutSFTP 
processors in our environment and my filter was not that specific).


[Image: Splunk graph of PutSFTP error messages per hour]
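
Since the Splunk filter above was not specific to one processor, the same 
per-hour count could be narrowed to a single PutSFTP instance directly from the 
NiFi log. The following is only a minimal sketch: the function name 
timeouts_per_hour and the regular expression are illustrative assumptions, and 
it assumes each ERROR entry sits on one line in the log file (not wrapped as in 
this mail).

```python
import re
from collections import Counter

# Assumed layout of a NiFi ERROR line, taken from the message quoted above:
# "<date> <time>,<ms> ERROR [...] ... PutSFTP[id=<uuid>] ... Timeout expired ..."
LINE_RE = re.compile(
    r"(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} ERROR .*?"
    r"PutSFTP\[id=(?P<pid>[0-9a-f-]+)\].*Timeout expired"
)


def timeouts_per_hour(lines, processor_id=None):
    """Count PutSFTP 'Timeout expired' errors per (hour, processor id).

    Stack-trace lines do not start with a timestamp and are skipped, so each
    failed transfer is counted once.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        if processor_id is not None and m.group("pid") != processor_id:
            continue
        counts[(m.group("hour"), m.group("pid"))] += 1
    return counts
```

It could be fed with the lines of a log file, for example 
timeouts_per_hour(open("nifi-app.log"), "a73a340e-81f8-1f21-8f04-9bb06b767d7d"), 
where the file path is an assumption about the local setup.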



So our question is: how can it be that the session can be initiated but no 
data can be transferred? Any ideas? Is there any mechanism that reuses an 
existing connection? I would assume not. The batch size is set to the default 
(500) and a flow file is about 7 MB on average. The source data is fetched from 
Kafka and arrives very regularly. If the issue were on the NAS side, we would 
expect it not to matter whether one NiFi node or 8 nodes do the PutSFTP, but 
changing the load balancing strategy clearly makes a difference, so we see the 
culprit on the NiFi side.
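
The stack trace above ends in KeyExchanger.startKex, so the timeout appears to 
hit while the SSH handshake is still in progress, before any file data is sent. 
One way to compare nodes independently of NiFi might be a small probe that 
times the TCP connect and the arrival of the server's first line (normally the 
SSH identification string). This is only a sketch under assumptions: the name 
probe_ssh_banner is made up for illustration, and the probe does not perform 
the key exchange itself, so a fast banner does not rule out a slow key exchange.

```python
import socket
import time


def probe_ssh_banner(host, port=22, timeout=30.0):
    """Return (connect_seconds, banner_seconds, banner) for one SSH server.

    connect_seconds: time to complete the TCP handshake.
    banner_seconds:  additional time until the server's first line arrives.
    Raises OSError (including socket.timeout) if either step fails.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        data = b""
        # Read until the first newline, with a small cap on the line length.
        while b"\n" not in data and len(data) < 1024:
            chunk = sock.recv(256)
            if not chunk:
                break
            data += chunk
        done = time.monotonic()
    banner = data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    return connected - start, done - connected, banner
```

Running probe_ssh_banner("nas-lan-01.my.net") from several nodes at the same 
time, and again from a single node, would show whether the delay already 
appears before the key exchange starts.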

Cheers, Josef


