Hello

If going from 8 nodes with many errors to 1 node with few errors, then you
likely hit the max connection limit on the SFTP server. You can change that
value on the SFTP server. How many concurrent tasks do you allow the
processor? If Y tasks, you will want Y times 8 connections allowed.
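
To make that arithmetic concrete, here is a minimal sketch. The function
name and the example value of 4 concurrent tasks are made up for
illustration; substitute whatever your PutSFTP processor is actually
configured with.

```python
def required_sftp_connections(nodes: int, concurrent_tasks: int) -> int:
    """Worst-case number of simultaneous SFTP connections.

    Assumes every node runs the processor and every concurrent task
    opens its own connection at the same moment.
    """
    if nodes < 1 or concurrent_tasks < 1:
        raise ValueError("nodes and concurrent_tasks must be at least 1")
    return nodes * concurrent_tasks


# Hypothetical example: 4 concurrent tasks per node.
all_nodes = required_sftp_connections(nodes=8, concurrent_tasks=4)
single_node = required_sftp_connections(nodes=1, concurrent_tasks=4)
print(f"8 nodes: {all_nodes} connections, 1 node: {single_node} connections")
```

If the limit on the server is below the 8-node figure but above the
single-node one, that would fit the pattern you describe.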

Thanks

On Mon, Apr 24, 2023 at 12:23 AM <josef.zahn...@swisscom.com> wrote:

> Hi guys
>
>
>
> We just upgraded our 8-node NiFi cluster to v1.21.0 and we are hitting an
> SSH timeout issue with the *PutSFTP* processor.
>
>
>
> Maybe someone can help us find the root cause of our issue or point us
> in the right direction. The error message below must be related to the
> “*Data Timeout*”, as this is (in our case) the only user-configurable
> timeout, and it is set to 30 seconds.
>
>
>
> *Error Message:*
>
> 2023-04-22 14:41:45,422 ERROR [Timer-Driven Process Thread-11]
> o.a.nifi.processors.standard.PutSFTP
> PutSFTP[id=a73a340e-81f8-1f21-8f04-9bb06b767d7d] Unable to transfer
> StandardFlowFileRecord[uuid=c8be51dc-074c-429d-8b52-077a91b4e339,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1682167305368-1016468,
> container=default, section=660], offset=0,
> length=441044],offset=0,name=aaa-01_detail-20230422-143800_0200.gz,size=441044]
> to remote host nas-lan-01.my.net due to
> org.apache.nifi.processors.standard.socket.ClientConnectException: SSH
> Client connection failed [nas-lan-01.my.net:22]:
> net.schmizz.sshj.transport.TransportException: Timeout expired: 30000
> MILLISECONDS; routing to failure
>
> net.schmizz.sshj.transport.TransportException: Timeout expired: 30000
> MILLISECONDS
>
>       at
> net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:33)
>
>       at
> net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:27)
>
>       at net.schmizz.concurrent.Promise.retrieve(Promise.java:139)
>
>       at net.schmizz.concurrent.Event.await(Event.java:105)
>
>       at
> net.schmizz.sshj.transport.KeyExchanger.waitForDone(KeyExchanger.java:148)
>
>       at
> net.schmizz.sshj.transport.KeyExchanger.startKex(KeyExchanger.java:143)
>
>       ...
>
>
>
>
>
> Below is a Splunk graph which shows the number of error messages per hour.
> The behavior changed when we switched the queue load balance strategy to
> “single node”. So instead of 8 nodes, only 1 node was doing the PutSFTP. The
> one remaining PutSFTP processor processed more data in a shorter timeframe
> and the errors were gone (there are still some errors, but we have a lot of
> PutSFTP processors in our environment and my filter was not that specific).
>
> So our question is, how can it be that the session can be initiated but
> no data can be transferred? Any ideas? Is there any mechanism which reuses
> an existing connection? I would assume not? The batch size is set to the
> default (500) and one flow file has an average size of about 7MB. The source
> data is fetched from Kafka and comes in very regularly… If it were an issue
> on the NAS side, we would assume that it wouldn’t matter whether one NiFi
> node does the PutSFTP or 8 nodes, but it clearly makes a difference when we
> change the load balancing strategy… so we see the culprit clearly in NiFi.
>
>
>
> Cheers Josef
>
