Apologies in advance if I've got this completely wrong, but I recall seeing
that error when I forget to increase the open-files limit on a heavily loaded
install. It is more obvious via the UI, but the logs will have error
messages about too many open files.
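For reference, a quick way to check the current limit is below. This is a minimal sketch; the `nifi` user name and the 50000 value are assumptions for illustration, not values from this thread.

```shell
# Show the per-process open-file limit for the current shell/user.
ulimit -n

# To raise it persistently for the user running NiFi (assumed here to be
# 'nifi'), add lines like these to /etc/security/limits.conf:
#   nifi  soft  nofile  50000
#   nifi  hard  nofile  50000
```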
On Wed, 22 Mar 2023, 16:49 Mark Payne wrote:
OK. So changing the checkpoint interval to 300 seconds might help reduce IO a
bit. But it will cause the repo to become much larger, and it will take much
longer to startup whenever you restart NiFi.
The variance in size between nodes is likely due to how recently it’s
checkpointed. If it stays
Thanks for this Mark. I'm not seeing any large attributes at the moment
but will go through this and verify - but I did have one queue that was
set to 100k instead of 10k.
I set the nifi.cluster.node.connection.timeout to 30 seconds (up from 5)
and the nifi.flowfile.repository.checkpoint.interv
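For context, the two properties being discussed would look like this in nifi.properties. The values match those mentioned in the thread; the full name of the truncated property is assumed to be the standard checkpoint-interval property:

```properties
# Cluster node connection timeout (raised from the 5 sec default)
nifi.cluster.node.connection.timeout=30 secs
# FlowFile repository checkpoint interval (default is 20 secs)
nifi.flowfile.repository.checkpoint.interval=300 secs
```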
Joe,
The errors noted are indicating that NiFi cannot communicate with registry.
Either the registry is offline, NiFi’s Registry Client is not configured
properly, there’s a firewall in the way, etc.
A FlowFile repo of 35 GB is rather huge. This would imply one of 3 things:
- You have a huge nu
Thank you Mark. These are SATA drives - but there's no way for the
flowfile repo to be on multiple spindles. It's not huge - maybe 35G per
node.
I do see a lot of messages like this in the log:
2023-03-22 10:52:13,960 ERROR [Timer-Driven Process Thread-62]
o.a.nifi.groups.StandardProcessGrou
I've since brought the node back up - no change. Looks like the IO is all
related to the flowfile repository. When it's running, CPU is pretty high -
usually ~12 cores (i.e. top shows 1200%) per node. I'm using the XFS
filesystem; maybe some FS parameters would help?
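As a speculative example (not something suggested in this thread), XFS mount options that cut metadata write traffic can be set in /etc/fstab; `noatime` is the usual first candidate:

```
# /etc/fstab - hypothetical entry; device and mount point are placeholders
/dev/sdb1  /opt/nifi/flowfile_repository  xfs  defaults,noatime  0 0
```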
The big change is that I was
Joe,
1.8 million FlowFiles is not a concern. But when you say “Should I reduce the
queue sizes?” it makes me wonder if they’re all in a single queue?
Generally, you should leave the backpressure threshold at the default 10,000
FlowFile max. Increasing this can lead to huge amounts of swapping, w
Thank you. Was able to get in.
Currently there are 1.8 million flow files and 3.2G. Is this too much
for a 3 node cluster with multiple spindles each (SATA drives)?
Should I reduce the queue sizes?
-Joe
On 3/22/2023 10:23 AM, Phillip Lord wrote:
Joe,
If you need the UI to come back up, try
Joe,
If you need the UI to come back up, try setting the autoresume property in
nifi.properties to false and restarting the node(s).
This will bring every component/controller service up in a stopped/disabled
state and may provide some breathing room for the UI to become available again.
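The property Phil is referring to is presumably the standard auto-resume flag, which defaults to true:

```properties
# Start all components in a stopped/disabled state after restart
nifi.flowcontroller.autoResumeState=false
```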
Phil
On Mar 22, 2023 at
atop shows the disk as being all red with IO - 100% utilization. There
are a lot of flowfiles currently trying to run through, but I can't
monitor it because the UI won't load.
-Joe
On 3/22/2023 10:16 AM, Mark Payne wrote:
Joe,
I’d recommend taking a look at garbage collection. It is far more
Joe,
I’d recommend taking a look at garbage collection. It is far more likely the
culprit than disk I/O.
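One low-effort way to check GC behavior is to enable GC logging and look for long pauses. A sketch, assuming Java 11+; the exact java.arg index in conf/bootstrap.conf varies by install and is arbitrary here:

```properties
# conf/bootstrap.conf - enable rotating GC logs (arg index is a placeholder)
java.arg.20=-Xlog:gc*:file=/var/log/nifi/gc.log:time,uptime:filecount=5,filesize=10M
```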
Thanks
-Mark
> On Mar 22, 2023, at 10:12 AM, Joe Obernberger
> wrote:
>
> I'm getting "java.net.SocketTimeoutException: timeout" from the user
> interface of NiFi when load is heavy. Th
I'm getting "java.net.SocketTimeoutException: timeout" from the user
interface of NiFi when load is heavy. This is 1.18.0 running on a 3
node cluster. Disk IO is high and when that happens, I can't get into
the UI to stop any of the processors.
Any ideas?
I have put the flowfile repository a