Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Obernberger
Hi Joe - yes - /data/4 and /data/5 are separate spindles, and yes, /data/5 is where the flowfile repo is; it is large:

    ls -lh
    -rw-r--r-- 1 root root 6.5G Jul 12 12:36 checkpoint
    -rw-r--r-- 1 root root 5.2G Jul 12 12:46 checkpoint.partial
    drwxr-xr-x 4 root root  132 Jul 12 12:46 journals

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Mark Payne
Joe, The way that the processor works is that it adds an attribute for every “Capturing Group” in the regular expression. This includes a “Capturing Group” 0, which contains the entire value that the regex was run against. You can actually disable capturing this as an attribute by setting the
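To make the size effect concrete, here is a minimal Python sketch of the behaviour Mark describes: one attribute per capturing group, plus group 0 holding the entire match. The attribute naming (`prefix.N`) and the `include_group_0` flag are hypothetical stand-ins, not ExtractText's exact naming or property names, and Python's `re` module stands in for the Java regex engine.

```python
import re


def capture_attributes(prefix: str, pattern: str, content: str,
                       include_group_0: bool = True) -> dict[str, str]:
    """Model regex-to-attribute extraction (hypothetical naming: prefix.N).

    One entry per capturing group; group 0 is the entire match.
    """
    match = re.search(pattern, content, re.DOTALL)
    if match is None:
        return {}
    attrs = {}
    start = 0 if include_group_0 else 1
    for i in range(start, match.re.groups + 1):
        value = match.group(i)
        if value is not None:  # skip groups that did not participate
            attrs[f"{prefix}.{i}"] = value
    return attrs


def attribute_bytes(attrs: dict[str, str]) -> int:
    """Rough UTF-8 size of all attribute names and values."""
    return sum(len(k.encode()) + len(v.encode()) for k, v in attrs.items())


if __name__ == "__main__":
    body = "HEADER\n" + "x" * 10_000  # ~10 KB of text, as in the thread
    pattern = r"^(HEADER)\n(.*)$"
    with_g0 = capture_attributes("text", pattern, body)
    without_g0 = capture_attributes("text", pattern, body, include_group_0=False)
    print(attribute_bytes(with_g0), attribute_bytes(without_g0))
```

In this toy case keeping group 0 roughly doubles the attribute payload (about 20 KB versus about 10 KB), because the whole match and the inner group both carry the same large text.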

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Obernberger
Thank you Mark - it looks like the attributes are to blame. I'm adding lots of UpdateAttribute processors to delete them as soon as they are no longer needed, and disk IO has dropped. Right now, it's all going to 'spinning rust' - soon moving to all-new SSDs - but either way, this needed addressing. One oddity is when I
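Joe's fix, removing attributes as soon as they are no longer needed, can be modelled the same way. This minimal sketch drops every attribute whose name matches a regular expression, similar in spirit to having UpdateAttribute delete attributes by pattern; the `delete_attributes` helper and the attribute names are hypothetical.

```python
import re


def delete_attributes(attrs: dict[str, str], name_pattern: str) -> dict[str, str]:
    """Return a copy of attrs without keys that fully match name_pattern."""
    rx = re.compile(name_pattern)
    return {k: v for k, v in attrs.items() if not rx.fullmatch(k)}


def attribute_bytes(attrs: dict[str, str]) -> int:
    """Rough UTF-8 size of all attribute names and values."""
    return sum(len(k.encode()) + len(v.encode()) for k, v in attrs.items())


if __name__ == "__main__":
    attrs = {
        "filename": "record-0001.txt",
        "text.0": "x" * 10_000,  # hypothetical large extracted values
        "text.1": "x" * 10_000,
        "route.key": "A",
    }
    trimmed = delete_attributes(attrs, r"text\..*")
    print(sorted(trimmed), attribute_bytes(attrs), attribute_bytes(trimmed))
```

Every later update of the trimmed FlowFile then carries tens of bytes of attributes instead of roughly 20 KB.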

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Phillip Lord
Some thoughts… putting 10 KB of text into an attribute probably isn't ideal. Is there perhaps another way to accomplish what you're doing? Also, your flowfile.repo.checkpoint.interval is pretty high. I'd consider lowering it considerably…

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Witt
Ah ok. And 'data/5' is its own partition (same physical disk as data/4?). And data/5 is where you see those large files? Can you show what you see there in terms of files/sizes? For the checkpoint period the default is 20 seconds. Am curious to know what benefit moving to 300 seconds was
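Since the checkpoint interval has to be checked on each of the four nodes, a small script can read it from each node's nifi.properties. This is a minimal sketch, assuming simple `key=value` lines; the property key `nifi.flowfile.repository.checkpoint.interval` is the real nifi.properties name, the 20-second default is the value Joe Witt quotes, and the duration parser is a simplified assumption that only understands a few common unit spellings, not NiFi's full time-period syntax.

```python
import re

# nifi.properties key; 20 seconds is the default quoted in the thread.
CHECKPOINT_KEY = "nifi.flowfile.repository.checkpoint.interval"
DEFAULT_SECONDS = 20

# Simplified unit table (assumption: covers only common spellings).
_UNITS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def to_seconds(period: str) -> int:
    """Convert strings like '20 secs' or '5 mins' to whole seconds."""
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", period)
    if m is None or m.group(2).lower() not in _UNITS:
        raise ValueError(f"unsupported time period: {period!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]


def checkpoint_seconds(text: str) -> int:
    """Configured checkpoint interval, falling back to the 20 s default."""
    props = parse_properties(text)
    return to_seconds(props.get(CHECKPOINT_KEY, f"{DEFAULT_SECONDS} secs"))


if __name__ == "__main__":
    sample = f"{CHECKPOINT_KEY}=300 secs\n"
    secs = checkpoint_seconds(sample)
    print(secs, secs / DEFAULT_SECONDS)
```

With the 300-second value discussed here, the interval is 15 times the default.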

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Mark Payne
Joe, How many FlowFiles are you processing here? Let’s say, per second? How many processors are in those flows? Is the FlowFile Repo a spinning disk, SSD, or NAS? You said you’re using ExtractText to pull 10 KB into an attribute. I presume you’re then doing something with it. So maybe you’re
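One way to see why Mark asks about rate and processor count: under the rough assumption that every processor touching a FlowFile causes its attribute map to be written to the flowfile repo journal once, the write rate is the product of those factors. This back-of-envelope sketch ignores checkpoints and serialization overhead, and the numbers are hypothetical, not measurements from Joe's cluster.

```python
def journal_bytes_per_sec(flowfiles_per_sec: float,
                          updates_per_flowfile: int,
                          attribute_bytes: int) -> float:
    """Back-of-envelope flowfile repo write rate.

    ASSUMPTION: each processor update writes the FlowFile's full attribute
    map once; checkpoints and per-record overhead are ignored.
    """
    return flowfiles_per_sec * updates_per_flowfile * attribute_bytes


if __name__ == "__main__":
    # Hypothetical: 1,000 FlowFiles/s through 10 processors.
    before = journal_bytes_per_sec(1_000, 10, 20_000)  # ~20 KB of attributes
    after = journal_bytes_per_sec(1_000, 10, 1_000)    # attributes trimmed to ~1 KB
    print(f"{before / 1e6:.0f} MB/s -> {after / 1e6:.0f} MB/s")
```

Even as a crude estimate, it shows why large attributes alone can keep a spinning disk busy, and why trimming them helps more than tuning elsewhere.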

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Obernberger
Thank you Joe - the content repo doesn't seem to be the issue - it's the flowfile repo. Here is the section from one of the nodes:

    nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
    nifi.content.claim.max.appendable.size=50 KB

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Witt
Joe, I don't recall the specific version in which we got it truly sorted, but there was an issue with our default settings for an important content repo property and with how we handled a mixture of large and small flowfiles written within the same underlying slab/claim in the content repository. Please check
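The large/small mixing problem can be illustrated with a toy model: content is appended to a shared claim until the claim reaches a size limit, and a claim can only be deleted once none of its FlowFiles are still live. This is a simplified sketch, not NiFi's FileSystemRepository; the packing rule is an assumption, and the 50 KB limit is borrowed from the `nifi.content.claim.max.appendable.size` value Joe posted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Claim:
    """One on-disk slab holding content from one or more FlowFiles."""
    size: int = 0
    members: list[str] = field(default_factory=list)


def pack(contents: list[tuple[str, int]], max_appendable: int) -> list[Claim]:
    """Append content to the current claim until it reaches max_appendable.

    Simplified model (assumption): a claim accepts appends while its size is
    below max_appendable; the write that crosses the limit closes it.
    """
    claims: list[Claim] = []
    current: Optional[Claim] = None
    for name, size in contents:
        if current is None or current.size >= max_appendable:
            current = Claim()
            claims.append(current)
        current.size += size
        current.members.append(name)
    return claims


def retained_bytes(claims: list[Claim], live: set[str]) -> int:
    """Bytes that cannot be freed: a claim is deletable only if no member is live."""
    return sum(c.size for c in claims if any(m in live for m in c.members))


if __name__ == "__main__":
    KB = 1024
    flow = [("a", KB), ("b", 500 * KB * KB), ("c", KB), ("d", KB)]
    claims = pack(flow, max_appendable=50 * KB)
    # "b" (500 MiB) is done, but 1 KB FlowFile "a" is still queued.
    print([c.members for c in claims], retained_bytes(claims, {"a"}))
```

In this model a single lingering 1 KB FlowFile pins the 500 MiB FlowFile that happened to land in the same claim.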

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Obernberger
Raising this thread from the dead... Having issues with IO to the flowfile repository.  NiFi will show 500k flow files and a size of ~1.7G - but the size on disk on each of the 4 nodes is massive - over 100G, and disk IO to the flowfile spindle is just pegged doing writes. I do have
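To track the on-disk growth on each node over time, a minimal sketch can total the files under a repository directory and list the largest ones, much like the `ls -lh` output earlier in the thread. The directory is a command-line argument you would point at each node's flowfile repository; the path itself is not given here.

```python
import os


def repo_files(root: str) -> list[tuple[str, int]]:
    """(path, size in bytes) for every regular file under root, largest first."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                files.append((path, os.path.getsize(path)))
    files.sort(key=lambda item: item[1], reverse=True)
    return files


def dir_size_bytes(root: str) -> int:
    """Apparent total size of the regular files under root."""
    return sum(size for _path, size in repo_files(root))


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"total: {dir_size_bytes(root) / 1024**3:.1f} GiB")
    for path, size in repo_files(root)[:10]:
        print(f"{size / 1024**3:8.2f} GiB  {path}")
```

Running it periodically on all 4 nodes shows whether the checkpoint files or the journals account for the 100G+.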