I just switched a cluster that uses 3 EBS volumes for the cont-repo from gp2 to gp3, and it resolved definite I/O throughput issues. The improvement from gp3 was significant enough that I may reduce from 3 volumes to 2; a single volume might even be sufficient.
Of course, every use case is unique.

On Dec 15, 2023 at 5:37 PM -0500, Gregory M. Foreman <gfore...@spinnerconsulting.com>, wrote:
> Mark:
>
> Got it. Thank you for the help.
>
> Greg
>
> > On Dec 15, 2023, at 4:14 PM, Mark Payne <marka...@hotmail.com> wrote:
> >
> > Greg,
> >
> > Whether or not multiple content repos will have any impact depends very much on where your system’s bottleneck is. If your bottleneck is disk I/O, it will absolutely help. If your bottleneck is CPU, it won’t. If, for example, you’re running on bare metal with 48 cores and spinning disks, you’ll definitely want to use multiple spinning disks. But if you’re running in AWS on a VM that has 4 cores and you’re using gp3 EBS volumes, it’s unlikely that multiple content repos will help.
> >
> > Thanks
> > -Mark
> >
> > > On Dec 15, 2023, at 3:25 PM, Gregory M. Foreman <gfore...@spinnerconsulting.com> wrote:
> > >
> > > Mark:
> > >
> > > I was just discussing multiple content repos on EBS volumes with a colleague. I found your post from a long time ago:
> > >
> > > https://lists.apache.org/thread/nq3mpry0wppzrodmldrcfnxwzp3n1cjv
> > >
> > > “Re #2: I don't know that I've used any SAN to back my repositories other than the EBS provided by Amazon EC2. In that environment, I found that having one or having multiple repos was essentially equivalent.”
> > >
> > > Does that statement still hold true today? In other words, is there essentially no real performance benefit to having multiple content repos on multiple EBS volumes?
> > >
> > > Thanks,
> > > Greg
> > >
> > > > On Dec 11, 2023, at 8:50 PM, Mark Payne <marka...@hotmail.com> wrote:
> > > >
> > > > Hey Phil,
> > > >
> > > > NiFi will not spread the content of a single file over multiple partitions. It will write the content of FlowFile 1 to content repo 1, then write the next FlowFile to repo 2, etc. So it does round-robin, but it does not spread a single FlowFile across multiple repos.
> > > >
> > > > Thanks
> > > > -Mark
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Dec 11, 2023, at 8:45 PM, Phillip Lord <phillord0...@gmail.com> wrote:
> > > > >
> > > > > Hello NiFi comrades,
> > > > >
> > > > > Here's my scenario...
> > > > > Let's say I have a NiFi cluster running on EC2 instances with attached EBS volumes serving as their repos. The content repo is split into three content repos per node (cont1, cont2, cont3), each on a dedicated EBS volume. My understanding is that the content claims for a single file can potentially span more than one of these repos (correct me if I've lost my mind over the years).
> > > > > For instance, if you have a 1 MB file and, let's say, your max.content.claim.size is 100 KB, that's ten 100 KB claims (ish), potentially split across the 3 EBS volumes. So if NiFi is trying to move that file to S3, for instance, it needs to be read from each of the volumes.
> > > > > Whereas if it were a single EBS volume for the cont-repo, it would read from that one volume, which I would think would be more performant? Or does spreading I/O contention across volumes provide more of a benefit?
> > > > > I know there are different levels of EBS volumes, but I'm not factoring that in for right now.
> > > > >
> > > > > Appreciate any insight... trying to determine the best configuration.
> > > > >
> > > > > Thanks,
> > > > > Phil
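Mark's answer to Phil can be illustrated with a small model. In nifi.properties, each content repo is declared with its own `nifi.content.repository.directory.<name>` property (for example `nifi.content.repository.directory.cont1=/mnt/cont1`, where the path is hypothetical). The Python below is a simplified sketch, not NiFi's actual implementation: it assumes a plain rotation over the three repo names from Phil's example and ignores details such as several small FlowFiles sharing one content claim. It shows the point Mark makes: each FlowFile's content lands wholly on one repo, so Phil's 1 MB file would be read back from a single volume.

```python
from itertools import cycle

# Hypothetical repo names taken from Phil's example; in NiFi each would map to
# a nifi.content.repository.directory.<name> entry on its own EBS volume.
REPOS = ["cont1", "cont2", "cont3"]


def assign_repos(flowfile_sizes, repos=REPOS):
    """Simplified model of round-robin content placement.

    Returns (flowfile_index, repo, size) tuples. Each FlowFile's content goes
    to exactly one repo, and successive FlowFiles rotate through the repos.
    A single FlowFile is never split across repos.
    """
    rotation = cycle(repos)
    return [(i, next(rotation), size) for i, size in enumerate(flowfile_sizes)]


def bytes_per_repo(assignments):
    """Sum how many content bytes each repo (volume) ends up holding."""
    totals = {}
    for _, repo, size in assignments:
        totals[repo] = totals.get(repo, 0) + size
    return totals


if __name__ == "__main__":
    # Phil's 1 MB file followed by a few smaller, made-up FlowFiles.
    sizes = [1_000_000, 200_000, 300_000, 400_000]
    placed = assign_repos(sizes)
    for idx, repo, size in placed:
        print(f"FlowFile {idx + 1}: {size} bytes -> {repo}")
    print(bytes_per_repo(placed))
```

Under this model the 1 MB FlowFile sits entirely on `cont1`, and the fourth FlowFile wraps back around to `cont1`. Spreading load across volumes therefore happens between FlowFiles, not within one, which is why Mark ties the benefit of multiple repos to whether disk I/O is the bottleneck.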