I just switched a cluster that uses 3 EBS volumes for cont-repo from gp2 to 
gp3, and it resolved definite I/O throughput issues. The change to gp3 was 
significant enough that I might actually reduce from 3 to 2 volumes; perhaps 
even a single volume would be sufficient.

Of course, every use case is unique.

On Dec 15, 2023 at 5:37 PM -0500, Gregory M. Foreman 
<gfore...@spinnerconsulting.com>, wrote:
> Mark:
>
> Got it. Thank you for the help.
>
> Greg
>
> > On Dec 15, 2023, at 4:14 PM, Mark Payne <marka...@hotmail.com> wrote:
> >
> > Greg,
> >
> > Whether or not multiple content repos will have any impact depends very 
> > much on where your system’s bottleneck is. If your bottleneck is disk I/O, 
> > it will absolutely help. If your bottleneck is CPU, it won’t. If, for 
> > example, you’re running on bare metal and have 48 cores on your machine and 
> > you’re running with spinning disks, you’ll definitely want to use multiple 
> > spinning disks. But if you’re running in AWS on a VM that has 4 cores and 
> > you’re using gp3 EBS volumes, it’s unlikely that multiple content repos 
> > will help.
> >
> > Thanks
> > -Mark
> >
> >
> >
> > > On Dec 15, 2023, at 3:25 PM, Gregory M. Foreman 
> > > <gfore...@spinnerconsulting.com> wrote:
> > >
> > > Mark:
> > >
> > > I was just discussing multiple content repos on EBS volumes with a 
> > > colleague. I found your post from a long time ago:
> > >
> > > https://lists.apache.org/thread/nq3mpry0wppzrodmldrcfnxwzp3n1cjv
> > >
> > > “Re #2: I don't know that i've used any SAN to back my repositories other 
> > > than the EBS provided by Amazon EC2. In that environment, I found that 
> > > having one or having multiple repos was essentially equivalent.”
> > >
> > > Does that statement still hold true today? Essentially there is no real 
> > > performance benefit to having multiple content repos on multiple EBS 
> > > volumes?
> > >
> > > Thanks,
> > > Greg
> > >
> > >
> > >
> > > > On Dec 11, 2023, at 8:50 PM, Mark Payne <marka...@hotmail.com> wrote:
> > > >
> > > > Hey Phil,
> > > >
> > > > NiFi will not spread the content of a single file over multiple 
> > > > partitions. It will write the content of FlowFile 1 to content repo 1, 
> > > > then write the next FlowFile to repo 2, etc. So it does round-robin, but 
> > > > it does not spread a single FlowFile across multiple repos.
> > > >
> > > > Thanks
> > > > -Mark
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Dec 11, 2023, at 8:45 PM, Phillip Lord <phillord0...@gmail.com> 
> > > > > wrote:
> > > > >
> > > > >
> > > > > > Hello NiFi comrades,
> > > > >
> > > > > Here's my scenario...
> > > > > > Let's say I have a NiFi cluster running on EC2 instances with 
> > > > > > attached EBS volumes serving as their repos. They've split up their 
> > > > > > content-repos into three content-repos per node (cont1, cont2, 
> > > > > > cont3), each being a dedicated EBS volume. My understanding is that 
> > > > > > the content-claims for a single file can potentially span across 
> > > > > > more than one of these repos. (Correct me if I've lost my mind over 
> > > > > > the years.)
> > > > > > For instance, if you have a 1 MB file, and let's say your 
> > > > > > max.content.claim.size is 100KB, that's roughly ten 100KB claims 
> > > > > > potentially split up across the 3 EBS volumes. So if NiFi is trying 
> > > > > > to move that file to S3, for instance, it needs to be read from each 
> > > > > > of the volumes.
> > > > > > Whereas if it was a single EBS volume for the cont-repo, it would 
> > > > > > read from the single volume, which I would think would be more 
> > > > > > performant? Or does spreading out any IO contention across volumes 
> > > > > > provide more of a benefit?
> > > > > > I know there are different levels of EBS volumes, but I'm not 
> > > > > > factoring that in for right now.
> > > > >
> > > > > Appreciate any insight... trying to determine the best configuration.
> > > > >
> > > > > Thanks,
> > > > > Phil
> > > > >
> > > > >
> > >
> >
>
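
To make Mark's description of the round-robin behavior concrete, here is a 
minimal sketch in Python. It is not NiFi's actual code; the function name, 
the FlowFile ids, and the repo names (cont1, cont2, cont3, borrowed from 
Phil's example) are all illustrative assumptions. It only shows the idea 
that each FlowFile's content lands in exactly one repo, and that consecutive 
FlowFiles rotate through the repos.

```python
from itertools import cycle


def assign_round_robin(flowfile_ids, repo_names):
    """Map each FlowFile id to exactly one repo, rotating through the repos.

    Hypothetical helper for illustration only; not part of NiFi.
    """
    repos = cycle(repo_names)  # endless cont1, cont2, cont3, cont1, ...
    # Each FlowFile gets a single repo; its content is never split across repos.
    return {flowfile_id: next(repos) for flowfile_id in flowfile_ids}


repo_names = ["cont1", "cont2", "cont3"]
placement = assign_round_robin(["ff1", "ff2", "ff3", "ff4"], repo_names)
print(placement)
```

Under this simplified model, reading any one FlowFile touches only one 
volume, while a batch of FlowFiles spreads its I/O across all three.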
