Dear Joe,

Thank you very much. That was a really great explanation. I investigated the NiFi architecture, and it seems that most of the read/write operations for the flow file repo and the provenance repo are random, whereas most of the read/write operations for the content repo are sequential. Let's say cost does not matter. Even in that case, choosing SSD for the content repo would not provide a huge performance gain over HDD. Am I right? If so, it would be better to spend the content repo SSD money on network infrastructure.
Best regards,
Ali

On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt <[email protected]> wrote:

> Ali,
>
> You have a lot of nice resources to work with there. I'd recommend the
> series of RAID-1 configuration personally provided you keep in mind this
> means you can only lose a single disk for any one partition. As long as
> they're being monitored and would be quickly replaced this in practice
> works well. If there could be lapses in monitoring or time to replace then
> it is perhaps safer to go with more redundancy or an alternative RAID type.
>
> I'd say do the OS, app installs w/user and audit db stuff, application
> logs on one physical RAID volume. Have a dedicated physical volume for the
> flow file repository. It will not be able to use all the space but it
> certainly could benefit from having no other contention. This could be a
> great thing to have SSDs for actually. And for the remaining volumes split
> them up for content and provenance as you have. You get to make the
> overall performance versus retention decision. Frankly, you have a great
> system to work with and I suspect you're going to see excellent results
> anyway.
>
> Conservatively speaking expect say 50MB/s of throughput per volume in the
> content repository so if you end up with 8 of them could achieve upwards of
> 400MB/s sustained. You'll also then want to make sure you have a good 10G
> based network setup as well. Or, you could dial back on the speed tradeoff
> and simply increase retention or disk loss tolerance. Lots of ways to play
> the game.
>
> There are no published SSD vs HDD performance benchmarks that I am aware
> of though this is a good idea. Having a hybrid of SSDs and HDDs could
> offer a really solid performance/retention/cost tradeoff. For example
> having SSDs for the OS/logs/provenance/flowfile with HDDs for the content -
> that would be quite nice. At that rate to take full advantage of the
> system you'd need to have very strong network infrastructure between NiFi
> and any systems it is interfacing with and your flows would need to be
> well tuned for GC/memory efficiency.
>
> Thanks
> Joe
>
> On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian <[email protected]>
> wrote:
>
>> Dear Nifi Users/ developers,
>> Hi,
>>
>> I was wondering is there any benchmark about the question that is it
>> better to dedicate disk control to Nifi or using RAID for this purpose? For
>> example, which of these scenarios is recommended from the performance point
>> of view?
>> Scenario 1:
>> 24 disk in total
>> 2 disk- raid 1 for OS and fileflow repo
>> 2 disk- raid 1 for provenance repo1
>> 2 disk- raid 1 for provenance repo2
>> 2 disk- raid 1 for content repo1
>> 2 disk- raid 1 for content repo2
>> 2 disk- raid 1 for content repo3
>> 2 disk- raid 1 for content repo4
>> 2 disk- raid 1 for content repo5
>> 2 disk- raid 1 for content repo6
>> 2 disk- raid 1 for content repo7
>> 2 disk- raid 1 for content repo8
>> 2 disk- raid 1 for content repo9
>>
>>
>> Scenario 2:
>> 24 disk in total
>> 2 disk- raid 1 for OS and fileflow repo
>> 4 disk- raid 10 for provenance repo1
>> 18 disk- raid 10 for content repo1
>>
>> Moreover, is there any benchmark for SSD vs HDD performance for Nifi?
>> Thank you very much.
>>
>> Best regards,
>> Ali
>>
>
>

--
A.Nazemian
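
For anyone sizing the network against the figures in this thread, here is a rough back-of-the-envelope sketch. It takes the conservative 50MB/s per content repository volume quoted above and compares the aggregate to the raw capacity of a 1G or 10G link. The function names are made up for illustration, the units are decimal (1 Gbit/s = 125 MB/s), and protocol overhead is ignored, so treat the output as an estimate rather than a benchmark.

```python
# Rough sizing sketch based on the figures quoted in the thread.
# Assumption: ~50 MB/s sustained per content repository volume (conservative).
PER_VOLUME_MB_S = 50


def aggregate_throughput_mb_s(volumes, per_volume_mb_s=PER_VOLUME_MB_S):
    """Total sustained throughput if every content volume is kept busy."""
    return volumes * per_volume_mb_s


def link_capacity_mb_s(gbit_per_s):
    """Raw link capacity in MB/s (decimal units, protocol overhead ignored)."""
    return gbit_per_s * 1000 / 8


def link_utilisation(volumes, gbit_per_s):
    """Fraction of the link that the content repositories could fill."""
    return aggregate_throughput_mb_s(volumes) / link_capacity_mb_s(gbit_per_s)


if __name__ == "__main__":
    # 8 volumes is the example from the reply; 9 matches Scenario 1.
    for volumes in (8, 9):
        total = aggregate_throughput_mb_s(volumes)
        for link in (1, 10):
            ratio = link_utilisation(volumes, link)
            verdict = "link is the bottleneck" if ratio > 1 else "fits"
            print(f"{volumes} volumes: {total} MB/s on {link}G -> "
                  f"{ratio:.0%} of link ({verdict})")
```

With 8 volumes the disks alone could exceed a 1G link several times over while using roughly a third of a 10G link, which is consistent with the advice to pair this layout with a 10G network.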
