Thank you very much. I would be more than happy to provide some benchmark
results after the implementation.

Sincerely yours,
Ali

On Thu, Oct 13, 2016 at 11:32 PM, Joe Witt <[email protected]> wrote:

> Ali,
>
> I agree with your assumption. It would be great to test that out and
> provide some numbers, but intuitively I agree.
>
> I could envision certain scatter/gather data flows that could challenge
> that sequential access assumption, but honestly, with how awesome disk
> caching is in Linux these days, I think practically speaking this is the
> right way to think about it.
>
> Thanks
> Joe
>
> On Thu, Oct 13, 2016 at 8:29 AM, Ali Nazemian <[email protected]>
> wrote:
>
>> Dear Joe,
>>
>> Thank you very much. That was a really great explanation.
>> I investigated the NiFi architecture, and it seems that most of the
>> read/write operations for the flowfile repo and provenance repo are
>> random. However, for the content repo most of the read/write operations
>> are sequential. Let's say cost does not matter. In this case, even
>> choosing SSD for the content repo cannot provide a huge performance gain
>> over HDD. Am I right? Hence, it would be better to spend the content
>> repo SSD money on network infrastructure.
>>
>> Best regards,
>> Ali
>>
>> On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt <[email protected]> wrote:
>>
>>> Ali,
>>>
>>> You have a lot of nice resources to work with there. I'd recommend the
>>> series of RAID-1 configurations personally, provided you keep in mind
>>> this means you can only lose a single disk for any one partition. As
>>> long as they're being monitored and would be quickly replaced, this in
>>> practice works well. If there could be lapses in monitoring or time to
>>> replace, then it is perhaps safer to go with more redundancy or an
>>> alternative RAID type.
>>>
>>> I'd say do the OS, app installs w/user and audit db stuff, and
>>> application logs on one physical RAID volume. Have a dedicated physical
>>> volume for the flowfile repository. It will not be able to use all the
>>> space, but it certainly could benefit from having no other contention.
>>> This could be a great thing to have SSDs for, actually. And for the
>>> remaining volumes, split them up for content and provenance as you
>>> have. You get to make the overall performance versus retention
>>> decision. Frankly, you have a great system to work with and I suspect
>>> you're going to see excellent results anyway.
>>>
>>> Conservatively speaking, expect say 50MB/s of throughput per volume in
>>> the content repository, so if you end up with 8 of them you could
>>> achieve upwards of 400MB/s sustained. You'll also then want to make
>>> sure you have a good 10G based network setup as well. Or, you could
>>> dial back on the speed tradeoff and simply increase retention or disk
>>> loss tolerance. Lots of ways to play the game.
>>>
>>> There are no published SSD vs HDD performance benchmarks that I am
>>> aware of, though this is a good idea. Having a hybrid of SSDs and HDDs
>>> could offer a really solid performance/retention/cost tradeoff. For
>>> example, having SSDs for the OS/logs/provenance/flowfile with HDDs for
>>> the content - that would be quite nice. At that rate, to take full
>>> advantage of the system you'd need to have very strong network
>>> infrastructure between NiFi and any systems it is interfacing with, and
>>> your flows would need to be well tuned for GC/memory efficiency.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian <[email protected]>
>>> wrote:
>>>
>>>> Dear NiFi users/developers,
>>>> Hi,
>>>>
>>>> I was wondering whether there is any benchmark on the question of
>>>> whether it is better to dedicate disk control to NiFi or to use RAID
>>>> for this purpose. For example, which of these scenarios is recommended
>>>> from the performance point of view?
>>>>
>>>> Scenario 1:
>>>> 24 disks in total
>>>> 2 disks - RAID 1 for OS and flowfile repo
>>>> 2 disks - RAID 1 for provenance repo1
>>>> 2 disks - RAID 1 for provenance repo2
>>>> 2 disks - RAID 1 for content repo1
>>>> 2 disks - RAID 1 for content repo2
>>>> 2 disks - RAID 1 for content repo3
>>>> 2 disks - RAID 1 for content repo4
>>>> 2 disks - RAID 1 for content repo5
>>>> 2 disks - RAID 1 for content repo6
>>>> 2 disks - RAID 1 for content repo7
>>>> 2 disks - RAID 1 for content repo8
>>>> 2 disks - RAID 1 for content repo9
>>>>
>>>> Scenario 2:
>>>> 24 disks in total
>>>> 2 disks - RAID 1 for OS and flowfile repo
>>>> 4 disks - RAID 10 for provenance repo1
>>>> 18 disks - RAID 10 for content repo1
>>>>
>>>> Moreover, is there any benchmark for SSD vs HDD performance for NiFi?
>>>> Thank you very much.
>>>>
>>>> Best regards,
>>>> Ali
>>>>
>>
>> --
>> A.Nazemian
>

--
A.Nazemian

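For anyone following the sizing argument in this thread, here is a rough sketch of the arithmetic behind Joe's "50MB/s per volume" estimate and the 10G network recommendation. The function names are hypothetical. The 50 MB/s figure is the conservative estimate quoted above, not a measured benchmark, and link capacity is the raw line rate with protocol overhead ignored.

```python
# Back-of-the-envelope sizing sketch. All helper names are hypothetical.
# The 50 MB/s default is the conservative per-volume estimate from the
# thread, not a benchmark result.

def content_throughput_mb_s(volumes: int, per_volume_mb_s: float = 50.0) -> float:
    """Aggregate sustained throughput if every content volume is kept busy."""
    return volumes * per_volume_mb_s


def link_capacity_mb_s(gbit_per_s: float) -> float:
    """Raw link capacity in MB/s (1 Gbit/s = 125 MB/s), ignoring overhead."""
    return gbit_per_s * 1000 / 8


def fits_on_link(volumes: int, gbit_per_s: float) -> bool:
    """True if the estimated content throughput fits within the raw link rate."""
    return content_throughput_mb_s(volumes) <= link_capacity_mb_s(gbit_per_s)


if __name__ == "__main__":
    # 8 volumes is the example from the thread; 9 matches Scenario 1.
    for volumes in (8, 9):
        total = content_throughput_mb_s(volumes)
        print(
            f"{volumes} content volumes: ~{total:.0f} MB/s, "
            f"fits 1GbE: {fits_on_link(volumes, 1)}, "
            f"fits 10GbE: {fits_on_link(volumes, 10)}"
        )
```

Under these assumptions, eight content volumes already exceed what a single 1 Gbit/s link can carry, which is consistent with the advice to pair this disk layout with a 10G network.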