I recently built an OpenIndiana 151a7 system that is currently 1/2 PB and will be expanded to 1 PB as we collect imaging data for the Human Connectome Project at Washington University in St. Louis. It is very much like your use case: an offsite backup system that is written once and read rarely.
It has displaced a BlueArc DR system because the BlueArc's mechanisms for syncing over distance could not keep up with our data generation rate. It also helped the decision that the BlueArc cost 5x as much per TB as the homebrew system.

It is currently 180 4TB SAS Seagate Constellations in 4 Supermicro JBODs. The JBODs are currently in two branches, cascading only once. When expanded, 4 JBODs will be on each branch. The pool is configured as 9 vdevs of 19 drives each in raidz3. The remaining disks are configured as hot spares. Only metadata is cached, in 128GB of RAM and on 2 480GB Intel 520 SSDs for L2ARC. Sync (ZIL) is turned off, since the worst that would happen is that we would need to rerun an rsync job.

Two identical servers were built for a cold standby configuration. Since it is a DR system, a hot standby was ruled out: even several hours of downtime would not be an issue. Each server is fitted with 2 LSI 9207-8e HBAs configured as redundant multipath to the JBODs.

Before putting it into service I ran several iozone tests to benchmark the pool. Even with really fat vdevs the performance is impressive. If you're interested in that data, let me know. The system has many hours of idle time each day, so additional performance tests are not out of the question either.

Actually, I should say I designed and configured the system. It was assembled by a colleague at UMINN. If you would like more details on the hardware, I wrote a very detailed assembly doc and would be happy to share it.

The system receives daily rsyncs from our production BlueArc system. The rsyncs are split into 120 parallel rsync jobs. This overcomes the slowdown a single TCP stream suffers over a high-latency link, and we see total throughput between 500-700Mb/s. The BlueArc has 120TB of 15k SAS tiered to NL-SAS, with all metadata on the SAS pool. The ZFS system outpaces the BlueArc on metadata when rsync does its tree walk.

Given all the safeguards built into ZFS, I would not hesitate to build a production system at the multi-petabyte scale.
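For a rough cross-check of the pool layout described above (180 4TB drives, 9 groups of 19 drives in raidz3, the rest as hot spares), the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it assumes decimal drive sizes (4 TB = 4e12 bytes) and ignores ZFS metadata, padding, and reserved space, so real usable capacity will be somewhat lower. The function name `pool_summary` is purely illustrative.

```python
# Back-of-the-envelope capacity for the layout described in the post.
# Assumptions (not from the original): decimal TB per drive, and no
# allowance for ZFS metadata, raidz padding, or reserved space.

TOTAL_DRIVES = 180      # drives across the 4 JBODs
DRIVE_TB = 4            # nominal size of each drive, decimal TB
GROUPS = 9              # raidz3 groups in the pool
DRIVES_PER_GROUP = 19   # drives in each raidz3 group
PARITY_PER_GROUP = 3    # raidz3 keeps three parity drives' worth per group


def pool_summary(total_drives, drive_tb, groups, drives_per_group, parity):
    """Return drive counts and nominal data capacity for the pool layout."""
    in_pool = groups * drives_per_group
    spares = total_drives - in_pool
    data_drives = groups * (drives_per_group - parity)
    data_tb = data_drives * drive_tb
    data_tib = data_tb * 1e12 / 2**40  # convert decimal TB to binary TiB
    return {
        "drives_in_pool": in_pool,
        "spares": spares,
        "data_drives": data_drives,
        "data_tb": data_tb,
        "data_tib": data_tib,
    }


summary = pool_summary(
    TOTAL_DRIVES, DRIVE_TB, GROUPS, DRIVES_PER_GROUP, PARITY_PER_GROUP
)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the layout leaves 9 drives as hot spares and 144 drives' worth of data capacity, i.e. 576 TB nominal (roughly 524 TiB), which is consistent with the "1/2 PB" figure quoted for the current system.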
If a channel to the disks is no longer available, it will simply stop writing and the data will be safe. Given the redundant paths, power supplies, etc., that is very unlikely to happen. The single points of failure left when running a single server are at the motherboard, CPU and RAM level. Build a hot standby server and human error becomes the most likely failure.

-Chip

On Fri, Mar 15, 2013 at 8:09 PM, Marion Hakanson <hakan...@ohsu.edu> wrote:
> Greetings,
>
> Has anyone out there built a 1-petabyte pool? I've been asked to look
> into this, and was told "low performance" is fine, workload is likely
> to be write-once, read-occasionally, archive storage of gene sequencing
> data. Probably a single 10Gbit NIC for connectivity is sufficient.
>
> We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
> using 4TB "nearline SAS" drives, giving over 100TB usable space (raidz3).
> Back-of-the-envelope might suggest stacking up eight to ten of those,
> depending if you want a "raw marketing petabyte", or a proper "power-of-two
> usable petabyte".
>
> I get a little nervous at the thought of hooking all that up to a single
> server, and am a little vague on how much RAM would be advisable, other
> than "as much as will fit" (:-). Then again, I've been waiting for
> something like pNFS/NFSv4.1 to be usable for gluing together multiple
> NFS servers into a single global namespace, without any sign of that
> happening anytime soon.
>
> So, has anyone done this? Or come close to it? Thoughts, even if you
> haven't done it yourself?
>
> Thanks and regards,
>
> Marion
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>