On Thu, Mar 06, 2014 at 02:33:26PM +0000, Alex Adriaanse wrote:
> We’re needing to store many TBs of data (somewhere between 10TB and 50TB) on
> a SmartOS server (or cluster of SmartOS servers). Most of this data will be
> fairly static, and the files that will take up this space will typically
> average around 1MB in size. Performance isn’t going to be a huge concern,
> except that we’ll also house a database server where performance is a bit
> more important (although two 7200RPM drives in a mirrored zpool seem to be
> holding up currently just fine). We’ll be using zones for most virtual
> machines on this server.
>
> We’re looking at getting a dedicated server with a 36 hard drive bay chassis
> to give us lots of room for growth. I’m envisioning we have two options:
>
> 1. Create one giant zpool that will eventually span up to ~32 hard drives
> and dump all the files in there. When using 4TB drives in RAID10, this would
> give us 32TB of usable storage. This would be the easiest solution because
> we could dump most of our data in a single filesystem. However, I’m
> concerned about scalability and reliability, as the wrong two hard drives
> going out at the same time would kill the entire zpool, and also dealing
> with a ~32TB filesystem may be challenging (we use zfs send/receive to
> replicate our data to an offsite mirror server, so at some point the diffs
> between snapshots will become huge as we collect more and more data; the
> initial full zfs send/receive would take forever, too).
The way we do this for our Manta storage nodes is to have a single RAIDZ2
pool with 3 vdevs (the topology ends up being 3x(9+2) with 2 spares and a
slog in a 36-disk chassis). This works well, performs well, and has excellent
durability. If you're curious about our HW config, see
github.com/joyent/manufacturing.

I don't understand your comments on snapshots and send/recv; the size of the
pool has nothing to do with these things. Snapshots are per-dataset
(per-"filesystem", or practically speaking per-zone). You can also use
incremental snapshots to reduce send stream size. None of this is influenced
in any way by the pool size; it's strictly about the amount of data that you
need consistent snapshots of.

A 32 TB pool is much smaller than many others in production, including at
Joyent. The people building storage systems with ZFS are accustomed to pools
with hundreds of TB or even several PB. The problems with huge pools all stem
from managing huge numbers of disks, not from any of the things you're
worried about.

> 2. We create one zpool for each group of 4 hard drives in RAID10 (with 4TB
> hard drives this would give us 8TB of usable space per zpool). This would
> require some forethought in our application as we’d have to split our data
> carefully to make sure it doesn’t exceed 8TB per filesystem, but this
> shouldn’t be a big deal.

Multiple pools is insane. Please don't do this. It defeats the purpose of
pooled storage, and support for more than the zones pool in SmartOS is
somewhere between poor and nonexistent.

-------------------------------------------
smartos-discuss Archives: https://www.listbox.com/member/archive/184463/=now
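For what it's worth, the disk budget of the 3x(9+2) layout can be
sanity-checked with a little arithmetic (the 4TB drive size is taken from
the quoted message; everything else is from the layout described above):

```shell
# Sanity-check the disk budget of a 3x(9+2) RAIDZ2 layout in a 36-bay chassis.
vdevs=3        # number of RAIDZ2 vdevs
data=9         # data disks per vdev
parity=2       # parity disks per vdev (RAIDZ2)
spares=2       # hot spares
slog=1         # separate log device
drive_tb=4     # per-drive capacity in TB (from the quoted message)

disks=$(( vdevs * (data + parity) + spares + slog ))
usable_tb=$(( vdevs * data * drive_tb ))
echo "total disks: $disks"
echo "approx usable: ${usable_tb}TB (before ZFS metadata/parity overhead)"
```

That fills the 36 bays exactly and yields roughly 108TB of usable space with
4TB drives — well above the 50TB ceiling in the quoted message, while
tolerating any two disk failures per vdev.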
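As a sketch only — the pool name and device names below are hypothetical and
would need to match your actual controller layout — creating that topology
and doing per-dataset incremental replication looks roughly like this:

```shell
# Hypothetical device names; substitute your chassis's actual disks.
# Three RAIDZ2 vdevs of 11 disks (9 data + 2 parity), 2 spares, 1 slog.
zpool create tank \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 \
  raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 c3t8d0 c3t9d0 c3t10d0 \
  spare c4t0d0 c4t1d0 \
  log c4t2d0

# Snapshots and replication are per-dataset, not per-pool. After one initial
# full send, each subsequent send is incremental and sized only by the delta:
zfs snapshot tank/data@2014-03-06
zfs send -i tank/data@2014-03-05 tank/data@2014-03-06 | \
  ssh mirror-host zfs receive tank/data
```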
