On Tue, Jun 15, 2010 at 3:09 PM, Erik Trimble <erik.trim...@oracle.com> wrote:
> On 6/15/2010 4:42 AM, Arve Paalsrud wrote:
>> Hi,
>>
>> We are currently building a storage box based on OpenSolaris/Nexenta
>> using ZFS. Our hardware specifications are as follows:
>>
>> Quad AMD G34 12-core 2.3 GHz (~110 GHz aggregate)
>> 10 Crucial RealSSD (6Gb/s)
>> 42 WD RAID Ed. 4 2TB disks + 6Gb/s SAS expanders
>> LSI SAS2008 (two 4x ports)
>> Mellanox InfiniBand 40 Gbit NICs
>> 128 GB RAM
>>
>> This setup gives us about 40TB of storage after mirroring (with two
>> disks as spares), 2.5TB of L2ARC and 64GB of ZIL, all in a single 5U
>> box.
>>
>> Both the L2ARC and the ZIL share the same disks (striped) due to
>> bandwidth requirements. Each SSD has a theoretical performance of
>> 40-50k IOPS in a 4k read/write scenario with a 70/30 distribution.
>> Now, I know that you should have a mirrored ZIL for safety, but the
>> entire box is synchronized with an active standby at a different site
>> (18 km away - a round trip of 0.16 ms plus equipment latency). So in
>> case the ZIL at site A takes a fall, or the motherboard/disk group
>> dies - we still have safety.
>>
>> The DDT requirement for dedup with 16k blocks should be about 640GB
>> when the main pool is full (at capacity).
>>
>> Without going into details about chipsets and such, do any of you on
>> this list have experience with a similar setup, and can you share
>> your thoughts, do's and don'ts, and any other information that could
>> be of help while building and configuring this?
>>
>> What I want to achieve is 2 GB/s+ of NFS traffic against our ESX
>> clusters (also InfiniBand-based), with both dedup and compression
>> enabled in ZFS.
>>
>> Let's talk moon landings.
>>
>> Regards,
>> Arve
>
> Given that for ZIL, random write IOPS is paramount, the RealSSD isn't
> a good choice. SLC SSDs still spank any MLC device, and random IOPS
> for something like an Intel X25-E or OCZ Vertex EX are over twice that
> of the RealSSD. I don't know where they manage to get the 40k+ IOPS
> number for the RealSSD (I know it's in the specs, but how did they get
> that?), but that's not what others are reporting:
>
> http://benchmarkreviews.com/index.php?option=com_content&task=view&id=454&Itemid=60&limit=1&limitstart=7

See http://www.anandtech.com/show/2944/3 and
http://www.crucial.com/pdf/Datasheets-letter_C300_RealSSD_v2-5-10_online.pdf

But I agree that we should look into using the Vertex instead.

> Sadly, none of the current crop of SSDs support a capacitor or battery
> to back up their local (on-SSD) cache, so they're all subject to data
> loss on a power interruption.

Noted.

> Likewise, random read dominates L2ARC usage. Here, the most
> cost-effective solutions tend to be MLC-based SSDs with more moderate
> IOPS performance - the Intel X25-M and OCZ Vertex series are likely
> much more cost-effective than a RealSSD, especially considering
> price/performance.

Our other option is to use two Fusion-io ioDrive Duo SLC/MLC cards, or
the SMLC model when available (along with drivers for Solaris) - so the
price we're currently talking about is not an issue.

> Also, given the limitations of a x4 port connection to the rest of the
> system, I'd consider using a couple more SAS controllers and fewer
> expanders. The SSDs together are likely to be able to overwhelm a x4
> PCI-E connection, so I'd want at least one dedicated x4 SAS HBA just
> for them. For the 42 disks, it depends more on what your workload
> looks like. If it is mostly small or random I/O to the disks, you can
> get away with fewer HBAs. Large, sequential I/O to the disks is going
> to require more HBAs.
>
> Remember, a modern 7200RPM SATA drive can pump out well over 100MB/s
> sequential, but well under 10MB/s random. Do the math to see how fast
> it will overwhelm the x4 PCI-E 2.0 connection, which maxes out at
> about 2GB/s.
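Doing that math - a rough back-of-the-envelope sketch, assuming the
~100 MB/s per-drive sequential figure and the ~2 GB/s usable x4 PCI-E
2.0 ceiling quoted above (figures from this thread, not measurements):

# Can 42 SATA drives saturate one x4 PCIe 2.0 link? (assumed figures)
DRIVES = 42
SEQ_MBPS_PER_DRIVE = 100   # "well over 100MB/s" sequential per 7200RPM drive
PCIE2_X4_MBPS = 2000       # ~2 GB/s usable on a x4 PCI-E 2.0 link

aggregate = DRIVES * SEQ_MBPS_PER_DRIVE
print(aggregate)                   # 4200 MB/s aggregate sequential
print(aggregate / PCIE2_X4_MBPS)   # ~2.1x oversubscription of one x4 link

So on a purely sequential workload, the 42-disk pool alone could
oversubscribe a single x4 link by roughly 2x.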
We're talking about two 4-lane SAS 6Gb/s ports - about 2400MB/s each
way per x4 port (4800MB/s full duplex). See
http://www.lsi.com/DistributionSystem/AssetDocument/SCG_LSISAS2008_PB_043009.pdf
for specifications of the LSI chip. In other words, it uses a PCIe 2.0
x8 interface.

> I'd go with 2 Intel X25-E 32GB models for the ZIL. Mirror them -
> striping isn't really going to buy you much here (so far as I can
> tell). 6Gbit/s SAS is wasted on HDs, so don't bother paying for it if
> you can avoid doing so. Really, I'd suspect that paying for 6Gb/s SAS
> isn't worth it at all, as really only the read performance of the
> L2ARC SSDs might possibly exceed 3Gb/s SAS.

What about bandwidth in this scenario? Won't the ZIL be limited to the
throughput of only one X25-E? The SATA disks operate at 3Gb/s through
the SAS expanders, so no 6Gb/s there.

> I'm going to say something sacrilegious here: 128GB of RAM may be
> overkill. You have the SSDs for L2ARC - much of which will be the DDT -
> but, if I'm reading this correctly, even if you switch to the 160GB
> Intel X25-M, that gives you 8 x 160GB = 1280GB of L2ARC, of which only
> half is in use by the DDT. The rest is file cache. You'll need lots of
> RAM if you plan on storing lots of small files in the L2ARC (that is,
> if your workload is lots of small files): about 200 bytes of RAM are
> needed per L2ARC entry.
>
> I.e.:
>
> If you have a 1k average record size, then for 600GB of L2ARC you'll
> need 600GB / 1kB * 200B = 120GB of RAM.
>
> If you have a more manageable 8k record size, then 600GB / 8kB * 200B
> = 15GB.
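That rule of thumb is easy to parameterize - a quick sketch assuming
the ~200 bytes per L2ARC entry figure above (the exact header size
varies by ZFS release), plus a sanity check of the 640GB DDT estimate
from the original post, which works out to an assumed ~256 bytes per
DDT entry:

# L2ARC header RAM overhead - rule-of-thumb sketch (~200 bytes/entry).
def l2arc_header_ram_gb(l2arc_gb, recordsize, bytes_per_entry=200):
    entries = l2arc_gb * 2**30 / recordsize    # records cached in L2ARC
    return entries * bytes_per_entry / 2**30   # GB of RAM for headers

print(l2arc_header_ram_gb(600, 1024))     # ~117 GB at 1k records
print(l2arc_header_ram_gb(600, 8192))     # ~14.6 GB at 8k records
print(l2arc_header_ram_gb(600, 131072))   # ~0.9 GB at 128k (large VM files)

# DDT sanity check: a full 40TB pool of 16k blocks at an assumed
# ~256 bytes per DDT entry reproduces the 640GB figure above.
print(40 * 2**40 / 16384 * 256 / 2**30)   # 640.0 GB

For a mostly large-record workload, the per-record header overhead is
small either way.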
The box will store mostly large VM files, and DRAM will always be used
when available, regardless of L2ARC size - it's a benefit to have more
of it.

> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123
> Phone: x17195
> Santa Clara, CA

Regards,
Arve

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss