On Tue, Jun 15, 2010 at 3:09 PM, Erik Trimble <erik.trim...@oracle.com> wrote:

> On 6/15/2010 4:42 AM, Arve Paalsrud wrote:
>
>> Hi,
>>
>> We are currently building a storage box based on OpenSolaris/Nexenta using
>> ZFS.
>> Our hardware specifications are as follows:
>>
>> Quad AMD G34 12-core 2.3 GHz (~110 GHz)
>> 10 Crucial RealSSD (6Gb/s)
>> 42 WD RAID Ed. 4 2TB disks + 6Gb/s SAS expanders
>> LSI2008SAS (two 4x ports)
>> Mellanox InfiniBand 40 Gbit NICs
>> 128 GB RAM
>>
>> This setup gives us about 40TB of storage after mirroring (two disks as
>> spares), 2.5TB of L2ARC and 64GB of ZIL, all in a single 5U box.
>>
>> Both the L2ARC and ZIL share the same disks (striped) due to bandwidth
>> requirements. Each SSD has a theoretical performance of 40-50k IOPS in a 4k
>> read/write scenario with a 70/30 distribution. Now, I know that you should
>> have a mirrored ZIL for safety, but the entire box is synchronized with an
>> active standby at a different site (18km distance - round trip of
>> 0.16ms + equipment latency). So if the ZIL in Site A takes a fall, or
>> the motherboard/disk group dies, we still have safety.
>>
>> DDT requirements for dedupe on 16k blocks should be about 640GB when the
>> main pool is full (at capacity).
>>
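(A rough back-of-envelope check of that 640GB figure, as a Python sketch. The
~270 bytes per DDT entry is an assumption; commonly quoted figures run roughly
250-320 bytes per entry, so treat the result as an order-of-magnitude number.)

    # Rough DDT size estimate for dedup on a full pool (assumptions noted).
    pool_capacity = 40 * 10**12      # ~40 TB usable after mirroring
    block_size = 16 * 1024           # 16k dedup block size
    bytes_per_entry = 270            # assumed size of one DDT entry

    entries = pool_capacity // block_size
    ddt_bytes = entries * bytes_per_entry
    print(f"{entries:,} entries -> ~{ddt_bytes / 10**9:.0f} GB of DDT")
    # ~2.4 billion entries -> roughly 660 GB, in line with the ~640GB above
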
>> Without going into details about chipsets and such, do any of you on this
>> list have any experience with a similar setup and can share with us your
>> thoughts, do's and dont's, and any other information that could be of help
>> while building and configuring this?
>>
>> What I want to achieve is 2 GB/s+ NFS traffic against our ESX clusters
>> (also InfiniBand-based), with both dedupe and compression enabled in ZFS.
>>
>> Let's talk moon landings.
>>
>> Regards,
>> Arve
>>
>>
>
>
> Given that for ZIL, random write IOPS is paramount, the RealSSD isn't a
> good choice.  SLC SSDs still spank any MLC device, and random IOPS for
> something like an Intel X25-E or OCZ Vertex EX are over twice that of the
> RealSSD.  I don't know where they manage to get 40k+ IOPS number for the
> RealSSD (I know it's in the specs, but how did they get that?), but that's
> not what others are reporting:
>
>
> http://benchmarkreviews.com/index.php?option=com_content&task=view&id=454&Itemid=60&limit=1&limitstart=7


See http://www.anandtech.com/show/2944/3 and
http://www.crucial.com/pdf/Datasheets-letter_C300_RealSSD_v2-5-10_online.pdf
But I agree that we should look into using the Vertex instead.

> Sadly, none of the current crop of SSDs support a capacitor or battery to
> back up their local (on-SSD) cache, so they're all subject to data loss on a
> power interruption.
>

Noted


> Likewise, random Read dominates L2ARC usage. Here, the most cost-effective
> solutions tend to be MLC-based SSDs with more moderate IOPS performance -
> the Intel X25-M and OCZ Vertex series are likely much more cost-effective
> than a RealSSD, especially considering price/performance.
>

Our other option is to use two Fusion-io ioDrive Duo SLC/MLC cards, or the SMLC
when available (as well as drivers for Solaris) - so the price we're
currently talking about is not an issue.


> Also, given the limitations of a x4 port connection to the rest of the
> system, I'd consider using a couple more SAS controllers, and fewer
> Expanders. The SSDs together are likely to be able to overwhelm a x4 PCI-E
> connection, so I'd want at least one dedicated x4 SAS HBA just for them.
>  For the 42 disks, it depends more on what your workload looks like. If it
> is mostly small or random I/O to the disks, you can get away with fewer
> HBAs. Large, sequential I/O to the disks is going to require more HBAs.
>  Remember, a modern 7200RPM SATA drive can pump out well over 100MB/s
> sequential, but well under 10MB/s random.  Do the math to see how fast it
> will overwhelm the x4 PCI-E 2.0 connection which maxes out at about 2GB/s.
>

We're talking about 4x SAS 6Gb/s lanes - roughly 2400MB/s per x4 port, or
~4800MB/s across both ports. See
http://www.lsi.com/DistributionSystem/AssetDocument/SCG_LSISAS2008_PB_043009.pdf
for specifications of the LSI chip. In other words, it uses a PCIe 2.0 x8
interface.
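
To put rough numbers on Erik's point, here is a quick sketch of the aggregate
bandwidths involved (all figures are approximations, not measurements):

    # Back-of-envelope bandwidth check for the HBA/expander layout.
    sas_lane = 600 * 10**6       # ~600 MB/s usable per 6Gb/s SAS lane
    wide_port = 4 * sas_lane     # one x4 wide port
    pcie2_x8 = 8 * 500 * 10**6   # PCIe 2.0: ~500 MB/s per lane, x8 slot
    disks, disk_seq = 42, 100 * 10**6   # ~100 MB/s sequential per SATA disk

    print(f"x4 SAS wide port : ~{wide_port / 10**9:.1f} GB/s")
    print(f"both wide ports  : ~{2 * wide_port / 10**9:.1f} GB/s")
    print(f"PCIe 2.0 x8 slot : ~{pcie2_x8 / 10**9:.1f} GB/s")
    print(f"42 disks, seq.   : ~{disks * disk_seq / 10**9:.1f} GB/s aggregate")
    # Sequential I/O from all 42 disks (~4.2 GB/s) is enough to saturate a
    # single controller, which is the argument for spreading the load.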

> I'd go with 2 Intel X25-E 32GB models for ZIL. Mirror them - striping isn't
> really going to buy you much here (so far as I can tell).  6Gbit/s SAS is
> wasted on HDs, so don't bother paying for it if you can avoid doing so.
> Really, I'd suspect that paying for 6Gb/s SAS isn't worth it at all, as
> really only the read performance of the L2ARC SSDs might possibly exceed
> 3Gb/s SAS.
>

What about bandwidth in this scenario? Won't the ZIL be limited to the
throughput of only one X25-E? The SATA disks operate at 3Gb/s through the
SAS expanders, so there's no 6Gb/s there.
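
For what it's worth, a rough sketch of the slog-throughput side of that
question (the per-device write figure is an assumption taken from Intel's
published X25-E spec, not a measurement):

    # Sustained write throughput of a slog pair vs. the 2 GB/s NFS goal.
    x25e_write = 170 * 10**6     # ~170 MB/s sustained write per X25-E (spec)
    nfs_target = 2 * 10**9       # 2 GB/s+ NFS goal

    mirrored = x25e_write        # a mirror commits to both sides in parallel,
                                 # so it runs at roughly one device's speed
    striped = 2 * x25e_write     # striping doubles bandwidth, loses redundancy

    print(f"mirrored pair : ~{mirrored / 10**6:.0f} MB/s")
    print(f"striped pair  : ~{striped / 10**6:.0f} MB/s")
    print(f"NFS target    : ~{nfs_target / 10**9:.0f} GB/s")
    # Either layout is far below 2 GB/s, but the slog only has to absorb
    # synchronous writes, not the whole NFS stream.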


> I'm going to say something sacrilegious here:  128GB of RAM may be
> overkill.  You have the SSDs for L2ARC - much of which will be the DDT, but,
> if I'm reading this correctly, even if you switch to the 160GB Intel X25-M,
> that gives you 8 x 160GB = 1280GB of L2ARC, of which only half is in use by
> the DDT. The rest is file cache.  You'll need lots of RAM if you plan on
> storing lots of small files in the L2ARC (that is, if your workload is lots
> of small files).  Figure roughly 200 bytes of RAM per L2ARC entry.
>
> I.e.
>
> if you have a 1k average record size, for 600GB of L2ARC, you'll need 600GB
> / 1kB * 200B = 120GB RAM.
>
> if you have a more manageable 8k record size, then 600GB / 8kB * 200B =
> 15GB


The box will store mostly large VM files, and DRAM will always be used when
available, regardless of L2ARC size - it's a benefit to have more of it.
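
Taking Erik's ~200 bytes/record figure, a quick sketch of what the L2ARC
header overhead looks like at different record sizes (the 128k case roughly
matches large VM files at the default recordsize):

    # RAM consumed by L2ARC headers for ~600 GB of cached data.
    l2arc_size = 600 * 10**9     # data held in L2ARC
    header_bytes = 200           # approximate in-core header per L2ARC record

    for recordsize in (1 * 1024, 8 * 1024, 128 * 1024):
        ram = l2arc_size // recordsize * header_bytes
        print(f"recordsize {recordsize // 1024:>3}k -> ~{ram / 10**9:5.1f} GB RAM")
    # 1k -> ~117 GB, 8k -> ~14.6 GB, 128k -> ~0.9 GB; with mostly large VM
    # files the header overhead is small and the RAM goes to ordinary ARC.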


> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
>

Regards,
Arve