> Sounds good so far:  lots of small files in a largish
> system with presumably significant access parallelism
> makes RAID-Z a non-starter, but RAID-5 should be OK,
> especially if the workload is read-dominated.  ZFS
> might aggregate small writes such that their
> performance would be good as well if Cyrus doesn't
> force them to be performed synchronously (and ZFS
> doesn't force them to disk synchronously on file
> close); even synchronous small writes could perform
> well if you mirror the ZFS small-update log:  flash -
> at least the kind with decent write performance -
> might be ideal for this.  But if you want to steer
> clear of a specialized configuration, just carving one
> small LUN for mirroring out of each array should still
> offer a noticeable improvement over just placing the
> ZIL on the RAID-5 LUNs.  (You could use a RAID-0
> stripe on each array if you were compulsive about
> keeping usage balanced; it would be nice to be able to
> 'center' it on the disks, but that's probably not
> worth the management overhead unless the array makes
> it easy to do so.)

I'm not sure I understand you here.  I suppose I need to read
up on the ZIL option.  We are running Solaris 10u4, not OpenSolaris.

Can I set up a disk in each 2540 array for this ZIL, and then mirror them 
such that if one array goes down I'm not dead?  If the ZIL disk itself also 
fails, what is the failure mode, and what are the recovery options?
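
If the answer is yes, I'm guessing the pool-side step would be something 
along these lines, with hypothetical LUN names, and assuming a ZFS release 
that supports separate log devices at all (I don't think stock Solaris 10u4 
does, which may be part of what I need to read up on):

        # add a mirrored log vdev built from one small LUN per 2540
        # (the c6t...d0 names below are placeholders, not real LUNs)
        zpool add ms11 log mirror c6t<LUN-from-array-1>d0 c6t<LUN-from-array-2>d0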

We did get the 2540s fully populated.  With 12 disks, wanting at least ONE 
hot global spare in each array, and needing to keep LUNs the same size, you 
end up with two 5-disk RAID-5 LUNs and 2 hot spares per array.  Not that I 
really need 2 spares; I just didn't see any way to make good use of an extra 
disk in each array.  If we wanted to dedicate them instead to this ZIL need, 
what is the best way to go about that?  (My rough guess at the end result is 
sketched after the zpool output below.)  Our current setup, to be specific:

{cyrus3-1:vf5:136} zpool status
  pool: ms11
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        ms11                                       ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t600A0B800038ACA0000002AB47504368d0  ONLINE       0     0     0
            c6t600A0B800038A04400000251475045D1d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t600A0B800038A1CF000002994750442Fd0  ONLINE       0     0     0
            c6t600A0B800038A3C40000028447504628d0  ONLINE       0     0     0

errors: No known data errors
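
If I understand the suggestion, after carving one small LUN out of each array 
and adding the pair as a mirrored log vdev, I'd expect the pool to end up 
looking roughly like this (again, hypothetical device names):

        (existing mirrors unchanged)
        logs
          mirror                                   ONLINE       0     0     0
            c6t<small-LUN-from-array-1>d0          ONLINE       0     0     0
            c6t<small-LUN-from-array-2>d0          ONLINE       0     0     0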

> By 'stripe size' do you mean the size of the entire
> stripe (i.e., your default above reflects 32 KB on
> each data disk, plus a 32 KB parity segment) or the
> amount of contiguous data on each disk (i.e., your
> default above reflects 128 KB on each data disk for a
> total of 512 KB in the entire stripe, exclusive of
> the 128 KB parity segment)?

I'm going from the pulldown menu choices in CAM 6.0 for
the 2540 arrays; the value is currently 128K, and the menu only goes
up to 512K.  I'll have to pull up the interface again when I'm
at work, but I think it was called stripe size, and it referred to the values
the 2540 firmware was assigning to the 5-disk RAID-5 sets.
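
For what it's worth, I believe the ZFS-side counterpart here is the dataset 
recordsize, which caps the largest block ZFS will write; unless someone has 
changed it, ours should still show the 128K default, something like this 
(hypothetical output):

        # zfs get recordsize ms11
        NAME  PROPERTY    VALUE    SOURCE
        ms11  recordsize  128K     default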

> If the former, by all means increase it to 512 KB:
> this will keep the largest ZFS block on a single
> disk (assuming that ZFS aligns them on 'natural'
> boundaries) and help read-access parallelism
> significantly in large-block cases (I'm guessing
> that ZFS would use small blocks for small files but
> still quite possibly use large blocks for its
> metadata).  Given ZFS's attitude toward multi-block
> on-disk contiguity there might not be much benefit
> in going to even larger stripe sizes, though it
> probably wouldn't hurt noticeably either as long as
> the entire stripe (ignoring parity) didn't exceed 4
> - 16 MB in size (all the above numbers assume the 4
>  + 1 stripe configuration that you described).
> 
> In general, having less than 1 MB per-disk stripe
> segments doesn't make sense for *any* workload:  it
> only takes 10 - 20 milliseconds to transfer 1 MB from
> a contemporary SATA drive (the analysis for
> high-performance SCSI/FC/SAS drives is similar, since
> both bandwidth and latency performance improve),
> which is comparable to the 12 - 13 ms. that it takes
> on average just to position to it - and you can still
> stream data at high bandwidths in parallel from the
> disks in an array as long as you have a client buffer
> as large in MB as the number of disks you need to
> stream from to reach the required bandwidth (you want
> 1 GB/sec?  no problem:  just use a 10 - 20 MB buffer
> and stream from 10 - 20 disks in parallel).  Of
> course, this assumes that higher software layers
> organize data storage to provide that level of
> contiguity to leverage...

Hundreds of POP and IMAP processes coming and going as users read their 
mail, plus hundreds more LMTP processes from mail being delivered to the 
Cyrus mail store.  Sometimes writes predominate over reads; it depends on the 
time of day, whether backups are running, etc.
 
 