On Fri, 30 Nov 2007, Vincent Fox wrote:

... reformatted ...
> We will be using Cyrus to store mail on 2540 arrays.
>
> We have chosen to build 5-disk RAID-5 LUNs in 2 arrays which are 
> both connected to same host, and mirror and stripe the LUNs.  So a 
> ZFS RAID-10 set composed of 4 LUNs.  Multi-pathing also in use for 
> redundancy.
>
> My question is any guidance on best choice in CAM for stripe size in the LUNs?

[after reading the entire thread, where the details of the 
storage-related application are presented piecemeal, and piecing those 
details together] I can't give you an answer or a recommendation, 
because the question does not make sense IMHO.

IOW: This is like saying: "I want to get from Dallas to LA as quickly 
as possible and have already decided that a bicycle would be the best 
mode of transport to use; can you tell me how I should configure the 
bicycle?"  The problem is that it's very unlikely that the bicycle is 
the correct solution, so recommending a particular bicycle config is 
likely to be very bad advice... and would also validate the 
supposition that the solution utilizing the bicycle is, indeed, the 
correct solution.

> Default is 128K right now, can go up to 512K, should we go higher?
>
> Cyrus stores mail messages as many small files, not big mbox files. 
> But there are so many layers in action here it's hard to know what 
> is best choice.

[again based on reading the entire thread and not an answer to the 
above paragraph]

It appears that the chosen solution is to use a stripe of two hardware 
RAID5 LUNs presented by a 2540 (please correct me if this is 
incorrect).  There are several issues with this proposal:

a) You're mixing solutions: hardware RAID5 and ZFS.  Why?  All this 
does is introduce needless complexity and make it very difficult to 
troubleshoot issues with the storage subsystem - especially if the 
issue is performance-related.  Also - how do you localize a fault 
condition that is caused by a 2540 RAID firmware bug?  How do you 
isolate performance issues caused by the interaction between the 
hardware RAID5 LUNs and ZFS?
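
To make the fault-localization point concrete (the pool name below is 
made up):

    # with ZFS managing the raw disks, a sick-pool report names the exact
    # device, and 'fmdump -eV' shows the underlying FMA ereports for it:
    zpool status -x mailpool
    # with a hardware RAID5 LUN underneath, ZFS can only ever report the
    # whole LUN as degraded - the failing spindle has to be chased down
    # separately in CAM, and whatever the RAID firmware does internally
    # is invisible to iostat/DTrace on the host.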

b) You've chosen a stripe - despite Richard Elling's best advice 
(something like "friends don't let friends use stripes").  See 
Richard's blog for a comparison of the reliability of different 
storage configurations.

c) For a mail storage subsystem a stripe seems totally wrong. 
Generally speaking, an email store consists of many small files - with 
occasional medium-sized files (due to attachments) and, less commonly, 
some large files - usually limited by the maximum message size defined 
by the MTA (a typical value is 10 MB - what is it in your case?).
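
If you want to put numbers behind that, a crude survey of an existing 
spool will show the actual message-size distribution (the path below 
is just an example - point it at wherever your current mail store 
lives):

    # bucket files by size, using the 5th column (bytes) of ls -ln
    find /var/spool/imap -type f | xargs ls -ln | \
      nawk '{ s = $5;
              if      (s < 4096)    small++;
              else if (s < 65536)   medium++;
              else if (s < 1048576) large++;
              else                  huge++ }
        END { printf("<4K: %d  4K-64K: %d  64K-1M: %d  >1M: %d\n",
                     small, medium, large, huge) }'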

d) ZFS, with its built-in volume manager, is designed to be given 
direct access to the individual disks (JBOD).  Placing a hardware RAID 
engine between ZFS and the actual disks puts a "black box" in the data 
path - the ZFS volume manager can't possibly "understand" how various 
storage providers' "black boxes" will behave... especially when ZFS 
tells the "disk" to do something and the hardware RAID LUN lies to ZFS 
(for example, synchronous writes and cache flushes).
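
For comparison - and purely as a sketch, since the device names are 
made up and this assumes the 2540 can present each drive as its own 
LUN (or a single-disk volume) - the ZFS-only version of a 
mirrored/striped layout across the two arrays looks like:

    # c2t*d* = drives from array 1, c3t*d* = drives from array 2 (hypothetical)
    zpool create mailpool \
        mirror c2t0d0 c3t0d0 \
        mirror c2t1d0 c3t1d0 \
        mirror c2t2d0 c3t2d0 \
        mirror c2t3d0 c3t3d0
    # ZFS now owns redundancy, resilvering, checksums and flush semantics
    # end-to-end, with no second RAID layer to second-guess.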

e) You've presented no data in terms of typical iostat -xcnz 5 output 
- sampled at various times of the day when the user data access 
patterns are known.  This information would allow us to give you some 
basic recommendations.  IOW - we need to know the basic requirements 
in terms of IOPS and average I/O transfer sizes.  BTW: Brendan Gregg's 
DTrace scripts will allow you to gather very detailed I/O usage data 
on the production system with no risk.
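
To be specific about what would help (both of these are safe to run on 
a live box):

    # 5-second samples of per-device IOPS, throughput and service times
    iostat -xcnz 5

    # distribution of physical I/O sizes, system-wide (a standard io-provider
    # one-liner); let it run over a representative busy period, then Ctrl-C
    dtrace -n 'io:::start { @["I/O bytes"] = quantize(args[0]->b_bcount); }'

(Brendan Gregg's DTraceToolkit also ships iosnoop and iotop if you 
want per-file and per-process detail.)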

f) You have not provided any details of the 2540 config - except for 
the fact that it is "fully loaded" IIRC.  SAS disks?  10,000 RPM 
drives or 15,000 RPM drives?  Drive capacity?

g) You've provided no details of how the host is configured.  If you 
decide to deploy a ZFS-based system, the amount of installed RAM on 
the mailserver will have a *huge* impact on the actual load placed on 
the I/O subsystem.  In this regard, ZFS is your friend, as it'll cache 
almost _everything_, given enough RAM.  And DDR2 RAM is (arguably) 
less than $40 a gigabyte today - with 2 GB DIMMs having reached price 
parity with 2 * 1 GB DIMMs.

For example: if an end-user MUA is configured to poll the mailserver 
every 30 seconds to check whether new mail has arrived, and the 
mailserver has sufficient (cache) memory, then only the first request 
will require disk access - the large number of subsequent requests 
will be served out of (cache) memory.
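
Once such a box is up, the effect is easy to observe - the ARC 
statistics are exposed as kstats (the names below are the standard 
Solaris/OpenSolaris ones):

    # current ARC size and its ceiling, in bytes
    kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
    # hit/miss counters - a quick sanity check on cache effectiveness
    kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses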

h) Another observation: You've commented on the importance of system 
reliability because there are 10k users on the mailserver.  Whether 
you have 10 users or 10k users or 100k users is of no importance if 
you are considering system reliability (aka failure rates).  IOW - a 
system that is configured to a certain reliability requirement will be 
the same, regardless of the number of end users that rely on that 
system.  The number of concurrent users is important only in terms of 
system performance and response time.

i) I don't know what the overall storage requirement is (someone said 
1 TB IIRC) or how it relates to the number/size of the available disk 
drives (in the 2540).

Observations:

1) Any striped config seems inherently wrong - given the available 
information.

2) mixing RAID5 LUNs (backend) with ZFS introduces unnecessary 
system complexity.

3) designing a system when no requirements have been presented in 
terms of:
    i)   I/O access patterns
    ii)  IOPS (I/O Ops per Second)
    iii) required response time
    iv)  number of concurrent requests
    v)   application host config (CPUs/cores, RAM, I/O bus, disk ctrls)
    vi)  backup methodology and frequency
    vii) storage subsystem config
    ....
is very unlikely to result in a correctly configured system that will 
meet the owner/operator's expectations.

Please don't frame this response as completely negative.  That is not 
my intention - what I'm trying to do is present you with a list of 
questions that must be answered before a technically correct storage 
subsystem can be designed and implemented.  IOW - before a storage 
subsystem can be correctly *engineered*.

Also - please don't be discouraged by this response.  If you are 
willing to fill in the blanks, I'm willing to help provide a 
meaningful recommendation.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
            Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Graduate from "sugar-coating school"?  Sorry - I never attended! :)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
