Howdy!

Very valuable advice here (and from Bob, who made similar comments - thanks,
Bob!). I think, then, we'll generally stick to 128K recordsizes. In the case
of databases, we'll stray as appropriate, and we may also stray with the HPC
compute cluster if we can demonstrate that it is worth it.
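For the database case, the kind of thing we'd do is set recordsize on just that dataset to match the database block size before loading any data (dataset name and the 8K block size here are hypothetical examples, not our actual config):

```shell
# Hypothetical dataset; recordsize only affects newly written blocks,
# so set it before populating the filesystem.
zfs create tank/dbfs
zfs set recordsize=8K tank/dbfs   # match an assumed 8K DB block size
zfs get recordsize tank/dbfs      # verify the property took effect
```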

To answer your questions below...

Currently, we have a single pool, in a "load share" configuration (no
raidz), that collects all the storage (which answers Ross' question too).
From that we carve filesystems on demand. There are many more tests planned
for that construction, though, so we are not married to it.

Redundancy abounds. ;> Since the pool doesn't employ raidz, it isn't
internally redundant, but we plan to replicate the pool's data to an
identical system (which is not yet built) at another site. Our initial
userbase doesn't need the replication, however, because they use the system
for little more than scratch space. Huge genomic datasets are dumped on the
storage, analyzed, and the results (which are much smaller) get sent
elsewhere. Everything is wiped out soon after that and the process starts
again. Future projected uses of the storage, however, would be far less
tolerant of loss, so I expect we'll want to reconfigure the pool in raidz.
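When we do reconfigure for redundancy, this is a sketch of the sort of raidz2 layout we'd consider (device names are placeholders, not our actual disks):

```shell
# Hypothetical devices; each raidz2 vdev survives two disk failures.
# Note this requires rebuilding the pool - vdevs can't be converted in place.
zpool create tank \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
zpool status tank   # confirm the vdev layout
```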

I see that Archie and Miles have shared some harrowing concerns which we
take very seriously. I don't think I'll be able to reply to them today, but
I certainly will in the near future (particularly once we've completed some
more of our induced failure scenarios).

Sidenote: Today we made eight network/iSCSI related tweaks that, in
aggregate, have resulted in dramatic performance improvements (some I just
hadn't gotten around to yet, others suggested by Sun's Mertol Ozyoney)...

- disabling the Nagle algorithm on the head node
- setting each iSCSI target block size to match the ZFS record size of 128K
- disabling "thin provisioning" on the iSCSI targets
- enabling jumbo frames everywhere (each switch and NIC)
- raising ddi_msix_alloc_limit to 8
- raising ip_soft_rings_cnt to 16
- raising tcp_deferred_acks_max to 16
- raising tcp_local_dacks_max to 16
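For anyone who wants to try the same tweaks, this is roughly how we applied them on the Solaris head node (interface name is an example; the ndd settings don't survive a reboot, so they belong in an init script, while the /etc/system entries take effect after a reboot):

```shell
# Disable the Nagle algorithm (send small segments immediately)
ndd -set /dev/tcp tcp_naglim_def 1
# Stretch ACKs: acknowledge up to 16 segments at a time
ndd -set /dev/tcp tcp_deferred_acks_max 16
ndd -set /dev/tcp tcp_local_dacks_max 16
# Jumbo frames on the storage-facing NIC (switch ports must match)
ifconfig e1000g0 mtu 9000

# Persistent entries in /etc/system (reboot required):
#   set ddi_msix_alloc_limit=8
#   set ip:ip_soft_rings_cnt=16
```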

Rerunning the same tests, we now see...

[1GB file size, 1KB record size]
Command: iozone -i 0 -i 1 -i 2 -r 1k -s 1g -f /data-das/perftest/1gbtest
Write: 143373
Rewrite: 183170
Read: 433205
Reread: 435503
Random Read: 90118
Random Write: 19488

[8GB file size, 512KB record size]
Command: iozone -i 0 -i 1 -i 2 -r 512k -s 8g -f
/volumes/data-iscsi/perftest/8gbtest
Write: 463260
Rewrite: 449280
Read: 1092291
Reread: 881044
Random Read: 442565
Random Write: 565565

[64GB file size, 1MB record size]
Command: iozone -i 0 -i 1 -i 2 -r 1m -s 64g -f /data-das/perftest/64gbtest
Write: 357199
Rewrite: 342788
Read: 609553
Reread: 645618
Random Read: 218874
Random Write: 339624

Thanks so much to everyone for all their great contributions!
-Gray

On Thu, Oct 16, 2008 at 2:20 AM, Akhilesh Mritunjai <[EMAIL PROTECTED]> wrote:

> Hi Gray,
>
> You've got a nice setup going there, few comments:
>
> 1. Do not tune ZFS without a proven test-case to show otherwise, except...
> 2. For databases. Tune recordsize for that particular FS to match DB
> recordsize.
>
> Few questions...
>
> * How are you divvying up the space ?
> * How are you taking care of redundancy ?
> * Are you aware that each layer of ZFS needs its own redundancy ?
>
> Since you have got a mixed use case here, I would be surprised if a general
> config would cover all, though it might do with some luck.
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
Gray Carper
MSIS Technical Services
University of Michigan Medical School
[EMAIL PROTECTED]  |  skype:  graycarper  |  734.418.8506
http://www.umms.med.umich.edu/msis/