Gino wrote:
> The real problem is that ZFS should stop forcing kernel panics.
>
>   
I find these panics very annoying, too, and even more so that the zpool 
was faulted afterwards. But my problem is that when someone asks me what 
ZFS should do instead, I have no idea.

>> I have large Sybase database servers and file servers
>> with billions of 
>> inodes running using ZFSv3. They are attached to
>> X4600 boxes running 
>> Solaris 10 U3, 2x 4 GBit/s dual FibreChannel, using
>> dumb and cheap 
>> Infortrend FC JBODs (2 GBit/s) as storage shelves.
>>     
>
> Are you using FATA drives?
>
>   
Seagate FibreChannel drives, Cheetah 15k, ST3146855FC for the databases.

For the NFS filers we use Infortrend FC shelves with SATA inside.

>> All my benchmarks (both on the command line and within applications)
>> show that the FibreChannel is the bottleneck, even with random read.
>> ZFS doesn't do this out of the box, but a bit of tuning helped a lot.
>>     
>
> You found another good point.
> I think that with ZFS and JBOD, FC links will soon be the bottleneck.
> What tuning have you done?
>
>   
That depends on the individual requirements of each service. Basically, 
we change the recordsize according to the transaction size of the 
databases; on the filers, the performance was best when the recordsize 
was a bit lower than the average file size (the average file size is 
12K, so I set a recordsize of 8K). I also set a vdev cache size of 8K, 
and our databases worked best with a vq_max_pending of 32. ZFSv3 was 
used; that's the version shipped with Solaris 10 11/06.
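For illustration, here is roughly what that looks like on Solaris 10. 
The pool and dataset names below are placeholders, and the /etc/system 
tunable names are from memory, so please verify them against your 
release before copying anything:

    # Per-dataset recordsize; only affects blocks written afterwards.
    # The database filesystems get a value matching the transaction size.
    zfs set recordsize=8k tank/filer    # filer: just below the 12K avg file size

    # /etc/system entries (take effect after a reboot)
    set zfs:zfs_vdev_cache_max = 0x2000    # 8K vdev cache threshold
    set zfs:zfs_vdev_max_pending = 32      # vq_max_pending per vdev

For quick experiments the vdev values can also be poked into a running 
kernel with mdb -kw before making them permanent in /etc/system.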

> It is a problem if your apps hang waiting for you to power down/pull out the 
> drive!
> Especially in a time=money environment :)
>
>   
Yes, but why doesn't your application fail over to a standby? I'm also 
working in a "time is money and failure is not an option" environment, 
and I doubt I would sleep better if I were responsible for an application 
under such a service level agreement without full high availability. If 
a system reboot can be a single point of failure, what about the network 
infrastructure? Hardware errors? Or power outages?
I'm definitely NOT some kind of know-it-all, don't misunderstand me. 
Your statement just made my alarm bells ring, and that's why I'm asking.

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963 
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, 
Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss 
Aufsichtsratsvorsitzender: Michael Scheeren
