[EMAIL PROTECTED] wrote:

>       War wounds?  Could you please expand on the why a bit more?



- ZFS is not aware of AVS. On the secondary node you always have to 
force the `zpool import`, because the pool's metadata changes without 
the secondary noticing it (the pool is still marked as in use by the 
primary). There is no mechanism to prevent data loss, e.g. a zpool can 
be imported even while the replicator is *not* in logging mode (see 
the sketch below).
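
To illustrate (a minimal sketch; the pool name "tank" is just a 
placeholder, not from my actual setup):

  # On the secondary node: first check whether the set is really in
  # logging mode. 'sndradm -P' prints the configured sets and their state.
  sndradm -P

  # Nothing stops you from running the import while the set is still
  # replicating -- that's exactly how you corrupt the pool. And the
  # import always needs -f, because the pool is still marked as in
  # use by the primary host.
  zpool import -f tank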

- AVS is not ZFS-aware. For instance, if ZFS resilvers a mirrored disk, 
e.g. after replacing a drive, the complete disk is sent over the network 
to the secondary node, even though the replicated data on the secondary 
is intact.
That's a lot of fun with today's disk sizes of 750 GB and 1 TB drives: 
you usually end up with 10+ hours without real redundancy (customers who 
use Thumpers to store important data usually don't have the budget to 
connect their data centers with 10 Gbit/s, so expect 10+ hours *per 
disk*; a rough calculation follows below).
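
The numbers behind "10+ hours per disk", assuming for the sake of the 
example a 155 Mbit/s (OC-3 class) link between the data centers -- that 
bandwidth is an assumption for illustration, not a measured figure:

  750 GB                ~  750,000 MB per replaced disk
  155 Mbit/s / 8        ~  19 MB/s usable throughput (at best)
  750,000 MB / 19 MB/s  ~  39,500 s  ~  11 hours

A 1 TB disk on the same link comes out at roughly 14-15 hours.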

- ZFS & AVS & X4500 leads to a bad error handling. The Zpool may not be 
imported on the secondary node during the replication. The X4500 does 
not have a RAID controller which signals (and handles) drive faults. 
Drive failures on the secondary node may happen unnoticed until the 
primary nodes goes down and you want to import the zpool on the 
secondary node with the broken drive. Since ZFS doesn't offer a recovery 
mechanism like fsck, data loss of up to 20 TB may occur.
If you use AVS with ZFS, make sure that you have a storage which handles 
drive failures without OS interaction.
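
If you are stuck with the internal disks, you can at least watch the 
raw devices on the secondary outside of ZFS, e.g. along these lines (a 
sketch of the idea, not a polished check script):

  # The pool is not imported on the secondary, so 'zpool status' tells
  # you nothing there. Watch the devices directly instead:
  iostat -En | grep -i error   # soft/hard/transport error counters per device
  fmadm faulty                 # anything FMA has already flagged as faulted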

- 5 hours for scrubbing a 1 TB drive. If you're lucky. Up to 48 drives 
in total.

- An X4500 has no battery-backed write cache. ZFS uses the server's 
RAM as a cache, 15 GB+. I don't want to find out how much time a 
resilver over the network after a power outage may take (a full reverse 
replication would take up to 2 weeks and is not a valid option in a 
serious production environment). But the underlying question I asked 
myself: why would I want to replicate data in such an expensive way 
when the 48 TB of data themselves are apparently not important enough 
to be protected by a battery?
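
For reference, the full reverse replication I mean is something along 
these lines (quoting from memory, so double-check the options before 
relying on them):

  # Full reverse sync: copy the secondary volumes back over the primary.
  # On a 48 TB box over a WAN link this is the part that takes weeks.
  # -n skips the confirmation prompt; add -g <group> to limit the sync
  # to one I/O group instead of every configured set.
  sndradm -n -m -r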


- I gave AVS a set of 6 drives just for the bitmaps (using SVM soft 
partitions). That wasn't enough: the replication was still very slow, 
probably because of an insane amount of head movement, and it scales 
badly. Putting the bitmap of a drive on the drive itself (if I remember 
correctly, this is recommended in one of the most referenced howto blog 
articles) is a bad idea. Always use ZFS on whole disks if performance 
and caching matter to you.
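
For reference, this is the kind of setup I mean; hostnames and device 
paths below are invented for the example, and dsbitmap tells you how 
large the bitmap volume for a given data volume has to be:

  # How large does the bitmap volume for this data volume have to be?
  dsbitmap -r /dev/rdsk/c1t0d0s0

  # Enable a replicated set with the bitmap on a *separate* device
  # (an SVM soft partition here), not on the data disk itself:
  sndradm -e thumper1 /dev/rdsk/c1t0d0s0 /dev/md/rdsk/d101 \
             thumper2 /dev/rdsk/c1t0d0s0 /dev/md/rdsk/d101 \
             ip async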

- AVS seems to require additional shared storage when building 
failover clusters with 48 TB of internal storage. That may be hard to 
explain to the customer. But I'm not 100% sure about this; I just 
didn't find a way and didn't ask for help on a mailing list.


If you want a fail-over solution for important data, use external 
JBODs. Use AVS only to mirror complete clusters; don't use it to 
replicate single boxes with local drives. And in case OpenSolaris is 
not an option for you due to your company policies or support 
contracts, building a real cluster is also A LOT cheaper.


-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
