Hi,
I've been struggling for several weeks now to get stable ZFS replication
with Solaris 10 11/06 (current patches) and AVS 4.0. We tried it on
VMware first and ended up in kernel panics en masse (yes, we read Jim
Dunham's blog articles :-). Now we're trying on the real thing, two X4500
servers. Well, I have no trouble reproducing our kernel panics there
either ... I think I've learned some important things along the way, but
one problem remains.
I have a zpool on host A. Replication to host B works fine.
* zpool export tank on the primary - works.
* sndradm -d on both servers - works (paranoia mode)
* zpool import id on the secondary - works.
So far, so good. I change the contents of the file system, add some
files, delete others ... no problems. The secondary is in
production use now, everything is fine.
Okay, let's imagine I switched to the secondary host because I had a
problem with the primary. Now it's repaired and I want my redundancy back.
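To make the sequence explicit, here's the switchover as a shell sketch
(pool name `tank` is from my setup; the numeric pool id is whatever
`zpool import` lists on the secondary):

```shell
# host A (primary): release the pool cleanly
zpool export tank

# both hosts: disable the replication sets (paranoia mode)
sndradm -d

# host B (secondary): list importable pools, then import by numeric id
zpool import
zpool import <pool-id>
```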
* sndradm -E -f on both hosts - works.
* sndradm -u -r on the primary to refresh it - works.
`nicstat` shows me a bit of traffic.
Good, let's switch back to the primary. Current status: the zpool is
imported on the secondary and NOT imported on the primary.
* zpool export tank on the secondary - *kernel panic*
Sadly, the machine dies so fast that I can't see the kernel panic with `dmesg`.
And disabling the replication again later and mounting the zpool on the
primary again shows me that the update sync didn't take place, the file
system changes I did on the secondary weren't replicated. Exporting the
zpool on the secondary works *after* the system rebooted.
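For reference, the failback attempt as a sketch (the config file path is
an assumption - substitute wherever your SNDR volume-set definitions live):

```shell
# both hosts: re-enable the sets without forcing a sync,
# marking both sides as equal
sndradm -E -f /etc/opt/SUNWrdc/rdc.cf   # path is an assumption

# host A (primary): reverse update sync - pull the changes made
# on the secondary back to the primary
sndradm -u -r

# host B (secondary): export the pool so it can move back to A
zpool export tank    # <-- this is the step that panics the kernel here
```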
I use slices for the zpool, not LUNs, because I think many of my
problems were caused by exclusive locking, but it doesn't help with this
one.
Questions:
a) At the moment I don't understand why the kernel panics. The zpool
isn't mounted on both systems, the zpool itself seems to be fine after a
reboot ... and switching the primary and secondary hosts just for
resyncing seems to force a full sync, which isn't an option.
b) I'll try a `sndradm -m -r` next time ... but I'm not sure if I
like that thought. I would accept this if I replaced the primary host
with another server, but having to do a 24 TB full sync just because the
replication itself had been disabled for a few minutes would be hard to
swallow. Or did I do something wrong?
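For completeness, the full-sync variant I'm reluctant to run (config
file path again an assumption):

```shell
# host A (primary): reverse FULL sync - copies the secondary's
# entire contents back to the primary; with a 24 TB pool this is
# exactly what I'd like to avoid
sndradm -m -r -f /etc/opt/SUNWrdc/rdc.cf   # path is an assumption
```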
c) What performance can I expect from a X4500, 40 disks zpool, when
using slices, compared to LUNs? Any experiences?
And another thing: I did some experiments with zvols, because I wanted
to make disaster handling and the AVS configuration itself easier -
there won't be a full sync after replacing a disk because AVS doesn't
see that a hot spare is being used, and a hot spare won't be replicated
to the secondary host either, although the original drive on the
secondary never failed. I used the zvol with UFS, and this kind of
hardware RAID controller emulation by ZFS works pretty well; only the
performance fell off a cliff. SunSolve told me that this is a
flushing problem and there's a workaround in Nevada build 53 and higher.
Has somebody done a comparison? Can you share some experiences? I only
have a few days left and I don't want to waste time installing Nevada
for nothing ...
Thanks,
Ralf
--
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA
Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/
1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe
Amtsgericht Montabaur HRB 6484
Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger,
Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss