On 9/3/07, Dale Ghent <[EMAIL PROTECTED]> wrote:
>
> I saw a putback this past week from M. Maybee regarding this, but I
> thought I'd post here that I saw what is apparently an incarnation of
> 6569719 on a production box running s10u3 x86 w/ latest (on
> sunsolve) patches. I have 3 other servers configured the same way WRT
> work load, zfs pools and hardware resources, so if this occurs again
> I'll see about logging a case and getting a relief patch. Anyhow,
> perhaps a backport to s10 may be in order
[note: the patches I mention are s10 sparc specific. Translation to
x86 required.]
As of a few weeks ago, s10u3 with the latest patches did not have this
problem for me, but s10u4 beta and snv_69 did. My situation was on
sun4v, not i386. More specifically:
S10 118833-36, 118833-07, 118833-10:
# zpool import
  pool: zfs
    id: 679728171331086542
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        zfs         FAULTED  corrupted data
          c0d1s3    FAULTED  corrupted data
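(As an aside, if anyone else hits this: a way to see whether the vdev
labels are actually damaged, as opposed to the cached config just being
stale, is to dump them with zdb. I didn't capture this from the box in
question, so the device path below is only illustrative; it prints the
four labels, or "failed to unpack" messages if they are gone.)

# zdb -l /dev/dsk/c0d1s3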
snv_69, s10u4 beta:
Boot device: /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]:dhcp
File and args: -s
SunOS Release 5.11 Version snv_69 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Configuring /dev
Using DHCP for network configuration information.
Requesting System Maintenance Mode
SINGLE USER MODE
# zpool import
panic[cpu0]/thread=300028943a0: dangling dbufs (dn=3000392dbe0, dbuf=3000392be08)

02a10076f270 zfs:dnode_evict_dbufs+188 (3000392dbe0, 0, 1, 1, 2a10076f320, 7b729000)
  %l0-3: 03000392ddf0 03000392ddf8
  %l4-7: 02a10076f320 0001 03000392bf20 0003
02a10076f3e0 zfs:dmu_objset_evict_dbufs+100 (2, 0, 0, 7b722800, 0, 3516900)
  %l0-3: 7b72ac00 7b724510 7b724400 03516a70
  %l4-7: 03000392dbe0 03516968 7b7228c1 0001
...
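(For anyone who wants to poke at the dump from a panic like this, the
usual inspection looks roughly like the following; the paths assume the
default savecore location and the dump numbers will differ on your box.)

# cd /var/crash/`uname -n`
# mdb unix.0 vmcore.0
> ::status
> ::stack
> $q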
Sun offered me an IDR against 125100-07, but since I could not
reproduce the problem on that kernel, I never tested it. This does
imply that Sun believes there is a dangling-dbufs problem in 125100-07
and that they have a fix available to support-paying customers.
Perhaps this is the problem, and the related fix, that you would be
interested in.
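(If you want to check whether a box already has 125100-07 or later
before chasing the IDR, on S10 that is just a patch-list query; swap in
whatever kernel patch revision you care about.)

# showrev -p | grep 125100
# uname -v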
The interesting thing about my case is that the backing store for this
device is a file on a ZFS file system, served up as a virtual disk in
an LDOM. From the primary LDOM, there is no corruption. An
unexpected reset (a panic, I believe) of the primary LDOM seems to have
caused the corruption in the guest LDOM. What was that about having
the redundancy as close to the consumer as possible? :)
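(For context, a file-backed vdisk like mine is typically wired up from
the primary domain along these lines; the file path, volume, and guest
names below are made up, not my actual configuration. Inside the guest
the disk then shows up as a c0dN device, which is where the c0d1s3
above comes from.)

# mkfile 10g /tank/ldoms/guest1/disk0.img
# ldm add-vdsdev /tank/ldoms/guest1/disk0.img vol0@primary-vds0
# ldm add-vdisk vdisk0 vol0@primary-vds0 guest1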
--
Mike Gerdts
http://mgerdts.blogspot.com/