On 29.07.09 15:18, Markus Kovero wrote:
I recently noticed that importing larger pools occupied by large amounts of
data can keep 'zpool import' busy for several hours, while 'zpool iostat' shows
only occasional random reads and 'iostat -xen' shows quite busy disk usage. It
is almost as if it has to go through every bit in the pool before the import
completes. Somebody said that zpool import got faster in snv_118, but I don't
have any concrete information on that yet.
This had nothing to do with the speed of 'zpool import'. There was a corrupted
pool-wide metadata block that prevented the pool from importing successfully.
Fortunately, we found a better previous state a few txgs back, at txg
2682802 (last synced was 2682808):
#zdb -e -ubbcsL -t 2682802 data1
...
 Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
4.25K 19.9M 8.62M 25.8M 6.08K 2.31 0.00 SPA space map
1 128K 128K 128K 128K 1.00 0.00 ZIL intent log
1.77K 28.4M 8.48M 17.3M 9.8K 3.35 0.00 DMU dnode
2 2K 1K 2.50K 1.25K 2.00 0.00 DMU objset
- - - - - - - DSL directory
2 1K 1K 3.00K 1.50K 1.00 0.00 DSL directory child map
1 512 512 1.50K 1.50K 1.00 0.00 DSL dataset snap map
2 1K 1K 3.00K 1.50K 1.00 0.00 DSL props
- - - - - - - DSL dataset
- - - - - - - ZFS znode
- - - - - - - ZFS V0 ACL
46.3M 5.74T 5.74T 5.74T 127K 1.00 100.00 ZFS plain file
1.87K 9.04M 2.75M 5.50M 2.94K 3.29 0.00 ZFS directory
1 512 512 1K 1K 1.00 0.00 ZFS master node
1 512 512 1K 1K 1.00 0.00 ZFS delete queue
- - - - - - - zvol object
- - - - - - - zvol prop
- - - - - - - other uint8[]
- - - - - - - other uint64[]
- - - - - - - other ZAP
- - - - - - - persistent error log
1 128K 4.50K 13.5K 13.5K 28.44 0.00 SPA history
- - - - - - - SPA history offsets
- - - - - - - Pool properties
- - - - - - - DSL permissions
- - - - - - - ZFS ACL
- - - - - - - ZFS SYSACL
- - - - - - - FUID table
- - - - - - - FUID table size
- - - - - - - DSL dataset next clones
- - - - - - - scrub work queue
46.3M 5.74T 5.74T 5.74T 127K 1.00 100.00 Total
capacity operations bandwidth ---- errors ----
description used avail read write read write read write cksum
data1 5.74T 6.99T 523 0 65.1M 0 0 0 1
/dev/dsk/c14t0d0 5.74T 6.99T 523 0 65.1M 0 0 0 17
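The good txg here was found by hand. A rough sketch of that search (a
hypothetical loop, not Victor's actual procedure; pool name and txg numbers are
the ones from this thread) might look like:

```shell
# Walk back from the last synced txg and ask zdb (read-only) whether each
# candidate state opens cleanly. Adjust POOL and LAST for your own pool.
POOL=data1
LAST=2682808                       # last synced txg, from 'zdb -e -u'
for txg in $(seq $((LAST - 1)) -1 $((LAST - 10))); do
    echo "=== trying txg $txg ==="
    # -e: exported pool, -u: dump uberblock, -t: use this txg instead of last
    if zdb -e -u -t "$txg" "$POOL"; then
        echo "txg $txg opens cleanly"
        break
    fi
done
```

Each iteration is read-only, so probing older txgs this way does not make
things worse; the expensive '-bbcsL' traversal is only worth running once a
promising txg is found.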
So we reactivated it and were able to import the pool just fine. A subsequent
scrub did find a couple of errors in metadata. There were no user data errors
at all:
# zpool status -v data1
pool: data1
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 12h43m with 0 errors on Thu Aug 6 06:00:11 2009
config:
NAME STATE READ WRITE CKSUM
data1 ONLINE 0 0 0
c14t0d0 ONLINE 0 0 2 12K repaired
errors: No known data errors
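For completeness, the post-import cleanup that the ZFS-8000-9P action text
suggests boils down to a standard sequence (ordinary zpool commands; pool name
taken from this thread):

```shell
# Scrub the newly imported pool, confirm the result, then clear the
# (now repaired) error counters.
zpool scrub data1
zpool status -v data1   # wait for 'scrub completed ... with 0 errors'
zpool clear data1       # reset READ/WRITE/CKSUM counters once clean
```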
Upcoming zpool recovery support is going to help perform this kind of recovery
in a more user-friendly and automated way.
Btw, pool was originally created on FreeBSD, but we performed recovery on
Solaris. Pavel said that he was going to stay on OpenSolaris as he learned a lot
about it along the way ;-)
Cheers,
Victor
Yours
Markus Kovero
-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Victor Latushkin
Sent: 29. heinäkuuta 2009 14:05
To: Pavel Kovalenko
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zpool import hungs up forever...
On 29.07.09 14:42, Pavel Kovalenko wrote:
fortunately, after several hours the terminal came back -->
# zdb -e data1
Uberblock
magic = 0000000000bab10c
version = 6
txg = 2682808
guid_sum = 14250651627001887594
timestamp = 1247866318 UTC = Sat Jul 18 01:31:58 2009
Dataset mos [META], ID 0, cr_txg 4, 27.1M, 3050 objects
Dataset data1 [ZPL], ID 5, cr_txg 4, 5.74T, 52987 objects
capacity operations bandwidth ---- errors ----
description used avail read write read write read write cksum
data1 5.74T 6.99T 772 0 96.0M 0 0 0 91
/dev/dsk/c14t0d0 5.74T 6.99T 772 0 96.0M 0 0 0 223
#
So we know that there are some checksum errors there, but at least zdb
was able to open the pool in read-only mode.
i've tried to run zdb -e -t 2682807 data1
and
#echo "0t::pid2proc|::walk thread|::findstack -v" | mdb -k
This is wrong - you need to put the PID of the 'zpool import data1' process
right after '0t'.
and
#fmdump -eV
shows checksum errors, such as
Jul 28 2009 11:17:35.386268381 ereport.fs.zfs.checksum
nvlist version: 0
class = ereport.fs.zfs.checksum
ena = 0x1baa23c52ce01c01
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x578154df5f3260c0
vdev = 0x6e4327476e17daaa
(end detector)
pool = data1
pool_guid = 0x578154df5f3260c0
pool_context = 2
pool_failmode = wait
vdev_guid = 0x6e4327476e17daaa
vdev_type = disk
vdev_path = /dev/dsk/c14t0d0p0
vdev_devid = id1,s...@n2661000612646364/q
parent_guid = 0x578154df5f3260c0
parent_type = root
zio_err = 50
zio_offset = 0x2313d58000
zio_size = 0x4000
zio_objset = 0x0
zio_object = 0xc
zio_level = 0
zio_blkid = 0x0
__ttl = 0x1
__tod = 0x4a6ea60f 0x1705fcdd
This tells us that object 0xc in the meta-objset (objset 0x0) is corrupted.
So to get more details you can do the following:
zdb -e -dddd data1
zdb -e -bbcs data1
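The hex fields in the ereport decode directly; a quick sanity check of the
numbers Victor is reading (plain shell arithmetic, no pool needed):

```shell
# zio_objset 0x0 is the meta-objset (MOS); zio_object 0xc is object 12
# within it. POSIX printf accepts the 0x prefix for %d conversions.
printf 'objset %d, object %d\n' 0x0 0xc
# zio_offset/zio_size locate the bad block on the vdev:
printf 'offset %d bytes, size %d bytes\n' 0x2313d58000 0x4000
```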
victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
--
Victor Latushkin phone: x11467 / +74959370467
TSC-Kernel EMEA mobile: +78957693012
Sun Services, Moscow blog: http://blogs.sun.com/vlatushkin
Sun Microsystems