Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-15 Thread Nathan Hand
I don't know if this is relevant or merely a coincidence but the zdb command 
fails an assertion in the same txg_wait_synced function.

r...@opensolaris:~# zdb -p /mnt -e zones 
Assertion failed: tx->tx_threads == 2, file ../../../uts/common/fs/zfs/txg.c, 
line 423, function txg_wait_synced
Abort (core dumped)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-15 Thread Nathan Hand
I've had some success.

I started with the ZFS on-disk format PDF.

http://opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf

The uberblocks all have magic value 0x00bab10c. Used od -x to find that value 
in the vdev.

r...@opensolaris:~# od -A x -x /mnt/zpool.zones | grep "b10c 00ba"
020000 b10c 00ba 0000 0000 0004 0000 0000 0000
020400 b10c 00ba 0000 0000 0004 0000 0000 0000
020800 b10c 00ba 0000 0000 0004 0000 0000 0000
020c00 b10c 00ba 0000 0000 0004 0000 0000 0000
021000 b10c 00ba 0000 0000 0004 0000 0000 0000
021400 b10c 00ba 0000 0000 0004 0000 0000 0000
021800 b10c 00ba 0000 0000 0004 0000 0000 0000
021c00 b10c 00ba 0000 0000 0004 0000 0000 0000
022000 b10c 00ba 0000 0000 0004 0000 0000 0000
022400 b10c 00ba 0000 0000 0004 0000 0000 0000
...

So the uberblock array begins 128kB into the vdev, with one uberblock slot every 
1kB.
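The same scan can be sketched in a few lines of Python (the vdev path and the 128-slot count are assumptions here; the 128kB start and 1kB stride are the values observed in the dump above):

```python
import struct

MAGIC = 0x00bab10c            # uberblock magic, little-endian on this pool
UB_ARRAY_OFFSET = 128 * 1024  # array begins 128kB into the vdev, as seen above
UB_SLOT = 1024                # one uberblock slot every 1kB

def scan_uberblocks(path, slots=128):
    """Return (offset, txg, timestamp) for each slot with a valid magic."""
    hits = []
    with open(path, "rb") as f:
        for i in range(slots):
            off = UB_ARRAY_OFFSET + i * UB_SLOT
            f.seek(off)
            buf = f.read(40)
            if len(buf) < 40:
                break
            magic, version, txg, guid_sum, ts = struct.unpack("<5Q", buf)
            if magic == MAGIC:
                hits.append((off, txg, ts))
    return hits
```

Running scan_uberblocks("/mnt/zpool.zones") should then list the same slots the od | grep pipeline found, plus each slot's txg so the newest one stands out.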

To identify the active uberblock I used zdb.

r...@kestrel:/opt$ zdb -U -uuuv zones
Uberblock
magic = 00bab10c
version = 4
txg = 1504158 (= 0x16F39E) 
guid_sum = 10365405068077835008 = (0x8FD950FDBBD02300)
timestamp = 1229142108 UTC = Sat Dec 13 15:21:48 2008 = (0x4943385C)
rootbp = [L0 DMU objset] 400L/200P DVA[0]=<0:52e3edc00:200> 
DVA[1]=<0:6f9c1d600:200> DVA[2]=<0:16e280400:200> fletcher4 lzjb LE contiguous 
birth=1504158 fill=172 cksum=b0a5275f3:474e0ed6469:e993ed9bee4d:205661fa1d4016

I spy those hex values at the uberblock starting 027800.

027800 b10c 00ba 0000 0000 0004 0000 0000 0000
027810 f39e 0016 0000 0000 2300 bbd0 50fd 8fd9
027820 385c 4943 0000 0000 0001 0000 0000 0000
027830 1f6e 0297 0000 0000 0001 0000 0000 0000
027840 e0eb 037c 0000 0000 0001 0000 0000 0000
027850 1402 00b7 0000 0000 0001 0000 0703 800b
027860 0000 0000 0000 0000 0000 0000 0000 0000
027870 0000 0000 0000 0000 f39e 0016 0000 0000
027880 00ac 0000 0000 0000 75f3 0a52 000b 0000
027890 6469 e0ed 0474 0000 ee4d ed9b e993 0000
0278a0 4016 fa1d 5661 0020 0000 0000 0000 0000
0278b0 0000 0000 0000 0000 0000 0000 0000 0000

Breaking it down

* the first 8 bytes are the magic uberblock number (b10c 00ba 0000 0000)
* the second 8 bytes are the version number (0004 0000 0000 0000)
* the third 8 bytes are the transaction group a.k.a. txg (f39e 0016 0000 0000)
* the fourth 8 bytes are the guid sum (2300 bbd0 50fd 8fd9)
* the fifth 8 bytes are the timestamp (385c 4943 0000 0000)

The remainder of the bytes are the "blkptr" structure and I'll ignore them.

Those values match the active uberblock exactly, so I know this is the on-disk 
location of the first active uberblock.
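As a cross-check, the five fields in the breakdown above can be decoded straight from the dump. Remember od -x prints little-endian 16-bit words, so the byte string below is just the first 40 bytes at 0x27800 transcribed back to raw on-disk order:

```python
import struct

# First 40 bytes of the uberblock at 0x27800, from the od dump above.
raw = bytes.fromhex(
    "0cb1ba0000000000"   # magic     b10c 00ba 0000 0000
    "0400000000000000"   # version   0004 0000 0000 0000
    "9ef3160000000000"   # txg       f39e 0016 0000 0000
    "0023d0bbfd50d98f"   # guid_sum  2300 bbd0 50fd 8fd9
    "5c38434900000000"   # timestamp 385c 4943 0000 0000
)
magic, version, txg, guid_sum, timestamp = struct.unpack("<5Q", raw)
print(hex(magic))   # 0xbab10c
print(version)      # 4
print(txg)          # 1504158
print(guid_sum)     # 10365405068077835008
print(timestamp)    # 1229142108
```

The decoded values agree with the zdb -uuuv output field for field.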

Scanning further I find an exact duplicate 256kB later in the device.

067800 b10c 00ba 0000 0000 0004 0000 0000 0000
067810 f39e 0016 0000 0000 2300 bbd0 50fd 8fd9
067820 385c 4943 0000 0000 0001 0000 0000 0000
067830 1f6e 0297 0000 0000 0001 0000 0000 0000
067840 e0eb 037c 0000 0000 0001 0000 0000 0000
067850 1402 00b7 0000 0000 0001 0000 0703 800b
067860 0000 0000 0000 0000 0000 0000 0000 0000
067870 0000 0000 0000 0000 f39e 0016 0000 0000
067880 00ac 0000 0000 0000 75f3 0a52 000b 0000
067890 6469 e0ed 0474 0000 ee4d ed9b e993 0000
0678a0 4016 fa1d 5661 0020 0000 0000 0000 0000
0678b0 0000 0000 0000 0000 0000 0000 0000 0000

I know ZFS keeps four copies of the label; two at the front and two at the 
back, each 256kB in size.

r...@opensolaris:~# ls -l /mnt/zpool.zones 
-rw-r--r-- 1 root root 42949672960 Dec 15 04:49 /mnt/zpool.zones

That's 0xA00000000 = 42949672960 bytes = 41943040kB. If I subtract 512kB I 
should see the third and fourth labels.

r...@opensolaris:~# dd if=/mnt/zpool.zones bs=1k skip=41942528 | od -A x -x | 
grep "385c 4943 0000 0000"
027820 385c 4943 0000 0000 0001 0000 0000 0000
512+0 records in
512+0 records out
524288 bytes (524 kB) copied, 0.0577013 s, 9.1 MB/s
r...@opensolaris:~# 

Oddly enough I see the third uberblock at 0x27800 but the fourth uberblock at 
0x67800 is missing. Perhaps corrupted?

No matter. I now work out the exact offsets to the three valid uberblocks and 
confirm I'm looking at the right uberblocks.
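Those offsets fall out of the label layout: the uberblock array sits 128kB into each label, one 1kB slot per uberblock, and (by my reading of the on-disk format doc, so treat it as an assumption) the slot a given txg lands in is txg modulo 128. A sketch under those assumptions:

```python
LABEL_SIZE = 256 * 1024   # four labels: two at the front, two at the back
UB_ARRAY   = 128 * 1024   # uberblock array starts 128kB into each label
UB_SLOT    = 1024         # one 1kB slot per uberblock

def uberblock_offset_kb(dev_size, label, txg):
    """kB offset (usable as a dd skip= value) of txg's slot in a label."""
    bases = [0, LABEL_SIZE, dev_size - 2 * LABEL_SIZE, dev_size - LABEL_SIZE]
    return (bases[label] + UB_ARRAY + (txg % 128) * UB_SLOT) // 1024

size = 42949672960  # the 40 GB zpool.zones file
print([uberblock_offset_kb(size, l, 1504158) for l in (0, 1, 2)])
# [158, 414, 41942686]
```

Transaction group 1504158 mod 128 is slot 30, which puts the active slot 158kB into the first label, matching the dd skip= values below.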

r...@opensolaris:~# dd if=/mnt/zpool.zones bs=1k skip=158 | od -A x -x | head -3
000000 b10c 00ba 0000 0000 0004 0000 0000 0000
000010 f39e 0016 0000 0000 2300 bbd0 50fd 8fd9
000020 385c 4943 0000 0000 0001 0000 0000 0000
r...@opensolaris:~# dd if=/mnt/zpool.zones bs=1k skip=414 | od -A x -x | head -3
000000 b10c 00ba 0000 0000 0004 0000 0000 0000
000010 f39e 0016 0000 0000 2300 bbd0 50fd 8fd9
000020 385c 4943 0000 0000 0001 0000 0000 0000
r...@opensolaris:~# dd if=/mnt/zpool.zones bs=1k skip=41942686 | od -A x -x | 
head -3
000000 b10c 00ba 0000 0000 0004 0000 0000 0000
000010 f39e 0016 0000 0000 2300 bbd0 50fd 8fd9
000020 385c 4943 0000 0000 0001 0000 0000 0000

They all have the same timestamp. I'm looking at the correct uberblocks. Now I 
intentionally harm them.

r...@opensolaris:/mnt# dd if=/dev/zero of=/mnt/zpool.zones bs=1k seek=158 
count=1 conv=notrunc
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.00031522

Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-15 Thread Kees Nuyt
On Mon, 15 Dec 2008 14:23:37 PST, Nathan Hand wrote:

[snip]

> Initial inspection of the filesystems are promising.
> I can read from files, there are no panics, 
> everything seems to be intact.

Good work, congratulations, and thanks for the clear
description of the process. I hope I never need it.
Now one wonders why zfs doesn't have a rescue like that
built-in...
-- 
  (  Kees Nuyt
  )
c[_]


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Ross
I know Eric mentioned the possibility of zpool import doing more of this kind 
of thing, and he said that its current inability to do this will be fixed, but 
I don't know if it's an official project, RFE, or bug.  Can anybody shed some 
light on this?

See Jeff's post on Oct 10, and Eric's follow up later that day in this thread:
http://opensolaris.org/jive/thread.jspa?messageID=289537


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Nils Goroll
Well done, Nathan, and thank you for taking on the additional effort to write it all up.



Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread A Darren Dunham
On Tue, Dec 16, 2008 at 12:07:52PM +, Ross Smith wrote:
> It sounds to me like there are several potentially valid filesystem
> uberblocks available, am I understanding this right?
> 
> 1. There are four copies of the current uberblock.  Any one of these
> should be enough to load your pool with no data loss.
> 
> 2. There are also a few (would love to know how many) previous
> uberblocks which will point to a consistent filesystem, but with some
> data loss.

My memory is that someone on this list said "3" in response to a
question I had about it.  I looked through the archives and couldn't
come up with the post.  It was over a year ago.

> 3. Failing that, the system could be rolled back to any snapshot
> uberblock.  Any data saved since that snapshot will be lost.

What is a "snapshot uberblock"?  The uberblock points to the entire
tree: live data, snapshots, clones, etc.  If you don't have a valid
uberblock, you don't have any snapshots.

> Is there any chance at all of automated tools that can take advantage
> of all of these for pool recovery?

I'm sure there is.  In addition, I think there needs to be more that can
be done non-destructively.  Any successful import is read-write,
potentially destroying other information.   It would be nice to get "df"
or "zfs list" information so you could make a decision about using an
older uberblock.  Even better would be a read-only (at the pool level)
mount so the data could be directly examined.

-- 
Darren


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Casper . Dik


>When current uber-block A is detected to point to a corrupted on-disk data,
>how would "zpool import" (or any other tool for that matter) quickly and
>safely know that, once it found an older uber-block "B" that it points to a
>set of blocks which does not include any blocks that has since been freed
>and re-allocated and, thus, corrupted?  Eg, without scanning the entire
>on-disk structure?

Without a scrub, you mean?

Not possible, except the first few uberblocks (blocks aren't used until a 
few uberblocks later)

Casper



Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Ross Smith
It sounds to me like there are several potentially valid filesystem
uberblocks available, am I understanding this right?

1. There are four copies of the current uberblock.  Any one of these
should be enough to load your pool with no data loss.

2. There are also a few (would love to know how many) previous
uberblocks which will point to a consistent filesystem, but with some
data loss.

3. Failing that, the system could be rolled back to any snapshot
uberblock.  Any data saved since that snapshot will be lost.

Is there any chance at all of automated tools that can take advantage
of all of these for pool recovery?




On Tue, Dec 16, 2008 at 11:55 AM, Johan Hartzenberg  wrote:
>
> On Tue, Dec 16, 2008 at 1:43 PM,  wrote:
>>
>>
>> >When current uber-block A is detected to point to a corrupted on-disk
>> > data,
>> >how would "zpool import" (or any other tool for that matter) quickly and
>> >safely know that, once it found an older uber-block "B" that it points to
>> > a
>> >set of blocks which does not include any blocks that has since been freed
>> >and re-allocated and, thus, corrupted?  Eg, without scanning the entire
>> >on-disk structure?
>>
>> Without a scrub, you mean?
>>
>> Not possible, except the first few uberblocks (blocks aren't used until a
>> few uberblocks later)
>>
>> Casper
>
> Does that mean that each of the last "few-minus-1" uberblocks point to a
> consistent version of the file system? Does "few" have a definition?
>
>
>
> --
> Any sufficiently advanced technology is indistinguishable from magic.
>Arthur C. Clarke
>
> My blog: http://initialprogramload.blogspot.com
>


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Johan Hartzenberg
On Tue, Dec 16, 2008 at 1:43 PM,  wrote:

>
>
> >When current uber-block A is detected to point to a corrupted on-disk
> data,
> >how would "zpool import" (or any other tool for that matter) quickly and
> >safely know that, once it found an older uber-block "B" that it points to
> a
> >set of blocks which does not include any blocks that has since been freed
> >and re-allocated and, thus, corrupted?  Eg, without scanning the entire
> >on-disk structure?
>
> Without a scrub, you mean?
>
> Not possible, except the first few uberblocks (blocks aren't used until a
> few uberblocks later)
>
> Casper
>

Does that mean that each of the last "few-minus-1" uberblocks point to a
consistent version of the file system? Does "few" have a definition?



-- 
Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

My blog: http://initialprogramload.blogspot.com


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Johan Hartzenberg
On Tue, Dec 16, 2008 at 11:39 AM, Ross  wrote:

> I know Eric mentioned the possibility of zpool import doing more of this
> kind of thing, and he said that it's current inability to do this will be
> fixed, but I don't know if it's an official project, RFE or bug.  Can
> anybody shed some light on this?
>
> See Jeff's post on Oct 10, and Eric's follow up later that day in this
> thread:
> http://opensolaris.org/jive/thread.jspa?messageID=289537
> --
>
>

When the current uber-block A is detected to point to corrupted on-disk data,
how would "zpool import" (or any other tool for that matter) quickly and
safely know, once it found an older uber-block "B", that B points to a set of
blocks which does not include any blocks that have since been freed and
re-allocated and, thus, corrupted?  E.g., without scanning the entire
on-disk structure?



-- 
Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

My blog: http://initialprogramload.blogspot.com


Re: [zfs-discuss] Need Help Invalidating Uberblock

2009-08-02 Thread Stephen Pflaum
Does anyone know the correct syntax for running the zdb command against 
/dev/dsk/c6t0d0s2?

I'm trying to determine the active uberblock on an attached USB drive.



> To identify the active uberblock I used zdb.
> 
> r...@kestrel:/opt$ zdb -U -uuuv zones
> Uberblock
> magic = 00bab10c
> version = 4
> txg = 1504158 (= 0x16F39E) 
> guid_sum = 10365405068077835008 = (0x8FD950FDBBD02300)
> timestamp = 1229142108 UTC = Sat Dec 13 15:21:48 2008 = (0x4943385C)
> rootbp = [L0 DMU objset] 400L/200P DVA[0]=<0:52e3edc00:200>
> DVA[1]=<0:6f9c1d600:200> DVA[2]=<0:16e280400:200> fletcher4 lzjb LE
> contiguous birth=1504158 fill=172
> cksum=b0a5275f3:474e0ed6469:e993ed9bee4d:205661fa1d4016
> 
> I spy those hex values at the uberblock starting
> 027800.




Re: [zfs-discuss] Need Help Invalidating Uberblock

2009-11-16 Thread Martin Vool
You might want to check out this thread:

http://opensolaris.org/jive/thread.jspa?messageID=435420