Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang

2008-12-03 Thread [EMAIL PROTECTED]
Hi Blake,

Blake Irvin wrote:
> I am directly on the console.  cde-login is disabled, so I'm dealing 
> with direct entry.
>  
>>Are you directly on the console, or is the console on
>> a serial port?  If you are
>> running over X windows, the input might still get in,
>> but X may not be displaying.
>> If keyboard input is not getting in, your machine is
>> probably wedged at a high
>> level interrupt, which sounds doubtful based on your
>> problem description.
>> 
> Out of curiosity, why do you say that?  I'm no expert on interrupts, 
> so I'm curious.  It DOES seem that keyboard entry is ignored in this 
> situation, since I see no results from ctrl-c, for example (I had left 
> the console running 'tail -f /var/adm/messages').  I'm not saying you 
> are wrong, but if I should be examining interrupt issues, I'd like to 
> know (I have 3 hard disk controllers in the box, for example...)
>   
Typing ctrl-c and having a process killed because of it are two different 
actions.  The interpretation of ctrl-c as a kill character is done in a 
streams module (ldterm, I believe); it is not done in the device's interrupt 
handler.  I doubt you need to examine interrupts.  I was only saying that 
you could try what I recommended to get a dump.  The F1-a sequence is 
handled by the driver during interrupt handling, so it should get processed.
I have done this many times, so I am sure it works.
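
For what it's worth, the kill character the tty layer (ldterm) acts on is
just the termios interrupt character.  A minimal C sketch to print it,
nothing zfs-specific, just standard termios:

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int
main(void)
{
        struct termios t;

        /* read the terminal settings of the controlling terminal */
        if (tcgetattr(STDIN_FILENO, &t) != 0) {
                perror("tcgetattr");
                return (1);
        }
        /* c_cc[VINTR] is the character turned into SIGINT (^C, 0x03, by default) */
        printf("intr char = 0x%x\n", t.c_cc[VINTR]);
        return (0);
}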

>>   If the deadman timer does not trigger, the clock is
>> almost certainly running, and your machine is
>> almost certainly accepting keyboard input.
>> 
> That's good to know.  I just enabled deadman after the last freeze, so 
> it will be a bit before I can test this (hope I don't have to).
>
> thanks!
> Blake
>
>  
>> Good luck,
>> max
>> 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang

2008-12-03 Thread [EMAIL PROTECTED]
Hi Blake,

Blake Irvin wrote:
> Thanks - however, the machine hangs and doesn't even accept console input 
> when this occurs.  I can't get into the kernel debugger in these cases.
>   
Are you directly on the console, or is the console on a serial port?  If 
you are
running over X windows, the input might still get in, but X may not be 
displaying.
If keyboard input is not getting in, your machine is probably wedged at 
a high
level interrupt, which sounds doubtful based on your problem description.

> I've enabled the deadman timer instead.  I'm also using the automatic 
> snapshot service to get a look at things like /var/adm/sa/sa** files that get 
> overwritten after a hard reset.
>   
If the deadman timer does not trigger, the clock is almost certainly 
running, and your machine is
almost certainly accepting keyboard input.

Good luck,
max

> I'm just going to stay up late tonight and see what happens :)
>
> Blake
>
>
>
>
>   
>> Hi Blake,
>>
>> Blake Irvin wrote:
>> 
>>> I'm having a very similar issue.  Just updated to
>>>   
>> 10 u6 and upgraded my zpools.  They are fine (all
>> 3-way mirrors), but I've lost the machine around
>> 12:30am two nights in a row.
>> 
>>> What I'd really like is a way to force a core dump
>>>   
>> when the machine hangs like this.  scat is a very
>> nifty tool for debugging such things - but I'm not
>> getting a core or panic or anything :(
>> 
>>>   
>>>   
>> You can force a dump.  Here are the steps:
>>
>> Before the system is hung:
>>
>> # mdb -K -F   <-- this will load kmdb and drop into
>> it
>>
>> Don't worry if your system now seems hung.
>> Type, carefully, with no typos:
>>
>> :c   <-- and carriage-return.  You should get your
>> prompt back
>>
>> Now, when the system is hung, type F1-a  (that's
>> function key F1 and the 
>> "a" key together).
>> This should put you into kmdb.  Now, type (again, no
>> typos):
>>
>> $<systemdump
>> This should give you a panic dump, followed by
>> reboot,  (unless your 
>> system is hard-hung).
>>
>> max
>>
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discu
>> ss
>> 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang

2008-12-03 Thread [EMAIL PROTECTED]
Hi Blake,

Blake Irvin wrote:
> I'm having a very similar issue.  Just updated to 10 u6 and upgraded my 
> zpools.  They are fine (all 3-way mirrors), but I've lost the machine around 
> 12:30am two nights in a row.
>
>
> What I'd really like is a way to force a core dump when the machine hangs 
> like this.  scat is a very nifty tool for debugging such things - but I'm not 
> getting a core or panic or anything :(
>   
You can force a dump.  Here are the steps:

Before the system is hung:

# mdb -K -F   <-- this will load kmdb and drop into it

Don't worry if your system now seems hung.
Type, carefully, with no typos:

:c   <-- and carriage-return.  You should get your prompt back

Now, when the system is hung, type F1-a  (that's function key F1 and the 
"a" key together).
This should put you into kmdb.  Now, type (again, no typos):

$<systemdump

This should give you a panic dump, followed by reboot, (unless your 
system is hard-hung).

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [website-discuss] zdb to dump data

2008-10-31 Thread [EMAIL PROTECTED]
Hi Derek,
Derek Cicero wrote:
> Victor Latushkin wrote:
>> [EMAIL PROTECTED] пишет:
>>> Hi,
>>> Victor Latushkin wrote:
>>>>

>>> I have decided to file an RFE so that zdb with the -R option will
>>> allow one to decompress data before dumping it. I have had this
>>> implemented for several months now, and was told that a way to get it
>>> into opensolaris was to file an RFE. However, when I go to file the
>>> RFE, after typing in the information and hitting "send", I am
>>> getting: >
>>>
>>> Not Found
>>>
>>> The requested URL /bug/os was not found on this server.
>>>
>>>  
>>>
>>> Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.8a DAV/2 proxy_html/2.5
>>> mod_jk/1.2.15 Server at www.opensolaris.org Port 80
>>>
>>> Any ideas?
>>
> So you are unable to submit a bug?
>
> When did this happen (date/time)?
>
> I will check the logs.
>
> Derek
>
I have decided not to file an RFE, as Victor points out that one was 
already filed.
(I had searched with keyword: zfs and text: decompress.  Victor suggested 
trying both zfs and decompress in the text.)  The error occurred this morning
(last night California time, around 11:45pm, October 30).
thanks,
max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zdb to dump data

2008-10-31 Thread [EMAIL PROTECTED]
Victor Latushkin wrote:
> [EMAIL PROTECTED] пишет:
>> Hi,
>> Victor Latushkin wrote:
>>> Hi Ben,
>>>
>>> Ben Rockwood пишет:
>>>  
>>>> Is there some hidden way to coax zdb into not just displaying data
>>>> based on a given DVA but rather to dump it in raw usable form?
>>>>
>>>> I've got a pool with large amounts of corruption.  Several
>>>> directories are toast and I get "I/O Error" when trying to enter or
>>>> read the directory... however I can read the directory and files
>>>> using ZDB, if I could just dump it in a raw format I could do
>>>> recovery that way.
>>>>
>>>> To be clear, I've already recovered from the situation, this is
>>>> purely an academic "can I do it" exercise for the sake of learning.
>>>>
>>>> If ZDB can't do it, I'd assume I'd have to write some code to read
>>>> based on DVA.  Maybe I could write a little tool for it.
>>>> 
>>>
>>> zdb -R can read raw data blocks from the pool if flag 'r' is used, 
>>> so if you can identify list of a blocks comprising some file, you 
>>> can feed it to zdb -R.
>>>
>>>   
>> I have decided to file an RFE so that zdb with the -R option will
>> allow one to decompress data before dumping it. I have had this
>> implemented for several months now, and was told that a way to get it
>> into opensolaris was to file an RFE. However, when I go to file the 
>> RFE, after typing in the information and hitting "send", I am
>> getting: >
>>
>>  Not Found
>>
>> The requested URL /bug/os was not found on this server.
>>
>> 
>> Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.8a DAV/2 
>> proxy_html/2.5 mod_jk/1.2.15 Server at www.opensolaris.org Port 80
>>
>> Any ideas?
>
> I have no idea what may be wrong with the web site, but there's already an RFE
>
> 6757444 want zdb -R to support decompression, checksumming and raid-z
>
> regards,
> victor
>
Thanks Victor.  I guess searching the bug/rfe site for zdb is not the 
correct way to find
this.  I looked for keyword: zdb and text: decompress
and got no hits.  Maybe I should have used google...

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zdb to dump data

2008-10-31 Thread [EMAIL PROTECTED]
Hi,
Victor Latushkin wrote:
> Hi Ben,
>
> Ben Rockwood пишет:
>   
>> Is there some hidden way to coax zdb into not just displaying data
>> based on a given DVA but rather to dump it in raw usable form?
>>
>> I've got a pool with large amounts of corruption.  Several
>> directories are toast and I get "I/O Error" when trying to enter or
>> read the directory... however I can read the directory and files
>> using ZDB, if I could just dump it in a raw format I could do
>> recovery that way.
>>
>> To be clear, I've already recovered from the situation, this is
>> purely an academic "can I do it" exercise for the sake of learning.
>>
>> If ZDB can't do it, I'd assume I'd have to write some code to read
>> based on DVA.  Maybe I could write a little tool for it.
>> 
>
> zdb -R can read raw data blocks from the pool if flag 'r' is used, so if 
> you can identify the list of blocks comprising some file, you can feed it 
> to zdb -R.
>
>   
I have decided to file an RFE so that zdb with the -R option will allow 
one to decompress
data before dumping it.  I have had this implemented for several months 
now, and was told that
a way to get it into opensolaris was to file an RFE.  However, when I go 
to file the
RFE, after typing in the information and hitting "send", I am getting:


  Not Found

The requested URL /bug/os was not found on this server.


Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.8a DAV/2 proxy_html/2.5 
mod_jk/1.2.15 Server at www.opensolaris.org Port 80

Any ideas?
thanks,
max


> See comments in zdb source (before zdb_read_block()) for exact syntax.
>
> Wbr,
> Victor
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] COW & updates [C1]

2008-10-28 Thread [EMAIL PROTECTED]
Hi Cyril,

Cyril ROUHASSIA wrote:
>
> Dear all,
> please find below test that I have run:
>
> #zdb -v unxtmpzfs3   <-- uberblock for the unxtmpzfs3 pool
> Uberblock
>
> magic = 00bab10c
> version = 4
> txg = 86983
> guid_sum = 9860489793107228114
> timestamp = 1225183041 UTC = Tue Oct 28 09:37:21 2008
> rootbp = [L0 DMU objset] 400L/200P DVA[0]=<1:8:200> 
> DVA[1]=<0:7ac00:200> DVA[2]=<1:18013000:200> fletcher4 lzjb BE 
> contiguous birth=86983 fill=38 
> cksum=d7e4c6e6f:508f5121f9f:f66339b469f2:2025284ff2f12d
>
>
> # echo titi >> /unxtmpzfs3/mnt1/mnt4/te1 <-- update of 
> te1 file located in the zpool
>
> #  zdb -v unxtmpzfs3   <-- uberblock for the 
> unxtmpzfs3 pool after the file update
> Uberblock
>
> magic = 00bab10c
> version = 4
> txg = 87012
> guid_sum = 9860489793107228114
> timestamp = 1225183186 UTC = Tue Oct 28 09:39:46 2008
> rootbp = [L0 DMU objset] 400L/200P DVA[0]=<1:82a00:200> 
> DVA[1]=<0:7e400:200> DVA[2]=<1:18015c00:200> fletcher4 lzjb BE 
> contiguous birth=87012 fill=38 
> cksum=c3ac8e047:46e375e1c21:d272d39402da:1aaadb02468e54
>
>
>
> Conclusion is:
>
> * Because of one change to just one file, the MOS is a brand new
>   one. Then the question is:
>
>   Is the new MOS a whole copy of the previous one, or does it share 
> untouched data with the previous one and have its own copy of the changed 
> data (like an update to a regular file)?
> Indeed, I have checked the metadnode array entries and it sounds like 
> there are a few entries which are different.
A block containing changed MOS data will be new.  Other blocks of the 
MOS should be unchanged.  Of course,
any indirect (gang) blocks that need to be updated will also be new.
>
> * Is the uberblock a brand new one after the update (just 128k
>   possible uberblocks!!!)??
>
Only one is "active" at any one time.  As I recall, the 128 possible 
uberblocks are treated as
a circular array.
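
As a rough illustration of that selection (a sketch only; the slot size and
count are taken from the on-disk format paper, and the real code also
verifies each slot's checksum before trusting it):

#include <stdint.h>
#include <stddef.h>

#define UBERBLOCK_SHIFT 10                /* 1K per uberblock slot, per the spec */
#define UBERBLOCK_SIZE  (1 << UBERBLOCK_SHIFT)
#define UBERBLOCK_COUNT 128               /* 128K ring of uberblocks in each label */
#define UB_MAGIC        0x00bab10cULL     /* matches the magic in the zdb output above */

/* Minimal stand-in for uberblock_t; only the fields used here. */
typedef struct ub {
        uint64_t ub_magic;
        uint64_t ub_version;
        uint64_t ub_txg;
} ub_t;

/* Return the slot with the highest txg, i.e. the "active" uberblock. */
static const ub_t *
active_uberblock(const char *ring)        /* ring = the 128K uberblock array of a label */
{
        const ub_t *best = NULL;
        int i;

        for (i = 0; i < UBERBLOCK_COUNT; i++) {
                const ub_t *ub = (const ub_t *)(ring + (size_t)i * UBERBLOCK_SIZE);

                if (ub->ub_magic != UB_MAGIC)     /* skip empty or byte-swapped slots */
                        continue;
                if (best == NULL || ub->ub_txg > best->ub_txg)
                        best = ub;
        }
        return (best);
}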
max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow zpool import with b98

2008-09-22 Thread Detlef [EMAIL PROTECTED]
I have no snapshots in this zpool.

On 09/22/08 16:09, Sanjeev wrote:
> Detlef,
> 
> I presume you have about 9 filesystems. How many snapshots do you have ?
> 
> Thanks and regards,
> Sanjeev.
> 
> On Mon, Sep 22, 2008 at 03:59:34PM +0200, Detlef [EMAIL PROTECTED] wrote:
>> With Nevada Build 98 I see a slow zpool import of my pool, which 
>> holds my user and archive data on my laptop.
>>
>> I first noticed it during boot, when Solaris reports mounting zfs 
>> filesystems (1/9) and then works for 1-2 minutes before it goes 
>> ahead. I hear the disk working but have no clue what happens here.
>> So I tried a zpool export and import, and the import is also 
>> slow (it takes around 90 seconds, where with b97 it took 5 seconds). 
>> Does anyone have an idea what the reason could be?
>>
>> I had also created 2 ZVOLs under one filesystem. Now I have removed the 
>> parent filesystem (expecting that zfs would also remove both 
>> zvols). But now zpool export complains about these two unknown 
>> datasets: "dataset does not exist"
>>
>> Any comments or ideas on how to "really" remove the zvols, and on what 
>> the issue with the slow zpool import is?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Slow zpool import with b98

2008-09-22 Thread Detlef [EMAIL PROTECTED]
With Nevada Build 98 I see a slow zpool import of my pool, which 
holds my user and archive data on my laptop.

I first noticed it during boot, when Solaris reports mounting zfs 
filesystems (1/9) and then works for 1-2 minutes before it goes 
ahead. I hear the disk working but have no clue what happens here.
So I tried a zpool export and import, and the import is also 
slow (it takes around 90 seconds, where with b97 it took 5 seconds). 
Does anyone have an idea what the reason could be?

I had also created 2 ZVOLs under one filesystem. Now I have removed the 
parent filesystem (expecting that zfs would also remove both 
zvols). But now zpool export complains about these two unknown 
datasets: "dataset does not exist"

Any comments or ideas on how to "really" remove the zvols, and on what 
the issue with the slow zpool import is?

Detlef
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-19 Thread [EMAIL PROTECTED]
Hi Robert, et al.,
I have blogged about a method I used to recover a removed file from a 
zfs file system
at http://mbruning.blogspot.com.
Be forewarned, it is very long...
All comments are welcome.

max

Robert Milkowski wrote:
> Hello max,
>
> Sunday, August 17, 2008, 1:02:05 PM, you wrote:
>
> mbc> A Darren Dunham wrote:
>   
>>> If the most recent uberblock appears valid, but doesn't have useful
>>> data, I don't think there's any way currently to see what the tree of an
>>> older uberblock looks like.  It would be nice to see if that data
>>> appears valid and try to create a view that would be
>>> readable/recoverable.
>>>
>>>   
>>>   
> mbc> I have a method to examine uberblocks on disk.  Using this, along with
> mbc> my modified
> mbc> mdb and zdb, I have been able to recover a previously removed file.  
> mbc> I'll post
> mbc> details in a blog if there is interest.
>
> Of course, please do so.
>
>
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-17 Thread [EMAIL PROTECTED]
A Darren Dunham wrote:
>
> If the most recent uberblock appears valid, but doesn't have useful
> data, I don't think there's any way currently to see what the tree of an
> older uberblock looks like.  It would be nice to see if that data
> appears valid and try to create a view that would be
> readable/recoverable.
>
>   
I have a method to examine uberblocks on disk.  Using this, along with 
my modified
mdb and zdb, I have been able to recover a previously removed file.  
I'll post
details in a blog if there is interest.

max


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]

2008-08-12 Thread [EMAIL PROTECTED]
Darren J Moffat wrote:
> [EMAIL PROTECTED] wrote:
>   
>>> As others have noted, the COW nature of ZFS means that there is a
>>> good chance that on a mostly-empty pool, previous data is still intact
>>> long after you might think it is gone. A utility to recover such data is
>>> (IMHO) more likely to be in the category of forensic analysis than
>>> a mount (import) process. There is more than enough information
>>> publically available for someone to build such a tool (hint, hint :-)
>>>  -- richard
>>>   
>>   Veritas,  the makers of vxfs, whom I consider ZFS to be trying to
>> compete against has higher level (normal) support engineers that have
>> access to tools that let them scan the disk for inodes and other filesystem
>> fragments and recover.  When you log a support call on a faulty filesystem
>> (in one such case I was involved in zeroed out 100mb of the first portion
>> of the volume killing off both top OLT's -- bad bad) they can actually help
>> you at a very low level dig data out of the filesystem or even recover from
>> pretty nasty issues.  They can scan for inodes (marked by a magic number),
>> have utilities to pull out files from those inodes (including indirect
>> blocks/extents).  Given the tools and help from their support I was able to
>> pull back 500 gb of files (99%) from a filesystem that emc killed during a
>> botched powerpath upgrade.  Can Sun's support engineers,  or is their
>> answer pull from tape?  (hint, hint ;-)
>> 
>
> Sounds like a good topic for here:
>
> http://opensolaris.org/os/project/forensics/
>   
I took a look at this project, specifically 
http://opensolaris.org/os/project/forensics/ZFS-Forensics/.
Is there any reason that the paper and slides I presented at the 
OpenSolaris Developers Conference
on the zfs on-disk format are not mentioned?  The paper is at: 
http://www.osdevcon.org/2008/files/osdevcon2008-proceedings.pdf
starting on page 36, and the slides are at: 
http://www.osdevcon.org/2008/files/osdevcon2008-max.pdf

thanks,
max



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread [EMAIL PROTECTED]
Hi Simon,
Simon Breden wrote:
> Thanks Max, and the fact that rsync stresses the system less would help 
> explain why rsync works, and cp hangs. The directory was around 11GB in size.
>
> If Sun engineers are interested in this problem then I'm happy to run 
> whatever commands they give me -- after all, I have a pure goldmine here for 
> them to debug ;-) And it *is* running on a ZFS filesystem. Opportunities like 
> this don't come along every day :) Tempted? :)
>
> Well, if I can't tempt Sun, then for anyone who has the same disks, I would 
> be interested to see what happens on your machine:
> Model Number: WD7500AAKS-00RBA0
> Firmware revision: 4G30
>
> I use three of these disks in a RAIDZ1 vdev within the pool.
>  
>   
I think Rob Logan is probably correct, and there is a problem with the 
disks, not zfs.  Have you
tried this with a different file system (ufs), or multiple dd commands 
running at the same time with
the raw disks?

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread [EMAIL PROTECTED]
Hi Simon,
Simon Breden wrote:
> The plot thickens. I replaced 'cp' with 'rsync' and it worked -- I ran it a 
> few times and it didn't hang so far.
>
> So on the face of it, it appears that 'cp' is doing something that causes my 
> system to hang if the files are read from and written to the same pool, but 
> simply replacing 'cp' with 'rsync' works. Hmmm... anyone have a clue about 
> what I can do next to home in on the problem with 'cp' ?  
>
> Here is the output using 'rsync' :
>
> bash-3.2$ truss -topen rsync -a z1 z2
> open("/var/ld/ld.config", O_RDONLY)   Err#2 ENOENT
>   
The rsync command and the cp command work very differently.  cp mmaps up to 
8MB of the input file at a time and writes
from the address returned by mmap, faulting in the pages as it writes 
(unless you are a normal user on Indiana,
in which case cp is gnu's cp, which just reads/writes -- so, why are there 2 
versions?).  Rsync forks and sets up a socketpair between the parent and
child processes, then reads/writes.  It should be much slower than cp, 
and put much less stress on the disk.
It would be great to have a way to reproduce this.  I have not had any 
problems.  How large is the
directory you are copying?  Either the disk has not sent a response to 
an I/O operation, or the response was
somehow lost.  If I could reproduce the problem, I might try to dtrace 
the commands being sent to the HBA
and the responses coming back...  Hopefully someone here who has experience 
with the disks you are using
will be able to help.
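
For anyone curious, a minimal sketch of the mmap/write loop described above
(an illustration only, not the actual cp source; the 8MB window is the only
number taken from the description):

#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAXMAPSIZE (8 * 1024 * 1024)      /* map at most 8MB of the input at a time */

/* Copy filesize bytes from fd "in" to fd "out" the way cp is described above. */
static int
mmap_copy(int in, int out, off_t filesize)
{
        off_t off = 0;

        while (off < filesize) {
                size_t chunk = (size_t)((filesize - off < MAXMAPSIZE) ?
                    (filesize - off) : MAXMAPSIZE);
                void *addr = mmap(NULL, chunk, PROT_READ, MAP_SHARED, in, off);

                if (addr == MAP_FAILED)
                        return (-1);
                /*
                 * write() faults the mapped pages in as it goes, which is
                 * where the hung cp in this thread was sitting
                 * (zfs_fillpage/pagein under zfs_write).
                 */
                if (write(out, addr, chunk) != (ssize_t)chunk) {
                        (void) munmap(addr, chunk);
                        return (-1);
                }
                (void) munmap(addr, chunk);
                off += chunk;
        }
        return (0);
}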

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Endian relevance for decoding lzjb blocks

2008-05-03 Thread [EMAIL PROTECTED]
Hi Benjamin,

Benjamin Brumaire wrote:
> I 'm trying to decode a lzjb compressed blocks and I have some hard times 
> regarding big/little endian. I'm on x86 working with build 77.
>
> #zdb - ztest
> ...
> rootbp = [L0 DMU objset] 400L/200P DVA[0]=<0:e0c98e00:200>
> ...
>
> ## zdb -R ztest:c0d1s4:e0c98e00:200:
> Found vdev: /dev/dsk/c0d1s4
>
> ztest:c0d1s4:e0c98e00:200:
>   0 1 2 3 4 5 6 7   8 9 a b c d e f  0123456789abcdef
> 00:  0003020e0a00  dd0304050020b601  .. .
> 10:  c505048404040504  35b558231002047c  |...#X.5
>
>   
Using the modified zdb, you should be able to do:

# zdb -R ztest:c0d1s4:e0c98e00:200:d,lzjb,400 2>/tmp/foo

Then you can od /tmp/foo.  I am not sure what happens if you run zdb
against a zfs file system with a different endianness from the machine on which
you are running zdb.  It may just work...
The "d,lzjb,400" says to use lzjb decompression with a logical (after 
decompression) size
of 0x400 bytes.  It dumps the raw data to stderr, hence the "2>/tmp/foo".
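
If it helps with the decoding program, here is a minimal lzjb-style
decompressor written from the published description of the algorithm (treat
the constants and the loop as an approximation and compare against the
OpenSolaris lzjb.c before relying on it):

#include <stddef.h>

#define NBBY            8
#define MATCH_BITS      6
#define MATCH_MIN       3
#define OFFSET_MASK     ((1 << (16 - MATCH_BITS)) - 1)

/*
 * Decompress s_len bytes at src into d_len bytes at dst.
 * Returns 0 on success, -1 if a back-reference points before the buffer.
 */
static int
lzjb_decompress(const unsigned char *src, unsigned char *dst,
    size_t s_len, size_t d_len)
{
        const unsigned char *s_end = src + s_len;
        unsigned char *d_start = dst, *d_end = dst + d_len;
        unsigned char copymap = 0;
        int copymask = 1 << (NBBY - 1);

        while (dst < d_end && src < s_end) {
                if ((copymask <<= 1) == (1 << NBBY)) {
                        copymask = 1;
                        copymap = *src++;         /* one map byte per 8 items */
                }
                if (copymap & copymask) {
                        /* two-byte copy item: 6-bit length, 10-bit offset */
                        int mlen = (src[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
                        int offset = ((src[0] << NBBY) | src[1]) & OFFSET_MASK;
                        unsigned char *cpy = dst - offset;

                        src += 2;
                        if (cpy < d_start)
                                return (-1);
                        while (--mlen >= 0 && dst < d_end)
                                *dst++ = *cpy++;
                } else {
                        *dst++ = *src++;          /* literal byte */
                }
        }
        return (0);
}

In this sketch the bytes are consumed exactly as they sit on disk (the copy
items are built from individual bytes), so there is no 8-byte swapping to
undo first.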

max

> Looking at this blocks with dd:
> dd if=/dev/dsk/c0d1s4 iseek=7374023 bs=512 count=1 | od -x
> 000: 0a00 020e 0003  b601 0020 0405 dd03
>
> od -x is responsible for swapping every two bytes. I have on disk
> 000: 000a 0e02 0300  01b6 0200 0504 03dd
>
> Comparing with the zdb output is every 8 bytes reversed.
>
> Now I don't know how to pass this to my lzjb decoding programm?
>
> Should I read the 512 bytes and pass them:
>- from the end
>- from the start and reverse every 8 bytes
>- or something else
>
> thanks for any advice
>
> bbr
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread [EMAIL PROTECTED]
Simon Breden wrote:
> set sata:sata_max_queue_depth = 0x1
>
> =
>
> Anyway, after adding the line above into /etc/system, I rebooted and then 
> re-tried the copy with truss:
>
> truss cp -r testdir z4
>
> It seems to hang on random files -- so it's not always the same file that it 
> hangs on.
>
> On this particular run here are the last few lines of truss output, although 
> they're probably not useful:
>   
Hi Simon,
Try with:

truss -topen cp -r testdir z4

This will only show you the files being opened.  The last file opened in 
testdir is the one it is hanging on.
(Unless it is hanging in getdents(2), but I don't think so based on the 
kernel stacktrace).
But, if it is hanging on random files, this is not going to help either.
How long do you wait before deciding it's hung?  I think usually you 
should get
console output saying I/O has been retried if the device does not 
respond to a
previously sent I/O.

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,
Simon Breden wrote:
>
> Thanks for your advice Max, and here is my reply to your suggestion:
>
>
> # mdb -k
> Loading modules: [ unix genunix specfs dtrace cpu.generic 
> cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba 
> s1394 nca lofs zfs random md sppp smbsrv nfs ptm ipc crypto ]
>   
>> ::pgrep cp
>> 
> SPID   PPID   PGIDSIDUID  FLAGS ADDR NAME
> R889868889868501 0x4a004000 ff01deca9048 cp
>   
>> ff01deca9048::walk thread | ::threadlist -v
>> 
> ADDR PROC  LWP CLS PRIWCHAN
> ff01e0045840 ff01deca9048 ff01de9d9210   2  60 ff01d861ca80
>   PC: _resume_from_idle+0xf1CMD: cp -pr testdir z1
>   stack pointer for thread ff01e0045840: ff0007fcdf00
>   [ ff0007fcdf00 _resume_from_idle+0xf1() ]
> swtch+0x17f()
> cv_wait+0x61()
> zio_wait+0x5f()
> dbuf_read+0x1b5()
> dbuf_findbp+0xe8()
> dbuf_prefetch+0x9b()
> dmu_zfetch_fetch+0x43()
> dmu_zfetch_dofetch+0xc2()
> dmu_zfetch_find+0x3a1()
> dmu_zfetch+0xa5()
> dbuf_read+0xe3()
> dmu_buf_hold_array_by_dnode+0x1c4()
> dmu_read+0xd4()
> zfs_fillpage+0x15e()
> zfs_getpage+0x187()
> fop_getpage+0x9f()
> segvn_fault+0x9ef()
> as_fault+0x5ae()
> pagefault+0x95()
> trap+0x1286() 
> 0xfb8001d9()  
> fuword8+0x21()
> zfs_write+0x147() 
> fop_write+0x69()  
> write+0x2af() 
> write32+0x1e()
> sys_syscall32+0x101() 
>   
>   
So, a write has been issued, zfs is retrieving a page and is waiting for 
the pagein  to complete.  I'll take a further look tomorrow,
but maybe someone else reading this has an idea.  (It is midnight here).

max

>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,

Simon Breden wrote:
> Hi Max,
>
> I re-ran the cp command and when it hung I ran 'ps -el', looked up the cp 
> command, got its PID and then ran:
>
> # truss -p PID_of_cp
>
> and it output nothing at all -- i.e. it hanged too -- just showing a flashing 
> cursor.
>
> The system is still operational as I am typing into the browser.
>
> Before I ran the cp command I did a 'tail -f /var/adm/messages' and there is 
> no output. I also did a 'tail -f /var/log/syslog' and there is also no output.
>
> If I try 'kill -15 PID_of_cp' and then 'ps -el' cp is still running.
> And if I try 'kill -9 PID_of_cp' and then 'ps -el' cp is still running.
>
> What next ?
>   
You can try the following:

# mdb -k
::pgrep cp   <-- this should give you a line with the cp you are 
running.  Next to "cp" is an address, use this address in the next line:

address_from_pgrep::walk thread | ::threadlist -v

This will give you a stack trace.  Please post it.

$q  <-- this gets you out of mdb

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,


Simon Breden wrote:
> Hi Max,
>
> I haven't used truss before, but give me the command line + switches 
> and I'll be happy to run it.
>
> Simon
# truss -p pid_from_cp

where pid_from_cp is... the pid of the cp process that is "hung".  You can 
get the pid from ps.

I am curious if the cp is stuck on a specific file, or is just very 
slow, or is hung in the kernel.
Also, can you kill the cp when it hangs?

thanks,
max
>
> 2008/5/1 [EMAIL PROTECTED]:
>
> Hi Simon,
>
> Simon Breden wrote:
>
> Thanks a lot Richard. To give a bit more info, I've copied my
> /var/adm/messages from booting up the machine:
>
> And @picker: I guess the 35 requests are stacked up waiting
> for the hanging request to be serviced?
>
> The question I have is where do I go from now, to get some
> more info on what is causing cp to have problems.
>
> I will now try another tack: use rsync to copy the directory
> to a disk outside the pool (i.e. my home directory on the boot
> drive), to see if it is happy doing that.
>  
>
> What does truss show the cp doing? max
>
>
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,

Simon Breden wrote:
> Thanks a lot Richard. To give a bit more info, I've copied my 
> /var/adm/messages from booting up the machine:
>
> And @picker: I guess the 35 requests are stacked up waiting for the hanging 
> request to be serviced?
>
> The question I have is where do I go from now, to get some more info on what 
> is causing cp to have problems.
>
> I will now try another tack: use rsync to copy the directory to a disk 
> outside the pool (i.e. my home directory on the boot drive), to see if it is 
> happy doing that.
>   
What does truss show the cp doing? 
max


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] questions about block sizes

2008-04-21 Thread [EMAIL PROTECTED]
Hi Mario,
Mario Goebbels wrote:
>> ZFS can use block sizes up to 128k.  If the data is compressed, then 
>> this size will be larger when decompressed.
>> 
>
> ZFS allows you to use variable blocksizes (sized a power of 2 from 512
> to 128k), and as far as I know, a compressed block is put into the
> smallest fitting one.
>   
Yes.  Of course.  But my question is:  can I have in memory a 
decompressed array of blkptr_t used
for indirection that is larger than 128k, so that when it is compressed 
and written to disk, it is 128k
in size?
> -mg
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] questions about block sizes

2008-04-20 Thread [EMAIL PROTECTED]
Hi,
ZFS can use block sizes up to 128k.  If the data is compressed, the 
decompressed size will be larger than what is stored on disk.
So, can the decompressed data be larger than 128k?  If so, does this 
also hold for metadata?  In other words,
can I have a 128k block on the disk holding, for instance, indirect blocks 
(compressed blkptr_t data) that result in
more than 1024 blkptr_t when decompressed?  If I had a very large 
amount of free space, I could try this
and see, but since I don't, I thought I'd ask here.
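
(The 1024 figure is just 128k divided by the size of a block pointer; a tiny
sketch of the arithmetic, with the sizes taken from the on-disk format paper
rather than from the headers:)

#include <stdio.h>

#define SPA_MAXBLOCKSIZE (128 * 1024)     /* largest on-disk block, 128k */
#define BLKPTR_SIZE      128              /* sizeof (blkptr_t) per the on-disk spec */

int
main(void)
{
        /* a full, uncompressed 128k indirect block holds 128k / 128 = 1024 blkptrs */
        printf("blkptrs per 128k indirect block: %d\n",
            SPA_MAXBLOCKSIZE / BLKPTR_SIZE);
        return (0);
}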

thanks,
max


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] reviewers needed for paper on zfs on-disk structure walk

2008-03-14 Thread [EMAIL PROTECTED]
Hi,
I am (hoping) to present a paper at osdevcon in Prague in June.  I have 
a draft of the paper and
am looking for a couple of people to review it.  I am interested to know 
the following:
1. Is it understandable?
2. Is it technically correct?
3. Any comments/suggestions to make it better?

The paper starts at the active uberblock on the disk, and walks the data 
structures on disk
to find the data for a given file.  It uses a modified mdb and modified 
zdb, along with an mdb
dmod.  It is not specifically aimed at system administrators, but rather 
tries to give people
better insight as to where and how data is located in a ZFS file system.

So, anyone interested in reviewing this?  If so, let me know via email 
and I'll send
you a copy.  Also, if you want the modified mdb/zdb and the dmod, let me 
know that
as well.

thanks much,
max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repairing corrupted files?

2008-03-12 Thread [EMAIL PROTECTED]
Hi Richard,
Richard Elling wrote:
> Occasionally the topic arises about what to do when a file is
> corrupted.  ZFS will tell you about it, but what then?  Usually
> the conversation then degenerates into how some people can
> tolerate broken mp3 files or whatever.
>
> Well, the other day I found a corrupted file which gave me an
> opportunity to test a little hypothesis on how to recover what
> you can recover.  Details are in my blog:
> http://blogs.sun.com/relling/entry/holy_smokes_a_holey_file
>
> There is an opportunity here, for someone with some spare time,
> to come up with a more clever solution than my dd script.  hint...
> hint...
>  -- richard
>
>   
Would it help if you had the block number (i.e., disk location) of the 
block that is corrupted?
zdb might tell you this.  I have a way to do it, I think, but don't want 
to test it because I don't
want to corrupt a file on purpose.  I am writing a paper (actually, the 
first draft is done) that shows how
to find the data for a given file on the raw disk (i.e., with the file system 
not mounted).  I plan on presenting
this at osdevcon in Prague in June.  I am looking for reviewers.  If you 
are interested, please send
me email and I'll send you a copy.
The method I use is quite a bit more complex than using dd.  It involves 
using a modified
zdb and modified mdb together.  I think it would work for this type of 
problem.  (Then again, if zfs
completely wipes out the corrupted block, it won't help).  If I have 
time, I'll try corrupting a few bits
in a file and see if my method works to get the corrupted block.

thanks,
max

> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs create' hanging

2008-03-07 Thread [EMAIL PROTECTED]
Mark J Musante wrote:
> On Fri, 7 Mar 2008, Paul Raines wrote:
>
>   
>> zfs create -o quota=131G -o reserv=131G -o recsize=8K zpool1/itgroup_001
>>
>> and this is still running now.  truss on the process shows nothing.  I
>> don't know how to debug it beyond that.  I thought I would ask for any
>> info from this list before I just reboot.
>> 
>
> What does pstack show?
>
>
>   
If truss shows nothing, it's either looping at user level, or hung in 
the kernel.
Try
echo ::threadlist -v | mdb -k

and see what the stack trace looks like for the zfs process in the kernel.

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] path-name encodings

2008-02-27 Thread [EMAIL PROTECTED]
Hi Marcus,

Marcus Sundman wrote:
> Are path-names text or raw data in zfs? I.e., is it possible to know
> what the name of a file/dir/whatever is, or do I have to make more or
> less wild guesses what encoding is used where?
>
> - Marcus
>   
I'm not sure what you are asking here.  When a zfs file system is 
mounted, it looks like a normal
unix file system, i.e., a tree of files where intermediate nodes are 
directories and leaf nodes may be
directories or regular files.  In other words, ls gives you the same 
kind of output you would expect on
any unix file system.  As to whether a file/directory name is text or 
binary, that depends
on the name used when creating the file/directory.  As for the 
meta-data used to maintain the file system tree, most of it is
compressed.  But your question makes me wonder if you have tried zfs.  
If so, then I really am not sure
what you are asking.  If not, maybe you should try it out...

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] modification to zdb to decompress blocks

2008-02-26 Thread [EMAIL PROTECTED]
Hi All,
I have modified zdb to do decompression in zdb_read_block.  Syntax is:

# zdb -R poolname:devid:blkno:psize:d,compression_type,lsize

Where compression_type can be lzjb or any other type of compression that 
zdb uses, and
lsize is the logical size (the size after decompression).  I have used this 
with a modified 
mdb to allow one to
do the following:

given a pathname for a file on a zfs file system, display the blocks 
(i.e., data) of the file.  The file
system need not be mounted.

If anyone is interested, send me email.  I can send a webrev of the zdb 
changes for those interested.
As for the mdb changes, I sent a webrev of those a while ago, and have 
since added a rawzfs dmod.

I plan to present a paper at osdevcon in Prague in June that uses the 
modified zdb and mdb to
show the physical layout of a zfs file system.  (I should mention that, 
over time, I have found that
the ZFS on-disk format paper actually does tell you almost everything).

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Did MDB Functionality Change?

2008-02-08 Thread [EMAIL PROTECTED]
Hi Spencer,

spencer wrote:
> On Solaris 10 u3 (11/06) I can execute the following:
>
> bash-3.00# mdb -k
> Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcipsy ip sctp 
> usba nca md zfs random ipc nfs crypto cpc fctl fcip logindmux ptm sppp ]
>   
>> arc::print
>> 
> {
> anon = ARC_anon
> mru = ARC_mru
> mru_ghost = ARC_mru_ghost
> mfu = ARC_mfu
> mfu_ghost = ARC_mfu_ghost
> size = 0x6b800
> p = 0x3f83f80
> c = 0x7f07f00
> c_min = 0x7f07f00
> c_max = 0xbe8be800
> hits = 0x30291
> misses = 0x4f
> deleted = 0xe
> skipped = 0
> hash_elements = 0x3a
> hash_elements_max = 0x3a
> hash_collisions = 0x3
> hash_chains = 0x1
> hash_chain_max = 0x1
> no_grow = 0
> }
>
> However, when I execute the same command on Solaris 10 u4 (8/07) I receive 
> the following error:
>
> bash-3.00# mdb -k
> Loading modules: [ unix krtld genunix specfs dtrace ufs ssd fcp fctl qlc 
> pcisch md ip hook neti sctp arp usba nca lofs logindmux ptm cpc fcip sppp 
> random sd crypto zfs ipc nfs ]
>   
>> arc::print
>> 
> mdb: failed to dereference symbol: unknown symbol name
>   
mdb functionality did not change.  There is no longer a global variable 
named "arc".  The "::arc" command
gets its data from a few different variables.  You can look at 
usr/src/cmd/mdb/common/modules/zfs/zfs.c
and look at the arc_print function to see what it does.  Or, you can use 
::nm !grep arc | grep OBJT
to see what arc related variables exist and use one of those with ::print.
For instance, arc_stats::print.
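
As an alternative to poking at those variables with mdb, the same counters
are exported as kstats; a small libkstat sketch (assuming the usual
zfs:0:arcstats kstat and compiling with -lkstat):

#include <stdio.h>
#include <kstat.h>

int
main(void)
{
        kstat_ctl_t *kc = kstat_open();
        kstat_t *ksp;
        kstat_named_t *kn;

        if (kc == NULL)
                return (1);
        /* module "zfs", instance 0, name "arcstats" holds the ARC counters */
        if ((ksp = kstat_lookup(kc, "zfs", 0, "arcstats")) == NULL ||
            kstat_read(kc, ksp, NULL) == -1) {
                (void) kstat_close(kc);
                return (1);
        }
        if ((kn = kstat_data_lookup(ksp, "size")) != NULL)
                printf("arc size = %llu\n", (unsigned long long)kn->value.ui64);
        if ((kn = kstat_data_lookup(ksp, "c")) != NULL)
                printf("arc c    = %llu\n", (unsigned long long)kn->value.ui64);
        (void) kstat_close(kc);
        return (0);
}

(kstat -m zfs -n arcstats from the command line shows the same counters.)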

max
> In addition, u3 doesn't recognize "::arc" where u4 does.
> u3 displays memory locations with "arc::print -a" where "::arc -a" doesn't 
> work for u4.
>
> I posted this into the zfs discussion forum, because this limited u4 
> functionality prevents you from dynamically changing the ARC in ZFS by trying 
> the ZFS Tuning instructions.
>
>
> Spencer
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best stripe-size in array for ZFS mail storage?

2007-12-01 Thread [EMAIL PROTECTED]
Hi Bill,
can you guess? wrote:
>> We will be using Cyrus to store mail on 2540 arrays.
>>
>> We have chosen to build 5-disk RAID-5 LUNs in 2
>> arrays which are both connected to same host, and
>> mirror and stripe the LUNs.  So a ZFS RAID-10 set
>> composed of 4 LUNs.  Multi-pathing also in use for
>> redundancy.
>> 
>
> Sounds good so far:  lots of small files in a largish system with presumably 
> significant access parallelism makes RAID-Z a non-starter,
Why does "lots of small files in a largish system with presumably 
significant access parallelism makes RAID-Z a non-starter"?
thanks,
max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] question about uberblock blkptr

2007-09-20 Thread [EMAIL PROTECTED]
Hi Roch,
Roch - PAE wrote:
> [EMAIL PROTECTED] writes:
>  > Roch - PAE wrote:
>  > > [EMAIL PROTECTED] writes:
>  > >  > Jim Mauro wrote:
>  > >  > >
>  > >  > > Hey Max - Check out the on-disk specification document at
>  > >  > > http://opensolaris.org/os/community/zfs/docs/.
>  

> > >  > Ok.  I think I know what's wrong.  I think the information (most 
> > > likely, 
>  > >  > a objset_phys_t) is compressed
>  > >  > with lzjb compression.  Is there a way to turn this entirely off (not 
>  > >  > just for file data, but for all meta data
>  > >  > as well when a pool is created?  Or do I need to figure out how to 
> hack 
>  > >  > in the lzjb_decompress() function in
>  > >  > my modified mdb?  (Also, I figured out that zdb is already doing the 
>  > >  > left shift by 9 before dumping DVA values,
>  > >  > for anyone following this...).
>  > >  > 
>  > >
>  > > Max, this might help (zfs_mdcomp_disable) :
>  > > 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#METACOMP
>  > >   
>  > Hi Roch,
>  > That would help, except it does not seem to work.  I set 
>  > zfs_mdcomp_disable to 1 with mdb,
>  > deleted the pool, recreated the pool, and zdb - still shows the 
>  > rootbp in the uberblock_t
>  > to have the lzjb flag turned on.  So I then added the variable to 
>  > /etc/system, destroyed the pool,
>  > rebooted, recreated the pool, and still the same result.  Also, my mdb 
>  > shows the same thing
>  > for the uberblock_t rootbp blkptr data.   I am running Nevada build 55b.
>  > 
>  > I shall update the build I am running soon, but in the meantime I'll 
>  > probably write a modified cmd_print() function for my
>  > (modified)  mdb to handle (at least) lzjb compressed metadata.  Also, I 
>  > think the ZFS Evil Tuning Guide should be
>  > modified.  It says this can be tuned for Solaris 10 11/06 and snv_52.  I 
>  > guess that means only those
>  > two releases.  snv_55b has the variable, but it doesn't have an effect 
>  > (at least on the uberblock_t
>  > rootbp meta-data).
>  > 
>  > thanks for your help.
>  > 
>  > max
>  > 
>
> My bad. The tunable only affects indirect dbufs (so I guess
> only for large files). As you noted, other metadata is
> compressed unconditionally (I guess from the use of
> ZIO_COMPRESS_LZJB in dmu_objset_open_impl).
>
> -r
>
>
>   
This makes printing the data with ::print much more problematic...
The code in mdb that prints data structures recursively iterates through the
structure members, reading each member separately.  I can either write a new
print function that does the decompression, or add a new dcmd that does the
decompression and dumps the data to the screen, but then I lose the
structure member names in the output.  I guess I'll do the decompression 
dcmd
first, and then figure out how to get the member names back in the output...
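
For anyone curious what such a dcmd looks like, here is a bare skeleton of an
mdb dmod (the dcmd name and the decompression step are placeholders; the
registration boilerplate follows the Modular Debugger Guide):

#include <sys/mdb_modapi.h>

/*
 * ::rawlzjb addr -- read a compressed block at addr and dump it decompressed.
 * The decompression itself is left as a stub in this sketch.
 */
static int
rawlzjb_dcmd(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
{
        char buf[512];

        if (!(flags & DCMD_ADDRSPEC) || argc != 0)
                return (DCMD_USAGE);
        if (mdb_vread(buf, sizeof (buf), addr) == -1) {
                mdb_warn("failed to read %p", addr);
                return (DCMD_ERR);
        }
        /* ... run lzjb decompression over buf here, then pretty-print it ... */
        mdb_printf("read %d bytes at %p\n", (int)sizeof (buf), addr);
        return (DCMD_OK);
}

static const mdb_dcmd_t dcmds[] = {
        { "rawlzjb", ":", "dump an lzjb-compressed block", rawlzjb_dcmd },
        { NULL }
};

static const mdb_modinfo_t modinfo = { MDB_API_VERSION, dcmds, NULL };

const mdb_modinfo_t *
_mdb_init(void)
{
        return (&modinfo);
}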

thanks,
max


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] question about uberblock blkptr

2007-09-19 Thread [EMAIL PROTECTED]
Roch - PAE wrote:
> [EMAIL PROTECTED] writes:
>  > Jim Mauro wrote:
>  > >
>  > > Hey Max - Check out the on-disk specification document at
>  > > http://opensolaris.org/os/community/zfs/docs/.
>  > >
>  > > Page 32 illustration shows the rootbp pointing to a dnode_phys_t
>  > > object (the first member of a objset_phys_t data structure).
>  > >
>  > > The source code indicates ub_rootbp is a blkptr_t, which contains
>  > > a 3 member array of dva_t 's called blk_dva (blk_dva[3]).
>  > > Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]).
>  > >
>  > > So it looks like each blk_dva contains 3 128-bit DVA's
>  > >
>  > > You probably figured all this out alreadydid you try using
>  > > a objset_phys_t to format the data?
>  > >
>  > > Thanks,
>  > > /jim
>  > Ok.  I think I know what's wrong.  I think the information (most likely, 
>  > a objset_phys_t) is compressed
>  > with lzjb compression.  Is there a way to turn this entirely off (not 
>  > just for file data, but for all meta data
>  > as well when a pool is created?  Or do I need to figure out how to hack 
>  > in the lzjb_decompress() function in
>  > my modified mdb?  (Also, I figured out that zdb is already doing the 
>  > left shift by 9 before dumping DVA values,
>  > for anyone following this...).
>  > 
>
> Max, this might help (zfs_mdcomp_disable) :
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#METACOMP
>   
Hi Roch,
That would help, except it does not seem to work.  I set 
zfs_mdcomp_disable to 1 with mdb,
deleted the pool, recreated the pool, and zdb - still shows the 
rootbp in the uberblock_t
to have the lzjb flag turned on.  So I then added the variable to 
/etc/system, destroyed the pool,
rebooted, recreated the pool, and still the same result.  Also, my mdb 
shows the same thing
for the uberblock_t rootbp blkptr data.   I am running Nevada build 55b.

I shall update the build I am running soon, but in the meantime I'll 
probably write a modified cmd_print() function for my
(modified)  mdb to handle (at least) lzjb compressed metadata.  Also, I 
think the ZFS Evil Tuning Guide should be
modified.  It says this can be tuned for Solaris 10 11/06 and snv_52.  I 
guess that means only those
two releases.  snv_55b has the variable, but it doesn't have an effect 
(at least on the uberblock_t
rootbp meta-data).

thanks for your help.

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] question about uberblock blkptr

2007-09-18 Thread [EMAIL PROTECTED]
Jim Mauro wrote:
>
> Hey Max - Check out the on-disk specification document at
> http://opensolaris.org/os/community/zfs/docs/.
>
> Page 32 illustration shows the rootbp pointing to a dnode_phys_t
> object (the first member of a objset_phys_t data structure).
>
> The source code indicates ub_rootbp is a blkptr_t, which contains
> a 3 member array of dva_t 's called blk_dva (blk_dva[3]).
> Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]).
>
> So it looks like each blk_dva contains 3 128-bit DVA's
>
> You probably figured all this out alreadydid you try using
> a objset_phys_t to format the data?
>
> Thanks,
> /jim
Ok.  I think I know what's wrong.  I think the information (most likely, 
an objset_phys_t) is compressed
with lzjb compression.  Is there a way to turn this entirely off (not 
just for file data, but for all meta data
as well) when a pool is created?  Or do I need to figure out how to hack 
the lzjb_decompress() function into
my modified mdb?  (Also, I figured out that zdb is already doing the 
left shift by 9 before dumping DVA values,
for anyone following this...).
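
For anyone following along, the DVA-to-disk-offset arithmetic from the
on-disk format paper, as a small sketch (the 4MB front-label/boot reserve is
taken from the public spec; verify against your own zdb output before
trusting it):

#include <stdio.h>
#include <stdint.h>

#define SPA_MINBLOCKSHIFT 9               /* raw DVA offsets count 512-byte sectors */
#define LABEL_RESERVE     0x400000ULL     /* two front labels + boot block = 4MB */

/* raw on-disk DVA offset -> byte offset on the vdev */
static uint64_t
dva_to_vdev_offset(uint64_t raw_dva_offset)
{
        return ((raw_dva_offset << SPA_MINBLOCKSHIFT) + LABEL_RESERVE);
}

int
main(void)
{
        uint64_t zdb_printed = 0x111f79000ULL;   /* DVA[0] offset as zdb prints it, in bytes */

        /* zdb has already done the <<9 (as noted above), so undo it to get the raw field */
        printf("vdev byte offset = 0x%llx\n",
            (unsigned long long)dva_to_vdev_offset(zdb_printed >> SPA_MINBLOCKSHIFT));
        return (0);
}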

thanks,
max

>
>
>
> [EMAIL PROTECTED] wrote:
>> Hi All,
>> I have modified mdb so that I can examine data structures on disk 
>> using ::print.
>> This works fine for disks containing ufs file systems.  It also works 
>> for zfs file systems, but...
>> I use the dva block number from the uberblock_t to print what is at 
>> the block
>> on disk.  The problem I am having is that I can not figure out what 
>> (if any) structure to use.
>> All of the xxx_phys_t types that I try do not look right.  So, the 
>> question is, just what is
>> the structure that the uberblock_t dva's refer to on the disk?
>>
>> Here is an example:
>>
>> First, I use zdb to get the dva for the rootbp (should match the 
>> value in the uberblock_t(?)).
>>
>> # zdb - usbhard | grep -i dva
>> Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 
>> DMU objset] 400L/200P DVA[0]=<0:111f79000:200> 
>> DVA[1]=<0:506bde00:200> DVA[2]=<0:36a286e00:200> fletcher4 lzjb LE 
>> contiguous birth=621838 fill=167 
>> cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb
>> bp = [L0 DMU objset] 400L/200P 
>> DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200> 
>> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838 
>> fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
>> Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp 
>> [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> 
>> DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE 
>> contiguous birth=621838 fill=34026 
>> cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
>> first block: [L0 ZIL intent log] 9000L/9000P 
>> DVA[0]=<0:36aef6000:9000> zilog uncompressed LE contiguous 
>> birth=263950 fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1
>> ^C
>> #
>>
>> Then I run my modified mdb on the vdev containing the "usbhard" pool
>> # ./mdb /dev/rdsk/c4t0d0s0
>>
>> I am using the DVA[0} for the META data set above.  Note that I have 
>> tried all of the xxx_phys_t structures
>> that I can find in zfs source, but none of them look right.  Here is 
>> example output dumping the data as a objset_phys_t.
>> (The shift by 9 and adding 40 is from the zfs on-disk format 
>> paper, I have tried without the addition, without the shift,
>> in all combinations, but the output still does not make sense).
>>
>>  > (111f79000<<9)+40::print zfs`objset_phys_t
>> {
>> os_meta_dnode = {
>> dn_type = 0x4f
>> dn_indblkshift = 0x75
>> dn_nlevels = 0x82
>> dn_nblkptr = 0x25
>> dn_bonustype = 0x47
>> dn_checksum = 0x52
>> dn_compress = 0x1f
>> dn_flags = 0x82
>> dn_datablkszsec = 0x5e13
>> dn_bonuslen = 0x63c1
>> dn_pad2 = [ 0x2e, 0xb9, 0xaa, 0x22 ]
>> dn_maxblkid = 0x20a34fa97f3ff2a6
>> dn_used = 0xac2ea261cef045ff
>> dn_pad3 = [ 0x9c2b4541ab9f78c0, 0xdb27e70dce903053, 
>> 0x315efac9cb693387, 0x2d56c54db5da75bf ]
>> dn_blkptr = [
>> {
>> blk_dva = [
>> {
>> dva_word = [ 0x87c9ed7672454887, 
>> 0x760f569622246efe ]
>> }
>> {
>> dv

Re: [zfs-discuss] question about uberblock blkptr

2007-09-17 Thread [EMAIL PROTECTED]
Jim Mauro wrote:
>
> Hey Max - Check out the on-disk specification document at
> http://opensolaris.org/os/community/zfs/docs/.
>
> Page 32 illustration shows the rootbp pointing to a dnode_phys_t
> object (the first member of a objset_phys_t data structure).
>
> The source code indicates ub_rootbp is a blkptr_t, which contains
> a 3 member array of dva_t 's called blk_dva (blk_dva[3]).
> Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]).
>
> So it looks like each blk_dva contains 3 128-bit DVA's
>
> You probably figured all this out alreadydid you try using
> a objset_phys_t to format the data?
>
> Thanks,
> /jim

Hi Jim,
Yes, I have tried an objset_phys_t.  This is what I am using below in 
the example.  Either there's some
extra stuff that the on-disk format specification is not saying, or I'm 
not picking up the correct blkptr
(though I have tried other blkptr's from the uberblock array following 
the nvpair/label section at the
beginning of the disk), or the uberblock_t blkptr is pointing to 
something completely different.  I am
going to have another look at the zdb code, as I suspect that it must 
also do something like what I am
trying to do.  Also, I think someone on this list should know what the 
uberblock_t blkptr refers to
if it is not an objset_t.  I don't have compression or any encryption 
turned on, but I am also wondering
if the metadata is somehow compressed or encrypted.
Thanks for the response.  I was beginning to think the only people that 
read this mailing list are admins...
(Sorry guys, getting zfs configured properly is much more important than 
what I'm doing here, but
this is more interesting to me).

max

>
>
>
> [EMAIL PROTECTED] wrote:
>> Hi All,
>> I have modified mdb so that I can examine data structures on disk 
>> using ::print.
>> This works fine for disks containing ufs file systems.  It also works 
>> for zfs file systems, but...
>> I use the dva block number from the uberblock_t to print what is at 
>> the block
>> on disk.  The problem I am having is that I can not figure out what 
>> (if any) structure to use.
>> All of the xxx_phys_t types that I try do not look right.  So, the 
>> question is, just what is
>> the structure that the uberblock_t dva's refer to on the disk?
>>
>> Here is an example:
>>
>> First, I use zdb to get the dva for the rootbp (should match the 
>> value in the uberblock_t(?)).
>>
>> # zdb - usbhard | grep -i dva
>> Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 
>> DMU objset] 400L/200P DVA[0]=<0:111f79000:200> 
>> DVA[1]=<0:506bde00:200> DVA[2]=<0:36a286e00:200> fletcher4 lzjb LE 
>> contiguous birth=621838 fill=167 
>> cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb
>> bp = [L0 DMU objset] 400L/200P 
>> DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200> 
>> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838 
>> fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
>> Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp 
>> [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> 
>> DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE 
>> contiguous birth=621838 fill=34026 
>> cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
>> first block: [L0 ZIL intent log] 9000L/9000P 
>> DVA[0]=<0:36aef6000:9000> zilog uncompressed LE contiguous 
>> birth=263950 fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1
>> ^C
>> #
>>
>> Then I run my modified mdb on the vdev containing the "usbhard" pool
>> # ./mdb /dev/rdsk/c4t0d0s0
>>
>> I am using the DVA[0} for the META data set above.  Note that I have 
>> tried all of the xxx_phys_t structures
>> that I can find in zfs source, but none of them look right.  Here is 
>> example output dumping the data as a objset_phys_t.
>> (The shift by 9 and adding 40 is from the zfs on-disk format 
>> paper, I have tried without the addition, without the shift,
>> in all combinations, but the output still does not make sense).
>>
>>  > (111f79000<<9)+40::print zfs`objset_phys_t
>> {
>> os_meta_dnode = {
>> dn_type = 0x4f
>> dn_indblkshift = 0x75
>> dn_nlevels = 0x82
>> dn_nblkptr = 0x25
>> dn_bonustype = 0x47
>> dn_checksum = 0x52
>> dn_compress = 0x1f
>> dn_flags = 0x82
>> dn_datablkszsec = 0x5e13
>> dn_bonuslen =

[zfs-discuss] question about uberblock blkptr

2007-09-17 Thread [EMAIL PROTECTED]
Hi All,
I have modified mdb so that I can examine data structures on disk using 
::print.
This works fine for disks containing ufs file systems.  It also works 
for zfs file systems, but...
I use the dva block number from the uberblock_t to print what is at the 
block
on disk.  The problem I am having is that I can not figure out what (if 
any) structure to use.
All of the xxx_phys_t types that I try do not look right.  So, the 
question is, just what is
the structure that the uberblock_t dva's refer to on the disk?

Here is an example:

First, I use zdb to get the dva for the rootbp (should match the value 
in the uberblock_t(?)).

# zdb - usbhard | grep -i dva
Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 DMU 
objset] 400L/200P DVA[0]=<0:111f79000:200> DVA[1]=<0:506bde00:200> 
DVA[2]=<0:36a286e00:200> fletcher4 lzjb LE contiguous birth=621838 
fill=167 cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb
bp = [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> 
DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE 
contiguous birth=621838 fill=34026 
cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp [L0 
DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200> 
DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838 
fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
first block: [L0 ZIL intent log] 9000L/9000P 
DVA[0]=<0:36aef6000:9000> zilog uncompressed LE contiguous birth=263950 
fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1
^C
#

Then I run my modified mdb on the vdev containing the "usbhard" pool
# ./mdb /dev/rdsk/c4t0d0s0

I am using the DVA[0] for the META data set above.  Note that I have 
tried all of the xxx_phys_t structures
that I can find in the zfs source, but none of them look right.  Here is 
example output dumping the data as an objset_phys_t.
(The shift by 9 and adding 40 is from the zfs on-disk format paper, 
I have tried without the addition, without the shift,
in all combinations, but the output still does not make sense).

 > (111f79000<<9)+40::print zfs`objset_phys_t
{
os_meta_dnode = {
dn_type = 0x4f
dn_indblkshift = 0x75
dn_nlevels = 0x82
dn_nblkptr = 0x25
dn_bonustype = 0x47
dn_checksum = 0x52
dn_compress = 0x1f
dn_flags = 0x82
dn_datablkszsec = 0x5e13
dn_bonuslen = 0x63c1
dn_pad2 = [ 0x2e, 0xb9, 0xaa, 0x22 ]
dn_maxblkid = 0x20a34fa97f3ff2a6
dn_used = 0xac2ea261cef045ff
dn_pad3 = [ 0x9c2b4541ab9f78c0, 0xdb27e70dce903053, 
0x315efac9cb693387, 0x2d56c54db5da75bf ]
dn_blkptr = [
{
blk_dva = [
{
dva_word = [ 0x87c9ed7672454887, 
0x760f569622246efe ]
}
{
dva_word = [ 0xce26ac20a6a5315c, 
0x38802e5d7cce495f ]
}
{
dva_word = [ 0x9241150676798b95, 
0x9c6985f95335742c ]
}
]
None of this looks believable.  So, just what is the rootbp in the 
uberblock_t referring to?

thanks,
max


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and IBM's TSM

2007-07-09 Thread Dan [EMAIL PROTECTED]

Does anyone have a customer using IBM Tivoli Storage Manager (TSM) with 
ZFS?  I see that IBM has a client for Solaris 10, but does it work with ZFS?

-- 
Dan Christensen
System Engineer
Sun Microsystems, Inc.
Des Moines, IA 50266 US
877-263-2204

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss