Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang
Hi Blake,

Blake Irvin wrote:
> I am directly on the console. cde-login is disabled, so I'm dealing
> with direct entry.
>
>> Are you directly on the console, or is the console on a serial port?
>> If you are running over X windows, the input might still get in, but
>> X may not be displaying. If keyboard input is not getting in, your
>> machine is probably wedged at a high-level interrupt, which sounds
>> doubtful based on your problem description.
>
> Out of curiosity, why do you say that? I'm no expert on interrupts,
> so I'm curious. It DOES seem that keyboard entry is ignored in this
> situation, since I see no results from ctrl-c, for example (I had left
> the console running 'tail -f /var/adm/messages'). I'm not saying you
> are wrong, but if I should be examining interrupt issues, I'd like to
> know (I have 3 hard disk controllers in the box, for example...)

Typing ctrl-c and having a process killed because of it are two different actions. The interpretation of ctrl-c as a kill character is done in a STREAMS module (ldterm, I believe), not in the device interrupt handler. I doubt you need to examine interrupts. I was only saying that you could try what I recommended to get a dump. The F1-a sequence is handled by the driver during interrupt handling, so it should get processed. I have done this many times, so I am sure it works.

>> If the deadman timer does not trigger, the clock is almost certainly
>> running, and your machine is almost certainly accepting keyboard
>> input.
>
> That's good to know. I just enabled deadman after the last freeze, so
> it will be a bit before I can test this (hope I don't have to).
>
> thanks!
> Blake
>
>> Good luck,
>> max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang
Hi Blake,

Blake Irvin wrote:
> Thanks - however, the machine hangs and doesn't even accept console
> input when this occurs. I can't get into the kernel debugger in these
> cases.

Are you directly on the console, or is the console on a serial port? If you are running over X windows, the input might still get in, but X may not be displaying. If keyboard input is not getting in, your machine is probably wedged at a high-level interrupt, which sounds doubtful based on your problem description.

> I've enabled the deadman timer instead. I'm also using the automatic
> snapshot service to get a look at things like /var/adm/sa/sa** files
> that get overwritten after a hard reset.

If the deadman timer does not trigger, the clock is almost certainly running, and your machine is almost certainly accepting keyboard input.

Good luck,
max

> I'm just going to stay up late tonight and see what happens :)
>
> Blake
>
>> Hi Blake,
>>
>> Blake Irvin wrote:
>>> I'm having a very similar issue. Just updated to 10 u6 and upgraded
>>> my zpools. They are fine (all 3-way mirrors), but I've lost the
>>> machine around 12:30am two nights in a row.
>>>
>>> What I'd really like is a way to force a core dump when the machine
>>> hangs like this. scat is a very nifty tool for debugging such
>>> things, but I'm not getting a core or panic or anything :(
>>
>> You can force a dump. Here are the steps:
>>
>> Before the system is hung:
>>
>> # mdb -K -F    <-- this will load kmdb and drop into it
>>
>> Don't worry if your system now seems hung. Type, carefully, with no
>> typos:
>>
>> :c    <-- and carriage-return. You should get your prompt back.
>>
>> Now, when the system is hung, type F1-a (that's the F1 function key
>> and the "a" key together). This should put you into kmdb. Now type
>> (again, no typos):
>>
>> $<systemdump
>>
>> This should give you a panic dump, followed by a reboot (unless your
>> system is hard-hung).
>>
>> max
Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang
Hi Blake,

Blake Irvin wrote:
> I'm having a very similar issue. Just updated to 10 u6 and upgraded my
> zpools. They are fine (all 3-way mirrors), but I've lost the machine
> around 12:30am two nights in a row.
>
> What I'd really like is a way to force a core dump when the machine
> hangs like this. scat is a very nifty tool for debugging such things,
> but I'm not getting a core or panic or anything :(

You can force a dump. Here are the steps:

Before the system is hung:

# mdb -K -F    <-- this will load kmdb and drop into it

Don't worry if your system now seems hung. Type, carefully, with no typos:

:c    <-- and carriage-return. You should get your prompt back.

Now, when the system is hung, type F1-a (that's the F1 function key and the "a" key together). This should put you into kmdb. Now type (again, no typos):

$<systemdump

This should give you a panic dump, followed by a reboot (unless your system is hard-hung).

max
Re: [zfs-discuss] [website-discuss] zdb to dump data
Hi Derek,

Derek Cicero wrote:
> Victor Latushkin wrote:
>> [EMAIL PROTECTED] writes:
>>> Hi,
>>> Victor Latushkin wrote:
>>>
>>> I have decided to file an RFE so that zdb with the -R option will
>>> allow one to decompress data before dumping it. I have had this
>>> implemented for several months now, and was told that a way to get
>>> it into opensolaris was to file an RFE. However, when I go to file
>>> the RFE, after typing in the information and hitting "send", I am
>>> getting:
>>>
>>> Not Found
>>>
>>> The requested URL /bug/os was not found on this server.
>>>
>>> Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.8a DAV/2
>>> proxy_html/2.5 mod_jk/1.2.15 Server at www.opensolaris.org Port 80
>>>
>>> Any ideas?
>
> So you are unable to submit a bug?
>
> When did this happen (date/time)?
>
> I will check the logs.
>
> Derek

I have decided not to file an RFE, as Victor points out that it was already filed. (I searched for keyword zfs and "decompress" in the text; Victor suggested I try zfs and decompress both in the text.) However, this error occurred this morning (last night, California time, around 11:45pm, October 30).

thanks,
max
Re: [zfs-discuss] zdb to dump data
Victor Latushkin wrote:
> [EMAIL PROTECTED] writes:
>> Hi,
>> Victor Latushkin wrote:
>>> Hi Ben,
>>>
>>> Ben Rockwood writes:
>>>> Is there some hidden way to coax zdb into not just displaying data
>>>> based on a given DVA but rather to dump it in raw usable form?
>>>>
>>>> I've got a pool with large amounts of corruption. Several
>>>> directories are toast and I get "I/O Error" when trying to enter
>>>> or read the directory... however I can read the directory and
>>>> files using ZDB. If I could just dump it in a raw format I could
>>>> do recovery that way.
>>>>
>>>> To be clear, I've already recovered from the situation; this is
>>>> purely an academic "can I do it" exercise for the sake of learning.
>>>>
>>>> If ZDB can't do it, I'd assume I'd have to write some code to read
>>>> based on DVA. Maybe I could write a little tool for it.
>>>
>>> zdb -R can read raw data blocks from the pool if flag 'r' is used,
>>> so if you can identify the list of blocks comprising some file, you
>>> can feed it to zdb -R.
>>
>> I have decided to file an RFE so that zdb with the -R option will
>> allow one to decompress data before dumping it. I have had this
>> implemented for several months now, and was told that a way to get
>> it into opensolaris was to file an RFE. However, when I go to file
>> the RFE, after typing in the information and hitting "send", I am
>> getting:
>>
>> Not Found
>>
>> The requested URL /bug/os was not found on this server.
>>
>> Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.8a DAV/2
>> proxy_html/2.5 mod_jk/1.2.15 Server at www.opensolaris.org Port 80
>>
>> Any ideas?
>
> I have no idea what may be wrong with the web site, but there's
> already an RFE:
>
> 6757444 want zdb -R to support decompression, checksumming and raid-z
>
> regards,
> victor

Thanks Victor. I guess searching the bug/RFE site for zdb is not the correct way to find this. I looked for keyword: zdb and text: decompress and got no hits. Maybe I should have used Google...

max
Re: [zfs-discuss] zdb to dump data
Hi,

Victor Latushkin wrote:
> Hi Ben,
>
> Ben Rockwood writes:
>> Is there some hidden way to coax zdb into not just displaying data
>> based on a given DVA but rather to dump it in raw usable form?
>>
>> I've got a pool with large amounts of corruption. Several
>> directories are toast and I get "I/O Error" when trying to enter or
>> read the directory... however I can read the directory and files
>> using ZDB. If I could just dump it in a raw format I could do
>> recovery that way.
>>
>> To be clear, I've already recovered from the situation; this is
>> purely an academic "can I do it" exercise for the sake of learning.
>>
>> If ZDB can't do it, I'd assume I'd have to write some code to read
>> based on DVA. Maybe I could write a little tool for it.
>
> zdb -R can read raw data blocks from the pool if flag 'r' is used, so
> if you can identify the list of blocks comprising some file, you can
> feed it to zdb -R.

I have decided to file an RFE so that zdb with the -R option will allow one to decompress data before dumping it. I have had this implemented for several months now, and was told that a way to get it into opensolaris was to file an RFE. However, when I go to file the RFE, after typing in the information and hitting "send", I am getting:

Not Found

The requested URL /bug/os was not found on this server.

Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.8a DAV/2 proxy_html/2.5 mod_jk/1.2.15 Server at www.opensolaris.org Port 80

Any ideas?

thanks,
max

> See comments in zdb source (before zdb_read_block()) for exact syntax.
>
> Wbr,
> Victor
Re: [zfs-discuss] COW & updates [C1]
Hi Cyril,

Cyril ROUHASSIA wrote:
> Dear all,
> please find below a test that I have run:
>
> #zdb -v unxtmpzfs3    <-- uberblock for the unxtmpzfs3 zpool
> Uberblock
>
> magic = 00bab10c
> version = 4
> txg = 86983
> guid_sum = 9860489793107228114
> timestamp = 1225183041 UTC = Tue Oct 28 09:37:21 2008
> rootbp = [L0 DMU objset] 400L/200P DVA[0]=<1:8:200>
> DVA[1]=<0:7ac00:200> DVA[2]=<1:18013000:200> fletcher4 lzjb BE
> contiguous birth=86983 fill=38
> cksum=d7e4c6e6f:508f5121f9f:f66339b469f2:2025284ff2f12d
>
> # echo titi >> /unxtmpzfs3/mnt1/mnt4/te1    <-- update of the te1
> file located in the zpool
>
> # zdb -v unxtmpzfs3    <-- uberblock for the unxtmpzfs3 zpool after
> the file update
> Uberblock
>
> magic = 00bab10c
> version = 4
> txg = 87012
> guid_sum = 9860489793107228114
> timestamp = 1225183186 UTC = Tue Oct 28 09:39:46 2008
> rootbp = [L0 DMU objset] 400L/200P DVA[0]=<1:82a00:200>
> DVA[1]=<0:7e400:200> DVA[2]=<1:18015c00:200> fletcher4 lzjb BE
> contiguous birth=87012 fill=38
> cksum=c3ac8e047:46e375e1c21:d272d39402da:1aaadb02468e54
>
> The conclusion is:
>
> * Because of one change to just one file, the MOS is a brand-new one.
> Then the question is: is the new MOS a whole copy of the previous
> one, or does it share untouched data with the previous one and have
> its own copy only of the changed data (like an update to a regular
> file)? Indeed, I have checked the metadnode array entries and it
> sounds like there are a few entries which are different.

A block containing changed MOS data will be new. Other blocks of the MOS should be unchanged. Of course, any indirect blocks that need to be updated will also be new.

> * Is the uberblock a brand-new one after the update (just 128
> possible uberblocks!!!)?

Only one is "active" at any one time. As I recall, the 128 possible uberblocks are treated as a circular array.

max
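Max's point about the 128-slot circular uberblock array, with only one slot "active" at any time, can be sketched in C. This is a simplified, hypothetical model: the real uberblock_t also carries the guid_sum, timestamp, and root block pointer seen in the zdb output above, and the real selection code verifies each slot's checksum and breaks txg ties using the timestamp.

```c
#include <stdint.h>

#define UBERBLOCK_MAGIC 0x00bab10cULL   /* "oo-ba-block", as in the zdb output */
#define UBERBLOCK_COUNT 128

/* Simplified stand-in for the on-disk uberblock. */
typedef struct {
    uint64_t ub_magic;
    uint64_t ub_txg;
} uberblock_t;

/* Return the index of the active uberblock: the valid slot with the
 * highest transaction group number. Returns -1 if no slot is valid.
 * A txg's slot in the ring is txg % UBERBLOCK_COUNT, so each sync
 * overwrites the oldest entry rather than the active one. */
int active_uberblock(const uberblock_t ring[UBERBLOCK_COUNT]) {
    int best = -1;
    for (int i = 0; i < UBERBLOCK_COUNT; i++) {
        if (ring[i].ub_magic != UBERBLOCK_MAGIC)
            continue;   /* empty (or byte-swapped) slot; skip it */
        if (best == -1 || ring[i].ub_txg > ring[best].ub_txg)
            best = i;
    }
    return best;
}
```

This is why Cyril sees a different rootbp after the update: txg 87012 was written to a different slot than txg 86983, and the slot holding the higher txg becomes the active uberblock.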
Re: [zfs-discuss] Slow zpool import with b98
I have no snapshots in this zpool.

On 09/22/08 16:09, Sanjeev wrote:
> Detlef,
>
> I presume you have about 9 filesystems. How many snapshots do you
> have?
>
> Thanks and regards,
> Sanjeev.
>
> On Mon, Sep 22, 2008 at 03:59:34PM +0200, Detlef [EMAIL PROTECTED] wrote:
>> With Nevada Build 98 I am seeing a slow zpool import of the pool
>> which holds my user and archive data on my laptop.
>>
>> I first noticed it during boot, when Solaris reports that it is
>> mounting zfs filesystems (1/9) and then works for 1-2 minutes before
>> going ahead. I hear the disk working but have no clue what is
>> happening here. So I tried a zpool export and import, and the import
>> is also slow (it takes around 90 seconds, where with b97 it took 5
>> seconds). Has anyone an idea what the reason could be?
>>
>> I had also created 2 ZVOLs under one filesystem. I then removed the
>> parent filesystem (and expected that zfs would also remove both
>> zvols). But now zpool export complains about these two unknown
>> datasets: "dataset does not exist"
>>
>> Any comments or ideas on how to "really" remove the zvols, and on
>> what the issue with the slow zpool import is?
[zfs-discuss] Slow zpool import with b98
With Nevada Build 98 I am seeing a slow zpool import of the pool which holds my user and archive data on my laptop.

I first noticed it during boot, when Solaris reports that it is mounting zfs filesystems (1/9) and then works for 1-2 minutes before going ahead. I hear the disk working but have no clue what is happening here. So I tried a zpool export and import, and the import is also slow (it takes around 90 seconds, where with b97 it took 5 seconds). Has anyone an idea what the reason could be?

I had also created 2 ZVOLs under one filesystem. I then removed the parent filesystem (and expected that zfs would also remove both zvols). But now zpool export complains about these two unknown datasets: "dataset does not exist"

Any comments or ideas on how to "really" remove the zvols, and on what the issue with the slow zpool import is?

Detlef
Re: [zfs-discuss] more ZFS recovery
Hi Robert, et al.,

I have blogged about a method I used to recover a removed file from a zfs file system at http://mbruning.blogspot.com. Be forewarned, it is very long... All comments are welcome.

max

Robert Milkowski wrote:
> Hello max,
>
> Sunday, August 17, 2008, 1:02:05 PM, you wrote:
>
> mbc> A Darren Dunham wrote:
>>> If the most recent uberblock appears valid, but doesn't have useful
>>> data, I don't think there's any way currently to see what the tree
>>> of an older uberblock looks like. It would be nice to see if that
>>> data appears valid and try to create a view that would be
>>> readable/recoverable.
>
> mbc> I have a method to examine uberblocks on disk. Using this, along
> mbc> with my modified mdb and zdb, I have been able to recover a
> mbc> previously removed file. I'll post details in a blog if there is
> mbc> interest.
>
> Of course, please do so.
Re: [zfs-discuss] more ZFS recovery
A Darren Dunham wrote:
> If the most recent uberblock appears valid, but doesn't have useful
> data, I don't think there's any way currently to see what the tree of
> an older uberblock looks like. It would be nice to see if that data
> appears valid and try to create a view that would be
> readable/recoverable.

I have a method to examine uberblocks on disk. Using this, along with my modified mdb and zdb, I have been able to recover a previously removed file. I'll post details in a blog if there is interest.

max
Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]
Darren J Moffat wrote:
> [EMAIL PROTECTED] wrote:
>>> As others have noted, the COW nature of ZFS means that there is a
>>> good chance that on a mostly-empty pool, previous data is still
>>> intact long after you might think it is gone. A utility to recover
>>> such data is (IMHO) more likely to be in the category of forensic
>>> analysis than a mount (import) process. There is more than enough
>>> information publically available for someone to build such a tool
>>> (hint, hint :-)
>>> -- richard
>>
>> Veritas, the makers of vxfs, whom I consider ZFS to be competing
>> against, has higher-level (normal) support engineers that have
>> access to tools that let them scan the disk for inodes and other
>> filesystem fragments and recover. When you log a support call on a
>> faulty filesystem (one such case I was involved in zeroed out 100mb
>> of the first portion of the volume, killing off both top OLTs -- bad
>> bad), they can actually help you at a very low level dig data out of
>> the filesystem, or even recover from pretty nasty issues. They can
>> scan for inodes (marked by a magic number) and have utilities to
>> pull out files from those inodes (including indirect
>> blocks/extents). Given the tools and help from their support, I was
>> able to pull back 500 gb of files (99%) from a filesystem that emc
>> killed during a botched powerpath upgrade. Can Sun's support
>> engineers do the same, or is their answer "pull from tape"?
>> (hint, hint ;-)
>
> Sounds like a good topic for here:
>
> http://opensolaris.org/os/project/forensics/

I took a look at this project, specifically http://opensolaris.org/os/project/forensics/ZFS-Forensics/. Is there any reason that the paper and slides I presented at the OpenSolaris Developers Conference on the zfs on-disk format are not mentioned?

The paper is at http://www.osdevcon.org/2008/files/osdevcon2008-proceedings.pdf, starting on page 36, and the slides are at http://www.osdevcon.org/2008/files/osdevcon2008-max.pdf.

thanks,
max
Re: [zfs-discuss] cp -r hanged copying a directory
Hi Simon,

Simon Breden wrote:
> Thanks Max, and the fact that rsync stresses the system less would
> help explain why rsync works and cp hangs. The directory was around
> 11GB in size.
>
> If Sun engineers are interested in this problem then I'm happy to run
> whatever commands they give me -- after all, I have a pure goldmine
> here for them to debug ;-) And it *is* running on a ZFS filesystem.
> Opportunities like this don't come along every day :) Tempted? :)
>
> Well, if I can't tempt Sun, then for anyone who has the same disks, I
> would be interested to see what happens on your machine:
> Model Number: WD7500AAKS-00RBA0
> Firmware revision: 4G30
>
> I use three of these disks in a RAIDZ1 vdev within the pool.

I think Rob Logan is probably correct, and there is a problem with the disks, not zfs. Have you tried this with a different file system (ufs), or with multiple dd commands running at the same time on the raw disks?

max
Re: [zfs-discuss] cp -r hanged copying a directory
Hi Simon,

Simon Breden wrote:
> The plot thickens. I replaced 'cp' with 'rsync' and it worked -- I
> ran it a few times and it didn't hang so far.
>
> So on the face of it, it appears that 'cp' is doing something that
> causes my system to hang if the files are read from and written to
> the same pool, but simply replacing 'cp' with 'rsync' works. Hmmm...
> anyone have a clue about what I can do next to home in on the problem
> with 'cp'?
>
> Here is the output using 'rsync':
>
> bash-3.2$ truss -topen rsync -a z1 z2
> open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

The rsync command and the cp command work very differently. cp mmaps up to 8MB of the input file and writes from the address returned by mmap, faulting in the pages as it writes (unless you are a normal user on Indiana, in which case cp is GNU's cp, which reads/writes -- so, why are there 2 versions?). rsync forks and sets up a socketpair between the parent and child processes, then reads/writes. It should be much slower than cp and put much less stress on the disk.

It would be great to have a way to reproduce this. I have not had any problems. How large is the directory you are copying? Either the disk has not sent a response to an I/O operation, or the response was somehow lost. If I could reproduce the problem, I might try to dtrace the commands being sent to the HBA and the responses coming back... Hopefully someone here who has experience with the disks you are using will be able to help.

max
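The mmap/write copy loop Max describes can be sketched in C. This is a hypothetical illustration of the technique, not the actual Solaris cp source: the real cp maps at most 8MB at a time and slides that window across larger files, while this sketch maps the whole file for simplicity.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy a file by mmap()ing the source and write()ing from the mapped
 * address. The reads from the source happen as page faults while
 * write() touches the mapping -- which is why the hang above shows
 * zfs_write() faulting through pagefault() into zfs_getpage().
 * Returns 0 on success, -1 on error. */
int mmap_copy(const char *from, const char *to) {
    int in = open(from, O_RDONLY);
    if (in < 0) return -1;
    struct stat st;
    if (fstat(in, &st) < 0) { close(in); return -1; }
    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) { close(in); return -1; }
    int rc = 0;
    if (st.st_size > 0) {
        void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, in, 0);
        if (addr == MAP_FAILED) {
            rc = -1;
        } else {
            /* Each page is faulted in from 'from' as write() reads it. */
            if (write(out, addr, st.st_size) != (ssize_t)st.st_size)
                rc = -1;
            munmap(addr, st.st_size);
        }
    }
    close(in);
    close(out);
    return rc;
}
```

The kernel stack trace posted later in this thread matches this pattern exactly: the write() faults on a mapped source page, and the pagein of that page is what never completes.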
Re: [zfs-discuss] Endian relevance for decoding lzjb blocks
Hi Benjamin,

Benjamin Brumaire wrote:
> I'm trying to decode an lzjb-compressed block and I'm having a hard
> time with big/little endian. I'm on x86 working with build 77.
>
> #zdb - ztest
> ...
> rootbp = [L0 DMU objset] 400L/200P DVA[0]=<0:e0c98e00:200>
> ...
>
> # zdb -R ztest:c0d1s4:e0c98e00:200:
> Found vdev: /dev/dsk/c0d1s4
>
> ztest:c0d1s4:e0c98e00:200:
>          0 1 2 3 4 5 6 7   8 9 a b c d e f   0123456789abcdef
> 00:  0003020e0a00      dd0304050020b601   .. .
> 10:  c505048404040504  35b558231002047c   |...#X.5
>
> Looking at this block with dd:
> dd if=/dev/dsk/c0d1s4 iseek=7374023 bs=512 count=1 | od -x
> 000: 0a00 020e 0003 b601 0020 0405 dd03
>
> od -x is responsible for swapping every two bytes. I have on disk:
> 000: 000a 0e02 0300 01b6 0200 0504 03dd
>
> Compared with the zdb output, every 8 bytes are reversed.
>
> Now I don't know how to pass this to my lzjb decoding program.
> Should I read the 512 bytes and pass them:
> - from the end
> - from the start, reversing every 8 bytes
> - or something else
>
> thanks for any advice
>
> bbr

Using the modified zdb, you should be able to do:

# zdb -R ztest:c0d1s4:e0c98e00:200:d,lzjb,400 2>/tmp/foo

Then you can od /tmp/foo. I am not sure what happens if you run zdb against a zfs file system with a different endianness from the machine on which you are running zdb. It may just work... The "d,lzjb,400" says to use lzjb decompression with a logical (after decompression) size of 0x400 bytes. It dumps the raw data to stderr, hence the "2>/tmp/foo".

max
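For anyone writing such a decoder: lzjb is defined over a plain byte stream, so once you have the 512 physical bytes in on-disk order there is no word reversing to do -- the "swapping" above is an artifact of how od -x and zdb group bytes into words for display. A decompressor modeled on the OpenSolaris lzjb_decompress() looks roughly like this (treat it as a sketch, not the canonical source):

```c
#include <stddef.h>
#include <stdint.h>

#define NBBY        8
#define MATCH_BITS  6
#define MATCH_MIN   3
#define OFFSET_MASK ((1 << (16 - MATCH_BITS)) - 1)

/* Decompress s_len bytes at src into d_len bytes at dst. The stream is
 * a sequence of 8-item groups, each preceded by a copymap byte: a clear
 * bit means one literal byte follows; a set bit means a 2-byte
 * (length, offset) back-reference into the output produced so far.
 * Returns 0 on success, -1 on a corrupt back-reference. */
int lzjb_decompress(const uint8_t *src, uint8_t *dst, size_t s_len, size_t d_len)
{
    const uint8_t *s_end = src + s_len;
    uint8_t *d_start = dst, *d_end = dst + d_len;
    uint8_t copymap = 0;
    int copymask = 1 << (NBBY - 1);   /* forces a copymap reload first */

    while (dst < d_end && src < s_end) {
        if ((copymask <<= 1) == (1 << NBBY)) {
            copymask = 1;
            copymap = *src++;         /* start of a new 8-item group */
        }
        if (copymap & copymask) {
            /* match: top 6 bits are length-3, low 10 bits are offset-1 */
            int mlen = (src[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
            int offset = (((src[0] << NBBY) | src[1]) & OFFSET_MASK) + 1;
            const uint8_t *cpy = dst - offset;
            src += 2;
            if (cpy < d_start)
                return -1;            /* reference before start: corrupt */
            while (--mlen >= 0 && dst < d_end)
                *dst++ = *cpy++;
        } else {
            *dst++ = *src++;          /* literal byte */
        }
    }
    return 0;
}
```

Note that everything here is byte-addressed; nothing in the format depends on the host's word order, which is why the question of "reversing every 8 bytes" does not arise for the decompressor itself.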
Re: [zfs-discuss] cp -r hanged copying a directory
Simon Breden wrote:
> set sata:sata_max_queue_depth = 0x1
>
> =
>
> Anyway, after adding the line above to /etc/system, I rebooted and
> then re-tried the copy with truss:
>
> truss cp -r testdir z4
>
> It seems to hang on random files -- so it's not always the same file
> that it hangs on.
>
> On this particular run here are the last few lines of truss output,
> although they're probably not useful:

Hi Simon,

Try with:

truss -topen cp -r testdir z4

This will show you only the files being opened. The last file opened in testdir is the one it is hanging on. (Unless it is hanging in getdents(2), but I don't think so based on the kernel stack trace.) But if it is hanging on random files, this is not going to help either. How long do you wait before deciding it's hung? I think you should usually get console output saying an I/O has been retried if the device does not respond to a previously sent I/O.

max
Re: [zfs-discuss] cp -r hanged copying a directory
Hi Simon,

Simon Breden wrote:
> Thanks for your advice Max, and here is my reply to your suggestion:
>
> # mdb -k
> Loading modules: [ unix genunix specfs dtrace cpu.generic
> cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp
> arp usba s1394 nca lofs zfs random md sppp smbsrv nfs ptm ipc crypto ]
>
>> ::pgrep cp
> S    PID   PPID   PGID    SID   UID      FLAGS         ADDR  NAME
> R    889    868    889    868   501  0x4a004000  ff01deca9048  cp
>
>> ff01deca9048::walk thread | ::threadlist -v
>             ADDR          PROC           LWP  CLS PRI         WCHAN
> ff01e0045840  ff01deca9048  ff01de9d9210    2  60  ff01d861ca80
>   PC: _resume_from_idle+0xf1    CMD: cp -pr testdir z1
>   stack pointer for thread ff01e0045840: ff0007fcdf00
>   [ ff0007fcdf00 _resume_from_idle+0xf1() ]
>     swtch+0x17f()
>     cv_wait+0x61()
>     zio_wait+0x5f()
>     dbuf_read+0x1b5()
>     dbuf_findbp+0xe8()
>     dbuf_prefetch+0x9b()
>     dmu_zfetch_fetch+0x43()
>     dmu_zfetch_dofetch+0xc2()
>     dmu_zfetch_find+0x3a1()
>     dmu_zfetch+0xa5()
>     dbuf_read+0xe3()
>     dmu_buf_hold_array_by_dnode+0x1c4()
>     dmu_read+0xd4()
>     zfs_fillpage+0x15e()
>     zfs_getpage+0x187()
>     fop_getpage+0x9f()
>     segvn_fault+0x9ef()
>     as_fault+0x5ae()
>     pagefault+0x95()
>     trap+0x1286()
>     0xfb8001d9()
>     fuword8+0x21()
>     zfs_write+0x147()
>     fop_write+0x69()
>     write+0x2af()
>     write32+0x1e()
>     sys_syscall32+0x101()

So, a write has been issued, zfs is retrieving a page, and it is waiting for the pagein to complete. I'll take a further look tomorrow, but maybe someone else reading this has an idea. (It is midnight here.)

max
Re: [zfs-discuss] cp -r hanged copying a directory
Hi Simon,

Simon Breden wrote:
> Hi Max,
>
> I re-ran the cp command and when it hanged I ran 'ps -el', looked up
> the cp command, got its PID and then ran:
>
> # truss -p PID_of_cp
>
> and it output nothing at all -- i.e. it hanged too -- just showing a
> flashing cursor.
>
> The system is still operational as I am typing into the browser.
>
> Before I ran the cp command I did a 'tail -f /var/adm/messages' and
> there is no output. I also did a 'tail -f /var/log/syslog' and there
> is also no output.
>
> If I try 'kill -15 PID_of_cp' and then 'ps -el', cp is still running.
> And if I try 'kill -9 PID_of_cp' and then 'ps -el', cp is still
> running.
>
> What next?

You can try the following:

# mdb -k
::pgrep cp    <-- this should give you a line with the cp you are running

Next to "cp" is an address; use this address in the next line:

address_from_pgrep::walk thread | ::threadlist -v

This will give you a stack trace. Please post it.

$q    <-- this gets you out of mdb

max
Re: [zfs-discuss] cp -r hanged copying a directory
Hi Simon,

Simon Breden wrote:
> Hi Max,
>
> I haven't used truss before, but give me the command line + switches
> and I'll be happy to run it.
>
> Simon

# truss -p pid_from_cp

where pid_from_cp is... the pid of the cp process that is "hung". The pid you can get from ps. I am curious whether the cp is stuck on a specific file, is just very slow, or is hung in the kernel. Also, can you kill the cp when it hangs?

thanks,
max

> 2008/5/1 [EMAIL PROTECTED]:
>
> Hi Simon,
>
> Simon Breden wrote:
>> Thanks a lot Richard. To give a bit more info, I've copied my
>> /var/adm/messages from booting up the machine:
>>
>> And @picker: I guess the 35 requests are stacked up waiting for the
>> hanging request to be serviced?
>>
>> The question I have is where do I go from now, to get some more info
>> on what is causing cp to have problems.
>>
>> I will now try another tack: use rsync to copy the directory to a
>> disk outside the pool (i.e. my home directory on the boot drive), to
>> see if it is happy doing that.
>
> What does truss show the cp doing?
> max
Re: [zfs-discuss] cp -r hanged copying a directory
Hi Simon,

Simon Breden wrote:
> Thanks a lot Richard. To give a bit more info, I've copied my
> /var/adm/messages from booting up the machine:
>
> And @picker: I guess the 35 requests are stacked up waiting for the
> hanging request to be serviced?
>
> The question I have is where do I go from now, to get some more info
> on what is causing cp to have problems.
>
> I will now try another tack: use rsync to copy the directory to a
> disk outside the pool (i.e. my home directory on the boot drive), to
> see if it is happy doing that.

What does truss show the cp doing?

max
Re: [zfs-discuss] questions about block sizes
Hi Mario,

Mario Goebbels wrote:
>> ZFS can use block sizes up to 128k. If the data is compressed, then
>> this size will be larger when decompressed.
>
> ZFS allows you to use variable block sizes (a power of 2 from 512
> bytes to 128k), and as far as I know, a compressed block is put into
> the smallest one that fits.
>
> -mg

Yes. Of course. But my question is: can I have in memory a decompressed array of blkptr_t used for indirection that is larger than 128k, so that when it is compressed and written to disk, it is 128k in size?
[zfs-discuss] questions about block sizes
Hi,

ZFS can use block sizes up to 128k. If the data is compressed, then this size will be larger when decompressed. So, can the decompressed data be larger than 128k? If so, does this also hold for metadata? In other words, can I have a 128k block on the disk containing, for instance, indirect blocks (compressed blkptr_t data) that results in more than 1024 blkptr_t when decompressed? If I had a very large amount of free space, I could try this and see, but since I don't, I thought I'd ask here.

thanks,
max
[zfs-discuss] reviewers needed for paper on zfs on-disk structure walk
Hi,

I am hoping to present a paper at osdevcon in Prague in June. I have a draft of the paper and am looking for a couple of people to review it. I am interested to know the following:

1. Is it understandable?
2. Is it technically correct?
3. Any comments/suggestions to make it better?

The paper starts at the active uberblock on the disk, and walks the data structures on disk to find the data for a given file. It uses a modified mdb and a modified zdb, along with an mdb dmod. It is not specifically aimed at system administrators, but rather tries to give people better insight into where and how data is located in a ZFS file system.

So, anyone interested in reviewing this? If so, let me know via email and I'll send you a copy. Also, if you want the modified mdb/zdb and the dmod, let me know that as well.

thanks much,
max
Re: [zfs-discuss] repairing corrupted files?
Hi Richard,

Richard Elling wrote:
> Occasionally the topic arises about what to do when a file is
> corrupted. ZFS will tell you about it, but what then? Usually the
> conversation then degenerates into how some people can tolerate
> broken mp3 files or whatever.
>
> Well, the other day I found a corrupted file, which gave me an
> opportunity to test a little hypothesis on how to recover what you
> can recover. Details are in my blog:
> http://blogs.sun.com/relling/entry/holy_smokes_a_holey_file
>
> There is an opportunity here, for someone with some spare time, to
> come up with a more clever solution than my dd script. hint... hint...
> -- richard

Would it help if you had the block number (i.e., disk location) of the block that is corrupted? zdb might tell you this. I have a way to do it, I think, but don't want to test it because I don't want to corrupt a file on purpose.

I am writing a paper (actually, done with the first draft) that shows how to find the data for a given file on the raw disk (i.e., with the file system not mounted). I plan on presenting this at osdevcon in Prague in June. I am looking for reviewers; if you are interested, please send me email and I'll send you a copy.

The method I use is quite a bit more complex than using dd. It involves using a modified zdb and a modified mdb together. I think it would work for this type of problem. (Then again, if zfs completely wipes out the corrupted block, it won't help.) If I have time, I'll try corrupting a few bits in a file and see if my method works to get the corrupted block.

thanks,
max
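The dd-script approach from Richard's blog -- salvage the readable blocks of a corrupted file and zero-fill the bad ones -- can be sketched in C. This is a hypothetical illustration, not his actual script; the 128k block size and the EIO-only error handling are assumptions, and a bad block at the very end of the file would be rounded up to a full block here.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLK 131072  /* read in 128k chunks, matching the largest ZFS recordsize */

/* Copy 'from' to 'to', substituting zeros for any block whose read
 * fails with EIO (as a ZFS checksum failure on unreplicated data
 * does), so the readable parts of a corrupted file are salvaged.
 * Returns the number of bad blocks, or -1 on any other error. */
long salvage_copy(const char *from, const char *to) {
    int in = open(from, O_RDONLY);
    if (in < 0) return -1;
    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }
    char buf[BLK];
    long bad = 0;
    for (off_t off = 0; ; off += BLK) {
        ssize_t n = pread(in, buf, BLK, off);
        if (n == 0) break;                     /* end of file */
        if (n < 0) {
            if (errno != EIO) { bad = -1; break; }
            memset(buf, 0, BLK);               /* unreadable: leave a zero-filled hole */
            n = BLK;
            bad++;
        }
        if (write(out, buf, n) != n) { bad = -1; break; }
        if (n < BLK) break;                    /* short read at EOF */
    }
    close(in);
    close(out);
    return bad;
}
```

The return value tells you how much was lost, in the spirit of the blog entry's block-by-block accounting of the holey file.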
Re: [zfs-discuss] 'zfs create' hanging
Mark J Musante wrote: > On Fri, 7 Mar 2008, Paul Raines wrote: > > >> zfs create -o quota=131G -o reserv=131G -o recsize=8K zpool1/itgroup_001 >> >> and this is still running now. truss on the process shows nothing. I >> don't know how to debug it beyond that. I thought I would ask for any >> info from this list before I just reboot. >> > > What does pstack show? > > > If truss shows nothing, it's either looping at user level, or hung in the kernel. Try echo ::threadlist -v | mdb -k and see what the stack trace looks like for the zfs process in the kernel. max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] path-name encodings
Hi Marcus, Marcus Sundman wrote: > Are path-names text or raw data in zfs? I.e., is it possible to know > what the name of a file/dir/whatever is, or do I have to make more or > less wild guesses what encoding is used where? > > - Marcus > I'm not sure what you are asking here. When a zfs file system is mounted, it looks like a normal unix file system, i.e., a tree of files where intermediate nodes are directories and leaf nodes may be directories or regular files. In other words, ls gives you the same kind of output you would expect on any unix file system. As to whether a file/directory name is text or binary, that depends on the name used when creating the file/directory. As far as the meta-data used to maintain the file system tree goes, most of it is compressed. But your question makes me wonder if you have tried zfs. If so, then I really am not sure what you are asking. If not, maybe you should try it out... max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] modification to zdb to decompress blocks
Hi All, I have modified zdb to do decompression in zdb_read_block. Syntax is: # zdb -R poolname:devid:blkno:psize:d,compression_type,lsize Where compression_type can be lzjb or any other compression type that zdb uses, and lsize is the logical (i.e., uncompressed) size. I have used this with a modified mdb to allow one to do the following: given a pathname for a file on a zfs file system, display the blocks (i.e., data) of the file. The file system need not be mounted. If anyone is interested, send me email. I can send a webrev of the zdb changes for those interested. As for the mdb changes, I sent a webrev of those a while ago, and have since added a rawzfs dmod. I plan to present a paper at osdevcon in Prague in June that uses the modified zdb and mdb to show the physical layout of a zfs file system. (I should mention that, over time, I have found that the ZFS on-disk format paper actually does tell you almost everything.) max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Did MDB Functionality Change?
Hi Spencer, spencer wrote: > On Solaris 10 u3 (11/06) I can execute the following: > > bash-3.00# mdb -k > Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcipsy ip sctp > usba nca md zfs random ipc nfs crypto cpc fctl fcip logindmux ptm sppp ] > >> arc::print >> > { > anon = ARC_anon > mru = ARC_mru > mru_ghost = ARC_mru_ghost > mfu = ARC_mfu > mfu_ghost = ARC_mfu_ghost > size = 0x6b800 > p = 0x3f83f80 > c = 0x7f07f00 > c_min = 0x7f07f00 > c_max = 0xbe8be800 > hits = 0x30291 > misses = 0x4f > deleted = 0xe > skipped = 0 > hash_elements = 0x3a > hash_elements_max = 0x3a > hash_collisions = 0x3 > hash_chains = 0x1 > hash_chain_max = 0x1 > no_grow = 0 > } > > However, when I execute the same command on Solaris 10 u4 (8/07) I receive > the following error: > > bash-3.00# mdb -k > Loading modules: [ unix krtld genunix specfs dtrace ufs ssd fcp fctl qlc > pcisch md ip hook neti sctp arp usba nca lofs logindmux ptm cpc fcip sppp > random sd crypto zfs ipc nfs ] > >> arc::print >> > mdb: failed to dereference symbol: unknown symbol name > mdb functionality did not change. There is no longer a global variable named "arc". The "::arc" command gets its data from a few different variables. You can look at usr/src/cmd/mdb/common/modules/zfs/zfs.c and look at the arc_print function to see what it does. Or, you can use ::nm !grep arc | grep OBJT to see what arc related variables exist and use one of those with ::print. For instance, arc_stats::print. max > In addition, u3 doesn't recognize "::arc" where u4 does. > u3 displays memory locations with "arc::print -a" where "::arc -a" doesn't > work for u4. > > I posted this into the zfs discussion forum, because this limited u4 > functionality prevents you from dynamically changing the ARC in ZFS by trying > the ZFS Tuning instructions. 
> > > Spencer
Re: [zfs-discuss] Best stripe-size in array for ZFS mail storage?
Hi Bill, can you guess? wrote: >> We will be using Cyrus to store mail on 2540 arrays. >> >> We have chosen to build 5-disk RAID-5 LUNs in 2 >> arrays which are both connected to same host, and >> mirror and stripe the LUNs. So a ZFS RAID-10 set >> composed of 4 LUNs. Multi-pathing also in use for >> redundancy. >> > > Sounds good so far: lots of small files in a largish system with presumably > significant access parallelism makes RAID-Z a non-starter, Why does "lots of small files in a largish system with presumably significant access parallelism makes RAID-Z a non-starter"? thanks, max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] question about uberblock blkptr
Hi Roch, Roch - PAE wrote: > [EMAIL PROTECTED] writes: > > Roch - PAE wrote: > > > [EMAIL PROTECTED] writes: > > > > Jim Mauro wrote: > > > > > > > > > > Hey Max - Check out the on-disk specification document at > > > > > http://opensolaris.org/os/community/zfs/docs/. > > > > > Ok. I think I know what's wrong. I think the information (most > > > likely, > > > > a objset_phys_t) is compressed > > > > with lzjb compression. Is there a way to turn this entirely off (not > > > > just for file data, but for all meta data > > > > as well when a pool is created? Or do I need to figure out how to > hack > > > > in the lzjb_decompress() function in > > > > my modified mdb? (Also, I figured out that zdb is already doing the > > > > left shift by 9 before dumping DVA values, > > > > for anyone following this...). > > > > > > > > > > Max, this might help (zfs_mdcomp_disable) : > > > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#METACOMP > > > > > Hi Roch, > > That would help, except it does not seem to work. I set > > zfs_mdcomp_disable to 1 with mdb, > > deleted the pool, recreated the pool, and zdb - still shows the > > rootbp in the uberblock_t > > to have the lzjb flag turned on. So I then added the variable to > > /etc/system, destroyed the pool, > > rebooted, recreated the pool, and still the same result. Also, my mdb > > shows the same thing > > for the uberblock_t rootbp blkptr data. I am running Nevada build 55b. > > > > I shall update the build I am running soon, but in the meantime I'll > > probably write a modified cmd_print() function for my > > (modified) mdb to handle (at least) lzjb compressed metadata. Also, I > > think the ZFS Evil Tuning Guide should be > > modified. It says this can be tuned for Solaris 10 11/06 and snv_52. I > > guess that means only those > > two releases. snv_55b has the variable, but it doesn't have an effect > > (at least on the uberblock_t > > rootbp meta-data). > > > > thanks for your help. 
> > > > max > > > > My bad. The tunable only affects indirect dbufs (so I guess > only for large files). As you noted, other metadata is > compressed unconditionally (I guess from the use of > ZIO_COMPRESS_LZJB in dmu_objset_open_impl). > > -r > > > This makes printing the data with ::print much more problematic... The code in mdb that prints data structures recursively iterates through the structure members, reading each member separately. I can either write a new print function that does the decompression, or add a new dcmd that does the decompression and dumps the data to the screen, but then I lose the structure member names in the output. I guess I'll do the decompression dcmd first, and then figure out how to get the member names back in the output... thanks, max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
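[Editorial note] For what it's worth, lzjb decompression is small enough that a dcmd (or an offline tool) does not need much code. Here is a minimal Python sketch of the algorithm as it appears in usr/src/uts/common/fs/zfs/lzjb.c (MATCH_BITS=6, MATCH_MIN=3, so a match item encodes a 3..66-byte copy from up to 1023 bytes back); treat it as illustrative, not the kernel code:

```python
MATCH_BITS = 6
MATCH_MIN = 3
OFFSET_MASK = (1 << (16 - MATCH_BITS)) - 1   # 0x3ff: matches reach back up to 1023 bytes

def lzjb_decompress(src, dlen):
    """Decompress lzjb-compressed bytes 'src' into 'dlen' (logical size) bytes."""
    dst = bytearray()
    i = 0
    copymap = 0
    copymask = 1 << 7          # forces a copymap read on the very first item
    while len(dst) < dlen:
        copymask <<= 1
        if copymask == 1 << 8:
            copymask = 1
            copymap = src[i]   # one flag byte governs the next 8 items
            i += 1
        if copymap & copymask:
            # match item: top 6 bits of (length - 3), bottom 10 bits of back-offset
            mlen = (src[i] >> (8 - MATCH_BITS)) + MATCH_MIN
            offset = ((src[i] << 8) | src[i + 1]) & OFFSET_MASK
            i += 2
            for _ in range(mlen):
                if len(dst) >= dlen:
                    break
                dst.append(dst[-offset])   # byte-at-a-time so overlapping copies work
        else:
            dst.append(src[i])             # literal byte
            i += 1
    return bytes(dst)
```

As a sanity check, a hand-built buffer with copymap 0x08, the literals "abc", and one match item (length 6, offset 3) decompresses to "abcabcabc".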
Re: [zfs-discuss] question about uberblock blkptr
Roch - PAE wrote: > [EMAIL PROTECTED] writes: > > Jim Mauro wrote: > > > > > > Hey Max - Check out the on-disk specification document at > > > http://opensolaris.org/os/community/zfs/docs/. > > > > > > Page 32 illustration shows the rootbp pointing to a dnode_phys_t > > > object (the first member of a objset_phys_t data structure). > > > > > > The source code indicates ub_rootbp is a blkptr_t, which contains > > > a 3 member array of dva_t 's called blk_dva (blk_dva[3]). > > > Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]). > > > > > > So it looks like each blk_dva contains 3 128-bit DVA's > > > > > > You probably figured all this out alreadydid you try using > > > a objset_phys_t to format the data? > > > > > > Thanks, > > > /jim > > Ok. I think I know what's wrong. I think the information (most likely, > > a objset_phys_t) is compressed > > with lzjb compression. Is there a way to turn this entirely off (not > > just for file data, but for all meta data > > as well when a pool is created? Or do I need to figure out how to hack > > in the lzjb_decompress() function in > > my modified mdb? (Also, I figured out that zdb is already doing the > > left shift by 9 before dumping DVA values, > > for anyone following this...). > > > > Max, this might help (zfs_mdcomp_disable) : > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#METACOMP > Hi Roch, That would help, except it does not seem to work. I set zfs_mdcomp_disable to 1 with mdb, deleted the pool, recreated the pool, and zdb - still shows the rootbp in the uberblock_t to have the lzjb flag turned on. So I then added the variable to /etc/system, destroyed the pool, rebooted, recreated the pool, and still the same result. Also, my mdb shows the same thing for the uberblock_t rootbp blkptr data. I am running Nevada build 55b. 
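[Editorial note] For anyone following along, the /etc/system entry referred to above uses the standard module-tunable syntax (the zfs module prefix is the usual form; whether the tunable has any effect depends on your build, as the thread shows):

```
* /etc/system: disable ZFS metadata compression at next boot
set zfs:zfs_mdcomp_disable = 1
```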
I shall update the build I am running soon, but in the meantime I'll probably write a modified cmd_print() function for my (modified) mdb to handle (at least) lzjb compressed metadata. Also, I think the ZFS Evil Tuning Guide should be modified. It says this can be tuned for Solaris 10 11/06 and snv_52. I guess that means only those two releases. snv_55b has the variable, but it doesn't have an effect (at least on the uberblock_t rootbp meta-data). thanks for your help. max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] question about uberblock blkptr
Jim Mauro wrote: > > Hey Max - Check out the on-disk specification document at > http://opensolaris.org/os/community/zfs/docs/. > > Page 32 illustration shows the rootbp pointing to a dnode_phys_t > object (the first member of a objset_phys_t data structure). > > The source code indicates ub_rootbp is a blkptr_t, which contains > a 3 member array of dva_t 's called blk_dva (blk_dva[3]). > Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]). > > So it looks like each blk_dva contains 3 128-bit DVA's > > You probably figured all this out alreadydid you try using > a objset_phys_t to format the data? > > Thanks, > /jim Ok. I think I know what's wrong. I think the information (most likely, a objset_phys_t) is compressed with lzjb compression. Is there a way to turn this entirely off (not just for file data, but for all meta data as well when a pool is created? Or do I need to figure out how to hack in the lzjb_decompress() function in my modified mdb? (Also, I figured out that zdb is already doing the left shift by 9 before dumping DVA values, for anyone following this...). thanks, max > > > > [EMAIL PROTECTED] wrote: >> Hi All, >> I have modified mdb so that I can examine data structures on disk >> using ::print. >> This works fine for disks containing ufs file systems. It also works >> for zfs file systems, but... >> I use the dva block number from the uberblock_t to print what is at >> the block >> on disk. The problem I am having is that I can not figure out what >> (if any) structure to use. >> All of the xxx_phys_t types that I try do not look right. So, the >> question is, just what is >> the structure that the uberblock_t dva's refer to on the disk? >> >> Here is an example: >> >> First, I use zdb to get the dva for the rootbp (should match the >> value in the uberblock_t(?)). 
>> >> # zdb - usbhard | grep -i dva >> Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 >> DMU objset] 400L/200P DVA[0]=<0:111f79000:200> >> DVA[1]=<0:506bde00:200> DVA[2]=<0:36a286e00:200> fletcher4 lzjb LE >> contiguous birth=621838 fill=167 >> cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb >> bp = [L0 DMU objset] 400L/200P >> DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200> >> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838 >> fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529 >> Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp >> [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> >> DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE >> contiguous birth=621838 fill=34026 >> cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529 >> first block: [L0 ZIL intent log] 9000L/9000P >> DVA[0]=<0:36aef6000:9000> zilog uncompressed LE contiguous >> birth=263950 fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1 >> ^C >> # >> >> Then I run my modified mdb on the vdev containing the "usbhard" pool >> # ./mdb /dev/rdsk/c4t0d0s0 >> >> I am using the DVA[0} for the META data set above. Note that I have >> tried all of the xxx_phys_t structures >> that I can find in zfs source, but none of them look right. Here is >> example output dumping the data as a objset_phys_t. >> (The shift by 9 and adding 40 is from the zfs on-disk format >> paper, I have tried without the addition, without the shift, >> in all combinations, but the output still does not make sense). 
>> >> > (111f79000<<9)+40::print zfs`objset_phys_t >> { >> os_meta_dnode = { >> dn_type = 0x4f >> dn_indblkshift = 0x75 >> dn_nlevels = 0x82 >> dn_nblkptr = 0x25 >> dn_bonustype = 0x47 >> dn_checksum = 0x52 >> dn_compress = 0x1f >> dn_flags = 0x82 >> dn_datablkszsec = 0x5e13 >> dn_bonuslen = 0x63c1 >> dn_pad2 = [ 0x2e, 0xb9, 0xaa, 0x22 ] >> dn_maxblkid = 0x20a34fa97f3ff2a6 >> dn_used = 0xac2ea261cef045ff >> dn_pad3 = [ 0x9c2b4541ab9f78c0, 0xdb27e70dce903053, >> 0x315efac9cb693387, 0x2d56c54db5da75bf ] >> dn_blkptr = [ >> { >> blk_dva = [ >> { >> dva_word = [ 0x87c9ed7672454887, >> 0x760f569622246efe ] >> } >> { >> dv
Re: [zfs-discuss] question about uberblock blkptr
Jim Mauro wrote: > > Hey Max - Check out the on-disk specification document at > http://opensolaris.org/os/community/zfs/docs/. > > Page 32 illustration shows the rootbp pointing to a dnode_phys_t > object (the first member of a objset_phys_t data structure). > > The source code indicates ub_rootbp is a blkptr_t, which contains > a 3 member array of dva_t 's called blk_dva (blk_dva[3]). > Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]). > > So it looks like each blk_dva contains 3 128-bit DVA's > > You probably figured all this out alreadydid you try using > a objset_phys_t to format the data? > > Thanks, > /jim Hi Jim, Yes, I have tried an objset_phys_t. This is what I am using below in the example. Either there's some extra stuff that the on-disk format specification is not saying, or I'm not picking up the correct blkptr (though I have tried other blkptr's from the uberblock array following the nvpair/label section at the beginning of the disk), or the uberblock_t blkptr is pointing to something completely different. I am going to have another look at the zdb code, as I suspect that it must also do something like what I am trying to do. Also, I think someone on this list should know what the uberblock_t blkptr refers to if it is not an objset_t. I don't have compression or any encryption turned on, but I am also wondering if the metadata is somehow compressed or encrypted. Thanks for the response. I was beginning to think the only people that read this mailing list are admins... (Sorry guys, getting zfs configured properly is much more important than what I'm doing here, but this is more interesting to me). max > > > > [EMAIL PROTECTED] wrote: >> Hi All, >> I have modified mdb so that I can examine data structures on disk >> using ::print. >> This works fine for disks containing ufs file systems. It also works >> for zfs file systems, but... >> I use the dva block number from the uberblock_t to print what is at >> the block >> on disk. 
The problem I am having is that I can not figure out what >> (if any) structure to use. >> All of the xxx_phys_t types that I try do not look right. So, the >> question is, just what is >> the structure that the uberblock_t dva's refer to on the disk? >> >> Here is an example: >> >> First, I use zdb to get the dva for the rootbp (should match the >> value in the uberblock_t(?)). >> >> # zdb - usbhard | grep -i dva >> Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 >> DMU objset] 400L/200P DVA[0]=<0:111f79000:200> >> DVA[1]=<0:506bde00:200> DVA[2]=<0:36a286e00:200> fletcher4 lzjb LE >> contiguous birth=621838 fill=167 >> cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb >> bp = [L0 DMU objset] 400L/200P >> DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200> >> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838 >> fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529 >> Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp >> [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> >> DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE >> contiguous birth=621838 fill=34026 >> cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529 >> first block: [L0 ZIL intent log] 9000L/9000P >> DVA[0]=<0:36aef6000:9000> zilog uncompressed LE contiguous >> birth=263950 fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1 >> ^C >> # >> >> Then I run my modified mdb on the vdev containing the "usbhard" pool >> # ./mdb /dev/rdsk/c4t0d0s0 >> >> I am using the DVA[0} for the META data set above. Note that I have >> tried all of the xxx_phys_t structures >> that I can find in zfs source, but none of them look right. Here is >> example output dumping the data as a objset_phys_t. >> (The shift by 9 and adding 40 is from the zfs on-disk format >> paper, I have tried without the addition, without the shift, >> in all combinations, but the output still does not make sense). 
>> >> > (111f79000<<9)+40::print zfs`objset_phys_t >> { >> os_meta_dnode = { >> dn_type = 0x4f >> dn_indblkshift = 0x75 >> dn_nlevels = 0x82 >> dn_nblkptr = 0x25 >> dn_bonustype = 0x47 >> dn_checksum = 0x52 >> dn_compress = 0x1f >> dn_flags = 0x82 >> dn_datablkszsec = 0x5e13 >> dn_bonuslen =
[zfs-discuss] question about uberblock blkptr
Hi All, I have modified mdb so that I can examine data structures on disk using ::print. This works fine for disks containing ufs file systems. It also works for zfs file systems, but... I use the dva block number from the uberblock_t to print what is at the block on disk. The problem I am having is that I can not figure out what (if any) structure to use. All of the xxx_phys_t types that I try do not look right. So, the question is, just what is the structure that the uberblock_t dva's refer to on the disk? Here is an example: First, I use zdb to get the dva for the rootbp (should match the value in the uberblock_t(?)). # zdb - usbhard | grep -i dva Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 DMU objset] 400L/200P DVA[0]=<0:111f79000:200> DVA[1]=<0:506bde00:200> DVA[2]=<0:36a286e00:200> fletcher4 lzjb LE contiguous birth=621838 fill=167 cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb bp = [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838 fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529 Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838 fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529 first block: [L0 ZIL intent log] 9000L/9000P DVA[0]=<0:36aef6000:9000> zilog uncompressed LE contiguous birth=263950 fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1 ^C # Then I run my modified mdb on the vdev containing the "usbhard" pool # ./mdb /dev/rdsk/c4t0d0s0 I am using the DVA[0] for the META data set above. Note that I have tried all of the xxx_phys_t structures that I can find in zfs source, but none of them look right. Here is example output dumping the data as an objset_phys_t. 
(The shift by 9 and adding 40 is from the zfs on-disk format paper, I have tried without the addition, without the shift, in all combinations, but the output still does not make sense). > (111f79000<<9)+40::print zfs`objset_phys_t { os_meta_dnode = { dn_type = 0x4f dn_indblkshift = 0x75 dn_nlevels = 0x82 dn_nblkptr = 0x25 dn_bonustype = 0x47 dn_checksum = 0x52 dn_compress = 0x1f dn_flags = 0x82 dn_datablkszsec = 0x5e13 dn_bonuslen = 0x63c1 dn_pad2 = [ 0x2e, 0xb9, 0xaa, 0x22 ] dn_maxblkid = 0x20a34fa97f3ff2a6 dn_used = 0xac2ea261cef045ff dn_pad3 = [ 0x9c2b4541ab9f78c0, 0xdb27e70dce903053, 0x315efac9cb693387, 0x2d56c54db5da75bf ] dn_blkptr = [ { blk_dva = [ { dva_word = [ 0x87c9ed7672454887, 0x760f569622246efe ] } { dva_word = [ 0xce26ac20a6a5315c, 0x38802e5d7cce495f ] } { dva_word = [ 0x9241150676798b95, 0x9c6985f95335742c ] } ] None of this looks believable. So, just what is the rootbp in the uberblock_t referring to? thanks, max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
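[Editorial note] The offset arithmetic in this thread can be sketched concretely. Per the ZFS on-disk format paper, the offsets zdb prints are already shifted left by 9 (i.e., converted from 512-byte sectors to bytes), but they are relative to the start of the vdev's allocatable space, which begins 4 MB (0x400000: two 256K labels plus the 3.5M boot block region) into the device. Here is a small Python sketch of that arithmetic for a zdb-style DVA string; the helper names are mine, and it deliberately ignores raidz/mirror layout:

```python
VDEV_LABEL_OFFSET = 0x400000   # two 256K labels + 3.5M boot block precede allocatable space

def parse_zdb_dva(dva):
    """Parse a zdb-printed DVA like '<0:111f79000:200>' into (vdev, offset, asize)."""
    vdev, offset, asize = dva.strip('<>').split(':')
    return int(vdev), int(offset, 16), int(asize, 16)

def physical_offset(dva):
    """Byte offset on the vdev's device where the block's data starts."""
    _, offset, _ = parse_zdb_dva(dva)
    return offset + VDEV_LABEL_OFFSET
```

For DVA[0]=<0:111f79000:200> above this gives 0x112379000, the byte offset to seek to on the single-disk vdev (for a plain, single-device pool only).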
[zfs-discuss] ZFS and IBM's TSM
Does anyone have a customer using IBM Tivoli Storage Manager (TSM) with ZFS? I see that IBM has a client for Solaris 10, but does it work with ZFS? -- Dan Christensen System Engineer Sun Microsystems, Inc. Des Moines, IA 50266 US 877-263-2204 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss