Re: [zfs-discuss] [OpenIndiana-discuss] format dumps the core
> r...@tos-backup:~# pstack /dev/rdsk/core
> core '/dev/rdsk/core' of 1217: format
> fee62e4a UDiv       (4, 0, 8046c80, 80469a0, 8046a30, 8046a50) + 2a
> 08079799 auto_sense (4, 0, 8046c80, 0) + 281
> ...

Seems that one function call is missing in the back trace between
auto_sense and UDiv, because UDiv does not set up a complete stack frame.

Looking at the source ...

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/format/auto_sense.c#819

... you can get some extra debug output from format when you specify the
"-M" option. E.g. with a USB flash memory stick and "format -eM" I get:

# format -eM
Searching for disks...
c11t0d0: attempting auto configuration
Inquiry: 00 80 02 02 1f 00 00 00 53 61 6e 44 69 73 6b 20    SanDisk
         55 33 20 43 6f 6e 74 6f 75 72 20 20 20 20 20 20    U3 Contour
         34 2e 30                                           4.0
Product id: U3 Contour
Capacity: 00 7a 46 90 00 00 02 00
blocks:  8013456 (0x7a4690)
blksize: 512
disk name: `r `
Request sense for command mode sense failed
Sense data: f0 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mode sense page 0x3 failed
Request sense for command mode sense failed
Sense data: f0 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mode sense page 0x4 failed
Geometry:  pcyl: 1956   ncyl: 1954   heads: 128   nsects: 32   acyl: 2   bcyl: 0
           rpm: 0   nblocks: 8013457
The current rpm value 0 is invalid, adjusting it to 3600
Geometry after adjusting for capacity:
           pcyl: 1956   ncyl: 1954   heads: 128   nsects: 32   acyl: 2
           rpm: 3600
Partition 0:   128.00MB    64 cylinders
Partition 1:   128.00MB    64 cylinders
Partition 2:     3.82GB  1956 cylinders
Partition 6:     3.56GB  1825 cylinders
Partition 8:     2.00MB     1 cylinders
Inquiry: 00 00 03 02 1f 00 00 02 41 54 41 20 20 20 20 20    ATA
         48 69 74 61 63 68 69 20 48 54 53 37 32 33 32 33    Hitachi HTS72323
         43 33 30                                           C30
done
c11t0d0: configured with capacity of 3.82GB
Re: [zfs-discuss] [OpenIndiana-discuss] format dumps the core
> - Original Message - ...
> > r...@tos-backup:~# format
> > Searching for disks...Arithmetic Exception (core dumped)
>
> This error also seems to occur on osol 134. Any idea what this might be?

What stack backtrace is reported for that core dump ("pstack core")?
Re: [zfs-discuss] Root pool on boot drive lost on another machine because of devids
> I have a USB flash drive which boots up my opensolaris install.
> What happens is that whenever I move to a different machine,
> the root pool is lost because the devids don't match with what's
> in /etc/zfs/zpool.cache and the system just can't find the rpool.

See defect 4755 or defect 5484:

https://defect.opensolaris.org/bz/show_bug.cgi?id=4755
https://defect.opensolaris.org/bz/show_bug.cgi?id=5484

When I last experimented with booting Solaris from flash memory sticks,
I modified scsa2usb so that it would construct a devid for the USB flash
memory stick.
Re: [zfs-discuss] zfs periodic writes on idle system [Re: Getting desktop to auto sleep]
> Why does zfs produce a batch of writes every 30 seconds on opensolaris b134
> (5 seconds on a post b142 kernel), when the system is idle?

It was caused by the b134 gnome-terminal. I had an iostat running in a
gnome-terminal window, and the periodic iostat output is written to a
temporary file by gnome-terminal. This kept the hdd busy.

Older gnome-terminals (b111) didn't write terminal output to a disk file.

Workaround is to use xterm instead of the b134 gnome-terminal for a command
that periodically produces output.
[zfs-discuss] zfs periodic writes on idle system [Re: Getting desktop to auto sleep]
Why does zfs produce a batch of writes every 30 seconds on opensolaris b134
(5 seconds on a post b142 kernel), when the system is idle?

On an idle OpenSolaris 2009.06 (b111) system, /usr/demo/dtrace/iosnoop.d
shows no i/o activity for at least 15 minutes. The same dtrace test on an
idle b134 system shows a batch of writes every 30 seconds. And on current
opensolaris bits, on an idle system, I see writes every 5 seconds.

The periodic writes prevent the disk from entering power save mode, and
this breaks the /etc/power.conf autoS3 feature.

Why does zfs have to write something to disk when the system is idle?

> > Putting the flag does not seem to do anything to the
> > system. Here is my power.conf file:
> ...
> > autopm enable
> > autoS3 enable
> > S3-support enable
>
> Problem seems to be that all power managed devices must be at their
> lowest power level, otherwise autoS3 won't suspend the system. And
> somehow one or more device does not reach the lowest power level.
...
> The laptop still does not power down, because every 30 seconds there
> is a batch of writes to the hdd drive, apparently from zfs, and that
> keeps the hdd powered up.
>
> The periodic writes can be monitored with:
>
> dtrace -s /usr/demo/dtrace/iosnoop.d
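To find out which process keeps dirtying files, it can help to aggregate
one level above the disk: at io:::start most deferred ZFS writes are
attributed to sched (the txg sync thread), so counting write(2) calls per
process is usually more telling. A minimal sketch, nothing assumed beyond
the stock DTrace providers (press Ctrl-C to print the counts):

  # which processes issue write()/writev() calls on an otherwise "idle" system?
  dtrace -n 'syscall::write*:entry { @[execname] = count(); }'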
Re: [zfs-discuss] ZFS and 4kb sector Drives (All new western digital GREEN Drives?)
> It would be nice if the 32bit osol kernel support
> 48bit LBA

That is already supported, and has been for many years (otherwise disks
with a capacity >= 128GB could not be used with Solaris) ...

> (similar to linux, not sure if 32bit BSD supports 48bit LBA ), then
> the drive would probably work - perhaps later in the year we will have
> time to work on a patch to support 48bit lba on the 32bit osol
> kernels...

I think that - as a start - you have to eliminate the use of the
(signed 32-bit long on the 32-bit kernel) daddr_t data type in the kernel,
switch everything to the 64-bit diskaddr_t, and fix all device drivers that
are currently using daddr_t (including getting 3rd party device drivers
fixed).
Re: [zfs-discuss] opensolaris fresh install lock up
> > in the build 130 announcement you can find this:
> > 13540 Xserver crashes and freezes a system installed with LiveCD on bld 130
>
> It is for sure this bug. This is ok, i can do most of what i need via
> ssh. I just wasn't sure if it was a bug or if i had done something
> wrong... i had tried installing 2-3 times and it kept happening... was
> driving me insane.
>
> I can deal with it if it's something that will be fixed in 131 (which
> is what the bug page seems to hint at)

A part of the problem will be fixed in b131:

  CR 6913965
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913965

But it seems the segfault from

  CR 6913157
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913157

is not yet fixed in b131.
Re: [zfs-discuss] opensolaris fresh install lock up
> I just installed opensolaris build 130 which i downloaded from genunix.
> The install went fine... and the first reboot after install seemed to
> work but when i powered down and rebooted fully, it locks up as soon as
> i log in.

Hmm, seems you're asking in the wrong forum. Sounds more like a desktop
or x-window problem to me. Why do you think this is a zfs problem?

> Gnome is still showing the icon it shows when stuff hasn't finished
> loading... is there any way i can find out why it's locking up and how
> to fix it?

Hmm, in the build 130 announcement you can find this:
( http://www.opensolaris.org/jive/thread.jspa?threadID=120631&tstart=0 )

  13540 Xserver crashes and freezes a system installed with LiveCD on bld 130
  http://defect.opensolaris.org/bz/show_bug.cgi?id=13540

  After installation, the X server may crash and appears to not be
  restarted by the GNOME Display Manager (gdm).
  Work-around: None at this time.
Re: [zfs-discuss] I/O Read starvation
> > I wasn't clear in my description, I'm referring to ext4 on Linux. In
> > fact on a system with low RAM even the dd command makes the system
> > horribly unresponsive.
> >
> > IMHO not having fairshare or timeslicing between different processes
> > issuing reads is frankly unacceptable given a lame user can bring
> > the system to a halt with 3 large file copies. Are there ZFS
> > settings or Project Resource Control settings one can use to limit
> > abuse from individual processes?
>
> I am confused. Are you talking about ZFS under OpenSolaris, or are
> you talking about ZFS under Linux via Fuse?
>
> Do you have compression or deduplication enabled on the zfs filesystem?
>
> What sort of system are you using?

I was able to reproduce the problem running current (mercurial)
opensolaris bits, with the "dd" command:

  dd if=/dev/urandom of=largefile.txt bs=1048576k count=8

dedup is off, compression is on. System is a 32-bit laptop with 2GB of
memory, single core cpu. The system was unusable / unresponsive for about
5 minutes before I was able to interrupt the dd process.
Re: [zfs-discuss] ZFS dedup accounting & reservations
> But: Isn't there an implicit expectation for a space guarantee
> associated with a dataset? In other words, if a dataset has 1GB of
> data, isn't it natural to expect to be able to overwrite that space
> with other data?

Is there such a space guarantee for compressed or cloned zfs?
Re: [zfs-discuss] ZFS dedup accounting
> Well, then you could have more "logical space" than "physical space",
> and that would be extremely cool,

I think we already have that, with zfs clones. I often clone a zfs onnv
workspace, and everything is "deduped" between the zfs parent snapshot and
the clone filesystem. The clone (initially) needs no extra zpool space.

And with a zfs clone I can actually use all the remaining free space from
the zpool. With zfs deduped blocks, I can't ...

> but what happens if for some reason you wanted to turn off dedup on one
> of the filesystems? It might exhaust all the pool's space to do this.

As far as I understand it, nothing happens to existing deduped blocks when
you turn off dedup for a zfs filesystem. The new dedup=off setting affects
newly written blocks only.
Re: [zfs-discuss] ZFS dedup issue
> I think I'm observing the same (with changeset 10936) ...

# mkfile 2g /var/tmp/tank.img
# zpool create tank /var/tmp/tank.img
# zfs set dedup=on tank
# zfs create tank/foobar

> dd if=/dev/urandom of=/tank/foobar/file1 bs=1024k count=512
512+0 records in
512+0 records out
> cp /tank/foobar/file1 /tank/foobar/file2
> cp /tank/foobar/file1 /tank/foobar/file3
> cp /tank/foobar/file1 /tank/foobar/file4
/tank/foobar/file4: No space left on device

> zfs list -r tank
NAME          USED  AVAIL  REFER  MOUNTPOINT
tank         1.95G      0    22K  /tank
tank/foobar  1.95G      0  1.95G  /tank/foobar

> zpool list tank
NAME   SIZE   USED  AVAIL    CAP  DEDUP  HEALTH  ALTROOT
tank  1.98G   515M  1.48G    25%  3.90x  ONLINE  -
Re: [zfs-discuss] ZFS dedup issue
> So.. it seems that data is deduplicated, zpool has 54.1G of free space,
> but I can use only 40M.
>
> It's x86, ONNV revision 10924, debug build, bfu'ed from b125.

I think I'm observing the same (with changeset 10936) ...

I created a 2GB file, and a "tank" zpool on top of that file, with
compression and dedup enabled:

  mkfile 2g /var/tmp/tank.img
  zpool create tank /var/tmp/tank.img
  zfs set dedup=on tank
  zfs set compression=on tank

Now I tried to create four zfs filesystems, and filled them by pulling and
updating the same set of onnv sources from mercurial. One copy needs
~ 800MB of disk space uncompressed, or ~ 520MB compressed.

During the 4th "hg update":

> hg update
abort: No space left on device: /tank/snv_128_yy/usr/src/lib/libast/sparcv9/src/lib/libast/FEATURE/common

> zpool list tank
NAME   SIZE   USED  AVAIL    CAP  DEDUP  HEALTH  ALTROOT
tank  1,98G   720M  1,28G    35%  3.70x  ONLINE  -

> zfs list -r tank
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank             1,95G      0    26K  /tank
tank/snv_128      529M      0   529M  /tank/snv_128
tank/snv_128_jk   530M      0   530M  /tank/snv_128_jk
tank/snv_128_xx   530M      0   530M  /tank/snv_128_xx
tank/snv_128_yy   368M      0   368M  /tank/snv_128_yy
Re: [zfs-discuss] Change physical path to a zpool.
> I have a functional OpenSolaris x64 system on which I need to physically
> move the boot disk, meaning its physical device path will change and
> probably its cXdX name.
>
> When I do this the system fails to boot ...
> How do I inform ZFS of the new path? ...
> Do I need to boot from the LiveCD and then import the pool from its new
> path?

Exactly. Boot from the livecd with the disk connected on the new physical
path, and run "pfexec zpool import -f rpool", followed by a reboot. That'll
update the zpool's label with the new physical device path information.
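If you want to verify what actually ended up in the on-disk label, zdb can
dump it. A minimal sketch (the slice name is only an example, use your real
boot slice):

  # print the vdev labels; look at the 'path', 'phys_path' and 'devid' entries
  pfexec zdb -l /dev/rdsk/c0t0d0s0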
Re: [zfs-discuss] Install and boot from USB stick?
> Does this give you anything?
>
> http://bildr.no/view/460193

That looks like the zfs mountroot panic you get when the root disk was
moved to a different physical location (e.g. a different usb port). In this
case the physical device path recorded in the zpool's on-disk label cannot
be used to access the root disk.

Updating the physical device path works by booting the livecd and running
"zpool import -f rpool". zpool import will rewrite the on-disk label with
the new physical device path, so that the next boot from the device should
work.

In theory, zfs is able to find the disk in any other physical location
using the storage device's "devid" property. Unfortunately most usb sticks
use the device type of "removable media disk" devices, and in this case
Solaris won't generate "devids" for the usb stick. And the result is that
the root disk can't be found when it was moved around and connected to a
different usb port.
Re: [zfs-discuss] Install and boot from USB stick?
> No there was no error level fatal.
>
> Well, here is what I have tried since:
>
> a) I've tried to install a custom grub like described here:
> http://defect.opensolaris.org/bz/show_bug.cgi?id=4755#c28
> With that in place, I just get the grub prompt. I've tried to
> zpool import -f rpool when this occurred (I read somewhere that it
> might help, but it didn't).

The grub in OS 2009.06 should not be affected by bug 4755 any more.
I think the grub from bug 4755 comment 28 is too old and does not support
the latest zpool format version updates, so that it can't read from a
current (version 17?) zpool.

> b) I noticed when booting from the livecd (text mode), with the newly
> installed usb stick in, i get this:
> http://bildr.no/view/460143

Hmm, seems that Solaris' disk driver is receiving bogus "mode sense" data
from the usb stick? I think with scsa2usb.conf changed and the reduced
command set enabled, we could avoid sending mode sense commands to the usb
flash stick...

> And then, when i imported the zpool to edit scsa2usb.conf, I get these
> messages again:
> http://bildr.no/view/460144
> Then, when i were done editing scsa2usb.conf, and rebooted, those same
> messages appear once more.

Hmm, so we can get these unit attention messages both when booted from the
usb stick, and when booted from the live cd.

> c) I've tried to edit grub after rebooting from a fresh install,
> removing splashimage, back/front color, and 'console=graphics', and
> adding '-v' after 'kernel$'. When doing this, nothing happens. I press
> 'b' to boot, the menu list disappears, but the grub image is still
> there (the splashimage where the logo is placed down right).

Should work, it should boot the kernel in text mode; I just tested it with
an OS 2009.06 install / virtualbox guest.

> d) I've noticed that after installation, the installation log says the
> same as this:
> http://defect.opensolaris.org/bz/show_bug.cgi?id=4755#c25
>
> I'm running out of ideas. I've seen someone mention that you can
> replace $ZFSBOOT (cant remember the correct variable name atm.) with
> the full path of the usb stick in grub.

Probably something like this:
http://www.opensolaris.org/os/community/xen/devdocs/install-depedencies/

  "-B zfs-bootfs=rpool/57, "

Problem is that the zfs id of the boot filesystem (57) isn't fixed, and
changes when you create new boot environments. And I think the id could
change between different opensolaris releases. (See the zdb sketch at the
end of this message for one way to look up that id.)

And I'm not sure how far you get with booting from the usb stick; I suspect
that your system has already mounted the zfs root filesystem just fine, but
gets into trouble due to these "unit attention" / "medium may have changed"
events received from the usb stick.

Maybe you could try to boot in text mode from the usb stick with options
" -kv", and when the system is stuck with booting, enter the kernel
debugger by pressing "F1-a", and print a process tree listing with the kmdb
::ptree command? If we get a process tree listing, that would be an
indication how far the kernel got with the boot process.

> Im going to try OS 2008.05, 2008.11 and the latest dev build of OS from
> genunix.org to see if any of those are able to create a proper
> installation. I've seen so many complaints around these three builds
> regarding USB drives that maybe it will work.

Do you have a non-Kingston usb flash memory stick (or e.g. a usb hard disk
drive) that you could try instead?
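A hedged sketch of how to look up the dataset id used in the zfs-bootfs
value (the dataset name is just an example; point it at your actual boot
environment):

  # the "ID" field in the dataset header is the number in zfs-bootfs=rpool/<ID>
  pfexec zdb -d rpool/ROOT/opensolaris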
Re: [zfs-discuss] Install and boot from USB stick?
> > Are there any messages with "Error level: fatal" ?
>
> Not that I know of, however, i can check. But im unable to find out
> what to change in grub to get verbose output rather than just the
> splashimage.

Edit the grub commands: delete all splashimage, foreground and background
lines, and delete the console=graphics option from the kernel$ line. To
enable verbose kernel messages, append the kernel boot option " -v" at the
end of the kernel$ boot command line.
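For illustration, a stock OpenSolaris 2009.06 menu.lst entry edited that
way might look roughly like this (pool and BE names are the installer
defaults; adjust to whatever your own menu.lst contains):

  title OpenSolaris 2009.06 (verbose text boot)
  findroot (pool_rpool,0,a)
  bootfs rpool/ROOT/opensolaris
  kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -v
  module$ /platform/i86pc/$ISADIR/boot_archive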
Re: [zfs-discuss] Install and boot from USB stick?
> Nah, that didnt seem to do the trick.
>
> After unmounting and rebooting, i get the same error msg from my
> previous post.

Did you get these scsi error messages during installation to the usb stick,
too?

Another thing that confuses me: the unit attention / medium may have
changed message is using "error level: retryable". I think the sd disk
driver is supposed to just retry the read or write operation. The message
seems more like a warning message, not a fatal error.

Are there any messages with "Error level: fatal" ?
Re: [zfs-discuss] Install and boot from USB stick?
> How can i implement that change, after installing the OS? Or do I need
> to build my own livecd?

Boot from the livecd, attach the usb stick, open a terminal window;
"pfexec bash" starts a root shell, "zpool import -f rpool" should find and
import the zpool from the usb stick.

Mount the root filesystem from the usb stick:

  zfs set mountpoint=legacy rpool/ROOT/opensolaris
  mount -F zfs rpool/ROOT/opensolaris /mnt

And edit /mnt/kernel/drv/scsa2usb.conf. E.g. try

  attribute-override-list = "vid=* reduced-cmd-support=true";

Try to boot from the usb stick, using the "reboot" command.
Re: [zfs-discuss] Install and boot from USB stick?
> Well, here is the error:
>
> ... usb stick reports(?) scsi error: medium may have changed ...

That's strange. The media in a flash memory stick can't be changed -
although most sticks report that they do have removable media.

Maybe this stick needs one of the workarounds that can be enabled in
/kernel/drv/scsa2usb.conf ?
Re: [zfs-discuss] Install and boot from USB stick?
> I've found it only works for USB sticks up to 4GB :(
> If I tried a USB stick bigger than that, it didn't boot.

Works for me on 8GB USB sticks. It is possible that the stick you've tried
has some issues with the Solaris USB drivers, and needs to have one of the
workarounds from the scsa2usb.conf file enabled.
Re: [zfs-discuss] Install and boot from USB stick?
> The GRUB menu is presented, no problem there, and then the opensolaris
> progress bar. But im unable to find a way to view any details on whats
> happening there. The progress bar just keeps scrolling and scrolling.

Press the ESC key; this should switch back from graphics to text mode and
most likely you'll see that the OS is waiting for some console user input.
Re: [zfs-discuss] zfs on 32 bit?
> > 32 bit Solaris can use at most 2^31 as disk address; a disk block is
> > 512 bytes, so in total it can address 2^40 bytes.
> >
> > A SMI label found in Solaris 10 (update 8?) and OpenSolaris has been
> > enhanced and can address 2TB but only on a 64 bit system.
>
> is what the problem is. so 32-bit zfs cannot use disks larger than
> 1(.09951)tb regardless of whether it's for the root pool or not.

I think this isn't a problem with the 32-bit zfs module, but with all of
the 32-bit Solaris kernel.

The daddr_t type is used in a *lot* of places, and is defined as a signed
32-bit integer ("long") in the 32-bit kernel. It seems that there already
are 64-bit disk address types defined, diskaddr_t and lldaddr_t (that could
be used in the 32-bit kernel, too), but a lot of the existing kernel code
doesn't use them. And redefining the existing daddr_t type to 64-bit
"long long" for the 32-bit kernel won't work, because it would break binary
compatibility.
Re: [zfs-discuss] moving a disk between controllers
> I had a system with its boot drive attached to a backplane which worked
> fine. I tried moving that drive to the onboard controller and a few
> seconds into booting it would just reboot.

In certain cases zfs is able to find the drive on the new physical device
path (IIRC: when the disk's "devid" didn't change and the new physical
location of the disk is already present in /etc/devices/devid_cache).

But in most cases you have to boot from the installation media and
"zpool import -f rpool" the pool, with the disk attached at the new
physical device path, so that the new physical device path gets recorded
in the zpool's on-disk label.
Re: [zfs-discuss] zfs on 32 bit?
> Not a ZFS bug. IIRC, the story goes something like this: a SMI label
> only works to 1 TByte, so to use > 1 TByte, you need an EFI label. For
> older x86 systems -- those which are 32-bit -- you probably have a BIOS
> which does not handle EFI labels. This will become increasingly
> irritating since 2 TByte disks are now hitting the store shelves, but
> it doesn't belong in a ZFS category.

Hasn't the 1TB limit for SMI labels been fixed (= limit raised to 2TB) by
"PSARC/2008/336 Extended VTOC" ?

http://www.opensolaris.org/os/community/on/flag-days/pages/2008091102/

But there still is a 1TB limit for the 32-bit kernel; the PSARC case
includes this:

  The following functional limitations are applicable:
  * 32-bit kernel will not support disks > 1 TB.
  ...

Btw. on older Solaris releases the install media always booted into a
32-bit kernel, even on systems that are capable of running the 64-bit
kernel. This seems to have been changed with the latest opensolaris
releases and that PSARC case, so that 64-bit systems can install to a
disk > 1TB.
Re: [zfs-discuss] zfs on 32 bit?
> besides performance aspects, what's the cons of running zfs on 32 bit ?

The default 32 bit kernel can cache a limited amount of data (< 512MB) -
unless you lower the "kernelbase" parameter. In the end the small cache
size on 32 bit explains the inferior performance compared to the 64 bit
kernel.
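For what it's worth, a rough sketch of how kernelbase can be lowered on an
x86 box (the value is only an example; lowering it gives the kernel - and
thus the cache - more virtual address space, at the cost of user process
address space, and is untested on your configuration):

  # record the boot property, then reboot for it to take effect
  pfexec eeprom kernelbase=0x80000000
  pfexec reboot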
Re: [zfs-discuss] ZFS snapshot splitting & joining
> The problem was with the shell. For whatever reason, /usr/bin/ksh
> can't rejoin the files correctly. When I switched to /sbin/sh, the
> rejoin worked fine, the cksum's matched, ...
>
> The ksh I was using is:
>
> # what /usr/bin/ksh
> /usr/bin/ksh:
> Version M-11/16/88i
> SunOS 5.10 Generic 118873-04 Aug 2006
>
> So, is this a bug in the ksh included with Solaris 10?

Are you able to reproduce the issue with a script like this (needs ~ 200
gigabytes of free disk space) ?  I can't...

==
% cat split.sh
#!/bin/ksh

bs=1k
count=`expr 57 \* 1024 \* 1024`
split_bs=8100m

set -x

dd if=/dev/urandom of=data.orig bs=${bs} count=${count}
split -b ${split_bs} data.orig data.split.
ls -l data.split.*
cat data.split.a[a-z] > data.join
cmp -l data.orig data.join
==

On SX:CE / OpenSolaris the same version of /bin/ksh = /usr/bin/ksh is
present:

% what /usr/bin/ksh
/usr/bin/ksh:
        Version M-11/16/88i
        SunOS 5.11 snv_104 November 2008

I did run the script in a directory in an uncompressed zfs filesystem:

% ./split.sh
+ dd if=/dev/urandom of=data.orig bs=1k count=59768832
59768832+0 records in
59768832+0 records out
+ split -b 8100m data.orig data.split.
+ ls -l data.split.aa data.split.ab data.split.ac data.split.ad data.split.ae data.split.af data.split.ag data.split.ah
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:31 data.split.aa
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:35 data.split.ab
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:39 data.split.ac
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:43 data.split.ad
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:48 data.split.ae
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:53 data.split.af
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:58 data.split.ag
-rw-r--r--   1 jk   usr  1749024768 Feb 12 18:58 data.split.ah
+ cat data.split.aa data.split.ab data.split.ac data.split.ad data.split.ae data.split.af data.split.ag data.split.ah
+ 1> data.join
+ cmp -l data.orig data.join
2002.33u 2302.05s 1:51:06.85 64.5%

As expected, it works without problem. The files are bit for bit identical
after splitting and joining.

For me this looks more as if your hardware is broken:
http://opensolaris.org/jive/thread.jspa?messageID=338148

A single bad bit (!) in the middle of the joined file is very suspicious...
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> bash-3.00# zfs mount usbhdd1
> cannot mount 'usbhdd1': E/A-Fehler
> bash-3.00#

Why is there an I/O error?

Is there any information logged to /var/adm/messages when this I/O error
is reported? E.g. timeout errors for the USB storage device?
Re: [zfs-discuss] zpool import of bootable root pool renders it unbootable
> Again, what I'm trying to do is to boot the same OS from a physical
> drive - once natively on my notebook, the other time from within
> Virtualbox. There are two problems, at least. First is the bootpath as
> in VB it emulates the disk as IDE while booting natively it is sata.

When I started experimenting with installing SXCE to a USB flash memory
stick, which should be bootable on different machines, I initially worked
around this problem by creating multiple /boot/grub/menu.lst boot entries,
one for each supported bootable machine. The difference between the grub
boot entries was the "-B bootpath=/physical/device/path" option.

Fortunately, this has become much easier in recent builds, because zfs boot
is now able to open the pool by using a disk's unique "devid" in addition
to using physical device paths. Whatever the physical device path will be
on a randomly selected x86 machine where I try to boot my usb flash memory
stick, the sd driver will always generate the same unique "devid" for the
flash memory stick, and zfs boot is able to find and open the usb storage
device in the system that has the desired "devid" for the pool.

In case sata on the notebook creates the same "devid" for the disk as
virtualbox with p-ata, the zpool should be bootable just fine on the two
different boxes. But apparently the "devid" created for the disk with sata
on the notebook is different from the "devid" created for the disk when
running under virtualbox... (that is, pool open by physical device path
and by devid fails)

I guess what we need is the fix for this bug, which allows opening the pool
by the boot disk's unique "guid":

  Bug ID   6513775
  Synopsis zfs root disk portability
  http://bugs.opensolaris.org/view_bug.do?bug_id=6513775

> The other one seems to be hostid stored in a pool.

This shouldn't be a problem for x86, because the hostid is stored in a file
(the sysinit kernel module) in the root filesystem, on x86. Wherever you
boot that disk, the hostid will move with it.

Well, unless you boot some other installed Solaris / OpenSolaris system
(which has its own unique hostid / sysinit file) and import that zfs root
pool. In this case the hostid stored in the zpool label will change.

This should change in build 100, with the putback for this bug:

  Bug ID   6716241
  Synopsis Changing hostid, by moving in a new sysinit file, panics a zfs
           root file system
  http://bugs.opensolaris.org/view_bug.do?bug_id=6716241

AFAIR, a hostid mismatch will be ignored when mounting a zfs root file
system.
Re: [zfs-discuss] zpool import of bootable root pool renders it unbootable
> Cannot mount root on /[EMAIL PROTECTED],0/pci103c,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a fstype zfs

Is that physical device path correct for your new system? Or is this the
physical device path (stored on-disk in the zpool label) from some other
system?

In this case you may be able to work around the problem by passing a
"-B bootpath=..." option to the kernel, e.g. something like this:

  kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,bootpath="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a"

You can find out the correct physical device path string for the zfs root
disk by booting the system from the optical installation media and running
the format utility.

OTOH, if you have already booted from the optical installation media, it's
easiest to just import the root zpool from the installation system, because
that'll update the physical device path in the zpool's label on disk (and
it clears the hostid stored in the zpool label - another problem that could
prevent mounting the zfs root).
Re: [zfs-discuss] SIL3124 stability?
> The lock I observed happened inside the BIOS of the card after the main
> board BIOS jumped into the board BIOS. This was before any bootloader
> was involved.

Is there a disk using a zpool with an EFI disk label?

Here's a link to an old thread about systems hanging in BIOS POST when they
see disks with EFI disk labels:

http://www.opensolaris.org/jive/thread.jspa?messageID=18211
Re: [zfs-discuss] CF to SATA adapters for boot device
> What Windows utility are you talking about? I have used the Sandisk
> utility program to remove the U3 Launchpad (which creates a permanent
> hsfs partition in the flash disk), but it does not help the problem.

That's the problem: most usb sticks don't require any special software and
just work with the OS' usb mass storage support.

IIRC, I once had a 128MB Prolific USB stick (sold as Kingmax) which could
be partitioned into two devices, one of them could be password protected,
and it was possible to configure one of the mass storage devices as "HDD"
(= fixed media) or "FDD" (= floppy / removable media). There was an extra
windows utility to use/configure these extra features.
Re: [zfs-discuss] CF to SATA adapters for boot device
W. Wayne Liauh wrote:

> If you are running B95, that "may" be the problem. I have no problem
> booting B93 (& previous builds) from a USB stick, but B95, which has a
> newer version of ZFS, does not allow me to boot from it (& the USB
> stick was of course recognized during installation of B95, just won't
> boot).

I suspect the problem with ZFS boot from USB sticks is that the kernel does
not create "devid" properties for the USB stick, and apparently those
devids are now required for zfs booting.

The kernel (sd driver) does not create "devid" properties for USB flash
memory sticks, because most (all ?) of them nowadays report that they use
removable media - which is a lie, I'm not able to change the media / flash
roms in such a device.

If you have a windows utility distributed with your flash memory stick that
allows configuration of the removable media attribute: try to set it to
"fixed media". For such a usb storage device with fixed media, the sd(7d)
driver should create "devid" properties, and zfs booting works just fine
for such a usb flash memory stick.

Btw. you can view the "removable media" attribute with the command
"cdrecord -scanbus". I'm getting this, for two different usb flash memory
sticks (note: it reports "Removable Disk", not just "Disk"):

scsibus7:
        7,0,0    700) 'Samsung ' 'Mighty Drive' 'PMAP' Removable Disk
scsibus10:
        10,0,0  1000) 'OCZ     ' 'ATV     ' '1100' Removable Disk
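A hedged way to double-check whether the kernel actually generated a devid
for the stick (the exact output varies; if no 'devid' property shows up for
the disk node, zfs boot has nothing to match against):

  # dump the device tree properties and look for a devid on the usb disk node
  prtconf -v | grep -i devid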
Re: [zfs-discuss] error found while scrubbing, how to fix it?
> On 08/21/08 17:26, Jürgen Keil wrote:
> > Looks like bug 6727872, which is fixed in build 96.
> > http://bugs.opensolaris.org/view_bug.do?bug_id=6727872
>
> that pool contains normal OpenSolaris mountpoints,

Did you upgrade the opensolaris installation in the past? AFAIK the
opensolaris upgrade procedure results in cloned zfs filesystems

  rpool/ROOT/opensolaris, rpool/ROOT/opensolaris-1, ..., rpool/ROOT/opensolaris-N

and only the latest one is mounted; the other (older) zfs root filesystems
are unmounted.

> what do you mean about umounting and remounting it?

The bug happens with unmounted filesystems, so you need to mount them
first, then umount. Something like

  mount -F zfs rpool/ROOT/opensolaris /mnt && umount /mnt
  mount -F zfs rpool/ROOT/opensolaris-1 /mnt && umount /mnt
  ...

> I need to do this with a live cd?

No. You can do that when the system is booted from the hdd.
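If there are many of them, a small loop over all filesystems with a legacy
mountpoint can do the mount/umount round trip. A rough, untested sketch: it
assumes /mnt is free and silently skips anything that can't be mounted
(e.g. the currently active root):

  for fs in $(zfs list -H -t filesystem -o name,mountpoint | awk '$2 == "legacy" {print $1}')
  do
      mount -F zfs "$fs" /mnt && umount /mnt
  done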
Re: [zfs-discuss] error found while scrubbing, how to fix it?
> I have OpenSolaris (snv_95) installed into my laptop (single sata disk)
> and tomorrow I updated my pool with:
>
> # zpool -V 11 -a
>
> and after I start a scrub into the pool with:
>
> # zpool scrub rpool
> # zpool status -vx
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       ONLINE       0     0     4
>           c5t0d0s0  ONLINE       0     0     4
>
> errors: Permanent errors have been detected in the following files:
>
>         :<0x0>

Looks like bug 6727872, which is fixed in build 96.
http://bugs.opensolaris.org/view_bug.do?bug_id=6727872

Do you have unmounted zfs filesystems that use legacy mountpoints?

If yes: Don't worry, the hdd and the zpool on it should be fine. Workaround
is to mount and unmount all those zfs filesystems, and on the next zpool
scrub there should be no more checksum errors.
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
I wrote:

> Bill Sommerfeld wrote:
> > On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > > I ran a scrub on a root pool after upgrading to snv_94, and got
> > > > checksum errors:
> > >
> > > Hmm, after reading this, I started a zpool scrub on my mirrored pool,
> > > on a system that is running post snv_94 bits: It also found checksum
> > > errors
> >
> > once is accident. twice is coincidence. three times is enemy action :-)
> >
> > I'll file a bug as soon as I can
>
> I filed 6727872, for the problem with zpool scrub checksum errors
> on unmounted zfs filesystems with an unplayed ZIL.

6727872 has already been fixed, in what will become snv_96. For my zpool,
zpool scrub doesn't report checksum errors any more.

But: something is still a bit strange with the data reported by zpool
status. The error counts displayed by zpool status are all 0 (during the
scrub, and when the scrub has completed), but when zpool scrub completes it
tells me that "scrub completed after 0h58m with 6 errors". But it doesn't
list the errors.

# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub in progress for 0h57m, 99.39% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        files         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0s6  ONLINE       0     0     0
            c9t0d0s6  ONLINE       0     0     0

errors: No known data errors

# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h58m with 6 errors on Wed Jul 23 18:23:00 2008
config:

        NAME          STATE     READ WRITE CKSUM
        files         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0s6  ONLINE       0     0     0
            c9t0d0s6  ONLINE       0     0     0

errors: No known data errors
Re: [zfs-discuss] Moving ZFS root pool to different system breaks boot
> Recently, I needed to move the boot disks containing a ZFS root pool in
> an Ultra 1/170E running snv_93 to a different system (same hardware)
> because the original system was broken/unreliable.
>
> To my dismay, unlike with UFS, the new machine wouldn't boot:
>
> WARNING: pool 'root' could not be loaded as it was
> last accessed by another system (host: hostid: 0x808f7fd8).
> See: http://www.sun.com/msg/ZFS-8000-EY
>
> panic[cpu0]/thread=180e000: BAD TRAP: type=31 rp=180acc0 addr=0 mmu_fsr=0
> occurred in module "unix" due to a NULL pointer dereference
...
> suffering from the absence of SPARC failsafe archives after liveupgrade
> (recently mentioned on install-discuss), I'd have been completely stuck.

Yes, on x86 you can boot into failsafe and let it mount the root pool under
/a and then reboot. This removes the hostid from the configuration
information in the zpool's label.

I guess that on SPARC you could boot from the installation optical media
(or from a network server), and zpool import -f the root pool; that should
put the correct hostid into the root pool's label.
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
Bill Sommerfeld wrote:

> On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > I ran a scrub on a root pool after upgrading to snv_94, and got
> > > checksum errors:
> >
> > Hmm, after reading this, I started a zpool scrub on my mirrored pool,
> > on a system that is running post snv_94 bits: It also found checksum
> > errors
>
> once is accident. twice is coincidence. three times is enemy action :-)
>
> I'll file a bug as soon as I can

I filed 6727872, for the problem with zpool scrub checksum errors on
unmounted zfs filesystems with an unplayed ZIL.
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
Rustam wrote:

> I'm living with this error for almost 4 months and probably have a
> record number of checksum errors:
>
> # zpool status -xv
>   pool: box5
...
> errors: Permanent errors have been detected in the following files:
>
>         box5:<0x0>
>
> I've Sol 10 U5 though.

I suspect that this (S10u5) is a different issue, because for my system's
pool it seems to be caused by the opensolaris putback on July 07th for
these fixes:

  6343667 scrub/resilver has to start over when a snapshot is taken
  6343693 'zpool status' gives delayed start for 'zpool scrub'
  6670746 scrub on degraded pool return the status of 'resilver completed'?
  6675685 DTL entries are lost resulting in checksum errors
  6706404 get_history_one() can dereference off end of hist_event_table[]
  6715414 assertion failed: ds->ds_owner != tag in dsl_dataset_rele()
  6716437 ztest gets SEGV in arc_released()
  6722838 bfu does not update grub
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
Bill Sommerfeld wrote:

> On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > I ran a scrub on a root pool after upgrading to snv_94, and got
> > > checksum errors:
> >
> > Hmm, after reading this, I started a zpool scrub on my mirrored pool,
> > on a system that is running post snv_94 bits: It also found checksum
> > errors
>
> out of curiosity, is this a root pool?

It started as a standard pool, and is using the version 3 zpool format.
I'm using a small ufs root, and have /usr as a zfs filesystem on that pool.
At some point in the past i did set up a zfs root and /usr filesystem for
experimenting with xVM unstable bits.

> A second system of mine with a mirrored root pool (and an additional
> large multi-raidz pool) shows the same symptoms on the mirrored root
> pool only.
>
> once is accident. twice is coincidence. three times is enemy action :-)
>
> I'll file a bug as soon as I can (I'm travelling at the moment with
> spotty connectivity), citing my and your reports.

Btw. I also found the scrub checksum errors on a non-mirrored zpool (laptop
with only one hdd). And on one zpool that was using a non-mirrored, striped
pool on two S-ATA drives.

I think that in my case the cause for the scrub checksum errors is an open
ZIL transaction on an *unmounted* zfs filesystem. In the past such a zfs
state prevented creating snapshots for the unmounted zfs, see bugs 6482985,
6462803. That is still the case. But now it also seems to trigger checksum
errors for a zpool scrub.

Stack backtrace for the ECKSUM (which gets translated into EIO errors in
arc_read_done()):

  1  64703  arc_read_nolock:return, rval 5
              zfs`zil_read_log_block+0x140
              zfs`zil_parse+0x155
              zfs`traverse_zil+0x55
              zfs`scrub_visitbp+0x284
              zfs`scrub_visit_rootbp+0x4e
              zfs`scrub_visitds+0x82
              zfs`dsl_pool_scrub_sync+0x109
              zfs`dsl_pool_sync+0x158
              zfs`spa_sync+0x254
              zfs`txg_sync_thread+0x226
              unix`thread_start+0x8

Does a "zdb -ivv {pool}" report any ZIL headers with a claim_txg != 0 on
your pools? Is the dataset that is associated with such a ZIL an unmounted
zfs?

# zdb -ivv files | grep claim_txg
        ZIL header: claim_txg 5164405, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 5164405, seq 0
        ZIL header: claim_txg 0, seq 0

# zdb -i files/matrix-usr
Dataset files/matrix-usr [ZPL], ID 216, cr_txg 5091978, 2.39G, 192089 objects
        ZIL header: claim_txg 5164405, seq 0
        first block: [L0 ZIL intent log] 1000L/1000P DVA[0]=<0:12421e:1000> zilog uncompressed LE contiguous birth=5163908 fill=0 cksum=c368086f1485f7c4:39a549a81d769386:d8:3
        Block seqno 3, already claimed, [L0 ZIL intent log] 1000L/1000P DVA[0]=<0:12421e:1000> zilog uncompressed LE contiguous birth=5163908 fill=0 cksum=c368086f1485f7c4:39a549a81d769386:d8:3

On two of my zpools I've eliminated the zpool scrub checksum errors by
mounting / unmounting the zfs with the unplayed ZIL.
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
Miles Nordin wrote:

> "jk" == Jürgen Keil <[EMAIL PROTECTED]> writes:
>
>     jk> And a zpool scrub under snv_85 doesn't find checksum errors, either.
>
> how about a second scrub with snv_94? are the checksum errors gone
> the second time around?

Nope.

I've now seen this problem on 4 zpools on three different systems. Post
snv_94 (bfu'ed) reports checksum errors during scrub, and the scrub under
the original nevada release (snv_85, snv_89 and snv_91) didn't report
checksum errors.
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
> > I ran a scrub on a root pool after upgrading to snv_94, and got
> > checksum errors:
>
> Hmm, after reading this, I started a zpool scrub on my mirrored pool,
> on a system that is running post snv_94 bits: It also found checksum
> errors
...
> OTOH, trying to verify checksums with zdb -c didn't find any problems:

And a zpool scrub under snv_85 doesn't find checksum errors, either.
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
> I ran a scrub on a root pool after upgrading to snv_94, and got
> checksum errors:

Hmm, after reading this, I started a zpool scrub on my mirrored pool, on a
system that is running post snv_94 bits: It also found checksum errors

# zpool status files
  pool: files
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h46m with 9 errors on Fri Jul 18 13:33:56 2008
config:

        NAME          STATE     READ WRITE CKSUM
        files         DEGRADED     0     0    18
          mirror      DEGRADED     0     0    18
            c8t0d0s6  DEGRADED     0     0    36  too many errors
            c9t0d0s6  DEGRADED     0     0    36  too many errors

errors: No known data errors

Adding the -v option to zpool status returned:

errors: Permanent errors have been detected in the following files:

        :<0x0>

OTOH, trying to verify checksums with zdb -c didn't find any problems:

# zdb -cvv files

Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:         2804880
        bp logical:    121461614592     avg:  43303
        bp physical:    84585684992     avg:  30156    compression:  1.44
        bp allocated:   85146115584     avg:  30356    compression:  1.43
        SPA allocated:  85146115584    used: 79.30%

951.08u 419.55s 2:24:34.32 15.8%
#
Re: [zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Mike Gerdts wrote:

> By default, only kernel memory is dumped to the dump device. Further,
> this is compressed. I have heard that 3x compression is common and
> the samples that I have range from 3.51x - 6.97x.

My samples are in the range 1.95x - 3.66x. And yes, I lost a few crash
dumps on a box with a 2GB swap slice, after physical memory was upgraded
from 4GB to 8GB.

% grep "pages dumped" /var/adm/messages*
/var/adm/messages:Jun 27 13:43:56 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 593680 pages dumped, compression ratio 3.51,
/var/adm/messages.0:Jun 25 13:08:22 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 234922 pages dumped, compression ratio 2.39,
/var/adm/messages.1:Jun 12 13:22:53 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 399746 pages dumped, compression ratio 1.95,
/var/adm/messages.1:Jun 12 19:00:01 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 245417 pages dumped, compression ratio 2.41,
/var/adm/messages.1:Jun 16 19:15:37 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 710001 pages dumped, compression ratio 3.48,
/var/adm/messages.1:Jun 16 19:21:35 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 315989 pages dumped, compression ratio 3.66,
/var/adm/messages.2:Jun 11 15:40:32 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 341209 pages dumped, compression ratio 2.68,
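On a system that already uses a ZFS volume as the dedicated dump device,
checking and growing it after a memory upgrade is straightforward. A hedged
sketch (pool/volume names are the usual installer defaults and the size is
only an example):

  # show the currently configured dump device and dump content type
  pfexec dumpadm

  # grow the dump zvol and re-register it with dumpadm
  pfexec zfs set volsize=4g rpool/dump
  pfexec dumpadm -d /dev/zvol/dsk/rpool/dump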
Re: [zfs-discuss] ZFS boot issues on older P3 system.
> I wanted to resurrect an old dual P3 system with a couple of IDE drives
> to use as a low power quiet NIS/DHCP/FlexLM server so I tried installing
> ZFS boot from build 90.
>
> Jun 28 16:09:19 zack scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED] (ata0):
> Jun 28 16:09:19 zack    timeout: abort request, target=0 lun=0

I suspect that the root cause for these timeout bugs on MP systems is
6657646, and it is supposed to be fixed in snv_92:

http://bugs.opensolaris.org/view_bug.do?bug_id=6657646
Re: [zfs-discuss] ZFS very slow under xVM
> I've got Solaris Express Community Edition build 75 (75a) installed on
> an Asus P5K-E/WiFI-AP (ip35/ICH9R based) board. CPU=Q6700, RAM=8Gb,
> disk=Samsung HD501LJ and (older) Maxtor 6H500F0.
>
> When the O/S is running on bare metal, ie no xVM/Xen hypervisor, then
> everything is fine.
>
> When it's booted up running xVM and the hypervisor, then unlike plain
> disk I/O, and unlike svm volumes, zfs is around 20 times slower.

Just a wild guess, but we're just seeing a similar strange performance
problem on an Intel quadcore system with 8GB of memory:

Can you try to remove some part of the ram, so that the system runs on 4GB
instead of 8GB? Or use xen / solaris boot options to restrict physical
memory usage to the low 4GB range?

It seems that on certain mainboards [*] the bios is unable to install mtrr
cacheable ranges for all of the 8GB system ram, and when some important
stuff ends up in uncacheable ram, performance gets *really* bad.

[*] http://lkml.org/lkml/2007/6/1/231
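For reference, two hedged ways to cap memory without pulling DIMMs (values
are examples and untested on this board; the xen option goes on the
hypervisor line of the xVM grub entry, the /etc/system setting applies to
bare metal):

  # xVM: cap the physical memory the hypervisor uses, on the xen.gz line in
  # /boot/grub/menu.lst, e.g.:
  #   kernel$ /boot/$ISADIR/xen.gz mem=4096M

  # bare metal: cap usable physical memory via /etc/system (value in pages),
  # then reboot:
  #   set physmem = 0x100000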
Re: [zfs-discuss] zfs: allocating allocated segment(offset=77984887808 size=66560)
> how does one free segment(offset=77984887808 size=66560) on a pool that
> won't import?
>
> looks like I found
> http://bugs.opensolaris.org/view_bug.do?bug_id=6580715
> http://mail.opensolaris.org/pipermail/zfs-discuss/2007-September/042541.html

Btw. my machine from that mail.opensolaris.org zfs-discuss thread, which
paniced with "freeing free segment", did have a defective ram module.

I don't know for sure, but I suspect that the bad ram module might have
been the root cause for that "freeing free segment" zfs panic, too ...
Re: [zfs-discuss] Bug 6580715, panic: freeing free segment
A few weeks ago, I wrote:

> Yesterday I tried to clone a xen dom0 zfs root filesystem and hit this
> panic (probably Bug ID 6580715):
>
> > ::status
> debugging crash dump vmcore.6 (64-bit) from moritz
> operating system: 5.11 wos_b73 (i86pc)
> panic message: freeing free segment (vdev=0 offset=11c14df000 size=1000)
> dump content: kernel pages only
>
> > $c
> vpanic()
> vcmn_err+0x28(3, f812d818, ff0004850798)
> zfs_panic_recover+0xb6()
> metaslab_free_dva+0x1a2(ff01487ec580, ff0162231b20, 20b236c, 0)
> metaslab_free+0x97(ff01487ec580, ff0162231b20, 20b236c, 0)
> zio_free_blk+0x4c(ff01487ec580, ff0162231b20, 20b236c)
> zil_sync+0x334(ff015b7d94c0, ff015689d180)
> dmu_objset_sync+0x18e(ff014ff39c40, ff017c500d58, ff015689d180)
> dsl_dataset_sync+0x5d(ff01571efa00, ff017c500d58, ff015689d180)
> dsl_pool_sync+0xb5(ff014f4ace00, 20b236c)
> spa_sync+0x1c5(ff01487ec580, 20b236c)
> txg_sync_thread+0x19a(ff014f4ace00)
> thread_start+8()

Btw, a few weeks later I got more strange panics on this machine, in the
procfs filesystem module, which I finally traced to a single defective bit
in a ddr2 ram module (verified by memtest86).

So, I guess it's possible that the above zfs panic happened due to the
defective ram module.
Re: [zfs-discuss] ZFS Mountroot and Bootroot Comparison
> Regarding compression, if I am not mistaken, grub cannot access files
> that are compressed.

There was a bug where grub was unable to access files on zfs that contained
holes:

  Bug ID   6541114
  Synopsis GRUB/ZFS fails to load files from a default compressed (lzjb) root
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6541114

That has been fixed in snv_71. The description text is misleading: there
was no issue with reading lzjb compressed files, the bug occurred when
reading "hole" blocks from a zfs file.

Grub is unable to read from gzip compressed zfs filesystems, though:

  Bug ID   6538017
  Synopsis ZFS boot to support gzip decompression
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538017
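So lzjb on the boot filesystem is fine, gzip is not. A quick hedged check
(the dataset name follows the manual zfsboot setup used elsewhere in this
thread; substitute your own boot filesystem):

  # should report compression=on (lzjb) or off, not gzip/gzip-N
  zfs get compression rootpool/rootfs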
Re: [zfs-discuss] zfs boot doesn't support /usr on a separate partition.
> Should I bfu to the latest bits to fix this > problem or do I also need to install b72?

bfu to b72 (or newer) should be OK, if there really is a difference in shared library dependencies between b70 and b72. I'm not sure about b70; but b72 with just an empty /usr directory in the root filesystem, used as a mount point for mounting a zfs /usr, works just fine.

Are you trying to set up a system that boots from a zfs root filesystem, and has /usr on a separate zfs filesystem?

What exactly is the panic that you get when you try to boot with option "-k"?

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
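One way to capture the exact panic message on an x86 grub system is to load kmdb at boot by appending "-k" to the kernel line in the grub menu (edit the entry with 'e'); a rough sketch for a multiboot entry of that era:

  kernel /platform/i86pc/multiboot -k

When the panic drops into kmdb, "$c" prints the stack trace and "::msgbuf" the recent console messages.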
Re: [zfs-discuss] zfs boot doesn't support /usr on a separate partition.
> I would like to confirm that with Solaris Express Developer Edition 09/07 > b70, you can't have /usr on a separate zfs filesystem because of > broken dependencies.
>
> 1/ Part of the problem is that /sbin/zpool is linked to > /usr/lib/libdiskmgt.so.1

Yep, in the past this happened on several occasions for me: /sbin/zfs, /etc/fs/zfs/mount or /lib/libzfs.so.1 depend on libraries that can only be found in /usr/lib and are not yet available when you have /usr in a separate zfs filesystem - the system becomes unbootable. See also bugs like this:

Bug ID 6570056
Synopsis: /sbin/zpool should not link to files in /usr/lib
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6570056

Bug ID 6494840
Synopsis: libzfs should dlopen libiscsitgt rather than linking to it
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494840

The workaround was to copy the relevant missing shared libraries into the root filesystem.

I currently have snv_72 installed, and bfu'ed to the latest opensolaris bits, and I have /usr on a zfs filesystem. This doesn't need extra copies of libraries from /usr/lib in the root filesystem, and is able to mount a separate zfs /usr filesystem. Note that the system apparently doesn't need /sbin/zpool for mounting a zfs /usr filesystem; /etc/fs/zfs/mount or /sbin/zfs should be enough.

Not sure why your system is rebooting, though. You should boot it with option "-k", so that you can read the exact panic message. Note that you need a valid /etc/zfs/zpool.cache file, and for zfs *root* you also have to make sure that the /etc/zfs/zpool.cache file can be found in the boot archive.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
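A quick way to check whether the mount path still pulls anything from /usr, using the paths mentioned above (an empty result is what you want):

  ldd /sbin/zfs /etc/fs/zfs/mount /lib/libzfs.so.1 | grep /usr/lib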
Re: [zfs-discuss] ZFS Boot Won't work with a straight or mirror zfsroot
> > Using build 70, I followed the zfsboot instructions > at http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/ > to the letter. > > I tried first with a mirror zfsroot, when I try to boot to zfsboot > the screen is flooded with "init(1M) exited on fatal signal 9" Could be this problem: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6423745 > This is everything I did: > zpool create -f rootpool c1t0d0s0 > zfs create rootpool/rootfs > > zfs set mountpoint=legacy rootpool/rootfs > mkdir /zfsroot > mount -F zfs rootpool/rootfs /zfsroot Ok. > cd /zfsroot ; mkdir -p usr opt var home export/home > > mount -F zfs datapool/usr /zfsroot/usr > mount -F zfs datapool/opt /zfsroot/opt > mount -F zfs datapool/var /zfsroot/var > mount -F zfs datapool/home /zfsroot/export/home > > Added the following to /etc/vfstab > rootpool/rootfs - /zfsroot zfs - yes - > datapool/usr- /zfsroot/usr zfs - yes - > datapool/var- /zfsroot/var zfs - yes - > datapool/opt- /zfsroot/opt zfs - yes - > datapool/home - /zfsroot/export/home zfs - yes > - > /zvol/dsk/datapool/swap - - swap- > > - > cd / ; find . -xdev -depth -print | cpio -pvdm /zfsroot > cd / ; find usr -xdev -depth -print | cpio -pvdm /zfsroot > cd / ; find var -xdev -depth -print | cpio -pvdm /zfsroot > cd / ; find opt -xdev -depth -print | cpio -pvdm /zfsroot > cd / ; find export/home -xdev -depth -print | cpio -pvdm /zfsroot > > # ran this script: > http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/create_dirs/ > > mount -F lofs -o nosub / /mnt > (cd /mnt; tar cvf - devices dev ) | (cd /zfsroot; tar xvf -) > umount /mnt Your source root filesystem is on UFS? I think much of the above steps could be simplified by populating the zfs root filesystem like this: mount -F zfs rootpool/rootfs /zfsroot ufsdump 0f - / | (cd /zfsroot; ufsrestore xf -) umount /zfsroot That way, you don't have to use the "create_dirs" script, or mess with the /devices and /dev device tree and the lofs mount. Using ufsdump/ufsrestore also gets the lib/libc.so.1 file correct in the rootfs zfs, which typically has some lofs file mounted on top of it. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
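After populating the pool this way, the remaining zfsboot-manual steps (boot archive and boot blocks) are still needed; roughly, with the device name only as an example:

  bootadm update-archive -R /zfsroot
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0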
[zfs-discuss] Bug 6580715, panic: freeing free segment
Yesterday I tried to clone a xen dom0 zfs root filesystem and hit this panic (probably Bug ID 6580715): System is running last week's opensolaris bits (but I'm also accessing the zpool using the xen snv_66 bits). files/s11-root-xen: is an existing version 1 zfs files/[EMAIL PROTECTED]: new snapshot files/s11-root-xen-uppc: clone for files/[EMAIL PROTECTED] - initially the files/[EMAIL PROTECTED] snapshot couldn't be created, because files/s11-root-xen (zfs with legacy mount / not mounted) was "busy" This should be bug 6462803 or 6482985. Workaround: manually mount files/s11-root-xen and umount it - this clears the unplayed log - created files/[EMAIL PROTECTED] and cloned it as files/s11-root-xen-uppc, set files/s11-root-xen-uppc mountpoint as legacy - mount files/s11-root-xen-uppc and edited a few files using vi, after writing back one of them and leaving vi, system crashed Looks like the new zfs filesystem is using log blocks, that are not allocated? (see the zdb output below) Details for the initial panic: > ::status debugging crash dump vmcore.6 (64-bit) from moritz operating system: 5.11 wos_b73 (i86pc) panic message: freeing free segment (vdev=0 offset=11c14df000 size=1000) dump content: kernel pages only > $c vpanic() vcmn_err+0x28(3, f812d818, ff0004850798) zfs_panic_recover+0xb6() metaslab_free_dva+0x1a2(ff01487ec580, ff0162231b20, 20b236c, 0) metaslab_free+0x97(ff01487ec580, ff0162231b20, 20b236c, 0) zio_free_blk+0x4c(ff01487ec580, ff0162231b20, 20b236c) zil_sync+0x334(ff015b7d94c0, ff015689d180) dmu_objset_sync+0x18e(ff014ff39c40, ff017c500d58, ff015689d180) dsl_dataset_sync+0x5d(ff01571efa00, ff017c500d58, ff015689d180) dsl_pool_sync+0xb5(ff014f4ace00, 20b236c) spa_sync+0x1c5(ff01487ec580, 20b236c) txg_sync_thread+0x19a(ff014f4ace00) thread_start+8() > ::msgbuf MESSAGE zfs0 is /pseudo/[EMAIL PROTECTED] pcplusmp: pci-ide (pci-ide) instance #1 vector 0x17 ioapic 0x2 intin 0x17 is bou nd to cpu 0 IDE device at targ 0, lun 0 lastlun 0x0 model SAMSUNG HD300LJ ATA/ATAPI-7 supported, majver 0xfe minver 0x21 PCI Express-device: [EMAIL PROTECTED], ata2 ata2 is /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED] UltraDMA mode 6 selected Disk0: cmdk0 at ata2 target 0 lun 0 cmdk0 is /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 NOTICE: nge0: Using FIXED interrupt type NOTICE: IRQ20 is being shared by drivers with different interrupt levels. This may result in reduced system performance. NOTICE: nge0 registered NOTICE: nge0 link up, 100 Mbps, full duplex NOTICE: cpqhpc: 64-bit driver module not found UltraDMA mode 6 selected dump on /dev/dsk/c1d0s1 size 2055 MB UltraDMA mode 6 selected pseudo-device: devinfo0 devinfo0 is /pseudo/[EMAIL PROTECTED] iscsi0 at root iscsi0 is /iscsi xsvc0 at root: space 0 offset 0 xsvc0 is /[EMAIL PROTECTED],0 pseudo-device: pseudo1 pseudo1 is /pseudo/[EMAIL PROTECTED] pcplusmp: fdc (fdc) instance 0 vector 0x6 ioapic 0x2 intin 0x6 is bound to cpu 0 ISA-device: fdc0 pseudo-device: ramdisk1024 ramdisk1024 is /pseudo/[EMAIL PROTECTED] pcplusmp: lp (ecpp) instance 0 vector 0x7 ioapic 0x2 intin 0x7 is bound to cpu 1 ISA-device: ecpp0 ecpp0 is /isa/[EMAIL PROTECTED],378 fd0 at fdc0 fd0 is /isa/[EMAIL PROTECTED],3f0/[EMAIL PROTECTED],0 NOTICE: audiohd0: codec info: vid=0x11d4198b, sid=0x, rev=0x00100200 NOTICE: IRQ21 is being shared by drivers with different interrupt levels. This may result in reduced system performance. 
PCI Express-device: pci1043,[EMAIL PROTECTED],1, audiohd0 audiohd0 is /[EMAIL PROTECTED],0/pci1043,[EMAIL PROTECTED],1 pcplusmp: ide (ata) instance 0 vector 0xe ioapic 0x2 intin 0xe is bound to cpu 0 ATAPI device at targ 1, lun 0 lastlun 0x0 model _NEC DVD_RW ND-4550A PCI Express-device: [EMAIL PROTECTED], ata0 ata0 is /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED] UltraDMA mode 2 selected UltraDMA mode 2 selected UltraDMA mode 2 selected PCI-device: pci1274,[EMAIL PROTECTED], audioens0 audioens0 is /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1274,[EMAIL PROTECTED] pseudo-device: lockstat0 lockstat0 is /pseudo/[EMAIL PROTECTED] pseudo-device: llc10 llc10 is /pseudo/[EMAIL PROTECTED] pseudo-device: lofi0 lofi0 is /pseudo/[EMAIL PROTECTED] pseudo-device: profile0 profile0 is /pseudo/[EMAIL PROTECTED] pseudo-device: systrace0 systrace0 is /pseudo/[EMAIL PROTECTED] pseudo-device: fbt0 fbt0 is /pseudo/[EMAIL PROTECTED] pseudo-device: sdt0 sdt0 is /pseudo/[EMAIL PROTECTED] pseudo-device: fasttrap0 fasttrap0 is /pseudo/[EMAIL PROTECTED] pseudo-device: power0 power0 is /pseudo/[EMAIL PROTECTED] pseudo-device: fcp0 fcp0 is /pseudo/[EMAIL PROTECTED] pseudo-device: fcsm0 fcsm0 is /pse
Re: [zfs-discuss] EOF broken on zvol raw devices?
> > I tried to copy a 8GB Xen domU disk image from a zvol device > > to an image file on an ufs filesystem, and was surprised that > > reading from the zvol character device doesn't detect "EOF". > > I've filed bug 6596419... Requesting a sponsor for bug 6596419... http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6596419 My suggested fix is included in the bug report. My contributor agreement # : OS0003 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] EOF broken on zvol raw devices?
> I tried to copy a 8GB Xen domU disk image from a zvol device > to an image file on an ufs filesystem, and was surprised that > reading from the zvol character device doesn't detect "EOF". I've filed bug 6596419... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] EOF broken on zvol raw devices?
> I tried to copy a 8GB Xen domU disk image from a zvol device > to an image file on an ufs filesystem, and was surprised that > reading from the zvol character device doesn't detect "EOF".
>
> On snv_66 (sparc) and snv_73 (x86) I can reproduce it, like this:
>
> # zfs create -V 1440k tank/floppy-img
>
> # dd if=/dev/zvol/dsk/tank/floppy-img of=/dev/null bs=1k count=2000
> 1440+0 records in
> 1440+0 records out
> (no problem on block device, we detect eof after reading 1440k)
>
> # dd if=/dev/zvol/rdsk/tank/floppy-img of=/dev/null bs=1k count=2000
> 2000+0 records in
> 2000+0 records out
>
> (Oops! No eof detected on zvol raw device after reading 1440k?)

After looking at the code in usr/src/uts/common/fs/zfs/zvol.c it seems that neither zvol_read() nor zvol_write() cares about the zvol's "zv_volsize". I think we need something like this:

diff -r 26be3efbd346 usr/src/uts/common/fs/zfs/zvol.c
--- a/usr/src/uts/common/fs/zfs/zvol.c  Thu Aug 23 00:53:10 2007 -0700
+++ b/usr/src/uts/common/fs/zfs/zvol.c  Thu Aug 23 16:30:41 2007 +0200
@@ -904,6 +904,7 @@ zvol_read(dev_t dev, uio_t *uio, cred_t
 {
 	minor_t minor = getminor(dev);
 	zvol_state_t *zv;
+	uint64_t volsize;
 	rl_t *rl;
 	int error = 0;
 
@@ -914,10 +915,16 @@ zvol_read(dev_t dev, uio_t *uio, cred_t
 	if (zv == NULL)
 		return (ENXIO);
 
+	volsize = zv->zv_volsize;
+
 	rl = zfs_range_lock(&zv->zv_znode, uio->uio_loffset, uio->uio_resid,
 	    RL_READER);
-	while (uio->uio_resid > 0) {
+	while (uio->uio_resid > 0 && uio->uio_loffset < volsize) {
 		uint64_t bytes = MIN(uio->uio_resid, DMU_MAX_ACCESS >> 1);
+
+		/* don't read past the end */
+		if (bytes > volsize - uio->uio_loffset)
+			bytes = volsize - uio->uio_loffset;
 
 		error = dmu_read_uio(zv->zv_objset, ZVOL_OBJ, uio, bytes);
 		if (error)
@@ -933,6 +940,7 @@ zvol_write(dev_t dev, uio_t *uio, cred_t
 {
 	minor_t minor = getminor(dev);
 	zvol_state_t *zv;
+	uint64_t volsize;
 	rl_t *rl;
 	int error = 0;
 
@@ -943,13 +951,19 @@ zvol_write(dev_t dev, uio_t *uio, cred_t
 	if (zv == NULL)
 		return (ENXIO);
 
+	volsize = zv->zv_volsize;
+
 	rl = zfs_range_lock(&zv->zv_znode, uio->uio_loffset, uio->uio_resid,
 	    RL_WRITER);
-	while (uio->uio_resid > 0) {
+	while (uio->uio_resid > 0 && uio->uio_loffset < volsize) {
 		uint64_t bytes = MIN(uio->uio_resid, DMU_MAX_ACCESS >> 1);
 		uint64_t off = uio->uio_loffset;
-
-		dmu_tx_t *tx = dmu_tx_create(zv->zv_objset);
+		dmu_tx_t *tx;
+
+		if (bytes > volsize - off)	/* don't write past the end */
+			bytes = volsize - off;
+
+		tx = dmu_tx_create(zv->zv_objset);
 		dmu_tx_hold_write(tx, ZVOL_OBJ, off, bytes);
 		error = dmu_tx_assign(tx, TXG_WAIT);
 		if (error) {

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] EOF broken on zvol raw devices?
I tried to copy a 8GB Xen domU disk image from a zvol device to an image file on an ufs filesystem, and was surprised that reading from the zvol character device doesn't detect "EOF". On snv_66 (sparc) and snv_73 (x86) I can reproduce it, like this: # zfs create -V 1440k tank/floppy-img # dd if=/dev/zvol/dsk/tank/floppy-img of=/dev/null bs=1k count=2000 1440+0 records in 1440+0 records out (no problem on block device, we detect eof after reading 1440k) # dd if=/dev/zvol/rdsk/tank/floppy-img of=/dev/null bs=1k count=2000 2000+0 records in 2000+0 records out (Oops! No eof detected on zvol raw device after reading 1440k?) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] nv-69 install panics dell precision 670
> using hyperterm, I captured the panic message as: > > SunOS Release 5.11 Version snv_69 32-bit > Copyright 1983-2007 Sun Microsystems, Inc. All > rights reserved. > Use is subject to license terms. > > panic[cpu0]/thread=fec1ede0: Can't handle mwait size > 0 > > fec37e70 unix:mach_alloc_mwait+72 (fec2006c) > fec37e8c unix:mach_init+b0 (c0ce80, fe800010, f) > fec37eb8 unix:psm_install+95 (fe84166e, 3, fec37e) > fec37ec8 unix:startup_end+93 (fec37ee4, fe91731e,) > fec37ed0 unix:startup+3a (fe800010, fec33c98,) > fec37ee4 genunix:main+1e () > > skipping system dump - no dump device configured > rebooting... > > this behavior loops endlessly Have a look at these bugs: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6577473 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6588054 It seems to be fixed in snv_70, and apparently you can work around the bug by setting some kernel variables, see bug 6588054 (idle_cpu_prefer_mwait = 0, cpuid_feature_ecx_exclude = 8) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
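For reference, the /etc/system form of the workaround mentioned in 6588054 would look like this (remove it again after upgrading to snv_70 or later):

  set idle_cpu_prefer_mwait = 0
  set cpuid_feature_ecx_exclude = 8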
Re: [zfs-discuss] Unremovable file in ZFS filesystem.
> I managed to create a link in a ZFS directory that I can't remove. > > # find . -print > . > ./bayes_journal > find: stat() error ./bayes.lock.router.3981: No such > file or directory > ./user_prefs > # > > > ZFS scrub shows no problems in the pool. Now, this > was probably cause when I was doing some driver work > so I'm not too surprised, BUT it would be nice if > there was a way to clean this up without having to > copy the filesystem to a new zfs filesystem and > destroying the current one. Are you running an opensolaris using release or debug kernel bits? Maybe a kernel with a zfs compiled as debug bits would print some extra error messages or maybe panic the machine when that broken file is accessed? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console,
> in my setup i do not install the ufsroot. > > i have 2 disks > -c0d0 for the ufs install > -c1d0s0 which is my zfs root i want to exploit > > my idea is to remove the c0d0 disk when the system will be ok Btw. if you're trying to pull the ufs disk c0d0 from the system, and physically move the zfs root disk from c1d0 -> c0d0 and use that as the only disk (= boot disk) in the system, you'll probably run into the problem that zfs root becomes unbootable, because in the etc/zfs/zpool.cache file the c1d0 name is still recorded for the zpool containing the rootfs. To fix it you probably have to boot a failsafe kernel from somewhere, zpool import the pool from the disk's new location, and copy the updated /etc/zfs/zpool.cache into the zfs root filesystem and build new boot archives there... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
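Roughly, that repair from a failsafe boot would look like this (pool, dataset and mount point names are only examples):

  zpool import -f rootpool
  mount -F zfs rootpool/rootfs /a
  cp /etc/zfs/zpool.cache /a/etc/zfs/zpool.cache
  bootadm update-archive -R /a
  umount /a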
Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console,
> it seems i have the same problem after zfs boot > installation (following this setup on a snv_69 release > http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/ ). Hmm, in step 4., wouldn't it be better to use ufsdump / ufsrestore instead of find / cpio to clone the ufs root into the zfs root pool? cd /zfsroot ufsdump 0f - / | ufsrestore -xf - Advantages: - it copies the mountpoint for the /etc/dfs/dfstab filesystem (and all the other mountpoints, like /tmp, /proc, /etc/mnttab, ...) - it does not mess up the /lib/libc.so.1 shared library I think the procedure at the above url could copy the wrong version of the shared libc.so.1 into the zfsroot /lib/libc.so.1; this might explain bugs like 6423745, Synopsis: zfs root pool created while booted 64 bit can not be booted 32 bit - the files hidden by the /devices mount are copied,too > The outputs from the requested command > are similar to the outputs posted by dev2006. > > Reading this page, i found no solution concerning the > /dev/random problem. Is there somewhere a procedure > to repair my install ? AFAICT, there's nothing you can do to avoid the "WARNING: No randomness provider enabled for /dev/random." message with zfs root at this time. It seems that zfs mountroot needs some random numbers for mounting the zfs root filesystem, and at that point early during the bootstrap there isn't a fully initialized random device available. This fact is remembered by the random device and is reported later on, when the system is fully booted. I think when the system is fully booted from zfs root, the random device should work just fine. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SiI 3114 Chipset on Syba Card - Solaris Hangs
> I'm running snv 65 and having an issue > much like this:
> http://osdir.com/ml/solaris.opensolaris.help/2006-11/msg00047.html

Bug 6414472?

> Has anyone found a workaround?

You can try to patch my suggested fix for 6414472 into the ata binary and see if it helps: http://www.opensolaris.org/jive/thread.jspa?messageID=84127

I don't have access to the snv_65 media, but for snv_66 (32-bit) the code has changed slightly, and the instruction to patch can be found at address "ata_id_common+0x3c", so the patch procedure would be

  ::bp ata`ata_id_common
  :c
  ::delete 1
  ata_id_common+0x3c?w a6a
  :c

> Or is this the issue with the BIOS not liking EFI information that ZFS > uses?

If it is 6414472: No; BIOS wouldn't be used any more at the point 6414472 is hanging the system...

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
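If you want to double-check the patch location first: once the breakpoint on ata`ata_id_common has fired (so the ata module is loaded and the symbol resolves), you can disassemble the instruction before writing to it, e.g.

  ata_id_common+0x3c?i
  ata_id_common+0x3c?w a6a

where ?i shows the instruction that is about to be patched and ?w writes the 2-byte value.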
Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174
> By coincidence, I spent some time dtracing 6560174 yesterday afternoon on > b62, and these bugs are indeed duplicates. I never noticed 6445725 because my > system wasn't hanging but as the notes say, the fix for 6434435 changes the > problem, and instead the error that gets propogated back from t1394_write() > causes "transport rejected" messages. Yes, I had filed two bugs (6445725 / 6434435) a year ago and started the opensolaris request-sponsor process for both. The fix for 6434435 has been integrated, but 6445725 is stuck somehow. > I see your proposed fix (which looks very plausible) is dated over a year > ago... Have you heard anything on when it might get integrated? No, nothing. I did send Alan Perry (@sun.com) a mail last friday, asking about the state of bug 6445725 and my suggested fix, but so far received no reply... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174
> > 3) Can your code diffs be integrated into the OS on my end to use this > > drive, and if so, how? > > I believe the bug is still being worked on, right Jürgen ? The opensolaris sponsor process for fixing bug 6445725 seems to got stuck. I ping'ed Alan P. on the state of that bug... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174
> > Nope, no work-around.
>
> OK. Then I have 3 questions:
>
> 1) How do I destroy the pool that was on the firewire > drive? (So that zfs stops complaining about it)

Even if the drive is disconnected, it should be possible to "zpool export" it, so that the OS forgets about it and doesn't try to mount from that pool during the next boot.

> 2) How can I reformat the firewire drive? Does this > need to be done on a non-Solaris OS?

When 6445725 is fixed, it should be possible to reformat and / or use it with Solaris.

> 3) Can your code diffs be integrated into the OS on > my end to use this drive, and if so, how?

Sure. You need the opensolaris "ON Source", unpack it, apply the patch from the website using something like "gpatch -p0 < scsa1394-mkfs-hang2-alt" and build everything using the "nightly" command. You'll also need to install the "ON Specific Build Tools" package, the "ON Binary-Only Components", and the correct Studio 11 compiler for building the opensolaris sources. Here are some detailed instructions on building the opensolaris sources: http://www.blastwave.org/articles/BLS-0050/index.html

Unfortunately, the sources for your installed version (build_64a) are missing on http://dlc.sun.com/osol/on/downloads ; there are sources for build 63 and 65, but not for 64a.

You could pick a newer release of the opensolaris sources (the latest available for download is build_69), patch the sources and compile them, and upgrade your installation to that newer release, using the "bfu" command. Or pick a slightly newer release than 64a, patch & compile (make sure to compile as a "release" build), and just replace the firewire kernel driver modules that are affected by the bugfix, "scsa1394" and "sbp2":

usr/src/uts/intel/scsa1394/obj32/scsa1394 -> /kernel/drv/scsa1394
usr/src/uts/intel/scsa1394/obj64/scsa1394 -> /kernel/drv/amd64/scsa1394
usr/src/uts/intel/sbp2/obj32/sbp2 -> /kernel/misc/sbp2
usr/src/uts/intel/sbp2/obj64/sbp2 -> /kernel/misc/amd64/sbp2

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
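For the "nightly" step, a rough sketch of the usual invocation (the workspace path is only an example; the blastwave article above walks through the details):

  cd /export/ws/onnv-b69
  cp usr/src/tools/env/opensolaris.sh .
  vi opensolaris.sh        (set GATE, CODEMGR_WS, STAFFER, ...)
  /opt/onbld/bin/nightly ./opensolaris.sh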
Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174
> > And 6560174 might be a duplicate of 6445725 > > I see what you mean. Unfortunately there does not > look to be a work-around. Nope, no work-around. This is a scsa1394 bug; it has some issues when it is used from interrupt context. I have some source code diffs, that are supposed to fix the issue, see this thread: http://www.opensolaris.org/jive/thread.jspa?messageID=46190 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174
> I think I have ran into this bug, 6560174, with a firewire drive. And 6560174 might be a duplicate of 6445725 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] snv_70 -> snv_66: ZPL_VERSION 2, File system version mismatch ....?
Yesterday I was surprised because an old snv_66 kernel (installed as a new zfs rootfs) refused to mount. Error message was Mismatched versions: File system is version 2 on-disk format, which is incompatible with this software version 1! I tried to prepare that snv_66 rootfs when running snv_70 bits, using something like this zfs create tank/s11-root-xen zfs set mountpoint=legacy tank/s11-root-xen mount -F zfs tank/s11-root-xen /mnt cd /mnt ufsdump 0f - /dev/rdsk/c4d0s4 | ufsrestore -xf - ... Problem is that snv_70 "zfs create" now seems to construct ZPL_VERSION 2 zfs filesystems, which cannot be mounted by older version of the zfs software, e.g. by snv_66 or s10u2. Btw. I never upgraded this zpool to a zpool version > 2, to allow using that zpool and zfs filesystems both with Nevada and S10. Now it seems I still could work around that problem with ZPL_VERSION mismatch by booting the oldest Solaris release that is supposed to mount a zfs filesystem and create the zfs filesystem from there. How about a new feature for "zpool create" and "zfs create" to allow creation of a zpool or zfs that is not using the newest version but some older version (that the user has specified on the command line), so that the new zpool or zfs can be used on older systems (e.g. on hotpluggable / removable media, or on a disk that is shared between different Solaris releases)? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
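A related note: before creating anything on shared / removable media you can check up front which versions each side speaks; the upgrade subcommands print the supported version lists ("zfs upgrade" only exists in recent builds):

  zpool upgrade -v     (pool versions supported by this kernel)
  zfs upgrade -v       (zfs filesystem / ZPL versions supported by this kernel)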
[zfs-discuss] Re: ZFS usb keys
> Shouldn't S10u3 just see the newer on-disk format and > report that fact, rather than complain it is corrupt? Yep, I just tried it, and it refuses to "zpool import" the newer pool, telling me about the incompatible version. So I guess the pool format isn't the correct explanation for the Dick Davies' (number9) problem. On a S-x86 box running snv_68, ZFS version 7: # mkfile 256m /home/leo.nobackup/tmp/zpool_test.vdev # zpool create test_pool /home/leo.nobackup/tmp/zpool_test.vdev # zpool export test_pool On a S-sparc box running snv_61, ZFS version 3 (I get the same error on S-x86, running S10U2, ZFS version 2): # zpool import -d /home/leo.nobackup/tmp/ pool: test_pool id: 6231880247307261822 state: FAULTED status: The pool is formatted using an incompatible version. action: The pool cannot be imported. Access the pool on a system running newer software, or recreate the pool from backup. see: http://www.sun.com/msg/ZFS-8000-A5 config: test_pool UNAVAIL newer version /home/leo.nobackup/tmp//zpool_test.vdev ONLINE This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS usb keys
> I used a zpool on a usb key today to get some core files off a non-networked > Thumper running S10U4 beta.
>
> Plugging the stick into my SXCE b61 x86 machine worked fine; I just had to > 'zpool import sticky' and it worked ok.
>
> But when we attach the drive to a blade 100 (running s10u3), it sees the > pool as corrupt. I thought I'd been too hasty pulling out the stick, > but it works ok back in the b61 desktop and Thumper.
>
> I'm trying to figure out if this is an endian thing (which I thought > ZFS was immune from) - or has the b61 machine upgraded the zpool > format?

Most likely the zpool on the usb stick was formatted using a zpool version that s10u3 does not yet support. Check with "zpool upgrade -v" on the b61 machine which zpool versions are supported by b61, and which zpool version is on the usb stick ("zpool upgrade" without arguments lists imported pools that are below the latest supported version). Repeat on the s10u3 machine.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: zfs compression - scale to multiple cpu ?
> i think i have read somewhere that zfs gzip > compression doesn't scale well since the in-kernel > compression isn't done multi-threaded.
>
> is this true - and if so - will this be fixed ?

If you're writing lots of data, zfs gzip compression might not be a good idea for a desktop machine, because it completely kills interactive performance. See this thread:

http://www.opensolaris.org/jive/thread.jspa?messageID=118116
http://mail.opensolaris.org/pipermail/zfs-discuss/2007-May/thread.html#27841

It does compress (scale) on up to 8 cpu cores, though. See "zio_taskq_threads" in usr/src/uts/common/fs/zfs/spa.c

> what about default lzjb compression - is it different > regarding this "issue" ?

lzjb doesn't consume that much kernel cpu time (compared to gzip), so the machine remains more or less usable for interactive usage.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
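If the cpu cost of gzip is the main concern, the compression property can also be set per dataset, and to one of the lighter gzip levels (the dataset names here are only examples):

  zfs set compression=gzip-1 tank/archive
  zfs set compression=lzjb tank/home
  zfs get compression tank/archive tank/home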
[zfs-discuss] Re: SMART
> You are right... I shouldn't post in the middle of > the night... nForce chipsets don't support AHCI. Btw. does anybody have a status update for bug 6296435, "native sata driver needed for nVIDIA mcp04 and mcp55 controllers" http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6296435 ? Commit to Fix target was "snv_59", but we're at "snv_67" now... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: Deterioration with zfs performance and recent zfs bits?
> Hello Jürgen, > > Monday, June 4, 2007, 7:09:59 PM, you wrote: > > >> > Patching zfs_prefetch_disable = 1 has helped > >> It's my belief this mainly aids scanning metadata. my > >> testing with rsync and yours with find (and seen with > >> du & ; zpool iostat -v 1 ) pans this out.. > >> mainly tracked in bug 6437054 vdev_cache: wise up or die > >> http://www.opensolaris.org/jive/thread.jspa?messageID=42212 > >> > >> so to link your code, it might help, but if one ran > >> a clean down the tree, it would hurt compile times. > > > JK> I think the slowdown that I'm observing is due to the changes > JK> that have been made for 6542676 "ARC needs to track meta-data > JK> memory overhead". > JK > > JK> There is now a limit of 1/4 of arc size ("arc_meta_limit") > JK> for zfs meta-data. > > Not good - I have some systems with TBs of meta-data mostly. > I guess there's some tunable... AFAICT, you can patch the kernel global variable "arc_meta_limit" at run time, using mdb -wk (variable should be visible in build 66 or newer) But you can't tune it via an /etc/system "set" command. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
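For reference, a sketch of that runtime patch on a live kernel (the value is only an example - 512 MB here; arc_meta_limit is a 64-bit variable, hence the /Z format):

  echo "arc_meta_limit/Z 0x20000000" | mdb -kw
  echo "arc_meta_limit::print -d" | mdb -k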
[zfs-discuss] Re: Deterioration with zfs performance and recent zfs bits?
I wrote > Instead of compiling opensolaris for 4-6 hours, I've now used > the following find / grep test using on-2007-05-30 sources: > > 1st test using Nevada build 60: > > % cd /files/onnv-2007-05-30 > % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} + This find + grep command basically - does a recursive scan looking for *.h and *.c files - at the end of the recursive directory scan invokes one grep command with ~ 2 filename args. Simplifying the test a bit more: snv_60 is able to cache all meta-data for a compiled onnv source tree, on a 32-bit x86 machine with 768 mb of physical memory: % cd /files/wos_b67 % repeat 10 sh -c "/bin/time find usr/src/ -name '*.[hc]' -print|wc" real 2:11.7 user0.2 sys 3.2 19355 19355 772864 real2.4 user0.1 sys 1.4 19355 19355 772864 real2.2 user0.1 sys 1.5 19355 19355 772864 real2.0 user0.1 sys 1.4 19355 19355 772864 real 1:21.8 << seems that some meta data was freed here... user0.2 sys 1.7 19355 19355 772864 real 1:21.0 user0.2 sys 1.7 19355 19355 772864 real 45.9 user0.1 sys 1.6 19355 19355 772864 real3.2 user0.1 sys 1.3 19355 19355 772864 real1.9 user0.1 sys 1.3 19355 19355 772864 real2.8 user0.1 sys 1.3 19355 19355 772864 (and the next 10 finds all completed in ~2 seconds per find) build 67 is unable to cache the meta-data, for the same find command on the same zfs: % cd /files/wos_b67 % repeat 10 sh -c "/bin/time find usr/src/ -name '*.[hc]' -print|wc" real 3:20.7 user0.5 sys 7.5 19355 19355 772864 real 3:07.0 user0.5 sys 5.5 19355 19355 772864 real 2:44.6 user0.5 sys 4.7 19355 19355 772864 real 2:06.1 user0.4 sys 3.9 19355 19355 772864 real 1:16.1 user0.4 sys 3.5 19355 19355 772864 real 33.0 user0.4 sys 2.7 19355 19355 772864 real 40.8 user0.4 sys 3.0 19355 19355 772864 real 18.8 user0.3 sys 2.6 19355 19355 772864 real 2:32.2 user0.4 sys 4.2 19355 19355 772864 real 2:05.4 user0.4 sys 3.9 19355 19355 772864 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Deterioration with zfs performance and recent zfs bits?
> > Patching zfs_prefetch_disable = 1 has helped > It's my belief this mainly aids scanning metadata. my > testing with rsync and yours with find (and seen with > du & ; zpool iostat -v 1 ) pans this out.. > mainly tracked in bug 6437054 vdev_cache: wise up or die > http://www.opensolaris.org/jive/thread.jspa?messageID=42212 > > so to link your code, it might help, but if one ran > a clean down the tree, it would hurt compile times. I think the slowdown that I'm observing is due to the changes that have been made for 6542676 "ARC needs to track meta-data memory overhead". There is now a limit of 1/4 of arc size ("arc_meta_limit") for zfs meta-data. On a 32-bit x86 platform with > 512MB physical memory, the arc size is limited to 3/4 of the size of the kernel heap arena, which is 3/4 * ~ 650MB => ~ 500MB. 1/4 of that 500MB is ~ 125MB for zfs meta data. When more than 1/4 of arc is used for meta-data, meta-data allocations steal space from arc mru/mfu list. When more than 1/4 of arc is used for meta-data and arc_reclaim_needed() returns TRUE, entries from the dnlc cache are purged and arc data is evicted. Apparently, before 6542676 it was possible to use a lot more meta-data if we compare it to what is possible now with 6542676. void arc_init(void) { ... /* limit meta-data to 1/4 of the arc capacity */ arc_meta_limit = arc_c_max / 4; ... } static int arc_evict_needed(arc_buf_contents_t type) { if (type == ARC_BUFC_METADATA && arc_meta_used >= arc_meta_limit) return (1); ... } static void arc_get_data_buf(arc_buf_t *buf) { /* * We have not yet reached cache maximum size, * just allocate a new buffer. */ if (!arc_evict_needed(type)) { ... goto out; } /* * If we are prefetching from the mfu ghost list, this buffer * will end up on the mru list; so steal space from there. */ ... if ((buf->b_data = arc_evict(state, size, TRUE, type)) == NULL) { ... } static void arc_kmem_reap_now(arc_reclaim_strategy_t strat) { ... if (arc_meta_used >= arc_meta_limit) { /* * We are exceeding our meta-data cache limit. * Purge some DNLC entries to release holds on meta-data. */ dnlc_reduce_cache((void *)(uintptr_t)arc_reduce_dnlc_percent); } ... } The Tecra-S1 (32-bit Solaris x86) has > arc_meta_limit::print 0x738 <<< > arc_meta_limit::print -d 0t121110528 > ::arc { anon = -73542 mru = -735455488 mru_ghost = -735455424 mfu = -735455360 mfu_ghost = -735455296 size = 0x131dae70 p = 0xb10983e c = 0x1330105e c_min = 0x400 c_max = 0x1ce0 hits = 0x2e405 misses = 0x9092 deleted = 0x5f recycle_miss = 0x45bf mutex_miss = 0 evict_skip = 0x6e4b0 hash_elements = 0x54dd hash_elements_max = 0x54de hash_collisions = 0x398e hash_chains = 0x1887 hash_chain_max = 0x7 no_grow = 0 > 0x1ce0%4=X 738 Patching arc_meta_limit to 1/2 of arc size improves find performance. Another problem: In dbuf.c, dbuf_read_impl() arc_meta_used accounting appears to be broken, the amount of meta-data used ("arc_meta_used") is inflated: db->db.db_data = zio_buf_alloc(DN_MAX_BONUSLEN); arc_space_consume(512); Why 512? Apparently, we zio_buf_alloc DN_MAX_BONUSLEN = 0x140 bytes but consume 0x200 bytes of meta-data? 
(When these buffers are freed, only DN_MAX_BONUSLEN = 0x140 bytes are returned to arc meta-data) I'm currently using the following changes, which seem to restore the zfs performace to what it has been before 6542676 - more or less: diff -r bec4e9eb1f01 usr/src/uts/common/fs/zfs/arc.c --- a/usr/src/uts/common/fs/zfs/arc.c Fri Jun 01 08:24:48 2007 -0700 +++ b/usr/src/uts/common/fs/zfs/arc.c Sat Jun 02 22:09:33 2007 +0200 @@ -2781,10 +2781,10 @@ arc_init(void) arc_c = arc_c_max; arc_p = (arc_c >> 1); - /* limit meta-data to 1/4 of the arc capacity */ - arc_meta_limit = arc_c_max / 4; - if (arc_c_min < arc_meta_limit / 2 && zfs_arc_min == 0) - arc_c_min = arc_meta_limit / 2; + /* limit meta-data to 1/2 of the arc capacity */ + arc_meta_limit = arc_c_max / 2; + if (arc_c_min < arc_meta_limit / 4 && zfs_arc_min == 0) + arc_c_min = arc_meta_limit / 4; /* if kmem_flags are set, lets try to use less memory */ if (kmem_debugging()) diff -r bec4e9eb1f01 usr/src/uts/common/fs/zfs/dbuf.c --- a/usr/src/uts/common/fs/zfs/dbuf.c Fri Jun 01 08:24:48 2007 -0700 +++ b/usr/src/uts/common/fs/zfs/dbuf.c Sat Jun 02 22:09:52 2007 +0200 @@ -470,7 +470,7 @@ dbuf_read_impl(dmu_
[zfs-discuss] Re: Deterioration with zfs performance and recent zfs bits?
I wrote > Has anyone else noticed a significant zfs performance > deterioration when running recent opensolaris bits? > > My 32-bit / 768 MB Toshiba Tecra S1 notebook was able > to do a full opensolaris release build in ~ 4 hours 45 > minutes (gcc shadow compilation disabled; using an lzjb > compressed zpool / zfs on a single notebook hdd p-ata drive). > > After upgrading to 2007-05-25 opensolaris release > bits (compiled from source), the same release build now > needs ~ 6 hours; that's ~ 25% slower. It might be Bug ID 6469558 "ZFS prefetch needs to be more aware of memory pressure": http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6469558 Instead of compiling opensolaris for 4-6 hours, I've now used the following find / grep test using on-2007-05-30 sources: 1st test using Nevada build 60: % cd /files/onnv-2007-05-30 % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} + usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:22.5 user3.3 sys 5.8 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:28.4 user3.3 sys 4.8 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:18.0 user3.3 sys 4.7 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:17.3 user3.3 sys 4.8 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:15.0 user3.3 sys 4.7 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:12.0 user3.3 sys 4.7 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:21.9 user3.3 sys 4.7 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:18.7 user3.3 sys 4.7 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:19.5 user3.3 sys 4.7 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:17.2 user3.3 sys 4.7 Same test, but running onnv-2007-05-30 release bits (compiled from source). 
This is at least 25% slower than snv_60: (Note: zfs_prefetch_disable = 0 , the default value) % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} + usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 8:04.3 user7.3 sys13.2 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 6:34.4 user7.3 sys11.2 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 6:33.8 user7.3 sys11.1 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 5:35.6 user7.3 sys10.6 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 5:39.8 user7.3 sys10.6 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 5:37.8 user7.3 sys11.1 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 5:53.5 user7.3 sys11.0 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 5:45.2 user7.3 sys11.1 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 5:44.8 user7.3 sys11.0 usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 5:49.1 user7.3 sys11.0 Then I patched zfs_prefetch_disable/W1, and now the find & grep test runs much faster on onnv-2007-05-30 bits: (Note: zfs_prefetch_disable = 1) % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} + usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while real 4:01.3 user7.2 sys 9.9 usr/src/li
[zfs-discuss] Deterioration with zfs performace and recent zfs bits?
Has anyone else noticed a significant zfs performance deterioration when running recent opensolaris bits? My 32-bit / 768 MB Toshiba Tecra S1 notebook was able to do a full opensolaris release build in ~ 4 hours 45 minutes (gcc shadow compilation disabled; using an lzjb compressed zpool / zfs on a single notebook hdd p-ata drive). After upgrading to 2007-05-25 opensolaris release bits (compiled from source), the same release build now needs ~ 6 hours; that's ~ 25% slower. I think a change that might be responsible for this is the fix for 6542676 "ARC needs to track meta-data memory overhead" (that is, less caching with the fix for 6542676). Has anyone noticed similar zfs performace deterioration? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Preparing to compare Solaris/ZFS and FreeBSD/ZFS performance.
> > Or if you do want to use bfu because you really want to match your > > source code revisions up to a given day then you will need to build the > > ON consolidation yourself and you can then install the non debug bfu > > archives (note you will need to download the non debug closed bins to do > > that).
>
> The README.opensolaris > (http://dlc.sun.com/osol/on/downloads/current/README.opensolaris) > still states:
> 2. Non-DEBUG kernel builds have not been tested. Systems that require > the ata driver are known not to work with non-DEBUG builds.
> Are debug builds now known to work?

s/debug/non-debug/

That used to be true, but is obsolete by now. non-debug builds work just fine. Just make sure to use a recent on-closed-bins-nd*.tar.bz2 archive, e.g. http://dlc.sun.com/osol/on/downloads/b63/on-closed-bins-nd-b63.i386.tar.bz2

The ata driver has moved from the closed bits tree to the standard/open onnv source tree, so when you compile non-DEBUG bits, you'll also get a non-DEBUG ata driver compiled from source.

There used to be a problem when ata was closed, and you tried to compile non-DEBUG opensolaris from sources and mixed that with a DEBUG ata driver from the closed bits archive. That was a problem when no closed non-debug bits (on-closed-bins-nd-*.tar.bz2) were available for download.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?
> Would you mind also doing:
>
> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=1
>
> to see the raw performance of underlying hardware.

This dd command is reading from the block device, which might cache data and probably splits requests into "maxphys" pieces (which happens to be 56K on an x86 box). I'd read from the raw device, /dev/rdsk/c2t1d0 ...

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
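I.e. something like this (the slice name is only an example and depends on how the disk is labeled):

  ptime dd if=/dev/rdsk/c2t1d0s0 of=/dev/null bs=128k count=1000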
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Bart wrote: > Adam Leventhal wrote: > > On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote: > >> Can you give some more info on what these problems are. > > > > I was thinking of this bug: > > > > 6460622 zio_nowait() doesn't live up to its name > > > > Which was surprised to find was fixed by Eric in build 59. > > > > It was pointed out by Jürgen Keil that using ZFS compression > submits a lot of prio 60 tasks to the system task queues; > this would clobber interactive performance. Actually the taskq "spa_zio_issue" / "spa_zio_intr" run at prio 99 (== maxclsyspri or MAXCLSYSPRI): http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109 Btw: In one experiment I tried to boot the kernel under kmdb control (-kd), patched "minclsyspri := 61" and used a breakpoint inside spa_active() to patch the spa_zio_* taskq to use prio 60 when importing the gzip compressed pool (so that the gzip compressed pool was using prio 60 threads and usb and other stuff was using prio >= 61 threads). That didn't help interactive performance... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> with recent bits ZFS compression is now handled concurrently with > many CPUs working on different records. > So this load will burn more CPUs and acheive it's results > (compression) faster. > > So the observed pauses should be consistent with that of a load > generating high system time. > The assumption is that compression now goes faster than when is was > single threaded. > > Is this undesirable ? We might seek a way to slow down compression in > order to limit the system load. According to this dtrace script #!/usr/sbin/dtrace -s sdt:genunix::taskq-enqueue /((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/ { @where[stack()] = count(); } tick-5s { printa(@where); trunc(@where); } ... I see bursts of ~ 1000 zio_write_compress() [gzip] taskq calls enqueued into the "spa_zio_issue" taskq by zfs`spa_sync() and its children: 0 76337 :tick-5s ... zfs`zio_next_stage+0xa1 zfs`zio_wait_for_children+0x5d zfs`zio_wait_children_ready+0x20 zfs`zio_next_stage_async+0xbb zfs`zio_nowait+0x11 zfs`dbuf_sync_leaf+0x1b3 zfs`dbuf_sync_list+0x51 zfs`dbuf_sync_indirect+0xcd zfs`dbuf_sync_list+0x5e zfs`dbuf_sync_indirect+0xcd zfs`dbuf_sync_list+0x5e zfs`dnode_sync+0x214 zfs`dmu_objset_sync_dnodes+0x55 zfs`dmu_objset_sync+0x13d zfs`dsl_dataset_sync+0x42 zfs`dsl_pool_sync+0xb5 zfs`spa_sync+0x1c5 zfs`txg_sync_thread+0x19a unix`thread_start+0x8 1092 0 76337 :tick-5s It seems that after such a batch of compress requests is submitted to the "spa_zio_issue" taskq, the kernel is busy for several seconds working on these taskq entries. It seems that this blocks all other "taskq" activity inside the kernel... This dtrace script counts the number of zio_write_compress() calls enqueued / execed by the kernel per second: #!/usr/sbin/dtrace -qs sdt:genunix::taskq-enqueue /((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/ { this->tqe = (taskq_ent_t *)arg1; @enq[this->tqe->tqent_func] = count(); } sdt:genunix::taskq-exec-end /((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/ { this->tqe = (taskq_ent_t *)arg1; @exec[this->tqe->tqent_func] = count(); } tick-1s { /* printf("%Y\n", walltimestamp); */ printf("TS(sec): %u\n", timestamp / 10); printa("enqueue %a: [EMAIL PROTECTED]", @enq); printa("exec%a: [EMAIL PROTECTED]", @exec); trunc(@enq); trunc(@exec); } I see bursts of zio_write_compress() calls enqueued / execed, and periods of time where no zio_write_compress() taskq calls are enqueued or execed. 
10# ~jk/src/dtrace/zpool_gzip7.d TS(sec): 7829 TS(sec): 7830 TS(sec): 7831 TS(sec): 7832 TS(sec): 7833 TS(sec): 7834 TS(sec): 7835 enqueue zfs`zio_write_compress: 1330 execzfs`zio_write_compress: 1330 TS(sec): 7836 TS(sec): 7837 TS(sec): 7838 TS(sec): 7839 TS(sec): 7840 TS(sec): 7841 TS(sec): 7842 TS(sec): 7843 TS(sec): 7844 enqueue zfs`zio_write_compress: 1116 execzfs`zio_write_compress: 1116 TS(sec): 7845 TS(sec): 7846 TS(sec): 7847 TS(sec): 7848 TS(sec): 7849 TS(sec): 7850 TS(sec): 7851 TS(sec): 7852 TS(sec): 7853 TS(sec): 7854 TS(sec): 7855 TS(sec): 7856 TS(sec): 7857 enqueue zfs`zio_write_compress: 932 execzfs`zio_write_compress: 932 TS(sec): 7858 TS(sec): 7859 TS(sec): 7860 TS(sec): 7861 TS(sec): 7862 TS(sec): 7863 TS(sec): 7864 TS(sec): 7865 TS(sec): 7866 TS(sec): 7867 enqueue zfs`zio_write_compress: 5 execzfs`zio_write_compress: 5 TS(sec): 7868 enqueue zfs`zio_write_compress: 774 execzfs`zio_write_compress: 774 TS(sec): 7869 TS(sec): 7870 TS(sec): 7871 TS(sec): 7872 TS(sec): 7873 TS(sec): 7874 TS(sec): 7875 TS(sec): 7876 enqueue zfs`zio_write_compress: 653 execzfs`zio_write_compress: 653 TS(sec): 7877 TS(sec): 7878 TS(sec): 7879 TS(sec): 7880 TS(sec): 7881 And a final dtrace script, which monitors scheduler activity while filling a gzip compressed pool: #!/usr/sbin/dtrace -qs sched:::off-cpu, sched:::on-cpu, sched:::remain-cpu, sched:::preempt { /* @[probename, stack()] = count(); */ @[probename] = count(); } tick-1s { printf("%Y", walltimestamp); printa(@); trunc(@); } It shows periods of time with absolutely *no* scheduling activity (I guess this is when the "spa_zio_issue" taskq is working on such a bug batch of submitted gzip compression calls): 21# ~jk/src/dtrace/zpool_gzip9.d 2007 May 6 21:38:12 preempt 13 off-cpu 808 on-cpu
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here.
>
> [mpstat]
>
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 3109 3616 316 196 5 17 48 45 245 0 85 0 15
> > 1 0 0 3127 3797 592 217 4 17 63 46 176 0 84 0 15
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 3051 3529 277 201 2 14 25 48 216 0 83 0 17
> > 1 0 0 3065 3739 606 195 2 14 37 47 153 0 82 0 17
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 3011 3538 316 242 3 26 16 52 202 0 81 0 19
> > 1 0 0 3019 3698 578 269 4 25 23 56 309 0 83 0 17
...
> The largest numbers from mpstat are for interrupts and cross calls.
> What does intrstat(1M) show?
>
> Have you run dtrace to determine the most frequent cross-callers?

As far as I understand it, we have these frequent cross calls because

1. the test was run on an x86 MP machine
2. the kernel zmod / gzip code allocates and frees four big chunks of memory (4 * 65544 bytes) per zio_write_compress ( gzip ) call [1]

Freeing these big memory chunks generates lots of cross calls, because page table entries for that memory are invalidated on all cpus (cores). Of course this effect cannot be observed on a uniprocessor machine (one cpu / core). And apparently it isn't the root cause for the bad interactive performance with this test; the bad interactive performance can also be observed on single cpu / single core x86 machines.

A possible optimization for MP machines: use some kind of kmem_cache for the gzip buffers, so that these buffers could be reused between gzip compression calls.

[1] allocations per zio_write_compress() / gzip_compress() call:

1 6642 kobj_alloc:entry sz 5936, fl 1001
1 6642 kobj_alloc:entry sz 65544, fl 1001
1 6642 kobj_alloc:entry sz 65544, fl 1001
1 6642 kobj_alloc:entry sz 65544, fl 1001
1 6642 kobj_alloc:entry sz 65544, fl 1001
1 5769 kobj_free:entry fffeeb307000: sz 65544
1 5769 kobj_free:entry fffeeb2f5000: sz 65544
1 5769 kobj_free:entry fffeeb2e3000: sz 65544
1 5769 kobj_free:entry fffeeb2d1000: sz 65544
1 5769 kobj_free:entry fffed1c42000: sz 5936

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
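A minimal sketch of the kmem_cache idea mentioned above (this is not the actual zmod code, the names are made up, and the real deflate workspace handling is more involved):

  #include <sys/types.h>
  #include <sys/kmem.h>

  /* hypothetical cache for the ~64 KB deflate work buffers */
  static kmem_cache_t *gzip_ws_cache;

  void
  gzip_ws_init(void)
  {
          gzip_ws_cache = kmem_cache_create("gzip_workspace", 65544, 0,
              NULL, NULL, NULL, NULL, NULL, 0);
  }

  void *
  gzip_ws_alloc(void)
  {
          /* reuse a warm buffer instead of a fresh segkmem allocation */
          return (kmem_cache_alloc(gzip_ws_cache, KM_SLEEP));
  }

  void
  gzip_ws_free(void *buf)
  {
          kmem_cache_free(gzip_ws_cache, buf);
  }

Because the buffers would stay mapped in the kernel heap between compression calls, the per-call TLB shootdown cross calls from the segkmem frees should largely go away.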
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here. ... > You still have idle time in this lockstat (and mpstat). > > What do you get for a lockstat -A -D 20 sleep 30? > > Do you see anyone with long lock hold times, long > sleeps, or excessive spinning? Hmm, I ran a series of "lockstat -A -l ph_mutex -s 16 -D 20 sleep 5" commands while writing to the gzip compressed zpool, and noticed these high mutex block times: Adaptive mutex block: 8 events in 5.100 seconds (2 events/sec) --- Count indv cuml rcnt nsec Lock Caller 5 62% 62% 0.00 317300109 ph_mutex+0x1380page_create_va+0x334 nsec -- Time Distribution -- count Stack 536870912 |@@ 5 segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159 kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77 gzip_compress+0x4b zio_compress_data+0xbc --- Count indv cuml rcnt nsec Lock Caller 1 12% 75% 0.00 260247717 ph_mutex+0x1a40page_create_va+0x334 nsec -- Time Distribution -- count Stack 268435456 |@@ 1 segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159 kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77 gzip_compress+0x4b zio_compress_data+0xbc --- Count indv cuml rcnt nsec Lock Caller 1 12% 88% 0.00 348135263 ph_mutex+0x1380page_create_va+0x334 nsec -- Time Distribution -- count Stack 536870912 |@@ 1 segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159 kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77 gzip_compress+0x4b zio_compress_data+0xbc -
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Roch Bourbonnais wrote > with recent bits ZFS compression is now handled concurrently with > many CPUs working on different records. > So this load will burn more CPUs and acheive it's results > (compression) faster. Is this done using the taskq's, created in spa_activate()? http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109 These threads seems to be running the gzip compression code, and are apparently started with a priority of maxclsyspri == 99. > So the observed pauses should be consistent with that of a load > generating high system time. > The assumption is that compression now goes faster than when is was > single threaded. > > Is this undesirable ? We might seek a way to slow > down compression in order to limit the system load. Hmm, I see that the USB device drivers are also using taskq's, see file usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c, function usba_init_pipe_handle(). The USB device driver is using a priority of minclsyspri == 60 (or "maxclsyspri - 5" == 94, in the case of isochronuous usb pipes): http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c#427 Could this be a problem? That is, when zfs' taskq is filled with lots of compression requests, there is no time left running USB taskq that have a lower priority than zfs? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here. ... > What do you have zfs compresison set to? The gzip level is > tunable, according to zfs set, anyway: > > PROPERTY EDIT INHERIT VALUES > compression YES YES on | off | lzjb | gzip | gzip-[1-9] I've used the "default" gzip compression level, that is I used zfs set compression=gzip gzip_pool > You still have idle time in this lockstat (and mpstat). > > What do you get for a lockstat -A -D 20 sleep 30? # lockstat -A -D 20 /usr/tmp/fill /gzip_pool/junk lockstat: warning: 723388 aggregation drops on CPU 0 lockstat: warning: 239335 aggregation drops on CPU 1 lockstat: warning: 62366 aggregation drops on CPU 0 lockstat: warning: 51856 aggregation drops on CPU 1 lockstat: warning: 45187 aggregation drops on CPU 0 lockstat: warning: 46536 aggregation drops on CPU 1 lockstat: warning: 687832 aggregation drops on CPU 0 lockstat: warning: 575675 aggregation drops on CPU 1 lockstat: warning: 46504 aggregation drops on CPU 0 lockstat: warning: 40874 aggregation drops on CPU 1 lockstat: warning: 45571 aggregation drops on CPU 0 lockstat: warning: 33422 aggregation drops on CPU 1 lockstat: warning: 501063 aggregation drops on CPU 0 lockstat: warning: 361041 aggregation drops on CPU 1 lockstat: warning: 651 aggregation drops on CPU 0 lockstat: warning: 7011 aggregation drops on CPU 1 lockstat: warning: 61600 aggregation drops on CPU 0 lockstat: warning: 19386 aggregation drops on CPU 1 lockstat: warning: 566156 aggregation drops on CPU 0 lockstat: warning: 105502 aggregation drops on CPU 1 lockstat: warning: 25362 aggregation drops on CPU 0 lockstat: warning: 8700 aggregation drops on CPU 1 lockstat: warning: 585002 aggregation drops on CPU 0 lockstat: warning: 645299 aggregation drops on CPU 1 lockstat: warning: 237841 aggregation drops on CPU 0 lockstat: warning: 20931 aggregation drops on CPU 1 lockstat: warning: 320102 aggregation drops on CPU 0 lockstat: warning: 435898 aggregation drops on CPU 1 lockstat: warning: 115 dynamic variable drops with non-empty dirty list lockstat: warning: 385192 aggregation drops on CPU 0 lockstat: warning: 81833 aggregation drops on CPU 1 lockstat: warning: 259105 aggregation drops on CPU 0 lockstat: warning: 255812 aggregation drops on CPU 1 lockstat: warning: 486712 aggregation drops on CPU 0 lockstat: warning: 61607 aggregation drops on CPU 1 lockstat: warning: 1865 dynamic variable drops with non-empty dirty list lockstat: warning: 250425 aggregation drops on CPU 0 lockstat: warning: 171415 aggregation drops on CPU 1 lockstat: warning: 166277 aggregation drops on CPU 0 lockstat: warning: 74819 aggregation drops on CPU 1 lockstat: warning: 39342 aggregation drops on CPU 0 lockstat: warning: 3556 aggregation drops on CPU 1 lockstat: warning: ran out of data records (use -n for more) Adaptive mutex spin: 4701 events in 64.812 seconds (73 events/sec) Count indv cuml rcnt spin Lock Caller --- 1726 37% 37% 0.002 vph_mutex+0x17e8 pvn_write_done+0x10c 1518 32% 69% 0.001 vph_mutex+0x17e8 hat_page_setattr+0x70 264 6% 75% 0.002 vph_mutex+0x2000 page_hashin+0xad 194 4% 79% 0.004 0xfffed2ee0a88 cv_wait+0x69 106 2% 81% 0.002 vph_mutex+0x2000 page_hashout+0xdd 91 2% 83% 0.004 0xfffed2ee0a88 taskq_dispatch+0x2c9 83 2% 85% 0.004 0xfffed2ee0a88 taskq_thread+0x1cb 83 2% 86% 0.001 0xfffec17a56b0 ufs_iodone+0x3d 47 1% 87% 0.004 0xfffec1e4ce98 vdev_queue_io+0x85 43 1% 88% 0.006 0xfffec139a2c0 trap+0xf66 38 1% 89% 0.006 0xfffecb5f8cd0 cv_wait+0x69 37 1% 90% 0.004 0xfffec143ee90 dmult_deque+0x36 26 1% 91% 0.002 htable_mutex+0x108 htable_release+0x79 26 
1% 91% 0.001 0xfffec17a56b0 ufs_putpage+0xa4 18 0% 91% 0.004 0xfffec00dca48 ghd_intr+0xa8 17 0% 92% 0.002 0xfffec00dca48 ghd_waitq_delete+0x35 12 0% 92% 0.002 htable_mutex+0x248 htable_release+0x79 11 0% 92% 0.008 0xfffec1e4ce98 vdev_queue_io_done+0x3b 10 0% 93% 0.003 0xfffec00dca48 ghd_transport+0x71 10 0% 93% 0.002 0xff00077dc138 page_get_mnode_freelist+0xdb --- Adaptive mutex block: 167 events in 64.812 seconds (3 events/sec) Count indv cuml rcnt nsec Lock Caller --- 78 47% 47% 0.0031623 vph_mutex+0x17e8 pvn_write_done+0x10c
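Since the default gzip level came up above: an explicit gzip level can
also be selected per dataset, which makes it easy to compare the cheap and
expensive ends of the range.  (As far as I know, plain "gzip" corresponds
to gzip-6, but treat that as an assumption.)

# zfs set compression=gzip-1 gzip_pool    (fastest / lightest gzip level)
# zfs set compression=gzip-9 gzip_pool    (slowest / best compression)
# zfs get compression gzip_pool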
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> I'm not quite sure what this test should show ?

For me, the test shows how writing to a gzip compressed pool completely
kills interactive desktop performance.  At least when using a USB keyboard
and mouse.  (I've not yet tested with a ps/2 keyboard & mouse, or a SPARC
box.)

> Compressing random data is the perfect way to generate heat.
> After all, compression working relies on input entropy being low.
> But good random generators are characterized by the opposite - output
> entropy being high. Even a good compressor, if operated on a good random
> generator's output, will only end up burning cycles, but not reducing the
> data size.

Whatever I write to the gzip compressed pool (128K of /dev/urandom random
data, 128K of a buffer filled completely with identical characters, or the
first 128K from /etc/termcap), the Xorg / Gnome desktop becomes completely
unusable while writing to such a gzip compressed zpool / zfs.  With an
"lzjb" compressed zpool / zfs the system remains more or less usable...
(A sketch of the three test inputs follows below.)

> Hence, is the request here for the compressor module
> to 'adapt', kind of first-pass check the input data whether it's
> sufficiently low-entropy to warrant a compression attempt ?
>
> If not, then what ?

I'm not yet sure what the problem is.  But it sure would be nice if a gzip
compressed zpool / zfs wouldn't kill interactive desktop performance as it
does now.

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
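For reference, the three kinds of 128K test input mentioned above can be
generated roughly like this; the file names are arbitrary:

# 128K of random data
dd if=/dev/urandom of=/var/tmp/random.128k bs=128k count=1

# 128K filled with a single repeated character ('x')
perl -e 'print "x" x (128*1024)' > /var/tmp/const.128k

# the first 128K of a highly compressible text file
dd if=/etc/termcap of=/var/tmp/termcap.128k bs=128k count=1

# then write one of the files repeatedly to the gzip compressed pool:
while :; do cat /var/tmp/termcap.128k; done > /gzip_pool/junk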
[zfs-discuss] Re: Re: gzip compression throttles system?
> The reason you are busy computing SHA1 hashes is you are using
> /dev/urandom. The implementation of drv/random uses
> SHA1 for mixing,
> actually strictly speaking it is the swrand provider that does that part.

Ahh, ok.  So, instead of using dd reading from /dev/urandom all the time,
I've now used this quick C program to write one /dev/urandom block over
and over to the gzip compressed zpool:

=
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int fd;
	char buf[128*1024];

	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0) {
		perror("open /dev/urandom");
		exit(1);
	}
	if (read(fd, buf, sizeof(buf)) != sizeof(buf)) {
		perror("fill buf from /dev/urandom");
		exit(1);
	}
	close(fd);

	fd = open(argv[1], O_WRONLY|O_CREAT, 0666);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
	for (;;) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
			break;
		}
	}
	close(fd);
	exit(0);
}
=

Avoiding the reads from /dev/urandom makes the effect even more
noticeable: the machine now "freezes" for 10+ seconds.

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3109 3616  316  196    5   17   48  45   245    0  85   0  15
  1    0   0 3127 3797  592  217    4   17   63  46   176    0  84   0  15
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3051 3529  277  201    2   14   25  48   216    0  83   0  17
  1    0   0 3065 3739  606  195    2   14   37  47   153    0  82   0  17
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3011 3538  316  242    3   26   16  52   202    0  81   0  19
  1    0   0 3019 3698  578  269    4   25   23  56   309    0  83   0  17

# lockstat -kIW -D 20 sleep 30

Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL  Caller
-----------------------------------------------------------------------
 2068  34%  34% 0.00     1767 cpu[0]           deflate_slow
 1506  25%  59% 0.00     1721 cpu[1]           longest_match
 1017  17%  76% 0.00     1833 cpu[1]           mach_cpu_idle
  454   7%  83% 0.00     1539 cpu[0]           fill_window
  215   4%  87% 0.00     1788 cpu[1]           pqdownheap
  152   2%  89% 0.00     1691 cpu[0]           copy_block
   89   1%  90% 0.00     1839 cpu[1]           z_adler32
   77   1%  92% 0.00    36067 cpu[1]           do_splx
   64   1%  93% 0.00     2090 cpu[0]           bzero
   62   1%  94% 0.00     2082 cpu[0]           do_copy_fault_nta
   48   1%  95% 0.00     1976 cpu[0]           bcopy
   41   1%  95% 0.00    62913 cpu[0]           mutex_enter
   27   0%  96% 0.00     1862 cpu[1]           build_tree
   19   0%  96% 0.00     1771 cpu[1]           gen_bitlen
   17   0%  96% 0.00     1744 cpu[0]           bi_reverse
   15   0%  97% 0.00     1783 cpu[0]           page_create_va
   15   0%  97% 0.00     1406 cpu[1]           fletcher_2_native
   14   0%  97% 0.00     1778 cpu[1]           gen_codes
   11   0%  97% 0.00      912 cpu[1]+6         ddi_mem_put8
    5   0%  97% 0.00     3854 cpu[1]           fsflush_do_pages
-----------------------------------------------------------------------

It seems the same problem can be observed with "lzjb" compression, but the
pauses with lzjb are much shorter and the kernel consumes less system cpu
time with "lzjb" (which is expected, I think).

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
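For anyone who wants to reproduce this: assuming the program above was
saved as fill.c (the source and binary names are arbitrary), it can be
built and run like this, with mpstat / lockstat watching from another
terminal (lockstat needs root):

% cc -o fillgz fill.c
% ./fillgz /gzip_pool/junk &
% mpstat 5
# lockstat -kIW -D 20 sleep 30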
[zfs-discuss] Re: gzip compression throttles system?
> I just had a quick play with gzip compression on a filesystem and the > result was the machine grinding to a halt while copying some large > (.wav) files to it from another filesystem in the same pool. > > The system became very unresponsive, taking several seconds to echo > keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty > of grunt for this. I've observed the same behavior. With my test I've used a zpool created on a 1GB file (on an UFS filesystem): # mkfile 1G /var/tmp/vdev_for_gzip_pool # zpool create gzip_pool /var/tmp/vdev_for_gzip_pool # zfs set compression=gzip gzip_pool # chown jk /gzip_pool Now, when I run this command... % dd bs=128k if=/dev/urandom of=/gzip_pool/junk ... the mouse cursor sometimes is frozen for two (or more) seconds. Same with keyboard input. This is on an amd64 x2 box, 4gb memory, and usb keyboard and usb mouse. Lots of system cpu time is used while the gzip compressed poll is filled: % mpstat 5 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 47 1 122 646 316 317 13 2482 10243 3 0 94 1 41 1 159 334 101 279 11 2482 9002 3 0 94 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 01 0 6860 7263 282 7322 781640 70 0 30 10 0 6866 6870 10491 850 1210 100 0 0 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 00 06 576 301 4653 19 18 146 1261 5 0 95 10 0 36 1471 1276 410 29 20 24 115 3350 59 0 41 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 00 0 5404 5823 309 11322 571 4200 56 0 44 10 0 5409 5431 135 121 801 1790 100 0 0 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 00 01 529 300 281 17 13 10 103 2740 64 0 36 10 09 1348 1169 4528 23 14 105 1630 5 0 95 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 00 0 6186 6607 282 6289 55 11 1230 88 0 12 10 0 6196 6259 53 7843 80 12 1320 75 0 25 A kernel profile seems to show that the kernel is busy with gzip'ing (and busy with computing SHA1 hashes?): # lockstat -kIW -D 20 sleep 20 Profiling interrupt: 3882 events in 20.021 seconds (194 events/sec) Count indv cuml rcnt nsec Hottest CPU+PILCaller --- 1802 46% 46% 0.00 1931 cpu[0] mach_cpu_idle 517 13% 60% 0.00 6178 cpu[1] SHA1Transform 482 12% 72% 0.00 1094 cpu[0] deflate_slow 328 8% 81% 0.00 cpu[1] longest_match 104 3% 83% 0.00 940 cpu[0] fill_window 98 3% 86% 0.0047357 cpu[1] bcopy 65 2% 87% 0.00 5438 cpu[1] SHA1Update 63 2% 89% 0.00 834 cpu[1] bzero 50 1% 90% 0.00 1042 cpu[0] pqdownheap 44 1% 92% 0.00 676 cpu[1] Encode 32 1% 92% 0.00 1136 cpu[0] copy_block 24 1% 93% 0.00 1214 cpu[1] do_copy_fault_nta 23 1% 94% 0.00 401205 cpu[0] do_splx 23 1% 94% 0.00 644 cpu[1] hmac_encr 22 1% 95% 0.00 1058 cpu[1] z_adler32 19 0% 95% 0.00 1208 cpu[0]+10 todpc_rtcget 16 0% 96% 0.00 752 cpu[0] SHA1Final 14 0% 96% 0.00 1186 cpu[1] mutex_enter 12 0% 96% 0.00 642 cpu[1] kcopy 11 0% 97% 0.00 948 cpu[0] page_create_va --- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
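As an alternative (or cross-check) to the lockstat -kIW profile, a generic
DTrace profile probe can sample kernel stacks while the dd is running;
this is a sketch, not something from the original test:

# dtrace -n '
profile-997
/arg0 != 0/
{
        /* arg0 is the kernel PC, so this only samples time spent in the kernel */
        @stacks[stack(20)] = count();
}
tick-30s
{
        /* keep the 10 hottest kernel stacks, then print and exit */
        trunc(@stacks, 10);
        exit(0);
}'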
[zfs-discuss] Re: ZFS and UFS performance
> > That's probably bug 6382683 "lofi is confused about sync/async I/O", > > and AFAIK it's fixed in current opensolaris releases. > > > According to Bug Database bug 6382683 is in > 1-Dispatched state, what does that mean? I wonder if > the fix is available (or will be available) as a > Solaris 10 patch? Seems I was wrong, and this issue is not yet fixed. Yesterday, before replying, I've tried to check the state of the bug, but b.o.o. always returned "We encountered an unexpected error. Please try back again." :-( I repeated my test case (creating a pcfs filesystem on a lofi device from an 80gbyte file on zfs), and the write times with a current opensolaris kernel have improved by a factor of 4 (~ 10 seconds instead of 40-50 seconds), but I guess part of the improvement is because of a hardware upgrade here (zpool on two s-ata drives, instead of one p-ata drive). This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
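The lofi + pcfs test case mentioned above looks roughly like this; the
sizes, paths and mkfs options are examples only (the original test used an
80 gbyte backing file on zfs):

# mkfile 1g /tank/pcfs.img
# lofiadm -a /tank/pcfs.img
/dev/lofi/1
# mkfs -F pcfs -o nofdisk,size=2097152 /dev/rlofi/1
# mount -F pcfs /dev/lofi/1 /mnt
# time dd if=/dev/zero of=/mnt/testfile bs=1024k count=256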
[zfs-discuss] Re: ZFS and UFS performance
> We are running Solaris 10 11/06 on a Sun V240 with 2 CPUS and 8 GB of
> memory.  This V240 is attached to a 3510 FC that has 12 x 300 GB disks.
> The 3510 is configured as HW RAID 5 with 10 disks and 2 spares and it's
> exported to the V240 as a single LUN.
>
> We create iso images of our product in the following way (high-level):
>
> # mkfile 3g /isoimages/myiso
> # lofiadm -a /isoimages/myiso
> /dev/lofi/1
> # newfs /dev/rlofi/1
> # mount /dev/lofi/1 /mnt
> # cd /mnt; zcat /product/myproduct.tar.Z | tar xf -
>
> and we finally use mkisofs to create the iso image.
>
> ZFS performance
> ---------------
> When we create a ZFS file system on the above LUN and create the iso, it
> takes forever; it seems to be hanging in the tar extraction (we killed
> this after a while, i.e. few hours).

That's probably bug 6382683 "lofi is confused about sync/async I/O", and
AFAIK it's fixed in current opensolaris releases.

See the thread with subject "bad lofi performance with zfs file backend /
bad mmap write performance" from january / february 2006:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-January/016450.html
http://mail.opensolaris.org/pipermail/zfs-discuss/2006-February/016566.html

Possible workaround: create a 3gb zvol device, and use that instead of a
3gb file + lofi (see the sketch after this message).  Or use something
like this:

zfs create tank/myiso
cd /tank/myiso
zcat /product/myproduct.tar.Z | tar xf -
mkisofs ...

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
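A sketch of the zvol based workaround mentioned above; the pool / volume
names and the mkisofs options are examples only, not taken from the
original setup:

# zfs create -V 3g tank/isovol
# newfs /dev/zvol/rdsk/tank/isovol
# mount /dev/zvol/dsk/tank/isovol /mnt
# cd /mnt; zcat /product/myproduct.tar.Z | tar xf -
# mkisofs -R -o /var/tmp/myproduct.iso /mnt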
[zfs-discuss] Re: ZFS and Firewire/USB enclosures
> I still haven't got any "warm and fuzzy" responses > yet solidifying ZFS in combination with Firewire or USB enclosures. I was unable to use zfs (that is "zpool create" or "mkfs -F ufs") on firewire devices, because scsa1394 would hang the system as soon as multiple concurrent write commands are submitted to it. I filed bug 6445725 (which disappeared in the scsa1394 bugs.opensolaris.org black hole), submitted a fix and requested a sponsor for the fix[*], but not much has happened with fixing this problem in opensolaris. There is no such problem with USB mass storage devices. [*] http://www.opensolaris.org/jive/thread.jspa?messageID=46190 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs legacy filesystem remounted rw: atime temporary off?
I have my /usr filesystem configured as a zfs filesystem, using a legacy
mountpoint.  I noticed that the system boots with atime updates
temporarily turned off (and doesn't record file accesses in the /usr
filesystem):

# df -h /usr
Filesystem             size   used  avail capacity  Mounted on
files/usr-b57           98G   2.1G    18G    11%    /usr
# zfs get atime files/usr-b57
NAME           PROPERTY  VALUE   SOURCE
files/usr-b57  atime     off     temporary

That is, when a zfs legacy filesystem is mounted in read-only mode, and
then remounted read/write, atime updates are off:

# zfs create -o mountpoint=legacy files/foobar
# mount -F zfs -o ro files/foobar /mnt
# zfs get atime files/foobar
NAME          PROPERTY  VALUE   SOURCE
files/foobar  atime     on      default
# mount -F zfs -o remount,rw files/foobar /mnt
# zfs get atime files/foobar
NAME          PROPERTY  VALUE   SOURCE
files/foobar  atime     off     temporary

Is this expected behaviour?

It works if I remount with the "atime" option:

# mount -F zfs -o remount,rw,atime files/foobar /mnt
# zfs get atime files/foobar
NAME          PROPERTY  VALUE   SOURCE
files/foobar  atime     on      default

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Heavy writes freezing system
> We are having issues with some Oracle databases on ZFS.  We would
> appreciate any useful feedback you can provide.
> [...]
> The issue seems to be serious write contention/performance.  Some read
> issues also exhibit themselves, but they seem to be secondary to the
> write issues.

What hardware is used?  Sparc?  x86 32-bit?  x86 64-bit?  How much RAM is
installed?  Which version of the OS?

Did you already try to monitor kernel memory usage while writing to zfs?
Maybe the kernel is running out of free memory?  (I have bugs like 6483887
in mind, "without direct management, arc ghost lists can run amok")

For a live system:

echo ::kmastat | mdb -k
echo ::memstat | mdb -k

(A small loop for monitoring this periodically is sketched after this
message.)

In case you've got a crash dump for the hung system, you can try the same
::kmastat and ::memstat commands using the kernel crash dumps saved in
directory /var/crash/`hostname`

# cd /var/crash/`hostname`
# mdb -k unix.1 vmcore.1
::memstat
::kmastat

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
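A simple way to capture this periodically while the workload runs is a
small shell loop like the following; the interval and log file name are
arbitrary:

while :; do
        date
        echo ::memstat | mdb -k
        sleep 60
done >> /var/tmp/memstat.log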
[zfs-discuss] Re: ZFS related (probably) hangs due to memory exhaustion(?) with snv53
> >Hmmm, so there is lots of evictable cache here (mostly in the MFU
> >part of the cache)... could you make your core file available?
> >I would like to take a look at it.
>
> Isn't this just like:
> 6493923 nfsfind on ZFS filesystem quickly depletes memory in a 1GB system
>
> Which was introduced in b51 (or 52) and fixed in snv_54.

Hmm, or like:

6483887 without direct management, arc ghost lists can run amok

(which isn't fixed at this time)

See also this thread:
http://www.opensolaris.org/jive/thread.jspa?messageID=67370

Mark had sent me some test bits with a modified arc.c; it tried to evict
ghost list entries when the arc cache is in no_grow state and the arc
ghost lists consume too much memory.  The main change was a new function
arc_buf_hdr_alloc() in arc.c that shrinks the ghost lists when the system
is running out of memory:

static arc_buf_hdr_t *
arc_buf_hdr_alloc(spa_t *spa, int size)
{
	arc_buf_hdr_t *hdr;

	if (arc.no_grow &&
	    arc.mru_ghost->size + arc.mfu_ghost->size > arc.c) {
		int64_t mru_over = arc.anon->size + arc.mru->size +
		    arc.mru_ghost->size - arc.c;

		if (mru_over > 0 && arc.mru_ghost->size > 0) {
			int64_t todelete =
			    MIN(arc.mru_ghost->lsize, mru_over);
			arc_evict_ghost(arc.mru_ghost, todelete);
		} else {
			int64_t todelete = MIN(arc.mfu_ghost->lsize,
			    arc.mru_ghost->size + arc.mfu_ghost->size -
			    arc.c);
			arc_evict_ghost(arc.mfu_ghost, todelete);
		}
	}

	ASSERT3U(size, >, 0);
	hdr = kmem_cache_alloc(hdr_cache, KM_SLEEP);
	ASSERT(BUF_EMPTY(hdr));
	hdr->b_size = size;
	hdr->b_spa = spa;
	hdr->b_state = arc.anon;
	hdr->b_arc_access = 0;
	hdr->b_flags = 0;
	return (hdr);
}

This was then used by arc_buf_alloc():

arc_buf_t *
arc_buf_alloc(spa_t *spa, int size, void *tag)
{
	arc_buf_hdr_t *hdr;
	arc_buf_t *buf;

	hdr = arc_buf_hdr_alloc(spa, size);
	buf = kmem_cache_alloc(buf_cache, KM_SLEEP);
	buf->b_hdr = hdr;
	...
	return (buf);
}

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
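To watch whether the ghost lists are actually growing on an affected
system, the global arc structure referenced in the snippet above can
probably be inspected with mdb; the member names here are taken from that
snippet and may differ between builds, so treat this as a sketch:

# echo "arc::print" | mdb -k
# echo "arc::print c mru_ghost mfu_ghost" | mdb -k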
[zfs-discuss] zfs/fstyp slows down recognizing pcfs formatted floppies
I've noticed that fstyp on a floppy media formatted with "pcfs" now needs
somewhere between 30 - 100 seconds to find out that the floppy media is
formatted with "pcfs".  E.g. on sparc snv_48, I currently observe this:

% time fstyp /vol/dev/rdiskette0/nomedia
pcfs
0.01u 0.10s 1:38.84 0.1%

zfs's /usr/lib/fs/zfs/fstyp.so.1 seems to add about 40 seconds to that
time, because it reads 1 mbyte from the floppy media (~ 2/3 of a 1.44MB
floppy), only to find out that the floppy media does not contain a zfs
pool:

SPARC snv_48, before tamarack:

% time /usr/lib/fs/zfs/fstyp /vol/dev/rdiskette0/nomedia
unknown_fstyp (no matches)
0.01u 0.04s 0:36.27 0.1%

x86, snv_53, with tamarack:

% time /usr/lib/fs/zfs/fstyp /dev/rdiskette
unknown_fstyp (no matches)
0.00u 0.01s 0:35.25 0.0%

(the rest of the time is wasted probing for an udfs filesystem)

Isn't the minimum device size required for a zfs pool 64 mbytes?
(SPA_MINDEVSIZE, from the sys/fs/zfs.h header)

Shouldn't zfs/fstyp skip probing for zfs / zpools on small capacity
devices, like a floppy media, that are smaller than these 64 mbytes?

diff -r 367766133bfe usr/src/cmd/fs.d/zfs/fstyp/fstyp.c
--- a/usr/src/cmd/fs.d/zfs/fstyp/fstyp.c	Fri Dec 15 09:03:53 2006 -0800
+++ b/usr/src/cmd/fs.d/zfs/fstyp/fstyp.c	Sun Dec 17 11:27:08 2006 +0100
@@ -32,6 +32,8 @@
 #include
 #include
 #include
+#include <sys/stat.h>
+#include <sys/fs/zfs.h>
 #include
 #include
 #include
@@ -88,6 +90,15 @@ fstyp_mod_ident(fstyp_mod_handle_t handl
 	char		*str;
 	uint64_t	u64;
 	char		buf[64];
+	struct stat	stb;
+
+	/*
+	 * don't probe for zfs on small media (e.g. floppy) that is
+	 * too small for a zpool.
+	 */
+	if (fstat(h->fd, &stb) == 0 && stb.st_size < SPA_MINDEVSIZE) {
+		return (FSTYP_ERR_NO_MATCH);
+	}
 
 	if (zpool_read_label(h->fd, &h->config) != 0 || h->config == NULL) {

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Recommended Minimum Hardware for ZFS Fileserver?
> I've been looking at building this setup in some > cheap eBay rack-mount servers that are generally > single or dual 1.0GHz Pentium III, 1Gb PC133 RAM, and > I'd have to add the SATA II controller into a spare > PCI slot. > > For maximum file system performance of the ZFS pool, > would anyone care to offer hardware recommendations? For maximum file system performance of the ZFS pool, a 64-bit x86 cpu would be *much* better than a 32-bit x86 cpu. The 32-bit cpu won't use more than ~ 512Mb of RAM for ZFS' ARC cache (no matter how much is installed in the machine); a 64-bit cpu is able to use all of the available RAM for ZFS's cache. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
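A quick way to check whether a given box runs the 64-bit kernel, and how
much memory is installed and used by the kernel, is the following set of
standard Solaris commands (the mdb command needs root):

% isainfo -kv          (reports e.g. 64-bit amd64 vs. 32-bit i386 kernel)
% prtconf | grep Memory
# echo ::memstat | mdb -k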
[zfs-discuss] Re: Re: ZFS hangs systems during copy
> This is: > 6483887 without direct management, arc ghost lists can run amok That seems to be a new bug? http://bugs.opensolaris.org does not yet find it. > The fix I have in mind is to control the ghost lists as part of > the arc_buf_hdr_t allocations. If you want to test out my fix, > I can send you some diffs... Ok, I can do that. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: zpool snapshot fails on unmounted filesystem
> I just retried to reproduce it to generate a reliable test case.
> Unfortunately, I cannot reproduce the error message.  So I really have
> no idea what might have cause it

I also had this problem 2-3 times in the past, but I cannot reproduce it.

Using dtrace against the kernel, I found out that the source of the EBUSY
error 16 is the kernel function zil_suspend():

  ...
  0  <- dnode_cons                 0
  0  -> dnode_setdblksz
  0  <- dnode_setdblksz            14
  0  -> dmu_zfetch_init
  0    -> list_create
  0    <- list_create              3734548404
  0    -> rw_init
  0    <- rw_init                  3734548400
  0  <- dmu_zfetch_init            3734548400
  0  -> list_insert_head
  0  <- list_insert_head           3734548052
  0  <- dnode_create               3734548048
  0  <- dnode_special_open         3734548048
  0  -> dsl_dataset_set_user_ptr
  0  <- dsl_dataset_set_user_ptr   0
  0  <- dmu_objset_open_impl       0
  0  <- dmu_objset_open            0
  0  -> dmu_objset_zil
  0  <- dmu_objset_zil             3700903200
  0  -> zil_suspend
  0  | zil_suspend:entry           zh_claim_txg: 83432
  0  <- zil_suspend                16
  0  -> dmu_objset_close
  0    -> dsl_dataset_close
  0    -> dbuf_rele
  0      -> dbuf_evict_user
  0      -> dsl_dataset_evict
  0        -> unique_remove
  ...

  1200 /*
  1201  * Suspend an intent log. While in suspended mode, we still honor
  1202  * synchronous semantics, but we rely on txg_wait_synced() to do it.
  1203  * We suspend the log briefly when taking a snapshot so that the snapshot
  1204  * contains all the data it's supposed to, and has an empty intent log.
  1205  */
  1206 int
  1207 zil_suspend(zilog_t *zilog)
  1208 {
  1209 	const zil_header_t *zh = zilog->zl_header;
  1210 	lwb_t *lwb;
  1211
  1212 	mutex_enter(&zilog->zl_lock);
  1213 	if (zh->zh_claim_txg != 0) {		/* unplayed log */
  1214 		mutex_exit(&zilog->zl_lock);
  1215 		return (EBUSY);
  1216 	}
  ...

It seems that you can identify zfs filesystems that fail zfs snapshot with
error 16 EBUSY using

zdb -iv {your_zpool_here} | grep claim_txg

If there are any ZIL headers listed with a claim_txg != 0, the dataset
that uses this ZIL should fail zfs snapshot with error 16, EBUSY.

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
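A sketch of a DTrace invocation that could watch for this case on a live
system; it assumes zil_suspend() in the zfs module is instrumentable with
the fbt provider, and that EBUSY is 16:

# dtrace -n '
fbt:zfs:zil_suspend:return
/arg1 == 16/
{
        /* arg1 is the return value of zil_suspend() */
        printf("zil_suspend() returned EBUSY");
        stack();
}'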
[zfs-discuss] Re: Re: ZFS hangs systems during copy
> >> Sounds familiar. Yes it is a small system a Sun blade 100 with 128MB of > >> memory. > > > > Oh, 128MB... > > > Btw, does anyone know if there are any minimum hardware (physical memory) > > requirements for using ZFS? > > > > It seems as if ZFS wan't tested that much on machines with 256MB (or less) > > memory... > > The minimum hardware requirement for Solaris 10 (including ZFS) is > 256MB, and we did test with that :-) > > On small memory systems, make sure that you are running with > kmem_flags=0 (this is the default on non-debug builds, but debug builds > default to kmem_flags=f and you will have to manually change it in > /etc/system). I do have kernel memory allocator debugging disabled; both S10 6/2006 and SX:CR snv48 are non-debug builds. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS hangs systems during copy
> ZFS 11.0 on Solaris release 06/06, hangs systems when trying to copy
> files from my VXFS 4.1 file system.  Any ideas what this problem could
> be?

What kind of system is that?  How much memory is installed?

I'm able to hang an Ultra 60 with 256 MByte of main memory, simply by
writing big files to a ZFS filesystem.  The problem happens with both
Solaris 10 6/2006 and Solaris Express snv_48.

In my case there seems to be a problem with ZFS' ARC cache, which is not
returning memory to the kernel when free memory gets low.  Instead, ZFS'
ARC cache data structures keep growing until the machine is running out of
kernel memory.  At this point the machine hangs, lots of kernel threads
are waiting for free memory, and the box must be power cycled.  (Well,
unplugging and re-connecting the type 5 keyboard works and gets me to the
OBP, where I can force a system crash dump and reboot.)

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss