Re: [zfs-discuss] [OpenIndiana-discuss] format dumps the core

2010-11-06 Thread Jürgen Keil
> r...@tos-backup:~# pstack /dev/rdsk/core
> core '/dev/rdsk/core' of 1217:  format
> fee62e4a UDiv (4, 0, 8046c80, 80469a0, 8046a30,  8046a50) + 2a
> 08079799 auto_sense (4, 0, 8046c80, 0) + 281
> ...

Seems that one function call is missing in the backtrace
between auto_sense and UDiv, because UDiv does not set up
a complete stack frame.

Looking at the source ...
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/format/auto_sense.c#819
... you can get some extra debug output
from format when you specify the "-M" option.

E.g. with a usb flash memory stick and format -eM
I get:

# format -eM
Searching for disks...
c11t0d0: attempting auto configuration
Inquiry:
00 80 02 02 1f 00 00 00 53 61 6e 44 69 73 6b 20 SanDisk 
55 33 20 43 6f 6e 74 6f 75 72 20 20 20 20 20 20 U3 Contour  
34 2e 30                                        4.0
Product id: U3 Contour  
Capacity: 00 7a 46 90 00 00 02 00 
blocks:  8013456 (0x7a4690)
blksize: 512
disk name:  `r  `
Request sense for command mode sense failed
Sense data:
f0 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 
Mode sense page 0x3 failed
Request sense for command mode sense failed
Sense data:
f0 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 
Mode sense page 0x4 failed
Geometry:
pcyl:1956
ncyl:1954
heads:   128
nsects:  32
acyl:2
bcyl:0
rpm: 0
nblocks: 8013457
The current rpm value 0 is invalid, adjusting it to 3600

Geometry after adjusting for capacity:
pcyl:1956
ncyl:1954
heads:   128
nsects:  32
acyl:2
rpm: 3600

Partition 0:   128.00MB   64 cylinders
Partition 1:   128.00MB   64 cylinders
Partition 2: 3.82GB 1956 cylinders
Partition 6: 3.56GB 1825 cylinders
Partition 8:     2.00MB    1 cylinders

Inquiry:
00 00 03 02 1f 00 00 02 41 54 41 20 20 20 20 20 ATA 
48 69 74 61 63 68 69 20 48 54 53 37 32 33 32 33 Hitachi HTS72323
43 33 30                                        C30
done

c11t0d0: configured with capacity of 3.82GB


Re: [zfs-discuss] [OpenIndiana-discuss] format dumps the core

2010-10-31 Thread Jürgen Keil
> - Original Message -
...
> > r...@tos-backup:~# format
> > Searching for disks...Arithmetic Exception (core dumped)

> This error also seems to occur on osol 134. Any idea
> what this might be?

What stack backtrace is reported for that core dump ("pstack core") ?


Re: [zfs-discuss] Root pool on boot drive lost on another machine because of devids

2010-08-21 Thread Jürgen Keil
> I have a USB flash drive which boots up my
> opensolaris install. What happens is that whenever I
> move to a different machine,
> the root pool is lost because the devids don't match
> with what's in /etc/zfs/zpool.cache and the system
> just can't find the rpool.

See defect 4755 or defect 5484

https://defect.opensolaris.org/bz/show_bug.cgi?id=4755
https://defect.opensolaris.org/bz/show_bug.cgi?id=5484

When I last experimented with booting Solaris
from flash memory sticks, I modified scsa2usb
so that it would construct a devid for the usb
flash memory stick.


Re: [zfs-discuss] zfs periodic writes on idle system [Re: Getting desktop to auto sleep]

2010-06-21 Thread Jürgen Keil
> Why does zfs produce a batch of writes every 30 seconds on opensolaris b134
> (5 seconds on a post b142 kernel), when the system is idle?

It was caused by the b134 gnome-terminal. I had an iostat
running in a gnome-terminal window, and the periodic
iostat output was written to a temporary file by gnome-terminal.
This kept the hdd busy. Older gnome-terminals (b111)
didn't write terminal output to a disk file. The workaround is
to use xterm instead of the b134 gnome-terminal for a
command that periodically produces output.
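
A minimal sketch of the workaround (the 10 second iostat interval is
just an example value):

  xterm -e iostat -xn 10 &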


[zfs-discuss] zfs periodic writes on idle system [Re: Getting desktop to auto sleep]

2010-06-20 Thread Jürgen Keil
Why does zfs produce a batch of writes every 30 seconds on opensolaris b134
(5 seconds on a post b142 kernel), when the system is idle?

On an idle OpenSolaris 2009.06 (b111) system,  /usr/demo/dtrace/iosnoop.d
shows no i/o activity for at least 15 minutes.

The same dtrace test on an idle b134 system shows a batch of writes every 30 
seconds.

And on current opensolaris bits, on an idle system, I see writes every 5 
seconds.


The periodic writes prevent the disk from entering power save mode,
and this breaks the /etc/power.conf autoS3 feature.  Why does zfs have
to write something to disk when the system is idle?



> > Putting the flag does not seem to do anything to the
> > system. Here is my power.conf file: 
> ...
> > autopm  enable
> > autoS3  enable
> > S3-support  enable
> 
> Problem seems to be that all power managed devices
> must be at their lowest power level, otherwise autoS3
> won't suspend the system.  And somehow one or more
> device does not reach the lowest power level.
...
> The laptop still does not power down, because every
> 30 seconds there is a batch of writes to the hdd drive,
> apparently from zfs, and that keeps the hdd powered
> up.
> 
> The periodic writes can be monitored with:
> 
> dtrace -s /usr/demo/dtrace/iosnoop.d


Re: [zfs-discuss] ZFS and 4kb sector Drives (All new western digital GREEN Drives?)

2010-03-27 Thread Jürgen Keil
> It would be nice if the 32bit osol kernel support
> 48bit LBA 

It is already supported, and has been for many years (otherwise
disks with a capacity >= 128GB could not be
used with Solaris) ...

> (similar to linux, not sure if 32bit BSD
> supports 48bit LBA ), then the drive would probably
> work - perhaps later in the year we will have time to
> work on a patch to support 48bit lba on the 32bit
> osol kernels...

I think that - as a start - you have to eliminate the use
of the (signed 32-bit long on the 32-bit kernel) daddr_t data
type in the kernel, switch everything to the 64-bit
diskaddr_t, and fix all device drivers that are currently
using daddr_t (including getting 3rd-party device drivers
fixed).


Re: [zfs-discuss] opensolaris fresh install lock up

2010-01-17 Thread Jürgen Keil
> > in the build 130 annoucement you can find this:
> > 13540 Xserver crashes and freezes a system installed with LiveCD on bld 130
>
> It is for sure this bug.  This is ok, i
> can do most of what i need via ssh.  I just
> wasn't sure if it was a bug or if i had done
> something wrongi had tried installing 2-3 times
> and it kept happening...was driving me insane.
> 
> I can deal with it if it's something that
> will be fixed in 131 (which is what the bug page
> seems to hint at)

A part of the problem will be fixed in b131: CR 6913965
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913965

But it seems the segfault from CR 6913157
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913157
is not yet fixed in b131.


Re: [zfs-discuss] opensolaris fresh install lock up

2010-01-17 Thread Jürgen Keil
> I just installed opensolaris build 130 which i
> downloaded from genunix.  The install went
> fineand the first reboot after install seemed to
> work but when i powered down and rebooted fully, it
> locks up as soon as i log in. 

Hmm, seems you're asking in the wrong forum.
Sounds more like a desktop or x-window problem
to me.  Why do you think this is a zfs problem?

> Gnome is still showing
> the icon it shows when stuff hasn't finished
> loadingis there any way i can find out why
> it's locking up and how to fix it?

Hmm, in the build 130 announcement you can find this:
( http://www.opensolaris.org/jive/thread.jspa?threadID=120631&tstart=0 )


13540 Xserver crashes and freezes a system installed with LiveCD on bld 130
http://defect.opensolaris.org/bz/show_bug.cgi?id=13540

After installation, the X server may crash and appears to not
be restarted by the GNOME Display Manager (gdm).

Work-around: None at this time.


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread Jürgen Keil
> > I wasnt clear in my description, I m referring to ext4 on Linux. In 
> > fact on a system with low RAM even the dd command makes the system 
> > horribly unresponsive.
> >
> > IMHO not having fairshare or timeslicing between different processes 
> > issuing reads is frankly unacceptable given a lame user can bring 
> > the system to a halt with 3 large file copies. Are there ZFS 
> > settings or Project Resource Control settings one can use to limit 
> > abuse from individual processes?
> 
> I am confused.  Are you talking about ZFS under OpenSolaris, or are 
> you talking about ZFS under Linux via Fuse?
> 
> Do you have compression or deduplication enabled on
> the zfs  filesystem?
> 
> What sort of system are you using?

I was able to reproduce the problem running
current (mercurial) opensolaris bits, with the
"dd" command:

  dd if=/dev/urandom of=largefile.txt bs=1048576k count=8

dedup is off, compression is on. System is a 32-bit laptop
with 2GB of memory, single core cpu.  The system was
unusable / unresponsive for about 5 minutes before I was
able to interrupt the dd process.


Re: [zfs-discuss] ZFS dedup accounting & reservations

2009-11-03 Thread Jürgen Keil
> But: Isn't there an implicit expectation for a space guarantee associated 
> with a 
> dataset? In other words, if a dataset has 1GB of data, isn't it natural to 
> expect to be able to overwrite that space with other
> data?

Is there such a space guarantee for compressed or cloned zfs?


Re: [zfs-discuss] ZFS dedup accounting

2009-11-03 Thread Jürgen Keil
> Well, then you could have more "logical space" than
> "physical space", and that would be extremely cool,

I think we already have that, with zfs clones.

I often clone a zfs onnv workspace, and everything
is "deduped" between zfs parent snapshot and clone
filesystem.  The clone (initially) needs no extra zpool
space.

And with zfs clone I can actually use all
the remaining free space from the zpool.

With zfs deduped blocks, I can't ...

> but what happens if for some reason you wanted to
> turn off dedup on one of the filesystems? It might
> exhaust all the pool's space to do this.

As far as I understand it, nothing happens to existing
deduped blocks when you turn off dedup for a zfs
filesystem.  The new dedup=off setting affects
newly written blocks only.


Re: [zfs-discuss] ZFS dedup issue

2009-11-03 Thread Jürgen Keil
> I think I'm observing the same (with changeset 10936) ...

# mkfile 2g /var/tmp/tank.img
# zpool create tank /var/tmp/tank.img
# zfs set dedup=on tank
# zfs create tank/foobar


> dd if=/dev/urandom of=/tank/foobar/file1 bs=1024k count=512
512+0 records in
512+0 records out
> cp /tank/foobar/file1 /tank/foobar/file2
> cp /tank/foobar/file1 /tank/foobar/file3
> cp /tank/foobar/file1 /tank/foobar/file4
/tank/foobar/file4: No space left on device

>  zfs list -r tank
NAME          USED  AVAIL  REFER  MOUNTPOINT
tank         1.95G      0    22K  /tank
tank/foobar  1.95G      0  1.95G  /tank/foobar

> zpool list tank
NAME   SIZE   USED  AVAIL   CAP  DEDUP  HEALTH  ALTROOT
tank  1.98G   515M  1.48G   25%  3.90x  ONLINE  -


Re: [zfs-discuss] ZFS dedup issue

2009-11-03 Thread Jürgen Keil
> So.. it seems that data is deduplicated, zpool has
> 54.1G of free space, but I can use only 40M.
> 
> It's x86, ONNV revision 10924, debug build, bfu'ed from b125.

I think I'm observing the same (with changeset 10936) ...

I created a 2GB file, and a "tank" zpool on top of that file,
with compression and dedup enabled:

mkfile 2g /var/tmp/tank.img
zpool create tank /var/tmp/tank.img
zfs set dedup=on tank
zfs set compression=on tank


Now I tried to create four zfs filesystems, 
and filled them by pulling and updating
the same set of onnv sources from mercurial.
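
A rough sketch of that test loop (the source repository path
/export/onnv-clone below is just a placeholder):

  for ws in snv_128 snv_128_jk snv_128_xx snv_128_yy; do
      zfs create tank/$ws
      hg clone /export/onnv-clone /tank/$ws/onnv-gate
  done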

One copy needs ~ 800MB of disk space 
uncompressed, or ~ 520MB compressed. 
During the 4th "hg update":

> hg update
abort: No space left on device: 
/tank/snv_128_yy/usr/src/lib/libast/sparcv9/src/lib/libast/FEATURE/common


> zpool list tank
NAME   SIZE   USED  AVAIL   CAP  DEDUP  HEALTH  ALTROOT
tank  1,98G   720M  1,28G   35%  3.70x  ONLINE  -


> zfs list -r tank
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank             1,95G      0    26K  /tank
tank/snv_128      529M      0   529M  /tank/snv_128
tank/snv_128_jk   530M      0   530M  /tank/snv_128_jk
tank/snv_128_xx   530M      0   530M  /tank/snv_128_xx
tank/snv_128_yy   368M      0   368M  /tank/snv_128_yy


Re: [zfs-discuss] Change physical path to a zpool.

2009-10-24 Thread Jürgen Keil
> I have a functional OpenSolaris x64 system on which I need to physically
> move the boot disk, meaning its physical device path will change and
> probably its cXdX name.
> 
> When I do this the system fails to boot
...
> How do I inform ZFS of the new path?
...
> Do I need to boot from the LiveCD and then import the
> pool from its new path?

Exactly.

Boot from the livecd with the disk connected on the
new physical path, and run "pfexec zpool import -f rpool",
followed by a reboot.

That'll update the zpool's label with the new physical
device path information.
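
For reference, the complete livecd session is just something like this
(assuming the root pool is named rpool):

  pfexec zpool import -f rpool
  pfexec reboot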


Re: [zfs-discuss] Install and boot from USB stick?

2009-08-02 Thread Jürgen Keil
> Does this give you anything?
> 
> [url=http://bildr.no/view/460193][img]http://bildr.no/thumb/460193.jpeg[/img][/url]

That looks like the zfs mountroot panic you
get when the root disk was moved to a different
physical location (e.g. different usb port).
In this case the physical device path recorded
in the zpool's on-disk label cannot be used to
access the root disk.

Updating the physical device path works by
booting the livecd and "zpool import -f rpool".
zpool import will rewrite the on disk label with
the new physical device path, so that the next
boot from the device should work.

In theory, zfs is able to find the disk in any other 
physical location using the storage device's "devid"
property.  Unfortunately most usb sticks report the device
type of "removable media disk", and in this
case Solaris won't generate "devids" for the usb
stick.  As a result, the root disk can't be
found when it has been moved around and connected to
a different usb port.


Re: [zfs-discuss] Install and boot from USB stick?

2009-08-02 Thread Jürgen Keil
> No there was no error level fatal.
> 
> Well, here is what I have tried since:
> 
> a) I´ve tried to install a custom grub like described here:
> http://defect.opensolaris.org/bz/show_bug.cgi?id=4755#c28
> With that in place, I just get the grub prompt. I´ve
> tried to zpool import -f rpool when this occoured (I
> read somewhere that it might help, but it didnt).

The grub in OS 2009.06 should not be affected by bug 4755
any more.

I think the grub from bug 4755 comment 28 is too old and
does not support the latest zpool format version updates,
so it can't read from a current (version 17?) zpool.

> b) I noticed when booting from the livecd (text mode),
> with the newly installed usb stick in, i get this:
> [url=http://bildr.no/view/460143][img]http://bildr.no/thumb/460143.jpeg[/img][/url]

Hmm, seems that Solaris' disk driver is receiving
bogus "mode sense" data from the usb stick ?

I think with scsa2usb.conf changed to enable the reduced
command set, we could avoid sending mode sense
commands to the usb flash stick...

> And then, when i imported the zpool to edit
> scsa2usb.conf, I get these messages again:
> [url=http://bildr.no/view/460144][img]http://bildr.no/thumb/460144.jpeg[/img][/url]
> Then, when i were done editing scsa2usb.conf, and
> rebooted, thoose same messages appears once more.

Hmm, so we can get these unit attention messages
both when booted from the usb stick, and when 
booted from the live cd.

> c) I´ve tried to edit grub after rebooting from a
> fresh install, remoeving splashimage, back/fron
> color, and ´console=graphics´, and adding ´-v´ after
> ´kernel$´. When doing this, nothing happens. I press
> ´b´to boot, the menu list disappears, but the grub
> image is still there (the splashimage where the logo
> is placed down right).

Should work, it should boot the kernel in text mode;
I just tested it with an OS 2009.06 install / virtualbox
guest.

> d) I´ve noticed that after installation, the
> installation log says the same as this:
> http://defect.opensolaris.org/bz/show_bug.cgi?id=4755#c25
> 
> 
> I´m running out of ideas. I´ve seen someone mention
> that you can replace $ZFSBOOT (cant remember the
> correct variable name atm.) with the full path of the
> usb stick in grub. 

Probably something like this:
http://www.opensolaris.org/os/community/xen/devdocs/install-depedencies/

  "-B zfs-bootfs=rpool/57, "

Problem is that the zfs id of the boot filesystem (57)
isn't fixed, and changes when you create new 
boot environments.

And I think the id could change between different opensolaris
releases.
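
If you need to look up the current numeric id, zdb can print it for a
given dataset; a sketch, assuming the boot environment dataset is
rpool/ROOT/opensolaris:

  zdb -d rpool/ROOT/opensolaris

The "Dataset ... ID nnn, ..." line it prints contains the id that would
go after "rpool/" in the zfs-bootfs value.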


And I'm not sure how far you get with booting from the usb stick;
I suspect that your system has already mounted the zfs root
filesystem just fine, but gets into trouble due to these "unit attention"
"medium may have changed" events received from the usb stick.

Maybe you could try to boot in text mode from the usb stick
with options " -kv", and when the system is stuck during boot,
enter the kernel debugger by pressing "F1-a", and print a
process tree listing with the kmdb ::ptree command?
If we get a process tree listing, that would be an indication
of how far the kernel got with the boot process.

> Im going to try OS 2008.05, 2008.11 and the latest
> dev build of OS from genunix.org to see if any of
> thoose are able to create a proper installation. I´ve
> seen so much complaints around theese three build
> regarding USB drives that maybe it will work.

Do you have a non-Kingston usb flash memory stick
(or e.g. a usb hard disk drive) that you could try instead?


Re: [zfs-discuss] Install and boot from USB stick?

2009-08-01 Thread Jürgen Keil
> > Are there any message with "Error level: fatal" ? 
> 
> Not that I know of, however, i can check. But im
> unable to find out what to change in grub to get
> verbose output rather than just the splashimage.

Edit the grub boot entry: delete all splashimage,
foreground and background lines, and delete the
console=graphics option from the kernel$ line.

To enable verbose kernel messages, append the kernel
boot option " -v" at the end of the kernel$ boot
command line.
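
A sketch of the resulting kernel$ line, following the layout of the
default 2009.06 menu.lst entry:

  kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -v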


Re: [zfs-discuss] Install and boot from USB stick?

2009-08-01 Thread Jürgen Keil
> Nah, that didnt seem to do the trick.
> 
>  After unmounting
> and rebooting, i get the same error msg from my
> previous post.

Did you get these scsi error messages during installation
to the usb stick, too?

Another thing that confuses me:  the unit attention /
medium may have changed message is using
"error level: retryable".  I think the sd disk driver
is supposed to just retry the read or write operation.
The message seems more like a warning message,
not a fatal error.
Are there any message with "Error level: fatal" ?


Re: [zfs-discuss] Install and boot from USB stick?

2009-07-31 Thread Jürgen Keil
> How can i implement that change, after installing the
> OS? Or do I need to build my own livecd?

Boot from the livecd, attach the usb stick,
open a terminal window, "pfexec bash" starts
a root shell, "zpool import -f rpool" should 
find and import the zpool from the usb stick.

Mount the root filesystem from the usb stick:

  zfs set mountpoint=legacy rpool/ROOT/opensolaris
  mount -F zfs rpool/ROOT/opensolaris /mnt

And edit /mnt/kernel/drv/scsa2usb.conf
E.g. try 

  attribute-override-list = "vid=* reduced-cmd-support=true";

Try to boot from the usb stick, using the "reboot" command.


Re: [zfs-discuss] Install and boot from USB stick?

2009-07-31 Thread Jürgen Keil
> Well, here is the error:
> 
> ... usb stick reports(?) scsi error: medium may have changed ...

That's strange.  The media in a flash memory
stick can't be changed - although most sticks
report that they do have removable media.

Maybe this stick needs one of the workarounds
that can be enabled in /kernel/drv/scsa2usb.conf ?


Re: [zfs-discuss] Install and boot from USB stick?

2009-07-31 Thread Jürgen Keil
> I've found it only works for USB sticks up to 4GB :(
> If I tried a USB stick bigeer than that, it didn't boot.

Works for me on 8GB USB sticks.

It is possible that the stick you've tried has some
issues with the Solaris USB drivers, and needs to
have one of the workarounds from the
scsa2usb.conf file enabled.


Re: [zfs-discuss] Install and boot from USB stick?

2009-07-30 Thread Jürgen Keil
> The GRUB menu is presented, no problem there, and
> then the opensolaris progress bar. But im unable to
> find a way to view any details on whats happening
> there. The progress bar just keep scrolling and
> scrolling.

Press the ESC key; this should switch back from
graphics to text mode and most likely you'll see
that the OS is waiting for some console user input.


Re: [zfs-discuss] zfs on 32 bit?

2009-06-17 Thread Jürgen Keil
> > 32 bit Solaris can use at most 2^31 as disk address; a disk block is
> > 512bytes, so in total it can address 2^40 bytes.
> >
> > A SMI label found in Solaris 10 (update 8?) and OpenSolaris has been 
> > enhanced
> > and can address 2TB but only on a 64 bit system.
> 
> is what the problem is.  so 32-bit zfs cannot use disks larger than
> 1(.09951)tb regardless of whether it's for the root pool or not.

I think this isn't a problem with the 32-bit zfs module, but with
the 32-bit Solaris kernel as a whole.  The daddr_t type is used in a
*lot* of places, and is defined as a signed 32-bit integer ("long")
in the 32-bit kernel.  There already are 64-bit
disk address types defined, diskaddr_t and lldaddr_t
(which could be used in the 32-bit kernel, too), but a lot
of the existing kernel code doesn't use them.  And redefining
the existing daddr_t type to a 64-bit "long long" for the 32-bit
kernel won't work, because it would break binary
compatibility.
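
As a quick sanity check of where the familiar 1TB limit comes from: a
signed 32-bit daddr_t can address at most 2^31 blocks of 512 bytes
(shell arithmetic, needs a shell with 64-bit arithmetic such as ksh93
or bash):

  $ echo $(( 2 ** 31 * 512 ))
  1099511627776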


Re: [zfs-discuss] moving a disk between controllers

2009-06-17 Thread Jürgen Keil
> I had a system with it's boot drive
> attached to a backplane which worked fine. I tried
> moving that drive to the onboard controller and a few
> seconds into booting it would just reboot. 

In certain cases zfs is able to find the drive on the
new physical device path (IIRC: the disk's "devid" didn't
change and the new physical location of the disk
is already present in the /etc/devices/devid_cache).

But in most cases you have to boot from the installation
media and "zpool import -f rpool" the pool, with the
disk attached at the new physical device path, so that
the new physical device path gets recorded in zpool's
on-disk label.


Re: [zfs-discuss] zfs on 32 bit?

2009-06-17 Thread Jürgen Keil
> Not a ZFS bug.  IIRC, the story goes something like this: a SMI
> label only works to 1 TByte, so to use > 1 TByte, you need an
> EFI label.  For older x86 systems -- those which are 32-bit -- you
> probably have a BIOS which does not handle EFI labels.   This
> will become increasingly irritating since 2 TByte disks are now
> hitting the store shelves, but it doesn't belong in a ZFS category.

Hasn't the 1TB limit for SMI labels been fixed
(= limit raised to 2TB) by "PSARC/2008/336 Extended VTOC" ?
http://www.opensolaris.org/os/community/on/flag-days/pages/2008091102/

But there still is a 1TB limit for 32-bit kernel, the PSARC case includes this:

The following functional limitations are applicable:
* 32-bit kernel will not support disks > 1 TB.
...


Btw. on older Solaris releases the install media always
booted into a 32-bit kernel, even on systems that are
capable of running the 64-bit kernel.  This seems to have
been changed with the latest opensolaris releases and
that PSARC case, so that 64-bit systems can install to
a disk > 1TB.


Re: [zfs-discuss] zfs on 32 bit?

2009-06-14 Thread Jürgen Keil
> besides performance aspects, what`s the con`s of
> running zfs on 32 bit ?

The default 32-bit kernel can cache a limited amount of data
(< 512MB) - unless you lower the "kernelbase" parameter.
In the end, the small cache size on 32-bit explains the inferior
performance compared to the 64-bit kernel.
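
One commonly suggested way to lower it is via the kernelbase boot
property (the value below is just an example to experiment with):

  eeprom kernelbase=0x80000000

and reboot; this gives the 32-bit kernel more address space for its
caches, at the cost of address space for user processes.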


Re: [zfs-discuss] ZFS snapshot splitting & joining

2009-02-12 Thread Jürgen Keil
> The problem was with the shell.  For whatever reason,
> /usr/bin/ksh can't rejoin the files correctly.  When
> I switched to /sbin/sh, the rejoin worked fine, the
> cksum's matched, ...
> 
> The ksh I was using is:
> 
> # what /usr/bin/ksh
> /usr/bin/ksh:
> Version M-11/16/88i
> SunOS 5.10 Generic 118873-04 Aug 2006
> 
> So, is this a bug in the ksh included with Solaris 10? 

Are you able to reproduce the issue with a script like this
(needs ~ 200 gigabytes of free disk space) ?  I can't...

==
% cat split.sh
#!/bin/ksh

bs=1k
count=`expr 57 \* 1024 \* 1024`
split_bs=8100m

set -x

dd if=/dev/urandom of=data.orig bs=${bs} count=${count}
split -b ${split_bs} data.orig data.split.
ls -l data.split.*
cat data.split.a[a-z] > data.join
cmp -l data.orig data.join
==


On SX:CE / OpenSolaris the same version of /bin/ksh = /usr/bin/ksh
is present:

% what /usr/bin/ksh
/usr/bin/ksh:
Version M-11/16/88i
SunOS 5.11 snv_104 November 2008

I did run the script in a directory in an uncompressed zfs filesystem:

% ./split.sh 
+ dd if=/dev/urandom of=data.orig bs=1k count=59768832
59768832+0 records in
59768832+0 records out
+ split -b 8100m data.orig data.split.
+ ls -l data.split.aa data.split.ab data.split.ac data.split.ad data.split.ae 
data.split.af data.split.ag data.split.ah
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:31 data.split.aa
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:35 data.split.ab
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:39 data.split.ac
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:43 data.split.ad
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:48 data.split.ae
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:53 data.split.af
-rw-r--r--   1 jk   usr  8493465600 Feb 12 18:58 data.split.ag
-rw-r--r--   1 jk   usr  1749024768 Feb 12 18:58 data.split.ah
+ cat data.split.aa data.split.ab data.split.ac data.split.ad data.split.ae 
data.split.af data.split.ag data.split.ah
+ 1> data.join
+ cmp -l data.orig data.join
2002.33u 2302.05s 1:51:06.85 64.5%


As expected, it works without problems. The files are
bit-for-bit identical after splitting and joining.

To me this looks more like broken hardware on your side:
http://opensolaris.org/jive/thread.jspa?messageID=338148

A single bad bit (!) in the middle of the joined file is very suspicious...


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Jürgen Keil
> bash-3.00# zfs mount usbhdd1
> cannot mount 'usbhdd1': E/A-Fehler
> bash-3.00#

Why is there an I/O error?

Is there any information logged to /var/adm/messages when this
I/O error is reported?  E.g. timeout errors for the USB storage device?


Re: [zfs-discuss] zpool import of bootable root pool renders it unbootable

2008-10-13 Thread Jürgen Keil
> Again, what I'm trying to do is to boot the same OS from physical
> drive - once natively on my notebook, the other time from withing
> Virtualbox. There are two problems, at least. First is the bootpath as
> in VB it emulates the disk as IDE while booting natively it is sata.

When I started experimenting with installing SXCE to an USB flash
memory stick, which should be bootable on different machines,
I initially worked around this problem by creating multiple
/boot/grub/menu.lst boot entries, one for each supported 
bootable machine.  The difference between the grub boot entries
was the "-B bootpath=/physical/device/path" option.


Fortunately, this has become much easier in recent builds,
because zfs boot is now able to open the pool by using a
disk unique "devid" in addition to using physical device paths.
Whatever the physical device path happens to be on a randomly selected
x86 machine where I try to boot my usb flash memory stick,
the sd driver will always generate the same unique "devid" for
the flash memory stick, and zfs boot is able to find and open
the usb storage device in the system that has the
desired "devid" for the pool.


In case sata on the notebook will create the same "devid" for the
disk as virtualbox with p-ata, the zpool should be bootable just fine
on the two different boxes.  But apparently the "devid" created for
the disk with sata on the notebook is different from the "devid"
created for the disk when running under virtualbox...
(that is, pool open by physical device path and by devid fails)


I guess what we need is the fix for this bug, which allows to open
the pool by the boot disk's unique "guid":

Bug ID   6513775
Synopsiszfs root disk portability
http://bugs.opensolaris.org/view_bug.do?bug_id=6513775

> The other one seems to be hostid stored in a pool.

This shouldn't be a problem for x86, because on x86 the
hostid is stored in a file (the sysinit kernel module) in the
root filesystem.  Wherever you boot that disk,
the hostid will move with it.

Well, unless you boot some other installed Solaris /
OpenSolaris system (has a unique hostid / sysinit file)
and import that zfs root pool.  In this case the hostid
stored in the zpool label will change.

This should change in build 100, with the putback for 
this bug:

Bug ID   6716241
SynopsisChanging hostid, by moving in a new sysinit file, panics a zfs 
root file system
http://bugs.opensolaris.org/view_bug.do?bug_id=6716241

AFAIR, a hostid mismatch will be ignored when mounting a zfs
root file system.


Re: [zfs-discuss] zpool import of bootable root pool renders it unbootable

2008-10-06 Thread Jürgen Keil
> Cannot mount root on /[EMAIL PROTECTED],0/pci103c,[EMAIL PROTECTED],2/[EMAIL 
> PROTECTED],0:a fstype zfs

Is that physical device path correct for your new  system?

Or is this the physical device path (stored on-disk in the zpool label)
from some other system?   In this case you may be able to work around
the problem by passing a "-B bootpath=..."  option to the kernel

e.g. something like this:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,bootpath="/[EMAIL 
PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a"


You can find out the correct physical device path string
for the zfs root disk by booting the system from the optical 
installation media, and running the format utility.
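
For example, format lists the physical path below each disk in its
disk selection menu (the path shown here is a placeholder):

  AVAILABLE DISK SELECTIONS:
         0. c1t0d0 <DEFAULT cyl ... alt 2 hd ... sec ...>
            /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0

The /pci@... string (with the slice suffix, e.g. ":a", appended) is
what goes into the bootpath property.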

OTOH, if you already have booted from the optical installation
media, it's easiest to just import the root zpool from the
installation system, because that'll update the physical device
path in the zpool's label on disk (and it clears the hostid stored in
the zpool label - another problem that could prevent mounting
the zfs root).


Re: [zfs-discuss] SIL3124 stability?

2008-09-25 Thread Jürgen Keil
> THe lock I observed happened inside the BIOS of the card after the main board
> BIOS jumped into the board BIOS. This was before any bootloader has been 
> ionvolved.

Is there a disk using a zpool with an EFI disk label?  Here's a link to an old
thread about systems hanging in BIOS POST when they see disks with
EFI disk labels:

http://www.opensolaris.org/jive/thread.jspa?messageID=18211


Re: [zfs-discuss] CF to SATA adapters for boot device

2008-08-27 Thread Jürgen Keil
> What Widows utility you are talking about?  I have
> used the Sandisk utility program to remove the U3
> Launchpad (which creates a permanent hsfs partition
> in the flash disk), but it does not help the problem.

That's the problem, most usb sticks don't require any
special software and just work with the OS' usb mass
storage support.

IIRC, I once had a 128MB Prolific USB stick (sold as
Kingmax) which could be partitioned into two devices;
one of them could be password protected, and it was
possible to configure one of the mass storage devices
as "HDD" (= fixed media) or "FDD" (= floppy / removable
media).  There was an extra windows utility to use/configure
these extra features.
 
 


Re: [zfs-discuss] CF to SATA adapters for boot device

2008-08-25 Thread Jürgen Keil
W. Wayne Liauh wrote:

> If you are running B95, that "may" be the problem.  I
> have no problem booting B93 (& previous builds) from
> a USB stick, but B95, which has a newer version of
> ZFS, does not allow me to boot from it (& the USB
> stick was of course recognized during installation of
> B95, just won't boot).

I suspect the problem with ZFS boot from USB sticks is
that the kernel does not create "devid" properties for the
USB stick, and apparently those devids are now required for
zfs booting.

The kernel (sd driver) does not create "devid" properties for USB flash
memory sticks, because most (all ?) of them nowadays report
that they use removable media - which is a lie, I'm not able to change the
media / flash roms in such a device.

If you have a windows utility distributed with your flash memory
stick that allows configuration of the removable media attribute:
try to set it to "fixed media". For such an usb storage  device with
fixed media, the sd(7d) driver should create "devid" properties, and
zfs booting works just fine for such an usb flash memory stick.


Btw. you can view the "removable media" attribute with the
command "cdrecord -scanbus".  I'm getting this, for two different
usb flash memory sticks (note: it reports "Removable Disk", not
just "Disk"):

scsibus7:
7,0,0   700) 'Samsung ' 'Mighty Drive' 'PMAP' Removable Disk

scsibus10:
10,0,0  1000) 'OCZ ' 'ATV ' '1100' Removable Disk
 
 


Re: [zfs-discuss] error found while scubbing, how to fix it?

2008-08-21 Thread Jürgen Keil
> On 08/21/08 17:26, Jürgen Keil wrote:
> > Looks like bug 6727872, which is fixed in build 96.
> > http://bugs.opensolaris.org/view_bug.do?bug_id=6727872
> 
> that pool contains normal OpenSolaris mountpoints,

Did you upgrade the opensolaris installation in the past?

AFAIK the opensolaris upgrade procedure results in 
cloned zfs filesystems rpool/RPOOL/opensolaris,
rpool/RPOOL/opensolaris-1, ..., rpool/RPOOL/opensolaris-N
And only the latest one is mounted, the other (older)
zfs root filesystems are unmounted.

> what do you meen abount umounting and remounting it?

The bug happens with unmounted filesystems, so you
need to mount them first, then umount.

Something like

   mount -F zfs rpool/RPOOL/opensolaris /mnt && umount /mnt
   mount -F zfs rpool/RPOOL/opensolaris-1 /mnt && umount /mnt
  ...


> I need to do this with a live cd?

No.  You can do that when the system is booted from the hdd.
 
 


Re: [zfs-discuss] error found while scubbing, how to fix it?

2008-08-21 Thread Jürgen Keil
> I have OpenSolaris (snv_95) installed into my laptop (single sata disk) 
> and tomorrow I updated my pool with:
> 
> # zpool upgrade -V 11 -a
> 
> and after I start a scrub into the pool with:
> 
> # zpool scrub rpool
> 
> # zpool status -vx
> 
>   NAME        STATE     READ WRITE CKSUM
>   rpool       ONLINE       0     0     4
>     c5t0d0s0  ONLINE       0     0     4
> 
> errors: Permanent errors have been detected in the
> following files:
> 
>  :<0x0>

Looks like bug 6727872, which is fixed in build 96.
http://bugs.opensolaris.org/view_bug.do?bug_id=6727872

Do you have unmounted zfs filesystems that use
legacy mountpoints?  If yes: don't worry, the hdd and
the zpool on it should be fine.  The workaround is to mount
and unmount all those zfs filesystems; on the next
zpool scrub there should be no more checksum errors.
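
A sketch of the workaround as a loop, assuming the affected
filesystems are exactly those with mountpoint=legacy:

  for fs in $(zfs list -H -o name,mountpoint | awk '$2 == "legacy" {print $1}'); do
      mount -F zfs $fs /mnt && umount /mnt
  done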
 
 


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-23 Thread Jürgen Keil
I wrote:
> Bill Sommerfeld wrote:
> > On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > > I ran a scrub on a root pool after upgrading to snv_94, and got 
> > > > checksum errors:
> > > 
> > > Hmm, after reading this, I started a zpool scrub on my mirrored pool, 
> > > on a system that is running post snv_94 bits:  It also found checksum 
> > > errors
> > > 
> > once is accident.  twice is coincidence.  three times is enemy action :-)
> > 
> > I'll file a bug as soon as I can 
> 
> I filed 6727872, for the problem with zpool scrub checksum errors
> on unmounted zfs filesystems with an unplayed ZIL.

6727872 has already been fixed, in what will become snv_96.

For my zpool, zpool scrub doesn't report checksum errors any more.

But: something is still a bit strange with the data reported by zpool status.
The error counts displayed by zpool status are all 0 (during the scrub, and when
the scrub has completed), but when zpool scrub completes it tells me that
"scrub completed after 0h58m with 6 errors".  But it doesn't list the errors.

# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub in progress for 0h57m, 99.39% done, 0h0m to go
config:

NAME  STATE READ WRITE CKSUM
files ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s6  ONLINE   0 0 0
c9t0d0s6  ONLINE   0 0 0

errors: No known data errors


# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h58m with 6 errors on Wed Jul 23 18:23:00 2008
config:

NAME  STATE READ WRITE CKSUM
files ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s6  ONLINE   0 0 0
c9t0d0s6  ONLINE   0 0 0

errors: No known data errors
 
 


Re: [zfs-discuss] Moving ZFS root pool to different system breaks boot

2008-07-23 Thread Jürgen Keil
> Recently, I needed to move the boot disks containing a ZFS root pool in an
> Ultra 1/170E running snv_93 to a different system (same hardware) because
> the original system was broken/unreliable.
> 
> To my dismay, unlike with UFS, the new machine wouldn't boot:
> 
> WARNING: pool 'root' could not be loaded as it was
> last accessed by another system (host:  hostid:
> 0x808f7fd8).  See: http://www.sun.com/msg/ZFS-8000-EY
> 
> panic[cpu0]/thread=180e000: BAD TRAP: type=31 rp=180acc0 addr=0 mmu_fsr=0 
> occurred in module "unix" due to a NULL pointer dereference
...
> suffering from the absence of SPARC failsafe archives after liveupgrade
> (recently mentioned on install-discuss), I'd have been completely stuck.

Yes, on x86 you can boot into failsafe and let it mount the root pool
under /a and then reboot.  This removes the hostid from the configuration
information in the zpool's label.

I guess that on SPARC you could boot from the installation optical media
(or from a network server), and zpool import -f the root pool; that should
put the correct hostid into the root pool's label.
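
A sketch of that SPARC procedure (the pool name 'root' is taken from
the panic message quoted above):

  ok boot cdrom -s
  # zpool import -f root
  # reboot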
 
 


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-22 Thread Jürgen Keil
Bill Sommerfeld wrote:
> On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > I ran a scrub on a root pool after upgrading to snv_94, and got checksum 
> > > errors:
> > 
> > Hmm, after reading this, I started a zpool scrub on my mirrored pool, 
> > on a system that is running post snv_94 bits:  It also found checksum errors
> > 
> once is accident.  twice is coincidence.  three times is enemy action :-)
> 
> I'll file a bug as soon as I can 

I filed 6727872, for the problem with zpool scrub checksum errors
on unmounted zfs filesystems with an unplayed ZIL.
 
 


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-21 Thread Jürgen Keil
Rustam wrote:
 
> I'm living with this error for almost 4 months and probably have record
> number of checksum errors:

> # zpool status -xv
>   pool: box5
...
> errors: Permanent errors have been detected in the
> following files:
>  
> box5:<0x0>
>
> I've Sol 10 U5 though.

I suspect that this (S10u5)  is a different issue, because for my
system's pool it seems to be caused by the opensolaris putback
on July 07th  for these fixes:

6343667 scrub/resilver has to start over when a snapshot is taken
6343693 'zpool status' gives delayed start for 'zpool scrub'
6670746 scrub on degraded pool return the status of 'resilver completed'?
6675685 DTL entries are lost resulting in checksum errors
6706404 get_history_one() can dereference off end of hist_event_table[]
6715414 assertion failed: ds->ds_owner != tag in dsl_dataset_rele()
6716437 ztest gets SEGV in arc_released()
6722838 bfu does not update grub
 
 


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-21 Thread Jürgen Keil
Bill Sommerfeld wrote:

> On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > I ran a scrub on a root pool after upgrading to snv_94, and got checksum 
> > > errors:
> > 
> > Hmm, after reading this, I started a zpool scrub on my mirrored pool, 
> > on a system that is running post snv_94 bits:  It also found checksum errors
> > 
> 
> out of curiosity, is this a root pool?  

It started as a standard pool, and is using the version 3 zpool format.

I'm using a small ufs root, and have /usr as a zfs filesystem on
that pool.

At some point in the past I did set up a zfs root and /usr filesystem
for experimenting with xVM unstable bits.


> A second system of mine with a mirrored root pool (and an additional
> large multi-raidz pool) shows the same symptoms on the mirrored root
> pool only.
> 
> once is accident.  twice is coincidence.  three times is enemy action :-)
> 
> I'll file a bug as soon as I can (I'm travelling at the moment with
> spotty connectivity), citing my and your reports.

Btw. I also found the scrub checksum errors on a non-mirrored zpool
(laptop with only one hdd), and on one non-mirrored, striped zpool
that uses two S-ATA drives.


I think that in my case the cause for the scrub checksum errors is an
open ZIL transaction on an *unmounted* zfs filesystem.  In the past
such a zfs state prevented creating snapshots for the unmounted zfs,
see bug 6482985, 6462803.  That is still the case.  But now it also
seems to trigger checksum errors for a zpool scrub.

Stack backtrace for the ECKSUM (which gets translated into EIO errors
in arc_read_done()):

  1  64703   arc_read_nolock:return, rval 5
  zfs`zil_read_log_block+0x140
  zfs`zil_parse+0x155
  zfs`traverse_zil+0x55
  zfs`scrub_visitbp+0x284
  zfs`scrub_visit_rootbp+0x4e
  zfs`scrub_visitds+0x82
  zfs`dsl_pool_scrub_sync+0x109
  zfs`dsl_pool_sync+0x158
  zfs`spa_sync+0x254
  zfs`txg_sync_thread+0x226
  unix`thread_start+0x8




Does a "zdb -ivv {pool}" report any ZIL headers with a claim_txg != 0
on your pools?  Is the dataset that is associated with such a ZIL an
unmounted zfs?

# zdb -ivv files | grep claim_txg
ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0


# zdb -i files/matrix-usr
Dataset files/matrix-usr [ZPL], ID 216, cr_txg 5091978, 2.39G, 192089 objects

ZIL header: claim_txg 5164405, seq 0

first block: [L0 ZIL intent log] 1000L/1000P DVA[0]=<0:12421e:1000> 
zilog uncompressed LE contiguous birth=5163908 fill=0 
cksum=c368086f1485f7c4:39a549a81d769386:d8:3

Block seqno 3, already claimed, [L0 ZIL intent log] 1000L/1000P 
DVA[0]=<0:12421e:1000> zilog uncompressed LE contiguous birth=5163908 
fill=0 cksum=c368086f1485f7c4:39a549a81d769386:d8:3


On two of my zpools I've eliminated the zpool scrub checksum errors by
mounting / unmounting the zfs with the unplayed ZIL.
 
 


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-21 Thread Jürgen Keil
Miles Nordin wrote:

>  "jk" == Jürgen Keil <[EMAIL PROTECTED]> writes:
> jk> And a zpool scrub under snv_85 doesn't find  checksum errors, either.
> how about a second scrub with snv_94?  are the checksum errors gone
> the second time around?

Nope.

I've now seen this problem on 4 zpools on three different systems.
Post snv_94 (bfu'ed) reports checksum errors during scrub, and the
scrub under the original nevada release (snv_85, snv_89 and snv_91)
didn't report checksum errors.
 
 


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-18 Thread Jürgen Keil
> > I ran a scrub on a root pool after upgrading to snv_94, and got checksum 
> > errors:
> 
> Hmm, after reading this, I started a zpool scrub on my mirrored pool, 
> on a system that is running post snv_94 bits:  It also found checksum errors
...
> OTOH, trying to verify checksums with zdb -c didn't
> find any problems:

And  a zpool scrub under snv_85 doesn't find checksum errors, either.
 
 


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-18 Thread Jürgen Keil
> I ran a scrub on a root pool after upgrading to snv_94, and got checksum 
> errors:

Hmm, after reading this, I started a zpool scrub on my mirrored pool, 
on a system that is running post snv_94 bits:  It also found checksum errors

# zpool status files
  pool: files
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h46m with 9 errors on Fri Jul 18 13:33:56 2008
config:

NAME          STATE     READ WRITE CKSUM
files         DEGRADED     0     0    18
  mirror      DEGRADED     0     0    18
    c8t0d0s6  DEGRADED     0     0    36  too many errors
    c9t0d0s6  DEGRADED     0     0    36  too many errors

errors: No known data errors


Adding the -v option to zpool status returned:


errors: Permanent errors have been detected in the following files:

:<0x0>



OTOH, trying to verify checksums with zdb -c didn't find any problems:

# zdb -cvv files

Traversing all blocks to verify checksums and verify nothing leaked ...

No leaks (block sum matches space maps exactly)

bp count:          2804880
bp logical:    121461614592   avg: 43303
bp physical:    84585684992   avg: 30156   compression: 1.44
bp allocated:   85146115584   avg: 30356   compression: 1.43
SPA allocated:  85146115584   used: 79.30%

951.08u 419.55s 2:24:34.32 15.8%
#
 
 


Re: [zfs-discuss] [caiman-discuss] swap & dump on ZFS volume

2008-07-01 Thread Jürgen Keil
Mike Gerdts wrote

> By default, only kernel memory is dumped to the dump device.  Further,
> this is compressed.  I have heard that 3x compression is common and
> the samples that I have range from 3.51x - 6.97x.

My samples are in the range 1.95x - 3.66x.  And yes, I lost
a few crash dumps on a box with a 2GB swap slice, after
physical memory was upgraded from 4GB to 8GB.

% grep "pages dumped" /var/adm/messages*
/var/adm/messages:Jun 27 13:43:56 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 593680 pages dumped, compression ratio 3.51, 
/var/adm/messages.0:Jun 25 13:08:22 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 234922 pages dumped, compression ratio 2.39, 
/var/adm/messages.1:Jun 12 13:22:53 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 399746 pages dumped, compression ratio 1.95, 
/var/adm/messages.1:Jun 12 19:00:01 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 245417 pages dumped, compression ratio 2.41, 
/var/adm/messages.1:Jun 16 19:15:37 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 710001 pages dumped, compression ratio 3.48, 
/var/adm/messages.1:Jun 16 19:21:35 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 315989 pages dumped, compression ratio 3.66, 
/var/adm/messages.2:Jun 11 15:40:32 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 341209 pages dumped, compression ratio 2.68,
 
 


Re: [zfs-discuss] ZFS boot issues on older P3 system.

2008-06-30 Thread Jürgen Keil
> I wanted to resurrect an old dual P3 system with a couple of IDE drives
> to use as a low power quiet NIS/DHCP/FlexLM server so I tried installing
> ZFS boot from build 90.

> Jun 28 16:09:19 zack scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
> PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED] (ata0):
> Jun 28 16:09:19 zack    timeout: abort request, target=0 lun=0

I suspect that the root cause for these timeout bugs on MP systems
is 6657646, and it is supposed to be fixed in snv_92:

http://bugs.opensolaris.org/view_bug.do?bug_id=6657646
 
 


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-02 Thread Jürgen Keil
> I've got Solaris Express Community Edition build 75
> (75a) installed on an Asus P5K-E/WiFI-AP (ip35/ICH9R
> based) board.  CPU=Q6700, RAM=8Gb, disk=Samsung
> HD501LJ and (older) Maxtor 6H500F0.
> 
> When the O/S is running on bare metal, ie no xVM/Xen
> hypervisor, then everything is fine.
> 
> When it's booted up running xVM and the hypervisor,
> then unlike plain disk I/O, and unlike svm volumes,
> zfs is around 20 time slower.

Just a wild guess, but since we're just seeing a similar
strange performance problem on an Intel quadcore system
with 8GB of memory ...


Can you try to remove some part of the ram, so that the
system runs on 4GB instead of 8GB?  Or use xen / 
solaris boot options to restrict physical memory usage to
the low 4GB range?
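
For the xVM case, one way to do the latter is to cap the hypervisor's
memory on the xen.gz line of the grub entry (mem= is a xen option; the
4096M value is just an example, and the kernel path should match your
existing xVM entry):

  kernel$ /boot/$ISADIR/xen.gz mem=4096M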


It seems that on certain mainboards [*] the bios is unable to
install mtrr cachable ranges for all of the 8GB system ram,
and when some important stuff ends up in uncachable ram,
performance gets *really* bad.

[*] http://lkml.org/lkml/2007/6/1/231
 
 


Re: [zfs-discuss] zfs: allocating allocated segment (offset=77984887808 size=66560)

2007-10-12 Thread Jürgen Keil

> how does one free segment(offset=77984887808 size=66560)
> on a pool that won't import?
> 
> looks like I found
> http://bugs.opensolaris.org/view_bug.do?bug_id=6580715
> http://mail.opensolaris.org/pipermail/zfs-discuss/2007-September/042541.html

Btw. my machine from that mail.opensolaris.org zfs-discuss thread,
which paniced with "freeing free segment", did have a defective ram
module.

I don't know for sure, but I suspect that the bad ram module might
have been the root cause for that "freeing free segment" zfs panic, 
too ...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bug 6580715, panic: freeing free segment

2007-10-12 Thread Jürgen Keil
A few weeks ago, I wrote:

> Yesterday I tried to clone a xen dom0 zfs root
> filesystem and hit this panic (probably Bug ID 6580715):
> 
> 
> > ::status
> debugging crash dump vmcore.6 (64-bit) from moritz
> operating system: 5.11 wos_b73 (i86pc)
> panic message: freeing free segment (vdev=0 offset=11c14df000 size=1000)
> dump content: kernel pages only
> 
> > $c
> vpanic()
> vcmn_err+0x28(3, f812d818, ff0004850798)
> zfs_panic_recover+0xb6()
> metaslab_free_dva+0x1a2(ff01487ec580, ff0162231b20, 20b236c, 0)
> metaslab_free+0x97(ff01487ec580, ff0162231b20, 20b236c, 0)
> zio_free_blk+0x4c(ff01487ec580, ff0162231b20, 20b236c)
> zil_sync+0x334(ff015b7d94c0, ff015689d180)
> dmu_objset_sync+0x18e(ff014ff39c40, ff017c500d58, ff015689d180)
> dsl_dataset_sync+0x5d(ff01571efa00, ff017c500d58, ff015689d180)
> dsl_pool_sync+0xb5(ff014f4ace00, 20b236c)
> spa_sync+0x1c5(ff01487ec580, 20b236c)
> txg_sync_thread+0x19a(ff014f4ace00)
> thread_start+8()


Btw, a few weeks later I got more strange panics on this machine,
in the procfs filesystem module, which I finally traced to a single
defective bit in a ddr2 ram module (verified by memtest86).

So, I guess it's possible that the above zfs panic happened due
to the defective ram module.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mountroot and Bootroot Comparison

2007-10-05 Thread Jürgen Keil
> Regarding compression, if I am not mistaken, grub
> cannot access files  that are compressed.

There was a bug where grub was unable to access files
on zfs that contained holes:

Bug ID   6541114
Synopsis GRUB/ZFS fails to load files from a default compressed (lzjb) root
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6541114

That has been fixed in snv_71.  The description text is misleading:
there was no issue with reading lzjb compressed files; the bug
occurred when reading "hole" blocks from a zfs file.



Grub is unable to read from gzip compressed zfs filesystems, though:

Bug ID   6538017
Synopsis ZFS boot to support gzip decompression
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538017
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs boot doesn't support /usr on a separate partition.

2007-10-01 Thread Jürgen Keil
> Should I bfu to the latest bits to fix this  
> problem or do I also need to install b72?

bfu to b72 (or newer) should be OK, if there really is
a difference with shared library dependencies between
b70 and b72.  I'm not sure about b70; but b72 with
just an empty /usr directory in the root filesystem,
used as a mount point for mounting a zfs /usr,
works just fine.

Are you trying to setup a system that boots from a 
zfs root filesystem, and has /usr on a separate zfs
filesystem?  What exactly is the panic that you get
when you try to boot with option "-k"?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs boot doesn't support /usr on a separate partition.

2007-10-01 Thread Jürgen Keil
> I would like confirm that Solaris Express Developer Edition 09/07  
> b70, you can't have /usr on a separate zfs filesystem because of  
> broken dependencies.
> 
> 1/ Part of the problem is that /sbin/zpool is linked to 
> /usr/lib/libdiskmgt.so.1

Yep, in the past this happened on several occasions for me:  /sbin/zfs,
/etc/fs/zfs/mount or /lib/libzfs.so.1 depend on libraries that can only
be found in /usr/lib and are not yet available when you have /usr in a
separate zfs filesystem - the system becomes unbootable.

See also bugs like this:

Bug ID   6570056
Synopsis /sbin/zpool should not link to files in /usr/lib
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6570056

Bug ID   6494840
Synopsis libzfs should dlopen libiscsitgt rather than linking to it
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494840



Workaround was to copy the relevant missing shared libraries into 
the root filesystem.

I currently have snv_72 installed, and bfu'ed to the latest opensolaris
bits.  And I have /usr on a zfs filesystem. This doesn't need extra
copies of libraries from /usr/lib in the root filesystem, and is able to
mount a separate zfs /usr filesystem.


Note that the system apparently doesn't need /sbin/zpool for mounting
a zfs /usr filesystem; /etc/fs/zfs/mount or /sbin/zfs should be enough.

Not sure why your system is rebooting, though.  You should boot it
with option "-k", so that you can read the exact panic message.

Note that you need a valid /etc/zfs/zpool.cache file, and for zfs *root*
you also have to make sure that the /etc/zfs/zpool.cache file can be
found in the boot archive.
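
A quick way to double-check both points (a sketch, assuming the
stock bootadm tooling and the default paths):

  # ls -l /etc/zfs/zpool.cache
  # bootadm list-archive | grep zpool.cache
  # bootadm update-archive       (rebuild the archive if the file is missing)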
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Boot Won't work with a straight or mirror zfsroot

2007-09-28 Thread Jürgen Keil
> 
> Using build 70, I followed the zfsboot instructions
> at http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/ 
> to the  letter.
> 
> I tried first with a mirror zfsroot, when I try to boot to zfsboot  
> the screen is flooded with "init(1M) exited on fatal signal 9"

Could be this problem:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6423745

> This is everything I did:

> zpool create -f rootpool c1t0d0s0
> zfs create rootpool/rootfs
> 
> zfs set mountpoint=legacy rootpool/rootfs
> mkdir /zfsroot
> mount -F zfs rootpool/rootfs /zfsroot

Ok.
 
> cd /zfsroot ; mkdir -p usr opt var home export/home
> 
> mount -F zfs datapool/usr /zfsroot/usr
> mount -F zfs datapool/opt /zfsroot/opt
> mount -F zfs datapool/var /zfsroot/var
> mount -F zfs datapool/home /zfsroot/export/home
> 
> Added the following to /etc/vfstab
> rootpool/rootfs - /zfsroot             zfs  - yes -
> datapool/usr    - /zfsroot/usr         zfs  - yes -
> datapool/var    - /zfsroot/var         zfs  - yes -
> datapool/opt    - /zfsroot/opt         zfs  - yes -
> datapool/home   - /zfsroot/export/home zfs  - yes -
> /zvol/dsk/datapool/swap - -            swap - no  -
> cd / ; find . -xdev -depth -print | cpio -pvdm /zfsroot
> cd / ; find usr -xdev -depth -print | cpio -pvdm /zfsroot
> cd / ; find var -xdev -depth -print | cpio -pvdm /zfsroot
> cd / ; find opt -xdev -depth -print | cpio -pvdm /zfsroot
> cd / ; find export/home -xdev -depth -print | cpio -pvdm /zfsroot
> 
> # ran this script:
> http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/create_dirs/
> 
> mount -F lofs -o nosub / /mnt
> (cd /mnt; tar cvf - devices dev ) | (cd /zfsroot; tar xvf -)
> umount /mnt

Your source root filesystem is on UFS?

I think much of the above steps could be simplified by populating
the zfs root filesystem like this:

mount -F zfs rootpool/rootfs /zfsroot
ufsdump 0f - / | (cd /zfsroot; ufsrestore xf -)
umount /zfsroot

That way, you don't have to use the "create_dirs" script,
or mess with the /devices and /dev device tree and the
lofs mount.

Using ufsdump/ufsrestore also gets the lib/libc.so.1 file correct
in the rootfs zfs, which typically has some lofs file mounted on
top of it.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Bug 6580715, panic: freeing free segment

2007-09-03 Thread Jürgen Keil
Yesterday I tried to clone a xen dom0 zfs root filesystem and hit this panic
(probably Bug ID 6580715):


System is running last week's opensolaris bits (but I'm also accessing the zpool
using the xen snv_66 bits).

files/s11-root-xen: is an existing version 1 zfs

files/[EMAIL PROTECTED]: new snapshot

files/s11-root-xen-uppc: clone for files/[EMAIL PROTECTED]



- initially the files/[EMAIL PROTECTED] snapshot couldn't be created,
  because files/s11-root-xen (zfs with legacy mount / not mounted) was "busy"

  This should be bug 6462803 or 6482985.

  Workaround: manually mount files/s11-root-xen and umount it -
  this clears the unplayed log


- created files/[EMAIL PROTECTED] and cloned it as files/s11-root-xen-uppc,
  set files/s11-root-xen-uppc mountpoint as legacy


- mount files/s11-root-xen-uppc and edited a few files using vi,
  after writing back one of them and leaving vi, system crashed


Looks like the new zfs filesystem is using log blocks that are not allocated?
(see the zdb output below)



Details for the initial panic:

> ::status
debugging crash dump vmcore.6 (64-bit) from moritz
operating system: 5.11 wos_b73 (i86pc)
panic message: freeing free segment (vdev=0 offset=11c14df000 size=1000)
dump content: kernel pages only

> $c
vpanic()
vcmn_err+0x28(3, f812d818, ff0004850798)
zfs_panic_recover+0xb6()
metaslab_free_dva+0x1a2(ff01487ec580, ff0162231b20, 20b236c, 0)
metaslab_free+0x97(ff01487ec580, ff0162231b20, 20b236c, 0)
zio_free_blk+0x4c(ff01487ec580, ff0162231b20, 20b236c)
zil_sync+0x334(ff015b7d94c0, ff015689d180)
dmu_objset_sync+0x18e(ff014ff39c40, ff017c500d58, ff015689d180)
dsl_dataset_sync+0x5d(ff01571efa00, ff017c500d58, ff015689d180)
dsl_pool_sync+0xb5(ff014f4ace00, 20b236c)
spa_sync+0x1c5(ff01487ec580, 20b236c)
txg_sync_thread+0x19a(ff014f4ace00)
thread_start+8()

> ::msgbuf
MESSAGE   
zfs0 is /pseudo/[EMAIL PROTECTED]
pcplusmp: pci-ide (pci-ide) instance #1 vector 0x17 ioapic 0x2 intin 0x17 is bou
nd to cpu 0
IDE device at targ 0, lun 0 lastlun 0x0
model SAMSUNG HD300LJ
ATA/ATAPI-7 supported, majver 0xfe minver 0x21
PCI Express-device: [EMAIL PROTECTED], ata2
ata2 is /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]
UltraDMA mode 6 selected
Disk0:  
cmdk0 at ata2 target 0 lun 0
cmdk0 is /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
NOTICE: nge0: Using FIXED interrupt type

NOTICE: IRQ20 is being shared by drivers with different interrupt levels.
This may result in reduced system performance.
NOTICE: nge0 registered
NOTICE: nge0 link up, 100 Mbps, full duplex
NOTICE: cpqhpc: 64-bit driver module not found
UltraDMA mode 6 selected
dump on /dev/dsk/c1d0s1 size 2055 MB
UltraDMA mode 6 selected
pseudo-device: devinfo0
devinfo0 is /pseudo/[EMAIL PROTECTED]
iscsi0 at root
iscsi0 is /iscsi
xsvc0 at root: space 0 offset 0
xsvc0 is /[EMAIL PROTECTED],0
pseudo-device: pseudo1
pseudo1 is /pseudo/[EMAIL PROTECTED]
pcplusmp: fdc (fdc) instance 0 vector 0x6 ioapic 0x2 intin 0x6 is bound to cpu 0
ISA-device: fdc0
pseudo-device: ramdisk1024
ramdisk1024 is /pseudo/[EMAIL PROTECTED]
pcplusmp: lp (ecpp) instance 0 vector 0x7 ioapic 0x2 intin 0x7 is bound to cpu 1
ISA-device: ecpp0
ecpp0 is /isa/[EMAIL PROTECTED],378
fd0 at fdc0
fd0 is /isa/[EMAIL PROTECTED],3f0/[EMAIL PROTECTED],0
NOTICE: audiohd0: codec info: vid=0x11d4198b, sid=0x, rev=0x00100200
NOTICE: IRQ21 is being shared by drivers with different interrupt levels.
This may result in reduced system performance.
PCI Express-device: pci1043,[EMAIL PROTECTED],1, audiohd0
audiohd0 is /[EMAIL PROTECTED],0/pci1043,[EMAIL PROTECTED],1 
pcplusmp: ide (ata) instance 0 vector 0xe ioapic 0x2 intin 0xe is bound to cpu 0
ATAPI device at targ 1, lun 0 lastlun 0x0
model _NEC DVD_RW ND-4550A
PCI Express-device: [EMAIL PROTECTED], ata0
ata0 is /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]
UltraDMA mode 2 selected
UltraDMA mode 2 selected
UltraDMA mode 2 selected
PCI-device: pci1274,[EMAIL PROTECTED], audioens0
audioens0 is /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1274,[EMAIL 
PROTECTED]
pseudo-device: lockstat0
lockstat0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: llc10
llc10 is /pseudo/[EMAIL PROTECTED]
pseudo-device: lofi0
lofi0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: profile0
profile0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: systrace0
systrace0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: fbt0
fbt0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: sdt0
sdt0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: fasttrap0
fasttrap0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: power0
power0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: fcp0
fcp0 is /pseudo/[EMAIL PROTECTED]
pseudo-device: fcsm0
fcsm0 is /pse

Re: [zfs-discuss] EOF broken on zvol raw devices?

2007-08-27 Thread Jürgen Keil
> > I tried to copy a 8GB Xen domU disk image from a zvol device
> > to an image file on an ufs filesystem, and was surprised that
> > reading from the zvol character device doesn't detect "EOF".
> 
> I've filed bug 6596419...

Requesting a sponsor for bug 6596419...
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6596419

My suggested fix is included in the bug report.

My contributor agreement # : OS0003
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] EOF broken on zvol raw devices?

2007-08-23 Thread Jürgen Keil
> I tried to copy a 8GB Xen domU disk image from a zvol device
> to an image file on an ufs filesystem, and was surprised that
> reading from the zvol character device doesn't detect "EOF".

I've filed bug 6596419...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] EOF broken on zvol raw devices?

2007-08-23 Thread Jürgen Keil
> I tried to copy a 8GB Xen domU disk image from a zvol device
> to an image file on an ufs filesystem, and was surprised that
> reading from the zvol character device doesn't detect "EOF".
> 
> On snv_66 (sparc) and snv_73 (x86) I can reproduce it, like this:
> 
> # zfs create -V 1440k tank/floppy-img
> 
> # dd if=/dev/zvol/dsk/tank/floppy-img of=/dev/null
> bs=1k count=2000
> 1440+0 records in
> 1440+0 records out
> (no problem on block device, we detect eof after
> reading 1440k)
> 
> 
> # dd if=/dev/zvol/rdsk/tank/floppy-img of=/dev/null
> bs=1k count=2000
> 2000+0 records in
> 2000+0 records out
> 
> (Oops!  No eof detected on zvol raw device after
> reading 1440k?)

After looking at the code in usr/src/uts/common/fs/zfs/zvol.c
it seems that neither zvol_read() nor zvol_write() cares about
the zvol's "zv_volsize".

I think we need something like this:

diff -r 26be3efbd346 usr/src/uts/common/fs/zfs/zvol.c
--- a/usr/src/uts/common/fs/zfs/zvol.c  Thu Aug 23 00:53:10 2007 -0700
+++ b/usr/src/uts/common/fs/zfs/zvol.c  Thu Aug 23 16:30:41 2007 +0200
@@ -904,6 +904,7 @@ zvol_read(dev_t dev, uio_t *uio, cred_t 
 {
minor_t minor = getminor(dev);
zvol_state_t *zv;
+   uint64_t volsize;
rl_t *rl;
int error = 0;
 
@@ -914,10 +915,16 @@ zvol_read(dev_t dev, uio_t *uio, cred_t 
if (zv == NULL)
return (ENXIO);
 
+   volsize = zv->zv_volsize;
+
rl = zfs_range_lock(&zv->zv_znode, uio->uio_loffset, uio->uio_resid,
RL_READER);
-   while (uio->uio_resid > 0) {
+   while (uio->uio_resid > 0 && uio->uio_loffset < volsize) {
uint64_t bytes = MIN(uio->uio_resid, DMU_MAX_ACCESS >> 1);
+
+   /* don't read past the end */
+   if (bytes > volsize - uio->uio_loffset)
+   bytes = volsize - uio->uio_loffset;
 
error =  dmu_read_uio(zv->zv_objset, ZVOL_OBJ, uio, bytes);
if (error)
@@ -933,6 +940,7 @@ zvol_write(dev_t dev, uio_t *uio, cred_t
 {
minor_t minor = getminor(dev);
zvol_state_t *zv;
+   uint64_t volsize;
rl_t *rl;
int error = 0;
 
@@ -943,13 +951,19 @@ zvol_write(dev_t dev, uio_t *uio, cred_t
if (zv == NULL)
return (ENXIO);
 
+   volsize = zv->zv_volsize;
+
rl = zfs_range_lock(&zv->zv_znode, uio->uio_loffset, uio->uio_resid,
RL_WRITER);
-   while (uio->uio_resid > 0) {
+   while (uio->uio_resid > 0 && uio->uio_loffset < volsize) {
uint64_t bytes = MIN(uio->uio_resid, DMU_MAX_ACCESS >> 1);
uint64_t off = uio->uio_loffset;
-
-   dmu_tx_t *tx = dmu_tx_create(zv->zv_objset);
+   dmu_tx_t *tx;
+
+   if (bytes > volsize - off)  /* don't write past the end */
+   bytes = volsize - off;
+
+   tx = dmu_tx_create(zv->zv_objset);
dmu_tx_hold_write(tx, ZVOL_OBJ, off, bytes);
error = dmu_tx_assign(tx, TXG_WAIT);
if (error) {
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] EOF broken on zvol raw devices?

2007-08-23 Thread Jürgen Keil
I tried to copy a 8GB Xen domU disk image from a zvol device
to an image file on an ufs filesystem, and was surprised that
reading from the zvol character device doesn't detect "EOF".

On snv_66 (sparc) and snv_73 (x86) I can reproduce it, like this:

# zfs create -V 1440k tank/floppy-img

# dd if=/dev/zvol/dsk/tank/floppy-img of=/dev/null bs=1k count=2000
1440+0 records in
1440+0 records out
(no problem on block device, we detect eof after reading 1440k)


# dd if=/dev/zvol/rdsk/tank/floppy-img of=/dev/null bs=1k count=2000
2000+0 records in
2000+0 records out

(Oops!  No eof detected on zvol raw device after reading 1440k?)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nv-69 install panics dell precision 670

2007-08-14 Thread Jürgen Keil
> using hyperterm, I captured the panic message as:
> 
> SunOS Release 5.11 Version snv_69 32-bit
> Copyright 1983-2007 Sun Microsystems, Inc.  All
> rights reserved.
> Use is subject to license terms.
> 
> panic[cpu0]/thread=fec1ede0: Can't handle mwait size
> 0
> 
> fec37e70 unix:mach_alloc_mwait+72 (fec2006c)
> fec37e8c unix:mach_init+b0 (c0ce80, fe800010, f)
> fec37eb8 unix:psm_install+95 (fe84166e, 3, fec37e)
> fec37ec8 unix:startup_end+93 (fec37ee4, fe91731e,)
> fec37ed0 unix:startup+3a (fe800010, fec33c98,)
> fec37ee4 genunix:main+1e ()
> 
> skipping system dump - no dump device configured
> rebooting...
> 
> this behavior loops endlessly

Have a look at these bugs:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6577473
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6588054

It seems to be fixed in snv_70, and apparently you can work around
the bug by setting some kernel variables, see bug 6588054
(idle_cpu_prefer_mwait = 0, cpuid_feature_ecx_exclude = 8)
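
Fwiw, a sketch of how those variables could be patched at boot with
kmdb until you can upgrade (boot with the -kd option; the variable
names are the ones quoted in 6588054):

  [0]> idle_cpu_prefer_mwait/W 0
  [0]> cpuid_feature_ecx_exclude/W 8
  [0]> :c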
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unremovable file in ZFS filesystem.

2007-08-09 Thread Jürgen Keil
> I managed to create a link in a ZFS directory that I can't remove.  
>
> # find . -print
> .
> ./bayes_journal
> find: stat() error ./bayes.lock.router.3981: No such
> file or directory
> ./user_prefs
> #
> 
> 
> ZFS scrub shows no problems in the pool.  Now, this
> was probably cause when I was doing some driver work
> so I'm not too surprised, BUT it would be nice if
> there was a way to clean this up without having to
> copy the filesystem to a new zfs filesystem and
> destroying the current one.

Are you running opensolaris with release or debug kernel bits?

Maybe a kernel with a zfs compiled as debug bits would print
some extra error messages or maybe panic the machine when
that broken file is accessed?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console,

2007-08-09 Thread Jürgen Keil
> in my setup i do not install the ufsroot.
> 
> i have 2 disks 
> -c0d0 for the ufs install 
> -c1d0s0 which is my zfs root i want to exploit
> 
> my idea is to remove the c0d0 disk when the system will be ok

Btw. if you're trying to pull the ufs disk c0d0 from the system, and
physically move the zfs root disk from c1d0 -> c0d0 and use that as
the only disk (= boot disk) in the system, you'll probably run into the
problem that zfs root becomes unbootable, because in the
/etc/zfs/zpool.cache file the c1d0 name is still recorded for the
zpool containing the rootfs.

To fix it you probably have to boot a failsafe kernel from somewhere,
zpool import the pool from the disk's new location, and copy the
updated /etc/zfs/zpool.cache into the zfs root filesystem and build
new boot archives there...
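
Roughly like this, from the failsafe boot (pool and dataset names are
placeholders, adjust for your setup):

  # zpool import -f rpool                 (rewrites /etc/zfs/zpool.cache)
  # mount -F zfs rpool/rootfs /mnt
  # cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache
  # bootadm update-archive -R /mnt
  # umount /mnt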
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console,

2007-08-09 Thread Jürgen Keil
> it seems i have the same problem after zfs boot
> installation (following this setup on a snv_69 release
> http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/ ).

Hmm, in step 4., wouldn't it be better to use ufsdump / ufsrestore
instead of find / cpio to clone the ufs root into the zfs root pool?

cd /zfsroot
ufsdump 0f - / | ufsrestore -xf -


Advantages:

- it copies the mountpoint for the /etc/dfs/dfstab filesystem
  (and all the other mountpoints, like /tmp, /proc, /etc/mnttab, ...)


- it does not mess up the /lib/libc.so.1 shared library

  I think the procedure at the above url could copy the wrong
  version of the shared libc.so.1 into the zfsroot /lib/libc.so.1;
  this might explain bugs like 6423745,
  Synopsis: zfs root pool created while booted 64 bit can not be booted 32 bit

  
- the files hidden by the /devices mount are copied,too


> The outputs from the requested command
> are similar to the outputs posted by dev2006.
> 
> Reading this page, i found no solution concerning the
> /dev/random problem. Is there somewhere a procedure
> to repair my install ?


AFAICT, there's nothing you can do to avoid the
"WARNING: No randomness provider enabled for /dev/random."
message with zfs root at this time.  It seems that zfs mountroot
needs some random numbers for mounting the zfs root filesystem,
and at that point early during the bootstrap there isn't a fully initialized
random device available.  This fact is remembered by the random
device and is reported later on, when the system is fully booted.

I think when the system is fully booted from zfs root, the random
device should work just fine.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SiI 3114 Chipset on Syba Card - Solaris Hangs

2007-08-07 Thread Jürgen Keil
> I'm running snv 65 and having an issue
> much like this:
>http://osdir.com/ml/solaris.opensolaris.help/2006-11/msg00047.html

Bug 6414472?

> Has anyone found a workaround?

You can try to patch my suggested fix for 6414472 into the ata binary
and see if it helps:

http://www.opensolaris.org/jive/thread.jspa?messageID=84127

I don't have access to the snv_65 media, but for snv_66 (32-bit)
the code has changed slightly, and the instruction to patch
can be found at address "ata_id_common+0x3c",
so the patch procedure would be

::bp ata`ata_id_common
:c
::delete 1
ata_id_common+0x3c?w a6a
:c

> Or is this the issue with the BIOS not liking EFI information that ZFS
> uses?

If it is 6414472: No; BIOS wouldn't be used any more at the
point 6414472 is hanging the system...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174

2007-08-06 Thread Jürgen Keil
> By coincidence, I spent some time dtracing 6560174 yesterday afternoon on 
> b62, and these bugs are indeed duplicates. I never noticed 6445725 because my 
> system wasn't hanging but as the notes say, the fix for 6434435 changes the 
> problem, and instead the error that gets propagated back from t1394_write() 
> causes "transport rejected" messages.

Yes, I had filed two bugs (6445725 / 6434435) a  year ago and started the
opensolaris request-sponsor process for both.  The fix for 6434435 has been
integrated, but 6445725 is stuck somehow.  

> I see your proposed fix (which looks very plausible) is dated over a year 
> ago... Have you heard anything on when it might get integrated?

No, nothing.

I did send Alan Perry (@sun.com) a mail last friday, asking about the state
of bug 6445725 and my suggested fix, but so far received no reply...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174

2007-08-03 Thread Jürgen Keil
> > 3) Can your code diffs be integrated into the OS on my end to use this 
> > drive, and if so, how?
> 
> I believe the bug is still being worked on, right Jürgen ?

The opensolaris sponsor process for fixing bug 6445725 seems
to be stuck.  I ping'ed Alan P. on the state of that bug...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174

2007-08-03 Thread Jürgen Keil
> > Nope, no work-around.  
> 
> OK. Then I have 3 questions:
> 
> 1) How do I destroy the pool that was on the firewire
> drive? (So that zfs stops complaining about it)

Even if the drive is disconnected, it should be possible
to "zpool export" it, so that the OS forgets about it
and doesn't try to mount from that pool during the next
boot.

 
> 2) How can I reformat the firewire drive? Does this
> need to be done on a non-Solaris OS?

When 6445725 is fixed, it should be possible to reformat
and / or use it with Solaris.


> 3) Can your code diffs be integrated into the OS on
> my end to use this drive, and if so, how?

Sure.  You need the opensolaris "ON Source" tarball: unpack it,
apply the patch from the website using something like
"gpatch -p0 < scsa1394-mkfs-hang2-alt" and build everything
using the "nightly" command.

You'll also need to install the "ON Specific Build Tools" package,
the "ON Binary-Only Components", and the correct Studio 11 compiler
for building the opensolaris sources.

Here are some detailed instructions on building the opensolaris sources:

http://www.blastwave.org/articles/BLS-0050/index.html


Unfortunately, the sources for your installed version (build_64a) are
missing on http://dlc.sun.com/osol/on/downloads ; there are sources
for build 63 and 65, but not for 64a .

You could pick a newer release of the opensolaris sources (the latest
available for download is build_69), patch the sources and compile them,
and upgrade your installation to that newer release, using the "bfu" 
command.


Or pick a slightly newer release than 64a, patch & compile (make sure
to compile as a "release" build) , and just replace the firewire kernel driver
modules that are affected by the bugfix, "scsa1394" and "sbp":

usr/src/uts/intel/scsa1394/obj32/scsa1394 -> /kernel/drv/scsa1394
usr/src/uts/intel/scsa1394/obj64/scsa1394 -> /kernel/drv/amd64/scsa1394
usr/src/uts/intel/sbp2/obj32/sbp2 -> /kernel/misc/sbp2
usr/src/uts/intel/sbp2/obj64/sbp2 -> /kernel/misc/amd64/sbp2
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174

2007-08-02 Thread Jürgen Keil
> > And 6560174 might be a duplicate of 6445725
> 
> I see what you mean. Unfortunately there does not
> look to be a work-around. 

Nope, no work-around.  This is a scsa1394 bug; it
has some issues when it is used from interrupt context.

I have some source code diffs, that are supposed to
fix the issue, see this thread:

http://www.opensolaris.org/jive/thread.jspa?messageID=46190
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Firewire zpool transport rejected fatal error, 6560174

2007-08-02 Thread Jürgen Keil
> I think I have ran into this bug, 6560174, with a firewire drive. 

And 6560174 might be a duplicate of 6445725
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] snv_70 -> snv_66: ZPL_VERSION 2, File system version mismatch ....?

2007-07-20 Thread Jürgen Keil
Yesterday I was surprised because an old snv_66 kernel
(installed into a new zfs root filesystem) refused to mount
its root.  The error message was:

Mismatched versions:  File system is version 2 on-disk format,
which is incompatible with this software version 1!


I tried to prepare that snv_66 rootfs when running snv_70 bits,
using something like this

   zfs create tank/s11-root-xen
   zfs set mountpoint=legacy tank/s11-root-xen
   mount -F zfs tank/s11-root-xen /mnt
   cd /mnt
   ufsdump 0f - /dev/rdsk/c4d0s4 | ufsrestore -xf -
   ...


Problem is that snv_70 "zfs create" now seems to construct
ZPL_VERSION 2 zfs filesystems, which cannot be mounted
by older versions of the zfs software, e.g. by snv_66 or s10u2.

Btw. I never upgraded this zpool to a zpool version > 2, 
to allow using that zpool and zfs filesystems both with
Nevada and S10.

Now it seems I could still work around that ZPL_VERSION
mismatch problem by booting the oldest Solaris release that
is supposed to mount the zfs filesystem, and creating the
zfs filesystem from there.


How about a new feature for "zpool create" and "zfs create" to
allow creation of a zpool or zfs that is not using the newest
version but some older version (that the user has specified on
the command line), so that the new zpool or zfs can be used on
older systems (e.g. on hotpluggable / removable media, or on a
disk that is shared between different Solaris releases)?
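
Purely as an illustration of what I have in mind - the syntax below is
made up, nothing like it exists in current bits:

  # zpool create -o version=2 tank c0t0d0
  # zfs create -o version=1 tank/shared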
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS usb keys

2007-06-27 Thread Jürgen Keil
> Shouldn't S10u3 just see the newer on-disk format and
> report that fact, rather than complain it is corrupt?

Yep, I just tried it, and it refuses to "zpool import" the newer pool,
telling me about the incompatible version.  So I guess the pool
format isn't the correct explanation for the Dick Davies' (number9)
problem.



On a S-x86 box running snv_68, ZFS version 7:

# mkfile 256m /home/leo.nobackup/tmp/zpool_test.vdev
# zpool create test_pool /home/leo.nobackup/tmp/zpool_test.vdev
# zpool export test_pool


On a S-sparc box running snv_61, ZFS version 3
(I get the same error on S-x86, running S10U2, ZFS version 2):

# zpool import -d /home/leo.nobackup/tmp/
  pool: test_pool
id: 6231880247307261822
 state: FAULTED
status: The pool is formatted using an incompatible version.
action: The pool cannot be imported.  Access the pool on a system running newer
software, or recreate the pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-A5
config:

test_pool  UNAVAIL   newer version
  /home/leo.nobackup/tmp//zpool_test.vdev  ONLINE
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS usb keys

2007-06-26 Thread Jürgen Keil
> I used a zpool on a usb key today to get some core files off a non-networked
> Thumper running S10U4 beta.
> 
> Plugging the stick into my SXCE b61 x86 machine worked fine; I just had to
> 'zpool import sticky' and it worked ok.
> 
> But when we attach the drive to a blade 100 (running s10u3), it sees the
> pool as corrupt. I thought I'd been too hasty pulling out the stick,
> but it works ok back in the b61 desktop and Thumper.
> 
> I'm trying to figure out if this is an endian thing (which I thought
> ZFS was immune from) - or has the b61 machine upgraded the zpool
> format?

Most likely the zpool on the usb stick was formatted using a zpool version
that s10u3 does not yet support.

Check with "zpool version" on the b61 machine which zpool version is
supported by b61, any which zpool version is on the usb stick.
Repeat on the s10u3 machine.
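
Something like this (a sketch; the exact output differs between
releases):

  # zpool upgrade -v     (lists the pool versions this build supports)
  # zpool upgrade        (reports pools that use an older version)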
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: zfs compression - scale to multiple cpu ?

2007-06-18 Thread Jürgen Keil
> i think i have read somewhere that zfs gzip
> compression doesn't scale well since the in-kernel
> compression isn't done multi-threaded.
> 
> is this true - and if so - will this be fixed ?

If you're writing lots of data, zfs gzip compression 
might not be a good idea for a desktop machine, because
it completely kills interactive performance.

See this thread:
http://www.opensolaris.org/jive/thread.jspa?messageID=118116
http://mail.opensolaris.org/pipermail/zfs-discuss/2007-May/thread.html#27841


It does compress (scale) on up to 8 cpu cores, though.
See "zio_taskq_threads" in usr/src/uts/common/fs/zfs/spa.c

> what about default lzjb compression - is it different
> regarding this "issue" ?

lzjb doesn't consume that much kernel cpu time (compared to gzip),
so the machine remains more or less usable for interactive usage.
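
For completeness, the algorithm is a per-dataset property, so it's
easy to switch back if gzip hurts too much (the dataset name is a
placeholder; gzip needs bits that support gzip compression):

  # zfs set compression=gzip tank/fs
  # zfs set compression=lzjb tank/fs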
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: SMART

2007-06-08 Thread Jürgen Keil
> You are right... I shouldn't post in the middle of
> the night... nForce chipsets don't support AHCI.

Btw. does anybody have a status update for bug 6296435,
"native sata driver needed for nVIDIA mcp04 and mcp55 controllers"
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6296435
?

Commit to Fix target was "snv_59", but we're at "snv_67" now...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: Deterioration with zfs performance and recent zfs bits?

2007-06-05 Thread Jürgen Keil
> Hello Jürgen,
> 
> Monday, June 4, 2007, 7:09:59 PM, you wrote:
> 
> >> > Patching zfs_prefetch_disable = 1 has helped
> >> It's my belief this mainly aids scanning metadata. my
> >> testing with rsync and yours with find (and seen with
> >> du & ; zpool iostat -v 1 ) pans this out..
> >> mainly tracked in bug 6437054 vdev_cache: wise up or die
> >>   http://www.opensolaris.org/jive/thread.jspa?messageID=42212
> >> 
> >> so to link your code, it might help, but if one ran
> >> a clean down the tree, it would hurt compile times.
> 
> 
> JK> I think the slowdown that I'm observing is due to the changes
> JK> that have been made for 6542676 "ARC needs to track meta-data
> JK> memory overhead".
> JK>
> JK> There is now a limit of 1/4 of arc size ("arc_meta_limit")
> JK> for zfs meta-data.
> 
> Not good - I have some systems with TBs of meta-data mostly.
> I guess there's some tunable...

AFAICT, you can patch the kernel global variable "arc_meta_limit"
at run time, using mdb -wk (variable should be visible in build 66
or newer)

But you can't tune it via an /etc/system "set" command.
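
E.g. something like this (sketch only, the value is just an example):

  # mdb -kw
  > arc_meta_limit/Z 0x10000000
  > $q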
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Deterioration with zfs performance and recent zfs bits?

2007-06-04 Thread Jürgen Keil
I wrote

> Instead of compiling opensolaris for 4-6 hours, I've now used
> the following find / grep test using on-2007-05-30 sources:
> 
> 1st test using Nevada build 60:
> 
> % cd /files/onnv-2007-05-30
> % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +

This find + grep command basically

- does a recursive scan looking for *.h and *.c files
- at the end of the recursive directory scan invokes one grep
  command with ~ 2 filename args.


Simplifying the test a bit more:  snv_60 is able to cache all meta-data
for a compiled onnv source tree, on a 32-bit x86 machine with
768 mb of physical memory:

% cd /files/wos_b67
% repeat 10 sh -c "/bin/time find usr/src/ -name '*.[hc]' -print|wc"

real 2:11.7
user0.2
sys 3.2
   19355   19355  772864

real2.4
user0.1
sys 1.4
   19355   19355  772864

real2.2
user0.1
sys 1.5
   19355   19355  772864

real2.0
user0.1
sys 1.4
   19355   19355  772864

real 1:21.8  << seems that some meta data was freed 
here...
user0.2
sys 1.7
   19355   19355  772864

real 1:21.0
user0.2
sys 1.7
   19355   19355  772864

real   45.9
user0.1
sys 1.6
   19355   19355  772864

real3.2
user0.1
sys 1.3
   19355   19355  772864

real1.9
user0.1
sys 1.3
   19355   19355  772864

real2.8
user0.1
sys 1.3
   19355   19355  772864


(and the next 10 finds all completed in ~2 seconds per find)


build 67 is unable to cache the meta-data, for the same find
command on the same zfs:

% cd /files/wos_b67
% repeat 10 sh -c "/bin/time find usr/src/ -name '*.[hc]' -print|wc"

real 3:20.7
user0.5
sys 7.5
   19355   19355  772864

real 3:07.0
user0.5
sys 5.5
   19355   19355  772864

real 2:44.6
user0.5
sys 4.7
   19355   19355  772864

real 2:06.1
user0.4
sys 3.9
   19355   19355  772864

real 1:16.1
user0.4
sys 3.5
   19355   19355  772864

real   33.0
user0.4
sys 2.7
   19355   19355  772864

real   40.8
user0.4
sys 3.0
   19355   19355  772864

real   18.8
user0.3
sys 2.6
   19355   19355  772864

real 2:32.2
user0.4
sys 4.2
   19355   19355  772864

real 2:05.4
user0.4
sys 3.9
   19355   19355  772864
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Deterioration with zfs performance and recent zfs bits?

2007-06-04 Thread Jürgen Keil
> > Patching zfs_prefetch_disable = 1 has helped
> It's my belief this mainly aids scanning metadata. my
> testing with rsync and yours with find (and seen with
> du & ; zpool iostat -v 1 ) pans this out..
> mainly tracked in bug 6437054 vdev_cache: wise up or die
> http://www.opensolaris.org/jive/thread.jspa?messageID=42212
> 
> so to link your code, it might help, but if one ran
> a clean down the tree, it would hurt compile times.



I think the slowdown that I'm observing is due to the changes
that have been made for 6542676 "ARC needs to track meta-data
memory overhead".

There is now a limit of 1/4 of arc size ("arc_meta_limit")
for zfs meta-data.

On a 32-bit x86 platform with > 512MB physical memory,
the arc size is limited to 3/4 of the size of the kernel heap
arena, which is 3/4 * ~ 650MB => ~ 500MB.
1/4 of that 500MB is ~ 125MB for zfs meta data.

When more than 1/4 of arc is used for meta-data,
meta-data allocations steal space from arc mru/mfu list.

When more than 1/4 of arc is used for meta-data and 
arc_reclaim_needed() returns TRUE, entries from the
dnlc cache are purged and arc data is evicted.

Apparently, before 6542676 it was possible to use a lot
more meta-data if we compare it to what is possible now
with 6542676.


void
arc_init(void)
{
...
/* limit meta-data to 1/4 of the arc capacity */
arc_meta_limit = arc_c_max / 4;
...
}

static int
arc_evict_needed(arc_buf_contents_t type)
{
if (type == ARC_BUFC_METADATA && arc_meta_used >= arc_meta_limit)
return (1);
...
}

static void
arc_get_data_buf(arc_buf_t *buf)
{
/*
 * We have not yet reached cache maximum size,
 * just allocate a new buffer.
 */
if (!arc_evict_needed(type)) {
...
goto out;
}

/*
 * If we are prefetching from the mfu ghost list, this buffer
 * will end up on the mru list; so steal space from there.
 */
...

if ((buf->b_data = arc_evict(state, size, TRUE, type)) == NULL) {

...  
}

static void
arc_kmem_reap_now(arc_reclaim_strategy_t strat)
{
...
if (arc_meta_used >= arc_meta_limit) {
/*
 * We are exceeding our meta-data cache limit.
 * Purge some DNLC entries to release holds on meta-data.
 */
dnlc_reduce_cache((void *)(uintptr_t)arc_reduce_dnlc_percent);
}
...
}


The Tecra-S1 (32-bit Solaris x86) has

> arc_meta_limit::print   
0x7380000 <<<
> arc_meta_limit::print -d
0t121110528
> ::arc
{
anon = -73542
mru = -735455488
mru_ghost = -735455424
mfu = -735455360
mfu_ghost = -735455296
size = 0x131dae70
p = 0xb10983e
c = 0x1330105e
c_min = 0x400
c_max = 0x1ce00000
hits = 0x2e405
misses = 0x9092
deleted = 0x5f
recycle_miss = 0x45bf
mutex_miss = 0
evict_skip = 0x6e4b0
hash_elements = 0x54dd
hash_elements_max = 0x54de
hash_collisions = 0x398e
hash_chains = 0x1887
hash_chain_max = 0x7
no_grow = 0
> 0x1ce00000%4=X  
7380000 

Patching arc_meta_limit to 1/2 of arc size improves find performance.


Another problem:
In dbuf.c, dbuf_read_impl() arc_meta_used accounting appears to be
broken, the amount of meta-data used ("arc_meta_used") is inflated:

db->db.db_data = zio_buf_alloc(DN_MAX_BONUSLEN);
arc_space_consume(512);

Why 512?  Apparently, we zio_buf_alloc DN_MAX_BONUSLEN = 0x140 bytes
but consume 0x200 bytes of meta-data?
(When these buffers are freed, only DN_MAX_BONUSLEN = 0x140 bytes
are returned to arc meta-data)



I'm currently using the following changes, which seem to
restore the zfs performance to what it was before
6542676 - more or less:


diff -r bec4e9eb1f01 usr/src/uts/common/fs/zfs/arc.c
--- a/usr/src/uts/common/fs/zfs/arc.c   Fri Jun 01 08:24:48 2007 -0700
+++ b/usr/src/uts/common/fs/zfs/arc.c   Sat Jun 02 22:09:33 2007 +0200
@@ -2781,10 +2781,10 @@ arc_init(void)
arc_c = arc_c_max;
arc_p = (arc_c >> 1);
 
-   /* limit meta-data to 1/4 of the arc capacity */
-   arc_meta_limit = arc_c_max / 4;
-   if (arc_c_min < arc_meta_limit / 2 && zfs_arc_min == 0)
-   arc_c_min = arc_meta_limit / 2;
+   /* limit meta-data to 1/2 of the arc capacity */
+   arc_meta_limit = arc_c_max / 2;
+   if (arc_c_min < arc_meta_limit / 4 && zfs_arc_min == 0)
+   arc_c_min = arc_meta_limit / 4;
 
/* if kmem_flags are set, lets try to use less memory */
if (kmem_debugging())
diff -r bec4e9eb1f01 usr/src/uts/common/fs/zfs/dbuf.c
--- a/usr/src/uts/common/fs/zfs/dbuf.c  Fri Jun 01 08:24:48 2007 -0700
+++ b/usr/src/uts/common/fs/zfs/dbuf.c  Sat Jun 02 22:09:52 2007 +0200
@@ -470,7 +470,7 @@ dbuf_read_impl(dmu_

[zfs-discuss] Re: Deterioration with zfs performance and recent zfs bits?

2007-06-01 Thread Jürgen Keil
I wrote

> Has anyone else noticed a significant zfs performance
> deterioration when running recent opensolaris bits?
> 
> My 32-bit / 768 MB Toshiba Tecra S1 notebook was able
> to do a full opensolaris release build in ~ 4 hours 45
> minutes (gcc shadow compilation disabled; using an lzjb
> compressed zpool / zfs on a single notebook hdd p-ata drive).
> 
> After upgrading to 2007-05-25 opensolaris release
> bits (compiled from source), the same release build now
> needs ~ 6 hours; that's ~ 25% slower.

It might be Bug ID 6469558
"ZFS prefetch needs to be more aware of memory pressure":
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6469558


Instead of compiling opensolaris for 4-6 hours, I've now used
the following find / grep test using on-2007-05-30 sources:


1st test using Nevada build 60:

% cd /files/onnv-2007-05-30
% repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:22.5
user3.3
sys 5.8
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:28.4
user3.3
sys 4.8
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:18.0
user3.3
sys 4.7
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:17.3
user3.3
sys 4.8
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:15.0
user3.3
sys 4.7
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:12.0
user3.3
sys 4.7
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:21.9
user3.3
sys 4.7
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:18.7
user3.3
sys 4.7
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:19.5
user3.3
sys 4.7
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:17.2
user3.3
sys 4.7


Same test, but running onnv-2007-05-30 release bits
(compiled from source).  This is at least 25% slower
than snv_60:


(Note: zfs_prefetch_disable = 0 , the default value)

% repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 8:04.3
user7.3
sys13.2
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 6:34.4
user7.3
sys11.2
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 6:33.8
user7.3
sys11.1
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 5:35.6
user7.3
sys10.6
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 5:39.8
user7.3
sys10.6
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 5:37.8
user7.3
sys11.1
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 5:53.5
user7.3
sys11.0
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 5:45.2
user7.3
sys11.1
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 5:44.8
user7.3
sys11.0
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 5:49.1
user7.3
sys11.0



Then I patched zfs_prefetch_disable/W1, and now 
the find & grep test runs much faster on
onnv-2007-05-30 bits:

(Note: zfs_prefetch_disable = 1)

% repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +
usr/src/lib/pam_modules/authtok_check/authtok_check.c:   * user entering 
FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while

real 4:01.3
user7.2
sys 9.9
usr/src/li

[zfs-discuss] Deterioration with zfs performance and recent zfs bits?

2007-05-29 Thread Jürgen Keil
Has anyone else noticed a significant zfs performance deterioration
when running recent opensolaris bits?

My 32-bit / 768 MB Toshiba Tecra S1 notebook was able to do a 
full opensolaris release build in ~ 4 hours 45 minutes (gcc shadow 
compilation disabled; using an lzjb compressed zpool / zfs on a
single notebook hdd p-ata drive).

After upgrading to 2007-05-25 opensolaris release bits (compiled from 
source), the same release build now needs ~ 6 hours;
that's ~ 25% slower.



I think a change that might be responsible for this is the fix for
6542676 "ARC needs to track meta-data memory overhead"
(that is, less caching with the fix for 6542676).

Has anyone noticed similar zfs performance deterioration?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Preparing to compare Solaris/ZFS and FreeBSD/ZFS performance.

2007-05-25 Thread Jürgen Keil

> > Or if you do want to use bfu because you really want to match your
> > source code revisions up to a given day then you will need to build the
> > ON consolidation yourself and you an the install the non debug bfu
> > archives (note you will need to download the non debug closed bins to do
> > that).
> 
> The README.opensolaris
> (http://dlc.sun.com/osol/on/downloads/current/README.opensolaris)
> still states:
> 2. Non-DEBUG kernel builds have not been tested.  Systems that require
> the ata driver are known not to work with non-DEBUG  builds.

> Are debug builds now know to work?
s/debug/non-debug/

That used to be true, but is obsolete by now.
non-debug builds work just fine.  Just make sure
to use a recent on-closed-bins-nd*.tar.bz2
archive, e.g.
  http://dlc.sun.com/osol/on/downloads/b63/on-closed-bins-nd-b63.i386.tar.bz2


The ata driver has moved from the closed bits tree 
to the standard/open onnv source tree, so when you
compile non-DEBUG bits, you'll also get a non-DEBUG
ata  driver compiled from source.

There used to be a problem when ata was closed, and
you tried to compile non-DEBUG opensolaris from sources
and mixed that with a DEBUG ata driver from the closed
bits archive.  That was a problem when no closed non-debug
bits (on-closed-bins-nd-*.tar.bz2) were available for download.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-15 Thread Jürgen Keil
> Would you mind also doing:
>
> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=1
>
> to see the raw performance of underlying hardware.

This dd command is reading from the block device,
which might cache data and probably splits requests
into "maxphys" pieces (which happens to be 56K on an 
x86 box).

I'd read from the raw device, /dev/rdsk/c2t1d0 ...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-10 Thread Jürgen Keil
Bart wrote:
> Adam Leventhal wrote:
> > On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote:
> >> Can you give some more info on what these problems are.
> > 
> > I was thinking of this bug:
> > 
> >   6460622 zio_nowait() doesn't live up to its name
> > 
> > Which was surprised to find was fixed by Eric in build 59.
> > 
> 
> It was pointed out by Jürgen Keil that using ZFS compression
> submits a lot of prio 60 tasks to the system task queues;
> this would clobber interactive performance.

Actually the taskq "spa_zio_issue" / "spa_zio_intr" run at
prio 99 (== maxclsyspri or MAXCLSYSPRI):
  
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109


Btw: In one experiment I tried to boot the kernel under kmdb
control (-kd), patched "minclsyspri := 61" and used a
breakpoint inside spa_active() to patch the spa_zio_* taskq
to use prio 60 when importing the gzip compressed pool
(so that the gzip compressed pool was using prio 60 threads
and usb and other stuff was using prio >= 61 threads).
That didn't help interactive performance...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-07 Thread Jürgen Keil
> with recent bits ZFS compression is now handled concurrently with  
> many CPUs working on different records.
> So this load will burn more CPUs and acheive it's results  
> (compression) faster.
> 
> So the observed pauses should be consistent with that of a load  
> generating high system time.
> The assumption is that compression now goes faster than when is was  
> single threaded.
> 
> Is this undesirable ? We might seek a way to slow down compression in  
> order to limit the system load.

According to this dtrace script

#!/usr/sbin/dtrace -s

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
@where[stack()] = count();
}

tick-5s {
printa(@where);
trunc(@where);
}




... I see bursts of ~ 1000 zio_write_compress() [gzip] taskq calls
enqueued into the "spa_zio_issue" taskq by zfs`spa_sync() and
its children:

  0  76337 :tick-5s 
...
  zfs`zio_next_stage+0xa1
  zfs`zio_wait_for_children+0x5d
  zfs`zio_wait_children_ready+0x20
  zfs`zio_next_stage_async+0xbb
  zfs`zio_nowait+0x11
  zfs`dbuf_sync_leaf+0x1b3
  zfs`dbuf_sync_list+0x51
  zfs`dbuf_sync_indirect+0xcd
  zfs`dbuf_sync_list+0x5e
  zfs`dbuf_sync_indirect+0xcd
  zfs`dbuf_sync_list+0x5e
  zfs`dnode_sync+0x214
  zfs`dmu_objset_sync_dnodes+0x55
  zfs`dmu_objset_sync+0x13d
  zfs`dsl_dataset_sync+0x42
  zfs`dsl_pool_sync+0xb5
  zfs`spa_sync+0x1c5
  zfs`txg_sync_thread+0x19a
  unix`thread_start+0x8
 1092

  0  76337 :tick-5s 



It seems that after such a batch of compress requests is
submitted to the "spa_zio_issue" taskq, the kernel is busy
for several seconds working on these taskq entries.
It seems that this blocks all other "taskq" activity inside the
kernel...



This dtrace script counts the number of 
zio_write_compress() calls enqueued / execed 
by the kernel per second:

#!/usr/sbin/dtrace -qs

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
this->tqe = (taskq_ent_t *)arg1;
@enq[this->tqe->tqent_func] = count();
}

sdt:genunix::taskq-exec-end
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
this->tqe = (taskq_ent_t *)arg1;
@exec[this->tqe->tqent_func] = count();
}

tick-1s {
/*
printf("%Y\n", walltimestamp);
*/
printf("TS(sec): %u\n", timestamp / 10);
printa("enqueue %a: [EMAIL PROTECTED]", @enq);
printa("exec%a: [EMAIL PROTECTED]", @exec);
trunc(@enq);
trunc(@exec);
}




I see bursts of zio_write_compress() calls enqueued / execed,
and periods of time where no zio_write_compress() taskq calls
are enqueued or execed.

10#  ~jk/src/dtrace/zpool_gzip7.d 
TS(sec): 7829
TS(sec): 7830
TS(sec): 7831
TS(sec): 7832
TS(sec): 7833
TS(sec): 7834
TS(sec): 7835
enqueue zfs`zio_write_compress: 1330
execzfs`zio_write_compress: 1330
TS(sec): 7836
TS(sec): 7837
TS(sec): 7838
TS(sec): 7839
TS(sec): 7840
TS(sec): 7841
TS(sec): 7842
TS(sec): 7843
TS(sec): 7844
enqueue zfs`zio_write_compress: 1116
execzfs`zio_write_compress: 1116
TS(sec): 7845
TS(sec): 7846
TS(sec): 7847
TS(sec): 7848
TS(sec): 7849
TS(sec): 7850
TS(sec): 7851
TS(sec): 7852
TS(sec): 7853
TS(sec): 7854
TS(sec): 7855
TS(sec): 7856
TS(sec): 7857
enqueue zfs`zio_write_compress: 932
execzfs`zio_write_compress: 932
TS(sec): 7858
TS(sec): 7859
TS(sec): 7860
TS(sec): 7861
TS(sec): 7862
TS(sec): 7863
TS(sec): 7864
TS(sec): 7865
TS(sec): 7866
TS(sec): 7867
enqueue zfs`zio_write_compress: 5
execzfs`zio_write_compress: 5
TS(sec): 7868
enqueue zfs`zio_write_compress: 774
execzfs`zio_write_compress: 774
TS(sec): 7869
TS(sec): 7870
TS(sec): 7871
TS(sec): 7872
TS(sec): 7873
TS(sec): 7874
TS(sec): 7875
TS(sec): 7876
enqueue zfs`zio_write_compress: 653
execzfs`zio_write_compress: 653
TS(sec): 7877
TS(sec): 7878
TS(sec): 7879
TS(sec): 7880
TS(sec): 7881


And a final dtrace script, which monitors scheduler activity while
filling a gzip compressed pool:

#!/usr/sbin/dtrace -qs

sched:::off-cpu,
sched:::on-cpu,
sched:::remain-cpu,
sched:::preempt
{
/*
@[probename, stack()] = count();
*/
@[probename] = count();
}


tick-1s {
printf("%Y", walltimestamp);
printa(@);
trunc(@);
}


It shows periods of time with absolutely *no*
scheduling activity (I guess this is when the
"spa_zio_issue" taskq is working on such a bug
batch of submitted gzip compression calls):

21# ~jk/src/dtrace/zpool_gzip9.d
2007 May  6 21:38:12
  preempt  13
  off-cpu 808
  on-cpu

[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-07 Thread Jürgen Keil
> A couple more questions here.
> 
> [mpstat]
> 
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 3109 3616 316 196 5 17 48 45 245 0 85 0 15
> > 1 0 0 3127 3797 592 217 4 17 63 46 176 0 84 0 15
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 3051 3529 277 201 2 14 25 48 216 0 83 0 17
> > 1 0 0 3065 3739 606 195 2 14 37 47 153 0 82 0 17
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 3011 3538 316 242 3 26 16 52 202 0 81 0 19
> > 1 0 0 3019 3698 578 269 4 25 23 56 309 0 83 0 17
...
> The largest numbers from mpstat are for interrupts and cross calls.
> What does intrstat(1M) show?
>
> Have you run dtrace to determine the most frequent cross-callers?

As far as I understand it, we have these frequent cross calls
because 
1. the test was run on an x86 MP machine
2. the kernel zmod / gzip code allocates and frees four big chunks of
   memory (4 * 65544 bytes) per zio_write_compress ( gzip ) call  [1]

Freeing these big memory chunks generates lots of cross calls,
because page table entries for that memory are invalidated on all
cpus (cores).


Of course, this effect cannot be observed on a uniprocessor machine
(one cpu / core).

And apparently it isn't the root cause for the bad interactive
performance with this test;  the bad interactive performance can
also be observed on single cpu/single core x86 machines.


A possible optimization for MP machines:  use some kind of
kmem_cache for the gzip buffers, so that these buffers could
be reused between gzip compression calls.
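
A minimal sketch of that idea (hypothetical names, kernel-module
boilerplate omitted; this is not the actual zmod/zfs code, only an
illustration of the suggested kmem_cache approach):

#include <sys/kmem.h>

/*
 * Hypothetical sketch: keep the ~64K deflate work buffers in a
 * kmem_cache so they are reused across gzip compression calls,
 * instead of being kobj_alloc()'d and kobj_free()'d on every
 * zio_write_compress() call (the frees unmap the pages and cause
 * the TLB-shootdown cross calls mentioned above).
 */
#define	GZIP_WORKBUF_SIZE	65544	/* size seen in the kobj_alloc trace below */

static kmem_cache_t *gzip_workbuf_cache;

void
gzip_workbuf_init(void)
{
	gzip_workbuf_cache = kmem_cache_create("gzip_workbuf",
	    GZIP_WORKBUF_SIZE, 0, NULL, NULL, NULL, NULL, NULL, 0);
}

void *
gzip_workbuf_alloc(void)
{
	return (kmem_cache_alloc(gzip_workbuf_cache, KM_SLEEP));
}

void
gzip_workbuf_free(void *buf)
{
	kmem_cache_free(gzip_workbuf_cache, buf);
}

A zlib allocation hook like the zcalloc() seen in the lockstat stacks
elsewhere in this thread could then hand out buffers from such a cache
(falling back to kmem_alloc() for odd-sized requests like the 5936
byte one).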


[1] allocations per zio_write_compress() / gzip_compress() call:

  1   6642 kobj_alloc:entry sz 5936, fl 1001
  1   6642 kobj_alloc:entry sz 65544, fl 1001
  1   6642 kobj_alloc:entry sz 65544, fl 1001
  1   6642 kobj_alloc:entry sz 65544, fl 1001
  1   6642 kobj_alloc:entry sz 65544, fl 1001
  1   5769  kobj_free:entry fffeeb307000: sz 65544
  1   5769  kobj_free:entry fffeeb2f5000: sz 65544
  1   5769  kobj_free:entry fffeeb2e3000: sz 65544
  1   5769  kobj_free:entry fffeeb2d1000: sz 65544
  1   5769  kobj_free:entry fffed1c42000: sz 5936
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-04 Thread Jürgen Keil
> A couple more questions here.
...
> You still have idle time in this lockstat (and mpstat).
> 
> What do you get for a lockstat -A -D 20 sleep 30?
> 
> Do you see anyone with long lock hold times, long
> sleeps, or excessive spinning?

Hmm, I ran a series of "lockstat -A -l ph_mutex -s 16 -D 20 sleep 5"
commands while writing to the gzip compressed zpool, and noticed
these high mutex block times:


Adaptive mutex block: 8 events in 5.100 seconds (2 events/sec)

---
Count indv cuml rcnt nsec Lock   Caller  
5  62%  62% 0.00 317300109 ph_mutex+0x1380    page_create_va+0x334

  nsec -- Time Distribution -- count Stack   
 536870912 |@@ 5 segkmem_page_create+0x89
 segkmem_xalloc+0xbc 
 segkmem_alloc_vn+0xcd   
 segkmem_alloc+0x20  
 vmem_xalloc+0x4fc   
 vmem_alloc+0x159
 kmem_alloc+0x4f 
 kobj_alloc+0x7e 
 kobj_zalloc+0x1c
 zcalloc+0x2d
 z_deflateInit2_+0x1b8   
 z_deflateInit_+0x32 
 z_compress_level+0x77   
 gzip_compress+0x4b  
 zio_compress_data+0xbc  
---
Count indv cuml rcnt nsec Lock   Caller  
1  12%  75% 0.00 260247717 ph_mutex+0x1a40    page_create_va+0x334

  nsec -- Time Distribution -- count Stack   
 268435456 |@@ 1 segkmem_page_create+0x89
 segkmem_xalloc+0xbc 
 segkmem_alloc_vn+0xcd   
 segkmem_alloc+0x20  
 vmem_xalloc+0x4fc   
 vmem_alloc+0x159
 kmem_alloc+0x4f 
 kobj_alloc+0x7e 
 kobj_zalloc+0x1c
 zcalloc+0x2d
 z_deflateInit2_+0x1de   
 z_deflateInit_+0x32 
 z_compress_level+0x77   
 gzip_compress+0x4b  
 zio_compress_data+0xbc  
---
Count indv cuml rcnt nsec Lock   Caller  
1  12%  88% 0.00 348135263 ph_mutex+0x1380    page_create_va+0x334

  nsec -- Time Distribution -- count Stack   
 536870912 |@@ 1 segkmem_page_create+0x89
 segkmem_xalloc+0xbc 
 segkmem_alloc_vn+0xcd   
 segkmem_alloc+0x20  
 vmem_xalloc+0x4fc   
 vmem_alloc+0x159
 kmem_alloc+0x4f 
 kobj_alloc+0x7e 
 kobj_zalloc+0x1c
 zcalloc+0x2d
 z_deflateInit2_+0x1a1   
 z_deflateInit_+0x32 
 z_compress_level+0x77   
 gzip_compress+0x4b  
 zio_compress_data+0xbc  
-

[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-04 Thread Jürgen Keil
Roch Bourbonnais wrote

> with recent bits ZFS compression is now handled concurrently with  
> many CPUs working on different records.
> So this load will burn more CPUs and acheive it's results  
> (compression) faster.

Is this done using the taskqs created in spa_activate()?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109

These threads seems to be running the gzip compression code,
and are apparently started with a priority of maxclsyspri == 99.

> So the observed pauses should be consistent with that of a load  
> generating high system time.
> The assumption is that compression now goes faster than when is was  
> single threaded.
> 
> Is this undesirable ? We might seek a way to slow
> down compression in  order to limit the system load.

Hmm, I see that the USB device drivers are also using taskqs,
see file usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c,
function usba_init_pipe_handle().  The USB device driver is
using a priority of minclsyspri == 60 (or "maxclsyspri - 5" == 94,
in the case of isochronous usb pipes):

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c#427


Could this be a problem?  That is, when zfs' taskq is filled with
lots of compression requests, is there no time left to run the USB
taskqs, which have a lower priority than zfs'?
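
For reference, a rough sketch of the two kinds of taskq_create() calls
being compared here.  This is not the actual ON source: the thread
counts, alloc limits and flags are made-up placeholders; only the
maxclsyspri vs. minclsyspri contrast reflects what is described above.

#include <sys/taskq.h>
#include <sys/disp.h>		/* minclsyspri == 60, maxclsyspri == 99 */

static taskq_t *zio_issue_tq;	/* ZFS: compression/checksum work */
static taskq_t *usb_pipe_tq;	/* USB: pipe completion callbacks */

static void
example_taskq_setup(void)
{
	/* spa_activate()-style: worker threads at the top of the system class */
	zio_issue_tq = taskq_create("spa_zio_issue_example", 8,
	    maxclsyspri, 8, 16, TASKQ_PREPOPULATE);

	/* usba_init_pipe_handle()-style: handler thread near the bottom */
	usb_pipe_tq = taskq_create("usb_pipe_tq_example", 1,
	    minclsyspri, 1, 4, 0);
}

If both sets of taskq threads compete in the same scheduling class, a
long burst of maxclsyspri compression work could starve the minclsyspri
USB threads, which would match the frozen USB keyboard/mouse behaviour
seen in this thread.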
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-04 Thread Jürgen Keil
> A couple more questions here.
... 
> What do you have zfs compresison set to?  The gzip level is
> tunable, according to zfs set, anyway:
> 
> PROPERTY   EDIT  INHERIT   VALUES
> compression YES  YES   on | off | lzjb | gzip | gzip-[1-9]

I've used the "default" gzip compression level, that is I used

zfs set compression=gzip gzip_pool
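
(For reference, a specific level can be selected the same way, e.g.
"zfs set compression=gzip-1 gzip_pool" for the lowest level, per the
gzip-[1-9] values listed in the quoted property table above.)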

> You still have idle time in this lockstat (and mpstat).
> 
> What do you get for a lockstat -A -D 20 sleep 30?

# lockstat -A -D 20 /usr/tmp/fill /gzip_pool/junk
lockstat: warning: 723388 aggregation drops on CPU 0
lockstat: warning: 239335 aggregation drops on CPU 1
lockstat: warning: 62366 aggregation drops on CPU 0
lockstat: warning: 51856 aggregation drops on CPU 1
lockstat: warning: 45187 aggregation drops on CPU 0
lockstat: warning: 46536 aggregation drops on CPU 1
lockstat: warning: 687832 aggregation drops on CPU 0
lockstat: warning: 575675 aggregation drops on CPU 1
lockstat: warning: 46504 aggregation drops on CPU 0
lockstat: warning: 40874 aggregation drops on CPU 1
lockstat: warning: 45571 aggregation drops on CPU 0
lockstat: warning: 33422 aggregation drops on CPU 1
lockstat: warning: 501063 aggregation drops on CPU 0
lockstat: warning: 361041 aggregation drops on CPU 1
lockstat: warning: 651 aggregation drops on CPU 0
lockstat: warning: 7011 aggregation drops on CPU 1
lockstat: warning: 61600 aggregation drops on CPU 0
lockstat: warning: 19386 aggregation drops on CPU 1
lockstat: warning: 566156 aggregation drops on CPU 0
lockstat: warning: 105502 aggregation drops on CPU 1
lockstat: warning: 25362 aggregation drops on CPU 0
lockstat: warning: 8700 aggregation drops on CPU 1
lockstat: warning: 585002 aggregation drops on CPU 0
lockstat: warning: 645299 aggregation drops on CPU 1
lockstat: warning: 237841 aggregation drops on CPU 0
lockstat: warning: 20931 aggregation drops on CPU 1
lockstat: warning: 320102 aggregation drops on CPU 0
lockstat: warning: 435898 aggregation drops on CPU 1
lockstat: warning: 115 dynamic variable drops with non-empty dirty list
lockstat: warning: 385192 aggregation drops on CPU 0
lockstat: warning: 81833 aggregation drops on CPU 1
lockstat: warning: 259105 aggregation drops on CPU 0
lockstat: warning: 255812 aggregation drops on CPU 1
lockstat: warning: 486712 aggregation drops on CPU 0
lockstat: warning: 61607 aggregation drops on CPU 1
lockstat: warning: 1865 dynamic variable drops with non-empty dirty list
lockstat: warning: 250425 aggregation drops on CPU 0
lockstat: warning: 171415 aggregation drops on CPU 1
lockstat: warning: 166277 aggregation drops on CPU 0
lockstat: warning: 74819 aggregation drops on CPU 1
lockstat: warning: 39342 aggregation drops on CPU 0
lockstat: warning: 3556 aggregation drops on CPU 1
lockstat: warning: ran out of data records (use -n for more)

Adaptive mutex spin: 4701 events in 64.812 seconds (73 events/sec)

Count indv cuml rcnt spin Lock   Caller  
---
 1726  37%  37% 0.002 vph_mutex+0x17e8   pvn_write_done+0x10c
 1518  32%  69% 0.001 vph_mutex+0x17e8   hat_page_setattr+0x70   
  264   6%  75% 0.002 vph_mutex+0x2000   page_hashin+0xad
  194   4%  79% 0.004 0xfffed2ee0a88 cv_wait+0x69
  106   2%  81% 0.002 vph_mutex+0x2000   page_hashout+0xdd   
   91   2%  83% 0.004 0xfffed2ee0a88 taskq_dispatch+0x2c9
   83   2%  85% 0.004 0xfffed2ee0a88 taskq_thread+0x1cb  
   83   2%  86% 0.001 0xfffec17a56b0 ufs_iodone+0x3d 
   47   1%  87% 0.004 0xfffec1e4ce98 vdev_queue_io+0x85  
   43   1%  88% 0.006 0xfffec139a2c0 trap+0xf66  
   38   1%  89% 0.006 0xfffecb5f8cd0 cv_wait+0x69
   37   1%  90% 0.004 0xfffec143ee90 dmult_deque+0x36
   26   1%  91% 0.002 htable_mutex+0x108 htable_release+0x79 
   26   1%  91% 0.001 0xfffec17a56b0 ufs_putpage+0xa4
   18   0%  91% 0.004 0xfffec00dca48 ghd_intr+0xa8   
   17   0%  92% 0.002 0xfffec00dca48 ghd_waitq_delete+0x35   
   12   0%  92% 0.002 htable_mutex+0x248 htable_release+0x79 
   11   0%  92% 0.008 0xfffec1e4ce98 vdev_queue_io_done+0x3b 
   10   0%  93% 0.003 0xfffec00dca48 ghd_transport+0x71  
   10   0%  93% 0.002 0xff00077dc138 
page_get_mnode_freelist+0xdb
---

Adaptive mutex block: 167 events in 64.812 seconds (3 events/sec)

Count indv cuml rcnt nsec Lock   Caller  
---
   78  47%  47% 0.0031623 vph_mutex+0x17e8   pvn_write_done+0x10c
 

[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-03 Thread Jürgen Keil
> I'm not quite sure what this test should show ?

For me, the test shows how writing to a gzip compressed
pool completely kills interactive desktop performance.

At least, that's the case when using a USB keyboard and mouse.
(I've not yet tested with a ps/2 keyboard & mouse, or on
a SPARC box.)

> Compressing random data is the perfect way to generate heat.
> After all, compression working relies on input entropy being low.
> But good random generators are characterized by the opposite - output 
> entropy being high. Even a good compressor, if operated on a good random
> generator's output, will only end up burning cycles, but not reducing the
> data size.

Whatever I write to the gzip compressed pool
(128K of /dev/urandom random data, or 128K of a
buffer filled completely with the same character, or
the first 128K from /etc/termcap), the Xorg / Gnome
desktop becomes completely unusable while
writing to such a gzip compressed zpool / zfs.

With an "lzjb" compressed zpool / zfs the system
remains more or less usable...

> Hence, is the request here for the compressor module
> to 'adapt', kind of first-pass check the input data whether it's
> sufficiently low-entropy to warrant a compression attempt ?
> 
> If not, then what ?

I'm not yet sure what the problem is.  But it sure would be nice
if a gzip compressed zpool / zfs wouldn't kill interactive desktop
performance as it does now.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: gzip compression throttles system?

2007-05-03 Thread Jürgen Keil
> The reason you are busy computing SHA1 hashes is you are using 
> /dev/urandom.  The implementation of drv/random uses
> SHA1 for mixing, 
> actually strictly speaking it is the swrand provider that does that part.

Ahh, ok.

So, instead of using dd to read from /dev/urandom all the time,
I've now used this quick C program to write one /dev/urandom block
over and over to the gzip compressed zpool:

=========
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int fd;
	char buf[128*1024];

	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0) {
		perror("open /dev/urandom");
		exit(1);
	}
	if (read(fd, buf, sizeof(buf)) != sizeof(buf)) {
		perror("fill buf from /dev/urandom");
		exit(1);
	}
	close(fd);

	fd = open(argv[1], O_WRONLY|O_CREAT, 0666);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
	for (;;) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
			break;
		}
	}
	close(fd);
	exit(0);
}
=========


Avoiding the reads from /dev/urandom makes the effect even
more noticeable; the machine now "freezes" for 10+ seconds.

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0 3109  3616  316  196    5   17   48   45   245    0  85   0  15
  1    0   0 3127  3797  592  217    4   17   63   46   176    0  84   0  15
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0 3051  3529  277  201    2   14   25   48   216    0  83   0  17
  1    0   0 3065  3739  606  195    2   14   37   47   153    0  82   0  17
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0 3011  3538  316  242    3   26   16   52   202    0  81   0  19
  1    0   0 3019  3698  578  269    4   25   23   56   309    0  83   0  17

# lockstat -kIW -D 20 sleep 30

Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec)

Count indv cuml rcnt nsec Hottest CPU+PILCaller  
---
 2068  34%  34% 0.00 1767 cpu[0] deflate_slow
 1506  25%  59% 0.00 1721 cpu[1] longest_match   
 1017  17%  76% 0.00 1833 cpu[1] mach_cpu_idle   
  454   7%  83% 0.00 1539 cpu[0] fill_window 
  215   4%  87% 0.00 1788 cpu[1] pqdownheap  
  152   2%  89% 0.00 1691 cpu[0] copy_block  
   89   1%  90% 0.00 1839 cpu[1] z_adler32   
   77   1%  92% 0.0036067 cpu[1] do_splx 
   64   1%  93% 0.00 2090 cpu[0] bzero   
   62   1%  94% 0.00 2082 cpu[0] do_copy_fault_nta   
   48   1%  95% 0.00 1976 cpu[0] bcopy   
   41   1%  95% 0.0062913 cpu[0] mutex_enter 
   27   0%  96% 0.00 1862 cpu[1] build_tree  
   19   0%  96% 0.00 1771 cpu[1] gen_bitlen  
   17   0%  96% 0.00 1744 cpu[0] bi_reverse  
   15   0%  97% 0.00 1783 cpu[0] page_create_va  
   15   0%  97% 0.00 1406 cpu[1] fletcher_2_native   
   14   0%  97% 0.00 1778 cpu[1] gen_codes   
   11   0%  97% 0.00  912 cpu[1]+6   ddi_mem_put8
5   0%  97% 0.00 3854 cpu[1] fsflush_do_pages
---


It seems the same problem can be observed with "lzjb" compression,
but the pauses with lzjb are much shorter and the kernel consumes
less system cpu time with "lzjb" (which is expected, I think).
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: gzip compression throttles system?

2007-05-03 Thread Jürgen Keil
> I just had a quick play with gzip compression on a filesystem and the
> result was the machine grinding to a halt while copying some large
> (.wav) files to it from another filesystem in the same pool.
> 
> The system became very unresponsive, taking several seconds to echo
> keystrokes.  The box is a maxed out AMD QuadFX, so it should have plenty
> of grunt for this.

I've observed the same behavior.  For my test I've used a zpool
created on a 1GB file (on a UFS filesystem):

# mkfile 1G  /var/tmp/vdev_for_gzip_pool
# zpool create gzip_pool /var/tmp/vdev_for_gzip_pool
# zfs set compression=gzip gzip_pool
# chown jk /gzip_pool

Now, when I run this command...

% dd bs=128k if=/dev/urandom of=/gzip_pool/junk

... the mouse cursor sometimes is frozen for two
(or more) seconds. Same with keyboard input.

This is on an amd64 x2 box, 4gb memory, and usb
keyboard and usb mouse.


Lots of system cpu time is used while the gzip compressed
poll is filled:

% mpstat 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   47   1  122   646  316  317   13   2482  10243   3   0  94
  1   41   1  159   334  101  279   11   2482   9002   3   0  94
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  01   0 6860  7263  282   7322   781640  70   0  30
  10   0 6866  6870   10491   850   1210 100   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   06   576  301  4653   19   18  146   1261   5   0  95
  10   0   36  1471 1276  410   29   20   24  115   3350  59   0  41
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0 5404  5823  309  11322   571   4200  56   0  44
  10   0 5409  5431   135   121   801   1790 100   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   01   529  300  281   17   13   10  103   2740  64   0  36
  10   09  1348 1169  4528   23   14  105   1630   5   0  95
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0 6186  6607  282   6289   55   11   1230  88   0  12
  10   0 6196  6259   53   7843   80   12   1320  75   0  25

A kernel profile seems to show that the kernel is busy with gzip'ing
(and busy with computing SHA1 hashes?):

# lockstat -kIW -D 20 sleep 20

Profiling interrupt: 3882 events in 20.021 seconds (194 events/sec)

Count indv cuml rcnt nsec Hottest CPU+PILCaller  
---
 1802  46%  46% 0.00 1931 cpu[0] mach_cpu_idle   
  517  13%  60% 0.00 6178 cpu[1] SHA1Transform   
  482  12%  72% 0.00 1094 cpu[0] deflate_slow
  328   8%  81% 0.00  cpu[1] longest_match   
  104   3%  83% 0.00  940 cpu[0] fill_window 
   98   3%  86% 0.0047357 cpu[1] bcopy   
   65   2%  87% 0.00 5438 cpu[1] SHA1Update  
   63   2%  89% 0.00  834 cpu[1] bzero   
   50   1%  90% 0.00 1042 cpu[0] pqdownheap  
   44   1%  92% 0.00  676 cpu[1] Encode  
   32   1%  92% 0.00 1136 cpu[0] copy_block  
   24   1%  93% 0.00 1214 cpu[1] do_copy_fault_nta   
   23   1%  94% 0.00   401205 cpu[0] do_splx 
   23   1%  94% 0.00  644 cpu[1] hmac_encr   
   22   1%  95% 0.00 1058 cpu[1] z_adler32   
   19   0%  95% 0.00 1208 cpu[0]+10  todpc_rtcget
   16   0%  96% 0.00  752 cpu[0] SHA1Final   
   14   0%  96% 0.00 1186 cpu[1] mutex_enter 
   12   0%  96% 0.00  642 cpu[1] kcopy   
   11   0%  97% 0.00  948 cpu[0] page_create_va  
---
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS and UFS performance

2007-03-29 Thread Jürgen Keil
> > That's probably bug 6382683 "lofi is confused about sync/async I/O",
> > and AFAIK it's fixed in current opensolaris releases.
> > 
> According to Bug Database bug 6382683 is in
> 1-Dispatched state, what does that mean? I wonder if
> the fix is available (or will be available) as a
> Solaris 10 patch?

Seems I was wrong, and this issue is not yet fixed.

Yesterday, before replying, I've tried to check the state
of the bug, but b.o.o. always returned "We encountered
an unexpected error. Please try back again."  :-(


I repeated my test case (creating a pcfs filesystem on a 
lofi device from an 80gbyte file on zfs), and the write 
times with a current opensolaris kernel have improved by
a factor of 4 (~ 10 seconds instead of 40-50 seconds),
but I guess part of the improvement is because of a hardware
upgrade here (zpool on two s-ata drives, instead of one p-ata
drive).
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS and UFS performance

2007-03-28 Thread Jürgen Keil
> We are running Solaris 10 11/06 on a Sun V240 with 2
> CPUS and 8 GB of memory. This V240 is attached to a
> 3510 FC that has 12 x 300 GB disks. The 3510 is
> configured as HW RAID 5 with 10 disks and 2 spares
> and it's exported to the V240 as a single LUN.
> 
> We create iso images of our product in the following
> way (high-level):
> 
> # mkfile 3g /isoimages/myiso
> # lofiadm -a /isoimages/myiso
> /dev/lofi/1
> # newfs /dev/rlofi/1
> # mount /dev/lofi/1 /mnt
> # cd /mnt; zcat /product/myproduct.tar.Z | tar xf -
> 
> and we finally use mkisofs to create the iso image.
> 

> 
> ZFS performance
> -
> When we create a ZFS file system on the above LUN and
> create the iso it takes forever it seems to be
> hanging in the tar extraction (we killed this after a
> while i.e. > few hours).

That's probably bug 6382683 "lofi is confused about sync/async I/O",
and AFAIK it's fixed in current opensolaris releases.

See the thread with subject "bad lofi performance with zfs file backend /
bad mmap write performance"  from january / february 2006:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-January/016450.html
http://mail.opensolaris.org/pipermail/zfs-discuss/2006-February/016566.html

Possible workaround:
Create a 3gb zvol device, and use that instead of a 3gb file + lofi.
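
For example (untested sketch, dataset name is a placeholder), the zvol
variant of the recipe quoted above would look roughly like:

  zfs create -V 3g tank/myiso-vol
  newfs /dev/zvol/rdsk/tank/myiso-vol
  mount /dev/zvol/dsk/tank/myiso-vol /mnt
  cd /mnt; zcat /product/myproduct.tar.Z | tar xf -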

Or use something like this:

  zfs create tank/myiso
  cd /tank/myiso
  cat /product/myproduct.tar.Z | tar xf -
  mkisofs ...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS and Firewire/USB enclosures

2007-03-20 Thread Jürgen Keil
> I still haven't got any "warm and fuzzy" responses
> yet solidifying ZFS in combination with Firewire or USB enclosures.

I was unable to use zfs (that is "zpool create" or "mkfs -F ufs") on
firewire devices, because scsa1394 would hang the system as
soon as multiple concurrent write commands are submitted to it.

I filed bug 6445725 (which disappeared in the scsa1394
bugs.opensolaris.org black hole), submitted a fix and
requested a sponsor for the fix[*], but not much has happened
with fixing this problem in opensolaris.  

There is no such problem with USB mass storage devices.

[*] http://www.opensolaris.org/jive/thread.jspa?messageID=46190
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs legacy filesystem remounted rw: atime temporary off?

2007-02-05 Thread Jürgen Keil
I have my /usr filesystem configured as a zfs filesystem,
using a legacy mountpoint.  I noticed that the system boots
with atime updates temporarily turned off (and doesn't record
file accesses in the /usr filesystem):

# df -h /usr
Filesystem             size   used  avail capacity  Mounted on
files/usr-b57           98G   2.1G    18G    11%    /usr

# zfs get atime files/usr-b57
NAME           PROPERTY  VALUE  SOURCE
files/usr-b57  atime     off    temporary


That is, when a zfs legacy filesystem is mounted in
read-only mode, and then remounted read/write,
atime updates are off:

# zfs create -o mountpoint=legacy files/foobar

# mount -F zfs -o ro files/foobar /mnt

# zfs get atime files/foobar
NAME          PROPERTY  VALUE  SOURCE
files/foobar  atime     on     default

# mount -F zfs -o remount,rw files/foobar /mnt

# zfs get atime files/foobar
NAME          PROPERTY  VALUE  SOURCE
files/foobar  atime     off    temporary


Is this expected behaviour?

It works if I remount with the "atime" option:

# mount -F zfs -o remount,rw,atime files/foobar /mnt

# zfs get atime files/foobar
NAME          PROPERTY  VALUE  SOURCE
files/foobar  atime     on     default
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Heavy writes freezing system

2007-01-16 Thread Jürgen Keil
> We are having issues with some Oracle databases on
> ZFS. We would appreciate any useful feedback you can
> provide.
> [...]
> The issue seems to be
> serious write contention/performance. Some read
> issues also exhibit themselves, but they seem to be
> secondary to the write issues.

What hardware is used?  Sparc? x86 32-bit? x86 64-bit?
How much RAM is installed?
Which version of the OS? 


Did you already try to monitor kernel memory usage
while writing to zfs?  Maybe the kernel is running out of
free memory?  (I have bugs like 6483887 in mind,
"without direct management, arc ghost lists can run amok".)

For a live system:

echo ::kmastat | mdb -k
echo ::memstat | mdb -k


In case you've got a crash dump for the hung system, you
can try the same ::kmastat and ::memstat commands using the 
kernel crash dumps saved in directory /var/crash/`hostname`

# cd /var/crash/`hostname`
# mdb -k unix.1 vmcore.1
::memstat
::kmastat
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS related (probably) hangs due to memory exhaustion(?) with snv53

2007-01-05 Thread Jürgen Keil
> >Hmmm, so there is lots of evictable cache here (mostly in the MFU
> >part of the cache)... could you make your core file available?
> >I would like to take a look at it.
> 
> Isn't this just like:
> 6493923 nfsfind on ZFS filesystem quickly depletes memory in a 1GB system
> 
> Which was introduced in b51(or 52) and fixed in snv_54.

Hmm, or like:
6483887 without direct management, arc ghost lists can run amok
(which isn't fixed at this time)

See also this thread:
http://www.opensolaris.org/jive/thread.jspa?messageID=67370

Mark had sent me some test bits with a modified arc.c; it tried to
evict ghost list entries when the arc cache is in no_grow state
and the arc ghost lists consume too much memory.

The main change was a new function, arc_buf_hdr_alloc() in arc.c,
that shrinks the ghost lists when the system is running out
of memory:

static arc_buf_hdr_t *
arc_buf_hdr_alloc(spa_t *spa, int size)
{
	arc_buf_hdr_t *hdr;

	if (arc.no_grow && arc.mru_ghost->size + arc.mfu_ghost->size > arc.c) {
		int64_t mru_over = arc.anon->size + arc.mru->size +
		    arc.mru_ghost->size - arc.c;

		if (mru_over > 0 && arc.mru_ghost->size > 0) {
			int64_t todelete = MIN(arc.mru_ghost->lsize, mru_over);
			arc_evict_ghost(arc.mru_ghost, todelete);
		} else {
			int64_t todelete = MIN(arc.mfu_ghost->lsize,
			    arc.mru_ghost->size + arc.mfu_ghost->size - arc.c);
			arc_evict_ghost(arc.mfu_ghost, todelete);
		}
	}

	ASSERT3U(size, >, 0);
	hdr = kmem_cache_alloc(hdr_cache, KM_SLEEP);
	ASSERT(BUF_EMPTY(hdr));
	hdr->b_size = size;
	hdr->b_spa = spa;
	hdr->b_state = arc.anon;
	hdr->b_arc_access = 0;
	hdr->b_flags = 0;
	return (hdr);
}



This was then used by arc_buf_alloc():

arc_buf_t *
arc_buf_alloc(spa_t *spa, int size, void *tag)
{
	arc_buf_hdr_t *hdr;
	arc_buf_t *buf;

	hdr = arc_buf_hdr_alloc(spa, size);
	buf = kmem_cache_alloc(buf_cache, KM_SLEEP);
	buf->b_hdr = hdr;
	...
	return (buf);
}
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs/fstyp slows down recognizing pcfs formatted floppies

2006-12-18 Thread Jürgen Keil
I've noticed that fstyp on floppy media formatted with "pcfs" now needs
somewhere between 30 - 100 seconds to find out that the floppy media is
formatted with "pcfs".

E.g. on sparc snv_48, I currently observe this:
% time fstyp /vol/dev/rdiskette0/nomedia
pcfs
0.01u 0.10s 1:38.84 0.1%


zfs's /usr/lib/fs/zfs/fstyp.so.1 seems to add about 40 seconds to that
time, because it reads 1 mbyte from the floppy media (~ 2/3 of a 1.44MB
floppy), only to find out that the floppy media does not contain a zfs
pool:

SPARC snv_48, before tamarack:
% time /usr/lib/fs/zfs/fstyp /vol/dev/rdiskette0/nomedia
unknown_fstyp (no matches)
0.01u 0.04s 0:36.27 0.1%

x86, snv_53, with tamarack:
% time /usr/lib/fs/zfs/fstyp /dev/rdiskette
unknown_fstyp (no matches)
0.00u 0.01s 0:35.25 0.0%

(the rest of the time is wasted probing for an udfs filesystem)


Isn't the minimum device size required for a zfs pool 64 mbytes?
(SPA_MINDEVSIZE, from the sys/fs/zfs.h header)

Shouldn't zfs/fstyp skip probing for zfs / zpools on small capacity
devices like floppy media, which are smaller than these 64 mbytes?

diff -r 367766133bfe usr/src/cmd/fs.d/zfs/fstyp/fstyp.c
--- a/usr/src/cmd/fs.d/zfs/fstyp/fstyp.cFri Dec 15 09:03:53 2006 -0800
+++ b/usr/src/cmd/fs.d/zfs/fstyp/fstyp.cSun Dec 17 11:27:08 2006 +0100
@@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include <sys/stat.h>
+#include <sys/fs/zfs.h>
 #include 
 #include 
 #include 
@@ -88,6 +90,15 @@ fstyp_mod_ident(fstyp_mod_handle_t handl
 	char		*str;
 	uint64_t	u64;
 	char		buf[64];
+	struct stat	stb;
+
+	/*
+	 * don't probe for zfs on small media (e.g. floppy) that is
+	 * too small for a zpool.
+	 */
+	if (fstat(h->fd, &stb) == 0 && stb.st_size < SPA_MINDEVSIZE) {
+		return (FSTYP_ERR_NO_MATCH);
+	}
 
 	if (zpool_read_label(h->fd, &h->config) != 0 ||
 	    h->config == NULL) {
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Recommended Minimum Hardware for ZFS Fileserver?

2006-10-30 Thread Jürgen Keil
> I've been looking at building this setup in some
> cheap eBay rack-mount servers that are generally
> single or dual 1.0GHz Pentium III, 1Gb PC133 RAM, and
> I'd have to add the SATA II controller into a spare
> PCI slot.
> 
> For maximum file system performance of the ZFS pool,
> would anyone care to offer hardware recommendations?

For maximum file system performance of the ZFS pool,
a 64-bit x86 cpu would be *much* better than a 32-bit x86 cpu.

The 32-bit cpu won't use more than ~ 512MB of RAM for
ZFS' ARC cache (no matter how much is installed in the
machine); a 64-bit cpu is able to use all of the
available RAM for ZFS' cache.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: ZFS hangs systems during copy

2006-10-27 Thread Jürgen Keil
> This is:
> 6483887 without direct management, arc ghost lists can run amok

That seems to be a new bug?
http://bugs.opensolaris.org does not yet find it.

> The fix I have in mind is to control the ghost lists as part of
> the arc_buf_hdr_t allocations.  If you want to test out my fix,
> I can send you some diffs...

Ok, I can do that.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: zpool snapshot fails on unmounted filesystem

2006-10-27 Thread Jürgen Keil
> I just retried to reproduce it to generate a reliable
> test case. Unfortunately, I cannot reproduce the
> error message. So I really have no idea what might
> have cause it

I also had this problem 2-3 times in the past,
but I cannot reproduce it.



Using dtrace against the kernel, I found out that the source
of the EBUSY error 16 is the kernel function zil_suspend():

...
  0<- dnode_cons  0
  0-> dnode_setdblksz
  0<- dnode_setdblksz14
  0-> dmu_zfetch_init
  0  -> list_create
  0  <- list_create  3734548404
  0  -> rw_init
  0  <- rw_init  3734548400
  0<- dmu_zfetch_init3734548400
  0-> list_insert_head
  0<- list_insert_head3734548052
  0  <- dnode_create 3734548048
  0<- dnode_special_open 3734548048
  0-> dsl_dataset_set_user_ptr
  0<- dsl_dataset_set_user_ptr 0
  0  <- dmu_objset_open_impl  0
  0<- dmu_objset_open 0
  0-> dmu_objset_zil
  0<- dmu_objset_zil 3700903200
  0-> zil_suspend
  0 | zil_suspend:entry   zh_claim_txg: 83432
  0<- zil_suspend16
  0-> dmu_objset_close
  0  -> dsl_dataset_close
  0-> dbuf_rele
  0  -> dbuf_evict_user
  0-> dsl_dataset_evict
  0  -> unique_remove
...

  1200  /*
  1201   * Suspend an intent log.  While in suspended mode, we still honor
  1202   * synchronous semantics, but we rely on txg_wait_synced() to do it.
  1203   * We suspend the log briefly when taking a snapshot so that the 
snapshot
  1204   * contains all the data it's supposed to, and has an empty intent log.
  1205   */
  1206  int
  1207  zil_suspend(zilog_t *zilog)
  1208  {
  1209  const zil_header_t *zh = zilog->zl_header;
  1210  lwb_t *lwb;
  1211
  1212  mutex_enter(&zilog->zl_lock);
  1213  if (zh->zh_claim_txg != 0) {/* unplayed log */
  1214  mutex_exit(&zilog->zl_lock);
  1215  return (EBUSY);
  1216  }
...



It seems that you can identify zfs filesystems that fail
zfs snapshot with error 16 EBUSY using

zdb -iv {your_zpool_here} | grep claim_txg

If there are any ZIL headers listed with a claim_txg != 0, the
dataset that uses this ZIL should fail zfs snapshot with
error 16, EBUSY.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: ZFS hangs systems during copy

2006-10-27 Thread Jürgen Keil
> >> Sounds familiar. Yes it is a small system a Sun blade 100 with 128MB of 
> >> memory.
> > 
> > Oh, 128MB...
> 
> > Btw, does anyone know if there are any minimum hardware (physical memory)
> > requirements for using ZFS?
> > 
> > It seems as if ZFS wan't tested that much on machines with 256MB (or less)
> > memory...
> 
> The minimum hardware requirement for Solaris 10 (including ZFS) is 
> 256MB, and we did test with that :-)
> 
> On small memory systems, make sure that you are running with 
> kmem_flags=0 (this is the default on non-debug builds, but debug builds 
> default to kmem_flags=f and you will have to manually change it in 
> /etc/system).

I do have kernel memory allocator debugging disabled; both S10 6/2006
and SX:CR snv48 are non-debug builds.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS hangs systems during copy

2006-10-26 Thread Jürgen Keil
> ZFS 11.0 on Solaris release 06/06, hangs systems when
> trying to copy files from my VXFS 4.1 file system.
> any ideas what this problem could be?.

What kind of system is that?  How much memory is installed?

I'm able to hang an Ultra 60 with 256 MByte of main memory,
simply by writing big files to a ZFS filesystem.  The problem
happens with both Solaris 10 6/2006 and Solaris Express snv_48.

In my case there seems to be a problem with ZFS' ARC cache,
which is not returning memory to the kernel when free memory
gets low.  Instead, ZFS' ARC cache data structures keep growing
until the machine runs out of kernel memory.  At this point
the machine hangs, lots of kernel threads are waiting for free memory,
and the box must be power cycled.  (Well, unplugging and re-connecting
the type 5 keyboard works and gets me to the OBP, where I can force
a system crashdump and reboot.)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


  1   2   >