Re: [zfs-discuss] Running on Dell hardware?

2011-01-12 Thread Ben Rockwood
If you're still having issues, go into the BIOS and disable C-States if you 
haven't already.  C-States are responsible for most of the problems with 11th-gen 
PowerEdge.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Disk keeps resilvering, was: Replacing a disk never completes

2010-09-30 Thread Ben Miller

On 09/22/10 04:27 PM, Ben Miller wrote:

On 09/21/10 09:16 AM, Ben Miller wrote:



I had tried a clear a few times with no luck. I just did a detach and that
did remove the old disk and has now triggered another resilver which
hopefully works. I had tried a remove rather than a detach before, but that
doesn't work on raidz2...

thanks,
Ben


I made some progress. That resilver completed with 4 errors. I cleared
those and still had the one error metadata:0x0 so I started a scrub.
The scrub restarted the resilver on c4t0d0 again though! There currently
are no errors anyway, but the resilver will be running for the next day+.
Is this another bug or will doing a scrub eventually lead to a scrub of the
pool instead of the resilver?

Ben


	Well, not much progress.  The one permanent error metadata:0x0 came 
back, and the disk keeps wanting to resilver when trying to do a scrub. 
Now, after the last resilver, I have more checksum errors on the pool, but 
not on any disks:

NAME          STATE   READ WRITE CKSUM
pool2         ONLINE     0     0    37
...
  raidz2-1    ONLINE     0     0    74

All other checksum totals are 0.  So three problems:
1. How to get the disk to stop resilvering?

	2. How do you get checksum errors on the pool when no disk is identified? 
 If I clear them and let the resilver go again, more checksum errors 
appear.  So how do I get rid of these errors?


	3. How to get rid of the metadata:0x0 error?  I'm currently destroying old 
snapshots (though that bug was fixed quite a while ago and I'm running 
b134).  I can try unmounting filesystems and remounting next (all are 
currently mounted).  I can also schedule a reboot for next week if anyone 
thinks that would help.


thanks,
Ben

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing a disk never completes

2010-09-21 Thread Ben Miller

On 09/20/10 10:45 AM, Giovanni Tirloni wrote:

On Thu, Sep 16, 2010 at 9:36 AM, Ben Miller bmil...@mail.eecis.udel.edu wrote:

I have an X4540 running b134 where I'm replacing 500GB disks with 2TB
disks (Seagate Constellation) and the pool seems sick now.  The pool
has four raidz2 vdevs (8+2) where the first set of 10 disks were
replaced a few months ago.  I replaced two disks in the second set
(c2t0d0, c3t0d0) a couple of weeks ago, but have been unable to get the
third disk to finish replacing (c4t0d0).

I have tried the resilver for c4t0d0 four times now and the pool also
comes up with checksum errors and a permanent error (metadata:0x0).
  The first resilver was from 'zpool replace', which came up with
checksum errors.  I cleared the errors which triggered the second
resilver (same result).  I then did a 'zpool scrub' which started the
third resilver and also identified three permanent errors (the two
additional were in files in snapshots which I then destroyed).  I then
did a 'zpool clear' and then another scrub which started the fourth
resilver attempt.  This last attempt identified another file with
errors in a snapshot that I have now destroyed.

Any ideas how to get this disk finished being replaced without
rebuilding the pool and restoring from backup?  The pool is working,
but is reporting as degraded and with checksum errors.


[...]

Try to run a `zpool clear pool2` and see if it clears the errors. If not, you
may have to detach `c4t0d0s0/o`.

I believe it's a bug that was fixed in recent builds.

	I had tried a clear a few times with no luck.  I just did a detach and 
that did remove the old disk and has now triggered another resilver which 
hopefully works.  I had tried a remove rather than a detach before, but 
that doesn't work on raidz2...
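
For anyone who hits the same thing, the detach step was roughly this (a sketch
using the device names from the status output below, not an exact transcript):

zpool detach pool2 c4t0d0s0/o    # drop the old half of the stuck replacing vdev
zpool status -v pool2            # watch the resilver it triggers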


thanks,
Ben


--
Giovanni Tirloni
gtirl...@sysdroid.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Replacing a disk never completes

2010-09-16 Thread Ben Miller
I have an X4540 running b134 where I'm replacing 500GB disks with 2TB disks 
(Seagate Constellation) and the pool seems sick now.  The pool has four 
raidz2 vdevs (8+2) where the first set of 10 disks were replaced a few 
months ago.  I replaced two disks in the second set (c2t0d0, c3t0d0) a 
couple of weeks ago, but have been unable to get the third disk to finish 
replacing (c4t0d0).


I have tried the resilver for c4t0d0 four times now and the pool also comes 
up with checksum errors and a permanent error (metadata:0x0).  The 
first resilver was from 'zpool replace', which came up with checksum 
errors.  I cleared the errors which triggered the second resilver (same 
result).  I then did a 'zpool scrub' which started the third resilver and 
also identified three permanent errors (the two additional were in files in 
snapshots which I then destroyed).  I then did a 'zpool clear' and then 
another scrub which started the fourth resilver attempt.  This last attempt 
identified another file with errors in a snapshot that I have now destroyed.
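
Condensed, the cycle of attempts above was roughly (a sketch, not a transcript):

zpool replace pool2 c4t0d0    # first resilver; finished with checksum errors
zpool clear pool2             # clearing the errors kicked off another resilver
zpool scrub pool2             # started a further resilver and flagged the permanent errors
zpool status -v pool2         # lists the affected files and the metadata:0x0 entry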


Any ideas how to get this disk finished being replaced without rebuilding 
the pool and restoring from backup?  The pool is working, but is reporting 
as degraded and with checksum errors.


Here is what the pool currently looks like:

 # zpool status -v pool2
  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 33h9m with 4 errors on Thu Sep 16 00:28:14
config:

NAME              STATE     READ WRITE CKSUM
pool2             DEGRADED     0     0     8
  raidz2-0        ONLINE       0     0     0
    c0t4d0        ONLINE       0     0     0
    c1t4d0        ONLINE       0     0     0
    c2t4d0        ONLINE       0     0     0
    c3t4d0        ONLINE       0     0     0
    c4t4d0        ONLINE       0     0     0
    c5t4d0        ONLINE       0     0     0
    c2t5d0        ONLINE       0     0     0
    c3t5d0        ONLINE       0     0     0
    c4t5d0        ONLINE       0     0     0
    c5t5d0        ONLINE       0     0     0
  raidz2-1        DEGRADED     0     0    14
    c0t5d0        ONLINE       0     0     0
    c1t5d0        ONLINE       0     0     0
    c2t1d0        ONLINE       0     0     0
    c3t1d0        ONLINE       0     0     0
    c4t1d0        ONLINE       0     0     0
    c5t1d0        ONLINE       0     0     0
    c2t0d0        ONLINE       0     0     0
    c3t0d0        ONLINE       0     0     0
    replacing-8   DEGRADED     0     0     0
      c4t0d0s0/o  OFFLINE      0     0     0
      c4t0d0      ONLINE       0     0     0  268G resilvered
    c5t0d0        ONLINE       0     0     0
  raidz2-2        ONLINE       0     0     0
    c0t6d0        ONLINE       0     0     0
    c1t6d0        ONLINE       0     0     0
    c2t6d0        ONLINE       0     0     0
    c3t6d0        ONLINE       0     0     0
    c4t6d0        ONLINE       0     0     0
    c5t6d0        ONLINE       0     0     0
    c2t7d0        ONLINE       0     0     0
    c3t7d0        ONLINE       0     0     0
    c4t7d0        ONLINE       0     0     0
    c5t7d0        ONLINE       0     0     0
  raidz2-3        ONLINE       0     0     0
    c0t7d0        ONLINE       0     0     0
    c1t7d0        ONLINE       0     0     0
    c2t3d0        ONLINE       0     0     0
    c3t3d0        ONLINE       0     0     0
    c4t3d0        ONLINE       0     0     0
    c5t3d0        ONLINE       0     0     0
    c2t2d0        ONLINE       0     0     0
    c3t2d0        ONLINE       0     0     0
    c4t2d0        ONLINE       0     0     0
    c5t2d0        ONLINE       0     0     0
logs
  mirror-4        ONLINE       0     0     0
    c0t1d0s0      ONLINE       0     0     0
    c1t3d0s0      ONLINE       0     0     0
cache
  c0t3d0s7        ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

metadata:0x0
0x167a2:0x552ed
(This second file was in a snapshot I destroyed after the resilver 
completed).


# zpool list pool2
NAME    SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH    ALTROOT
pool2  31.8T  13.8T  17.9T   43%  1.65x  DEGRADED  -

The slog is a mirror of two SLC SSDs and the L2ARC is an MLC SSD.
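
(For reference, a layout like that can be added to an existing pool with plain
zpool add commands; a sketch using the device names shown above:)

zpool add pool2 log mirror c0t1d0s0 c1t3d0s0   # mirrored slog on the two SLC SSD slices
zpool add pool2 cache c0t3d0s7                 # L2ARC on the MLC SSD slice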

thanks,
Ben
___
zfs-discuss mailing list

Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-14 Thread Ben Rockwood
 On 8/14/10 1:12 PM, Frank Cusack wrote:

 Wow, what leads you guys to even imagine that S11 wouldn't contain
 comstar, etc.?  *Of course* it will contain most of the bits that
 are current today in OpenSolaris.

That's a very good question actually.  I would think that COMSTAR would
stay because it's used by the Fishworks appliance... however, COMSTAR is
a competitive advantage for DIY storage solutions.  Maybe they will rip
it out of S11 and make it an add-on or something.   That would suck.

I guess the only real reason you can't yank COMSTAR is because it's now
the basis for iSCSI Target support.  But again, there is nothing saying
that Target support has to be part of the standard OS offering.

Scary to think about. :)

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-13 Thread Ben Rockwood
 On 8/13/10 9:02 PM, C. Bergström wrote:
 Erast wrote:


 On 08/13/2010 01:39 PM, Tim Cook wrote:
 http://www.theregister.co.uk/2010/08/13/opensolaris_is_dead/

 I'm a bit surprised at this development... Oracle really just doesn't
 get it.  The part that's most disturbing to me is the fact they
 won't be
 releasing nightly snapshots.  It appears they've stopped Illumos in its
 tracks before it really even got started (perhaps that explains the
 timing of this press release)

 Wrong. Be patient, with the pace of current Illumos development it
 soon will have all the closed binaries liberated and ready to sync up
 with promised ON code drops as dictated by GPL and CDDL licenses.
 Illumos is just a source tree at this point.  You're delusional,
 misinformed, or have some big wonderful secret if you believe you have
 all the bases covered for a pure open source distribution though..

 What's closed binaries liberated really mean to you?

 Does it mean
a. You copy over the binary libCrun and continue to use some
 version of Sun Studio to build onnv-gate
b. You debug the problems with and start to use ancient gcc-3 (at
 the probable expense of performance regressions which most people
 would find unacceptable)
c. Your definition is narrow and has missed some closed binaries


 I think it's great people are still hopeful, working hard and going to
 steward this forward, but I wonder.. What pace are you referring to? 
 The last commit to illumos-gate was 6 days ago and you're already not
 even keeping it in sync..  Can you even build it yet and if so where's
 the binaries?

Illumos is 2 weeks old.  Let's cut it a little slack. :)


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS compression

2010-07-25 Thread Ben
Hi all,

I'm running out of space on my OpenSolaris file server and can't afford to buy 
any new storage for a short while.  Seeing as the machine has a dual core CPU 
at 2.2GHz and 4GB of RAM, I was thinking compression might be the way to go...

I've read a small amount about compression, enough to find that it'll affect 
performance (not a problem for me) and that once you enable compression it only 
affects new files written to the file system.  
Is this still true of b134?  And if it is, how can I compress all of the 
current data on the file system?  Do I have to move it off then back on?
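
What I've gathered so far, as a hedged sketch (the dataset names below are made
up): compression only applies to blocks written after it is enabled, so existing
data has to be rewritten, for example with a local send/receive:

zfs set compression=on tank                                     # children inherit; new writes are compressed
zfs snapshot tank/media@precompress
zfs send tank/media@precompress | zfs receive tank/media.new    # rewriting the blocks through receive compresses them
zfs get compressratio tank/media.new                            # sanity check before renaming/destroying the old dataset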

Thanks for any advice,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS compression

2010-07-25 Thread Ben
Thanks Alex,

I've set compression on and have transferred data from the OpenSolaris machine 
to my Mac, deleted any snapshots and am now transferring them back.
It seems to be working, but there's lots to transfer!

I didn't know that MacZFS was still going, it's great to hear that people are 
still working on it.  I may have to pluck up the courage to put it on my Mac 
Pro if I do a rebuild anytime soon.

Thanks again,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS mirror to RAIDz?

2010-07-16 Thread Ben
Hi all,

I currently have four drives in my OpenSolaris box.  The drives are split into 
two mirrors, one mirror containing my rpool (disks 1 & 2) and one containing 
other data (disks 2 & 3).

I'm running out of space on my data mirror and am thinking of upgrading it to 
two 2TB disks. I then considered replacing disk 2 with a 2TB disk and making a 
RAIDz from the three new drives.

I know this would leave my rpool vulnerable to hard drive failure, but I've got 
no data on it that can't be replaced with a reinstall.

Can this be done easily?  Or will I have to transfer all of my data to another 
machine and build the RAIDz from scratch, then transfer the data back?
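
(For what it's worth, the round trip being asked about would look roughly like
this; a hedged sketch with made-up pool, host, dataset and device names:)

zfs snapshot -r data@migrate
zfs send -R data/media@migrate | ssh backuphost zfs receive scratch/media   # park a copy elsewhere
zpool destroy data
zpool create data raidz c6d0 c6d1 c7d0                                      # the three new 2TB drives
ssh backuphost zfs send -R scratch/media@migrate | zfs receive data/media   # pull it back, then fix mountpoints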

Thanks for any advice,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using a zvol from your rpool as zil for another zpool

2010-07-02 Thread Ben Taylor
 We have a server with a couple X-25E's and a bunch of
 larger SATA
 disks.
 
 To save space, we want to install Solaris 10 (our
 install is only about
 1.4GB) to the X-25E's and use the remaining space on
 the SSD's for ZIL
 attached to a zpool created from the SATA drives.
 
 Currently we do this by installing the OS using
 SVM+UFS (to mirror the
 OS between the two SSD's) and then using the
 remaining space on a slice
 as ZIL for the larger SATA-based zpool.
 
 However, SVM+UFS is more annoying to work with as far
 as LiveUpgrade is
 concerned.  We'd love to use a ZFS root, but that
 requires that the
 entire SSD be dedicated as an rpool leaving no space
 for ZIL.  Or does
 it?

For every system I have ever done zfs root on, it's always
been a slice on a disk.  As an example, we have an x4500
with 1TB disks.  For that root config, we are planning on
something like 150G on s0, and the rest on s3.  s0 for
the rpool, and s3 for the qpool.  We didn't want to have
to deal with issues around flashing a huge volume, as
we found out with our other x4500 with 500GB disks.

AFAIK, it's only non-rpool disks that use the whole disk,
and I doubt there's some sort of specific feature with
an SSD, but I could be wrong.

I like your idea of a reasonably sized root rpool and the
rest used for the ZIL.  But if you're going to do LU,
you should probably take a good look at how much space
you need for the clones and snapshots on the rpool.
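
(A minimal sketch of that split-slice layout, with hypothetical device names;
s0 carries the root pool created at install time, s3 is left free for the log:)

zpool create datapool raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0   # the big SATA pool
zpool add datapool log mirror c1t0d0s3 c1t1d0s3           # mirrored ZIL on the leftover SSD slices
zpool status datapool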

Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Ubuntu

2010-06-26 Thread Ben Miles
What supporting applications are there on Ubuntu for RAIDZ?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Ubuntu

2010-06-26 Thread Ben Miles
I tried to post this question on the Ubuntu forum.
Within 30 minutes my post was on the second page of new posts...

Yeah.  I'm really not down with using Ubuntu on my server here.  But I may be 
forced to.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on Ubuntu

2010-06-25 Thread Ben Miles
How much of a difference is there in supporting applications between Ubuntu 
and OpenSolaris?
I was not considering Ubuntu until OpenSolaris would not load onto my machine...

Any info would be great. I have not been able to find any sort of comparison of 
ZFS on Ubuntu and OS.

Thanks.

(My current OS install troubleshoot thread - 
http://opensolaris.org/jive/thread.jspa?messageID=488193#488193)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Pool is wrong size in b134

2010-06-17 Thread Ben Miller
I upgraded a server today that has been running SXCE b111 to the 
OpenSolaris preview b134.  It has three pools and two are fine, but one 
comes up with no space available in the pool (SCSI jbod of 300GB disks). 
The zpool version is at 14.


I tried exporting the pool and re-importing and I get several errors like 
this both exporting and importing:


# zpool export pool1
WARNING: metaslab_free_dva(): bad DVA 0:645838978048
WARNING: metaslab_free_dva(): bad DVA 0:645843271168
...

I tried removing the zpool.cache file, rebooting, importing and receive no 
warnings, but still reporting the wrong avail and size.


# zfs list pool1
NAME    USED  AVAIL  REFER  MOUNTPOINT
pool1   396G      0  3.22M  /export/home
# zpool list pool1
NAME    SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
pool1   476G   341G   135G   71%  1.00x  ONLINE  -
# zpool status pool1
  pool: pool1
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAME         STATE     READ WRITE CKSUM
pool1        ONLINE       0     0     0
  raidz2-0   ONLINE       0     0     0
    c1t8d0   ONLINE       0     0     0
    c1t9d0   ONLINE       0     0     0
    c1t10d0  ONLINE       0     0     0
    c1t11d0  ONLINE       0     0     0
    c1t12d0  ONLINE       0     0     0
    c1t13d0  ONLINE       0     0     0
    c1t14d0  ONLINE       0     0     0

errors: No known data errors

I try exporting and again get the metaslab_free_dva() warnings.  Imported 
again with no warnings, but same numbers as above.  If I try to remove 
files or truncate files, I receive "no free space" errors.


I reverted back to b111 and here is what the pool really looks like.

# zfs list pool1
NAME    USED  AVAIL  REFER  MOUNTPOINT
pool1   396G   970G  3.22M  /export/home
# zpool list pool1
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
pool1  1.91T   557G  1.36T   28%  ONLINE  -
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

NAME         STATE     READ WRITE CKSUM
pool1        ONLINE       0     0     0
  raidz2     ONLINE       0     0     0
    c1t8d0   ONLINE       0     0     0
    c1t9d0   ONLINE       0     0     0
    c1t10d0  ONLINE       0     0     0
    c1t11d0  ONLINE       0     0     0
    c1t12d0  ONLINE       0     0     0
    c1t13d0  ONLINE       0     0     0
    c1t14d0  ONLINE       0     0     0

errors: No known data errors

Also, the disks were replaced one at a time last year from 73GB to 300GB to 
increase the size of the pool.  Any idea why the pool is showing up as the 
wrong size in b134, and is there anything else to try?  I don't want to upgrade 
the pool version yet and then not be able to revert back...


thanks,
Ben

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool is wrong size in b134

2010-06-17 Thread Ben Miller

Cindy,
The other two pools are 2 disk mirrors (rpool and another).

Ben

Cindy Swearingen wrote:

Hi Ben,

Any other details about this pool, like how it might be different from 
the other two pools on this system, might be helpful...


I'm going to try to reproduce this problem.

We'll be in touch.

Thanks,

Cindy

On 06/17/10 07:02, Ben Miller wrote:
I upgraded a server today that has been running SXCE b111 to the 
OpenSolaris preview b134.  It has three pools and two are fine, but 
one comes up with no space available in the pool (SCSI jbod of 300GB 
disks). The zpool version is at 14.


I tried exporting the pool and re-importing and I get several errors 
like this both exporting and importing:


# zpool export pool1
WARNING: metaslab_free_dva(): bad DVA 0:645838978048
WARNING: metaslab_free_dva(): bad DVA 0:645843271168
...

I tried removing the zpool.cache file, rebooting, importing and 
receive no warnings, but still reporting the wrong avail and size.


# zfs list pool1
NAMEUSED  AVAIL  REFER  MOUNTPOINT
pool1   396G  0  3.22M  /export/home
# zpool list pool1
NAMESIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
pool1   476G   341G   135G71%  1.00x  ONLINE  -
# zpool status pool1
  pool: pool1
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool 
can

still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
pool1ONLINE   0 0 0
  raidz2-0   ONLINE   0 0 0
c1t8d0   ONLINE   0 0 0
c1t9d0   ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0
c1t12d0  ONLINE   0 0 0
c1t13d0  ONLINE   0 0 0
c1t14d0  ONLINE   0 0 0

errors: No known data errors

I try exporting and again get the metaslab_free_dva() warnings.  
Imported again with no warnings, but same numbers as above.  If I try 
to remove files or truncate files I receive no free space errors.


I reverted back to b111 and here is what the pool really looks like.

# zfs list pool1
NAMEUSED  AVAIL  REFER  MOUNTPOINT
pool1   396G   970G  3.22M  /export/home
# zpool list pool1
NAMESIZE   USED  AVAILCAP  HEALTH  ALTROOT
pool1  1.91T   557G  1.36T28%  ONLINE  -
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
pool1ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c1t8d0   ONLINE   0 0 0
c1t9d0   ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0
c1t12d0  ONLINE   0 0 0
c1t13d0  ONLINE   0 0 0
c1t14d0  ONLINE   0 0 0

errors: No known data errors

Also, the disks were replaced one at a time last year from 73GB to 
300GB to increase the size of the pool.  Any idea why the pool is 
showing up as the wrong size in b134 and have anything else to try?  I 
don't want to upgrade the pool version yet and then not be able to 
revert back...


thanks,
Ben

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Hard disk buffer at 100%

2010-05-09 Thread Ben Rockwood
The drive (c7t2d0) is bad and should be replaced.   The second drive
(c7t5d0) is either bad or going bad.  This is exactly the kind of
problem that can force a Thumper to its knees: ZFS performance is
horrific, and as soon as you drop the bad disks things magically return to
normal.

My first recommendation is to pull the SMART data from the disks if you
can.  I wrote a blog entry about SMART to address exactly the behavior
you're seeing back in 2008:
http://www.cuddletech.com/blog/pivot/entry.php?id=993

Yes, people will claim that SMART data is useless for predicting
failures, but in a case like yours you are just looking for data to
corroborate a hypothesis.

In order to test this condition, zpool offline... c7t2d0, which
emulates removal.  See if performance improves.  On Thumpers I'd build a
list of suspect disks based on 'iostat', like you show, and then
correlate the SMART data, and then systematically offline disks to see
if it really was the problem.

In my experience the only other reason you'll legitimately see really
weird bottoming out of IO like this is if you hit the max concurrent
IO limits in ZFS (until recently that limit was 35), so you'd see
actv=35, and then when the device finally processed the IOs the thing
would snap back to life.  But even in those cases you shouldn't see
request times (asvc_t) rise above 200ms.

All that to say, replace those disks or at least test it.  SSD's won't
help, one or more drives are toast.
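
(To make that concrete, a hedged sketch; the pool name here is made up, and
smartctl assumes smartmontools is installed since it is not part of the base OS:)

smartctl -a /dev/rdsk/c7t2d0s0   # check reallocated/pending sector counts and error logs
zpool offline tank c7t2d0        # take the suspect drive out of service
iostat -xn 5                     # see whether asvc_t/%b settle down without it
zpool online tank c7t2d0         # put it back if it wasn't the culprit, or zpool replace it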

benr.



On 5/8/10 9:30 PM, Emily Grettel wrote:
 Hi Giovani,
  
 Thanks for the reply.
  
 Here's a bit of iostat after uncompressing a 2.4Gb RAR file that has 1
 DWF file that we use.

 extended device statistics
 r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1.0   13.0   26.0   18.0  0.0  0.00.00.8   0   1 c7t1d0
 2.05.0   77.0   12.0  2.4  1.0  343.8  142.8 100 100 c7t2d0
 1.0   16.0   25.5   15.5  0.0  0.00.00.3   0   0 c7t3d0
 0.0   10.00.0   17.0  0.0  0.03.21.2   1   1 c7t4d0
 1.0   12.0   25.5   15.5  0.4  0.1   32.4   10.9  14  14 c7t5d0
 1.0   15.0   25.5   18.0  0.0  0.00.10.1   0   0 c0t1d0
 extended device statistics
 r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.00.00.00.0  2.0  1.00.00.0 100 100 c7t2d0
 1.00.00.50.0  0.0  0.00.00.1   0   0 c7t0d0
 extended device statistics
 r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 5.0   15.0  128.0   18.0  0.0  0.00.01.8   0   3 c7t1d0
 1.09.0   25.5   18.0  2.0  1.8  199.7  179.4 100 100 c7t2d0
 3.0   13.0  102.5   14.5  0.0  0.10.05.2   0   5 c7t3d0
 3.0   11.0  102.0   16.5  0.0  0.12.34.2   1   6 c7t4d0
 1.04.0   25.52.0  0.4  0.8   71.3  158.9  12  79 c7t5d0
 5.0   16.0  128.5   19.0  0.0  0.10.12.6   0   5 c0t1d0
 extended device statistics
 r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.04.00.02.0  2.0  2.0  496.1  498.0  99 100 c7t2d0
 0.00.00.00.0  0.0  1.00.00.0   0 100 c7t5d0
 extended device statistics
 r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 7.00.0  204.50.0  0.0  0.00.00.2   0   0 c7t1d0
 1.00.0   25.50.0  3.0  1.0 2961.6 1000.0  99 100 c7t2d0
 8.00.0  282.00.0  0.0  0.00.00.3   0   0 c7t3d0
 6.00.0  282.50.0  0.0  0.06.12.3   1   1 c7t4d0
 0.03.00.05.0  0.5  1.0  165.4  333.3  18 100 c7t5d0
 7.00.0  204.50.0  0.0  0.00.01.6   0   1 c0t1d0
 2.02.0   89.0   12.0  0.0  0.03.16.1   1   2 c3t0d0
 0.02.00.0   12.0  0.0  0.00.00.2   0   0 c3t1d0

 Sometimes two or more disks are going at 100%. How does one solve this
 issue if it's a firmware bug? I tried looking around for Western
 Digital firmware for the WD10EADS but couldn't find any available.
  
 Would adding an SSD or two help here?
  
 Thanks,
 Em
  
 
 Date: Fri, 7 May 2010 14:38:25 -0300
 Subject: Re: [zfs-discuss] ZFS Hard disk buffer at 100%
 From: gtirl...@sysdroid.com
 To: emilygrettelis...@hotmail.com
 CC: zfs-discuss@opensolaris.org


 On Fri, May 7, 2010 at 8:07 AM, Emily Grettel
 emilygrettelis...@hotmail.com
 wrote:

 Hi,
  
 I've had my RAIDz volume working well on SNV_131 but it has come
 to my attention that there has been some read issues with the
 drives. Previously I thought this was a CIFS problem but I'm
 noticing that when transfering files or uncompressing some fairly
 large 7z (1-2Gb) files (or even smaller rar - 200-300Mb) files
 occasionally running iostat will give the %b as 100 for a drive or
 two.



 That's 

Re: [zfs-discuss] Mirrored Servers

2010-05-08 Thread Ben Rockwood
On 5/8/10 3:07 PM, Tony wrote:
 Let's say I have two servers, both running OpenSolaris with ZFS. I basically 
 want to be able to create a filesystem where the two servers have a common 
 volume, that is mirrored between the two. Meaning, each server keeps an 
 identical, real time backup of the other's data directory. Set them both up 
 as file servers, and load balance between the two for incoming requests.

 How would anyone suggest doing this?
   


I would carefully consider whether or not they _really_ need to be real
time.  Can you tolerate 5 minutes or even just 60 seconds of difference
between them? 

If you can, then things are much easier and less complex.  I'd
personally use ZFS Snapshots to keep the two servers in sync every 60
seconds.
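
Roughly, each sync run would look like this (a sketch with made-up dataset,
host and bookkeeping-file names, and assuming the initial full copy has
already been sent to the peer):

PREV=$(cat /var/run/share.lastsync)        # snapshot name recorded by the previous run
NOW=sync-$(date +%Y%m%d%H%M%S)
zfs snapshot tank/share@$NOW
zfs send -i tank/share@$PREV tank/share@$NOW | ssh serverb zfs receive -F tank/share
echo $NOW > /var/run/share.lastsync        # old snapshots can be pruned on both sides later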

As for load balancing, that depends on which protocol you're using.  FTP
is easy.  NFS/CIFS is a little harder.  I'd simply use a load balancer
(Zeus, NetScaler, Balance, HA-Proxy, etc.), but that is a little scary
and bizarre in the case of NFS/CIFS, where you should instead use a
single-server failover solution, such as Sun Cluster.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Plugging in a hard drive after Solaris has booted up?

2010-05-07 Thread Ben Rockwood
On 5/7/10 9:38 PM, Giovanni wrote:
 Hi guys,

 I have a quick question, I am playing around with ZFS and here's what I did.

 I created a storage pool with several drives. I unplugged 3 out of 5 drives 
 from the array, currently:

  NAME        STATE    READ WRITE CKSUM
  gpool       UNAVAIL     0     0     0  insufficient replicas
    raidz1    UNAVAIL     0     0     0  insufficient replicas
      c8t2d0  UNAVAIL     0     0     0  cannot open
      c8t4d0  UNAVAIL     0     0     0  cannot open
      c8t0d0  UNAVAIL     0     0     0  cannot open

 These drives had power all the time, the SATA cable however was disconnected. 
 Now, after I logged into Solaris and opened firefox, I plugged them back in 
 to sit and watch if the storage pool suddenly becomes available

 This did not happen, so my question is, do I need to make Solaris re-detect 
 the hard drives and if so how? I tried format -e but it did not seem to 
 detect the 3 drives I just plugged back in. Is this a BIOS issue? 

 Does hot-swap hard drives only work when you replace current hard drives 
 (previously detected by BIOS) with others but not when you have ZFS/Solaris 
 running and want to add more storage without shutting down?

 It all boils down to, say the scenario is that I will need to purchase more 
 hard drives as my array grows, I would like to be able to (without shutting 
 down) add the drives to the storage pool (zpool)
   

There are lots of different things you can look at and do, but it comes
down to just one command:  devfsadm -vC.  This will clean up (-C for
cleanup, -v for verbose) the device tree if it gets into a funky state.

Then run format or iostat -En to verify that the device(s) are
there.  Then re-import the zpool or add the device or whatever you wish
to do.  Even if device locations change, ZFS will do the right thing on
import.

If you wish to dig deeper... normally when you attach a new device
hot-plug will do the right thing and you'll see the connection messages
in dmesg.  If you want to explicitly check the state of dynamic
reconfiguration, check out the cfgadm command.  Normally, however, on
modern versions of Solaris there is no reason to resort to that; it's just
something fun if you wish to dig.
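
In short, the sequence is something like this (pool name taken from your output
above):

devfsadm -vC           # rebuild /dev links and clean out stale entries
iostat -En | grep c8t  # confirm the reattached drives are visible again
zpool import           # list pools that are available to import
zpool import gpool     # bring the raidz pool back online
cfgadm -al             # optional: inspect the SATA attachment points directly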

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benchmarking Methodologies

2010-04-21 Thread Ben Rockwood
On 4/21/10 2:15 AM, Robert Milkowski wrote:
 I haven't heard from you in a while! Good to see you here again :)

 Sorry for stating obvious but at the end of a day it depends on what
 your goals are.
 Are you interested in micro-benchmarks and comparison to other file
 systems?

 I think the most relevant filesystem benchmarks for users are when you
 benchmark a specific application and present results from an
 application point of view. For example, given a workload for Oracle,
 MySQL, LDAP, ... how quickly it completes? How much benefit there is
 by using SSDs? What about other filesystems?

 Micro-benchmarks are fine but very hard to be properly interpreted by
 most users.

 Additionally most benchmarks are almost useless if they are not
 compared to some other configuration with only a benchmarked component
 changed. For example, knowing that some MySQL load completes in 1h on
 ZFS is basically useless. But knowing that on the same HW with
 Linux/ext3 and under the same load it completes in 2h would be
 interesting to users.

 Other interesting thing would be to see an impact of different ZFS
 setting on a benchmark results (aligned recordsize for database vs.
 default, atime off vs. on, lzjb, gzip, ssd). Also comparison of
 benchmark results with all default zfs setting compared to whatever
 setting you did which gave you the best result.

Hey Robert... I'm always around. :)

You've made an excellent case for benchmarking and where it's useful,
but what I'm asking for on this thread is for folks to share the
research they've done with as much specificity as possible for research
purposes. :)

Let me illustrate:

To Darren's point on FileBench and vdbench... to date I've found these
two to be the most useful.   IOzone, while very popular, has always
given me strange results which are inconsistent regardless of how large
the block and data is.  Given that the most important aspect of any
benchmark is repeatability and sanity in results, I've found no value in
IOzone any longer.

vdbench has become my friend particularly in the area of physical disk
profiling.  Before tuning ZFS (or any filesystem) it's important to find
a solid baseline of performance on the underlying disk structure.  So
using a variety of vdbench profiles such as the following help you
pinpoint exactly the edges of the performance envelope:

sd=sd1,lun=/dev/rdsk/c0t1d0s0,threads=1
wd=wd1,sd=sd1,readpct=100,rhpct=0,seekpct=0
rd=run1,wd=wd1,iorate=max,elapsed=10,interval=1,forxfersize=(4k-4096k,d)

With vdbench and the workload above I can get consistent, reliable
results time after time and the results on other systems match.
This is particularly key if you're running a hardware RAID controller
under ZFS.  There isn't anything dd can do that vdbench can't do
better.  Using a workload like above both at differing xfer sizes and
also at differing thread counts really helps give an accurate picture of
the disk capabilities.
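
A variant of the profile above that also sweeps thread counts is useful for
finding the saturation point (this assumes forthreads behaves like the other
vdbench "for" parameters; adjust the lun to your device):

sd=sd1,lun=/dev/rdsk/c0t1d0s0
wd=wd1,sd=sd1,readpct=100,rhpct=0,seekpct=0
rd=run1,wd=wd1,iorate=max,elapsed=10,interval=1,forxfersize=(4k-4096k,d),forthreads=(1-32,d)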

Moving up into the filesystem.  I've been looking intently at improving
my FileBench profiles, based on the supplied ones with tweaking.  I'm
trying to get to a methodology that provides me with time-after-time
repeatable results for real comparison between systems. 

I'm looking hard at vdbench file workloads, but they aren't yet nearly
as sophisticated as FileBench.  I am also looking at FIO
(http://freshmeat.net/projects/fio/), which is FileBench-esque.


At the end of the day, I agree entirely that application benchmarks are
far more effective judges... but they are also more time consuming and
less flexible than dedicated tools.   The key is honing generic
benchmarks to provide useful data which can be relied upon for making
accurate estimates with regard to application performance.  When you
start judging filesystem performance based on something like MySQL there
are simply too many variables involved.


So, I appreciate the Benchmark 101, but I'm looking for anyone
interested in sharing meat.  Most of the existing ZFS benchmarks folks
published are several years old now, and most were using IOzone.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Benchmarking Methodologies

2010-04-20 Thread Ben Rockwood
I'm doing a little research study on ZFS benchmarking and performance
profiling.  Like most, I've had my favorite methods, but I'm
re-evaluating my choices and trying to be a bit more scientific than I
have in the past.


To that end, I'm curious if folks wouldn't mind sharing their work on
the subject?  What tool(s) to you prefer in what situations?  Do you
have a standard method of running them (tool args; block sizes, thread
counts, ...) or procedures between runs (zpool import/export, new
dataset creation,...)?  etc.


Any feedback is appreciated.  I want to get a good sampling of opinions.

Thanks!



benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] abusing zfs boot disk for fun and DR

2010-01-11 Thread Ben Taylor
 Ben,
 I have found that booting from cdrom and importing
 the pool on the new host, then boot the hard disk
 will prevent these issues.
 That will reconfigure the zfs to use the new disk
 device.
 When running, zpool detach the missing mirror device
 and attach a new one.

Thanks.  I'm well versed in dealing with ZFS issues. The reason
I reported this boot/rpool issue was that it was similar in
nature to issues that occurred trying to remediate an
x4500 which had suffered many SATA disks going offline (due to
the buggy Marvell driver) as well as corruption that occurred
while trying to fix said issue.  Backline spent a fair amount 
of time just trying to remediate the issue with hot spares
that looked exactly like the faulted config in my rpool.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] abusing zfs boot disk for fun and DR

2010-01-07 Thread Ben Taylor
I'm in the process of standing up a couple of t5440's, of which the
config will eventually end up in another data center 6k miles from
the original config, and I'm supposed to send disks to the data
center and we'll start from there (yes, I know how to flar and jumpstart. 
When the boss says do something, sometimes you *just* have to do it)

As I've already run into the boot failsafe when moving a root disk from
one SPARC host to another, I recently found out that a sys-unconfig'd disk 
does not suffer from the same problem.

While I am probably going to be told I shouldn't be doing this,
I ran into an interesting semantics issue that I think ZFS should
at least be able to avoid (and which I have seen in other non-abusive
configurations). ;-)

2 zfs disk, root mirrored. c2t0 and c2t1.

hot unplug c2t0, (and I should probably have removed the
busted mirror from c2t1, but I didn't)

sys-unconfig disk in c2t1

move disk to new t5440

boot disk, and it enumerates everything correctly and then I notice
zpool thinks it's degraded.  I had added the mirror after I realized
I wanted to run this by the list

  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid.  Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed after 0h7m with 0 errors on Thu Jan  7 12:10:03 2010
config:

NAME          STATE     READ WRITE CKSUM
rpool         DEGRADED     0     0     0
  mirror      DEGRADED     0     0     0
    c2t0d0s0  ONLINE       0     0     0
    c2t0d0s0  FAULTED      0     0     0  corrupted data
    c2t3d0s0  ONLINE       0     0     0


Anyway, should zfs report a faulted drive of the same ctd# which is
already active?  I understand why this happened, but from a logistics
perspective, shouldn't zfs be smart enough to ignore a faulted disk
like this?  And this is not the first time I've had this scenario happen
(I had an x4500 that had suffered through months of Marvell driver
bugs and corruption, and we probably had 2 or 3 of these types of
things happen while trying to soft-fix the problems).  This also
happened with hot spares, which caused support to spend some time
with back-line to figure out a procedure to clear those faulted disks
which had the same ctd# as a working hot spare...
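
(A hedged sketch of one way to clear such a stale entry is to address the vdev
by its GUID instead of its ctd name; the GUID below is a placeholder, and the
GUID-based detach is something I've only seen work on some builds:)

zdb -l /dev/rdsk/c2t3d0s0            # the label on the good disk lists the guid of every child vdev
zpool detach rpool 1234567890123456  # substitute the guid of the FAULTED c2t0d0s0 entry
zpool status rpool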

Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Liveupgrade'd to U8 and now can't boot previous U6 BE :(

2009-10-28 Thread Ben Middleton
Hi,

As a related issue to this (specifically CR 6884728) - any ideas how I should 
go about removing the old BE? When I attempt to run ludelete I get the 
following:

$ lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
10_05-09                   yes      no     no        yes    -
10_10-09                   yes      yes    yes       no     -


$ ludelete 10_05-09

System has findroot enabled GRUB
Checking if last BE on any disk...
ERROR: cannot mount '/.alt.10_05-09/var': directory is not empty
ERROR: cannot mount mount point /.alt.10_05-09/var device 
rpool/ROOT/s10x_u7wos_08/var
ERROR: failed to mount file system rpool/ROOT/s10x_u7wos_08/var on 
/.alt.10_05-09/var
ERROR: unmounting partially mounted boot environment file systems
ERROR: No such file or directory: error unmounting rpool/ROOT/s10x_u7wos_08
ERROR: cannot mount boot environment by name 10_05-09
ERROR: Failed to mount BE 10_05-09.
ERROR: Failed to mount BE 10_05-09.
cat: cannot open /tmp/.lulib.luclb.dsk.2797.10_05-09
ERROR: This boot environment 10_05-09 is the last BE on the above disk.
ERROR: Deleting this BE may make it impossible to boot from this disk.
ERROR: However you may still boot solaris if you have BE(s) on other disks.
ERROR: You *may* have to change boot-device order in the BIOS to accomplish 
this.
ERROR: If you still want to delete this BE 10_05-09, please use the force 
option (-f).
Unable to delete boot environment.


My  zfs setup now shows this:

NAME   USED  AVAIL  REFER  MOUNTPOINT
rpool 11.4G  4.26G  39.5K  /rpool
rpool/ROOT9.15G  4.26G18K  legacy
rpool/ROOT/10_10-09   9.14G  4.26G  4.04G  /
rpool/ROOT/10_10...@10_10-09  2.39G  -  4.10G  -
rpool/ROOT/10_10-09/var   2.71G  4.26G  1.18G  /var
rpool/ROOT/10_10-09/v...@10_10-09  1.53G  -  2.11G  -
rpool/ROOT/s10x_u7wos_08  17.4M  4.26G  4.10G  /.alt.10_05-09
rpool/ROOT/s10x_u7wos_08/var  9.05M  4.26G  2.11G  /.alt.10_05-09/var
rpool/dump1.00G  4.26G  1.00G  -
rpool/export  74.6M  4.26G19K  /export
rpool/export/home 74.5M  4.26G21K  /export/home
rpool/export/home/admin   65.5K  4.26G  65.5K  /export/home/admin
rpool/swap   1G  4.71G   560M  -


It seems that the ludelete script reassigns the mountpoint for the BE to be 
deleted  - but falls foul of the /var mount underneath the old BE. I tried 
lumounting the old BE and checked the /etc/vfstab - but there are no extra zfs 
entries in there.

I'm just looking for a clean way to remove the old BE, and then remove the old 
snapshot, without preventing Live Upgrade from working in the future.
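
The direction I'm leaning is roughly this (a hedged sketch only; the dataset
names are from the zfs list above, and nothing should be destroyed until it's
clear nothing else depends on it):

lufslist 10_05-09                        # see which datasets the old BE still claims
ludelete -f 10_05-09                     # the force option ludelete itself suggests
zfs destroy -r rpool/ROOT/s10x_u7wos_08  # then drop the leftover BE datasets and, once
                                         # nothing depends on them, the stale snapshots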

Many thanks,

Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Liveupgrade'd to U8 and now can't boot previous U6 BE :(

2009-10-28 Thread Ben Middleton
 + dev=`echo $dev | sed 's/mirror.*/mirror/'`

Thanks for the suggestion Kurt. However, I'm not running a mirror on that pool 
- so am guessing this won't help in my case.

I'll try and pick my way through the lulib script if I get any time.

Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-25 Thread Ben
Thanks very much everyone.

Victor, I did think about using VirtualBox, but I have a real machine and a 
supply of hard drives for a short time, so I'll test it out using that if I 
can.

Scott, of course, at work we use three mirrors and it works very well.  It has 
saved us on occasion where we have detached the third mirror, upgraded, found 
the upgrade failed, and been able to revert from the third mirror instead 
of having to go through backups.

George, it will be great to see the 'autoexpand' in the next release.  I'm 
keeping my home server on stable releases for the time being :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Ben
Hi all,

I have a ZFS mirror of two 500GB disks that I'd like to upgrade to 1TB disks; how 
can I do this?  I must break the mirror as I don't have enough controller ports on my 
system board.  My current mirror looks like this:

r...@beleg-ia:/share/media# zpool status share
pool: share
state: ONLINE
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
share       ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c5d0s0  ONLINE       0     0     0
    c5d1s0  ONLINE       0     0     0

errors: No known data errors

If I detach c5d1s0, add a 1TB drive, attach that, wait for it to resilver, then 
detach c5d0s0 and add another 1TB drive and attach that to the zpool, will that 
up the storage of the pool?
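
In other words, something like this (a sketch; the 1TB device names c6d0 and
c6d1 are made up):

zpool attach share c5d0s0 c6d0      # three-way mirror with the first 1TB disk
zpool status share                  # wait for the resilver to finish
zpool detach share c5d1s0           # drop one 500GB disk
zpool attach share c6d0 c6d1        # attach the second 1TB disk, wait for resilver again
zpool detach share c5d0s0           # drop the last 500GB disk
zpool export share && zpool import share   # pre-autoexpand releases generally need a re-import
                                           # (or reboot) before the extra space shows up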

Thanks very much,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Ben
Thomas, 

Could you post an example of what you mean (i.e. commands in the order to use 
them)?  I've not played with ZFS that much and I don't want to muck my system 
up (I have data backed up, but am more concerned about getting myself in a mess 
and having to reinstall, thus losing my configurations).

Many thanks for both of your replies,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Ben
Many thanks Thomas, 

I have a test machine so I shall try it on that before I try it on my main 
system.

Thanks very much once again,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [Fwd: Re: [perf-discuss] ZFS performance issue - READ is slow as hell...]

2009-03-31 Thread Ben Rockwood
Ya, I agree that we need some additional data and testing.  The iostat
data in itself doesn't suggest to me that the process (dd) is slow but
rather that most of the data is being retrieved elsewhere (ARC).   An
fsstat would be useful to correlate with the iostat data.
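
Something as simple as running the two side by side would show it (5-second
samples):

fsstat zfs 5    # logical read/write activity at the filesystem layer (ARC hits included)
iostat -xn 5    # physical I/O actually reaching the disks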

One thing that also comes to mind with streaming write performance is
the effects of the write throttle... curious if he'd have gotten more on
the write side with that disabled.

None of these things strike me particularly as bugs (although there is
always room for improvement), but rather as signs that ZFS is designed for real-world
environments, not antiquated benchmarks.

benr.


Jim Mauro wrote:

 Posting this back to zfs-discuss.

 Roland's test case (below) is a single threaded sequential write
 followed by a single threaded sequential read. His bandwidth
 goes from horrible (~2MB/sec) to expected (~30MB/sec)
 when prefetch is disabled. This is with relatively recent nv bits
 (nv110).

 Roland - I'm wondering if you were tripping over
 CR6732803 ZFS prefetch creates performance issues for streaming
 workloads.
 It seems possible, but that CR is specific about multiple, concurrent
 IO streams,
 and your test case was only one.

 I think it's more likely you were tripping over
 CR6412053 zfetch needs a whole lotta love.

 For both CR's the workaround is disabling prefetch
 (echo zfs_prefetch_disable/W 1 | mdb -kw)

 Any other theories on this test case?

 Thanks,
 /jim


  Original Message 
 Subject: Re: [perf-discuss] ZFS performance issue - READ is slow
 as hell...
 Date: Tue, 31 Mar 2009 02:33:00 -0700 (PDT)
 From: roland devz...@web.de
 To: perf-disc...@opensolaris.org



 Hello Jim,
 I double-checked again - but it's like I told you:

 echo zfs_prefetch_disable/W0t1 | mdb -kw 
 fixes my problem.

 I did a reboot and only set this single param - which immediately
 makes the read throughput go up from ~2 MB/s to ~30 MB/s

 I don't understand why disabling ZFS prefetch solved this
 problem. The test case was a single threaded sequential write, followed
 by a single threaded sequential read.

 I did not even do a single write - after reboot I just did
 dd if=/zfs/TESTFILE of=/dev/null

 Solaris Express Community Edition snv_110 X86
 FSC RX300 S2
 4GB RAM
 LSI Logic MegaRaid 320 Onboard SCSI Raid Controller
 1x Raid1 LUN
 1x Raid5 LUN (3 Disks)
 (both LUN`s show same behaviour)


 before:
 extended device statistics
  r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   21.30.1 2717.60.1  0.7  0.0   31.81.7   2   4 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
   16.00.0 2048.40.0 34.9  0.1 2181.84.8 100   3 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
   28.00.0 3579.20.0 34.8  0.1 1246.24.9 100   5 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
   45.00.0 5760.40.0 34.8  0.2  772.74.5 100   7 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
   19.00.0 2431.90.0 34.9  0.1 1837.34.4 100   3 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
   58.00.0 7421.10.0 34.6  0.3  597.45.8 100  12 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0
0.00.00.00.0 35.0  0.00.00.0 100   0 c0t1d0


 after:
 extended device statistics
  r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  218.00.0 27842.30.0  0.0  0.40.11.8   1  40 c0t1d0
  241.00.0 30848.00.0  0.0  0.40.01.6   0  38 c0t1d0
  237.00.0 30340.10.0  0.0  0.40.01.6   0  38 c0t1d0
  230.00.0 29434.70.0  0.0  0.40.01.8   0  40 c0t1d0
  238.10.0 30471.30.0  0.0  0.40.01.5   0  37 c0t1d0
  234.90.0 30001.90.0  0.0  0.40.01.6   1  37 c0t1d0
  220.10.0 28171.40.0  0.0  0.4 

Re: [zfs-discuss] zpool status -x strangeness

2009-01-28 Thread Ben Miller
# zpool status -xv
all pools are healthy

Ben

 What does 'zpool status -xv' show?
 
 On Tue, Jan 27, 2009 at 8:01 AM, Ben Miller
 mil...@eecis.udel.edu wrote:
  I forgot the pool that's having problems was
 recreated recently so it's already at zfs version 3.
 I just did a 'zfs upgrade -a' for another pool, but
 some of those filesystems failed since they are busy
  and couldn't be unmounted.
 
  # zfs upgrade -a
  cannot unmount '/var/mysql': Device busy
  cannot unmount '/var/postfix': Device busy
  
  6 filesystems upgraded
  821 filesystems already at this version
 
  Ben
 
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -x strangeness

2009-01-27 Thread Ben Miller
I forgot the pool that's having problems was recreated recently so it's already 
at zfs version 3.  I just did a 'zfs upgrade -a' for another pool, but some of 
those filesystems failed since they are busy and couldn't be unmounted.

# zfs upgrade -a
cannot unmount '/var/mysql': Device busy
cannot unmount '/var/postfix': Device busy

6 filesystems upgraded
821 filesystems already at this version
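
The plan for the two stragglers is roughly this (a hedged sketch; the service
and dataset names below are guesses based on the mountpoints, not the real ones):

svcadm disable mysql        # stop whatever holds /var/mysql busy (hypothetical service name)
fuser -c /var/mysql         # confirm nothing still has files open there
zfs upgrade tank/var/mysql  # now the filesystem can be remounted and upgraded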

Ben

 You can upgrade live.  'zfs upgrade' with no
 arguments shows you the  
 zfs version status of filesystems present without
 upgrading.
 
 
 
 On Jan 24, 2009, at 10:19 AM, Ben Miller
 mil...@eecis.udel.edu wrote:
 
  We haven't done 'zfs upgrade ...' any.  I'll give
 that a try the  
  next time the system can be taken down.
 
  Ben
 
  A little gotcha that I found in my 10u6 update
  process was that 'zpool
  upgrade [poolname]' is not the same as 'zfs
 upgrade
  [poolname]/[filesystem(s)]'
 
  What does 'zfs upgrade' say?  I'm not saying this
 is
  the source of
  your problem, but it's a detail that seemed to
 affect
  stability for
  me.
 
 
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -x strangeness

2009-01-24 Thread Ben Miller
We haven't done 'zfs upgrade ...' on any of them.  I'll give that a try the next time the 
system can be taken down.

Ben

 A little gotcha that I found in my 10u6 update
 process was that 'zpool
 upgrade [poolname]' is not the same as 'zfs upgrade
 [poolname]/[filesystem(s)]'
 
 What does 'zfs upgrade' say?  I'm not saying this is
 the source of
 your problem, but it's a detail that seemed to affect
 stability for
 me.
 
 
 On Thu, Jan 22, 2009 at 7:25 AM, Ben Miller
  The pools are upgraded to version 10.  Also, this
 is on Solaris 10u6.
 
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -x strangeness

2009-01-22 Thread Ben Miller
The pools are upgraded to version 10.  Also, this is on Solaris 10u6.

# zpool upgrade
This system is currently running ZFS pool version 10.

All pools are formatted using this version.

Ben

 What's the output of 'zfs upgrade' and 'zpool
 upgrade'? (I'm just
 curious - I had a similar situation which seems to be
 resolved now
 that I've gone to Solaris 10u6 or OpenSolaris
 2008.11).
 
 
 
 On Wed, Jan 21, 2009 at 2:11 PM, Ben Miller
 mil...@eecis.udel.edu wrote:
  Bug ID is 6793967.
 
  This problem just happened again.
  % zpool status pool1
   pool: pool1
   state: DEGRADED
   scrub: resilver completed after 0h48m with 0
 errors on Mon Jan  5 12:30:52 2009
  config:
 
 NAME   STATE READ WRITE CKSUM
 pool1  DEGRADED 0 0 0
   raidz2   DEGRADED 0 0 0
 c4t8d0s0   ONLINE   0 0 0
 c4t9d0s0   ONLINE   0 0 0
 c4t10d0s0  ONLINE   0 0 0
 c4t11d0s0  ONLINE   0 0 0
 c4t12d0s0  REMOVED  0 0 0
 c4t13d0s0  ONLINE   0 0 0
 
  errors: No known data errors
 
  % zpool status -x
  all pools are healthy
  %
  # zpool online pool1 c4t12d0s0
  % zpool status -x
   pool: pool1
   state: ONLINE
  status: One or more devices is currently being
 resilvered.  The pool will
 continue to function, possibly in a degraded
 state.
  action: Wait for the resilver to complete.
   scrub: resilver in progress for 0h0m, 0.12% done,
 2h38m to go
  config:
 
 NAME   STATE READ WRITE CKSUM
 pool1  ONLINE   0 0 0
   raidz2   ONLINE   0 0 0
 c4t8d0s0   ONLINE   0 0 0
 c4t9d0s0   ONLINE   0 0 0
 c4t10d0s0  ONLINE   0 0 0
 c4t11d0s0  ONLINE   0 0 0
 c4t12d0s0  ONLINE   0 0 0
 c4t13d0s0  ONLINE   0 0 0
 
  errors: No known data errors
  %
 
  Ben
 
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -x strangeness

2009-01-21 Thread Ben Miller
Bug ID is 6793967.

This problem just happened again.
% zpool status pool1
  pool: pool1
 state: DEGRADED
 scrub: resilver completed after 0h48m with 0 errors on Mon Jan  5 12:30:52 2009
config:

NAME   STATE READ WRITE CKSUM
pool1  DEGRADED 0 0 0
  raidz2   DEGRADED 0 0 0
c4t8d0s0   ONLINE   0 0 0
c4t9d0s0   ONLINE   0 0 0
c4t10d0s0  ONLINE   0 0 0
c4t11d0s0  ONLINE   0 0 0
c4t12d0s0  REMOVED  0 0 0
c4t13d0s0  ONLINE   0 0 0

errors: No known data errors

% zpool status -x
all pools are healthy
%
# zpool online pool1 c4t12d0s0
% zpool status -x
  pool: pool1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.12% done, 2h38m to go
config:

NAME   STATE READ WRITE CKSUM
pool1  ONLINE   0 0 0
  raidz2   ONLINE   0 0 0
c4t8d0s0   ONLINE   0 0 0
c4t9d0s0   ONLINE   0 0 0
c4t10d0s0  ONLINE   0 0 0
c4t11d0s0  ONLINE   0 0 0
c4t12d0s0  ONLINE   0 0 0
c4t13d0s0  ONLINE   0 0 0

errors: No known data errors
%

Ben

 I just put in a (low priority) bug report on this.
 
 Ben
 
  This post from close to a year ago never received
 a
  response.  We just had this same thing happen to
  another server that is running Solaris 10 U6.  One
 of
  the disks was marked as removed and the pool
  degraded, but 'zpool status -x' says all pools are
  healthy.  After doing an 'zpool online' on the
 disk
  it resilvered in fine.  Any ideas why 'zpool
 status
  -x' reports all healthy while 'zpool status' shows
 a
  pool in degraded mode?
  
  thanks,
  Ben
  
   We run a cron job that does a 'zpool status -x'
 to
   check for any degraded pools.  We just happened
 to
   find a pool degraded this morning by running
  'zpool
   status' by hand and were surprised that it was
   degraded as we didn't get a notice from the cron
   job.
   
   # uname -srvp
   SunOS 5.11 snv_78 i386
   
   # zpool status -x
   all pools are healthy
   
   # zpool status pool1
 pool: pool1
   state: DEGRADED
    scrub: none requested
   config:
   
   NAME STATE READ WRITE CKSUM
   pool1DEGRADED 0 0 0
 raidz1 DEGRADED 0 0 0
 c1t8d0   REMOVED  0 0 0
 c1t9d0   ONLINE   0 0 0
 c1t10d0  ONLINE   0 0 0
 c1t11d0  ONLINE   0 0 0
   No known data errors
   
   I'm going to look into it now why the disk is
  listed
   as removed.
   
   Does this look like a bug with 'zpool status
 -x'?
   
   Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -x strangeness

2009-01-12 Thread Ben Miller
I just put in a (low priority) bug report on this.

Ben

 This post from close to a year ago never received a
 response.  We just had this same thing happen to
 another server that is running Solaris 10 U6.  One of
 the disks was marked as removed and the pool
 degraded, but 'zpool status -x' says all pools are
 healthy.  After doing an 'zpool online' on the disk
 it resilvered in fine.  Any ideas why 'zpool status
 -x' reports all healthy while 'zpool status' shows a
 pool in degraded mode?
 
 thanks,
 Ben
 
  We run a cron job that does a 'zpool status -x' to
  check for any degraded pools.  We just happened to
  find a pool degraded this morning by running
 'zpool
  status' by hand and were surprised that it was
  degraded as we didn't get a notice from the cron
  job.
  
  # uname -srvp
  SunOS 5.11 snv_78 i386
  
  # zpool status -x
  all pools are healthy
  
  # zpool status pool1
pool: pool1
  state: DEGRADED
   scrub: none requested
  config:
  
  NAME STATE READ WRITE CKSUM
  pool1DEGRADED 0 0 0
raidz1 DEGRADED 0 0 0
c1t8d0   REMOVED  0 0 0
c1t9d0   ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0
  No known data errors
  
  I'm going to look into it now why the disk is
 listed
  as removed.
  
  Does this look like a bug with 'zpool status -x'?
  
  Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -x strangeness

2009-01-07 Thread Ben Miller
This post from close to a year ago never received a response.  We just had this 
same thing happen to another server that is running Solaris 10 U6.  One of the 
disks was marked as removed and the pool degraded, but 'zpool status -x' says 
all pools are healthy.  After doing an 'zpool online' on the disk it resilvered 
in fine.  Any ideas why 'zpool status -x' reports all healthy while 'zpool 
status' shows a pool in degraded mode?

thanks,
Ben

 We run a cron job that does a 'zpool status -x' to
 check for any degraded pools.  We just happened to
 find a pool degraded this morning by running 'zpool
 status' by hand and were surprised that it was
 degraded as we didn't get a notice from the cron
 job.
 
 # uname -srvp
 SunOS 5.11 snv_78 i386
 
 # zpool status -x
 all pools are healthy
 
 # zpool status pool1
   pool: pool1
 state: DEGRADED
  scrub: none requested
 config:
 
 NAME STATE READ WRITE CKSUM
 pool1DEGRADED 0 0 0
   raidz1 DEGRADED 0 0 0
   c1t8d0   REMOVED  0 0 0
   c1t9d0   ONLINE   0 0 0
   c1t10d0  ONLINE   0 0 0
   c1t11d0  ONLINE   0 0 0
 No known data errors
 
 I'm going to look into it now why the disk is listed
 as removed.
 
 Does this look like a bug with 'zpool status -x'?
 
 Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zdb to dump data

2008-10-30 Thread Ben Rockwood
Is there some hidden way to coax zdb into not just displaying data based on a 
given DVA but rather to dump it in raw usable form?

I've got a pool with large amounts of corruption.  Several directories are 
toast and I get an I/O error when trying to enter or read them; however, I can 
read the directory and files using zdb.  If I could just dump them in a raw 
format I could do recovery that way.

To be clear, I've already recovered from the situation; this is purely an 
academic 'can I do it' exercise for the sake of learning.

If ZDB can't do it, I'd assume I'd have to write some code to read based on 
DVA.  Maybe I could write a little tool for it.
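
For the archive, the closest thing I'm aware of is zdb's -R option, which reads a 
single block by vdev:offset:size and can dump it.  Treat this as a sketch only -- 
the exact flag set differs between builds, and the numbers below are made-up 
placeholders -- but the shape of the command is roughly:

# vdev index 0, hex offset and size taken from the DVA, 'r' asking for a raw dump
zdb -R poolname 0:4000:20000:r > /tmp/block.raw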

benr.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lost Disk Space

2008-10-20 Thread Ben Rockwood
No takers? :)

benr.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Lost Disk Space

2008-10-16 Thread Ben Rockwood
I've been struggling to fully understand why disk space seems to vanish.  I've 
dug through bits of code and reviewed all the mails on the subject that I can 
find, but I still don't have a proper understanding of what's going on.  

I did a test with a local zpool on snv_97... zfs list, zpool list, and zdb all 
seem to disagree on how much space is available.  In this case it's only a 
discrepancy of about 20G or so, but I've got Thumpers that have a discrepancy 
of over 6TB!

Can someone give a really detailed explanation about what's going on?

block traversal size 670225837056 != alloc 720394438144 (leaked 50168601088)

bp count:15182232
bp logical:672332631040  avg:  44284
bp physical:   669020836352  avg:  44066compression:   1.00
bp allocated:  670225837056  avg:  44145compression:   1.00
SPA allocated: 720394438144 used: 96.40%

Blocks  LSIZE   PSIZE   ASIZE avgcomp   %Total  Type
12   120K   26.5K   79.5K   6.62K4.53 0.00  deferred free
 1512 512   1.50K   1.50K1.00 0.00  object directory
 3  1.50K   1.50K   4.50K   1.50K1.00 0.00  object array
 116K   1.50K   4.50K   4.50K   10.67 0.00  packed nvlist
 -  -   -   -   -   --  packed nvlist size
72  8.45M889K   2.60M   37.0K9.74 0.00  bplist
 -  -   -   -   -   --  bplist header
 -  -   -   -   -   --  SPA space map header
   974  4.48M   2.65M   7.94M   8.34K1.70 0.00  SPA space map
 -  -   -   -   -   --  ZIL intent log
 96.7K  1.51G389M777M   8.04K3.98 0.12  DMU dnode
17  17.0K   8.50K   17.5K   1.03K2.00 0.00  DMU objset
 -  -   -   -   -   --  DSL directory
13  6.50K   6.50K   19.5K   1.50K1.00 0.00  DSL directory child map
12  6.00K   6.00K   18.0K   1.50K1.00 0.00  DSL dataset snap map
14  38.0K   10.0K   30.0K   2.14K3.80 0.00  DSL props
 -  -   -   -   -   --  DSL dataset
 -  -   -   -   -   --  ZFS znode
 2 1K  1K  2K  1K1.00 0.00  ZFS V0 ACL
 5.81M   558G557G557G   95.8K1.0089.27  ZFS plain file
  382K   301M200M401M   1.05K1.50 0.06  ZFS directory
 9  4.50K   4.50K   9.00K  1K1.00 0.00  ZFS master node
12   482K   20.0K   40.0K   3.33K   24.10 0.00  ZFS delete queue
 8.20M  66.1G   65.4G   65.8G   8.03K1.0110.54  zvol object
 1512 512  1K  1K1.00 0.00  zvol prop
 -  -   -   -   -   --  other uint8[]
 -  -   -   -   -   --  other uint64[]
 -  -   -   -   -   --  other ZAP
 -  -   -   -   -   --  persistent error log
 1   128K   10.5K   31.5K   31.5K   12.19 0.00  SPA history
 -  -   -   -   -   --  SPA history offsets
 -  -   -   -   -   --  Pool properties
 -  -   -   -   -   --  DSL permissions
 -  -   -   -   -   --  ZFS ACL
 -  -   -   -   -   --  ZFS SYSACL
 -  -   -   -   -   --  FUID table
 -  -   -   -   -   --  FUID table size
 5  3.00K   2.50K   7.50K   1.50K1.20 0.00  DSL dataset next clones
 -  -   -   -   -   --  scrub work queue
 14.5M   626G623G624G   43.1K1.00   100.00  Total


real21m16.862s
user0m36.984s
sys 0m5.757s

===
Looking at the data:
[EMAIL PROTECTED] ~$ zfs list backup && zpool list backup
NAME USED  AVAIL  REFER  MOUNTPOINT
backup   685G   237K27K  /backup
NAME SIZE   USED  AVAILCAP  HEALTH  ALTROOT
backup   696G   671G  25.1G96%  ONLINE  -

So zdb says 626GB is used, zfs list says 685GB is used, and zpool list says 
671GB is used.  The pool was filled to 100% capacity via dd (confirmed: I can't 
write any more data), yet zpool list says it's only 96% full.

benr.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Couple of ZFS panics...

2008-08-22 Thread Ben Taylor
I've got an Intel DP35DP motherboard, Q6600 proc (Intel 2.4G, 4 core), 4GB of 
RAM and a couple of SATA disks, running ICH9.  S10U5, patched about a week ago 
or so...

I have a zpool on a single slice (haven't added a mirror yet, was getting to 
that) and have started to suffer regular hard resets and have gotten a few 
panics.  The system is an NFS server for a couple of systems (not much write) 
and one writer (I do my svn updates over NFS because my ath0 board refuses to 
work in 64-bit on S10U5).  I also do local builds on the same server.

Ideas?  

The first looks like:

panic[cpu0]/thread=9bcf0460: 
BAD TRAP: type=e (#pf Page fault) rp=fe80001739a0 addr=c064dba0


cmake: 
#pf Page fault
Bad kernel fault at addr=0xc064dba0
pid=6797, pc=0xf0a6350a, sp=0xfe8000173a90, eflags=0x10207
cr0: 80050033pg,wp,ne,et,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de
cr2: c064dba0 cr3: 12bf9b000 cr8: c
rdi: 6c60 rsi:0 rdx:0
rcx:0  r8: 8b21017f  r9: ae3a79c0
rax:0 rbx: c0611f40 rbp: fe8000173ac0
r10:0 r11:0 r12: ae4687d0
r13:   d8c200 r14:2 r15: 826c0480
fsb: 8000 gsb: fbc24ec0  ds:   43
 es:   43  fs:0  gs:  1c3
trp:e err:0 rip: f0a6350a
 cs:   28 rfl:10207 rsp: fe8000173a90
 ss:   30

fe80001738b0 unix:real_mode_end+71e1 ()
fe8000173990 unix:trap+5e6 ()
fe80001739a0 unix:_cmntrap+140 ()
fe8000173ac0 zfs:zio_buf_alloc+a ()
fe8000173af0 zfs:arc_buf_alloc+9f ()
fe8000173b70 zfs:arc_read+ee ()
fe8000173bf0 zfs:dbuf_read_impl+1a0 ()
fe8000173c30 zfs:zfsctl_ops_root+304172dd ()
fe8000173c60 zfs:dmu_tx_check_ioerr+6e ()
fe8000173cc0 zfs:dmu_tx_count_write+73 ()
fe8000173cf0 zfs:dmu_tx_hold_write+4a ()
fe8000173db0 zfs:zfs_write+1bb ()
fe8000173e00 genunix:fop_write+31 ()
fe8000173eb0 genunix:write+287 ()
fe8000173ec0 genunix:write32+e ()
fe8000173f10 unix:brand_sys_sysenter+1f2 ()

syncing file systems...
 3130
 15
 done
dumping to /dev/dsk/c0t0d0s1, offset 860356608, content: kernel
NOTICE: ahci_tran_reset_dport: port 0 reset port


The second looks like this:

panic[cpu2]/thread=9b425f20: 
BAD TRAP: type=e (#pf Page fault) rp=fe80018cdf40 addr=c064dba0


nfsd: 
#pf Page fault
Bad kernel fault at addr=0xc064dba0
pid=665, pc=0xf0a6350a, sp=0xfe80018ce030, eflags=0x10207
cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de
cr2: c064dba0 cr3: 12a9df000 cr8: c
rdi: 6c60 rsi:0 rdx:0
rcx:0  r8: 8b21017f  r9:f
rax:0 rbx: c0611f40 rbp: fe80018ce060
r10:0 r11:0 r12: fe81c20ecf00
r13:   d8c200 r14:2 r15: 826c2240
fsb: 8000 gsb: 81a6c800  ds:   43
 es:   43  fs:0  gs:  1c3
trp:e err:0 rip: f0a6350a
 cs:   28 rfl:10207 rsp: fe80018ce030
 ss:   30

fe80018cde50 unix:real_mode_end+71e1 ()
fe80018cdf30 unix:trap+5e6 ()
fe80018cdf40 unix:_cmntrap+140 ()
fe80018ce060 zfs:zio_buf_alloc+a ()
fe80018ce090 zfs:arc_buf_alloc+9f ()
fe80018ce110 zfs:arc_read+ee ()
fe80018ce190 zfs:dbuf_read_impl+1a0 ()
fe80018ce1d0 zfs:zfsctl_ops_root+304172dd ()
fe80018ce200 zfs:dmu_tx_check_ioerr+6e ()
fe80018ce260 zfs:dmu_tx_count_write+73 ()
fe80018ce290 zfs:dmu_tx_hold_write+4a ()
fe80018ce350 zfs:zfs_write+1bb ()
fe80018ce3a0 genunix:fop_write+31 ()
fe80018ce410 nfssrv:do_io+b5 ()
fe80018ce610 nfssrv:rfs4_op_write+40e ()
fe80018ce770 nfssrv:rfs4_compound+1b3 ()
fe80018ce800 nfssrv:rfs4_dispatch+234 ()
fe80018ceb10 nfssrv:common_dispatch+88a ()
fe80018ceb20 nfssrv:nfs4_drc+3051ccc1 ()
fe80018cebf0 rpcmod:svc_getreq+209 ()
fe80018cec40 rpcmod:svc_run+124 ()
fe80018cec70 rpcmod:svc_do_run+88 ()
fe80018ceec0 nfs:nfssys+208 ()
fe80018cef10 unix:brand_sys_sysenter+1f2 ()

syncing file systems...
 done
dumping to /dev/dsk/c0t0d0s1, offset 860356608, content: kernel
NOTICE: ahci_tran_reset_dport: port 0 reset port
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARCSTAT Kstat Definitions

2008-08-21 Thread Ben Rockwood
Thanks, not as much as I was hoping for but still extremely helpful.


Can you, or others, have a look at this: http://cuddletech.com/arc_summary.html

This is a Perl script that uses kstats to drum up a report such as the 
following:

System Memory:
 Physical RAM:  32759 MB
 Free Memory :  10230 MB
 LotsFree:  511 MB

ARC Size:
 Current Size: 7989 MB (arcsize)
 Target Size (Adaptive):   8192 MB (c)
 Min Size (Hard Limit):1024 MB (zfs_arc_min)
 Max Size (Hard Limit):8192 MB (zfs_arc_max)

ARC Size Breakdown:
 Most Recently Used Cache Size:  13%1087 MB (p)
 Most Frequently Used Cache Size:86%7104 MB (c-p)

ARC Efficency:
 Cache Access Total: 3947194710
 Cache Hit Ratio:  99%   3944674329
 Cache Miss Ratio:  0%   2520381

 Data Demand   Efficiency:99%
 Data Prefetch Efficiency:69%

CACHE HITS BY CACHE LIST:
  Anon:0%16730069 
  Most Frequently Used:   99%3915830091 (mfu)
  Most Recently Used:  0%10490502 (mru)
  Most Frequently Used Ghost:  0%439554 (mfu_ghost)
  Most Recently Used Ghost:0%1184113 (mru_ghost)
CACHE HITS BY DATA TYPE:
  Demand Data:99%3914527790 
  Prefetch Data:   0%2447831 
  Demand Metadata: 0%10709326 
  Prefetch Metadata:   0%16989382 
CACHE MISSES BY DATA TYPE:
  Demand Data:45%1144679 
  Prefetch Data:  42%1068975 
  Demand Metadata: 5%132649 
  Prefetch Metadata:   6%174078 
-


Feedback and input are welcome, in particular if I'm mischaracterizing data.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARCSTAT Kstat Definitions

2008-08-21 Thread Ben Rockwood
It's a starting point anyway.   The key is to try and draw useful conclusions 
from the info to answer the torrent of 'why is my ARC 30GB???'

There are several things I'm not sure I'm properly interpreting, such as:

* As you state, the anon pages.  Even the comment in code is, to me anyway, a 
little vague.  I include them because otherwise you look at the hit counters 
and wonder where a large chunk of them went.

* Prefetch... I want to use the Prefetch Data hit ratio as a judgment call on 
the efficiency of prefetch.  If the value is very low it might be best to turn 
it off, but I'd like to hear that from someone else before I go saying that.

In high latency environments, such as ZFS on iSCSI, prefetch can either 
significantly help or hurt; determining which is difficult without some type of 
metric such as the one above.
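
As a rough sketch of the kind of metric I mean (using the standard arcstats 
counter names; adjust to your build):

kstat -p zfs:0:arcstats:prefetch_data_hits zfs:0:arcstats:prefetch_data_misses | \
  nawk '/prefetch_data_hits/ {h=$2} /prefetch_data_misses/ {m=$2} \
        END { if (h+m) printf("prefetch data efficiency: %.1f%%\n", h*100/(h+m)) }'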

* There are several instances (based on dtracing) in which the ARC is 
bypassed... for ZIL I understand; in some other cases I need to spend more time 
analyzing the DMU (dbuf_*) to find out why.

* In answering the 'Is having a 30GB ARC good?' question, I want to say that if 
MFU is 60% of ARC, and if the hits are mostly MFU, then you are deriving 
significant benefit from your large ARC; but whether a system has a 2GB ARC or a 
30GB ARC, the overall hit ratio tends to be 99%, which is nuts and tends to 
reinforce a misinterpretation of anon hits.

The only way I'm seeing to _really_ understand ARC's efficiency is to look at 
the overall number of reads and then how many are intercepted by ARC and how 
many actually made it to disk... and why (prefetch or demand).  This is tricky 
to implement via kstats because you have to pick out and monitor the zpool 
disks themselves.
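
The crude version of that comparison -- just a sketch, and the disk names below 
are placeholders for whatever devices actually back your pool -- is to sample 
the ARC counters and the pool disks over the same window:

kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses   # before and after the interval
iostat -xn c0t0d0 c0t1d0 60 2                        # reads that actually reached disk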

I've spent a lot of time in this code (arc.c) and still have a lot of 
questions.  I really wish there was an Advanced ZFS Internals talk coming up; 
I simply can't keep spending so much time on this.

Feedback from PAE or other tuning experts is welcome and appreciated. :)

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARCSTAT Kstat Definitions

2008-08-21 Thread Ben Rockwood
New version is available (v0.2) :

* Fixes divide by zero, 
* includes tuning from /etc/system in output
* if prefetch is disabled I explicitly say so.
* Accounts for jacked anon count.  Still need improvement here.
* Added friendly explanations for MRU/MFU and Ghost list counts.

Page and examples are updated: cuddletech.com/arc_summary.pl

Still needs work, but hopefully interest in this will stimulate some improved 
understanding of ARC internals.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ARCSTAT Kstat Definitions

2008-08-20 Thread Ben Rockwood
Would someone in the know be willing to write up (preferably blog) definitive 
definitions/explanations of all the arcstats provided via kstat?  I'm 
struggling with proper interpretation of certain values, namely p, 
memory_throttle_count, and the mru/mfu+ghost hit vs demand/prefetch hit 
counters.  I think I've got it figured out, but I'd really like expert 
clarification before I start tweaking.

Thanks.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to delete hundreds of emtpy snapshots

2008-07-17 Thread Ben Rockwood
zfs list is mighty slow on systems with a large number of objects, but there is 
no foreseeable plan that I'm aware of to solve that problem.  

Nevertheless, you need to do a zfs list; therefore, do it once and work from 
that.

zfs list > /tmp/zfs.out
for i in `grep 'mydataset@' /tmp/zfs.out | awk '{print $1}'`; do zfs destroy $i; done


As for 5 minute snapshots this is NOT a bad idea.  It is, however, complex 
to manage.  Thus, you need to employ tactics to make it more digestible.  

You need to ask  yourself first why you want 5 min snaps. Is it replication?  
If so, create it, replicate it, destroy all but the last snapshot or even 
rotate them.  Or, is it fallback in case you make a mistake?  Then just keep 
around the last 6 snapshots or so.

zfs rename and zfs destroy are your friends; use them wisely. :)
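
As an example, a rough rotation sketch that keeps only the 6 most recent 
snapshots of a dataset (tank/mydataset is a placeholder, and this assumes your 
snapshot names sort chronologically -- run it with echo in place of zfs destroy 
first):

zfs list -H -o name -t snapshot | grep '^tank/mydataset@' | sort | \
  nawk -v keep=6 '{ a[NR] = $0 } END { for (i = 1; i <= NR - keep; i++) print a[i] }' | \
  while read snap; do zfs destroy "$snap"; done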

If you want to discuss exactly what you're trying to facilitate I'm sure we can 
come up with some more concrete ideas to help you.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] 40min ls in empty directory

2008-07-16 Thread Ben Rockwood
I've run into an odd problem which I lovingly refer to as a 'black hole' 
directory.

On a Thumper used for mail stores we've found that finds take an exceptionally 
long time to run.  There are directories that have as many as 400,000 files, which I 
immediately considered the culprit.  However, under investigation, they aren't 
the problem at all.  The problem is seen here in this truss output (first 
column is delta time):


 0.0001 lstat64("tmp", 0x08046A20)  = 0
 0.0000 openat(AT_FDCWD, "tmp", O_RDONLY|O_NDELAY|O_LARGEFILE) = 8
 0.0001 fcntl(8, F_SETFD, 0x0001)   = 0
 0.0000 fstat64(8, 0x08046920)  = 0
 0.0000 fstat64(8, 0x08046AB0)  = 0
 0.0000 fchdir(8)   = 0
1321.3133   getdents64(8, 0xFEE48000, 8192) = 48
1255.8416   getdents64(8, 0xFEE48000, 8192) = 0
 0.0001 fchdir(7)   = 0
 0.0001 close(8)= 0

These two getdents64 syscalls take approx 20 mins each.  Notice that the 
directory structure is 48 bytes, the directory is empty:

drwx--   2 102  1022 Feb 21 02:24 tmp

My assumption is that the directory is corrupt, but I'd like to prove that.  I 
have a scrub running on the pool, but it's got about 16 hours to go before it 
completes.  20% complete thus far and nothing is reported.

No errors are logged when I stimulate this problem.

Does anyone have suggestions on how to get additional data on this issue?  I've 
used dtrace flows to examine it; however, what I really want to see is the zio's 
issued as a result of the getdents, but I can't see how to do so.  Ideally I'd 
quiet the system and watch all zio's occurring while I stimulate it, but this is 
production and that's not possible.   If anyone knows how to watch DMU/ZIO 
activity that _only_ pertains to a certain PID please let me know. ;)
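
(The closest sketch I have so far -- and it is only a sketch, assuming the fbt 
probe below is stable on this build, with 12345 standing in for the PID doing 
the stimulating -- is something like:

dtrace -n 'fbt:zfs:zio_read:entry /pid == $target/ { @[stack(10)] = count(); }' -p 12345

which at least counts the reads issued synchronously on behalf of that PID, 
though it will miss anything handed off to other threads.)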

Suggestions on how to pro-actively catch these sorts of instances are welcome, 
as are alternative explanations.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] J4200/J4400 Array

2008-07-02 Thread Ben B.
Hi,

According to the Sun Handbook, there is a new array :
SAS interface
12 disks SAS or SATA

ZFS could be used nicely with this box.

There is another version called
J4400 with 24 disks.

Doc is here :
http://docs.sun.com/app/docs/coll/j4200

Does anyone know the price and availability of these products?

Best Regards,
Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot delete errored file

2008-06-13 Thread Ben Middleton
Hi,

Quick update:

I left memtest running over night - 39 passes, no errors.

I also attempted to force the BIOS to run the memory at 800MHz and 5-5-5-15 as 
suggested - but the machine became very unstable - long boot times; PCI-Express 
failure of the Yukon network card on booting, etc. I've switched it back to Auto 
speed & timing for now. I'll just hope that it was a one-off glitch that 
corrupted the pool.

I'm going to rebuild the pool this weekend.

Thanks for all the suggestions.

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot delete errored file

2008-06-10 Thread Ben Middleton
Hi Marc,

Thanks for all of your suggestions.

I'll restart memtest when I'm next in the office and leave it running overnight.

I can recreate the pool - but I guess the question is: am I safe to do this on 
the existing setup, or am I going to hit the same issue again sometime? 
Assuming I don't find any obvious hardware issues - wouldn't this be regarded 
as a flaw in ZFS (i.e. no way of clearing such an error without a rebuild)?

Would I be safer rebuilding to a pair of mirrors rather than a 3 disk raidz + 
hotspare?

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot delete errored file

2008-06-10 Thread Ben Middleton
Sent response by private message.

Today's findings are that the cksum errors appear on the new disk on the other 
controller too - so I've ruled out controllers and cables. It's probably as Jeff 
says - just got to figure out now how to prove the memory is duff.

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot delete errored file

2008-06-09 Thread Ben Middleton
Hi,

Today's update:

- I ran a memtest a few times - no errors.
- I reseated, re-routed and switched all connectors/cables
- I'm currently running a scrub, but it's showing vast numbers of cksum errors 
now across all devices:

$ zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h5m, 3.35% done, 2h26m to go
config:

NAMESTATE READ WRITE CKSUM
rpool   DEGRADED 0 0  211K
  raidz1DEGRADED 0 0  211K
c0t7d0  DEGRADED 0 0 0  too many errors
c0t1d0  DEGRADED 0 0 0  too many errors
c0t2d0  DEGRADED 0 0 0  too many errors

errors: Permanent errors have been detected in the following files:

/export/duke/test/Acoustic/3466/88832/09 - Check.mp3

I'll start moving each device over to a different controller to see if that 
helps once the scrub completes. Still getting I/O errors trying to delete that 
file.

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot delete errored file

2008-06-05 Thread Ben Middleton
Hello again,

I'm not making progress on this.

Every time I run a zpool scrub rpool I see:

$ zpool status -vx
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 0.01% done, 177h43m to go
config:

NAMESTATE READ WRITE CKSUM
rpool   DEGRADED 0 0 8
  raidz1DEGRADED 0 0 8
c0t0d0  DEGRADED 0 0 0  too many errors
c0t1d0  ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0

errors: Permanent errors have been detected in the following files:

/export/duke/test/Acoustic/3466/88832/09 - Check.mp3


I popped in a brand new disk of the same size, and did a zpool replace on the 
persistently degraded drive and the new drive. i.e.:

$ zpool replace rpool c0t0d0 c0t7d0

But that simply had the effect of transferring the issue to the new drive:

$ zpool status -xv rpool
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 2h41m with 1 errors on Wed Jun  4 20:22:27 2008
config:

NAME  STATE READ WRITE CKSUM
rpool DEGRADED 0 0 8
  raidz1  DEGRADED 0 0 8
spare DEGRADED 0 0 0
  c0t0d0  DEGRADED 0 0 0  too many errors
  c0t7d0  ONLINE   0 0 0
c0t1d0ONLINE   0 0 0
c0t2d0ONLINE   0 0 0
spares
  c0t7d0  INUSE currently in use


$ zpool detach rpool c0t0d0

$ zpool status -vx rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 2h41m with 1 errors on Wed Jun  4 20:22:27 2008
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 8
  raidz1ONLINE   0 0 8
c0t7d0  ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0

errors: Permanent errors have been detected in the following files:

0xc3:0x1c0

$ zpool scrub rpool

...

$ zpool status -vx rpool

  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 0.00% done, 0h0m to go
config:

NAMESTATE READ WRITE CKSUM
rpool   DEGRADED 0 0 4
  raidz1DEGRADED 0 0 4
c0t7d0  DEGRADED 0 0 0  too many errors
c0t1d0  ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0

errors: Permanent errors have been detected in the following files:

/export/duke/test/Acoustic/3466/88832/09 - Check.mp3

$ rm -f /export/duke/test/Acoustic/3466/88832/09 - Check.mp3

rm: cannot remove `/export/duke/test/Acoustic/3466/88832/09 - Check.mp3': I/O 
error


I'm guessing this isn't a hardware fault, but a glitch in ZFS - but I'm hoping 
to be proved wrong.

Any ideas before I rebuild the pool from scratch? And if I do, is there 
anything I can do to prevent this problem in the future?

B
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot delete errored file

2008-06-05 Thread Ben Middleton
Hi Marc,

$ : > 09 - Check.mp3
bash:  09 - Check.mp3: I/O error

$ cd ..
$ rm -rf BAD
$ rm: cannot remove `BAD/09 - Check.mp3': I/O error

I'll try shuffling the cables - but as you see above it occasionally reports on 
a different disk - so I imagine the cables are OK. Also, the new disk I added has 
a new cable too, and is on a different SATA port - which is also showing up as 
degraded.

Is there any lower level debugging that I can enable to try and work out what 
is going on?

This machine has been running fine since last August.

I couldn't see anything in builds later than snv 86 that might help - but I 
could try upgrading to the latest?

B
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and ACL's over NFSv3

2008-06-05 Thread Ben Rockwood
Can someone please clarify the ability to utilize ACL's over NFSv3 from a ZFS 
share?  I can getfacl but I can't setfacl.  I can't find any documentation 
in this regard.  My suspicion is that ZFS shares must be NFSv4 in order to 
utilize ACLs but I'm hoping this isn't the case.

Can anyone definitively speak to this?  The closest related bug I can find is 
6340720, which simply says 'See comments'.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Cannot delete errored file

2008-06-03 Thread Ben Middleton
Hi,

I can't seem to delete a file in my zpool that has permanent errors:

zpool status -vx
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 2h10m with 1 errors on Tue Jun  3 11:36:49 2008
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t0d0  ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0

errors: Permanent errors have been detected in the following files:

/export/duke/test/Acoustic/3466/88832/09 - Check.mp3


rm /export/duke/test/Acoustic/3466/88832/09 - Check.mp3

rm: cannot remove `/export/duke/test/Acoustic/3466/88832/09 - Check.mp3': I/O 
error

Each time I try to do anything to the file, the checksum error count goes up on 
the pool.

I also tried a mv and a cp over the top - but same I/O error.

I performed a zpool scrub rpool followed by a zpool clear rpool - but still 
get the same error. Any ideas?

PS - I'm running snv_86, and use the sata driver on an intel x86 architecture.

B
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ClearCase support for ZFS?

2008-03-27 Thread Nissim Ben-Haim
Hi,

Does anybody know what is the latest status with ClearCase support for ZFS?
I noticed this from IBM:   
http://www-1.ibm.com/support/docview.wss?rs=0uid=swg21155708

I would like to make sure someone has installed and tested it before 
recommending it to a customer.

Regards,
Nissim Ben-Haim
Solution Architect

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Clearing corrupted file errors

2008-03-12 Thread Ben Middleton
Hi,

Sorry if this is an RTM issue - but I wanted to be sure before continuing. I 
received a corrupted file error on one of my pools. I removed the file, and the 
status command now shows the following:

 zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t0d0  ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0

errors: Permanent errors have been detected in the following files:

rpool/duke:0x8237


I tried running zpool clear rpool to clear the error, but it persists in the 
status output. Should a zpool scrub rpool get rid of this error?

Thanks,

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs 32bits

2008-03-05 Thread Ben
Hi,

I know that Sun does not recommend using ZFS on 32-bit machines, but what are
the real consequences of doing so?

I have an old dual-processor Xeon server (6 GB RAM, 6 disks), and I would like
to build a raidz with 4 disks on Solaris 10 update 4.

Thanks,
Ben

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool status -x strangeness on b78

2008-02-06 Thread Ben Miller
We run a cron job that does a 'zpool status -x' to check for any degraded 
pools.  We just happened to find a pool degraded this morning by running 'zpool 
status' by hand and were surprised that it was degraded as we didn't get a 
notice from the cron job.

# uname -srvp
SunOS 5.11 snv_78 i386

# zpool status -x
all pools are healthy

# zpool status pool1
  pool: pool1
 state: DEGRADED
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
pool1DEGRADED 0 0 0
  raidz1 DEGRADED 0 0 0
c1t8d0   REMOVED  0 0 0
c1t9d0   ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0

errors: No known data errors

I'm going to look into why the disk is listed as removed now.

Does this look like a bug with 'zpool status -x'?

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Panic on Zpool Import (Urgent)

2008-01-17 Thread Ben Rockwood
The solution here was to upgrade to snv_78.  By upgrade I mean 
re-jumpstart the system.

I tested snv_67 via net-boot but the pool paniced just as below.  I also 
attempted using zfs_recover without success.

I then tested snv_78 via net-boot, used both aok=1 and 
zfs:zfs_recover=1 and was able to (slowly) import the pool.  Following 
that test I exported and then did a full re-install of the box.
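
(For the archive, those two settings were /etc/system entries for the recovery 
boot, i.e. something along these lines, to be removed again once the pool 
imports cleanly:

set aok = 1
set zfs:zfs_recover = 1
)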

A very important note to anyone upgrading a Thumper!  Don't forget about 
the NCQ bug.  After upgrading to a release more recent than snv_60 add 
the following to /etc/system:

set sata:sata_max_queue_depth = 0x1

If you don't, life will be highly unpleasant and you'll believe that disks are 
failing everywhere when in fact they are not.

benr.




Ben Rockwood wrote:
 Today, suddenly, without any apparent reason that I can find, I'm 
 getting panic's during zpool import.  The system paniced earlier today 
 and has been suffering since.  This is snv_43 on a thumper.  Here's the 
 stack:

 panic[cpu0]/thread=99adbac0: assertion failed: ss != NULL, file: 
 ../../common/fs/zfs/space_map.c, line: 145

 fe8000a240a0 genunix:assfail+83 ()
 fe8000a24130 zfs:space_map_remove+1d6 ()
 fe8000a24180 zfs:space_map_claim+49 ()
 fe8000a241e0 zfs:metaslab_claim_dva+130 ()
 fe8000a24240 zfs:metaslab_claim+94 ()
 fe8000a24270 zfs:zio_dva_claim+27 ()
 fe8000a24290 zfs:zio_next_stage+6b ()
 fe8000a242b0 zfs:zio_gang_pipeline+33 ()
 fe8000a242d0 zfs:zio_next_stage+6b ()
 fe8000a24320 zfs:zio_wait_for_children+67 ()
 fe8000a24340 zfs:zio_wait_children_ready+22 ()
 fe8000a24360 zfs:zio_next_stage_async+c9 ()
 fe8000a243a0 zfs:zio_wait+33 ()
 fe8000a243f0 zfs:zil_claim_log_block+69 ()
 fe8000a24520 zfs:zil_parse+ec ()
 fe8000a24570 zfs:zil_claim+9a ()
 fe8000a24750 zfs:dmu_objset_find+2cc ()
 fe8000a24930 zfs:dmu_objset_find+fc ()
 fe8000a24b10 zfs:dmu_objset_find+fc ()
 fe8000a24bb0 zfs:spa_load+67b ()
 fe8000a24c20 zfs:spa_import+a0 ()
 fe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
 fe8000a24ce0 zfs:zfsdev_ioctl+135 ()
 fe8000a24d20 genunix:cdev_ioctl+55 ()
 fe8000a24d60 specfs:spec_ioctl+99 ()
 fe8000a24dc0 genunix:fop_ioctl+3b ()
 fe8000a24ec0 genunix:ioctl+180 ()
 fe8000a24f10 unix:sys_syscall32+101 ()

 syncing file systems... done

 This is almost identical to a post to this list over a year ago titled 
 ZFS Panic.  There was follow up on it but the results didn't make it 
 back to the list.

 I spent time doing a full sweep for any hardware failures, pulled 2 
 drives that I suspected as problematic but weren't flagged as such, etc, 
 etc, etc.  Nothing helps.

 Bill suggested a 'zpool import -o ro' on the other post, but thats not 
 working either.

 I _can_ use 'zpool import' to see the pool, but I have to force the 
 import.  A simple 'zpool import' returns output in about a minute.  
 'zpool import -f poolname' takes almost exactly 10 minutes every single 
 time, like it hits some timeout and then panics.

 I did notice that while the 'zpool import' is running 'iostat' is 
 useless, just hangs.  I still want to believe this is some device 
 misbehaving but I have no evidence to support that theory.

 Any and all suggestions are greatly appreciated.  I've put around 8 
 hours into this so far and I'm getting absolutely nowhere.

 Thanks

 benr.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Removing An Errant Drive From Zpool

2008-01-16 Thread Ben Rockwood
Robert Milkowski wrote:
 If you can't re-create a pool (+backuprestore your data) I would
 recommend to wait for device removal in zfs and in a mean time I would
 attach another drive to it so you've got mirrored configuration and
 remove them once there's a device removal. Since you're already
 working on nevada you probably could adopt new bits quickly.

 The only question is - when device removal is going to be integrated -
 last time someone mentioned it here it was supposed to be by the end
 of last year...
   
Ya, I'm afraid you're right.

benr.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Removing An Errant Drive From Zpool

2008-01-15 Thread Ben Rockwood
I made a really stupid mistake... having trouble removing a hot spare 
marked as failed, I was trying several ways to put it back in a good 
state.  One means I tried was to 'zpool add pool c5t3d0'... but I forgot 
to use the proper syntax zpool add pool spare c5t3d0.

Now I'm in a bind.  I've got 4 large raidz2's and now this puny 500GB 
drive in the config:

...
  raidz2ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c7t7d0  ONLINE   0 0 0
c6t7d0  ONLINE   0 0 0
c1t7d0  ONLINE   0 0 0
c0t7d0  ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c0t3d0  ONLINE   0 0 0
  c5t3d0ONLINE   0 0 0
spares
  c5t3d0FAULTED   corrupted data
  c4t7d0AVAIL  
...



Detach and Remove won't work.  Does anyone know of a way to get that 
c5t3d0 out of the data configuration and back to hot-spare where it belongs?

However, if I understand the layout properly, this should not have an 
adverse impact on my existing configuration, I think.  If I can't 
dump it, what happens when that disk fills up?

I can't believe I made such a bone-headed mistake.  This is one of those 
times when an 'Are you sure you...?' would be helpful. :(
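
For what it's worth, the closest thing to that safety net today (assuming it 
behaves the same on this build) is the dry-run flag, which prints the layout 
that would result without committing it:

zpool add -n pool c5t3d0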

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Removing An Errant Drive From Zpool

2008-01-15 Thread Ben Rockwood

Eric Schrock wrote:
 There's really no way to recover from this, since we don't have device
 removal.  However, I'm suprised that no warning was given.  There are at
 least two things that should have happened:

 1. zpool(1M) should have warned you that the redundancy level you were
attempting did not match that of your existing pool.  This doesn't
apply if you already have a mixed level of redundancy.

 2. zpool(1M) should have warned you that the device was in use as an
active spare and not let you continue.

 What bits were you running?
   

snv_78, however the pool was created on snv_43 and hasn't yet been 
upgraded.  Though, programmatically, I can't see why there would be a 
difference in the way 'zpool' would handle the check.

The big question is, if I'm stuck like this permanently, what's the 
potential risk?

Could I potentially just fail that drive and leave it in a failed state?

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Panic on Zpool Import (Urgent)

2008-01-12 Thread Ben Rockwood
Today, suddenly, without any apparent reason that I can find, I'm 
getting panics during zpool import.  The system paniced earlier today 
and has been suffering since.  This is snv_43 on a thumper.  Here's the 
stack:

panic[cpu0]/thread=99adbac0: assertion failed: ss != NULL, file: 
../../common/fs/zfs/space_map.c, line: 145

fe8000a240a0 genunix:assfail+83 ()
fe8000a24130 zfs:space_map_remove+1d6 ()
fe8000a24180 zfs:space_map_claim+49 ()
fe8000a241e0 zfs:metaslab_claim_dva+130 ()
fe8000a24240 zfs:metaslab_claim+94 ()
fe8000a24270 zfs:zio_dva_claim+27 ()
fe8000a24290 zfs:zio_next_stage+6b ()
fe8000a242b0 zfs:zio_gang_pipeline+33 ()
fe8000a242d0 zfs:zio_next_stage+6b ()
fe8000a24320 zfs:zio_wait_for_children+67 ()
fe8000a24340 zfs:zio_wait_children_ready+22 ()
fe8000a24360 zfs:zio_next_stage_async+c9 ()
fe8000a243a0 zfs:zio_wait+33 ()
fe8000a243f0 zfs:zil_claim_log_block+69 ()
fe8000a24520 zfs:zil_parse+ec ()
fe8000a24570 zfs:zil_claim+9a ()
fe8000a24750 zfs:dmu_objset_find+2cc ()
fe8000a24930 zfs:dmu_objset_find+fc ()
fe8000a24b10 zfs:dmu_objset_find+fc ()
fe8000a24bb0 zfs:spa_load+67b ()
fe8000a24c20 zfs:spa_import+a0 ()
fe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
fe8000a24ce0 zfs:zfsdev_ioctl+135 ()
fe8000a24d20 genunix:cdev_ioctl+55 ()
fe8000a24d60 specfs:spec_ioctl+99 ()
fe8000a24dc0 genunix:fop_ioctl+3b ()
fe8000a24ec0 genunix:ioctl+180 ()
fe8000a24f10 unix:sys_syscall32+101 ()

syncing file systems... done

This is almost identical to a post to this list over a year ago titled 
'ZFS Panic'.  There was follow-up on it but the results didn't make it 
back to the list.

I spent time doing a full sweep for any hardware failures, pulled 2 
drives that I suspected as problematic but weren't flagged as such, etc, 
etc, etc.  Nothing helps.

Bill suggested a 'zpool import -o ro' on the other post, but that's not 
working either.

I _can_ use 'zpool import' to see the pool, but I have to force the 
import.  A simple 'zpool import' returns output in about a minute.  
'zpool import -f poolname' takes almost exactly 10 minutes every single 
time, like it hits some timeout and then panics.

I did notice that while the 'zpool import' is running 'iostat' is 
useless, just hangs.  I still want to believe this is some device 
misbehaving but I have no evidence to support that theory.

Any and all suggestions are greatly appreciated.  I've put around 8 
hours into this so far and I'm getting absolutely nowhere.

Thanks

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] offlining a storage pool

2007-11-21 Thread Ben
Hi,

I would like to offline an entire storage pool (not just some of its devices);
I want to stop all I/O activity to the pool.

Maybe it could be implemented with a command like:
zpool offline -f tank
which would implicitly do a 'zfs unmount tank'.

I use zfs with solaris 10 update 4.
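
For now, the closest existing equivalent I know of (not quite the same thing, 
since it also removes the pool from the system until it is re-imported) is:

zpool export tank     # unmounts every dataset and stops all I/O to the pool
zpool import tank     # later, to bring it back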

Thanks,
Ben
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS mirror and sun STK 2540 FC array

2007-11-16 Thread Ben
Hi all,

we have just bought a sun X2200M2 (4GB / 2 opteron 2214 / 2 disks 250GB 
SATA2, solaris 10 update 4)
and a sun STK 2540 FC array (8 disks SAS 146 GB, 1 raid controller).
The server is attached to the array with a single 4 Gb Fibre Channel link.

I want to make a mirror using ZFS with this array. 

I have created  2 volumes on the array
in RAID0 (stripe of 128 KB) presented to the host with lun0 and lun1.

So, on the host  :
bash-3.00# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c1d0 DEFAULT cyl 30397 alt 2 hd 255 sec 63
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
   1. c2d0 DEFAULT cyl 30397 alt 2 hd 255 sec 63
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
   2. c6t600A0B800038AFBC02F7472155C0d0 DEFAULT cyl 35505 alt 2 
hd 255 sec 126
  /scsi_vhci/[EMAIL PROTECTED]
   3. c6t600A0B800038AFBC02F347215518d0 DEFAULT cyl 35505 alt 2 
hd 255 sec 126
  /scsi_vhci/[EMAIL PROTECTED]
Specify disk (enter its number):

bash-3.00# zpool create tank mirror 
c6t600A0B800038AFBC02F347215518d0 c6t600A0B800038AFBC02F7472155C0d0

bash-3.00# df -h /tank
Filesystem size   used  avail capacity  Mounted on
tank   532G24K   532G 1%/tank


I have tested the performance with a simple dd
[
time dd if=/dev/zero of=/tank/testfile bs=1024k count=1
time dd if=/tank/testfile of=/dev/null bs=1024k count=1
]
command and it gives :
# local throughput
stk2540
   mirror zfs /tank
read   232 MB/s
write  175 MB/s

# just to test the max perf I did:
zpool destroy -f tank
zpool create -f pool c6t600A0B800038AFBC02F347215518d0

And the same basic dd gives me :
  single zfs /pool
read   320 MB/s
write  263 MB/s

Just to give an idea, the SVM mirror using the two local SATA2 disks
gives:
read  58 MB/s
write 52 MB/s

So, in production the zfs /tank mirror will be used to hold
our home directories (10 users using 10GB each),
our project files (200 GB, mostly text files and a cvs database),
and some vendor tools (100 GB).
People will access the data (/tank) using nfs4 with their
workstations (sun ultra 20M2 with centos 4update5).

On the ultra20 M2, the basic test via nfs4 gives :
read  104 MB/s
write  63 MB/s

At this point, I have the following questions:
-- Does anyone have similar figures for the STK 2540 using zfs?

-- Instead of doing only 2 volumes in the array,
   what do you think about doing 8 volumes (one for each disk)
   and doing four two-way mirrors:
   zpool create tank mirror  c6t6001.. c6t6002.. mirror c6t6003.. 
c6t6004.. {...} mirror c6t6007.. c6t6008..

-- I will add 4 disks in the array next summer.
   Do you think I should create 2 new luns in the array
   and do a:
zpool add tank mirror c6t6001..(lun3) c6t6001..(lun4)

   or build the 2 luns (6-disk raid0), and the pool tank, from scratch
(i.e.: back up /tank -> zpool destroy -> add disks -> reconfigure array
-> zpool create tank ... -> restore the backed-up data)?

-- I am thinking about doing a disk scrub once a month (see the cron sketch
   below).  Is that sufficient?

-- Have you got any comment on the performance from the nfs4 client ?

If you have any advice or suggestions, feel free to share.
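
For the monthly scrub mentioned above, a minimal crontab sketch (the day and 
hour are just examples):

0 3 1 * * /usr/sbin/zpool scrub tank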

Thanks,  
 
 Benjamin
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Quota Oddness

2007-10-31 Thread Ben Rockwood
I've run across an odd issue with ZFS quotas.  This is an snv_43 system with 
several zones/zfs datasets, but only one is affected.  The dataset shows 10GB 
used, 12GB referred, but when counting the files it only has 6.7GB of data:

zones/ABC10.8G  26.2G  12.0G  /zones/ABC
zones/[EMAIL PROTECTED]14.7M  -  12.0G  -

[xxx:/zones/ABC/.zfs/snapshot/now] root# gdu --max-depth=1 -h .
43k ./dev
6.7G./root
1.5k./lu
6.7G.

I don't understand what might be the cause of this disparity.  This is an older box, 
snv_43.  Any bugs that might apply, fixed or in progress?

Thanks.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Ben Rockwood
Dick Davies wrote:
 On 04/10/2007, Nathan Kroenert [EMAIL PROTECTED] wrote:

   
 Client A
   - import pool make couple-o-changes

 Client B
   - import pool -f  (heh)
 

   
 Oct  4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ff0002b51c80:
 Oct  4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion
 failed: dmu_read(os, smo-smo_object, offset, size, entry_map) == 0 (0x5
 == 0x0)
 , file: ../../common/fs/zfs/space_map.c, line: 339
 Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51160
 genunix:assfail3+b9 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51200
 zfs:space_map_load+2ef ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51240
 zfs:metaslab_activate+66 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51300
 zfs:metaslab_group_alloc+24e ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b513d0
 zfs:metaslab_alloc_dva+192 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51470
 zfs:metaslab_alloc+82 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514c0
 zfs:zio_dva_allocate+68 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514e0
 zfs:zio_next_stage+b3 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51510
 zfs:zio_checksum_generate+6e ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51530
 zfs:zio_next_stage+b3 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515a0
 zfs:zio_write_compress+239 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515c0
 zfs:zio_next_stage+b3 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51610
 zfs:zio_wait_for_children+5d ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51630
 zfs:zio_wait_children_ready+20 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51650
 zfs:zio_next_stage_async+bb ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51670
 zfs:zio_nowait+11 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51960
 zfs:dbuf_sync_leaf+1ac ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b519a0
 zfs:dbuf_sync_list+51 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a10
 zfs:dnode_sync+23b ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a50
 zfs:dmu_objset_sync_dnodes+55 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51ad0
 zfs:dmu_objset_sync+13d ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51b40
 zfs:dsl_pool_sync+199 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51bd0
 zfs:spa_sync+1c5 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c60
 zfs:txg_sync_thread+19a ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c70
 unix:thread_start+8 ()
 Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]
 

   
 Is this a known issue, already fixed in a later build, or should I bug it?
 

 It shouldn't panic the machine, no. I'd raise a bug.

   
 After spending a little time playing with iscsi, I have to say it's
 almost inevitable that someone is going to do this by accident and panic
 a big box for what I see as no good reason. (though I'm happy to be
 educated... ;)
 

 You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
 access the same LUN by accident. You'd have the same problem with
 Fibre Channel SANs.
   
I ran into similar problems when replicating via AVS.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for OSX - it'll be in there.

2007-10-04 Thread Ben Rockwood
Dale Ghent wrote:
 ...and eventually in a read-write capacity:

 http://www.macrumors.com/2007/10/04/apple-seeds-zfs-read-write- 
 developer-preview-1-1-for-leopard/

 Apple has seeded version 1.1 of ZFS (Zettabyte File System) for Mac  
 OS X to Developers this week. The preview updates a previous build  
 released on June 26, 2007.
   

Y!  Finally my USB Thumb Drives will work on my MacBook! :)

I wonder if it'll automatically mount the Zpool on my iPod when I sync it.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System hang caused by a bad snapshot

2007-09-18 Thread Ben Miller
   Hello Matthew,
   Tuesday, September 12, 2006, 7:57:45 PM, you
  wrote:
   MA Ben Miller wrote:
I had a strange ZFS problem this morning.
  The
   entire system would
hang when mounting the ZFS filesystems.  After
   trial and error I
   determined that the problem was with one of
  the
   2500 ZFS filesystems.
   When mounting that users' home the system
  would
   hang and need to be
rebooted.  After I removed the snapshots (9 of
   them) for that
filesystem everything was fine.

I don't know how to reproduce this and didn't
  get
   a crash dump.  I
don't remember seeing anything about this
  before
   so I wanted to
report it and see if anyone has any ideas.
   
  MA Hmm, that sounds pretty bizarre, since I
  don't
   think that mounting a 
   MA filesystem doesn't really interact with
  snapshots
   at all. 
   MA Unfortunately, I don't think we'll be able to
   diagnose this without a 
   MA crash dump or reproducibility.  If it happens
   again, force a crash dump
   MA while the system is hung and we can take a
  look
   at it.
   
   Maybe it wasn't hung after all. I've seen similar
   behavior here
   sometimes. Did your disks used in a pool were
   actually working?
   
  
  There was lots of activity on the disks (iostat and
 status LEDs) until it got to this one filesystem
  and
  everything stopped.  'zpool iostat 5' stopped
  running, the shell wouldn't respond and activity on
  the disks stopped.  This fs is relatively small
(175M used of a 512M quota).
  Sometimes it takes a lot of time (30-50minutes) to
   mount a file system
   - it's rare, but it happens. And during this ZFS
   reads from those
   disks in a pool. I did report it here some time
  ago.
   
  In my case the system crashed during the evening
  and it was left hung up when I came in during the
   morning, so it was hung for a good 9-10 hours.
 
 The problem happened again last night, but for a
 different users' filesystem.  I took a crash dump
 with it hung and the back trace looks like this:
  ::status
 debugging crash dump vmcore.0 (64-bit) from hostname
 operating system: 5.11 snv_40 (sun4u)
 panic message: sync initiated
 dump content: kernel pages only
  ::stack
 0xf0046a3c(f005a4d8, 2a100047818, 181d010, 18378a8,
 1849000, f005a4d8)
 prom_enter_mon+0x24(2, 183c000, 18b7000, 2a100046c61,
 1812158, 181b4c8)
 debug_enter+0x110(0, a, a, 180fc00, 0, 183e000)
 abort_seq_softintr+0x8c(180fc00, 18abc00, 180c000,
 2a100047d98, 1, 1859800)
 intr_thread+0x170(600019de0e0, 0, 6000d7bfc98,
 600019de110, 600019de110, 
 600019de110)
 zfs_delete_thread_target+8(600019de080,
 , 0, 600019de080, 
 6000d791ae8, 60001aed428)
 zfs_delete_thread+0x164(600019de080, 6000d7bfc88, 1,
 2a100c4faca, 2a100c4fac8, 
 600019de0e0)
 thread_start+4(600019de080, 0, 0, 0, 0, 0)
 
 In single user I set the mountpoint for that user to
 be none and then brought the system up fine.  Then I
 destroyed the snapshots for that user and their
 filesystem mounted fine.  In this case the quota was
 reached with the snapshots and 52% used without.
 
 Ben

Hate to re-open something from a year ago, but we just had this problem happen 
again.  We have been running Solaris 10u3 on this system for a while.  I 
searched the bug reports, but couldn't find anything on this.  I also think I 
understand what happened a little more.  We take snapshots at noon and the 
system hung up during that time.  When trying to reboot, the system would hang 
on the ZFS mounts.  After I booted into single user and removed the snapshot from 
the filesystem causing the problem, everything was fine.  The filesystem in 
question is at 100% use with snapshots in place.

Here's the back trace for the system when it was hung:
 ::stack
0xf0046a3c(f005a4d8, 2a10004f828, 0, 181c850, 1848400, f005a4d8)
prom_enter_mon+0x24(0, 0, 183b400, 1, 1812140, 181ae60)
debug_enter+0x118(0, a, a, 180fc00, 0, 183d400)
abort_seq_softintr+0x94(180fc00, 18a9800, 180c000, 2a10004fd98, 1, 1857c00)
intr_thread+0x170(2, 30007b64bc0, 0, c001ed9, 110, 6000240)
0x985c8(300adca4c40, 0, 0, 0, 0, 30007b64bc0)
dbuf_hold_impl+0x28(60008cd02e8, 0, 0, 0, 7b648d73, 2a105bb57c8)
dbuf_hold_level+0x18(60008cd02e8, 0, 0, 7b648d73, 0, 0)
dmu_tx_check_ioerr+0x20(0, 60008cd02e8, 0, 0, 0, 7b648c00)
dmu_tx_hold_zap+0x84(60011fb2c40, 0, 0, 0, 30049b58008, 400)
zfs_rmnode+0xc8(3002410d210, 2a105bb5cc0, 0, 60011fb2c40, 30007b3ff58, 
30007b56ac0)
zfs_delete_thread+0x168(30007b56ac0, 3002410d210, 69a4778, 30007b56b28, 
2a105bb5aca, 2a105bb5ac8)
thread_start+4(30007b56ac0, 0, 0, 489a48, d83a10bf28, 50386)

Has this been fixed in more recent code?  I can make the crash dump available.

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there _any_ suitable motherboard?

2007-08-26 Thread Ben Middleton
I've just purchased an Asus P5K WS, which seems to work OK. I had to download
the Marvell Yukon ethernet driver - but it's all working fine. It's also got a
PCI-X slot - so I have one of those Super Micro 8-port SATA cards - providing a
total of 16 SATA ports across the system. Other specs are one of those Intel
E6750 1333MHz FSB CPUs and 2GB of matched memory.

Ben.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, iSCSI + Mac OS X Tiger (globalSAN iSCSI)

2007-07-05 Thread Ben Rockwood
George wrote:
 I have set up an iSCSI ZFS target that seems to connect properly from 
 the Microsoft Windows initiator in that I can see the volume in MMC 
 Disk Management.

  
 When I shift over to Mac OS X Tiger with globalSAN iSCSI, I am able to 
 set up the Targets with the target name shown by `iscsitadm list 
 target` and when I actually connect or Log On I see that one 
 connection exists on the Solaris server.  I then go on to the Sessions 
 tab in globalSAN and I see the session details and it appears that 
 data is being transferred via the PDUs Sent, PDUs Received, Bytes, 
 etc.  HOWEVER the connection then appears to terminate on the Solaris 
 side if I check it a few minutes later it shows no connections, but 
 the Mac OS X initiator still shows connected although no more traffic 
 appears to be flowing in the Session Statistics dialog area.

  
 Additionally, when I then disconnect the Mac OS X initiator it seems 
 to drop fine on the Mac OS X side, even though the Solaris side has 
 shown it gone for a while, however when I reconnect or Log On again, 
 it seems to spin infinitely on the Target Connect... dialog. 
  Solaris is, interestingly, showing 1 connection while this apparent 
 issue (spinning beachball of death) is going on with globalSAN.  Even 
 killing the Mac OS X process doesn't seem to get me full control again 
 as I have to restart the system to kill all processes (unless I can 
 hunt them down and `kill -9` them which I've not successfully done 
 thus far).

 Has anyone dealt with this before and perhaps be able to assist or at 
 least throw some further information towards me to troubleshoot this?

When I learned of the globalSAN Initiator I was overcome with joy.  After
about two days of spending way too much time with it I gave up.
Have a look at their forum
(http://www.snsforums.com/index.php?s=b0c9031ebe1a89a40cfe4c417e3443f1&showforum=14).
There are a wide range of problems.  In my case connections to the
target (Solaris/ZFS/iscsitgt) look fine and dandy initially, but you can't
actually use the connection, on reboot globalSAN goes psycho, etc.

At this point I've given up on the product; at least for now.  If I
could actually get an accessible disk at least part of the time I'd dig
my fingers into it, but it doesn't offer a usable remote disk to begin
with, and in a variety of other environments it has identical problems.
I consider debugging it to be purely academic at this point.  It's a
great way to gain insight into the inner workings of iSCSI, but without
source code or DTrace on the Mac it's hard to expect any big gains.

That's my personal take.  If you really wanna go hacking on it regardless,
bring it up on the Storage list and we can collectively enjoy the
academic challenge of finding the problems, but there is nothing to
suggest it's an OpenSolaris issue.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Send/RECV

2007-06-01 Thread Ben Bressler
I'm trying to test an install of ZFS to see if I can back up data from one
machine to another.  I'm using Solaris 5.10 on two VMware installs.

When I do the zfs send | ssh zfs recv part, the file system (folder) is getting 
created, but none of the data that I have in my snapshot is sent.  I can browse 
on the source machine to view the snapshot data pool/.zfs/snapshot/snap-name 
and I see the data.  

Am I missing something to make it copy all of the data?
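
For reference, a send/receive pair that actually carries the snapshot data
over typically looks like this (a minimal sketch -- pool, filesystem and host
names are made up):

# zfs snapshot tank/data@backup1
# zfs send tank/data@backup1 | ssh otherhost zfs recv backuppool/data

zfs send only takes a snapshot name, so if the receiving side ends up with an
empty filesystem it is worth double-checking that the stream referenced the
@snapshot and that zfs recv did not print an error that got lost in the ssh
pipeline.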
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZVol Panic on 62

2007-05-25 Thread Ben Rockwood
May 25 23:32:59 summer unix: [ID 836849 kern.notice]
May 25 23:32:59 summer ^Mpanic[cpu1]/thread=1bf2e740:
May 25 23:32:59 summer genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf 
Page fault) rp=ff00232c3a80 addr=490 occurred in module unix due to a 
NULL pointer dereference
May 25 23:32:59 summer unix: [ID 10 kern.notice]
May 25 23:32:59 summer unix: [ID 839527 kern.notice] grep:
May 25 23:32:59 summer unix: [ID 753105 kern.notice] #pf Page fault
May 25 23:32:59 summer unix: [ID 532287 kern.notice] Bad kernel fault at 
addr=0x490
May 25 23:32:59 summer unix: [ID 243837 kern.notice] pid=18425, 
pc=0xfb83b6bb, sp=0xff00232c3b78, eflags=0x10246
May 25 23:32:59 summer unix: [ID 211416 kern.notice] cr0: 
8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de
May 25 23:32:59 summer unix: [ID 354241 kern.notice] cr2: 490 cr3: 1fce52000 
cr8: c
May 25 23:32:59 summer unix: [ID 592667 kern.notice]rdi:  490 
rsi:0 rdx: 1bf2e740
May 25 23:32:59 summer unix: [ID 592667 kern.notice]rcx:0  
r8:d  r9: 62ccc700
May 25 23:32:59 summer unix: [ID 592667 kern.notice]rax:0 
rbx:0 rbp: ff00232c3bd0
May 25 23:32:59 summer unix: [ID 592667 kern.notice]r10: fc18 
r11:0 r12:  490
May 25 23:32:59 summer unix: [ID 592667 kern.notice]r13:  450 
r14: 52e3aac0 r15:0
May 25 23:32:59 summer unix: [ID 592667 kern.notice]fsb:0 
gsb: fffec3731800  ds:   4b
May 25 23:32:59 summer unix: [ID 592667 kern.notice] es:   4b  
fs:0  gs:  1c3
May 25 23:33:00 summer unix: [ID 592667 kern.notice]trp:e 
err:2 rip: fb83b6bb
May 25 23:33:00 summer unix: [ID 592667 kern.notice] cs:   30 
rfl:10246 rsp: ff00232c3b78
May 25 23:33:00 summer unix: [ID 266532 kern.notice] ss:   38
May 25 23:33:00 summer unix: [ID 10 kern.notice]
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3960 
unix:die+c8 ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3a70 
unix:trap+135b ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3a80 
unix:cmntrap+e9 ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3bd0 
unix:mutex_enter+b ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3c20 
zfs:zvol_read+51 ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3c50 
genunix:cdev_read+3c ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3cd0 
specfs:spec_read+276 ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3d40 
genunix:fop_read+3f ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3e90 
genunix:read+288 ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3ec0 
genunix:read32+1e ()
May 25 23:33:00 summer genunix: [ID 655072 kern.notice] ff00232c3f10 
unix:brand_sys_syscall32+1a3 ()
May 25 23:33:00 summer unix: [ID 10 kern.notice]
May 25 23:33:00 summer genunix: [ID 672855 kern.notice] syncing file systems...


Does anyone have an idea of what bug this might be?  Occurred on X86 B62.  I'm 
not seeing any putbacks into 63 or bugs that seem to match.

Any insight is appreciated.  Cores are available.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New zfs pr0n server :)))

2007-05-19 Thread Ben Rockwood

Diego Righi wrote:

Hi all, I just built a new zfs server for home and, being a long time and avid 
reader of this forum, I'm going to post my config specs and my benchmarks 
hoping this could be of some help for others :)

http://www.sickness.it/zfspr0nserver.jpg
http://www.sickness.it/zfspr0nserver.txt
http://www.sickness.it/zfspr0nserver.png
http://www.sickness.it/zfspr0nserver.pdf

Correct me if I'm wrong: from the benchmark results, I understand that this
setup is slow at writing, but fast at reading (and this is perfect for my
usage, copying large files once and then accessing them only to read). It also
seems that 128KB gives the best performance, IIRC due to the ZFS record size
(again, correct me if I'm wrong :).

I'd happily try any other test, but if you suggest bonnie++ please tell me
which version is the right one to use; there are too many of them and I really
can't tell which to try!

tnx :)
 


Classy.  +1 for style. ;)

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Remove files when at quota limit

2007-05-15 Thread Ben Miller
Has anyone else run into this situation?  Does anyone have any solutions other 
than removing snapshots or increasing the quota?  I'd like to put in an RFE to 
reserve some space so files can be removed when users are at their quota.  Any 
thoughts from the ZFS team?

Ben

 We have around 1000 users all with quotas set on
 their ZFS filesystems on Solaris 10 U3.  We take
 snapshots daily and rotate out the week old ones.
 The situation is that some users ignore the advice
 of keeping space used below 80% and keep creating
 large temporary files.  They then try to remove
 files when the space used is 100% and get over quota
 messages.  We then need to remove some or all of
 their snapshots to free space.  Is there anything
 being worked on to keep some space reserved so files
 can be removed when at the quota limit or some other
 solution?  What are other people doing is this
 situation?  We have also set up alternate
 filesystems for users with transient data that we do
 not take snapshots on, but we still have this
  problem on home directories.
 
 thanks,
 Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Remove files when at quota limit

2007-05-10 Thread Ben Miller
We have around 1000 users all with quotas set on their ZFS filesystems on 
Solaris 10 U3.  We take snapshots daily and rotate out the week old ones.  The 
situation is that some users ignore the advice of keeping space used below 80% 
and keep creating large temporary files.  They then try to remove files when 
the space used is 100% and get over quota messages.  We then need to remove 
some or all of their snapshots to free space.  Is there anything being worked 
on to keep some space reserved so files can be removed when at the quota limit 
or some other solution?  What are other people doing in this situation?  We
have also set up alternate filesystems for users with transient data that we do 
not take snapshots on, but we still have this problem on home directories.
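
For the record, the manual workaround we use today amounts to one of these
(a sketch -- the filesystem, quota and snapshot names are made up):

# zfs set quota=600m pool1/home/jdoe
  (user removes the offending files, which now succeeds)
# zfs set quota=512m pool1/home/jdoe

or

# zfs destroy pool1/home/jdoe@daily-2007-05-03

i.e. either bump the quota just long enough for the rm to succeed and then put
it back, or destroy the oldest snapshot(s) so the freed blocks actually return
to the filesystem.  Neither is something we want to keep doing by hand, hence
the question.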

thanks,
Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: ZFS disables nfs/server on a host

2007-04-27 Thread Ben Miller
I just threw in a truss in the SMF script and rebooted the test system and it 
failed again.
The truss output is at http://www.eecis.udel.edu/~bmiller/zfs.truss-Apr27-2007

thanks,
Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS disables nfs/server on a host

2007-04-26 Thread Ben Miller
I was able to duplicate this problem on a test Ultra 10.  I put in a workaround 
by adding a service that depends on /milestone/multi-user-server which does a 
'zfs share -a'.  It's strange this hasn't happened on other systems, but maybe 
it's related to slower systems...
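
For anyone who wants the same workaround, it boils down to a one-line start
method plus a manifest that declares the dependency (a rough sketch -- the
script path and service name here are made up):

#!/sbin/sh
# /lib/svc/method/zfs-share-all: re-share all ZFS filesystems late in boot
/usr/sbin/zfs share -a
exit 0

imported and enabled with something like:

# svccfg import /var/svc/manifest/site/zfs-share-all.xml
# svcadm enable site/zfs-share-all

where the manifest declares a require_all dependency on
svc:/milestone/multi-user-server so the 'zfs share -a' runs after everything
else is up.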

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS disables nfs/server on a host

2007-04-19 Thread Ben Miller
It does seem like an ordering problem, but nfs/server should be starting up 
late enough with SMF dependencies.  I need to see if I can duplicate the 
problem on a test system...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] [OT] Multipathing on Mac OS X

2007-03-01 Thread Ben Gollmer
This is pretty OT, but a bit ago there was some discussion of Mac OS
X's multipathing support (or its lack thereof). According to this
technote, multipathing support has been included in Mac OS X since
10.3.5, but there are some particular requirements on the target
devices and HBAs.


http://developer.apple.com/technotes/tn2007/tn2173.html

--
Ben




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snapdir visable recursively throughout a dataset

2007-02-06 Thread Ben Rockwood

Darren J Moffat wrote:

Ben Rockwood wrote:

Robert Milkowski wrote:

I haven't tried it but what if you mounted ro via loopback into a zone


/zones/myzone01/root/.zfs is loop mounted in RO to /zones/myzone01/.zfs
  


That is so wrong. ;)

Besides just being evil, I doubt it'd work.  And if it does, it
probably shouldn't.  I think I'm the only one that gets a rash when
using LOFI.


lofi or lofs ?

lofi - Loopback file driver
Makes a block device from a file
lofs - loopback virtual file system
Makes a file system from a file system


Yes, I know.  I was referring more so to loopback happy people in 
general. :)


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Read Only Zpool: ZFS and Replication

2007-02-05 Thread Ben Rockwood
I've been playing with replication of a ZFS Zpool using the recently released 
AVS.  I'm pleased with things, but just replicating the data is only part of 
the problem.  The big question is: can I have a zpool open in 2 places?  

What I really want is a Zpool on node1 open and writable (production storage) 
and a replicated to node2 where its open for read-only access (standby storage).

This is an old problem.  I'm not sure its remotely possible.  Its bad enough 
with UFS, but ZFS maintains a hell of a lot more meta-data.  How is node2 
supposed to know that a snapshot has been created for instance.  With UFS you 
can at least get by some of these problems using directio, but thats not an 
option with a zpool.

I know this is a fairly remedial issue to bring up... but if I think about what 
I want Thumper-to-Thumper replication to look like, I want 2 usable storage 
systems.  As I see it now the secondary storage (node2) is useless untill you 
break replication and import the pool, do your thing, and then re-sync storage 
to re-enable replication.  

Am I missing something?  I'm hoping there is an option I'm not aware of.
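
For comparison, the only way I can see to look at the data on node2 today is
the break/import/export dance, roughly (a sketch -- the pool name is made up
and the AVS sndradm steps are deliberately elided):

(on node2, after putting the replica set into logging mode with sndradm)
# zpool import -f tank
(read whatever you need)
# zpool export tank
(resume replication with sndradm and let it resynchronize)

The import has to be forced because the pool was last active on node1, and
nothing on node2 can safely read the pool while blocks are still changing
underneath it -- which is exactly the limitation described above.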

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] snapdir visable recursively throughout a dataset

2007-02-05 Thread Ben Rockwood
Is there an existing RFE for, what I'll wrongly call, recursively visible
snapshots?  That is, .zfs in directories other than the dataset root.

Frankly, I don't need it available in all directories, although it'd be nice,
but I do have a need for making it visible one dir down from the dataset root.
The problem is that while ZFS and Zones work smoothly together for moving,
cloning, sizing, etc., you can't view .zfs/ from within the zone because the
zone root is one dir down:

/zones   -- Dataset
/zones/myzone01  -- Dataset, .zfs is located here.
/zones/myzone01/root -- Directory, want .zfs Here!

The ultimate idea is to make ZFS snapdirs accessible from within the zone.
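
The loopback hack discussed elsewhere in this thread would amount to something
like (a sketch only -- zone name as in the layout above, and I still doubt the
pseudo .zfs directory will tolerate it):

# mount -F lofs -o ro /zones/myzone01/.zfs /zones/myzone01/root/.zfs

i.e. expose the dataset's snapshot directory read-only at the spot inside the
zone root where I actually want it.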

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs / nfs issue (not performance :-) with courier-imap

2007-01-25 Thread Ben Rockwood

Robert Milkowski wrote:

CLSNL but if I click, say E, it has F's contents, F has Gs contents, and no
CLSNL mail has D's contents that I can see.  But the list in the mail
CLSNL client list view is correct.

I don't believe it's a problem with the nfs/zfs server.

Please try a simple dtrace script to see (or even truss) what files
your imapd actually opens when you click E - I don't believe it opens E
and you get F contents, I would bet it opens F.
  


I completely agree with Robert.  I'd personally suggest 'truss' to start
because it's trivial to use, then start using DTrace to further hone down
the problem.

In the case of Courier-IMAP the best way to go about it would be to
truss the parent (courierlogger, which calls courierlogin and ultimately
imapd) using 'truss -f -p PID'.  Then open the mailbox and watch
those stats and opens closely.
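
Something along these lines is what I mean (a sketch -- it assumes a single
courierlogger process, otherwise pick the PID by hand, and adjust the syscall
list and the grep to your mail path):

# truss -f -t open,stat,read -p `pgrep -x courierlogger` 2>&1 | grep Maildir

-f follows the forked courierlogin/imapd children, and the grep narrows the
output to the message files actually being touched.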


I'll be very interested in your findings.  We use Courier on NFS/ZFS 
heavily and I'm thankful to report having no such problems.


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS extra slow?

2007-01-02 Thread Ben Rockwood

Brad Plecs wrote:
I had a user report extreme slowness on a ZFS filesystem mounted over NFS over the weekend. 
After some extensive testing, the extreme slowness appears to only occur when a ZFS filesystem is mounted over NFS.  

One example is doing a 'gtar xzvf php-5.2.0.tar.gz'... over NFS onto a ZFS filesystem.  this takes: 


real5m12.423s
user0m0.936s
sys 0m4.760s

Locally on the server (to the same ZFS filesystem) takes: 


real0m4.415s
user0m1.884s
sys 0m3.395s

The same job over NFS to a UFS filesystem takes

real1m22.725s
user0m0.901s
sys 0m4.479s

Same job locally on server to same UFS filesystem: 


real0m10.150s
user0m2.121s
sys 0m4.953s


This is easily reproducible even with single large files, but the multiple small files
seem to illustrate some awful sync latency between each file.

Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an fsync penalty, 
but they don't seem relevant since the local ZFS performance is quite good.
  


Known issue, discussed here:
http://www.opensolaris.org/jive/thread.jspa?threadID=14696&tstart=15
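
If you want to confirm that per-file synchronous commits are what hurts,
counting ZIL commits during the untar gives a quick signal (a sketch -- it
assumes the zil_commit function is visible to fbt on your build):

# dtrace -n 'fbt::zil_commit:entry { @ = count(); } tick-60s { exit(0); }'

A commit count in the same ballpark as the number of files extracted points
straight at the NFS sync-on-create/close semantics discussed in that thread.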


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS

2006-12-20 Thread Ben Rockwood
Andrew Summers wrote:
 So, I've read the wikipedia, and have done a lot of research on google about 
 it, but it just doesn't make sense to me.  Correct me if I'm wrong, but you 
 can take a simple 5/10/20 GB drive or whatever size, and turn it into 
 exabytes of storage space?

 If that is not true, please explain the importance of this other than the 
 self heal and those other features.
   

I'm probably to blame for the image of endless storage.  With ZFS Sparse
Volumes (aka: Thin Provisioning) you can make a 1G drive _look_ like a
500TB drive, but of course it isn't.  See my entry on the topic here: 
http://www.cuddletech.com/blog/pivot/entry.php?id=729

With ZFS compression you can, however, potentially store 10GB of data on
a 5GB drive. It really depends on what type of data you're storing and
how compressible it is, but I've seen almost 2:1 compression in some
cases by simply turning compression on.
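
At the command line the two features look like this (a sketch -- pool and
dataset names are made up):

# zfs create -s -V 500t tank/bigvol
# zfs set compression=on tank/data
# zfs get compressratio tank/data

The -s makes the 500TB volume sparse, so it consumes almost nothing until you
write to it, and compressratio is what tells you whether your data really gets
anywhere near that 2:1 figure.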

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS works in waves

2006-12-15 Thread Ben Rockwood

Stuart Glenn wrote:
A little back story: I have a Norco DS-1220, a 12-bay SATA box, connected
via eSATA (SiI3124) on PCI-X; two drives are straight connections, then the
other two ports go to 5x multipliers within the box. My needs/hopes for this
were to use 12 500GB drives and ZFS to make a very large and simple data dump
spot on my network for other servers to rsync to daily, and to use zfs
snapshots for some quick backup; if things worked out, start saving up towards
getting a thumper someday.

The trouble is it is too slow to really be usable. At times it is fast
enough to be usable, ~13MB/s write. However, this lasts for only a
few minutes. It then just stalls, doing nothing. iostat shows 100%
blocking for one of the drives in the pool.

I can however use dd to read or write directly to/from the disks all
at the same time with good speed (~30MB/s according to dd).

The test pools I have had are either 2 raidz of 6 drives or 3 raidz of
4 drives. The system is using an Athlon 64 3500+ and 1GB of RAM.

Any suggestions on what I could do to make this usable? More RAM? Too
many drives for ZFS? Any tests to find the real slowdown?

I would really like to use ZFS and Solaris for this. Linux was able to
use the same hardware using some beta kernel modules for the SATA
multipliers and its software raid at an acceptable speed, but I would
like to finally rid my network of linux boxen.


I have similar issues on my home workstation.  They started happening 
when I put Seagate SATA-II drives with NCQ on a SI3124.  I do not 
believe this to be an issue with ZFS.  I've largely dismissed the issue 
as hardware caused, although I may be wrong.   This system has had 
several problems with SATA-II drives which hardware forums suggest are 
issues with the nForce4 chipset and SATA-II.


Anyway, you're not alone, but it's not a ZFS issue.  It's possible a tunable
parameter in the SATA drivers would help.  If I find an answer I'll let
you know.


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [nfs-discuss] Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-11 Thread Ben Rockwood

Robert Milkowski wrote:

Hello eric,

Saturday, December 9, 2006, 7:07:49 PM, you wrote:

ek Jim Mauro wrote:
  
Could be NFS synchronous semantics on file create (followed by 
repeated flushing of the write cache).  What kind of storage are you 
using (feel free to send privately if you need to) - is it a thumper? 

It's not clear why NFS-enforced synchronous semantics would induce 
different behavior than the same

load to a local ZFS.
  


ek Actually i forgot he had 'zil_disable' turned on, so it won't matter in
ek this case.


Ben, are you sure zil_disable was set to 1 BEFORE pool was imported?
  


Yes, absolutely.  Set the variable in /etc/system, reboot, system comes up.
That happened almost 2 months ago, long before this lock insanity problem
popped up.

To be clear, the ZIL issue was a problem for creation of a handful of
files of any size.  Untar'ing a file was a massive performance drain.
This issue, on the other hand, deals with thousands of little files
being created all the time (IMAP locks).  These are separate issues from
my point of view.  With ZIL slowness NFS performance was just slow but
we didn't see massive CPU usage; with this issue, on the other hand, we
were seeing waves in 10-second-ish cycles where the run queue would go
sky high with 0% idle.  Please see the earlier mails for examples of the
symptoms.


benr.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Ben Rockwood

Spencer Shepler wrote:

Good to hear that you have figured out what is happening, Ben.

For future reference, there are two commands that you may want to
make use of in observing the behavior of the NFS server and individual
filesystems.

There is the trusty, nfsstat command.  In this case, you would have been
able to do something like:
nfsstat -s -v3 60

This will provide all of the server side NFSv3 statistics on 60 second
intervals.  


Then there is a new command fsstat that will provide vnode level
activity on a per filesystem basis.  Therefore, if the NFS server
has multiple filesystems active and you want ot look at just one
something like this can be helpful:

fsstat /export/foo 60

Fsstat has a 'full' option that will list all of the vnode operations
or just certain types.  It also will watch a filesystem type (e.g. zfs, nfs).
Very useful.
  



nfsstat I've been using, but fsstat I was unaware of.  Wish I'd used it
rather than duplicating most of its functionality with a D script. :)


Thanks for the tip.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Ben Rockwood

Bill Moore wrote:

On Fri, Dec 08, 2006 at 12:15:27AM -0800, Ben Rockwood wrote:
  
Clearly ZFS file creation is just amazingly heavy even with ZIL 
disabled.  If creating 4,000 files in a minute squashes 4 2.6Ghz Opteron 
cores we're in big trouble in the longer term.  In the meantime I'm 
going to find a new home for our IMAP Mail so that the other things 
served from that NFS server at least aren't effected.



For local tests, this is not true of ZFS.  It seems that file creation
only swamps us when coming over NFS.  We can do thousands of files a
second on a Thumper with room to spare if NFS isn't involved.

Next step is to figure out why NFS kills us.
  


Agreed.  If mass file creation was a problem locally I'd think that we'd 
have people beating down the doors with complaints.


One thought I had as a workaround was to move all my mail on NFS to an
iSCSI LUN and then put a zpool on that.  I'm willing to bet that'd work
fine.  Hopefully I can try it.



To round out the discussion, the root cause of this whole mess was
Courier IMAP locking.  After isolating the problem last night and
writing a little D script to find out what files were being created, it
was obviously lock files; once we turned off locking, file creations dropped
to a reasonable level and our problem vanished.


If I can help at all with testing or analysis please let me know.


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-08 Thread Ben Rockwood

eric kustarz wrote:
So i'm guessing there's lots of files being created over NFS in one 
particular dataset?


We should figure out how many creates/second you are doing over NFS (i 
should have put a timeout on the script).  Here's a real simple one 
(from your snoop it looked like you're only doing NFSv3, so i'm not 
tracking NFSv4):


#!/usr/sbin/dtrace -s

rfs3_create:entry,
zfs_create:entry
{
@creates[probefunc] = count();
}

tick-60s
{
exit(0);
}




Eric, I love you. 

Running this bit of DTrace revealed more than 4,000 files being created
in almost any given 60-second window.  And I've only got one system that
would fit that sort of mass file creation: our Joyent Connector product's
Courier IMAP server, which uses Maildir.  As a test I simply shut down
Courier and unmounted the mail NFS share for good measure, and sure
enough the problem vanished and could not be reproduced.  10 minutes
later I re-enabled Courier and our problem came back.

Clearly ZFS file creation is just amazingly heavy even with ZIL
disabled.  If creating 4,000 files in a minute squashes four 2.6GHz Opteron
cores we're in big trouble in the longer term.  In the meantime I'm
going to find a new home for our IMAP mail so that the other things
served from that NFS server at least aren't affected.


You asked for the zpool and zfs info, which I don't want to share
because it's confidential (if you want it privately I'll do so, but not
on a public list), but I will say that it's a single massive zpool in
which we're using less than 2% of the capacity.  But in thinking about
this problem, even if we used 2 or more pools, the CPU consumption still
would have choked the system, right?  This leaves me really nervous
about what we'll do when it's not an internal mail server that's creating
all those files but a customer.

Oddly enough, this might be a very good reason to use iSCSI instead of
NFS on the Thumper.
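
The iSCSI variant would look roughly like this on the Thumper (a sketch --
the volume and target names are made up, and the iscsitadm syntax may differ
slightly on this build):

# zfs create -V 200g pool/mailvol
# iscsitadm create target -b /dev/zvol/rdsk/pool/mailvol mailtarget

The mail host would then log into that target and put its own filesystem (or
zpool) on the LUN, so the thousands of per-file NFS creates turn into plain
block writes from the Thumper's point of view.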


Eric, I owe you a couple cases of beer for sure.  I can't tell you how 
much I appreciate your help.  Thanks to everyone else who chimed in with 
ideas and suggestions, all of you guys are the best!


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Ben Rockwood
I've got a Thumper doing nothing but serving NFS.  It's using B43 with
zil_disabled.  The system is being consumed in waves, but by what I don't know.
Notice vmstat:

 3 0 0 25693580 2586268 0 0  0  0  0  0  0  0  0  0  0  926   91  703  0 25 75
 21 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 13 14 1720   21 1105  0 92  8
 20 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 17 18 2538   70  834  0 100 0
 25 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0  0  0  745   18  179  0 100 0
 37 0 0 25693552 2586240 0 0 0  0  0  0  0  0  0  7  7 1152   52  313  0 100 0
 16 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0 15 13 1543   52  767  0 100 0
 17 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0  2  2  890   72  192  0 100 0
 27 0 0 25693572 2586260 0 0 0  0  0  0  0  0  0 15 15 3271   19 3103  0 98  2
 0 0 0 25693456 2586144 0 11 0  0  0  0  0  0  0 281 249 34335 242 37289 0 46 54
 0 0 0 25693448 2586136 0 2  0  0  0  0  0  0  0  0  0 2470  103 2900  0 27 73
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1062  105  822  0 26 74
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1076   91  857  0 25 75
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0  917  126  674  0 25 75

These spikes of sys load come in waves like this.  While there are close to a 
hundred systems mounting NFS shares on the Thumper, the amount of traffic is 
really low.  Nothing to justify this.  We're talking less than 10MB/s.

NFS is pathetically slow.  We're using NFSv3 TCP shared via ZFS sharenfs on a 
3Gbps aggregation (3*1Gbps).

I've been slamming my head against this problem for days and can't make 
headway.  I'll post some of my notes below.  Any thoughts or ideas are welcome!

benr.

===

Step 1 was to disable any ZFS features that might consume large amounts of CPU:

# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous

These changes had no effect.

Next was to consider that perhaps NFS was doing name lookups when it shouldn't.
Indeed dns was specified in /etc/nsswitch.conf, which won't work given that no
DNS servers are accessible from the storage or private networks, but again, no
improvement. In this process I removed dns from nsswitch.conf, deleted
/etc/resolv.conf, and disabled the dns/client service in SMF.
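
For the record, that last step is just:

# svcadm disable svc:/network/dns/client:default

with nsswitch.conf and resolv.conf edited and removed by hand.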

Turning back to CPU usage, we can see the activity is all SYStem time and comes 
in waves:

[private:/tmp] root# sar 1 100

SunOS private.thumper1 5.11 snv_43 i86pc12/07/2006

10:38:05    %usr    %sys    %wio   %idle
10:38:06   0  27   0  73
10:38:07   0  27   0  73
10:38:09   0  27   0  73
10:38:10   1  26   0  73
10:38:11   0  26   0  74
10:38:12   0  26   0  74
10:38:13   0  24   0  76
10:38:14   0   6   0  94
10:38:15   0   7   0  93
10:38:22   0  99   0   1  --
10:38:23   0  94   0   6  --
10:38:24   0  28   0  72
10:38:25   0  27   0  73
10:38:26   0  27   0  73
10:38:27   0  27   0  73
10:38:28   0  27   0  73
10:38:29   1  30   0  69
10:38:30   0  27   0  73

And so we consider whether or not there is a pattern to the frequency. The 
following is sar output from any lines in which sys is above 90%:

10:40:04    %usr    %sys    %wio   %idle    Delta
10:40:11       0      97       0       3
10:40:45       0      98       0       2    34 seconds
10:41:02       0      94       0       6    17 seconds
10:41:26       0     100       0       0    24 seconds
10:42:00       0     100       0       0    34 seconds
10:42:25    (end of sample)                 25 seconds

Looking at the congestion in the run queue:

[private:/tmp] root# sar -q 5 100

10:45:43  runq-sz  %runocc  swpq-sz  %swpocc
10:45:51     27.0       85      0.0        0
10:45:57      1.0       20      0.0        0
10:46:02      2.0       60      0.0        0
10:46:13     19.8       99      0.0        0
10:46:23     17.7       99      0.0        0
10:46:34     24.4       99      0.0        0
10:46:41     22.1       97      0.0        0
10:46:48     13.0       96      0.0        0
10:46:55     25.3      102      0.0        0

Looking at the per-CPU breakdown:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   324  224000  1540 00 100   0   0
  10   00   1140  2260   10   130860   1   0  99
  20   00   162  138  1490540 00   1   0  99
  30   00556   460430 00   1   0  99
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   310  210   340   17  1717 50 100   0   0
  10   00   1521  2000   17   265591  65   0  34
  20   00   271  197  1751   13   202 00  66   0  34
  30   00   120

Re: [zfs-discuss] (OT: SVN branches) A versioning FS

2006-10-07 Thread Ben Gollmer

On Oct 6, 2006, at 12:18 PM, David Dyer-Bennet wrote:

On 10/5/06, Wee Yeh Tan [EMAIL PROTECTED] wrote:

On 10/6/06, David Dyer-Bennet [EMAIL PROTECTED] wrote:
 One of the big problems with CVS and SVN and Microsoft  
SourceSafe is
 that you don't have the benefits of version control most of the  
time,

 because all commits are *public*.

David,

That is exactly what branch is for in CVS and SVN.  Dunno much  
about

M$ SourceSafe.


I've never encountered branch being used that way, anywhere.  It's
used for things like developing release 2.0 while still supporting 1.5
and 1.6.

However, especially with merge in svn it might be feasible to use a
branch that way.  What's the operation to update the branch from the
trunk in that scenario?


We use personal branches all the time; in fact each developer has at  
least one, sometimes several if they are working on orthogonal issues  
or experimenting with a couple of different approaches to the same  
problem. Personal branches are for messy code, unfinished patches -  
basically anything that took longer than 15 minutes to write. Keeping  
that stuff on just one machine is unworkable as I code from many  
locations, not to mention the server is backed up more often.


Note that when I say 'personal', I mean intended for the use of one  
particular person. Some people refer to these as 'private' branches,  
but we don't do access control in svn other than on a per-project  
level, so other users can take a look at what I'm up to. This allows  
me to ask for suggestions or advice without having to email diffs  
around.


Updating from trunk is slightly irritating as svn doesn't do merge  
tracking ATM (it's in the works, though). Currently I just grep the  
commit log for the last merge from trunk (I use a consistent log  
message so this is easy).


svn log https://svn.example.com/project/branches/ben | grep 'Merged from trunk'

(note last merged revision)
svn merge -r$LAST_MERGED_REV:HEAD https://svn.example.com/project/trunk /path/to/wc

(fix any conflicts)
svn ci /path/to/wc -m "Merged from trunk r$LAST_MERGED_REV"

Of course, you can also cherry-pick changes from other branches or  
tags if you know the revision number(s).


From what I've seen on the svn mailing lists, this is a pretty  
common pattern to use. I don't think it's very common in CVS though,  
simply because branching and merging are more difficult.


--
Ben




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Ben Gollmer

On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
What I'm saying is that I'd like to be able to keep multiple  
versions of

my files without echo * or ls showing them to me by default.


Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
don't like the _ you could use @ or some other character.


I'd like an option for ls(1), find(1) and friends to show file  
versions,
and a way to copy (or, rather, un-hide) selected versions files so  
that
I could now refer to them as usual -- when I do this I don't care  
to see

version numbers in the file name, I just want to give them names.


ln -s ._file.txt.1 first_published_draft.txt
ln -s ._file.txt.5 second_published_draft.txt


And, maybe, I'd like a way to write globs that match file versions
(think of extended globboing, as in KSH).


Hmm, I'm not exactly sure what you mean by this, but using a dotfile  
scheme would allow you to easily glob for the file names.



Similarly with applications that keep files open but keep writing
transactions in ways that the OS can't isolate without input from the
app.  E.g., databases.  fsync(2) helps here, but lots and lots of
fsync(2)s would result in no useful versioning.


Presumably you'd create a different fs for your database, turning the  
versioning property off. You'd be likely to want to adjust other fs  
parameters anyway, judging from some recent posts discussing how to  
get the best database performance.
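
In practice that separate filesystem is also where the usual database-friendly
knobs would go, e.g. (a sketch -- the pool and dataset names are made up, and
the versioning property itself is of course hypothetical, since it's the
feature being discussed):

# zfs create -o recordsize=8k tank/db
# zfs set atime=off tank/db

with versioning simply left off for that dataset, so the version-on-write
machinery never has to cope with the database's constant in-place rewrites.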


--
Ben




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: NFS Performance and Tar

2006-10-03 Thread Ben Rockwood
I was really hoping for some option other than ZIL_DISABLE, but finally gave up
the fight.  Some people suggested NFSv4 helping over NFSv3, but it didn't... at
least not enough to matter.

ZIL_DISABLE was the solution, sadly.  I'm running B43/x86 and hoping to get up
to B48 or so soonish (I BFU'd it straight to B48 last night and bricked it).
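
For anyone who wants to reproduce the numbers below, disabling the ZIL at this
point means either an /etc/system entry plus a reboot:

set zfs:zil_disable = 1

or flipping it live with mdb (both are sledgehammers -- the point of the test
is to show what the synchronous log path costs over NFS, not to recommend
running this way):

# echo zil_disable/W0t1 | mdb -kw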

Here are the times.  This is an untar (gtar xfj) of SIDEkick 
(http://www.cuddletech.com/blog/pivot/entry.php?id=491) on NFSv4 on a 20TB 
RAIDZ2 ZFS Pool:

ZIL Enabled:
real1m26.941s

ZIL Disabled:
real0m5.789s


I'll update this post again when I finally get B48 or newer on the system and 
try it.  Thanks to everyone for their suggestions.

benr.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re[2]: System hang caused by a bad snapshot

2006-09-13 Thread Ben Miller
 Hello Matthew,
 Tuesday, September 12, 2006, 7:57:45 PM, you wrote:
 MA Ben Miller wrote:
  I had a strange ZFS problem this morning.  The entire system would
  hang when mounting the ZFS filesystems.  After trial and error I
  determined that the problem was with one of the 2500 ZFS filesystems.
  When mounting that user's home the system would hang and need to be
  rebooted.  After I removed the snapshots (9 of them) for that
  filesystem everything was fine.
 
  I don't know how to reproduce this and didn't get a crash dump.  I
  don't remember seeing anything about this before so I wanted to
  report it and see if anyone has any ideas.
 
 MA Hmm, that sounds pretty bizarre, since I don't think that mounting a
 MA filesystem really interacts with snapshots at all.
 MA Unfortunately, I don't think we'll be able to diagnose this without a
 MA crash dump or reproducibility.  If it happens again, force a crash dump
 MA while the system is hung and we can take a look at it.
 
 Maybe it wasn't hung after all. I've seen similar behavior here
 sometimes. Were the disks used in the pool actually working?
 

  There was lots of activity on the disks (iostat and status LEDs) until it got 
to this one filesystem and everything stopped.  'zpool iostat 5' stopped 
running, the shell wouldn't respond and activity on the disks stopped.  This fs 
is relatively small  (175M used of a 512M quota).

 Sometimes it takes a lot of time (30-50minutes) to
 mount a file system
 - it's rare, but it happens. And during this ZFS
 reads from those
 disks in a pool. I did report it here some time ago.
 
  In my case the system crashed during the evening and it was left hung up when 
I came in during the morning, so it was hung for a good 9-10 hours.

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] System hang caused by a bad snapshot

2006-09-12 Thread Ben Miller
I had a strange ZFS problem this morning.  The entire system would hang when 
mounting the ZFS filesystems.  After trial and error I determined that the 
problem was with one of the 2500 ZFS filesystems.  When mounting that user's
home the system would hang and need to be rebooted.  After I removed the
snapshots (9 of them) for that filesystem everything was fine.

I don't know how to reproduce this and didn't get a crash dump.  I don't 
remember seeing anything about this before so I wanted to report it and see if 
anyone has any ideas.
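
In case it helps next time: on a hung SPARC box like this you can usually still
force a dump for analysis (assuming the console break hasn't been disabled) by
sending a break from the console and then, at the ok prompt, typing:

ok sync

which panics the machine with "sync initiated" and leaves a dump for savecore
to pick up on the next boot.  For a planned test from a working shell,
'reboot -d' does the same thing.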

The system is a Sun Fire 280R with 3GB of RAM running SXCR b40.
The pool looks like this (I'm running a scrub currently):
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: scrub in progress, 78.61% done, 0h18m to go
config:

NAME STATE READ WRITE CKSUM
pool1ONLINE   0 0 0
  raidz  ONLINE   0 0 0
c1t8d0   ONLINE   0 0 0
c1t9d0   ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0

errors: No known data errors

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Home Server with ZFS

2006-08-18 Thread Ben Short
Hi,

I plan to build a home server that will host my svn repository, fileserver,
mailserver and webserver.
This is my plan...

I have an old Dell Precision 420 with dual 933MHz PIII CPUs. Inside this I have
one 9.1GB SCSI hdd and two 80GB IDE hdds. I am going to install Solaris 10 on
the SCSI drive and have it as the boot disk. I will then create a ZFS mirror on
the two IDE drives. Since I don't want to mix internet-facing services
(mailserver, webserver) with my internal services (svn server, fileserver) I am
going to use zones to isolate them. Not sure how many zones just yet.

In this configuration I hope to have gained the protection of having the
services mirrored (I will perform backups also).

What I don't know is what happens if the boot disk dies? Can I replace it,
install Solaris again and get it to see the ZFS mirror?
Also what happens if one of the IDE drives fails? Can I plug another one in and
run some zfs commands to make it part of the mirror?
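
The commands involved are roughly these, for what it's worth (a sketch -- the
pool and device names are made up):

# zpool import
# zpool import datapool
# zpool replace datapool c0d1 c0d2

The first import (with no arguments) is what you'd run after reinstalling
Solaris on a new boot disk: it scans the attached drives and lists any pools it
finds, and naming the pool then brings the mirror back online.  The replace
covers the failed-IDE-drive case: swap in a new disk and ZFS resilvers the
mirror automatically (zpool status shows the progress).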

Ben
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Removing a device from a zfs pool

2006-07-13 Thread Yacov Ben-Moshe
How can I remove a device or a partition from a pool?
NOTE: the devices are not mirrored or raidz.

Thanks
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

