[zfs-discuss] no valid replicas

2012-04-04 Thread Jan-Aage Frydenbø-Bruvoll
Dear List,

I am struggling with a storage pool on a server, where I would like to offline 
a device for replacement. The pool consists of two-disk stripes set up in 
mirrors (yep, stupid, but we were running out of VDs on the controller at the 
time, and that's where we are now...).

Here's the pool config:

root@storage:~# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scan: resilvered 321G in 36h58m with 1 errors on Wed Apr  4 06:46:10 2012
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c1t14d0   ONLINE       0     0     0
            c1t15d0   ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            c1t19d0   ONLINE       0     0     0
            c1t18d0   ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            c1t20d0   ONLINE       0     0     0
            c1t21d0   ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            c1t22d0   ONLINE       0     0     0
            c1t23d0   ONLINE       0     0     0
        logs
          mirror-4    ONLINE       0     0     0
            c2t2d0p7  ONLINE       0     0     0
            c2t3d0p7  ONLINE       0     0     0
        cache
          c2t2d0p11   ONLINE       0     0     0
          c2t3d0p11   ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xeb78a>:<0xa8be6b>

What I would like to do is offline or detach c1t19d0, which the server won't 
let me do:

root@storage:~# zpool offline tank c1t19d0
cannot offline c1t19d0: no valid replicas

The errored file above is not important to me; it was part of a snapshot that
has since been deleted. Could that be related to this?

How can I find out more about why ZFS simultaneously seems to consider
mirror-1 above healthy (per zpool status) and broken (refusing the offline)?
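
In case it matters, the sequence I was planning to try next - purely a guess
on my part that the stale error record from the now-deleted snapshot is what
blocks the offline - is roughly:

zpool clear tank               # drop the logged error counters/records
zpool scrub tank               # let a full scrub re-verify both sides of each mirror
zpool status -v tank           # confirm the permanent-error list is gone
zpool offline tank c1t19d0     # then retry the offline (or zpool detach)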

Any ideas would be greatly appreciated. Thanks in advance for your kind 
assistance.

Best regards
Jan




Re: [zfs-discuss] no valid replicas

2012-04-04 Thread Jan-Aage Frydenbø-Bruvoll
Hi Richard,

Thanks for your reply.

>> I am struggling with a storage pool on a server, where I would like to
>> offline a device for replacement. The pool consists of two-disk stripes set
>> up in mirrors (yep, stupid, but we were running out of VDs on the controller
>> at the time, and that's where we are now...).

> Which OS and release?

This is OpenIndiana oi_148, ZFS pool version 28.
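
In case it helps pin down the bug, those version numbers come straight from
the usual places:

cat /etc/release          # reports the oi_148 build
zpool get version tank    # reports pool version 28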

> There was a bug in some releases circa 2010 that you might be hitting. It is
> harmless, but annoying.

OK - which bug is this, how do I verify whether I am hitting it here, and what
remedies are there?

Best regards
Jan




Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-19 Thread Jan-Aage Frydenbø-Bruvoll
Hi,

2011/12/19 Hung-Sheng Tsao (laoTsao) laot...@gmail.com:
> what is the RAM size?

32 GB

> are there many snaps? created, then deleted?

Currently, there are 36 snapshots on the pool - it is part of a fairly
normal backup regime of snapshots every 5 min, hour, day, week and
month.
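
For what it's worth, that count comes from something along the lines of the
following (pool name assumed to be pool3 here):

zfs list -H -t snapshot -r pool3 | wc -l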

> did you run a scrub?

Yes, as part of the previous drive failure. Nothing reported there.

Now, interestingly - I deleted two of the oldest snapshots yesterday, and
guess what - performance went back to normal for a while. Now it is dropping
severely again: after a good while at 1.5-2 GB/s, I am again seeing write
performance in the 1-10 MB/s range.

Is there an upper limit on the number of snapshots on a ZFS pool?

Best regards
Jan


[zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
Dear List,

I have a storage server running OpenIndiana with a number of storage
pools on it. All the pools' disks come off the same controller, and
all pools are backed by SSD-based l2arc and ZIL. Performance is
excellent on all pools but one, and I am struggling greatly to figure
out what is wrong.

A very basic test shows the following - pretty much typical
performance at the moment:

root@stor:/# for a in pool1 pool2 pool3; do dd if=/dev/zero of=$a/file bs=1M count=10; done
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00772965 s, 1.4 GB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00996472 s, 1.1 GB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 71.8995 s, 146 kB/s
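
To see where the time goes while that last dd crawls along, watching per-vdev
throughput in a second terminal seems the obvious next step - roughly:

zpool iostat -v pool3 5    # per-vdev bandwidth and IOPS, 5-second intervals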

The zpool status of the affected pool is:

root@stor:/# zpool status pool3
  pool: pool3
 state: ONLINE
 scan: resilvered 222G in 24h2m with 0 errors on Wed Dec 14 15:20:11 2011
config:

        NAME          STATE     READ WRITE CKSUM
        pool3         ONLINE       0     0     0
          c1t0d0      ONLINE       0     0     0
          c1t1d0      ONLINE       0     0     0
          c1t2d0      ONLINE       0     0     0
          c1t3d0      ONLINE       0     0     0
          c1t4d0      ONLINE       0     0     0
          c1t5d0      ONLINE       0     0     0
          c1t6d0      ONLINE       0     0     0
          c1t7d0      ONLINE       0     0     0
          c1t8d0      ONLINE       0     0     0
          c1t9d0      ONLINE       0     0     0
          c1t10d0     ONLINE       0     0     0
          mirror-12   ONLINE       0     0     0
            c1t26d0   ONLINE       0     0     0
            c1t27d0   ONLINE       0     0     0
          mirror-13   ONLINE       0     0     0
            c1t28d0   ONLINE       0     0     0
            c1t29d0   ONLINE       0     0     0
          mirror-14   ONLINE       0     0     0
            c1t34d0   ONLINE       0     0     0
            c1t35d0   ONLINE       0     0     0
        logs
          mirror-11   ONLINE       0     0     0
            c2t2d0p8  ONLINE       0     0     0
            c2t3d0p8  ONLINE       0     0     0
        cache
          c2t2d0p12   ONLINE       0     0     0
          c2t3d0p12   ONLINE       0     0     0

errors: No known data errors

Ditto for the disk controller - MegaCli reports zero errors, whether on the
controller itself, on this pool's disks, or on any of the other attached
disks.
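
For the record, the controller-side check was along these lines - the exact
MegaCli binary name/path varies between installs:

MegaCli -AdpAllInfo -aALL | grep -i error        # controller-level counters
MegaCli -PDList -aALL | grep -i "error count"    # per-disk media/other error counts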

I am pretty sure I am dealing with a disk-level problem here, i.e. a flaky
disk that is merely slow without exhibiting any actual data errors, holding
the rest of the pool back - but I am at a loss as to how to pinpoint what is
going on.

Would anybody on the list be able to give me any pointers on how to dig up
more detailed information about the pool's/hardware's performance?

Thank you in advance for your kind assistance.

Best regards
Jan


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
Hi,

On Sun, Dec 18, 2011 at 15:13, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.
laot...@gmail.com wrote:
> what is the output of zpool status for pool1 and pool2?
> it seems that you have a mixed configuration in pool3, with single disks and mirrors

The other two pools show very similar outputs:

root@stor:~# zpool status pool1
  pool: pool1
 state: ONLINE
 scan: resilvered 1.41M in 0h0m with 0 errors on Sun Dec  4 17:42:35 2011
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c1t12d0   ONLINE       0     0     0
            c1t13d0   ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            c1t24d0   ONLINE       0     0     0
            c1t25d0   ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            c1t30d0   ONLINE       0     0     0
            c1t31d0   ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            c1t32d0   ONLINE       0     0     0
            c1t33d0   ONLINE       0     0     0
        logs
          mirror-4    ONLINE       0     0     0
            c2t2d0p6  ONLINE       0     0     0
            c2t3d0p6  ONLINE       0     0     0
        cache
          c2t2d0p10   ONLINE       0     0     0
          c2t3d0p10   ONLINE       0     0     0

errors: No known data errors
root@stor:~# zpool status pool2
  pool: pool2
 state: ONLINE
 scan: scrub canceled on Wed Dec 14 07:51:50 2011
config:

        NAME          STATE     READ WRITE CKSUM
        pool2         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c1t14d0   ONLINE       0     0     0
            c1t15d0   ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            c1t18d0   ONLINE       0     0     0
            c1t19d0   ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            c1t20d0   ONLINE       0     0     0
            c1t21d0   ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            c1t22d0   ONLINE       0     0     0
            c1t23d0   ONLINE       0     0     0
        logs
          mirror-4    ONLINE       0     0     0
            c2t2d0p7  ONLINE       0     0     0
            c2t3d0p7  ONLINE       0     0     0
        cache
          c2t2d0p11   ONLINE       0     0     0
          c2t3d0p11   ONLINE       0     0     0

The affected pool does indeed have a mix of single disks and mirrored disks
(due to running out of virtual drives on the controller); however, it has to
be added that the performance of the affected pool was excellent until around
three weeks ago, and there have been no structural changes - neither to the
pools nor to anything else on this server - in the last half year or so.

-jan


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
Hi,

On Sun, Dec 18, 2011 at 16:41, Fajar A. Nugraha w...@fajar.net wrote:
> Is the pool over 80% full? Do you have dedup enabled (even if it was
> turned off later, see zpool history)?

The pool stands at 86% full, but that has not changed in any way that
corresponds chronologically with the sudden drop in the pool's performance.
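
The 86% figure is straight from zpool list, by the way:

zpool list pool3    # the CAP column - SIZE/ALLOC/FREE are shown alongside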

With regard to dedup - I have already got my fingers thoroughly burned by
dedup, so that setting is, has been and will remain firmly off.
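
For anyone who wants to rule it out on their own pool, something like the
following should confirm it one way or the other:

zfs get -r dedup pool3                     # current setting on every dataset
zpool history -i pool3 | grep -i dedup     # whether it was ever switched on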

-jan


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
Hi,

On Sun, Dec 18, 2011 at 22:00, Fajar A. Nugraha w...@fajar.net wrote:
> From http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> (or at least Google's cache of it, since it seems to be inaccessible now):
>
> "Keep pool space under 80% utilization to maintain pool performance.
> Currently, pool performance can degrade when a pool is very full and
> file systems are updated frequently, such as on a busy mail server.
> Full pools might cause a performance penalty, but no other issues. If
> the primary workload is immutable files (write once, never remove),
> then you can keep a pool in the 95-96% utilization range. Keep in mind
> that even with mostly static content in the 95-96% range, write, read,
> and resilvering performance might suffer."
>
> I'm guessing that your nearly-full disk, combined with your usage
> pattern, is the cause of the slowdown. Try freeing up some space
> (e.g. make it about 75% full), just to be sure.

I'm aware of the guidelines you refer to, and I have had slowdowns before due
to the pool being too full, but that was in the 9x% range and the slowdown was
on the order of a few percent.

At the moment I am only slightly above the recommended limit, yet the
performance is currently between 1/1 and 1/2000 of what the other pools
achieve - i.e. a few hundred kB/s versus 2 GB/s on the other pools. Surely
allocation above 80% cannot carry such an extreme penalty?!

For the record - the read/write load on the pool is almost exclusively WORM.

Best regards
Jan


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
Hi,

On Sun, Dec 18, 2011 at 22:14, Nathan Kroenert nat...@tuneunix.com wrote:
> I know some others may already have pointed this out - but I can't see it
> and not say something...

> Do you realise that losing a single disk in that pool could pretty much
> render the whole thing busted?

> At least for me - at the rate at which _I_ seem to lose disks, it would be
> worth considering something different ;)

Yeah, I have had that thought myself. I am pretty sure I have a broken disk,
but I cannot for the life of me find out which one: zpool status gives me
nothing to work on, MegaCli reports that all virtual and physical drives are
fine, and iostat gives me nothing either.

What other tools are there out there that could help me pinpoint
what's going on?

Best regards
Jan


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
Hi Craig,

On Sun, Dec 18, 2011 at 22:33, Craig Morgan crgm...@gmail.com wrote:
> Try fmdump -e and then fmdump -eV - it could be a pathological disk, just this
> side of failure, doing heavy retries that is dragging the pool down.

Thanks for the hint - I didn't know about fmdump. Nothing in the log since
13 Dec, though, which is when the previously failed disk was replaced.
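
For reference, I limited the search to the period after that disk swap with
something like the following (time format as per the fmdump man page):

fmdump -e -t 13Dec11     # error-report summary since 13 Dec
fmdump -eV -t 13Dec11    # same, with full verbose detail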

Best regards
Jan


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
Hi,

On Sun, Dec 18, 2011 at 22:38, Matt Breitbach matth...@flash.shanje.com wrote:
> I'd look at iostat -En. It will give you a good breakdown of disks that
> have seen errors. I've also spotted failing disks just by watching an
> iostat -nxz and looking for the one that's spending more %busy than the rest
> of them, or exhibiting longer than normal service times.

Thanks for that - I've been looking at iostat output for a while
without being able to make proper sense of it, i.e. it doesn't really
look that weird. However, on a side note - would you happen to know
whether it is possible to reset the error counter for a particular
device?

In the output I get this:

c1t29d0  Soft Errors: 0 Hard Errors: 5395 Transport Errors: 5394

And c1t29d0 is actually a striped pair of disks where one disk failed
recently. (c1t29d0 has its mirror at the zpool level - the reason for this
weird config was running out of virtual drives on the controller.) The device
should be just fine now - the counter has stopped incrementing - but having
that number there is confusing when debugging.
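
For completeness, the watch I now have running in a spare terminal is roughly
the suggested:

iostat -xnz 10    # extended stats, named devices, idle ones suppressed, 10s interval
                  # looking for outliers in asvc_t (service time) and %b (busy)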

Best regards
Jan


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-18 Thread Jan-Aage Frydenbø-Bruvoll
On Sun, Dec 18, 2011 at 22:14, Nathan Kroenert nat...@tuneunix.com wrote:
> Do you realise that losing a single disk in that pool could pretty much
> render the whole thing busted?

Ah - I didn't pick up on that one until someone here pointed it out: all my
disks are mirrored, however some of them are mirrored at the controller level.

Best regards
Jan