Re: [zfs-discuss] Resilver making the system unresponsive

2010-09-30 Thread jason matthews
Replace it. Resilvering should not be as painful if all your disks are
functioning normally.
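
For reference, a minimal replacement sequence might look like the sketch below. The pool and device names are the ones from Scott's status output later in the thread; verify hot-swap support and your controller's device naming before running any of it.

```shell
# Take the failing disk offline, physically swap in the new drive,
# then trigger the resilver onto it. data01 / c1t20d0 are the pool
# and device from this thread; substitute your own.
zpool offline data01 c1t20d0
# ...physically replace the drive...
zpool replace data01 c1t20d0
# Watch resilver progress and confirm the spare detaches when done.
zpool status -v data01
```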
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver making the system unresponsive

2010-09-30 Thread LIC mesh
If we've found one bad disk, what are our options?

On Thu, Sep 30, 2010 at 10:12 AM, Richard Elling
wrote:

> On Sep 30, 2010, at 2:32 AM, Tuomas Leikola wrote:
>
> > On Thu, Sep 30, 2010 at 1:16 AM, Scott Meilicke <
> scott.meili...@craneaerospace.com> wrote:
> > Resilver speed has been beaten to death, I know, but is there a way to
> > avoid this? For example, is more enterprise-grade hardware less
> > susceptible to resilvers? This box is used for development VMs, but there
> > is no way I would consider this for production with this kind of
> > performance hit during a resilver.
> >
> >
> > According to
> >
> > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
> >
> > resilver should in later builds have some option to limit rebuild speed
> > in order to allow for more I/O during reconstruction, but I haven't found
> > any guides on how to actually make use of this feature. Maybe someone can
> > shed some light on this?
>
> Simple.  Resilver activity is throttled using a delay method.  Nothing to
> tune here.
>
> In general, if resilver or scrub make a system seem unresponsive, there is
> a root cause that is related to the I/O activity. To diagnose, I usually
> use "iostat -zxCn 10" (or similar) and look for unusual asvc_t from a busy
> disk. One bad disk can ruin performance for the whole pool.
>  -- richard
>


Re: [zfs-discuss] Resilver making the system unresponsive

2010-09-30 Thread Richard Elling
On Sep 30, 2010, at 2:32 AM, Tuomas Leikola wrote:

> On Thu, Sep 30, 2010 at 1:16 AM, Scott Meilicke 
>  wrote:
> Resilver speed has been beaten to death, I know, but is there a way to avoid
> this? For example, is more enterprise-grade hardware less susceptible to
> resilvers? This box is used for development VMs, but there is no way I would
> consider this for production with this kind of performance hit during a
> resilver.
> 
> 
> According to
> 
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
> 
> resilver should in later builds have some option to limit rebuild speed in
> order to allow for more I/O during reconstruction, but I haven't found any
> guides on how to actually make use of this feature. Maybe someone can shed
> some light on this?

Simple.  Resilver activity is throttled using a delay method.  Nothing to tune 
here.

In general, if resilver or scrub make a system seem unresponsive, there is a
root cause that is related to the I/O activity. To diagnose, I usually use
"iostat -zxCn 10" (or similar) and look for unusual asvc_t from a busy disk.
One bad disk can ruin performance for the whole pool.
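
As a concrete sketch of that workflow (flags as on Solaris-derived iostat; the rough latency numbers are only a rule of thumb, not from this thread):

```shell
# Sample extended per-device stats every 10 seconds; -z suppresses
# idle devices, -x adds latency columns, -C aggregates per controller,
# -n prints descriptive cXtYdZ names.
iostat -zxCn 10

# Read the asvc_t column (active service time, in ms). Sibling disks
# in a vdev should show similar numbers; one disk sitting at hundreds
# of ms while its peers idle along at single digits is the usual
# smoking gun, and is worth cross-checking with fmdump -eV (and
# smartctl, if available).
```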
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com


Re: [zfs-discuss] Resilver making the system unresponsive

2010-09-30 Thread Tuomas Leikola
On Thu, Sep 30, 2010 at 1:16 AM, Scott Meilicke <
scott.meili...@craneaerospace.com> wrote:

> Resilver speed has been beaten to death, I know, but is there a way to avoid
> this? For example, is more enterprise-grade hardware less susceptible to
> resilvers? This box is used for development VMs, but there is no way I would
> consider this for production with this kind of performance hit during a
> resilver.
>
>
According to

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473

resilver should in later builds have some option to limit rebuild speed in
order to allow for more I/O during reconstruction, but I haven't found any
guides on how to actually make use of this feature. Maybe someone can shed
some light on this?
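
For what it's worth, on OpenSolaris-era builds the throttle shows up as kernel variables such as zfs_resilver_delay and zfs_resilver_min_time_ms; whether they exist, and their defaults, depend on the build, so treat this as an unverified sketch rather than a documented interface:

```shell
# Read the current values (decimal) from the live kernel.
echo zfs_resilver_delay/D | mdb -k
echo zfs_resilver_min_time_ms/D | mdb -k

# Increase the per-I/O delay to further deprioritize resilver traffic
# (0t marks the value as decimal; the change is live-only and is lost
# on reboot).
echo zfs_resilver_delay/W0t4 | mdb -kw
```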


Re: [zfs-discuss] Resilver making the system unresponsive

2010-09-29 Thread LIC mesh
Yeah, I'm having a combination of this and the "resilver constantly
restarting" issue.

And nothing to free up space.

It was recommended to me to replace any expanders I had between the HBA and
the drives with extra HBAs, but my array doesn't have expanders.

If yours does, you may want to try that.

Otherwise, wait it out :(

On Wed, Sep 29, 2010 at 6:37 PM, Scott Meilicke  wrote:

> I should add that I have 477 snapshots across all file systems. Most of
> them are hourly snaps (225 of them, anyway).
>
> On Sep 29, 2010, at 3:16 PM, Scott Meilicke wrote:
>
> > This must be resilver day :)
> >
> > I just had a drive failure. The hot spare kicked in, and access to the
> > pool over NFS was effectively zero for about 45 minutes. Currently the
> > pool is still resilvering, but for some reason I can access the file
> > system now.
> >
> > Resilver speed has been beaten to death, I know, but is there a way to
> > avoid this? For example, is more enterprise-grade hardware less
> > susceptible to resilvers? This box is used for development VMs, but there
> > is no way I would consider this for production with this kind of
> > performance hit during a resilver.
> >
> > My hardware:
> > Dell 2950
> > 16G ram
> > 16 disk SAS chassis
> > LSI 3801 (I think) SAS card (1068e chip)
> > Intel x25-e SLOG off of the internal PERC 5/i RAID controller
> > Seagate 750G disks (7200.11)
> >
> > I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc
> i386 i86pc Solaris)
> >
> >  pool: data01
> > state: DEGRADED
> > status: One or more devices is currently being resilvered.  The pool will
> >         continue to function, possibly in a degraded state.
> > action: Wait for the resilver to complete.
> >  scan: resilver in progress since Wed Sep 29 14:03:52 2010
> >        1.12T scanned out of 5.00T at 311M/s, 3h37m to go
> >        82.0G resilvered, 22.42% done
> > config:
> >
> >         NAME         STATE     READ WRITE CKSUM
> >         data01       DEGRADED     0     0     0
> >           raidz2-0   ONLINE       0     0     0
> >             c1t8d0   ONLINE       0     0     0
> >             c1t9d0   ONLINE       0     0     0
> >             c1t10d0  ONLINE       0     0     0
> >             c1t11d0  ONLINE       0     0     0
> >             c1t12d0  ONLINE       0     0     0
> >             c1t13d0  ONLINE       0     0     0
> >             c1t14d0  ONLINE       0     0     0
> >           raidz2-1   DEGRADED     0     0     0
> >             c1t22d0  ONLINE       0     0     0
> >             c1t15d0  ONLINE       0     0     0
> >             c1t16d0  ONLINE       0     0     0
> >             c1t17d0  ONLINE       0     0     0
> >             c1t23d0  ONLINE       0     0     0
> >             spare-5  REMOVED      0     0     0
> >               c1t20d0  REMOVED    0     0     0
> >               c8t18d0  ONLINE     0     0     0  (resilvering)
> >             c1t21d0  ONLINE       0     0     0
> >         logs
> >           c0t1d0     ONLINE       0     0     0
> >         spares
> >           c8t18d0    INUSE     currently in use
> >
> > errors: No known data errors
> >
> > Thanks for any insights.
> >
> > -Scott
>
> Scott Meilicke


Re: [zfs-discuss] Resilver making the system unresponsive

2010-09-29 Thread Scott Meilicke
I should add that I have 477 snapshots across all file systems. Most of them
are hourly snaps (225 of them, anyway).

On Sep 29, 2010, at 3:16 PM, Scott Meilicke wrote:

> This must be resilver day :)
> 
> I just had a drive failure. The hot spare kicked in, and access to the pool
> over NFS was effectively zero for about 45 minutes. Currently the pool is
> still resilvering, but for some reason I can access the file system now.
> 
> Resilver speed has been beaten to death, I know, but is there a way to avoid
> this? For example, is more enterprise-grade hardware less susceptible to
> resilvers? This box is used for development VMs, but there is no way I would
> consider this for production with this kind of performance hit during a
> resilver.
> 
> My hardware:
> Dell 2950
> 16G ram
> 16 disk SAS chassis
> LSI 3801 (I think) SAS card (1068e chip)
> Intel x25-e SLOG off of the internal PERC 5/i RAID controller
> Seagate 750G disks (7200.11)
> 
> I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 
> i86pc Solaris)
> 
>  pool: data01
> state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scan: resilver in progress since Wed Sep 29 14:03:52 2010
>        1.12T scanned out of 5.00T at 311M/s, 3h37m to go
>        82.0G resilvered, 22.42% done
> config:
> 
>         NAME         STATE     READ WRITE CKSUM
>         data01       DEGRADED     0     0     0
>           raidz2-0   ONLINE       0     0     0
>             c1t8d0   ONLINE       0     0     0
>             c1t9d0   ONLINE       0     0     0
>             c1t10d0  ONLINE       0     0     0
>             c1t11d0  ONLINE       0     0     0
>             c1t12d0  ONLINE       0     0     0
>             c1t13d0  ONLINE       0     0     0
>             c1t14d0  ONLINE       0     0     0
>           raidz2-1   DEGRADED     0     0     0
>             c1t22d0  ONLINE       0     0     0
>             c1t15d0  ONLINE       0     0     0
>             c1t16d0  ONLINE       0     0     0
>             c1t17d0  ONLINE       0     0     0
>             c1t23d0  ONLINE       0     0     0
>             spare-5  REMOVED      0     0     0
>               c1t20d0  REMOVED    0     0     0
>               c8t18d0  ONLINE     0     0     0  (resilvering)
>             c1t21d0  ONLINE       0     0     0
>         logs
>           c0t1d0     ONLINE       0     0     0
>         spares
>           c8t18d0    INUSE     currently in use
> 
> errors: No known data errors
> 
> Thanks for any insights.
> 
> -Scott

Scott Meilicke





[zfs-discuss] Resilver making the system unresponsive

2010-09-29 Thread Scott Meilicke
This must be resilver day :)

I just had a drive failure. The hot spare kicked in, and access to the pool
over NFS was effectively zero for about 45 minutes. Currently the pool is
still resilvering, but for some reason I can access the file system now.

Resilver speed has been beaten to death, I know, but is there a way to avoid
this? For example, is more enterprise-grade hardware less susceptible to
resilvers? This box is used for development VMs, but there is no way I would
consider this for production with this kind of performance hit during a
resilver.

My hardware:
Dell 2950
16G ram
16 disk SAS chassis
LSI 3801 (I think) SAS card (1068e chip)
Intel x25-e SLOG off of the internal PERC 5/i RAID controller
Seagate 750G disks (7200.11)

I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 
i86pc Solaris)

  pool: data01
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Wed Sep 29 14:03:52 2010
       1.12T scanned out of 5.00T at 311M/s, 3h37m to go
       82.0G resilvered, 22.42% done
config:

        NAME         STATE     READ WRITE CKSUM
        data01       DEGRADED     0     0     0
          raidz2-0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0
            c1t14d0  ONLINE       0     0     0
          raidz2-1   DEGRADED     0     0     0
            c1t22d0  ONLINE       0     0     0
            c1t15d0  ONLINE       0     0     0
            c1t16d0  ONLINE       0     0     0
            c1t17d0  ONLINE       0     0     0
            c1t23d0  ONLINE       0     0     0
            spare-5  REMOVED      0     0     0
              c1t20d0  REMOVED    0     0     0
              c8t18d0  ONLINE     0     0     0  (resilvering)
            c1t21d0  ONLINE       0     0     0
        logs
          c0t1d0     ONLINE       0     0     0
        spares
          c8t18d0    INUSE     currently in use

errors: No known data errors

Thanks for any insights.

-Scott