Re: [zfs-discuss] Resliver making the system unresponsive

2010-09-30 Thread jason matthews
Replace it. Resilvering should not be as painful if all your disks are functioning
normally.
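
For context, a hedged sketch of the replacement procedure. The pool and device names (data01, c1t20d0) are taken from the status output later in the thread; the exact steps depend on whether the chassis is hot-swappable:

```shell
# Take the suspect disk out of service so it stops dragging the pool down:
zpool offline data01 c1t20d0

# Physically swap the drive, then resilver onto the replacement
# (same target name here because the new disk takes the old slot):
zpool replace data01 c1t20d0

# Watch the resilver:
zpool status data01
```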
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resliver making the system unresponsive

2010-09-30 Thread LIC mesh
If we've found one bad disk, what are our options?





On Thu, Sep 30, 2010 at 10:12 AM, Richard Elling
wrote:

> On Sep 30, 2010, at 2:32 AM, Tuomas Leikola wrote:
>
> > On Thu, Sep 30, 2010 at 1:16 AM, Scott Meilicke <
> scott.meili...@craneaerospace.com> wrote:
> > Resilver speed has been beaten to death, I know, but is there a way to
> avoid this? For example, is more enterprisy hardware less susceptible to
> resilvers? This box is used for development VMs, but there is no way I would
> consider this for production with this kind of performance hit during a
> resilver.
> >
> >
> > According to
> >
> > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
> >
> > resilver should in later builds have some option to limit rebuild speed
> in order to allow for more IO during reconstruction, but I haven't found
> any guides on how to actually make use of this feature. Maybe someone can
> shed some light on this?
>
> Simple.  Resilver activity is throttled using a delay method.  Nothing to
> tune here.
>
> In general, if resilver or scrub make a system seem unresponsive, there is a
> root cause that is related to the I/O activity. To diagnose, I usually use
> "iostat -zxCn 10" (or similar) and look for an unusually high asvc_t from a
> busy disk. One bad disk can ruin performance for the whole pool.
>  -- richard
>
> --
> OpenStorage Summit, October 25-27, Palo Alto, CA
> http://nexenta-summit2010.eventbrite.com
> ZFS and performance consulting
> http://www.RichardElling.com


Re: [zfs-discuss] Resliver making the system unresponsive

2010-09-30 Thread Richard Elling
On Sep 30, 2010, at 2:32 AM, Tuomas Leikola wrote:

> On Thu, Sep 30, 2010 at 1:16 AM, Scott Meilicke 
>  wrote:
> Resilver speed has been beaten to death, I know, but is there a way to avoid 
> this? For example, is more enterprisy hardware less susceptible to resilvers? 
> This box is used for development VMs, but there is no way I would consider 
> this for production with this kind of performance hit during a resilver.
> 
> 
> According to
> 
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
> 
> resilver should in later builds have some option to limit rebuild speed in 
> order to allow for more IO during reconstruction, but I haven't found any 
> guides on how to actually make use of this feature. Maybe someone can shed 
> some light on this? 

Simple.  Resilver activity is throttled using a delay method.  Nothing to tune 
here.

In general, if resilver or scrub make a system seem unresponsive, there is a
root cause that is related to the I/O activity. To diagnose, I usually use
"iostat -zxCn 10" (or similar) and look for an unusually high asvc_t from a
busy disk. One bad disk can ruin performance for the whole pool.
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com
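
The check Richard describes can be mechanized. Below is a minimal sketch run against a hypothetical `iostat -zxn` snapshot; the device names and numbers are invented, and the 100 ms cutoff is illustrative, not a hard rule:

```shell
# Hypothetical snapshot of `iostat -zxn` output; a live run prints these
# stats once per interval. asvc_t is average service time in milliseconds.
sample='    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   12.0    3.0  512.0  128.0  0.0  0.5    0.1    8.2   0  12 c1t8d0
   11.0    4.0  498.0  130.0  0.0  0.6    0.1    9.1   0  13 c1t9d0
   10.0    2.0  450.0  110.0  0.0  2.9    0.2  412.7   5  98 c1t20d0'

# Print any device whose asvc_t (column 8) is suspiciously high --
# a disk stuck at hundreds of ms of service time is the likely culprit.
echo "$sample" | awk 'NR > 1 && $8 > 100 { print $11, $8 }'
```

Here the filter singles out c1t20d0, the kind of outlier that drags down the whole pool.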



Re: [zfs-discuss] Resliver making the system unresponsive

2010-09-30 Thread Tuomas Leikola
On Thu, Sep 30, 2010 at 1:16 AM, Scott Meilicke <
scott.meili...@craneaerospace.com> wrote:

> Resilver speed has been beaten to death, I know, but is there a way to avoid
> this? For example, is more enterprisy hardware less susceptible to
> resilvers? This box is used for development VMs, but there is no way I would
> consider this for production with this kind of performance hit during a
> resilver.
>
>
According to

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473

resilver should in later builds have some option to limit rebuild speed in
order to allow for more IO during reconstruction, but I haven't found any
guides on how to actually make use of this feature. Maybe someone can shed
some light on this?
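
For what it's worth, a hedged sketch of how that throttle surfaced on OpenSolaris-era builds: it was a kernel variable rather than a documented command-line switch, so treat the names and values below as assumptions to verify against your build's source.

```shell
# Read the current resilver delay (ticks injected between resilver I/Os):
echo "zfs_resilver_delay/D" | mdb -k

# Relax the throttle on the live kernel (0 = no extra delay):
echo "zfs_resilver_delay/W0t0" | mdb -kw

# Or persistently, via /etc/system:
#   set zfs:zfs_resilver_delay = 0
```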


Re: [zfs-discuss] Resliver making the system unresponsive

2010-09-29 Thread LIC mesh
Yeah, I'm having a combination of this and the "resilver constantly
restarting" issue.

And nothing to free up space.

It was recommended to me to replace any expanders I had between the HBA and
the drives with extra HBAs, but my array doesn't have expanders.

If yours does, you may want to try that.

Otherwise, wait it out :(




On Wed, Sep 29, 2010 at 6:37 PM, Scott Meilicke  wrote:

> I should add I have 477 snapshots across all file systems. Most of them
> are hourly snaps (225 of them anyway).
>
> On Sep 29, 2010, at 3:16 PM, Scott Meilicke wrote:
>
> > This must be resilver day :)
> >
> > I just had a drive failure. The hot spare kicked in, and access to the
> pool over NFS was effectively zero for about 45 minutes. Currently the pool
> is still resilvering, but for some reason I can access the file system now.
> >
> > Resilver speed has been beaten to death, I know, but is there a way to
> avoid this? For example, is more enterprisy hardware less susceptible to
> resilvers? This box is used for development VMs, but there is no way I would
> consider this for production with this kind of performance hit during a
> resilver.
> >
> > My hardware:
> > Dell 2950
> > 16G ram
> > 16 disk SAS chassis
> > LSI 3801 (I think) SAS card (1068e chip)
> > Intel x25-e SLOG off of the internal PERC 5/i RAID controller
> > Seagate 750G disks (7200.11)
> >
> > I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc
> i386 i86pc Solaris)
> >
> >  pool: data01
> > state: DEGRADED
> > status: One or more devices is currently being resilvered.  The pool will
> >   continue to function, possibly in a degraded state.
> > action: Wait for the resilver to complete.
> > scan: resilver in progress since Wed Sep 29 14:03:52 2010
> >   1.12T scanned out of 5.00T at 311M/s, 3h37m to go
> >   82.0G resilvered, 22.42% done
> > config:
> >
> >   NAME           STATE     READ WRITE CKSUM
> >   data01         DEGRADED     0     0     0
> >     raidz2-0     ONLINE       0     0     0
> >       c1t8d0     ONLINE       0     0     0
> >       c1t9d0     ONLINE       0     0     0
> >       c1t10d0    ONLINE       0     0     0
> >       c1t11d0    ONLINE       0     0     0
> >       c1t12d0    ONLINE       0     0     0
> >       c1t13d0    ONLINE       0     0     0
> >       c1t14d0    ONLINE       0     0     0
> >     raidz2-1     DEGRADED     0     0     0
> >       c1t22d0    ONLINE       0     0     0
> >       c1t15d0    ONLINE       0     0     0
> >       c1t16d0    ONLINE       0     0     0
> >       c1t17d0    ONLINE       0     0     0
> >       c1t23d0    ONLINE       0     0     0
> >       spare-5    REMOVED      0     0     0
> >         c1t20d0  REMOVED      0     0     0
> >         c8t18d0  ONLINE       0     0     0  (resilvering)
> >       c1t21d0    ONLINE       0     0     0
> >   logs
> >     c0t1d0       ONLINE       0     0     0
> >   spares
> >     c8t18d0      INUSE     currently in use
> >
> > errors: No known data errors
> >
> > Thanks for any insights.
> >
> > -Scott
> > --
> > This message posted from opensolaris.org
>
> Scott Meilicke


Re: [zfs-discuss] Resliver making the system unresponsive

2010-09-29 Thread Scott Meilicke
I should add I have 477 snapshots across all file systems. Most of them are 
hourly snaps (225 of them anyway).

On Sep 29, 2010, at 3:16 PM, Scott Meilicke wrote:

> This must be resilver day :)
> 
> I just had a drive failure. The hot spare kicked in, and access to the pool 
> over NFS was effectively zero for about 45 minutes. Currently the pool is 
> still resilvering, but for some reason I can access the file system now. 
> 
> Resilver speed has been beaten to death, I know, but is there a way to avoid 
> this? For example, is more enterprisy hardware less susceptible to resilvers? 
> This box is used for development VMs, but there is no way I would consider 
> this for production with this kind of performance hit during a resilver.
> 
> My hardware:
> Dell 2950
> 16G ram
> 16 disk SAS chassis
> LSI 3801 (I think) SAS card (1068e chip)
> Intel x25-e SLOG off of the internal PERC 5/i RAID controller
> Seagate 750G disks (7200.11)
> 
> I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 
> i86pc Solaris)
> 
>  pool: data01
> state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>   continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scan: resilver in progress since Wed Sep 29 14:03:52 2010
>    1.12T scanned out of 5.00T at 311M/s, 3h37m to go
>    82.0G resilvered, 22.42% done
> config:
> 
>   NAME           STATE     READ WRITE CKSUM
>   data01         DEGRADED     0     0     0
>     raidz2-0     ONLINE       0     0     0
>       c1t8d0     ONLINE       0     0     0
>       c1t9d0     ONLINE       0     0     0
>       c1t10d0    ONLINE       0     0     0
>       c1t11d0    ONLINE       0     0     0
>       c1t12d0    ONLINE       0     0     0
>       c1t13d0    ONLINE       0     0     0
>       c1t14d0    ONLINE       0     0     0
>     raidz2-1     DEGRADED     0     0     0
>       c1t22d0    ONLINE       0     0     0
>       c1t15d0    ONLINE       0     0     0
>       c1t16d0    ONLINE       0     0     0
>       c1t17d0    ONLINE       0     0     0
>       c1t23d0    ONLINE       0     0     0
>       spare-5    REMOVED      0     0     0
>         c1t20d0  REMOVED      0     0     0
>         c8t18d0  ONLINE       0     0     0  (resilvering)
>       c1t21d0    ONLINE       0     0     0
>   logs
>     c0t1d0       ONLINE       0     0     0
>   spares
>     c8t18d0      INUSE     currently in use
> 
> errors: No known data errors
> 
> Thanks for any insights.
> 
> -Scott
> -- 
> This message posted from opensolaris.org

Scott Meilicke
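
As a sanity check, the ETA in the quoted status output follows directly from its own figures (this arithmetic is mine, not from the thread, and assumes zpool's T and M are binary units):

```python
# Cross-check the resilver ETA that zpool reported from its own numbers.
scanned_tb = 1.12          # "1.12T scanned"
total_tb = 5.00            # "out of 5.00T"
rate_mb_s = 311            # "at 311M/s"

remaining_mb = (total_tb - scanned_tb) * 1024 * 1024   # TiB -> MiB
eta_s = remaining_mb / rate_mb_s
hours = int(eta_s // 3600)
minutes = int(eta_s % 3600 // 60)
print(f"{hours}h{minutes}m")   # within a minute or two of the 3h37m shown
```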


