Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Richard Elling
[richard tries pushing the rope one more time]

On Mar 21, 2011, at 8:40 PM, Edward Ned Harvey wrote:

>> From: Richard Elling [mailto:richard.ell...@gmail.com]
>> 
>> There is no direct correlation between the number of blocks and resilver
>> time.
> 
> Incorrect.
> 
> Although there are possibly some cases where you could be bandwidth limited,
> it's certainly not true in general.
> 
> If Richard were correct, then a resilver would never take longer than
> resilvering an entire disk (including unused space) sequentially.

I can prove this to be true for a device that does not suffer from a seek 
penalty.

>  The time
> to resilver an entire disk sequentially is easily calculated, if you know
> the sustained sequential speed of the disk and size of the disk.  In my
> case, I have a 1TB mirror, where each disk can sustain 1Gbit/sec. Which
> means according to Richard, my max resilver time would be 133min.  In
> reality, my system resilvered in 12 hours while otherwise idle.  

Bummer, your disk must have some sort of seek penalty... perhaps 8.2 ms?
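A quick back-of-the-envelope check of the two figures above (illustrative shell
arithmetic only; the 1 Gbit/s and 12-hour numbers are the ones quoted):

  # sequential rebuild of 1 TB at a sustained 1 Gbit/s (~125 MB/s)
  echo "$(( 1000000 / 125 / 60 )) minutes"          # ~133 minutes
  # a 12-hour resilver of the same 1 TB averages only ~23 MB/s,
  # i.e. the rebuild was seek-bound, not bandwidth-bound
  echo "$(( 1000000 / (12 * 3600) )) MB/s"          # ~23 MB/s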

> This can
> only be explained one way:  As Erik says, the order in which my disks
> resilvered is not disk ordered.  My disks resilver time was random access
> time limited.  Not bandwidth limited.

I have data that proves the resilver time depends on the data layout and that
layout changes as your usage of the pool changes. Like most things in ZFS, it 
is dynamic. The data proves the resilver time is not correlated to the number of
disks in a vdev. The data shows that the resilver time is dependent on the speed
of the resilvering disk. I am glad that your experience confirms this. But why does
it need to be rehashed every few months on the alias?
 -- richard





Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Edward Ned Harvey
> From: Richard Elling [mailto:richard.ell...@gmail.com]
> 
> There is no direct correlation between the number of blocks and resilver
> time.

Incorrect.

Although there are possibly some cases where you could be bandwidth limited,
it's certainly not true in general.

If Richard were correct, then a resilver would never take longer than
resilvering an entire disk (including unused space) sequentially.  The time
to resilver an entire disk sequentially is easily calculated, if you know
the sustained sequential speed of the disk and size of the disk.  In my
case, I have a 1TB mirror, where each disk can sustain 1Gbit/sec.  Which
means according to Richard, my max resilver time would be 133min.  In
reality, my system resilvered in 12 hours while otherwise idle.  This can
only be explained one way:  As Erik says, the order in which my disks
resilvered is not disk ordered.  My disks resilver time was random access
time limited.  Not bandwidth limited.



Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Erik Trimble

On 3/21/2011 3:25 PM, Richard Elling wrote:

On Mar 21, 2011, at 5:09 AM, Edward Ned Harvey wrote:


From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Richard Elling

How many times do we have to rehash this? The speed of resilver is
dependent on the amount of data, the distribution of data on the resilvering
device, speed of the resilvering device, and the throttle. It is NOT dependent
on the number of drives in the vdev.

What the heck?  Yes it is.  Indirectly.  When you say it depends on the
amount of data, speed of resilvering device, etc, what you really mean
(correctly) is that it depends on the total number of used blocks that must
be resilvered on the resilvering device, multiplied by the access time for
the resilvering device.  And of course, throttling and usage during resilver
can have a big impact.  And various other factors.  But the controllable big
factor is the number of blocks used in the degraded vdev.

There is no direct correlation between the number of blocks and resilver time.



Just to be clear here, remember block != slab.  Slab is the allocation 
unit often seen through the "recordsize" attribute.


The number of data *slabs* directly correlates to resilver time.
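(The attribute Erik refers to can be inspected per dataset; the dataset name
below is hypothetical:)

  # recordsize caps the size of the allocation unit ("slab") discussed above
  zfs get recordsize tank/data
  # NAME       PROPERTY    VALUE    SOURCE
  # tank/data  recordsize  128K     default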


So here is how the number of devices in the vdev matter:

If you have your whole pool made of one vdev, then every block in the pool
will be on the resilvering device.  You must spend time resilvering every
single block in the whole pool.

If you have the same amount of data, on a pool broken into N smaller vdev's,
then approximately speaking, 1/N of the blocks in the pool must be
resilvered on the resilvering vdev.  And therefore the resilver goes
approximately N times faster.

Nope. The resilver time is dependent on the speed of the resilvering disk.
Well, unless my previous posts are completely wrong, I can't see how 
resilver time is primarily bounded by speed (i.e. bandwidth/throughput) 
of the HD for the vast majority of use cases.  The IOPS and raw speed 
of the underlying backing store help define how fast the workload (i.e. 
total used slabs) gets processed.  The layout of the vdev, and the 
on-disk data distribution, will define the total IOPS required to 
resilver the slab workload.  Most data distribution/vdev layout 
combinations will result in an IOPS-bound resilver disk, not a 
bandwidth-saturated one.




So if you assume the size of the pool or the number of total disks is a
given, determined by outside constraints and design requirements, and then
you faced the decision of how to architect the vdev's in your pool, then
Yes.  The number of devices in a vdev do dramatically impact the resilver
time.  Only because the number of blocks written in each vdev depend on
these decisions you made earlier.

I do not think it is wise to set the vdev configuration based on a model for
resilver time. Choose the configuration to get the best data protection.
  -- richard
Depends on the needs of the end-user.  I can certainly see places where 
it would be better to build a pool out of RAIDZ2 devices rather than 
RAIDZ3 devices.  And, of course, the converse.  Resilver times should be 
a consideration in building your pool, just like performance and disk 
costs are.  How much you value it, of course, is up to the end-user.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Paul B. Henson

On 3/21/2011 2:59 PM, Garrett D'Amore wrote:


I *hate* talking about unreleased product schedules


:).


but I think you can expect a beta within a month or two, perhaps less.
We've already got an alpha that we've handed out in limited
quantities.


Actually, I read about that alpha; one of my coworkers was at SCALE 9x, 
if I'd known at the time I would have had him pick up a CD ;).



Once you dive under the controlled UI (which you can do), you
basically are breaking your support contract.


Meh :(, that rules it out for me; I need to run our own custom stuff to 
integrate it into our identity management platform.



add-on features like HA clustering, the management UI,
auto-tiering/auto-sync, etc.


HA clustering I would actually be interested in, depending on pricing; 
but unfortunately not in an appliance-only availability.



There have been some discussions, but figuring out how to make that
commercially worthwhile is challenging


Agreed. If not support contracts, what about engineering services 
available on a time/materials basis? That would cover my main concern of 
having expertise available in case of a critical failure. There might 
also be occasions where a specific bug has already been identified, but 
local resources lack sufficient time or knowledge to efficiently fix it. 
One of the people I've spoken to off-line mentioned a handful of known 
opensolaris bugs he'd really like to see resolved in NCP and would be 
willing to pay somebody to make it happen.


Thanks for the info...


--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Richard Elling
On Mar 21, 2011, at 5:32 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>> 
>> it depends on the total number of used blocks that must
>> be resilvered on the resilvering device, multiplied by the access time for
>> the resilvering device.  
> 
> It is a safe assumption, if you've got a lot of devices in a vdev, that
> you've probably got a lot of data in the vdev.  And therefore the resilver
> time for that vdev will be large.

Several studies have shown no correlation between the size of disks and
the amount of data used. Or, to look at it another way, boot disks grow faster
than OSes.

> If you break your pool up into a bunch of mirrors, then the most data you'll
> have in any one vdev is 1-disk worth of data.

Fancy that, if you use raidz, the most data you will have to resilver is 1-disk 
worth of data. In the raidz case, the utilization of the resilvering disk is 100%
and the utilization of the other disks is approximately (100% / N).

> If you have a vdev whose usable capacity is M times a single disk, chances
> are, the amount of data you have in the vdev is L times larger than the
> amount of data you would have had in each vdev if you were using mirrors.
> (I'm intentionally leaving the relationship between M and L vague, but both
> are assumed to be > 1 and approaching the number of devices in the vdev
> minus parity drives).  Therefore the resilver time for that vdev will be
> roughly L times the resilver time of a mirror.
> 


For ZFS, usable capacity has no correlation to resilver time.
 -- richard



Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Richard Elling
On Mar 21, 2011, at 5:09 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Richard Elling
>> 
>> How many times do we have to rehash this? The speed of resilver is
>> dependent on the amount of data, the distribution of data on the resilvering
>> device, speed of the resilvering device, and the throttle. It is NOT dependent
>> on the number of drives in the vdev.
> 
> What the heck?  Yes it is.  Indirectly.  When you say it depends on the
> amount of data, speed of resilvering device, etc, what you really mean
> (correctly) is that it depends on the total number of used blocks that must
> be resilvered on the resilvering device, multiplied by the access time for
> the resilvering device.  And of course, throttling and usage during resilver
> can have a big impact.  And various other factors.  But the controllable big
> factor is the number of blocks used in the degraded vdev.

There is no direct correlation between the number of blocks and resilver time.

> So here is how the number of devices in the vdev matter:
> 
> If you have your whole pool made of one vdev, then every block in the pool
> will be on the resilvering device.  You must spend time resilvering every
> single block in the whole pool.
> 
> If you have the same amount of data, on a pool broken into N smaller vdev's,
> then approximately speaking, 1/N of the blocks in the pool must be
> resilvered on the resilvering vdev.  And therefore the resilver goes
> approximately N times faster.

Nope. The resilver time is dependent on the speed of the resilvering disk.

> So if you assume the size of the pool or the number of total disks is a
> given, determined by outside constraints and design requirements, and then
> you faced the decision of how to architect the vdev's in your pool, then
> Yes.  The number of devices in a vdev do dramatically impact the resilver
> time.  Only because the number of blocks written in each vdev depend on
> these decisions you made earlier.

I do not think it is wise to set the vdev configuration based on a model for
resilver time. Choose the configuration to get the best data protection.
 -- richard



Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Paul B. Henson

On 3/18/2011 6:32 PM, David Magda wrote:


Oracle has said that they "will distribute updates to approved CDDL
or other open source- licensed code following full releases of our
enterprise Solaris operating system."

http://unixconsole.blogspot.com/2010/08/internal-oracle-memo-leaked-on-solaris.html


Hmm, I dunno that I'd take a quote from a leaked internal memo as gospel 
;). For that matter, even if they flat out publicly announced it I can't 
say I'd trust them to actually follow through...



--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Garrett D'Amore
On Mon, 2011-03-21 at 14:56 -0700, Paul B. Henson wrote:
> On 3/18/2011 3:15 PM, Garrett D'Amore wrote:

> 
> > c) NCP 4 is still 5-6 months away.  We're still developing it.
> 
> By the time I do some initial evaluation, then some prototyping, I don't
> anticipate migrating anything production wise until at the earliest
> Christmas break, so that timing shouldn't be a problem. Any thoughts on
> how soon a beta might be available? As it sounds like there will be
> significant changes, it might be better to evaluate with a beta of the
> new stuff rather than the production version of the older stuff. Plus I
> generally tend to break things in unexpected ways ;), so doing that in
> the beta cycle might be beneficial.

I *hate* talking about unreleased product schedules, but I think you can
expect a beta within a month or two, perhaps less.  We've already got an
alpha that we've handed out in limited quantities.

> 
> > d) NCP 4 will make much more use of the illumos userland, and only
> > use Debian when illumos doesn't have an equivalent.
> 
> Given both NCP and OpenIndiana will be based off of illumos, and as of
> version 4 NCP will be migrating as much as possible of the userland to
> solaris as opposed to gnu, other than the differing packaging formats
> what do you feel will distinguish NCP from openindiana? NCP is positioned as
> a bare-bones server, whereas openindiana is trying to be more general
> purpose including desktop use?

NCP is a core-technology thing.  Definitely not a general purpose OS at
all, and will be missing all the desktop stuff.

The idea behind NCP is that other distros build on top of it, or that people
who just want that bare-bones OS use it directly.  It comes with Debian
packaging, and we do have a bunch of the common server packages (Apache, etc.)
set up, but not everything that you might want.

> 
> > e) NCP comes entirely unsupported.  NexentaStor is a commercial
> > product with real support behind it, though.
> 
> Can you treat NexentaStor like a general purpose operating system, not
> use the management gui, and configure everything from a shell prompt, or
> is it more appliance like and you're locked out from the OS? In other
> words, would it be possible (although not necessarily cost-effective) to
> pay for NexentaStor for the support but treat it like NCP?

Once you dive under the controlled UI (which you can do), you basically
are breaking your support contract.

Going forward, NCP and NS will be more closely synchronized, so you'll
be able to get the same OS that you get with NS, and probably receive
patches to it, albeit without official support and without the proprietary
add-on features like HA clustering, the management UI,
auto-tiering/auto-sync, etc.

> 
> Has your company considered basic support contracts for NCP? I've heard
> from at least one other site that might be interested in something like
> that. We don't need much in the way of handholding, the majority of our
> support calls end up being actual bugs or limitations in solaris. But if
> one of our file servers panics, doesn't import a pool when it boots, and
> crashes every time you try to import it by hand, it would be nice to
> have an engineer available :).

There have been some discussions, but figuring out how to make that
commercially worthwhile is challenging.  At some level, our engineers
are busy enough that we'd have to see enough commercial demand here to
justify adding engineers, because the number of calls we would take
would probably go up significantly with such a change.

- Garrett




Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Ian Collins

 On 03/22/11 10:39 AM, Edward Ned Harvey wrote:

So the conclusion to draw is:
Yes, there are situations where ZFS resilver is a strength, and limited by
serial throughput.  But for what I call "typical" usage patterns, it's a
weakness, and it's dramatically much worse than resilvering the whole disk
sequentially.


That's probably correct.  It certainly helps explain my recent experience.

The total data in the pool has remained fairly constant over the past 6 
months, but as the pool is on a staging server, it aggregates all of the 
churn from the servers that send data to it.


So given that the hardware, usage, and total data haven't changed since the 
last resilver, the significant increase in resilver time must be down to 
increased data fragmentation.


--
Ian.



Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Paul B. Henson

On 3/18/2011 3:15 PM, Garrett D'Amore wrote:


a) Nexenta Core Platform is a bare-bones OS.  No GUI, in other words
(no X11.)  It might well suit you.


Indeed :), my servers are headless (well, as headless as you can get on
x86 hardware 8-/, they do have an ipmi remote console that still needs
to be used occasionally) and I generally install a minimal set of
packages. We have the X client libraries installed on some of our linux
servers, as our DBA's like to run the gui oracle installer, but I don't
recall ever needing to run X software on our storage servers. One of my
many spats with Oracle technical support (the database side, not the
operating system side) was trying to get them to justify why the
"xscreensaver" package was listed as a core dependency of running 10g
under RHEL 5 :(. Never did get an answer to that, they just closed the
ticket out from under me...


c) NCP 4 is still 5-6 months away.  We're still developing it.


By the time I do some initial evaluation, then some prototyping, I don't
anticipate migrating anything production wise until at the earliest
Christmas break, so that timing shouldn't be a problem. Any thoughts on
how soon a beta might be available? As it sounds like there will be
significant changes, it might be better to evaluate with a beta of the
new stuff rather than the production version of the older stuff. Plus I
generally tend to break things in unexpected ways ;), so doing that in
the beta cycle might be beneficial.


d) NCP 4 will make much more use of the illumos userland, and only
use Debian when illumos doesn't have an equivalent.


Given both NCP and OpenIndiana will be based off of illumos, and as of
version 4 NCP will be migrating as much as possible of the userland to
solaris as opposed to gnu, other than the differing packaging formats
what do you feel will distinguish NCP from openindiana? NCP is positioned as
a bare-bones server, whereas openindiana is trying to be more general
purpose including desktop use?


e) NCP comes entirely unsupported.  NexentaStor is a commercial
product with real support behind it, though.


Can you treat NexentaStor like a general purpose operating system, not
use the management gui, and configure everything from a shell prompt, or
is it more appliance like and you're locked out from the OS? In other
words, would it be possible (although not necessarily cost-effective) to
pay for NexentaStor for the support but treat it like NCP?

Has your company considered basic support contracts for NCP? I've heard
from at least one other site that might be interested in something like
that. We don't need much in the way of handholding, the majority of our
support calls end up being actual bugs or limitations in solaris. But if
one of our file servers panics, doesn't import a pool when it boots, and
crashes every time you try to import it by hand, it would be nice to
have an engineer available :).

Thanks...


--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Paul Kraus
> 
> Is resilver time related to the amount of data (TBs) or the number
> of objects (file + directory counts) ? I have seen zpools with lots of
> data in very few files resilver quickly while smaller pools with lots
> of tiny files take much longer (no hard data here, just recollection
> of how long things took).

In some cases, it could be dependent on the total amount of data (TB) and be
limited by sequential drive throughput.  In that case, it will always be
fast.
In other cases, it could be dependent on a lot of small blocks scattered
randomly about.  In that case, it will be limited by random access time of
the devices, and it's certain to be painfully slow.
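A rough sketch of both regimes, with made-up but plausible numbers (500 GB
used, ~100 MB/s sustained, ~8 ms per random access, 128 KB average block;
assumptions, not measurements):

  USED_GB=500; SEQ_MBS=100; SEEK_MS=8; BLK_KB=128
  echo "bandwidth-bound:     $(( USED_GB * 1024 / SEQ_MBS / 60 )) minutes"                         # ~85 min
  echo "random-access-bound: $(( USED_GB * 1024 * 1024 / BLK_KB * SEEK_MS / 1000 / 3600 )) hours"  # ~9 hours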

But in this conversation, we're trying to make a generalization.  So let's
define "typical," and discuss how each of the above cases is possible, and
reach a generalization:

Note:  There is another common usage scenario, the home video server or large
static sequential file store, which would have precisely the opposite usage
characteristics.  But for me that's not typical, so here is what I'm defining
as "typical"...

Typical:  You have a nontrivial pool, with volatile data.  Autosnapshots are
on, which means snapshots are frequently created & destroyed.  Some files &
directories are deleted, created, and/or modified or appended to, in
essentially random order.  It is in the nature of COW (and therefore ZFS) to
only write new copies of the changed blocks, while leaving old blocks in
place, hence files become progressively more fragmented, as long as they are
modified in the middles and ends (rather than deleted & recreated entirely).
It is also in the nature of ZFS to aggregate small writes into larger
sequential blocks:  a bunch of small random writes are aggregated into a
single larger sequential write.  Eventually some of those blocks are
overwritten or deleted, and snapshots destroyed, leaving a "hole" in the
middle of what was formerly an aggregated sequential write.  So ZFS becomes
progressively more fragmented in these cases too.

All of the above is normal for any snapshot-capable filesystem.  (Different
implementations reach the same result.)

Here is the part which is both a ZFS strength and weakness:  Upon scrub or
resilver, ZFS will only scrub or resilver the used blocks.  It will not do
the unused space.  If you have a really small percentage of pool
utilization, or highly sequential data, this is a strength.  Because you get
to skip over all the unused portions of disk, it will complete faster than
resilvering or scrubbing the whole disk sequentially.

Unfortunately, in my "typical" usage scenario, a system has been in volatile
production for an extended time, so there is significant usage in the pool,
which is highly fragmented.

Unfortunately, in ZFS resilver (and I think scrub too) the order of
resilvering blocks is NOT based on disk order, which means you don't get to
simply perform a bunch of sequential disk reads and skip over all the unused
sectors.  Instead, your heads need to thrash around, randomly seeking small
blocks all over the place, in essentially random order.

So the answer to your question, assuming my "typical" usage and assuming
hard drives (not SSD's etc) is:

Resilver is dependent on neither the total quantity of data, nor the total
number of files/directories.  It is dependent on the number of used blocks
in the vdev, and dependent on precisely how fragmented and how randomly
those blocks are scattered throughout the vdev, and limited by the random
access time of the vdev.  

YMMV, but here is one of my experiences:  In a given pool that I admin, if I
needed to resilver a whole disk including unused space, the sequential IO of
the disk would be the limiting factor, and the time would be approx 2 hours.
Instead, I am using ZFS, this system is in "typical" production usage, and I
am using mirrors.  Hence, this is the best case scenario for a "typical" ZFS
server with volatile data.  My resilver took 12 hours.  If I had used raidz2
with 8-2=6 data disks, each vdev would hold roughly six disks' worth of blocks
instead of one, so it would have taken about 6 x 12 = 72 hours, i.e. 3 days.

So the conclusion to draw is:
Yes, there are situations where ZFS resilver is a strength, and limited by
serial throughput.  But for what I call "typical" usage patterns, it's a
weakness, and it's dramatically much worse than resilvering the whole disk
sequentially.



Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Roy Sigurd Karlsbakk
> Our main backups storage server has 3x 8-drive raidz2 vdevs. Was
> replacing the 500 GB drives in one vdev with 1 TB drives. The last 2
> drives took just under 300 hours each. :( The first couple drives
> took approx 150 hours each, and then it just started taking longer and
> longer for each drive.

That's strange indeed. I just replaced 21 drives (seven 2TB drives in each of 
three raidz2 VDEVs) with 3TB ones, and resilver times were quite stable, until 
the last replace, which was a bit faster. Have you checked 'iostat -en'? If one 
(or more) of the drives is having I/O errors, that may slow down the whole 
pool.
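(For reference, the per-device error counters look roughly like this; output
abbreviated and illustrative only:)

  iostat -en
  #   ---- errors ---
  #   s/w h/w trn tot device
  #     0   0   0   0 c0t0d0
  #     0  14   3  17 c0t3d0   <- a drive like this can drag down the whole pool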

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases adequate and relevant synonyms exist
in Norwegian.


Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Roy Sigurd Karlsbakk
> The 30+ second latency I see on this system during a resilver renders
> it pretty useless as a staging server (lots of small snapshots).

I've seen similar numbers on a system during resilver, without L2ARC/SLOG. 
Adding L2ARC/SLOG made the system work quite well during resilver/scrub, but 
without them, it wasn't very useful.
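(For anyone wanting to try the same, the devices are added roughly like this;
pool and device names are hypothetical, and a SLOG is normally mirrored:)

  zpool add tank cache c4t0d0                # L2ARC (read cache)
  zpool add tank log mirror c4t1d0 c4t2d0    # SLOG (separate intent log)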

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases adequate and relevant synonyms exist
in Norwegian.


Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Freddie Cash
On Sun, Mar 20, 2011 at 12:57 AM, Ian Collins  wrote:
>  Has anyone seen a resilver longer than this for a 500G drive in a raidz2
> vdev?
>
> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37
> 2011
>              c0t0d0  ONLINE       0     0     0  769G resilvered
>
> and I told the client it would take 3 to 4 days!

Our main backups storage server has 3x 8-drive raidz2 vdevs.  Was
replacing the 500 GB drives in one vdev with 1 TB drives.  The last 2
drives took just under 300 hours each.  :(  The first couple drives
took approx 150 hours each, and then it just started taking longer and
longer for each drive.


-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Paul Kraus
On Sun, Mar 20, 2011 at 7:20 PM, Richard Elling wrote:
> On Mar 20, 2011, at 3:02 PM, Ian Collins wrote:
>
>> On 03/20/11 08:57 PM, Ian Collins wrote:
>>> Has anyone seen a resilver longer than this for a 500G drive in a raidz2 
>>> vdev?
>>>
>>> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 
>>> 19:57:37 2011
>>>              c0t0d0  ONLINE       0     0     0  769G resilvered
>>>
>> I didn't intend to start an argument, I was just very surprised the resilver 
>> took so long.
>
> I'd describe the thread as critical analysis, not argument. There are many 
> facets of ZFS
> resilver and scrub that many people have never experienced, so it makes sense 
> to
> explore the issue.
>
> Expect ZFS resilvers to take longer in the future for HDDs.
> Expect ZFS resilvers to remain quite fast for SSDs.
> Why? Because HDDs are getting bigger, but not faster, while SSDs are getting 
> bigger and faster.
>

Is resilver time related to the amount of data (TBs) or the number
of objects (file + directory counts) ? I have seen zpools with lots of
data in very few files resilver quickly while smaller pools with lots
of tiny files take much longer (no hard data here, just recollection
of how long things took).

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players


Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
> 
> it depends on the total number of used blocks that must
> be resilvered on the resilvering device, multiplied by the access time for
> the resilvering device.  

It is a safe assumption, if you've got a lot of devices in a vdev, that
you've probably got a lot of data in the vdev.  And therefore the resilver
time for that vdev will be large.

If you break your pool up into a bunch of mirrors, then the most data you'll
have in any one vdev is 1-disk worth of data.

If you have a vdev whose usable capacity is M times a single disk, chances
are, the amount of data you have in the vdev is L times larger than the
amount of data you would have had in each vdev if you were using mirrors.
(I'm intentionally leaving the relationship between M and L vague, but both
are assumed to be > 1 and approaching the number of devices in the vdev
minus parity drives).  Therefore the resilver time for that vdev will be
roughly L times the resilver time of a mirror.



Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Richard Elling
> 
> How many times do we have to rehash this? The speed of resilver is
> dependent on the amount of data, the distribution of data on the resilvering
> device, speed of the resilvering device, and the throttle. It is NOT dependent
> on the number of drives in the vdev.

What the heck?  Yes it is.  Indirectly.  When you say it depends on the
amount of data, speed of resilvering device, etc, what you really mean
(correctly) is that it depends on the total number of used blocks that must
be resilvered on the resilvering device, multiplied by the access time for
the resilvering device.  And of course, throttling and usage during resilver
can have a big impact.  And various other factors.  But the controllable big
factor is the number of blocks used in the degraded vdev.

So here is how the number of devices in the vdev matter:

If you have your whole pool made of one vdev, then every block in the pool
will be on the resilvering device.  You must spend time resilvering every
single block in the whole pool.

If you have the same amount of data, on a pool broken into N smaller vdev's,
then approximately speaking, 1/N of the blocks in the pool must be
resilvered on the resilvering vdev.  And therefore the resilver goes
approximately N times faster.

So if you assume the size of the pool or the number of total disks is a
given, determined by outside constraints and design requirements, and then
you faced the decision of how to architect the vdev's in your pool, then
Yes.  The number of devices in a vdev do dramatically impact the resilver
time.  Only because the number of blocks written in each vdev depend on
these decisions you made earlier.
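The argument above reduces to a simple seek-bound model (a sketch only, not
ZFS's actual resilver algorithm; the block count and access time are invented
for illustration):

  # assumed: 10 million used blocks pool-wide, ~8 ms per random access
  BLOCKS=10000000; SEEK_MS=8
  for VDEVS in 1 2 4 8; do
    echo "$VDEVS vdev(s): ~$(( BLOCKS / VDEVS * SEEK_MS / 1000 / 3600 )) hours to resilver one member"
  done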



Re: [zfs-discuss] GNU 'cp -p' can't work well with ZFS-based-NFS

2011-03-21 Thread Fred Liu
Thanks.
But does noacl work with NFSv3?
Thanks.

Fred
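(On the NFSv3 question: at least on Linux clients, "noacl" is a valid NFSv2/v3
mount option that disables the NFSACL sideband protocol. A hedged example with
a made-up export and mount point:)

  mount -t nfs -o vers=3,noacl server:/export/data /mnt/data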

> -Original Message-
> From: Cameron Hanover [mailto:chano...@umich.edu]
> Sent: Thursday, March 17, 2011 1:34
> To: Fred Liu
> Cc: ZFS Discussions
> Subject: Re: [zfs-discuss] GNU 'cp -p' can't work well with ZFS-based-
> NFS
> 
> I thought this explained it well.
> http://www.cuddletech.com/blog/pivot/entry.php?id=939
> 'NFSv3, ACL's and ZFS' is the relevant part.
> 
> I've told my customers that run into this to use the noacl mount option.
> 
> -
> Cameron Hanover
> chano...@umich.edu
> 
> Fill with mingled cream and amber,
> I will drain that glass again.
> Such hilarious visions clamber
> Through the chamber of my brain ―
> Quaintest thoughts ― queerest fancies
> Come to life and fade away;
> What care I how time advances?
> I am drinking ale today.
> ―-Edgar Allan Poe
> 
> On Mar 16, 2011, at 9:56 AM, Fred Liu wrote:
> 
> > It always shows info like ‘operation not supported’.
> >
> > Any workaround?
> >
> >
> >
> > Thanks.
> >
> >
> >
> > Fred
> >
> 
