Re: [zfs-discuss] A resilver record?
[richard tries pushing the rope one more time]

On Mar 21, 2011, at 8:40 PM, Edward Ned Harvey wrote:

>> From: Richard Elling [mailto:richard.ell...@gmail.com]
>>
>> There is no direct correlation between the number of blocks and resilver time.
>
> Incorrect.
>
> Although there are possibly some cases where you could be bandwidth limited, it's certainly not true in general.
>
> If Richard were correct, then a resilver would never take longer than resilvering an entire disk (including unused space) sequentially.

I can prove this to be true for a device that does not suffer from a seek penalty.

> The time to resilver an entire disk sequentially is easily calculated, if you know the sustained sequential speed of the disk and the size of the disk. In my case, I have a 1TB mirror, where each disk can sustain 1Gbit/sec. Which means, according to Richard, my max resilver time would be 133min. In reality, my system resilvered in 12 hours while otherwise idle.

Bummer, your disk must have some sort of seek penalty... perhaps 8.2 ms?

> This can only be explained one way: as Erik says, the order in which my disks resilvered is not disk ordered. My disks' resilver time was random access time limited, not bandwidth limited.

I have data that proves the resilver time depends on the data layout, and that layout changes as your usage of the pool changes. Like most things in ZFS, it is dynamic. The data proves the resilver time is not correlated to the number of disks in a vdev. The data shows that the resilver time is dependent on the speed of the resilvering disk. I am glad that your experience confirms this.

But why does it need to be rehashed every few months on the alias?
 -- richard
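For readers following the arithmetic in this exchange, here is a rough back-of-envelope sketch of the two bounds being argued over. The amount of used data and the average record size are assumptions chosen for illustration; only the 1 Gbit/s and 8.2 ms figures come from the posts above.

```python
# Back-of-envelope bounds for the 1 TB mirror discussed above.
# Assumptions (not from the thread): ~1 TB of used data, 128 KiB average
# record size, 125 MB/s sustained throughput (~1 Gbit/s), 8.2 ms per seek.

used_bytes   = 1e12
record_bytes = 128 * 1024
seq_bytes_s  = 125e6
seek_s       = 0.0082

records = used_bytes / record_bytes

bandwidth_bound_h = used_bytes / seq_bytes_s / 3600   # ~2.2 h (Edward's 133 min)
seek_bound_h      = records * seek_s / 3600           # ~17 h if every record seeks

print(f"records to walk:      {records:,.0f}")
print(f"bandwidth-bound time: {bandwidth_bound_h:.1f} h")
print(f"seek-bound time:      {seek_bound_h:.1f} h")
```

Under these assumptions the observed 12 hours falls between the two bounds, consistent with a resilver that is mostly, though not entirely, seek-limited.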
Re: [zfs-discuss] A resilver record?
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
> There is no direct correlation between the number of blocks and resilver time.

Incorrect.

Although there are possibly some cases where you could be bandwidth limited, it's certainly not true in general.

If Richard were correct, then a resilver would never take longer than resilvering an entire disk (including unused space) sequentially. The time to resilver an entire disk sequentially is easily calculated, if you know the sustained sequential speed of the disk and the size of the disk. In my case, I have a 1TB mirror, where each disk can sustain 1Gbit/sec. Which means, according to Richard, my max resilver time would be 133min. In reality, my system resilvered in 12 hours while otherwise idle.

This can only be explained one way: as Erik says, the order in which my disks resilvered is not disk ordered. My disks' resilver time was random access time limited, not bandwidth limited.
Re: [zfs-discuss] A resilver record?
On 3/21/2011 3:25 PM, Richard Elling wrote:
> On Mar 21, 2011, at 5:09 AM, Edward Ned Harvey wrote:
>>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
>>>
>>> How many times do we have to rehash this? The speed of resilver is dependent on the amount of data, the distribution of data on the resilvering device, the speed of the resilvering device, and the throttle. It is NOT dependent on the number of drives in the vdev.
>>
>> What the heck? Yes it is. Indirectly. When you say it depends on the amount of data, speed of resilvering device, etc., what you really mean (correctly) is that it depends on the total number of used blocks that must be resilvered on the resilvering device, multiplied by the access time for the resilvering device. And of course, throttling and usage during resilver can have a big impact. And various other factors. But the controllable big factor is the number of blocks used in the degraded vdev.
>
> There is no direct correlation between the number of blocks and resilver time.

Just to be clear here, remember block != slab. The slab is the allocation unit, often seen through the "recordsize" attribute. The number of data *slabs* directly correlates to resilver time.

>> So here is how the number of devices in the vdev matters:
>>
>> If you have your whole pool made of one vdev, then every block in the pool will be on the resilvering device. You must spend time resilvering every single block in the whole pool.
>>
>> If you have the same amount of data, on a pool broken into N smaller vdevs, then approximately speaking, 1/N of the blocks in the pool must be resilvered on the resilvering vdev. And therefore the resilver goes approximately N times faster.
>
> Nope. The resilver time is dependent on the speed of the resilvering disk.

Well, unless my previous posts are completely wrong, I can't see how resilver time is primarily bounded by the speed (i.e., bandwidth/throughput) of the HD for the vast majority of use cases. The IOPS and raw speed of the underlying backing store help define how fast the workload (i.e., the total used slabs) gets processed. The layout of the vdev, and the on-disk data distribution, will define the total IOPS required to resilver the slab workload. Most data distribution/vdev layout combinations will result in an IOPS-bound resilver disk, not a bandwidth-saturated resilver disk.

>> So if you assume the size of the pool or the number of total disks is a given, determined by outside constraints and design requirements, and then you face the decision of how to architect the vdevs in your pool, then yes: the number of devices in a vdev does dramatically impact the resilver time. Only because the number of blocks written in each vdev depends on these decisions you made earlier.
>
> I do not think it is wise to set the vdev configuration based on a model for resilver time. Choose the configuration to get the best data protection.
>  -- richard

Depends on the needs of the end-user. I can certainly see places where it would be better to build a pool out of RAIDZ2 vdevs rather than RAIDZ3 vdevs. And, of course, the converse. Resilver times should be a consideration in building your pool, just like performance and disk costs are. How much you value it, of course, is up to the end-user.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
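A minimal sketch of the two-bound view Erik describes: the resilver can finish no sooner than the slower of streaming the used bytes and issuing one I/O per slab. The throughput, IOPS, and slab-size figures below are assumptions chosen only to show the shape of the argument, not measurements from the thread.

```python
# Toy model of an IOPS-bound vs. bandwidth-bound resilver.
# All figures are illustrative assumptions, not measurements.

def resilver_hours(used_tb, avg_slab_kib, disk_mb_s=125, disk_iops=120):
    used_bytes = used_tb * 1e12
    slabs = used_bytes / (avg_slab_kib * 1024)
    bandwidth_bound_s = used_bytes / (disk_mb_s * 1e6)  # stream every used byte
    iops_bound_s = slabs / disk_iops                    # one random I/O per slab
    return max(bandwidth_bound_s, iops_bound_s) / 3600

# 1 TB of used data: the bandwidth bound stays ~2.2 h either way, but the
# IOPS bound dominates as the average slab shrinks (more fragmentation).
print(f"{resilver_hours(1.0, avg_slab_kib=128):.1f} h")  # ~17.7 h, IOPS-bound
print(f"{resilver_hours(1.0, avg_slab_kib=8):.1f} h")    # ~283 h, badly IOPS-bound
```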
Re: [zfs-discuss] best migration path from Solaris 10
On 3/21/2011 2:59 PM, Garrett D'Amore wrote:
> I *hate* talking about unreleased product schedules :), but I think you can expect a beta within a month or two, perhaps less. We've already got an alpha that we've handed out in limited quantities.

Actually, I read about that alpha; one of my coworkers was at SCALE 9x, if I'd known at the time I would have had him pick up a CD ;).

> Once you dive under the controlled UI (which you can do), you basically are breaking your support contract.

Meh :(, that rules it out for me; I need to run our own custom stuff to integrate it into our identity management platform.

> add-on features like HA clustering, the management UI, auto-tiering/auto-sync, etc.

HA clustering I would actually be interested in, depending on pricing; but unfortunately not in an appliance-only availability.

> There have been some discussions, but figuring out how to make that commercially worthwhile is challenging

Agreed. If not support contracts, what about engineering services available on a time/materials basis? That would cover my main concern of having expertise available in case of a critical failure. There might also be occasions where a specific bug has already been identified, but local resources lack sufficient time or knowledge to efficiently fix it. One of the people I've spoken to off-line mentioned a handful of known opensolaris bugs he'd really like to see resolved in NCP and would be willing to pay somebody to make it happen.

Thanks for the info...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] A resilver record?
On Mar 21, 2011, at 5:32 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>>
>> it depends on the total number of used blocks that must be resilvered on the resilvering device, multiplied by the access time for the resilvering device.
>
> It is a safe assumption, if you've got a lot of devices in a vdev, that you've probably got a lot of data in the vdev. And therefore the resilver time for that vdev will be large.

Several studies have shown no correlation between the size of disks and the amount of data used. Or, to look at it another way, boot disks grow faster than OSes.

> If you break your pool up into a bunch of mirrors, then the most data you'll have in any one vdev is one disk's worth of data.

Fancy that: if you use raidz, the most data you will have to resilver is also one disk's worth of data. In the raidz case, the utilization of the resilvering disk is 100% and the utilization of the other disks is approximately (100% / N).

> If you have a vdev whose usable capacity is M times a single disk, chances are, the amount of data you have in the vdev is L times larger than the amount of data you would have had in each vdev if you were using mirrors. (I'm intentionally leaving the relationship between M and L vague, but both are assumed to be > 1 and approaching the number of devices in the vdev minus parity drives.) Therefore the resilver time for that vdev will be roughly L times the resilver time of a mirror.

For ZFS, usable capacity has no correlation to resilver time.
 -- richard
Re: [zfs-discuss] A resilver record?
On Mar 21, 2011, at 5:09 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
>>
>> How many times do we have to rehash this? The speed of resilver is dependent on the amount of data, the distribution of data on the resilvering device, the speed of the resilvering device, and the throttle. It is NOT dependent on the number of drives in the vdev.
>
> What the heck? Yes it is. Indirectly. When you say it depends on the amount of data, speed of resilvering device, etc., what you really mean (correctly) is that it depends on the total number of used blocks that must be resilvered on the resilvering device, multiplied by the access time for the resilvering device. And of course, throttling and usage during resilver can have a big impact. And various other factors. But the controllable big factor is the number of blocks used in the degraded vdev.

There is no direct correlation between the number of blocks and resilver time.

> So here is how the number of devices in the vdev matters:
>
> If you have your whole pool made of one vdev, then every block in the pool will be on the resilvering device. You must spend time resilvering every single block in the whole pool.
>
> If you have the same amount of data, on a pool broken into N smaller vdevs, then approximately speaking, 1/N of the blocks in the pool must be resilvered on the resilvering vdev. And therefore the resilver goes approximately N times faster.

Nope. The resilver time is dependent on the speed of the resilvering disk.

> So if you assume the size of the pool or the number of total disks is a given, determined by outside constraints and design requirements, and then you face the decision of how to architect the vdevs in your pool, then yes: the number of devices in a vdev does dramatically impact the resilver time. Only because the number of blocks written in each vdev depends on these decisions you made earlier.

I do not think it is wise to set the vdev configuration based on a model for resilver time. Choose the configuration to get the best data protection.
 -- richard
Re: [zfs-discuss] best migration path from Solaris 10
On 3/18/2011 6:32 PM, David Magda wrote:
> Oracle has said that they "will distribute updates to approved CDDL or other open source-licensed code following full releases of our enterprise Solaris operating system."
>
> http://unixconsole.blogspot.com/2010/08/internal-oracle-memo-leaked-on-solaris.html

Hmm, I dunno that I'd take a quote from a leaked internal memo as gospel ;). For that matter, even if they flat out publicly announced it, I can't say I'd trust them to actually follow through...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] best migration path from Solaris 10
On Mon, 2011-03-21 at 14:56 -0700, Paul B. Henson wrote:
> On 3/18/2011 3:15 PM, Garrett D'Amore wrote:
>
>> c) NCP 4 is still 5-6 months away. We're still developing it.
>
> By the time I do some initial evaluation, then some prototyping, I don't anticipate migrating anything production wise until at the earliest Christmas break, so that timing shouldn't be a problem. Any thoughts on how soon a beta might be available? As it sounds like there will be significant changes, it might be better to evaluate with a beta of the new stuff rather than the production version of the older stuff. Plus I generally tend to break things in unexpected ways ;), so doing that in the beta cycle might be beneficial.

I *hate* talking about unreleased product schedules, but I think you can expect a beta within a month or two, perhaps less. We've already got an alpha that we've handed out in limited quantities.

>> d) NCP 4 will make much more use of the illumos userland, and only use Debian when illumos doesn't have an equivalent.
>
> Given that both NCP and OpenIndiana will be based off of illumos, and as of version 4 NCP will be migrating as much as possible of the userland to solaris as opposed to gnu, other than the differing packaging formats what do you feel will distinguish NCP from openindiana? NCP is positioned as a bare-bones server, whereas openindiana is trying to be more general purpose, including desktop use?

NCP is a core-technology thing. Definitely not a general purpose OS at all, and it will be missing all the desktop stuff. The idea behind NCP is that other distros build on top of it, or people who just want that bare-bones OS use it. It comes with Debian packaging, and we do have a bunch of the common server packages (Apache, etc.) set up, but not everything that you might want.

>> e) NCP comes entirely unsupported. NexentaStor is a commercial product with real support behind it, though.
>
> Can you treat NexentaStor like a general purpose operating system, not use the management gui, and configure everything from a shell prompt, or is it more appliance-like and you're locked out from the OS? In other words, would it be possible (although not necessarily cost-effective) to pay for NexentaStor for the support but treat it like NCP?

Once you dive under the controlled UI (which you can do), you basically are breaking your support contract. Going forward, NCP and NS will be more closely synchronized, so you'll be able to get the same OS, and probably receive patches to it, that you get with NS, albeit without official support and without the proprietary add-on features like HA clustering, the management UI, auto-tiering/auto-sync, etc.

> Has your company considered basic support contracts for NCP? I've heard from at least one other site that might be interested in something like that. We don't need much in the way of handholding, the majority of our support calls end up being actual bugs or limitations in solaris. But if one of our file servers panics, doesn't import a pool when it boots, and crashes every time you try to import it by hand, it would be nice to have an engineer available :).

There have been some discussions, but figuring out how to make that commercially worthwhile is challenging. At some level, our engineers are busy enough that we'd have to see enough commercial demand here to justify adding engineers, because the number of calls we would take would probably go up significantly with such a change.

 - Garrett
Re: [zfs-discuss] A resilver record?
On 03/22/11 10:39 AM, Edward Ned Harvey wrote:
> So the conclusion to draw is: yes, there are situations where ZFS resilver is a strength, and limited by serial throughput. But for what I call "typical" usage patterns, it's a weakness, and it's dramatically much worse than resilvering the whole disk sequentially.

That's probably correct. It certainly helps explain my recent experience. The total data in the pool has remained fairly constant over the past 6 months, but as the pool is on a staging server, it aggregates all of the churn from the servers that send data to it. So given that the hardware, the usage, and the total data haven't changed since the last resilver, the significant increase in resilver time must be down to the increased data fragmentation.

--
Ian.
Re: [zfs-discuss] best migration path from Solaris 10
On 3/18/2011 3:15 PM, Garrett D'Amore wrote:
> a) Nexenta Core Platform is a bare-bones OS. No GUI, in other words (no X11.) It might well suit you.

Indeed :), my servers are headless (well, as headless as you can get on x86 hardware 8-/; they do have an ipmi remote console that still needs to be used occasionally) and I generally install a minimal set of packages. We have the X client libraries installed on some of our linux servers, as our DBAs like to run the gui oracle installer, but I don't recall ever needing to run X software on our storage servers. One of my many spats with Oracle technical support (the database side, not the operating system side) was trying to get them to justify why the "xscreensaver" package was listed as a core dependency of running 10g under RHEL 5 :(. Never did get an answer to that, they just closed the ticket out from under me...

> c) NCP 4 is still 5-6 months away. We're still developing it.

By the time I do some initial evaluation, then some prototyping, I don't anticipate migrating anything production wise until at the earliest Christmas break, so that timing shouldn't be a problem. Any thoughts on how soon a beta might be available? As it sounds like there will be significant changes, it might be better to evaluate with a beta of the new stuff rather than the production version of the older stuff. Plus I generally tend to break things in unexpected ways ;), so doing that in the beta cycle might be beneficial.

> d) NCP 4 will make much more use of the illumos userland, and only use Debian when illumos doesn't have an equivalent.

Given that both NCP and OpenIndiana will be based off of illumos, and as of version 4 NCP will be migrating as much as possible of the userland to solaris as opposed to gnu, other than the differing packaging formats what do you feel will distinguish NCP from openindiana? NCP is positioned as a bare-bones server, whereas openindiana is trying to be more general purpose, including desktop use?

> e) NCP comes entirely unsupported. NexentaStor is a commercial product with real support behind it, though.

Can you treat NexentaStor like a general purpose operating system, not use the management gui, and configure everything from a shell prompt, or is it more appliance-like and you're locked out from the OS? In other words, would it be possible (although not necessarily cost-effective) to pay for NexentaStor for the support but treat it like NCP?

Has your company considered basic support contracts for NCP? I've heard from at least one other site that might be interested in something like that. We don't need much in the way of handholding, the majority of our support calls end up being actual bugs or limitations in solaris. But if one of our file servers panics, doesn't import a pool when it boots, and crashes every time you try to import it by hand, it would be nice to have an engineer available :).

Thanks...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] A resilver record?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Paul Kraus
>
> Is resilver time related to the amount of data (TBs) or the number of objects (file + directory counts)? I have seen zpools with lots of data in very few files resilver quickly while smaller pools with lots of tiny files take much longer (no hard data here, just recollection of how long things took).

In some cases, it could be dependent on the total amount of data (TB) and be limited by sequential drive throughput. In that case, it will always be fast. In other cases, it could be dependent on a lot of small blocks scattered randomly about. In that case, it will be limited by the random access time of the devices, and it's certain to be painfully slow.

But in this conversation, we're trying to make a generalization. So let's define "typical," discuss how each of the above cases is possible, and reach a generalization.

Note: there is another common usage scenario, the home video server or large static sequential file store, which would have precisely the opposite usage characteristics. But for me, that's not typical, so when I'm the person writing, here is what I'm defining as "typical..."

Typical: You have a nontrivial pool, with volatile data. Autosnapshots are on, which means snapshots are frequently created & destroyed. Some files & directories are deleted, created, and/or modified or appended to, in essentially random order. It is in the nature of COW (and therefore ZFS) to only write new copies of the changed blocks, while leaving old blocks in place, hence files become progressively more fragmented, as long as they are modified in the middles and ends (rather than deleted & recreated entirely). It is also in the nature of ZFS small-write aggregation: a bunch of small random writes are aggregated into a single larger sequential write, and eventually those changes are changed or deleted, and snapshots destroyed, leaving a "hole" in the middle of what was formerly an aggregated sequential write. So it's in the nature of ZFS to become progressively more fragmented in these cases too. All of the above is normal for any snapshot-capable filesystem. (Different implementations reach the same result.)

Here is the part which is both a ZFS strength and weakness: upon scrub or resilver, ZFS will only scrub or resilver the used blocks. It will not do the unused space. If you have a really small percentage of pool utilization, or highly sequential data, this is a strength. Because you get to skip over all the unused portions of disk, it will complete faster than resilvering or scrubbing the whole disk sequentially.

Unfortunately, in my "typical" usage scenario, a system has been in volatile production for an extended time, so there is significant usage in the pool, which is highly fragmented. Unfortunately, in a ZFS resilver (and I think scrub too) the order of resilvering blocks is NOT based on disk order, which means you don't get to simply perform a bunch of sequential disk reads and skip over all the unused sectors. Instead, your heads need to thrash around, seeking small blocks all over the place, in essentially random order.

So the answer to your question, assuming my "typical" usage and assuming hard drives (not SSDs etc.), is: resilver is dependent on neither the total quantity of data nor the total number of files/directories. It is dependent on the number of used blocks in the vdev, and dependent on precisely how fragmented and how randomly those blocks are scattered throughout the vdev, and limited by the random access time of the vdev.

YMMV, but here is one of my experiences: in a given pool that I admin, if I needed to resilver a whole disk including unused space, the sequential IO of the disk would be the limiting factor, and the time would be approx 2 hours. Instead, I am using ZFS, and this system is in "typical" production usage, and I am using mirrors. Hence, this is the best case scenario for a "typical" ZFS server with volatile data. My resilver took 12 hours. If I had used raidz2 with 8-2=6 disks, then it would have taken 3 days.

So the conclusion to draw is: yes, there are situations where ZFS resilver is a strength, and limited by serial throughput. But for what I call "typical" usage patterns, it's a weakness, and it's dramatically much worse than resilvering the whole disk sequentially.
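A toy rendering of the scaling argument in the last two paragraphs (a model Richard disputes elsewhere in the thread). The only measured input is the 12-hour mirror resilver reported above; the 6x factor is simply the data-disk count of the hypothetical 8-disk raidz2.

```python
# Edward's model: with a random-access-limited resilver, time scales with the
# number of used blocks living in the degraded vdev. Ratios are the point;
# the absolute figure is just the 12-hour mirror resilver reported above.

mirror_resilver_h = 12
data_disks_in_raidz2 = 8 - 2   # an 8-disk raidz2 vdev has 6 data disks

# Mirrors keep at most one disk's worth of blocks per vdev; the 8-disk raidz2
# vdev holds roughly six disks' worth, so roughly six times the blocks to walk.
raidz2_resilver_h = mirror_resilver_h * data_disks_in_raidz2

print(f"mirror: {mirror_resilver_h} h; 8-disk raidz2 under the same model: "
      f"{raidz2_resilver_h} h (~{raidz2_resilver_h / 24:.0f} days)")
```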
Re: [zfs-discuss] A resilver record?
> Our main backups storage server has 3x 8-drive raidz2 vdevs. Was replacing the 500 GB drives in one vdev with 1 TB drives. The last 2 drives took just under 300 hours each. :( The first couple drives took approx 150 hours each, and then it just started taking longer and longer for each drive.

That's strange indeed. I just replaced 21 drives (seven 2TB drives in each of three raidz2 vdevs) with 3TB ones, and resilver times were quite stable, until the last replace, which was a bit faster.

Have you checked 'iostat -en'? If one (or more) of the drives is having I/O errors, that may slow down the whole pool.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] A resilver record?
> The 30+ second latency I see on this system during a resilver renders it pretty useless as a staging server (lots of small snapshots).

I've seen similar numbers on a system during resilver, without L2ARC/SLOG. Adding L2ARC/SLOG made the system work quite well during resilver/scrub, but without them, it wasn't very useful.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] A resilver record?
On Sun, Mar 20, 2011 at 12:57 AM, Ian Collins wrote:
> Has anyone seen a resilver longer than this for a 500G drive in a raidz2 vdev?
>
> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 2011
> c0t0d0 ONLINE 0 0 0 769G resilvered
>
> and I told the client it would take 3 to 4 days!

Our main backups storage server has 3x 8-drive raidz2 vdevs. Was replacing the 500 GB drives in one vdev with 1 TB drives. The last 2 drives took just under 300 hours each. :( The first couple drives took approx 150 hours each, and then it just started taking longer and longer for each drive.

--
Freddie Cash
fjwc...@gmail.com
Re: [zfs-discuss] A resilver record?
On Sun, Mar 20, 2011 at 7:20 PM, Richard Elling wrote:
> On Mar 20, 2011, at 3:02 PM, Ian Collins wrote:
>
>> On 03/20/11 08:57 PM, Ian Collins wrote:
>>> Has anyone seen a resilver longer than this for a 500G drive in a raidz2 vdev?
>>>
>>> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 2011
>>> c0t0d0 ONLINE 0 0 0 769G resilvered
>>
>> I didn't intend to start an argument, I was just very surprised the resilver took so long.
>
> I'd describe the thread as critical analysis, not argument. There are many facets of ZFS resilver and scrub that many people have never experienced, so it makes sense to explore the issue.
>
> Expect ZFS resilvers to take longer in the future for HDDs.
> Expect ZFS resilvers to remain quite fast for SSDs.
> Why? Because HDDs are getting bigger, but not faster, while SSDs are getting bigger and faster.

Is resilver time related to the amount of data (TBs) or the number of objects (file + directory counts)? I have seen zpools with lots of data in very few files resilver quickly while smaller pools with lots of tiny files take much longer (no hard data here, just recollection of how long things took).

--
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
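A small sketch of the "bigger but not faster" point made in the quoted reply: even the best-case, purely sequential pass over a whole disk keeps getting longer as capacities grow. The sustained-throughput figures are rough assumptions for drives of each size, not measurements from the thread.

```python
# Lower bound on resilver time: streaming the entire disk once, with no seeks.
# Sustained-throughput figures are rough assumptions for illustration.

drives = [
    ("500 GB", 0.5e12,  80e6),   # ~80 MB/s
    ("1 TB",   1.0e12, 100e6),   # ~100 MB/s
    ("3 TB",   3.0e12, 130e6),   # ~130 MB/s
]

for name, capacity_bytes, bytes_per_s in drives:
    hours = capacity_bytes / bytes_per_s / 3600
    print(f"{name}: {hours:.1f} h just to stream the whole disk once")
```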
Re: [zfs-discuss] A resilver record?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> it depends on the total number of used blocks that must be resilvered on the resilvering device, multiplied by the access time for the resilvering device.

It is a safe assumption, if you've got a lot of devices in a vdev, that you've probably got a lot of data in the vdev. And therefore the resilver time for that vdev will be large.

If you break your pool up into a bunch of mirrors, then the most data you'll have in any one vdev is one disk's worth of data.

If you have a vdev whose usable capacity is M times a single disk, chances are, the amount of data you have in the vdev is L times larger than the amount of data you would have had in each vdev if you were using mirrors. (I'm intentionally leaving the relationship between M and L vague, but both are assumed to be > 1 and approaching the number of devices in the vdev minus parity drives.) Therefore the resilver time for that vdev will be roughly L times the resilver time of a mirror.
Re: [zfs-discuss] A resilver record?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
>
> How many times do we have to rehash this? The speed of resilver is dependent on the amount of data, the distribution of data on the resilvering device, the speed of the resilvering device, and the throttle. It is NOT dependent on the number of drives in the vdev.

What the heck? Yes it is. Indirectly. When you say it depends on the amount of data, speed of resilvering device, etc., what you really mean (correctly) is that it depends on the total number of used blocks that must be resilvered on the resilvering device, multiplied by the access time for the resilvering device. And of course, throttling and usage during resilver can have a big impact. And various other factors. But the controllable big factor is the number of blocks used in the degraded vdev.

So here is how the number of devices in the vdev matters:

If you have your whole pool made of one vdev, then every block in the pool will be on the resilvering device. You must spend time resilvering every single block in the whole pool.

If you have the same amount of data, on a pool broken into N smaller vdevs, then approximately speaking, 1/N of the blocks in the pool must be resilvered on the resilvering vdev. And therefore the resilver goes approximately N times faster.

So if you assume the size of the pool or the number of total disks is a given, determined by outside constraints and design requirements, and then you face the decision of how to architect the vdevs in your pool, then yes: the number of devices in a vdev does dramatically impact the resilver time. Only because the number of blocks written in each vdev depends on these decisions you made earlier.
Re: [zfs-discuss] GNU 'cp -p' can't work well with ZFS-based-NFS
Thanks. But does noacl work with NFS v3?

Thanks.

Fred

> -----Original Message-----
> From: Cameron Hanover [mailto:chano...@umich.edu]
> Sent: Thursday, March 17, 2011 1:34
> To: Fred Liu
> Cc: ZFS Discussions
> Subject: Re: [zfs-discuss] GNU 'cp -p' can't work well with ZFS-based-NFS
>
> I thought this explained it well.
> http://www.cuddletech.com/blog/pivot/entry.php?id=939
> 'NFSv3, ACL's and ZFS' is the relevant part.
>
> I've told my customers that run into this to use the noacl mount option.
>
> -
> Cameron Hanover
> chano...@umich.edu
>
> Fill with mingled cream and amber,
> I will drain that glass again.
> Such hilarious visions clamber
> Through the chamber of my brain ―
> Quaintest thoughts ― queerest fancies
> Come to life and fade away;
> What care I how time advances?
> I am drinking ale today.
> ―Edgar Allan Poe
>
> On Mar 16, 2011, at 9:56 AM, Fred Liu wrote:
>
>> It always shows info like 'operation not supported'.
>>
>> Any workaround?
>>
>> Thanks.
>>
>> Fred