[zfs-discuss] ZFS, power failures, and UPSes
Hello, I've looked around Google and the zfs-discuss archives but have not been able to find a good answer to this question (and the related questions that follow it): How well does ZFS handle unexpected power failures? (e.g. environmental power failures, power supply dying, etc.) Does it consistently gracefully recover? Should having a UPS be considered a (strong) recommendation or a don't-even-think-about-running-without-it item? Are there any communications/interfacing caveats to be aware of when choosing the UPS? In this particular case, we're talking about a home file server running OpenSolaris 2009.06. Actual environmental power failures are generally 1 per year. I know there are a few blog articles about this type of application, but I don't recall seeing any (or any detailed) discussion about power failures and UPSes as they relate to ZFS. I did see that the ZFS Evil Tuning Guide says cache flushes are done every 5 seconds. Here is one post that didn't get any replies about a year ago, after someone had a power failure, then a UPS battery failure, while copying data to a ZFS pool: http://lists.macosforge.org/pipermail/zfs-discuss/2008-July/000670.html Both theoretical answers and real-life experiences would be appreciated, as the former tells me where ZFS is headed while the latter tells me where it has been or is now. Thanks, -hk ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS write I/O stalls
backup windows using primarily iSCSI. When those writes occur to my RAID-Z volume, all activity pauses until the writes are fully flushed. The more I read about this, the worse it sounds. The thing is, I can see where the ZFS developers are coming from - in theory this is a more efficient use of the disk, and with that being the slowest part of the system, there probably is a slight benefit in computational time. However, it completely breaks any process like this that can't afford 3-5s delays in processing, it makes ZFS a nightmare for things like audio or video editing (where it would otherwise be a perfect fit), and it's also horrible from the perspective of the end user. Does anybody know if an L2ARC would help this? Does that work off a different queue, or would reads still be blocked? I still think a simple solution to this could be to split the ZFS writes into smaller chunks. That creates room for reads to be squeezed in (with the ratio of reads to writes something that should be automatically balanced by the software), but you still get the benefit of ZFS write ordering with all the work that's gone into perfecting that. Regardless of whether there are reads or not, your data is always going to be written to disk in an optimized fashion, and you could have a property on the pool that specifies how finely chopped up writes should be, allowing this to be easily tuned. We're considering ZFS as storage for our virtualization solution, and this could be a big concern. We really don't want the entire network pausing for 3-5 seconds any time there is a burst of write activity. -- This message posted from opensolaris.org
[zfs-discuss] Scrub restarting on Solaris 10 Update 7.
I'm trying to scrub a pool on a backup server running Solaris 10 Update 7, and the scrub restarts each time a snap is received. I thought this was fixed in Update 6? The machine was recently upgraded from Update 5, which did have the issue. -- Ian.
Re: [zfs-discuss] ZFS, power failures, and UPSes
I've seen enough people suffer from corrupted pools that a UPS is definitely good advice. However, I'm running a (very low usage) ZFS server at home and it's suffered through at least half a dozen power outages without any problems at all. I do plan to buy a UPS as soon as I can, but it seems to be surviving very well so far.
Re: [zfs-discuss] ZFS, power failures, and UPSes
A related question: If you are on a UPS, is it OK to disable ZIL? The evil tuning guide says "The ZIL is an essential part of ZFS and should never be disabled." However, if you have a UPS, what can go wrong that really requires ZIL? Opinions? Monish
Re: [zfs-discuss] ZFS, power failures, and UPSes
Haudy Kazemi wrote: Hello, I've looked around Google and the zfs-discuss archives but have not been able to find a good answer to this question (and the related questions that follow it): How well does ZFS handle unexpected power failures? (e.g. environmental power failures, power supply dying, etc.) Does it consistently gracefully recover?

Mostly. Unless you are unlucky. Backups are your friend in *any* environment though.

Should having a UPS be considered a (strong) recommendation or a don't-even-think-about-running-without-it item?

There has been quite an interesting thread on this over the last few months. I won't repeat my comments, but it is there in digital posterity in the zfs-discuss archives. Certainly in a large environment with a lot of data being written, one should consider this a mandatory requirement if you care about your data. Particularly if there are many links in your storage chain that could suffer data corruption due to power failure.

Are there any communications/interfacing caveats to be aware of when choosing the UPS? In this particular case, we're talking about a home file server running OpenSolaris 2009.06.

As far as a home server goes, particularly if it is not write intensive, you will 'most likely' be fine. I have a home one with a v120 running S10 u6 with a D1000 and 7 x 300 GB SCSI disks in a RAIDZ2 that has seen numerous power interruptions with no faults. This machine is a Samba server for my Macs and printing business. I also have another mail / web server on another v120 which experiences the same power faults and regularly bounces back without issues. But your mileage may vary. It all really depends on how much you care about the data. I haven't used OpenSolaris specifically, however, as I prefer the generally better supported S10 releases. (Yes, I know you can get support for OS, but I tend to be conservative and standardize as much as possible.)
I do have millions of files stored on ZFS volumes for our Uni and I sleep well ;)

Actual environmental power failures are generally 1 per year. I know there are a few blog articles about this type of application, but I don't recall seeing any (or any detailed) discussion about power failures and UPSes as they relate to ZFS. I did see that the ZFS Evil Tuning Guide says cache flushes are done every 5 seconds.

The flush time you mention is based on older versions of ZFS; newer ones can have a flush interval as long as 30 seconds, I believe.

Here is one post that didn't get any replies about a year ago, after someone had a power failure, then a UPS battery failure, while copying data to a ZFS pool: http://lists.macosforge.org/pipermail/zfs-discuss/2008-July/000670.html Both theoretical answers and real-life experiences would be appreciated, as the former tells me where ZFS is headed while the latter tells me where it has been or is now. Thanks, -hk
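(For anyone wanting to experiment with that interval: on later builds it is exposed as an /etc/system tunable. The name below is the one commonly documented for later OpenSolaris builds; treat it as an assumption and verify it against your build before relying on it.)

```shell
# /etc/system fragment (sketch): txg commit interval, in seconds.
# zfs_txg_timeout is the tunable name documented for later OpenSolaris
# builds; treat it as an assumption for 2009.06.
set zfs:zfs_txg_timeout = 5
```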
Re: [zfs-discuss] ZFS, power failures, and UPSes
On Tue, 30 Jun 2009, Monish Shah wrote: The evil tuning guide says "The ZIL is an essential part of ZFS and should never be disabled." However, if you have a UPS, what can go wrong that really requires ZIL?

Without addressing a single ZFS-specific issue:

* panics
* crashes
* hardware failures
  - dead RAM
  - dead CPU
  - dead systemboard
  - dead something else
* natural disasters
* UPS failure
* UPS failure (must be said twice)
* Human error (what does this button do?)
* Cabling problems (say, where did my disks go?)
* Malicious actions (Fired? Let me turn their power off!)

That's just a warm-up; I'm sure people can add both the ZFS-specific reasons and also the fallacy that a UPS does anything more than mitigate one particular single point of failure. Don't forget to buy two UPSes and split your machine across both. And don't forget to actually maintain the UPS. And check the batteries. And schedule a load test. The single best way to learn about the joys of UPS behaviour is to sit down and have a drink with a facilities manager who has been doing the job for at least ten years. At least you'll hear some funny stories about the day a loose screw on one floor took out a house UPS and 100+ hosts and NEs with it. Andre. -- Andre van Eyssen. mail: an...@purplecow.org jabber: an...@interact.purplecow.org purplecow.org: UNIX for the masses http://www2.purplecow.org purplecow.org: PCOWpix http://pix.purplecow.org
Re: [zfs-discuss] ZFS, power failures, and UPSes
Monish Shah wrote: A related question: If you are on a UPS, is it OK to disable ZIL? The evil tuning guide says "The ZIL is an essential part of ZFS and should never be disabled." However, if you have a UPS, what can go wrong that really requires ZIL?

The UPS.

-- Dr Doug Baker Sun Microsystems Systems Support Engineer. UK Mission Critical Solution Centre. Tel : 0870 600 3222
Re: [zfs-discuss] Useful Emulex tunable for i386
On Sun, 28 Jun 2009, Bob Friesenhahn wrote: On Sun, 28 Jun 2009, Bob Friesenhahn wrote: Today I experimented with doubling this value to 688128 and was happy to see a large increase in sequential read performance from my ZFS pool, which is based on six mirror vdevs. Sequential read performance jumped from 552787 KB/s to 799626 KB/s. It seems that the default driver buffer size interferes with zfs's ability to double the read performance by balancing the reads from the mirror devices. Now the read performance is almost 2X the write performance. Grumble. This may be a bit of a red herring.

Perhaps this Emulex tunable was not entirely a red herring. Doubling the default for this tunable made a difference to my application. It dropped total real execution time from 2:45:03.152 to 2:24:25.675. That is a pretty large improvement. If I run two copies of my application at once and divide up the work, the execution time is 1:42:32.42. Even with two (or three) copies of the application running, it seems that zfs is still the bottleneck, since the square-wave of system CPU utilization becomes even more prominent, indicating that all readers are blocked during the TXG sync. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS write I/O stalls
On Tue, 30 Jun 2009, Ross wrote: However, it completely breaks any process like this that can't afford 3-5s delays in processing, it makes ZFS a nightmare for things like audio or video editing (where it would otherwise be a perfect fit), and it's also horrible from the perspective of the end user. Yes. I updated the image at http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-stalls.png so that it shows the execution impact with more processes running. This is taken with three processes running in parallel so that there can be no doubt that I/O is being globally blocked and it is not just misbehavior of a single process. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS, power failures, and UPSes
On 06/30/09 03:00 AM, Andre van Eyssen wrote: That's just a warm-up; I'm sure people can add both the ZFS-specific reasons and also the fallacy that a UPS does anything more than mitigate one particular single point of failure.

Actually, they do quite a bit more than that. They create jobs, generate revenue for battery manufacturers, and for the techs who change batteries and do preventive maintenance (PM) on the large units. Let's not forget that they add significant revenue to the transportation industry, given their weight for shipping. In the last 28 years of doing this stuff, I've found a few times that the UPS has actually worked and lasted as long as the outage. Many other times, the unit has failed (circuits), or the batteries are beyond their service life. But really, something approaching 40% of the time they actually work out OK. So they also create repair and recycling jobs. :-)
Re: [zfs-discuss] ZFS, power failures, and UPSes
On Tue, 30 Jun 2009, Neal Pollack wrote: Actually, they do quite a bit more than that. They create jobs, generate revenue for battery manufacturers, and for the techs who change batteries and do PM maintenance on the large units.

It sounds like this is a responsibility which should be moved to the US federal government, since UPSs create jobs.

In the last 28 years of doing this stuff, I've found a few times that the UPS has actually worked and lasted as long as the outage.

I have seen UPSs help quite a lot for short glitches lasting seconds, or a minute. Otherwise the outage is usually longer than the UPSs can stay up, since the problem required human attention. A standby generator is needed for any long outages. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS, power failures, and UPSes
Bob Friesenhahn wrote: It sounds like this is a responsibility which should be moved to the US federal government, since UPSs create jobs.

Actually, I think UPS already employs some 410,000+ people, making it the 3rd largest private employer in the USA (5th overall, if you include the Federal Gov't and the US Postal Service). *wink*

As someone who has spent enough time doing data center work, I can attest to the fact that UPSes are really useful only as extremely-short-interval solutions. A dozen or so minutes, at best. The best design I've seen was for an old BBN (hey, remember them!) site just outside of Cambridge, MA. It took in utility power, ran it through a conditioner setup, and then through this nice switch thing. The switch took three inputs: utility, a local diesel generator, and a line of marine batteries. The switch itself was internally redundant (which isn't hard to do; it's '50s tech), so you could draw power from any (or even all 3 at once). Nothing really fancy; it was simple, with no semiconductor stuff to fail - just all '50s-ish hardwired circuitry.
I don't even think there was a transistor in the whole shebang. Lots of capacitors, though. :-) The gist of the whole thing was that if utility power was out more than 5 minutes, there was no good predictor of how long it would remain out - I saw a nice little graph that showed no real good prediction of outage time based on existing outage length (i.e. if the power has been out X minutes, you can expect it to be restored in Y minutes...). I suspect it was something like 20 years of accumulated data or so... The end of this is simple: UPSes should give you enough time to start the gen-pack. If you are having problems with your gen-pack, you'll never have enough UPS time to fix it (and it's not cost-effective to try to make it so), so FIX YOUR GEN-PACK BEFORE the outage. Which means: TEST it, and TEST it, and TEST it again! For home use, I set my UPS to immediately shut down anything attached to it for /any/ service outage. Large enough batteries to handle anything more than a couple of minutes are frankly a fire hazard for the home, not to mention a maintenance PITA. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA
Re: [zfs-discuss] ZFS, power failures, and UPSes
On Tue, Jun 30, 2009 at 1:36 PM, Erik Trimble erik.trim...@sun.com wrote: The end of this is simple: UPSes should give you enough time to start the gen-pack. If you are having problems with your gen-pack, you'll never have enough UPS time to fix it (and it's not cost-effective to try to make it so), so FIX YOUR GEN-PACK BEFORE the outage. Which means: TEST it, and TEST it, and TEST it again!

Slight corollary -- just because you have a generator and test it doesn't mean you can assume you can get fuel in a timely manner (so still be prepared to shut down if needed). I have seen places whose DR plans rely completely on the assumption that there will never be any problem refueling their generators. However, last year after Ike hit, one of AT&T's central offices lost power because it ran out of fuel (and couldn't get refilled in time).
Re: [zfs-discuss] ZFS write I/O stalls
For what it is worth, I too have seen this behavior when load testing our ZFS box. I used iometer and the RealLife profile (1 worker, 1 target, 65% reads, 60% random, 8k, 32 I/Os in the queue). When writes are being dumped, reads drop close to zero: from 600-700 read IOPS to 15-30 read IOPS.

zpool iostat data01 1    (data01 is my pool name)

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      55.5G  20.4T    691      0  4.21M      0
data01      55.5G  20.4T    632      0  3.80M      0
data01      55.5G  20.4T    657      0  3.93M      0
data01      55.5G  20.4T    669      0  4.12M      0
data01      55.5G  20.4T    689      0  4.09M      0
data01      55.5G  20.4T    488  1.77K  2.94M  9.56M
data01      55.5G  20.4T     29  4.28K   176K  23.5M
data01      55.5G  20.4T     25  4.26K   165K  23.7M
data01      55.5G  20.4T     20  3.97K   133K  22.0M
data01      55.6G  20.4T    170  2.26K  1.01M  11.8M
data01      55.6G  20.4T    678      0  4.05M      0
data01      55.6G  20.4T    625      0  3.74M      0
data01      55.6G  20.4T    685      0  4.17M      0
data01      55.6G  20.4T    690      0  4.04M      0
data01      55.6G  20.4T    679      0  4.02M      0
data01      55.6G  20.4T    664      0  4.03M      0
data01      55.6G  20.4T    699      0  4.27M      0
data01      55.6G  20.4T    423  1.73K  2.66M  9.32M
data01      55.6G  20.4T     26  3.97K   151K  21.8M
data01      55.6G  20.4T     34  4.23K   223K  23.2M
data01      55.6G  20.4T     13  4.37K  87.1K  23.9M
data01      55.6G  20.4T     21  3.33K   136K  18.6M
data01      55.6G  20.4T    468    496  2.89M  1.82M
data01      55.6G  20.4T    687      0  4.13M      0

-Scott
Re: [zfs-discuss] ZFS write I/O stalls
On Mon, 29 Jun 2009, Lejun Zhu wrote: With ZFS write throttle, the number 2.5GB is tunable. From what I've read in the code, it is possible to e.g. set zfs:zfs_write_limit_override = 0x8000000 (bytes) to make it write 128M instead.

This works, and the difference in behavior is profound. Now it is a matter of finding the best value which optimizes both usability and performance. A tuning for 384 MB:

# echo zfs_write_limit_override/W0t402653184 | mdb -kw
zfs_write_limit_override: 0x30000000 = 0x18000000

CPU is smoothed out quite a lot, and write latencies (as reported by a zio_rw.d dtrace script) are radically different than before. Perfmeter display for 256 MB: http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-256mb.png Perfmeter display for 384 MB: http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-384mb.png Perfmeter display for 768 MB: http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-768mb.png Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
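For anyone repeating this tuning: mdb's W0t syntax takes the new value as a decimal byte count. A small sketch (not from the thread; the helper names are made up) to compute the value and compose the command line:

```python
def write_limit_bytes(mb):
    """Convert a write-limit size in MiB to the byte count mdb expects."""
    return mb * 1024 * 1024

def mdb_command(mb):
    """Build the 'echo ... | mdb -kw' pipeline for a given limit (sketch)."""
    return "echo zfs_write_limit_override/W0t%d | mdb -kw" % write_limit_bytes(mb)

# 384 MiB is the 0t402653184 (0x18000000) value used in the post above.
print(mdb_command(384))
```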
Re: [zfs-discuss] ZFS, power failures, and UPSes
ms == Monish Shah mon...@indranetworks.com writes: sl == Scott Lawson scott.law...@manukau.ac.nz writes: np == Neal Pollack neal.poll...@sun.com writes: ms If you are on a UPS, is it OK to disable ZIL? sl I have seen numerous UPS' failures over the years, yeah at my place in NYC we've had more problems with the UPS than with the service. At the very least a UPS needs to switch off for new batteries every two years, and the raw service does not go out that often for me. It starts to make more sense to use a UPS if you have dual power supplies, dual UPS's, bypass switches. Or crappy aboveground power. anyway, typical machines panic because of bugs a lot more often than either UPS or line problems. **BUT THIS IS ALL BESIDE THE POINT**! The ZIL is for implementing fsync() for databases and also the part of NFS that allows servers to reboot without client data loss. It has *NOTHING TO DO* with losing your entire pool. Disabling the ZIL does not make catastrophic pool loss more likely, not even a little bit! Unfortunately some software developer decided to write a bunch of DIRE WARNINGS to SCARE PEOPLE INTO ASSUMPTIONS leading them to use the maximum amount of code of which said developer is justly proud, regardless of whether they're using it for the right reason or not. oddly, I don't think disabling ZIL will make catastrophic loss more likely for databases running above the ZFS, either, because unlike non-COW filesystems ZFS never recovers to a state where writes appear to have happened out-of-order prior to the crash. Yes, disabling the ZIL could break the 'D' in ACID for databases running above that ZFS, but in a way that rolls them back in time, not makes them become corrupt. Running without ZIL is as if a snapshot were taken at each TXG commit time, and on reboot after a crash you recover to the most recent TXG-snapshot that fully committed, thus databases will be ``crash-consistent'' even without the ZIL, unless I'm mistaken. 
Adding an SSD *does* make catastrophic pool loss more likely, because if you break the SSD and then export the pool, you can never import it again. So, adding an SSD for the ZIL as a suggestive good-little-boy alternative to disabling the ZIL makes catastrophic loss of the entire pool more likely, not less. The advantage of rolling with ZIL is, if you're using NFS you should be able to crash and reboot the server without the clients noticing. Also MTA's that accept messages, databases that confirm orders and bookings, won't lose anything they've accepted or confirmed in the crash (if everything else works). I wish ZIL could be enabled and disabled per filesystem instead of per kernel.
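For completeness, the per-kernel switch Miles refers to was, on 2009-era builds, an /etc/system tunable. This is a sketch of the commonly documented form, not a recommendation:

```shell
# /etc/system fragment (sketch): disables the ZIL pool-wide at boot on
# 2009-era OpenSolaris. Global only -- per-dataset control did not
# exist at the time of this thread.
set zfs:zil_disable = 1
```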
Re: [zfs-discuss] ZFS write I/O stalls
On Tue, Jun 30, 2009 at 12:25 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: This works, and the difference in behavior is profound. Now it is a matter of finding the best value which optimizes both usability and performance.

Maybe there could be a supported ZFS tuneable (per file system, even?) that is optimized for 'background' tasks, or 'foreground'. Beyond that, I will give this tuneable a shot and see how it impacts my own workload. Thanks! -- Brent Jones br...@servuhome.net
Re: [zfs-discuss] ZFS write I/O stalls
On Tue, 30 Jun 2009, Brent Jones wrote: Maybe there could be a supported ZFS tuneable (per file system even?) that is optimized for 'background' tasks, or 'foreground'. Beyond that, I will give this tuneable a shot and see how it impacts my own workload. Note that this issue does not apply at all to NFS service, database service, or any other usage which does synchronous writes. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
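Concretely, a synchronous write in the sense Bob means is one that does not return until the data is on stable storage, i.e. the fsync()/O_DSYNC path that the ZIL services, which sidesteps the delayed txg-flush behavior discussed above. A minimal sketch in Python (the record contents are illustrative):

```python
import os
import tempfile

# Write a record and force it to stable storage before continuing --
# this is the path the ZIL services on a ZFS-backed filesystem.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"order confirmed\n")
    os.fsync(fd)  # returns only once the data is durable
finally:
    os.close(fd)
```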
Re: [zfs-discuss] ZFS, power failures, and UPSes
On Jun 30, 2009, at 14:08, Bob Friesenhahn wrote: I have seen UPSs help quite a lot for short glitches lasting seconds, or a minute. Otherwise the outage is usually longer than the UPSs can stay up, since the problem required human attention. A standby generator is needed for any long outages.

Can't remember where I read the claim, but supposedly if power isn't restored within about ten minutes, then it will probably be out for a few hours. If this 'statistic' is true, it would mean that your UPS should last (say) fifteen minutes, and after that you really need a generator.

At $WORK we currently have about thirty minutes worth of juice at full load, but as time drags on and we start shutting down less essential stuff we can increase that. The PBX and security system have their own UPSes in their own racks, so there are two layers of battery there.
Re: [zfs-discuss] ZFS write I/O stalls
CPU is smoothed out quite a lot

Yes, but the area under the CPU graph is less, so the rate of real work performed is less, so the entire job took longer (albeit smoother).

Rob
Re: [zfs-discuss] ZFS write I/O stalls
Interesting to see that it makes such a difference, but I wonder what effect it has on ZFS's write ordering, and its attempts to prevent fragmentation? By reducing the write buffer, are you losing those benefits? Although on the flip side, I guess this is no worse off than any other filesystem, and as SSD drives take off, fragmentation is going to be less and less of an issue.
Re: [zfs-discuss] ZFS, power failures, and UPSes
David Magda wrote:

On Jun 30, 2009, at 14:08, Bob Friesenhahn wrote: I have seen UPSs help quite a lot for short glitches lasting seconds, or a minute. Otherwise the outage is usually longer than the UPSs can stay up since the problem required human attention. A standby generator is needed for any long outages.

Can't remember where I read the claim, but supposedly if power isn't restored within about ten minutes, then it will probably be out for a few hours. If this 'statistic' is true, it would mean that your UPS should last (say) fifteen minutes, and after that you really need a generator.

Most UPSes from any vendor are designed to run for around ~12 minutes at full load. So that would appear to back that claim up, and from my experience it is pretty much on the money...

At $WORK we currently have about thirty minutes worth of juice at full load, but as time drags on and we start shutting down less essential stuff we can increase that. The PBX and security system have their own UPSes in their own racks, so there are two layers of battery there.

The problem comes when the power cut happens and you aren't there in the middle of the night. Then you either need an automated shutdown system, triggered by traps from the UPS (shutting things down in the correct order), or a generator. At that point the generator becomes a very good option. The no-generator scenario needs to be tested regularly to remain valid, which is a royal pain in the neck. Gen sets are worth their weight in gold. I can't even count how many times in the last few years they have saved our bacon (through both planned and unplanned outages).
Re: [zfs-discuss] ZFS write I/O stalls
On Tue, 30 Jun 2009, Rob Logan wrote: CPU is smoothed out quite a lot. Yes, but the area under the CPU graph is less, so the rate of real work performed is less, so the entire job took longer (albeit smoother).

For the purpose of illustration, the case showing the huge sawtooth was when running three processes at once. The period/duration of the sawtooth was pretty similar, but the magnitude changed. I agree that there is a size which provides the best balance of smoothness and application performance. Probably the value should be dialed down to just below the point where the sawtooth occurs. More at 11.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS write I/O stalls
On Tue, 30 Jun 2009, Bob Friesenhahn wrote: Note that this issue does not apply at all to NFS service, database service, or any other usage which does synchronous writes.

I see read starvation with NFS. I was using iometer on a Windows VM, connecting to an NFS mount on a 2008.11 physical box. iometer params: 65% read, 60% random, 8k blocks, 32 outstanding IO requests, 1 worker, 1 target.

NFS testing:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01      59.6G  20.4T      0  2.24K  30.2K  28.9M
data01      59.6G  20.4T      0  1.93K  30.2K  25.1M
data01      59.6G  20.4T      0  2.22K      0  28.4M
data01      59.7G  20.4T     21    295   317K  4.48M
data01      59.7G  20.4T     32     12   495K  1.61M
data01      59.7G  20.4T     35     25   515K  3.22M
data01      59.7G  20.4T     36     11   522K  1.49M
data01      59.7G  20.4T     33     24   508K  3.09M
data01      59.7G  20.4T     35     23   536K  2.97M
data01      59.7G  20.4T     32     23   483K  2.97M
data01      59.7G  20.4T     37     37   538K  4.70M

While writes are being committed to the ZIL all the time, periodic dumping to the pool still occurs, and during those times reads are starved. Maybe this doesn't happen in the 'real world'?

-Scott
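(A sketch of my own, not from Scott's post: the starved intervals in output like the above are easy to pick out mechanically. The `flag_starved` helper name and the thresholds are assumptions; it simply flags samples where read ops collapse to near zero while the transaction-group flush pushes write ops into the thousands, i.e. the "K"-suffixed rows.)

```shell
# Flag one-second `zpool iostat <pool> 1` samples where reads are starved:
# field 4 = read ops, field 5 = write ops (a trailing K means thousands).
flag_starved() {
  awk '$5 ~ /K$/ && $4 + 0 <= 5 { print "read-starved:", $0 }'
}

# Demo on two sample rows from the output above:
printf '%s\n' \
  'data01 59.6G 20.4T 46 24 757K 3.09M' \
  'data01 59.6G 20.4T 1 2.23K 20.3K 29.2M' | flag_starved
# prints: read-starved: data01 59.6G 20.4T 1 2.23K 20.3K 29.2M
```

In practice you would feed it a captured log, e.g. `zpool iostat data01 1 | flag_starved`, to see how often the flush blocks readers.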
Re: [zfs-discuss] Any news on deduplication?
On Tue, 30 Jun 2009, MC wrote: Any news on the ZFS deduplication work being done? I hear Jeff Bonwick might speak about it this month.

Yes, it is definitely on the agenda for Kernel Conference Australia (http://www.kernelconference.net) - you should come along!

--
Andre van Eyssen.
mail: an...@purplecow.org
jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses http://www2.purplecow.org
purplecow.org: PCOWpix http://pix.purplecow.org
Re: [zfs-discuss] ZFS, power failures, and UPSes
David Magda wrote:

On Jun 30, 2009, at 14:08, Bob Friesenhahn wrote: I have seen UPSs help quite a lot for short glitches lasting seconds, or a minute. Otherwise the outage is usually longer than the UPSs can stay up since the problem required human attention. A standby generator is needed for any long outages.

Can't remember where I read the claim, but supposedly if power isn't restored within about ten minutes, then it will probably be out for a few hours. If this 'statistic' is true, it would mean that your UPS should last (say) fifteen minutes, and after that you really need a generator.

Or run your systems off DC and get as much backup as you have room (and budget!) for batteries. I once visited a central exchange with 48 hours of battery capacity...

-- Ian.