Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 4 Apr 2010, at 06.01, Richard Elling wrote:
> > Thank you for your reply! Just wanted to make sure.
>
> Do not assume that power outages are the only cause of unclean shutdowns.
>  -- richard

Thanks, I have seen that mistake several times with other (file)systems, and hope I'll never ever make it myself! :-)

/ragge
[zfs-discuss] Diagnosing Permanent Errors
I would like to get some help diagnosing permanent errors on my files. The machine in question has twelve 1TB disks connected to an Areca RAID card. I installed OpenSolaris build 134 and, according to zpool history, created a pool with:

  zpool create bigraid raidz2 c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5 c4t0d6 c4t0d7 c4t1d0 c4t1d1 c4t1d2 c4t1d3

I then backed up 806G of files to the machine and had the backup program verify the files. It failed. The check is still running, but so far it has found 4 files where the checksum of the backup copy doesn't match the checksum of the original file. Zpool status shows problems:

$ sudo zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bigraid     DEGRADED     0     0   536
          raidz2-0  DEGRADED     0     0 3.14K
            c4t0d0  ONLINE       0     0     0
            c4t0d1  ONLINE       0     0     0
            c4t0d2  ONLINE       0     0     0
            c4t0d3  ONLINE       0     0     0
            c4t0d4  ONLINE       0     0     0
            c4t0d5  ONLINE       0     0     0
            c4t0d6  ONLINE       0     0     0
            c4t0d7  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t1d1  ONLINE       0     0     0
            c4t1d2  ONLINE       0     0     0
            c4t1d3  DEGRADED     0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

        metadata:0x18
        metadata:0x3a

So it appears that one of the disks is bad, but if only one disk failed, how would a raidz2 pool develop permanent errors? The numbers in the CKSUM column keep growing, but is that because the backup verification is tickling the errors as it runs?

Previous postings on permanent errors said to look at fmdump -eV, but that output has 437543 lines and I don't really know how to interpret what I see. I did check the vdev_path with

  fmdump -eV | grep vdev_path | sort | uniq -c

to see whether only certain disks were involved, but every disk in the array is listed, albeit with different frequencies:

   2189   vdev_path = /dev/dsk/c4t0d0s0
   1077   vdev_path = /dev/dsk/c4t0d1s0
   1077   vdev_path = /dev/dsk/c4t0d2s0
   1097   vdev_path = /dev/dsk/c4t0d3s0
     25   vdev_path = /dev/dsk/c4t0d4s0
     25   vdev_path = /dev/dsk/c4t0d5s0
     20   vdev_path = /dev/dsk/c4t0d6s0
   1072   vdev_path = /dev/dsk/c4t0d7s0
   1092   vdev_path = /dev/dsk/c4t1d0s0
          vdev_path = /dev/dsk/c4t1d1s0
   2221   vdev_path = /dev/dsk/c4t1d2s0
   1149   vdev_path = /dev/dsk/c4t1d3s0

What should I make of this? That all the disks are bad? That seems unlikely. I found another thread http://opensolaris.org/jive/thread.jspa?messageID=399988 where it finally came down to bad memory, so I'll test that. Any other suggestions?
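In the meantime, here is roughly what I plan to try next. This is only a sketch from memory (the pool name bigraid is the one above; double-check the syntax against the man pages):

  # Tally the ereport classes rather than the device paths; checksum errors
  # spread across many devices versus I/O errors on one device tell
  # different stories.
  fmdump -e | awk 'NR>1 {print $NF}' | sort | uniq -c | sort -rn

  # Any diagnosed faults, as opposed to raw error telemetry?
  fmadm faulty

  # Once the suspected cause (e.g. bad RAM) is dealt with, clear the error
  # counters and scrub so ZFS re-reads and re-verifies every block:
  zpool clear bigraid
  zpool scrub bigraid
  zpool status -v bigraid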
[zfs-discuss] vPool unavailable but RaidZ1 is online
I am trying to recover a raid set; there are only three drives that are part of the set. I attached a disk and discovered it was bad. It was never part of the raid set. The disk is now gone, and when I try to import the pool I get the error listed below. Is there a chance to recover? TIA!

Sun Microsystems Inc.   SunOS 5.11      snv_112 November 2008
# zpool import
  pool: vpool
    id: 14231674658037629037
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

        vpool       UNAVAIL  missing device
          raidz1    ONLINE
            c0t0d0  ONLINE
            c0t1d0  ONLINE
            c0t2d0  ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.
# bash
bash-3.2# zpool import -fF
  pool: vpool
    id: 14231674658037629037
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

        vpool       UNAVAIL  missing device
          raidz1    ONLINE
            c0t0d0  ONLINE
            c0t1d0  ONLINE
            c0t2d0  ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.
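For reference, here is roughly what I have been looking at to figure out which device the pool thinks is missing. Just a sketch; the device path is one of the raidz1 members listed above, and the slice may differ on EFI-labelled disks:

  # Dump the ZFS labels from one of the raidz1 members; the embedded
  # configuration lists every vdev (including any log device or second
  # top-level vdev) that the pool expects to find at import time.
  zdb -l /dev/dsk/c0t0d0s0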
Re: [zfs-discuss] Diagnosing Permanent Errors
On 04/ 4/10 10:00 AM, Willard Korfhage wrote:
> What should I make of this? All the disks are bad? That seems unlikely. I
> found another thread http://opensolaris.org/jive/thread.jspa?messageID=399988
> where it finally came down to bad memory, so I'll test that. Any other
> suggestions?

It could be the CPU. I had a very bizarre case where the CPU would sometimes miscalculate the checksums of certain files, mostly when the CPU was also busy doing other things. Probably the cache. Days of running memtest and SUNWvts didn't turn up any errors, because this was a weirdly pattern-sensitive problem.

However, I too am of the opinion that you shouldn't even think of running ZFS without ECC memory (lots of threads about that!), and that bad RAM is far, far more likely to be your problem; I just wouldn't count on diagnostics finding it, either. Of course, it could be the controller too.

For laughs, the CPU calculating bad checksums was discussed in http://opensolaris.org/jive/message.jspa?messageID=469108 (see the last message in the thread).

If you are seriously contemplating using a system with non-ECC RAM, check out the Google research mentioned in http://opensolaris.org/jive/thread.jspa?messageID=423770 and http://www.cs.toronto.edu/%7Ebianca/papers/sigmetrics09.pdf

Cheers
-- Frank
Re: [zfs-discuss] Diagnosing Permanent Errors
Yeah, this morning I concluded I really should be running ECC RAM. I sometimes wonder why people don't run ECC RAM more often. I remember a decade ago, when RAM was much, much less dense, people fretted about alpha particles randomly flipping bits, but that concern seems to have died down. I know, of course, there is some added expense, but browsing on Newegg, the additional RAM cost is pretty minimal; I see 2GB ECC sticks going for about $12 more than similar non-ECC sticks. It's the motherboards that can handle ECC which are the expensive part. Now I've got to see what makes a good motherboard for a file server.
[zfs-discuss] Which zfs options are replicated
Hello list,

I started playing around with COMSTAR in snv_134. In the snv_116 version of ZFS, a new hidden property for the COMSTAR metadata was introduced (stmf_sbd_lu). This makes it possible to migrate from the legacy target (the iSCSI target daemon) to COMSTAR without data loss, which is great. Before this property existed, you always lost the first 64k of your zvol data, where COMSTAR wrote its metadata - which is bad.

When testing send/receive with the latest OpenSolaris, I found that this property is not replicated. Without send/receive support for it, it becomes very difficult to use send/receive for disaster recovery. Why? Because the disk IDs of the devices change on the target side, so clients must be reconfigured, which is difficult with many clients. After investigating this, I tried iscsioptions, the old-style property of the legacy target. It is also not replicated, so this seems to be by design.

So I wonder: where can I find information on which properties are replicated and which are not? Can someone help?

Regards,
Robert
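For reference, here is roughly how I have been comparing the two sides. Only a sketch; the pool, volume and snapshot names are placeholders for my real ones, and the point in question is exactly whether hidden properties such as stmf_sbd_lu make it into the stream:

  # List the locally set (non-inherited, non-default) properties on each side
  # after a send/receive and diff the output:
  zfs get -H -s local -o property,value all srcpool/vol
  zfs get -H -s local -o property,value all dstpool/vol

  # -p asks send to include properties in the stream; -R does the same for a
  # whole recursive replication stream (-F on receive rolls back the target
  # if it already exists):
  zfs send -p srcpool/vol@snap1 | zfs recv -F dstpool/vol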
[zfs-discuss] It's alive, and thank you all for the help.
I finally achieved critical mass on enough parts to put my ZFS server together. It basically ran the first time, any non-function being my own misunderstandings. I wanted to issue a thank you to those of you who suffered through my questions and pointed me in the right direction. Many pieces of my learning were done right here.

I moaned about the difficulty of figuring out what would run OpenSolaris and ZFS before buying hardware. In the end, I used a recipe largely copied from someone who had already built a home server. I'd like to return the favor. This combination of hardware runs with no problems with the OpenSolaris live CD install:

- ASUS M3A78-CM, which implies AMD 780V and SB700; the onboard ethernet works on 100Mb wiring with the rge driver; the onboard video runs 1024x768; I did not try onboard sound, DVI, etc. - don't care, it's a server.
- AMD Athlon II 240e
- Kingston 800MHz DDR2 unbuffered ECC RAM, 2x 2GB
- Syba SD-SA2PEX-2IR PCIe x1 dual-port SATA card with the si3124 driver; could not get a disk attached to this to boot the system yet
- 2x 40GB 2.5" SATA drives, mirrored as rpool for boot
- 6x Seagate 750GB RAID-rated SATA drives for main storage
- Corsair 400W 80 Plus rated PSU with a single 30A +12V rail for the spin-up surge
- Norco RC470 enclosure, 4U rackmount with between 11 and 15 spaces for 3.5" disks, internal fans, air filter, etc.; it's big and ungainly, but modestly priced, and the size makes the internal wiring easy
- the usual clot of cables and adapters

I found nothing that needed hunting for new drivers or that didn't work, other than the Syba card not being bootable. As measured by a Kill A Watt, the thing peaks at 200W from the wall at spinup but settles to 105W at idle, 10W of which is the five bulkhead fans in the case. I suspect I could pull the plug on maybe 2-3 of them and still not have overheating, because of the low idle power.

I get a reported 4.06TB of available storage from the six 750GB drives in raidz2, and another 30GB left over unused in the boot pool. As yet, I have no performance numbers. I suspect it will be entirely sufficient for my needs, as I don't intend to serve anything with real-time requirements. It's intended as a simple, large bit-bucket.

Again, thank you to those of you who helped me.

R.G.
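A note on the 4.06TB figure, in case it puzzles anyone: if I understand the tools right, that is what zpool list reports for a raidz2 pool, i.e. raw capacity with parity included, and the usable space is smaller. A rough sanity check (the pool name here is just a placeholder):

  # Raw:    6 x 750 GB = 4.5 TB, roughly 4.09 TiB -> matches the ~4.06T shown
  # Usable: (6 - 2) x 750 GB = 3.0 TB, roughly 2.7 TiB after raidz2 parity
  zpool list tank    # reports raw pool size, parity included
  zfs list tank      # reports usable space after parity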
[zfs-discuss] mpxio load-balancing...it doesn't work??
I had always thought that with mpxio, it load-balances IO requests across your storage ports, but this article http://christianbilien.wordpress.com/2007/03/23/storage-array-bottlenecks/ has got me thinking that's not true:

> The available bandwidth is 2 or 4Gb/s (200 or 400MB/s – FC frames are 10
> bytes long -) per port. As load balancing software (Powerpath, MPXIO, DMP,
> etc.) are most of the times used both for redundancy and load balancing,
> I/Os coming from a host can take advantage of an aggregated bandwidth of
> two ports. However, reads can use only one path, but writes are
> duplicated, i.e. a host write ends up as one write on each host port.

Is this true?
Re: [zfs-discuss] mpxio load-balancing...it doesn't work??
On Sun, Apr 4, 2010 at 8:55 PM, Brad bene...@yahoo.com wrote:
> I had always thought that with mpxio, it load-balances IO requests across
> your storage ports, but this article
> http://christianbilien.wordpress.com/2007/03/23/storage-array-bottlenecks/
> has got me thinking that's not true: "The available bandwidth is 2 or
> 4Gb/s (200 or 400MB/s – FC frames are 10 bytes long -) per port. As load
> balancing software (Powerpath, MPXIO, DMP, etc.) are most of the times
> used both for redundancy and load balancing, I/Os coming from a host can
> take advantage of an aggregated bandwidth of two ports. However, reads can
> use only one path, but writes are duplicated, i.e. a host write ends up as
> one write on each host port." Is this true?

I have no idea what MPIO stack he's talking about, but I've never heard of anything operating the way he describes. Writes aren't duplicated on each port. The path a read OR write goes down depends on the host-side MPIO stack and how you have it configured to load-balance. It could be simple round-robin, it could be based on queue depth, it could be most recently used, etc.

--Tim
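For what it's worth, on Solaris you can see what MPxIO is actually doing per LUN. A minimal sketch; the long LUN device name below is just a placeholder for one of your own:

  # List multipathed logical units, then show per-LUN policy and path states:
  mpathadm list lu
  mpathadm show lu /dev/rdsk/c5t600A0B80002FEA7A0000119B4CE1F2F3d0s2

  # The default policy comes from /kernel/drv/scsi_vhci.conf, e.g.:
  #   load-balance="round-robin";
  # With round-robin, reads and writes are spread across all online paths;
  # nothing is written twice.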
Re: [zfs-discuss] Problems with zfs and a STK RAID INT SAS HBA
> When running the card in copyback write cache mode, I got horrible
> performance (with zfs), much worse than with copyback disabled (which I
> believe should mean it does write-through), when tested with filebench.

When I benchmark my disks, I also find that the system is slower with WriteBack enabled. I would not call it much worse; I'd estimate about 10% worse. This, naturally, is counterintuitive. I do have an explanation, however, which is partly conjecture:

With WriteBack enabled, when the OS tells the HBA to write something, the write seems to complete instantly. So the OS will issue another, and another, and another. The HBA has no knowledge of the underlying pool data structure, so it cannot consolidate the smaller writes into larger sequential ones. It will brainlessly (or less-brainfully) do as it was told, and write the blocks to precisely the addresses it was instructed to write, even if those are many small writes scattered across the platters.

ZFS is smarter than that. It's able to consolidate a zillion tiny writes, as well as some larger writes, into one larger sequential transaction. ZFS has flexibility in choosing precisely how large a transaction it will build before sending it to disk. One of the variables used to decide how large the transaction should be is: is the disk busy writing right now? If the disks are still busy, ZFS might as well wait a little longer and continue building up the next sequential block of data to write. If the previous transaction appears to have completed already, there's no need to wait any longer; don't let the disks sit idle, just send another small write to the disk.

Long story short, I think ZFS simply does a better job of write buffering than the HBA could possibly do. So you benefit by disabling WriteBack and letting ZFS handle that instead.
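If you want to see this effect rather than take it on faith, a crude way is to watch the average write size per device while the benchmark runs; consolidated txg writes show up as a much larger write size than small scattered writes do. Just a sketch; the pool name and interval are arbitrary:

  # Per-device write rate and throughput, 1-second samples;
  # average write size (KB) for a device = kw/s divided by w/s:
  iostat -xn 1

  # The same burst pattern viewed per vdev from ZFS's side:
  zpool iostat -v tank 1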
Re: [zfs-discuss] To slice, or not to slice
> Your experience is exactly why I suggested ZFS start doing some
> right-sizing, if you will. Chop off a bit from the end of any disk so that
> we're guaranteed to be able to replace drives from different
> manufacturers. The excuse being that there's no reason to, since Sun
> drives are always of identical size. If your drives did indeed come from
> Sun, their response is clearly not true. Regardless, I guess I still think
> it should be done. Figure out the greatest variation we've seen between
> drives that are supposedly the exact same size, and chop that off the end
> of every disk. I'm betting it's no more than 1GB, and probably less than
> that. When we're talking about a 2TB drive, I'm willing to give up a gig
> to be guaranteed I won't have any issues when it comes time to swap it out.

My disks are Sun-branded Intel disks. Same model number. The first replacement disk had newer firmware, so we jumped to the conclusion that was the cause of the problem, and caused Oracle plenty of trouble locating an older-firmware drive in some warehouse somewhere. But the second replacement disk is truly identical to the original: same firmware and everything, only the serial number is different. Still the same problem behavior.

I have reason to believe that both the drive and the OS are correct. I suspect the HBA simply handled the creation of this volume somehow differently than it handled the original. I don't know the answer for sure yet. Either way, yes, I would love zpool to automatically waste a little space at the end of the drive to avoid this sort of situation, whether it's caused by drive manufacturers, the HBA, or any other factor.
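As a side note, a quick way to check whether two supposedly identical drives (or the volumes an HBA exports for them) really are the same size is to compare what the OS sees. A rough sketch with placeholder device names; use s0 instead of s2 on EFI-labelled disks:

  # Partition map and sector counts from the label:
  prtvtoc /dev/rdsk/c4t0d0s2
  prtvtoc /dev/rdsk/c4t1d0s2

  # Exact capacity in bytes per device, from the "Size:" line:
  iostat -En | egrep 'Soft Errors|Size'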
Re: [zfs-discuss] To slice, or not to slice
> CR 6844090, zfs should be able to mirror to a smaller disk
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
> b117, June 2009

Awesome. Now if someone would only port that to Solaris, I'd be a happy man. ;-)
Re: [zfs-discuss] To slice, or not to slice
On Sun, Apr 4, 2010 at 9:46 PM, Edward Ned Harvey solar...@nedharvey.com wrote:
> > CR 6844090, zfs should be able to mirror to a smaller disk
> > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
> > b117, June 2009
>
> Awesome. Now if someone would only port that to Solaris, I'd be a happy
> man. ;-)

Have you tried pointing that bug out to the support engineers who have your case at Oracle? If the fixed code is already out there, it's just a matter of porting the code, right? :)

--Tim
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
> Hmm, when you did the write-back test, was the ZIL SSD included in the
> write-back? What I was proposing was write-back only on the disks, and the
> ZIL SSD with no write-back.

The tests I did were:
- all disks write-through
- all disks write-back
- with/without SSD for ZIL
and all the permutations of the above. So, unfortunately, no, I didn't test with WriteBack enabled only for the spindles and WriteThrough on the SSD.

It has been suggested, and this is actually what I now believe based on my experience, that precisely the opposite would be the better configuration: spindles configured WriteThrough, with the SSD configured WriteBack. I believe that would be optimal. If I get the opportunity to test further, I'm interested and I will. But who knows when/if that will happen.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
> Actually, it's my experience that Sun (and other vendors) do exactly that
> for you when you buy their parts - at least for rotating drives; I have no
> experience with SSDs. The Sun disk label shipped on all the drives is set
> up to make the drive the standard size for that Sun part number. They have
> to do this since they have (for many reasons) many sources (different
> vendors, even different parts from the same vendor) for the actual disks
> they use for a particular Sun part number.

Actually, if there is an fdisk partition and/or disklabel on a drive when it arrives, I'm pretty sure that's irrelevant, because when I first connect a new drive to the HBA, the HBA has to sign and initialize the drive at a lower level than what the OS normally sees. So unless I do some sort of special operation to tell the HBA to preserve/import a foreign disk, the HBA will blank the disk before the OS sees it anyway.
Re: [zfs-discuss] To slice, or not to slice
> > There is some question about performance. Is there any additional
> > overhead caused by using a slice instead of the whole physical device?
>
> No. If the disk is only used for ZFS, then it is ok to enable volatile
> disk write caching if the disk also supports write cache flush requests.
> If the disk is shared with UFS, then it is not ok to enable volatile disk
> write caching.

Thank you. If you don't know the answer to this off the top of your head, I'll go ask the internet, but I thought you might just know the answer in 2 seconds: assuming the disk's write cache is disabled because of the slice (as documented in the Best Practices Guide), how do you enable it? I would only be using ZFS on the drive. The existence of a slice is purely to avoid future mirror problems and the like.
Re: [zfs-discuss] To slice, or not to slice
I haven't taken that approach, but I guess I'll give it a try.

From: Tim Cook [mailto:t...@cook.ms]
Sent: Sunday, April 04, 2010 11:00 PM
To: Edward Ned Harvey
Cc: Richard Elling; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] To slice, or not to slice

> Have you tried pointing that bug out to the support engineers who have
> your case at Oracle? If the fixed code is already out there, it's just a
> matter of porting the code, right? :)
>
> --Tim
[zfs-discuss] writeback vs writethrough [was: Sun Flash Accelerator F20 numbers]
On Apr 2, 2010, at 5:03 AM, Edward Ned Harvey wrote:
> >> Seriously, all disks configured WriteThrough (spindle and SSD disks
> >> alike) using the dedicated ZIL SSD device, very noticeably faster than
> >> enabling the WriteBack.
> >
> > What do you get with both SSD ZIL and WriteBack disks enabled? I mean,
> > if you have both, why not use both? Then both async and sync IO benefit.
>
> Interesting, but unfortunately false. Soon I'll post the results here. I
> just need to package them in a way suitable for the public and stick them
> on a website, but I'm fighting IT fires for now and haven't had the time
> yet. Roughly speaking, the following are approximately representative. Of
> course it varies with tweaks of the benchmark and so on.
>
>   Stripe of 3 mirrors, write-through:             450-780 IOPS
>   Stripe of 3 mirrors, write-back:               1030-2130 IOPS
>   Stripe of 3 mirrors, write-back + SSD ZIL:     1220-2480 IOPS
>   Stripe of 3 mirrors, write-through + SSD ZIL:  1840-2490 IOPS

Thanks for sharing these interesting numbers.

> Overall, I would say WriteBack is 2-3 times faster than naked disks. SSD
> ZIL is 3-4 times faster than naked disks. And for some reason, having
> WriteBack enabled while you have an SSD ZIL actually hurts performance by
> approximately 10%. You're better off using the SSD ZIL with the disks in
> WriteThrough mode. YMMV.

The write workload for ZFS is best characterized by looking at the txg commit. In a very short period of time, ZFS sends a lot[1] of write I/O to the vdevs. It is not surprising that this can blow through the relatively small caches on controllers. Once you blow through the cache, you experience the [in]efficiency of the disks behind the cache as well as the [in]efficiency of the cache controller. Alas, little public information seems to be published about how those caches work.

Changing to write-through effectively changes the G/M/1 queue at the controller into a G/M/n queue at the disks [2]. Sorta like:

1. write-back controller
   (ZFS) N*#vdev I/Os -- controller -- disks
   (ZFS) M/M/n -- G/M/1 -- M/M/n

2. write-through controller
   (ZFS) N*#vdev I/Os -- disks
   (ZFS) M/M/n -- G/M/n

This can simply be a case of the middleman becoming the bottleneck.

[1] "a lot" means up to 35 I/Os per vdev for older releases, 4-10 I/Os per vdev for more recent releases
[2] queuing theory enthusiasts will note that ZFS writes do not exhibit an exponential arrival rate at the controller or disks, except for sync writes

> That result is surprising to me, but I have a theory to explain it. When
> you have WriteBack enabled, the OS issues a small write, and the HBA
> immediately returns to the OS: "Yes, it's on nonvolatile storage." So the
> OS quickly gives it another, and another, until the HBA write cache is
> full. Now the HBA faces the task of writing all those tiny writes to disk,
> and the HBA must simply follow orders, writing a tiny chunk to the sector
> it said it would write, and so on. The HBA cannot effectively consolidate
> the small writes into a larger sequential block write.
>
> But if you have WriteBack disabled and you have an SSD for ZIL, then ZFS
> can log the tiny operation on the SSD and immediately return to the
> process: "Yes, it's on nonvolatile storage." So the application can issue
> another, and another, and another. ZFS is smart enough to aggregate all
> these tiny write operations into a single larger sequential write before
> sending it to the spindle disks.

I agree, though this paragraph has three different thoughts embedded. Taken separately:
1. queuing surprises people :-)
2. write-back inserts a middleman with its own queue
3. separate logs radically change the write workload seen by the controller and disks

> Long story short, the evidence suggests that if you have an SSD ZIL,
> you're better off without WriteBack on the HBA. And I conjecture the
> reasoning behind it is that ZFS can write-buffer better than the HBA can.

I think the way the separate log works is orthogonal. However, not having a separate log can influence the ability of the controller and disks to respond to read requests during this workload. Perhaps this is a long way around to saying that a well-tuned system will have harmony among its parts.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
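To make the txg commit burst visible rather than theoretical, one rough way is to time spa_sync with DTrace. This is only a sketch, assuming the fbt provider is available and that spa_sync is the txg commit entry point in these builds:

  dtrace -n '
  fbt::spa_sync:entry  { self->ts = timestamp; }
  fbt::spa_sync:return /self->ts/ {
      /* distribution of txg commit durations, in milliseconds */
      @["txg sync time (ms)"] = quantize((timestamp - self->ts) / 1000000);
      self->ts = 0;
  }'

Run it while the benchmark is going; long commit times alongside short idle gaps are the burst that overwhelms a small controller cache.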
Re: [zfs-discuss] To slice, or not to slice
On Apr 4, 2010, at 8:11 PM, Edward Ned Harvey wrote:
> >> There is some question about performance. Is there any additional
> >> overhead caused by using a slice instead of the whole physical device?
> >
> > No. If the disk is only used for ZFS, then it is ok to enable volatile
> > disk write caching if the disk also supports write cache flush requests.
> > If the disk is shared with UFS, then it is not ok to enable volatile
> > disk write caching.
>
> Thank you. If you don't know the answer to this off the top of your head,
> I'll go ask the internet, but I thought you might just know the answer in
> 2 seconds: assuming the disk's write cache is disabled because of the
> slice (as documented in the Best Practices Guide), how do you enable it?
> I would only be using ZFS on the drive. The existence of a slice is purely
> to avoid future mirror problems and the like.

This is a trick question -- some drives ignore efforts to disable the write cache :-P

Use format -e for access to the expert mode, where you can enable the write cache. As for performance benefits, YMMV.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
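For reference, the expert-mode path looks roughly like this (a sketch from memory; the exact menu wording can vary by release, and "display" shows whether the cache is currently enabled before you change it):

  # format -e
  #   ... select the disk number from the list ...
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> enable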
[zfs-discuss] ZFS getting slower over time
I have a problem with my ZFS system: it's getting slower and slower over time. When the OpenSolaris machine has just been rebooted, I get about 30-35MB/s in read and write, but after 4-8 hours I'm down to maybe 10MB/s, varying between 4-18MB/s. If I reboot the machine, the problem is gone and I have full speed again. Does it have something to do with the cache? I use a separate SSD as a cache disk.

Anyway, here's my setup:
- OpenSolaris dev build 134
- Core 2 Duo with 4GB RAM
- 4x 1.5TB WD SATA drives and 1x Corsair 32GB SSD as cache

It doesn't seem to matter whether I copy files locally on the computer or over CIFS; I get the same degradation in speed. Last night I left my workstation copying files to/from the server for about 8 hours, and you could see the performance dropping from about 28MB/s down to under 10MB/s after a couple of hours.

Any suggestion on what to do? I've tried some tuning by setting the following variables in /etc/system:

  set zfs:zfs_txg_timeout = 1
  set zfs:zfs_vdev_max_pending = 1

But it doesn't seem to make any difference.

Regards
/Marcus Wilhelmsson, Kalmar, Sweden
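In case it helps, here is roughly what I am planning to watch while the slowdown develops, instead of guessing. Just a sketch; the pool name is a placeholder and the kstat names are as I remember them on build 134:

  # ARC target size, current size, and how full/busy the L2ARC (cache SSD) is:
  kstat -p zfs:0:arcstats:c zfs:0:arcstats:size
  kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses

  # Per-device latency; a cache SSD with high asvc_t or %b as it fills up
  # can drag the whole pool down:
  iostat -xn 5

  # Pool-level view while the copy is running:
  zpool iostat -v tank 5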