Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, David Dyer-Bennet wrote:

> To be comfortable (I don't ask for "know for a certainty"; I'm not sure
> that exists outside of faith), I want a claim by the manufacturer and
> multiple outside tests in significant journals -- which could be the
> blog of somebody I trusted, as well as actual magazines and such.
> Ideally, certainly if it's important, I'd then verify the tests myself.

For me, "know for a certainty" means that the feature is clearly
specified in the formal specification sheet for the product, and the
vendor has historically published reliable specification sheets. This may
not be the same as money in the bank, but it is better than relying on
thoughts from some blog posting.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Brandon High wrote:

> My understanding is that the controller contains enough cache to buffer
> enough data to write a complete erase block, eliminating the
> read/erase/write that a partial block write entails. It's reported to
> do a copy-on-write, so it doesn't need to read existing blocks when
> making changes, which is what gives it such high iops -- even random
> writes are turned into sequential writes of entire erase blocks (much
> like how ZFS works). The excessive spare area is used to ensure that
> there are always full pages free to write to. (Some vendors are
> releasing consumer drives with 60/120/240 GB, using 7% reserved space
> rather than the 27% that the original drives ship with.)

FLASH is useless as working space since it does not behave like RAM, so
every SSD needs some RAM for temporary storage of data.

This COW approach seems nice, except that it would appear to inflate
performance by only considering one specific magic block size and
alignment. Other block sizes and alignments would require that existing
data be read so that the new block content can be constructed. Also, the
blazing fast write speed (which depends on plenty of already-erased
blocks) would stop once the spare space in the SSD has been consumed.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
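To make the read/erase/write tradeoff concrete, here is a toy sketch of a
copy-on-write flash translation layer in Python. All sizes and names are
invented for illustration -- this is not SandForce's (or anyone's) real
design. Writes append into a pre-erased block, so nothing old is read;
once the free list empties, the fast path is gone, which is exactly the
spare-space caveat above:

    # Toy copy-on-write FTL. Block/page sizes are assumptions.
    ERASE_BLOCK = 512 * 1024            # bytes per erase block (assumed)
    PAGE = 4096                         # bytes per flash page (assumed)
    PAGES_PER_BLOCK = ERASE_BLOCK // PAGE

    class ToyFTL:
        def __init__(self, data_blocks, spare_blocks):
            self.free = list(range(data_blocks + spare_blocks))
            self.map = {}               # logical page -> (block, slot)
            self.open_block = None      # pre-erased block being filled
            self.slot = 0

        def write_page(self, lpage):
            # COW: never overwrite in place; append to the open block.
            if self.open_block is None or self.slot == PAGES_PER_BLOCK:
                if not self.free:
                    raise RuntimeError("no pre-erased blocks left: every "
                                       "write now needs read/erase/program")
                self.open_block = self.free.pop()
                self.slot = 0
            # Any previous copy of lpage simply becomes garbage.
            self.map[lpage] = (self.open_block, self.slot)
            self.slot += 1

    ftl = ToyFTL(data_blocks=4, spare_blocks=1)
    try:
        for i in range(10 * PAGES_PER_BLOCK):
            ftl.write_page(i % 64)      # rewriting the same 64 pages
    except RuntimeError as e:
        print(e)                        # the fast path just ended

A real FTL reclaims the garbage copies in the background; the toy omits
garbage collection precisely to show the cliff once spare space runs out.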
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Don wrote:

> You know -- it would probably be sufficient to provide the SSD with
> _just_ a big capacitor bank. If the host lost power it would stop
> writing, and if the SSD still had power it would probably use the idle
> time to flush its buffers. Then there would be world peace!

This makes the assumption that an SSD will want to flush its write cache
as soon as possible rather than just letting it sit there waiting for
more data. That is probably not a good assumption. If the OS sends 512
bytes of data but the SSD block size is 4K, it is reasonable for the SSD
to wait for 3584 more contiguous bytes of data before it bothers to write
anything. Writes increase the wear on the flash, and writes require a
slow erase cycle, so it is reasonable for SSDs to buffer as much data in
their write cache as possible before writing anything. An advanced SSD
could write non-contiguous sectors into an SSD page and then use a sort
of lookup table to know where the sectors actually are. Regardless, under
slow write conditions it is definitely valuable to buffer the data for a
while in the hope that more related data will appear, or that the data
might even be overwritten.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
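A minimal sketch of the deferred-flush policy described above: hold
sub-page writes in cache hoping the neighbouring sectors arrive (or get
overwritten) before paying for a program cycle. This is a hypothetical
policy for illustration, not the documented behaviour of any drive:

    SECTOR, PAGE = 512, 4096
    SECTORS_PER_PAGE = PAGE // SECTOR           # 8 sectors per 4K page

    cache = {}                                  # page no. -> sectors seen

    def accept_write(lba):
        page, off = divmod(lba, SECTORS_PER_PAGE)
        cache.setdefault(page, set()).add(off)  # overwrites cost nothing
        if len(cache[page]) == SECTORS_PER_PAGE:
            print(f"page {page} complete -- program it to flash")
            del cache[page]
        # otherwise: keep waiting; flushing a partial page now would
        # force a read-modify-write of the rest of the page later

    accept_write(0)         # 512 bytes arrive: the SSD just sits on them
    for lba in range(1, 8):
        accept_write(lba)   # the other 3584 bytes arrive: now it writes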
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
> If you do not care about this NFS problem (or the others) then maybe
> you can just disable the ZIL. It is a matter of working through step 1.
>
> Working through STEP 1 might be ``doesn't affect us. Disable ZIL.'' Or
> it might be ``get slog with supercap''. STEP 1 will never be ``plug in
> OCZ Vertex cheaposlog that ignores cacheflush'' if you are doing it
> right. And STEP 2 has nothing to do with anything yet, until we finish
> STEP 1 and the insane failure cases.

AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND
grid. Whether it respects or ignores the cache flush seems irrelevant.
There has been previous discussion about this:

http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702

I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
cache, but take a hunk of flash to use as scratch space instead. Which
means that they'll be OK for ZIL use.

Also:
http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html

Another benefit of SandForce's architecture is that the SSD keeps
information on the NAND grid and removes the need for a separate cache
buffer DRAM module. The result is a faster transaction, albeit at the
expense of total storage capacity.

So if I interpret them correctly, what they chose to do with the current
incarnation of the architecture is actually reserve some of the primary
memory capacity for I/O transaction management. In plain English, if the
system gets interrupted either by power or by a crash, when it
initializes the next time, it can read from its transaction space and
resume where it left off. This makes it durable.

So, OCZ Vertex 2 seems to be a good choice for ZIL.
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile
> NAND grid. Whether it respects or ignores the cache flush seems
> irrelevant. There has been previous discussion about this:
> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>
> I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
> cache, but take a hunk of flash to use as scratch space instead. Which
> means that they'll be OK for ZIL use.
>
> Also:
> http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html
>
> Another benefit of SandForce's architecture is that the SSD keeps
> information on the NAND grid and removes the need for a separate cache
> buffer DRAM module. The result is a faster transaction, albeit at the
> expense of total storage capacity.
>
> So if I interpret them correctly, what they chose to do with the
> current incarnation of the architecture is actually reserve some of the
> primary memory capacity for I/O transaction management. In plain
> English, if the system gets interrupted either by power or by a crash,
> when it initializes the next time, it can read from its transaction
> space and resume where it left off. This makes it durable.

Here is a detailed explanation of the SandForce controllers:

http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal

So the SF-1500 is enterprise class and relies on a supercap; the SF-1200
is consumer class and does not rely on a supercap:

"The SF-1200 firmware on the other hand doesn't assume the presence of a
large capacitor to keep the controller/NAND powered long enough to
complete all writes in the event of a power failure. As such it does more
frequent check pointing and doesn't guarantee the write in progress will
complete before it's acknowledged."

As I understand it, the SF-1200 will ack the sync write only after it is
written to flash, thus reducing write performance.

There is an interesting part about firmware: OCZ has an exclusive
firmware in the Vertex 2 series which is based on the SF-1200 but whose
random write IOPS is not capped at 10K, while other vendors and other
SSDs from OCZ using the SF-1200 are capped -- unless they sell the drive
with the RC firmware, which is for OEM evaluation and not production
ready but does not contain the IOPS cap.
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
SNIP a whole lot of ZIL/SLOG discussion

Hi guys. Yep, I know about the ZIL and SSD slogs. While setting Nexenta
up it offered to disable the ZIL entirely. For now I left it on. In the
end (hopefully for only specific filesystems -- once that feature is
released) I'll end up disabling the ZIL for our software builds since:

1) The builds are disposable -- we only need to save them if they finish,
and we can restart them if needed.

2) The build servers are not on UPS, so a power failure is likely to make
the clients lose all state and need to restart anyway.

But this issue I've seen with Nexenta is not due to the ZIL. It runs
until it literally crashes the machine. It's not just slow; it brings the
machine to its knees. I believe it does have something to do with
exhausting memory, though. As Erast says, it may be the isp driver
(though I've used that on b130 of SXCE without issues), or who knows what
else. I did download some updates from Nexenta yesterday. I'm going to
try to retest today or tomorrow.

-Kyle
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Miika Vesti wrote:

> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile
> NAND grid. Whether it respects or ignores the cache flush seems
> irrelevant. There has been previous discussion about this:
> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>
> I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
> cache, but take a hunk of flash to use as scratch space instead. Which
> means that they'll be OK for ZIL use.
>
> So, OCZ Vertex 2 seems to be a good choice for ZIL.

There seem to be quite a lot of blind assumptions in the above. The only
good choice for ZIL is when you know for a certainty, not from
assumptions based on 3rd-party articles and blog postings. Otherwise it
is like assuming that if you jump through an open window there will be
firemen down below to catch you.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, May 21, 2010 10:19, Bob Friesenhahn wrote:

> On Fri, 21 May 2010, Miika Vesti wrote:
>> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile
>> NAND grid. Whether it respects or ignores the cache flush seems
>> irrelevant. There has been previous discussion about this:
>> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>>
>> I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
>> cache, but take a hunk of flash to use as scratch space instead. Which
>> means that they'll be OK for ZIL use.
>>
>> So, OCZ Vertex 2 seems to be a good choice for ZIL.
>
> There seem to be quite a lot of blind assumptions in the above. The
> only good choice for ZIL is when you know for a certainty, not from
> assumptions based on 3rd-party articles and blog postings. Otherwise it
> is like assuming that if you jump through an open window there will be
> firemen down below to catch you.

Just how DOES one know something for a certainty, anyway? I've seen LOTS
of people mess up performance testing in ways that gave them very wrong
answers; relying solely on your own testing is as foolish as relying on a
couple of random blog posts.

To be comfortable (I don't ask for "know for a certainty"; I'm not sure
that exists outside of faith), I want a claim by the manufacturer and
multiple outside tests in significant journals -- which could be the blog
of somebody I trusted, as well as actual magazines and such. Ideally,
certainly if it's important, I'd then verify the tests myself. There
aren't enough hours in the day, so I often get by with less.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
This is interesting. I thought all Vertex 2 SSDs were good choices for
ZIL, but this does not seem to be the case.

According to http://www.legitreviews.com/article/1208/1/ the Vertex 2 LE,
Vertex 2 Pro and Vertex 2 EX are SF-1500 based, but the plain Vertex 2
(without any suffix) is SF-1200 based. Here is the table:

Model          Controller   Max Read   Max Write    IOPS
Vertex 2       SF-1200      270 MB/s   260 MB/s      9500
Vertex 2 LE    SF-1500      270 MB/s   250 MB/s         ?
Vertex 2 Pro   SF-1500      280 MB/s   270 MB/s     19000
Vertex 2 EX    SF-1500      280 MB/s   270 MB/s     25000

On 21.05.2010 17:09, Attila Mravik wrote:

>> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile
>> NAND grid. Whether it respects or ignores the cache flush seems
>> irrelevant. There has been previous discussion about this:
>> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>>
>> I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
>> cache, but take a hunk of flash to use as scratch space instead. Which
>> means that they'll be OK for ZIL use.
>>
>> Also:
>> http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html
>>
>> Another benefit of SandForce's architecture is that the SSD keeps
>> information on the NAND grid and removes the need for a separate cache
>> buffer DRAM module. The result is a faster transaction, albeit at the
>> expense of total storage capacity.
>>
>> So if I interpret them correctly, what they chose to do with the
>> current incarnation of the architecture is actually reserve some of
>> the primary memory capacity for I/O transaction management. In plain
>> English, if the system gets interrupted either by power or by a crash,
>> when it initializes the next time, it can read from its transaction
>> space and resume where it left off. This makes it durable.
>
> Here is a detailed explanation of the SandForce controllers:
> http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal
>
> So the SF-1500 is enterprise class and relies on a supercap; the
> SF-1200 is consumer class and does not rely on a supercap:
>
> "The SF-1200 firmware on the other hand doesn't assume the presence of
> a large capacitor to keep the controller/NAND powered long enough to
> complete all writes in the event of a power failure. As such it does
> more frequent check pointing and doesn't guarantee the write in
> progress will complete before it's acknowledged."
>
> As I understand it, the SF-1200 will ack the sync write only after it
> is written to flash, thus reducing write performance.
>
> There is an interesting part about firmware: OCZ has an exclusive
> firmware in the Vertex 2 series which is based on the SF-1200 but whose
> random write IOPS is not capped at 10K, while other vendors and other
> SSDs from OCZ using the SF-1200 are capped -- unless they sell the
> drive with the RC firmware, which is for OEM evaluation and not
> production ready but does not contain the IOPS cap.
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Thu, May 20, 2010 at 2:23 PM, Miika Vesti miika.ve...@trivore.com wrote:

> I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
> cache, but take a hunk of flash to use as scratch space instead. Which
> means that they'll be OK for ZIL use.

I've read conflicting reports that the controller contains a small DRAM
cache. So while it doesn't rely on an external DRAM cache, it does have
one: http://www.legitreviews.com/article/1299/2/

"As we noted, the Vertex 2 doesn't have any cache chips on it; that is
because the SandForce controller itself is said to carry a small cache
inside that is a number of megabytes in size."

> Another benefit of SandForce's architecture is that the SSD keeps
> information on the NAND grid and removes the need for a separate cache
> buffer DRAM module. The result is a faster transaction, albeit at the
> expense of total storage capacity.

Again, conflicting reports indicate otherwise.
http://www.legitreviews.com/article/1299/2/

"That adds up to 128GB of storage space, but only 93.1GB of it will be
usable space! The 'hidden' capacity is used for wear leveling, which is
crucial to keeping SSDs running as long as possible."

My understanding is that the controller contains enough cache to buffer
enough data to write a complete erase block, eliminating the
read/erase/write that a partial block write entails. It's reported to do
a copy-on-write, so it doesn't need to read existing blocks when making
changes, which is what gives it such high iops -- even random writes are
turned into sequential writes of entire erase blocks (much like how ZFS
works). The excessive spare area is used to ensure that there are always
full pages free to write to. (Some vendors are releasing consumer drives
with 60/120/240 GB, using 7% reserved space rather than the 27% that the
original drives ship with.)

With an unexpected power loss, you could still lose any data that's
cached in the controller, or any uncommitted changes that have been
partially written to the NAND.

I hate having to rely on sites like Legit Reviews and Anandtech for
technical data, but there don't seem to be non-fanboy sites doing
comprehensive reviews of the drives ...

-B
--
Brandon High : bh...@freaks.com
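For what it's worth, the 27%/7% reserve figures and the 128GB-vs-93.1GB
numbers quoted above are consistent with each other under round-number
arithmetic (binary-GiB raw NAND against decimal-GB advertised capacity);
a quick check, treating the conventions loosely since vendors each
measure over-provisioning slightly differently:

    # Back-of-the-envelope check of the quoted spare-area figures.
    print(100e9 / 2**30)            # 100 GB advertised = ~93.1 GiB usable

    def overprovision(raw_gb, user_gb):
        # spare space expressed relative to the user-visible capacity
        return (raw_gb - user_gb) / user_gb

    print(overprovision(128, 100))  # ~0.28  -> the "27%" original drives
    print(overprovision(128, 120))  # ~0.067 -> the "7%" consumer models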
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
dd == David Dyer-Bennet d...@dd-b.net writes:

    dd> Just how DOES one know something for a certainty, anyway?

science.  Do a test like Lutz did on X25M G2.  see list archives
2010-01-10.
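For anyone who wants to run that kind of test, the shape of it is roughly
the following (a sketch only: the device path, record layout, and sizes
are invented placeholders, and the power pull itself is done by hand
mid-run):

    # Write sequence-numbered records with O_DSYNC, yank the power
    # mid-run, then after reboot read the device back. If any record
    # with a sequence number <= the last one printed is missing, the
    # drive acknowledged a synchronous write it had not made durable.
    import os, struct, sys

    dev = sys.argv[1]                 # a dedicated scratch device
    fd = os.open(dev, os.O_WRONLY | os.O_DSYNC)

    seq = 0
    while True:                       # runs until the plug is pulled
        rec = struct.pack("<Q", seq) * 512           # one 4K record
        os.pwrite(fd, rec, (seq % 100000) * 4096)    # O_DSYNC: ack = durable
        print(seq)                    # last number printed = last ack
        seq += 1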
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
> Now, if someone would make a Battery FOB, that gives broken SSD 60
> seconds of power, then we could use the consumer SSD's in servers again
> with real value instead of CYA value.

You know -- it would probably be sufficient to provide the SSD with
_just_ a big capacitor bank. If the host lost power it would stop
writing, and if the SSD still had power it would probably use the idle
time to flush its buffers. Then there would be world peace!

Yeah -- got a little carried away there. Still, this seems like an
experiment I'm going to have to try on my home server, out of curiosity
more than anything else :)
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Hi all,

I recently installed Nexenta Community 3.0.2 on one of my servers:

IBM eSeries X346
  2.8GHz Xeon
  12GB DDR2 RAM
  1 built-in BGE interface for management
  4-port Intel GigE card, aggregated, for data
  IBM ServeRAID 7k with 256MB BB cache (isp driver)
  6 single-drive RAID0 LUNs (so I can use the cache):
    1 18GB LUN for the rpool
    5 300GB LUNs for the data pool
  1 RAIDZ1 pool from the 5 300GB drives
  4 test filesystems:
    1 No DeDup, No Compression
    1 DeDup, No Compression
    1 No DeDup, Compression
    1 DeDup, Compression

This is pretty old hardware, so I wasn't expecting miracles, but I
thought I'd give it a shot. My workload is NFS service to software build
servers (cvs checkouts, untarring files, compiling, etc.). I'm hoping the
many CVS checkout trees will lend themselves to DeDup well, and I know
source code should compress easily.

I set up one client with a single GigE connection, mounted the four
filesystems (plus one from the NetApp we have here) and proceeded to
write a loop to time both untarring the gcc-4.3.3 sources to those 5
filesystems, and to 1 local directory, and to rm -rf the sources too.

The untar took 28 seconds, and the removal 10 seconds, in the local dir.
Then, on the first ZFS/NFS filesystem mount, it took basically forever
and hung the Nexenta server. I was watching it go on the web admin page
and it all looked fine for a while, then the client started reporting
'NFS Server not responding, still trying...' For a while there were also
'NFS Server OK' messages, and the Web GUI remained responsive. Eventually
the OK messages stopped, and the Web GUI froze. I went and rebooted the
NFS client, thinking that if the requests stopped the server might catch
up, but it never started responding again.

I was only untarring a file. How did this bring the machine down? I
hadn't even gotten to the FS's that had DeDup or Compression turned on,
so those shouldn't have affected things -- yet.

Any ideas?

-Kyle
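A sketch of the kind of timing loop described above (the mount points and
tarball path below are invented placeholders, not Kyle's actual setup):

    # Time untarring the gcc-4.3.3 sources onto each mount, then
    # removing them again. Paths are placeholders.
    import subprocess, time

    mounts = ["/mnt/nodedup-nocomp", "/mnt/dedup-nocomp",
              "/mnt/nodedup-comp", "/mnt/dedup-comp",
              "/mnt/netapp", "/var/tmp/local"]
    tarball = "/var/tmp/gcc-4.3.3.tar.bz2"

    for m in mounts:
        t0 = time.time()
        subprocess.run(["tar", "xjf", tarball, "-C", m], check=True)
        t1 = time.time()
        subprocess.run(["rm", "-rf", m + "/gcc-4.3.3"], check=True)
        t2 = time.time()
        print("%s: untar %.1fs, rm %.1fs" % (m, t1 - t0, t2 - t1))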
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Hi Kyle,

Very likely you hit a driver bug in isp. After the reboot, take a look at
the /var/adm/messages file -- anything related might shed some light. I
wouldn't suspect the Intel GigE card; it's a fairly good one and the
driver is very stable.

Also, some upgrades were posted; make sure the kernel displays 134e after
the reboot into the new upgrade checkpoint. The upgrade command:

nmc$ setup appliance upgrade

On 05/20/2010 08:05 AM, Kyle McDonald wrote:

> Hi all,
>
> I recently installed Nexenta Community 3.0.2 on one of my servers:
>
> IBM eSeries X346
>   2.8GHz Xeon
>   12GB DDR2 RAM
>   1 built-in BGE interface for management
>   4-port Intel GigE card, aggregated, for data
>   IBM ServeRAID 7k with 256MB BB cache (isp driver)
>   6 single-drive RAID0 LUNs (so I can use the cache):
>     1 18GB LUN for the rpool
>     5 300GB LUNs for the data pool
>   1 RAIDZ1 pool from the 5 300GB drives
>   4 test filesystems:
>     1 No DeDup, No Compression
>     1 DeDup, No Compression
>     1 No DeDup, Compression
>     1 DeDup, Compression
>
> This is pretty old hardware, so I wasn't expecting miracles, but I
> thought I'd give it a shot. My workload is NFS service to software
> build servers (cvs checkouts, untarring files, compiling, etc.). I'm
> hoping the many CVS checkout trees will lend themselves to DeDup well,
> and I know source code should compress easily.
>
> I set up one client with a single GigE connection, mounted the four
> filesystems (plus one from the NetApp we have here) and proceeded to
> write a loop to time both untarring the gcc-4.3.3 sources to those 5
> filesystems, and to 1 local directory, and to rm -rf the sources too.
>
> The untar took 28 seconds, and the removal 10 seconds, in the local
> dir. Then, on the first ZFS/NFS filesystem mount, it took basically
> forever and hung the Nexenta server. I was watching it go on the web
> admin page and it all looked fine for a while, then the client started
> reporting 'NFS Server not responding, still trying...' For a while
> there were also 'NFS Server OK' messages, and the Web GUI remained
> responsive. Eventually the OK messages stopped, and the Web GUI froze.
> I went and rebooted the NFS client, thinking that if the requests
> stopped the server might catch up, but it never started responding
> again.
>
> I was only untarring a file. How did this bring the machine down? I
> hadn't even gotten to the FS's that had DeDup or Compression turned on,
> so those shouldn't have affected things -- yet.
>
> Any ideas?
>
> -Kyle
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Disable the ZIL and test again. NFS does a lot of sync writes, which
kills performance. Disabling the ZIL (or using the synchronicity option,
if a build with that ever comes out) will prevent that behavior, and
should get your NFS performance close to local.

It's up to you if you want to leave it that way. There are reasons not to
as well: NFS clients can get corrupted views of the filesystem should the
server go down before a write flush is completed. The ZIL prevents that
problem. In my case the clients aren't on a UPS while the server is, so
it's not an issue. :)
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
- Travis Tabbal tra...@tabbal.net wrote:

> Disable the ZIL and test again. NFS does a lot of sync writes, which
> kills performance. Disabling the ZIL (or using the synchronicity
> option, if a build with that ever comes out) will prevent that
> behavior, and should get your NFS performance close to local.
>
> It's up to you if you want to leave it that way. There are reasons not
> to as well: NFS clients can get corrupted views of the filesystem
> should the server go down before a write flush is completed. The ZIL
> prevents that problem. In my case the clients aren't on a UPS while the
> server is, so it's not an issue. :)

Disabling the ZIL is, according to ZFS best practice, NOT recommended.
Get an SSD for the ZIL instead, preferably mirrored. You won't need a
lot; the ZIL never uses more than half the RAM size.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases adequate and
relevant synonyms exist in Norwegian.
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Thu, May 20, 2010 13:58, Roy Sigurd Karlsbakk wrote:

>> Disable the ZIL and test again. NFS does a lot of sync writes, which
>> kills performance. Disabling the ZIL (or using the synchronicity
>> option, if a build with that ever comes out) will prevent that
>> behavior, and should get your NFS performance close to local.
>
> Disabling the ZIL is, according to ZFS best practice, NOT recommended.
> Get an SSD for the ZIL instead, preferably mirrored. You won't need a
> lot; the ZIL never uses more than half the RAM size.

Disabling the ZIL is an easy way to TEST whether a separate ZIL device
would be helpful. If things speed up after turning it off, then you'd
turn it back on and go purchase an SSD. There's no sense spending money
if it won't fix the problem.

To the OP, see Section 2.7 ("Disabling the ZIL (Don't)") of:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

As mentioned, you do NOT want to run with this in production, but it is a
quick way to check.
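For reference, on builds of that era the knob that guide describes is the
zil_disable kernel tunable; a sketch of the /etc/system setting, to be
double-checked against the guide for your build (later builds replace
this tunable with a per-dataset property -- the "synchronicity" option
mentioned above):

    * In /etc/system -- takes effect at the next boot, and filesystems
    * must be remounted before the change applies:
    set zfs:zil_disable = 1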
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
rsk == Roy Sigurd Karlsbakk r...@karlsbakk.net writes:
dm == David Magda dma...@ee.ryerson.ca writes:
tt == Travis Tabbal tra...@tabbal.net writes:

   rsk> Disabling ZIL is, according to ZFS best practice, NOT
   rsk> recommended.

    dm> As mentioned, you do NOT want to run with this in production,
    dm> but it is a quick way to check.

REPEAT: I disagree. Once you associate the disasterizing and dire
warnings from the developer's advice-wiki with the specific problems that
ZIL-disabling causes for real sysadmins, rather than abstract notions of
``POSIX'' or ``the application'', a lot more people end up wanting to
disable their ZILs.

In fact, most of the SSDs sold seem to be relying on exactly the trick
disabled-ZIL ZFS does for much of their high performance, if not their
feasibility within their price bracket, period: provide a guarantee of
write ordering without durability, and many applications are just, poof,
happy. If the SSDs arrange that no writes are reordered across a SYNC
CACHE, but don't bother actually providing durability, end uzarZ will
``OMG windows fast and no corruption.'' -- ssd sales.

The ``do-not-disable-buy-SSD!!!1!'' advice thus translates to ``buy one
of these broken SSDs, and you will be basically happy. Almost everyone
is. When you aren't, we can blame the SSD instead of ZFS.'' All that
bottlenecked SATA traffic host<->SSD is just CYA and of no real value
(except for kernel panics).

Now, if someone would make a Battery FOB that gives a broken SSD 60
seconds of power, then we could use the consumer crap SSDs in servers
again, with real value instead of CYA value. The FOB should work like
this:

  RUNNING                   SATA port: pass; power to SSD: on
      on input power lost        -> POWER-LOST HOLD-DOWN

  POWER-LOST HOLD-DOWN      SATA port: block; power to SSD: on
      after 60 seconds elapsed   -> POWER OFF
      on input power restored    -> POWER-RESTORED HOLD-DOWN

  POWER OFF                 power to SSD: off
      on input power restored    -> POWER-RESTORED HOLD-DOWN

  POWER-RESTORED HOLD-DOWN  SATA port: block; power to SSD: on
      once battery recharged     -> RUNNING

The device must know when its battery has gone bad and stick itself in
``power-restored hold-down'' state. Knowing when the battery is bad may
require more states to test the battery, but this is the general idea.

I think it would be much cheaper to build an SSD with a supercap, and
simpler, because you can assume the supercap is good forever instead of
testing it. However, because of ``market forces'' the FOB approach might
sell for cheaper, because the FOB cannot be tied to the SSD and used as a
way to segment the market. If there are 2 companies making only FOBs and
not making SSDs, only then will competition work like people want it to.
Otherwise FOBs will be $1000 or something, because only ``enterprise''
users are smart/dumb enough to demand them.

Normally I would have a problem with the FOB and SSD being separable, but
see, the FOB and SSD can be put together with double-sided tape: the tape
only has to hold for 60 seconds after $event, and there's no way to
separate the two by tripping over a cord. You can safely move SSD+FOB
from one chassis to another without fearing all is lost if you jiggle the
connection. I think it's okay overall.

    tt> This risk is mostly mitigated by UPS backup and auto-shutdown
    tt> when the UPS detects power loss, correct?

no no it's about cutting off a class of failure cases and constraining
ourselves to relatively sane forms of failure. We are not haggling about
NO FAILURES EVAR yet.
First, for STEP 1, we isolate the insane kinds of failure that cost us
days or months of data rather than just a few seconds -- the kinds that
call for crazy unplannable ad-hoc recovery methods like ``Viktor plz help
me'' and ``is anyone here a Postgres data recovery expert?'' and ``is
there a way I can invalidate the batch of billing auth requests I
uploaded yesterday so I can rerun it without double-billing anyone?'' For
STEP 1 we make the insane failures almost impossible through clever
software and planning. A UPS never, never ever qualifies as ``almost
impossible''.

Then, once that's done, we come back for STEP 2, where we try to minimize
the sane failures also, and for STEP 2 things like a UPS might be useful.
For STEP 2 it makes sense to talk about percent availability, probability
of failure, length of time to recover from Scenario X. But in STEP 1, all
the failures are insane.
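To pin down the FOB behaviour, here is the same state machine as a table
one could actually test against (a sketch only: the event and state names
are invented labels for the diagram above, and the battery-test states
are omitted, as in the description):

    # The proposed Battery FOB as a state/event table.
    TRANSITIONS = {
        ("RUNNING",              "input power lost"):     "POWER-LOST HOLD-DOWN",
        ("POWER-LOST HOLD-DOWN", "60 seconds elapsed"):   "POWER OFF",
        ("POWER-LOST HOLD-DOWN", "input power restored"): "POWER-RESTORED HOLD-DOWN",
        ("POWER OFF",            "input power restored"): "POWER-RESTORED HOLD-DOWN",
        ("POWER-RESTORED HOLD-DOWN", "battery recharged"): "RUNNING",
    }

    OUTPUTS = {   # (SATA port, power to SSD) asserted in each state;
                  # the SATA state during POWER OFF is an assumption.
        "RUNNING":                  ("pass",  "on"),
        "POWER-LOST HOLD-DOWN":     ("block", "on"),
        "POWER OFF":                ("block", "off"),
        "POWER-RESTORED HOLD-DOWN": ("block", "on"),
    }

    state = "RUNNING"
    for event in ("input power lost", "60 seconds elapsed",
                  "input power restored", "battery recharged"):
        state = TRANSITIONS[(state, event)]
        print(event, "->", state, OUTPUTS[state])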