Re: Re[2]: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
Robert Milkowski writes:
> Hello Wee,
>
> Thursday, April 26, 2007, 4:21:00 PM, you wrote:
>
> WYT> On 4/26/07, cedric briner <[EMAIL PROTECTED]> wrote:
> >> Okay, let's say that it is not. :)
> >> Imagine that I set up a box:
> >> - with Solaris
> >> - with many HDs (directly attached)
> >> - using ZFS as the FS
> >> - exporting the data with NFS
> >> - on a UPS.
> >> Then, after reading
> >> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Complex_Storage_Considerations
> >> I wonder if there is a way to tell the OS to ignore the fsync flush
> >> commands, since they are likely to survive a power outage.
>
> WYT> Cedric,
> WYT> You do not want to ignore syncs from ZFS if your hard disk is
> WYT> directly attached to the server. As the document mentions, that
> WYT> is really for complex storage with NVRAM, where flushing is not
> WYT> necessary.
>
> What?? Setting zil_disable=1 has nothing to do with NVRAM in storage
> arrays. It disables the ZIL in ZFS, which means that if an application
> calls fsync() or opens a file with O_DSYNC, etc., then ZFS won't honor
> it (it returns immediately without committing to stable storage). Once
> the txg group closes, data will be written to the disks and SCSI write
> cache flush commands will be sent.
>
> Setting zil_disable to 1 is not that bad, actually; if someone doesn't
> mind losing the last N seconds of data in case of a server crash
> (ZFS itself will remain consistent), it can actually speed up NFS
> operations a lot.

...set zil_disable...speed up nfs... at the expense of a risk of
corruption of the NFS client's view. We must never forget this.
zil_disable is really not an option IMO.

-r

> --
> Best regards,
> Robert                          mailto:[EMAIL PROTECTED]
>                                 http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
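[For readers who want to experiment on a scratch box: on Solaris/Nevada
builds of that era, the tunable Robert describes was set via
/etc/system. A sketch, assumed syntax for builds where the variable
exists; do not use it on a pool whose NFS clients you care about:]

```
* /etc/system -- takes effect at the next boot.
* Disable the ZIL: fsync()/O_DSYNC return without committing to
* stable storage. The pool stays consistent, but the last N seconds
* of "committed" data can vanish in a crash.
set zfs:zil_disable = 1
```

[On a live system of that era the same kernel variable could reportedly
be flipped with `echo "zil_disable/W 1" | mdb -kw`; in later ZFS
releases the tunable was removed in favor of the per-dataset `sync`
property.]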
Re: Re[2]: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
Wee Yeh Tan writes:
> Robert,
>
> On 4/27/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:
> > Hello Wee,
> >
> > Thursday, April 26, 2007, 4:21:00 PM, you wrote:
> >
> > WYT> On 4/26/07, cedric briner <[EMAIL PROTECTED]> wrote:
> > >> Okay, let's say that it is not. :)
> > >> Imagine that I set up a box:
> > >> - with Solaris
> > >> - with many HDs (directly attached)
> > >> - using ZFS as the FS
> > >> - exporting the data with NFS
> > >> - on a UPS.
> > >> Then, after reading
> > >> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Complex_Storage_Considerations
> > >> I wonder if there is a way to tell the OS to ignore the fsync
> > >> flush commands, since they are likely to survive a power outage.
> >
> > WYT> Cedric,
> > WYT> You do not want to ignore syncs from ZFS if your hard disk is
> > WYT> directly attached to the server. As the document mentions, that
> > WYT> is really for complex storage with NVRAM, where flushing is not
> > WYT> necessary.
> >
> > What?? Setting zil_disable=1 has nothing to do with NVRAM in storage
> > arrays. It disables the ZIL in ZFS, which means that if an
> > application calls fsync() or opens a file with O_DSYNC, etc., then
> > ZFS won't honor it (it returns immediately without committing to
> > stable storage).
>
> Wait a minute. Are we talking about zil_disable or zfs_noflush (or
> zfs_nocacheflush)? The article quoted was about configuring the array
> to ignore flush commands, or the device-specific zfs_noflush, not
> zil_disable. I agree that zil_disable is okay from the FS view
> (correctness still depends on the application), but zfs_noflush is
> dangerous.

For me, both are dangerous.

zil_disable can cause immense pain to applications and NFS clients. I
don't see how anyone can recommend it without mentioning the risk of
application/NFS corruption.

zfs_nocacheflush is also unsafe: it opens a risk of pool corruption!
But if you have *all* of your pooled data on safe, NVRAM-protected
storage, and you don't find a way to tell the storage to ignore cache
flush requests, you might want to set the variable temporarily until
the SYNC_NV thing is sorted out.

Then make sure nobody imports the tunable elsewhere without fully
understanding it, and make sure no one creates a new pool on non-NVRAM
storage. Since those things are not under anyone's control, it's not a
good idea to spread this kind of recommendation.

> --
> Just me,
> Wire ...
> Blog: prstat.blogspot.com
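[For completeness, the second tunable contrasted above also lived in
/etc/system on builds where it existed (zfs_noflush being the earlier
name of zfs_nocacheflush). A sketch of the assumed syntax, safe ONLY
when every vdev of every imported pool sits on NVRAM-protected
storage:]

```
* /etc/system -- stop ZFS from issuing SCSI cache-flush commands.
* Unlike zil_disable, this endangers the pool itself: on plain disks
* with volatile write caches, a power loss can corrupt the pool.
set zfs:zfs_nocacheflush = 1
```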
Re: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
cedric briner writes:

> > You might set zil_disable to 1 (_then_ mount the fs to be shared).
> > But you're still exposed to OS crashes; those would still corrupt
> > your nfs clients.
>
> Just to better understand (I know that I'm quite slow :( ): when you
> say _nfs clients_, are you specifically talking about:
> - the nfs client programs themselves (lockd, statd), meaning that you
>   can get a stale NFS handle or other such things?
> - the host acting as an nfs client, meaning that the nfs client
>   service works, but the data that software uses on the NFS-mounted
>   disk would be corrupted?

It's rather the applications running on the client. Basically, an
application running on the client would suffer data loss without any
sign of error. It's a bit like having a disk that drops a write request
and does not signal an error.

> If I'm digging and digging into this ZIL, NFS, and UFS-with-write-cache
> business, it's because I do not understand which kinds of problems can
> occur. What I read in general are statements like "_corruption_ of the
> client's point of view", but what does that mean? Is this the scheme
> of what can happen:
> - the application on the nfs client side writes data to the nfs server
> - meanwhile the nfs server crashes, so:
>   - the data are not stored
>   - the application on the nfs client thinks the data are stored! :(
> - when the server is up again, the nfs client re-reads the data
> - the application on the nfs client side finds itself with data in the
>   state previous to its last writes.
> Am I right?

The scenario I see would be, on the client: download some software (a
tar file), then

	tar x
	make

The tar succeeds with no errors at all. Behind our back, during the
tar x, the server rebooted. No big deal normally. But with zil_disable
on the server, the make fails, either because some files from the
original tar are missing, or because parts of files are.

> So with the ZIL:
> - The application has the ability to do things in the right way. So
>   even after an nfs-server crash, the application on the nfs-client
>   side can rely on its own data.
> And without the ZIL:
> - The application does not have the ability to do things in the right
>   way, and we can have corruption of data. That doesn't mean
>   corruption of the FS; it means that the data were partially written
>   and some are missing.

Sounds right.

> > For the love of God do NOT do stuff like that. Just create ZFS on a
> > pile of disks the way that we should, with the write cache disabled
> > on all the disks and with redundancy in the zpool config .. nothing
> > special :
>
> Wh!! noo.. this is really special to me!! I've read and re-read many
> times:
> - NFS and ZFS, a fine combination
> - ZFS Best Practices Guide
> and other blogs without ever remarking such an idea! I even notice
> the opposite recommendation, from:
>
> - ZFS Best Practices Guide, ZFS Storage Pools Recommendations
>   http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_Storage_Pools_Recommendations
>   where I read:
>   - For production systems, consider using whole disks for storage
>     pools rather than slices, for the following reasons:
>     + Allows ZFS to enable the disk's write cache, for those disks
>       that have write caches.
>
> and from:
>
> - NFS and ZFS, a fine combination: Comparison with UFS
>   http://blogs.sun.com/roch/#zfs_to_ufs_performance_comparison
>   where I read:
>     Semantically correct NFS service:
>       nfs/ufs : 17 sec (write cache disable)
>       nfs/zfs : 12 sec (write cache disable, zil_disable=0)
>       nfs/zfs :  7 sec (write cache enable,  zil_disable=0)
>
> So I can say that nfs/zfs with write cache enabled and the ZIL enabled
> is -- in that case -- faster. So why are you recommending that I
> disable the write cache?

For ZFS, it can work either way. Maybe the above was a typo.

> --
> Cedric BRINER
> Geneva - Switzerland
[zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
Hello,

I wonder if the subject of this email is not self-explanatory? Okay,
let's say that it is not. :)

Imagine that I set up a box:
- with Solaris
- with many HDs (directly attached)
- using ZFS as the FS
- exporting the data with NFS
- on a UPS.

Then, after reading
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Complex_Storage_Considerations
I wonder if there is a way to tell the OS to ignore the fsync flush
commands, since the data are likely to survive a power outage.

Ced.

--
Cedric BRINER
Geneva - Switzerland
Re: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
On 4/26/07, cedric briner <[EMAIL PROTECTED]> wrote:
> Okay, let's say that it is not. :)
> Imagine that I set up a box:
> - with Solaris
> - with many HDs (directly attached)
> - using ZFS as the FS
> - exporting the data with NFS
> - on a UPS.
> Then, after reading
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Complex_Storage_Considerations
> I wonder if there is a way to tell the OS to ignore the fsync flush
> commands, since they are likely to survive a power outage.

Cedric,

You do not want to ignore syncs from ZFS if your hard disk is directly
attached to the server. As the document mentions, that is really for
complex storage with NVRAM, where flushing is not necessary.

--
Just me,
Wire ...
Blog: prstat.blogspot.com
Re: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
You might set zil_disable to 1 (_then_ mount the fs to be shared). But
you're still exposed to OS crashes; those would still corrupt your nfs
clients.

-r

cedric briner writes:
> Hello,
>
> I wonder if the subject of this email is not self-explanatory? Okay,
> let's say that it is not. :)
>
> Imagine that I set up a box:
> - with Solaris
> - with many HDs (directly attached)
> - using ZFS as the FS
> - exporting the data with NFS
> - on a UPS.
>
> Then, after reading
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Complex_Storage_Considerations
> I wonder if there is a way to tell the OS to ignore the fsync flush
> commands, since the data are likely to survive a power outage.
>
> Ced.
>
> --
> Cedric BRINER
> Geneva - Switzerland
Re: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
> > Okay, let's say that it is not. :)
> > Imagine that I set up a box:
> > - with Solaris
> > - with many HDs (directly attached)
> > - using ZFS as the FS
> > - exporting the data with NFS
> > - on a UPS.
> > Then, after reading
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Complex_Storage_Considerations
> > I wonder if there is a way to tell the OS to ignore the fsync flush
> > commands, since they are likely to survive a power outage.
>
> Cedric,
>
> You do not want to ignore syncs from ZFS if your hard disk is
> directly attached to the server. As the document mentions, that is
> really for complex storage with NVRAM, where flushing is not
> necessary.

This post follows `XServe Raid Complex Storage Considerations'
(http://www.opensolaris.org/jive/thread.jspa?threadID=29276&tstart=0),
where we made the assumption (*1) that, if the XServe RAID is connected
to a UPS, we can consider the RAM in the XServe RAID as if it were
NVRAM.

(*1) This assumption is even pointed out by Roch
(http://blogs.sun.com/roch/#zfs_to_ufs_performance_comparison, under
"Intelligent Storage") via `Shenanigans with ZFS flushing and
intelligent arrays...' (http://blogs.digitar.com/jjww/?itemid=44):
"Tell your array to ignore ZFS' flush commands".

So in this way, when we export it with NFS, we get a boost in
bandwidth. Okay; then is there any difference that I am not catching
between:
- the `Shenanigans with ZFS flushing and intelligent arrays...' setup,
- and my situation?

I mean, I want a cheap and reliable NFS service. Why should I buy
expensive `Complex Storage with NVRAM' and not just a machine with
8 IDE HDs?

Ced.

--
Cedric BRINER
Geneva - Switzerland
Re: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
cedric briner wrote:
> > You might set zil_disable to 1 (_then_ mount the fs to be shared).
> > But you're still exposed to OS crashes; those would still corrupt
> > your nfs clients.
> >
> > -r
>
> hello Roch,
>
> I have a few questions.
>
> 1) From `Shenanigans with ZFS flushing and intelligent arrays...'
>    (http://blogs.digitar.com/jjww/?itemid=44) I read:
>
>      Disable the ZIL. The ZIL is the way ZFS maintains _consistency_
>      until it can get the blocks written to their final place on the
>      disk.

This is wrong. The on-disk format is always consistent. The author of
this blog is misinformed and is probably confusing the ZIL with
traditional journalling.

>      That's why the ZIL flushes the cache.

The ZIL flushes its blocks to ensure that, if a power failure or panic
occurs, the data the system has guaranteed to be on stable storage
(due, say, to fsync or O_DSYNC) actually is on stable storage.

>      If you don't have the ZIL and a power outage occurs, your blocks
>      may go poof in your server's RAM... 'cause they never made it to
>      the disk, Kemosabe.

True, but not blocks; rather system-call transactions, as that is what
the ZIL handles.

>    From Eric Kustarz's weblog
>    (http://blogs.sun.com/erickustarz/entry/zil_disable) I read:
>
>      Note: disabling the ZIL does _NOT_ compromise filesystem
>      integrity. Disabling the ZIL does NOT cause corruption in ZFS.
>
>    So I don't understand: one says that we can lose _consistency_,
>    and the other says that filesystem integrity is not compromised.
>    Which one is right?

Eric's, who works on ZFS!

> 2) From Eric Kustarz's weblog
>    (http://blogs.sun.com/erickustarz/entry/zil_disable) I read:
>
>      Disabling the ZIL is definitely frowned upon and can cause your
>      applications much confusion. Disabling the ZIL can cause
>      corruption for NFS clients in the case where a reply to the
>      client is done before the server crashes, and the server crashes
>      before the data is committed to stable storage. If you can't
>      live with this, then don't turn off the ZIL.
>
>    The service we export over NFS from ZFS is not something like a
>    database or a really stressful system; we are just exporting home
>    directories. So it feels to me that we can just disable this ZIL.
>
> 3) From `NFS and ZFS, a fine combination'
>    (http://blogs.sun.com/roch/#zfs_to_ufs_performance_comparison)
>    I read:
>
>      NFS service with risk of corruption of client's side view:
>        nfs/ufs :   7 sec (write cache enable)
>        nfs/zfs : 4.2 sec (write cache enable,  zil_disable=1)
>        nfs/zfs : 4.7 sec (write cache disable, zil_disable=1)
>      Semantically correct NFS service:
>        nfs/ufs :  17 sec (write cache disable)
>        nfs/zfs :  12 sec (write cache disable, zil_disable=0)
>        nfs/zfs :   7 sec (write cache enable,  zil_disable=0)
>
>    Does this mean that when you just create a UFS FS and export it
>    with NFS, you are running a semantically incorrect NFS service,
>    and that you have to disable the write cache to have a correct NFS
>    server???

Yes. UFS requires the write cache to be disabled to maintain
consistency.

> 4) So can we say that people who are used to an NFS service with risk
>    of corruption of the client's side view can just take ZFS and
>    disable the ZIL?

I suppose, but we aim to strive for better than expected corruption.
We (ZFS) recommend not disabling the ZIL. We also recommend not
disabling disk write cache flushing, unless the caches are backed by
NVRAM or a UPS.

> Thanks in advance for your clarifications.
>
> Ced.
>
> P.-S. Do some of you know the best way to send an email containing
> many questions? Should I create a thread for each of them next time?

This works. Good questions.

Neil.
Re[2]: [zfs-discuss] HowTo: UPS + ZFS NFS + no fsync
Hello Wee,

Thursday, April 26, 2007, 4:21:00 PM, you wrote:

WYT> On 4/26/07, cedric briner <[EMAIL PROTECTED]> wrote:
>> Okay, let's say that it is not. :)
>> Imagine that I set up a box:
>> - with Solaris
>> - with many HDs (directly attached)
>> - using ZFS as the FS
>> - exporting the data with NFS
>> - on a UPS.
>> Then, after reading
>> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Complex_Storage_Considerations
>> I wonder if there is a way to tell the OS to ignore the fsync flush
>> commands, since they are likely to survive a power outage.

WYT> Cedric,
WYT> You do not want to ignore syncs from ZFS if your hard disk is
WYT> directly attached to the server. As the document mentions, that
WYT> is really for complex storage with NVRAM, where flushing is not
WYT> necessary.

What?? Setting zil_disable=1 has nothing to do with NVRAM in storage
arrays. It disables the ZIL in ZFS, which means that if an application
calls fsync() or opens a file with O_DSYNC, etc., then ZFS won't honor
it (it returns immediately without committing to stable storage). Once
the txg group closes, data will be written to the disks and SCSI write
cache flush commands will be sent.

Setting zil_disable to 1 is not that bad, actually; if someone doesn't
mind losing the last N seconds of data in case of a server crash (ZFS
itself will remain consistent), it can actually speed up NFS operations
a lot.

btw: people accustomed to Linux have, in a way, always had zil_disable
set to 1... :)

--
Best regards,
Robert                          mailto:[EMAIL PROTECTED]
                                http://milek.blogspot.com