Re: [zfs-discuss] memory hog
So does that mean ZFS is not for consumer computers? If ZFS requires 4GB of RAM for operation, does that mean I will need 8GB+ of RAM if I were to use Photoshop or any other memory-intensive application? And it seems ZFS memory usage scales with the amount of HDD space? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
Edward wrote: So does that mean ZFS is not for consumer computer? Not at all. Consumer computers are plenty powerful enough to use ZFS with. If ZFS require 4GB of Ram for operation, that means i will need 8GB+ Ram if i were to use Photoshop or any other memory intensive application? ZFS doesn't require 4Gb of ram. That's merely a recommendation of the amount you might want installed in your system - a subtle difference :-) And it seems ZFS memory usage scales with the amount of HDD space? I'm not quite sure how to address this, could you re-phrase your question please? You might find this wiki page useful http://www.solarisinternals.com/wiki/index.php/ZFS_Configuration_Guide, along with the others that it points to. James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
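A quick way to see how much memory the ARC is actually using on a running system, rather than guessing from the 4GB guideline, is to look at the arcstats kstat. On Solaris Express / OpenSolaris something along these lines should work; both report the current ARC size (the exact figures will of course vary per box):

  # kstat -p zfs:0:arcstats:size
  # echo "::arc" | mdb -k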
Re: [zfs-discuss] memory hog
Edward wrote: So does that mean ZFS is not for consumer computer? If ZFS require 4GB of Ram for operation, that means i will need 8GB+ Ram if i were to use Photoshop or any other memory intensive application? No. It works fine on desktops - I'm writing this on an older Athlon64 with 1GB. Memory pressure does seem to become a bit more of an issue when I'm doing more I/O on the box (which, I'm assuming, is due to the various caches), so for things like compiling, I feel a little cramped. Personally, (in my experience only), I'd say that ZFS works well for use on the desktop, ASSUMING you dedicate 1GB of RAM to solely the OS (and ZFS). For very heavy I/O work, I think at least 2GB is a better idea. So, size your total memory accordingly. And it seems ZFS memory usage scales with the amount of HDD space? I think the more proper thing to say is that ZFS memory usage is relative to the amount of I/O you are doing. Very heavy I/O uses much more RAM. It is not per se connected to total size of the pool. That is, if I've got several TB of disk in a zpool, but I'm doing only 10 op/sec, it will consume much less RAM than if I have a 100GB zpool, but I'm trying to do 1000 ops/sec. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
Erik Trimble wrote: Edward wrote: So does that mean ZFS is not for consumer computer? If ZFS require 4GB of Ram for operation, that means i will need 8GB+ Ram if i were to use Photoshop or any other memory intensive application? No. It works fine on desktops - I'm writing this on an older Athlon64 with 1GB. Memory pressure does seem to become a bit more of an issue when I'm doing more I/O on the box (which, I'm assuming, is due to the various caches), so for things like compiling, I feel a little cramped. Personally, (in my experience only), I'd say that ZFS works well for use on the desktop, ASSUMING you dedicate 1GB of RAM to solely the OS (and ZFS). For very heavy I/O work, I think at least 2GB is a better idea. So, size your total memory accordingly. I've got a Dell Dimension 8400 w/ 2.5gb ram and p4 3.2Ghz processor; I haven't noticed any slow downs either. Memory is so cheap, adding an extra 2gb is only around NZ$100 these days anyway. Matthew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
On Monday 23 June 2008 09:39:13 Kaiwai Gardiner wrote: Erik Trimble wrote: Edward wrote: So does that mean ZFS is not for consumer computer? If ZFS require 4GB of Ram for operation, that means i will need 8GB+ Ram if i were to use Photoshop or any other memory intensive application? No. It works fine on desktops - I'm writing this on an older Athlon64 with 1GB. Memory pressure does seem to become a bit more of an issue when I'm doing more I/O on the box (which, I'm assuming, is due to the various caches), so for things like compiling, I feel a little cramped. Personally, (in my experience only), I'd say that ZFS works well for use on the desktop, ASSUMING you dedicate 1GB of RAM to solely the OS (and ZFS). For very heavy I/O work, I think at least 2GB is a better idea. So, size your total memory accordingly. I've got a Dell Dimension 8400 w/ 2.5gb ram and p4 3.2Ghz processor; I haven't noticed any slow downs either. Memory is so cheap, adding an extra 2gb is only around NZ$100 these days anyway. Matthew this is the kind of reasoning that hides problems rather than correcting them. Sooner or later problems will show up in other - maybe worse - forms ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] swap dump on ZFS volume
Hi folks, I am a member of the Solaris Install team and I am currently working on making the Slim installer compliant with the ZFS boot design specification: http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/ After the ZFS boot project was integrated into Nevada and support for installation on ZFS root was delivered into the legacy installer, some differences emerged between how the Slim installer implements ZFS root and how it is done in the legacy installer. One part we need to change in the Slim installer is to create swap and dump on ZFS volumes instead of utilizing a UFS slice for this, as defined in the design spec and implemented in the SXCE installer. When reading through the specification and looking at the SXCE installer source code, I realized some points are not quite clear to me. Could I please ask you to help me clarify them in order to follow the right way as far as implementation of those features is concerned? Thank you very much, Jan

[i] Formula for calculating swap and dump sizes
I have gone through the specification and found that the following formula should be used for calculating the default sizes of swap and dump during installation:
 o size of dump: 1/4 of physical memory
 o size of swap: max of (512MiB, 1% of rpool size)
However, looking at the source code, the SXCE installer calculates default sizes using a slightly different algorithm:
 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
Are there any preferences as to which one should be used, or is there any other possibility we might take into account?

[ii] Procedure for creating swap and dump
Looking at the SXCE source code, I have discovered that the following commands should be used for creating swap and dump:
 o swap
 # /usr/sbin/zfs create -b PAGESIZE -V <size_in_mb>m rpool/swap
 # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap
 o dump
 # /usr/sbin/zfs create -b 128*1024 -V <size_in_mb>m rpool/dump
 # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump
Could you please let me know if my observations are correct or if I should use a different approach? As far as setting of the volume block size is concerned (-b option), how are those numbers to be determined? Will they be the same in different scenarios, or are there plans to tune them in some way in the future?

[iii] Is there anything else I should be aware of?
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
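For anyone who wants to see what the SXCE default works out to on a particular machine, here is a rough ksh/bash sketch of that MAX/MIN formula. The prtconf parsing assumes the usual "Memory size: N Megabytes" output line and is only an illustration, not code from either installer (the leading # inside the fragment marks comments, not a root prompt):

  # sizes below are in MiB; physical memory is read from prtconf
  phys_mem_mb=$(prtconf 2>/dev/null | awk '/Memory size/ {print $3}')
  size=$((phys_mem_mb / 2))                  # half of physical memory
  [ "$size" -gt 32768 ] && size=32768        # cap at 32 GiB
  [ "$size" -lt 512 ] && size=512            # floor of 512 MiB
  size_of_swap=$size
  size_of_dump=$size
  echo "default swap/dump size: ${size} MiB"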
Re: [zfs-discuss] raid card vs zfs
I agree with the other comments. From day 1, ZFS has been fine-tuned for JBODs. While RAID cards are welcome, ZFS will perform better with JBODs. Most RAID cards have limited power and bandwidth to support the platter speeds of the newer drives, and the ZFS code seems to be more intelligent about caching. A few days ago a customer tested a Sun Fire X4500 connected to a network with 4 x 1 Gbit ethernets. The X4500 has modest CPU power and does not use any RAID card. The unit easily performed 400 MB/sec on writes in LAN tests, which was clearly limited by the ethernet ports. Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email [EMAIL PROTECTED]

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bob Friesenhahn Sent: Monday, June 23, 2008 5:33 AM To: kevin williams Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] raid card vs zfs

On Sun, 22 Jun 2008, kevin williams wrote: The article says that ZFS eliminates the need for a RAID card and is faster because the striping is running on the main cpu rather than an old chipset on a card. My question is, is this true? Can I

Ditto what the other guys said. Since ZFS may generate more I/O traffic from the CPU, you will want an adaptor with lots of I/O ports. SATA/SAS with a port per drive is ideal. It is useful to have a NVRAM cache on the card if you will be serving NFS or running a database, although some vendors sell this NVRAM cache as a card which plugs into the backplane and uses a special driver. ZFS is memory-hungry so 4GB of RAM is a good starting point for a server. Make sure that your CPU and OS are able to run a 64-bit kernel. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
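To make the JBOD point concrete, this is roughly the kind of setup being described: the controller just presents plain disks and ZFS provides the redundancy itself. The device names below are placeholders for whatever format(1M) shows on your system, not devices from this thread:

  # zpool create tank raidz c1t1d0 c1t2d0 c1t3d0 c1t4d0
  or, for striped mirrors instead of raidz:
  # zpool create tank mirror c1t1d0 c1t2d0 mirror c1t3d0 c1t4d0
  # zpool status tank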
Re: [zfs-discuss] memory hog
No. ZFS loves memory and, unlike most other filesystems around, it can make good use of it. But ZFS will free memory if it recognizes that other apps require memory, or you can limit the cache the ARC will be using. In my experience ZFS still performs nicely on 1 GB boxes. PS: How much does 4 GB of RAM cost for a desktop? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Edward Sent: Monday, June 23, 2008 9:32 AM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] memory hog So does that mean ZFS is not for consumer computer? If ZFS require 4GB of Ram for operation, that means i will need 8GB+ Ram if i were to use Photoshop or any other memory intensive application? And it seems ZFS memory usage scales with the amount of HDD space? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
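For anyone who does want to cap the ARC rather than rely on it shrinking under memory pressure, the usual knob on Solaris/OpenSolaris is zfs_arc_max in /etc/system, which takes effect at the next reboot. The 1 GB value here is only an example:

  set zfs:zfs_arc_max = 0x40000000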
Re: [zfs-discuss] [SOLVED] Confusion with snapshot send-receive
James C. McPherson wrote: Andrius wrote: Boyd Adamson wrote: Andrius [EMAIL PROTECTED] writes: Hi, there is a small confusion with send receive. zfs andrius/sounds was snapshoted @421 and should be copied to new zpool beta that on external USB disk. After /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv beta or usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv beta/sounds answer come ssh: host1: node name or service name not known What has been done bad? Your machine cannot resolve the name host1 into an IP address. This is a network configuration problem, not a zfs problem. You should find that ssh host1 fails too. Second pool is in the same machine. What to write instead of host2? try /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | /usr/sbin/zfs recv beta/sounds You only need to pipe the zfs send output through ssh if you're actually sending it to a different system. James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog Thanks, it works. Just strange that your simplified example is not the one shown in the ZFS Administration Guide. Somebody wanted to complicate things. -- Regards, Andrius Burlega ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
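For reference, the two working forms discussed above look like this. The dataset and snapshot names are the ones from the thread (andrius/sounds, snapshotted as @421); host1 stands for whatever remote system you actually have:

  Local copy, both pools on the same machine:
  # /usr/sbin/zfs send andrius/sounds@421 | /usr/sbin/zfs recv beta/sounds

  Copy to a pool on a different machine, piped through ssh:
  # /usr/sbin/zfs send andrius/sounds@421 | ssh host1 /usr/sbin/zfs recv beta/sounds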
Re: [zfs-discuss] zfs mirror broken?
I am running zfs 3 on SunOS zen 5.10 Generic_118855-33 i86pc i386 i86pc

What is baffling is that the disk did come online and appear as healthy, but zpool showed the fs inconsistency. As Miles said, after the disk came back the resilver did not resume. The only additions i have to the sequence shown are:
1) i am absolutely sure there were no disk writes in the interim since the non-global zones which use these fses were halted during the operation
2) The first time i unplugged the disk, i upgraded to a larger disk so i still have that original disk intact
3) i was afraid that zfs might resilver backwards, ie from the 22% image back to the original copy. I therefore pulled the new disk out again.

Current status:

# zpool status
  pool: external
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Sat Jun 21 07:42:03 2008
config:

        NAME          STATE     READ   WRITE  CKSUM
        external      ONLINE    26.57    114      0
          c12t0d0p0   ONLINE        4    114      0
          mirror      ONLINE    26.57      0      0
            c13t0d0p0 ONLINE    55.25  4.48K      0
            c16t0d0p0 ONLINE        0      0  53.14

Can i be sure that the unrecoverable error found is on the failed mirror? I was thinking of the following ways forward. Any comments most welcome:
1) run a scrub. I am thinking that kicking this off might actually corrupt data in the second vdev, so maybe starting off with 2 might be better idea...
2) physically replace disk1 with ORIGINAL disk2 and attempt a scrub

justin

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Miles Nordin
Sent: 21 June 2008 02:46
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zfs mirror broken?

 jb == Jeff Bonwick [EMAIL PROTECTED] writes:

    jb If you say 'zpool online pool disk' that should tell ZFS
    jb that the disk is healthy again and automatically kick off a
    jb resilver.
    jb Of course, that should have happened automatically.

with b71 I find that it does sometimes happen automatically, but the resilver isn't enough to avoid checksum errors later. Only a manually-requested scrub will stop any more checksum errors from accumulating. Also, if I reboot before one of these auto-resilvers finishes, or plug in the component that flapped while powered down, the auto-resilver never resumes. While one vdev was resilvering at 22% (HD replacement), the original disk went away so if I understand you, it happened like this:

        #1         #2
        online     online
   t    online     UNPLUG
   i    online     UNPLUG    -- filesystem writes
   m    online     UNPLUG    -- filesystem writes
   e    online     online
   |    online     resilver
   -    online
   v    UNPLUGxxx  online    -- fs reads allowed? how?
        online     online    -- why no resilvering?

It seems to me like DTRT after #1 is unplugged is to take the whole pool UNAVAIL until the original disk #1 comes back. When the original disk #1 drops off, the only available component left is the #2 component that flapped earlier and is being resilvered, so #2 is out-of-date and should be ignored. but I'm pretty sure ZFS doesn't work that way, right? What does it do? Will it serve incorrect, old data? Will it somehow return I/O errors for data that has changed on #1 and not been resilvered onto #2 yet?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
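For what it's worth, Justin's option 1 is normally just a scrub followed by inspecting and then clearing the error counters; a scrub reads every block and repairs what it can from the good side of a mirror rather than blindly overwriting anything. A minimal sketch, using the pool name from the output above:

  # zpool scrub external
  # zpool status -v external
  (wait for the scrub to finish and check whether any files are reported damaged)
  # zpool clear external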
Re: [zfs-discuss] raid card vs zfs
On 6/23/08 6:22 AM, Mertol Ozyoney [EMAIL PROTECTED] wrote: A few days a ago a customer tested a Sunfire X4500 connected to a network with 4 x 1 Gbit ethernets. X4500 have modest CPU power and do not use any Raid card. The unit easly performaed 400 MB/sec on write from LAN tests which clearly limited by the ethernet ports. Mertol This is what we are seeing with our X4500. Clearly, the four Ethernet channels are our limiting factor. We put 10Gbps Ethernet on the unit, but as this is currently the only 10-gig host on our network (waiting for Vmware drivers to support the X6250 cards we bought), I can't really test that fully. We're using this as a NFS/Samba server, so JBOD with ZFS is fast enough. I'm waiting for COMSTAR and ADM to really take advantage of the Thumper platform. The complete storage stack that Sun and the OpenSolaris project have envisioned will make such commodity hardware useful pieces of our solution. I love our EMC/Brocade/HP SAN gear, but it's just too expensive to scale (particularly when it comes to total data management). Charles ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
On 6/23/08 6:24 AM, Mertol Ozyoney [EMAIL PROTECTED] wrote: No, ZFS loves memory and unlike most other FS's around it can make good use of memory. But ZFS will free memory if it recognizes that other apps require memory or you can limit the cache ARC will be using. This is an important distinction. There are many examples of software which does not utilize the resources we make available. I'm happy with code that takes advantage of these additional resources to improve performance. Otherwise, it becomes difficult to make cost/benefit decisions. I need more performance. It's worth $x to get that. To my experiance ZFS still performs nicely on 1 GB boxes. This is probably fine for the typical consumer usage pattern. PS: How much 4 GB Ram costs for a desktop ? I just bought 2GB DIMMs for $40. IIRC, they were Kingston, so not a no-name brand. Charles ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] swap dump on ZFS volume
Hi Jan, comments below... jan damborsky wrote: Hi folks, I am member of Solaris Install team and I am currently working on making Slim installer compliant with ZFS boot design specification: http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/ After ZFS boot project was integrated into Nevada and support for installation on ZFS root delivered into legacy installer, some differences occurred between how Slim installer implements ZFS root and how it is done in legacy installer. One part is that we need to change in Slim installer is to create swap dump on ZFS volume instead of utilizing UFS slice for this as defined in design spec and implemented in SXCE installer. When reading through the specification and looking at SXCE installer source code, I have realized some points are not quite clear to me. Could I please ask you to help me clarify them in order to follow the right way as far as implementation of that features is concerned ? Thank you very much, Jan [i] Formula for calculating dump swap size I have gone through the specification and found that following formula should be used for calculating default size of swap dump during installation: o size of dump: 1/4 of physical memory This is a non-starter for systems with 1-4 TBytes of physical memory. There must be a reasonable maximum cap, most likely based on the size of the pool, given that we regularly boot large systems from modest-sized disks. o size of swap: max of (512MiB, 1% of rpool size) However, looking at the source code, SXCE installer calculates default sizes using slightly different algorithm: size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB)) Are there any preferences which one should be used or is there any other possibility we might take into account ? zero would make me happy :-) But there are some cases where swap space is preferred. Again, there needs to be a reasonable cap. In general, the larger the system, the less use for swap during normal operations, so for most cases there is no need for really large swap volumes. These can also be adjusted later, so the default can be modest. One day perhaps it will be fully self-adjusting like it is with other UNIX[-like] implementations. [ii] Procedure of creating dump swap -- Looking at the SXCE source code, I have discovered that following commands should be used for creating swap dump: o swap # /usr/sbin/zfs create -b PAGESIZE -V size_in_mbm rpool/swap # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap o dump # /usr/sbin/zfs create -b 128*1024 -V size_in_mbm rpool/dump # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump Could you please let me know, if my observations are correct or if I should use different approach ? As far as setting of volume block size is concerned (-b option), how that numbers are to be determined ? Will they be the same in different scenarios or are there plans to tune them in some way in future ? Setting the swap blocksize to pagesize is interesting, but should be ok for most cases. The reason I say it is interesting is because it is optimized for small systems, but not for larger systems which typically see more use of large page sizes. OTOH larger systems should not swap, so it is probably a non-issue for them. Small systems should see this as the best solution. Dump just sets the blocksize to the default, so it is a no-op. -- richard [iii] Is there anything else I should be aware of ? --- Installation should *not* fail due to running out of space because of large dump or swap allocations. 
I think the algorithm should first take into account the space available in the pool after accounting for the OS. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
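Richard's point that swap and dump "can also be adjusted later" looks roughly like this in practice. The sizes are arbitrary examples, and the swap device has to be taken out of use before its zvol is resized:

  # /usr/sbin/swap -d /dev/zvol/dsk/rpool/swap
  # /usr/sbin/zfs set volsize=4G rpool/swap
  # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

  # /usr/sbin/zfs set volsize=2G rpool/dump
  # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump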
[zfs-discuss] Oracle and ZFS
Hi All; One of our customers suffered from a FS being corrupted after an unattended shutdown due to a power problem. They want to switch to ZFS. From what I have read, ZFS will most probably not be corrupted by the same event. But I am not sure how Oracle will be affected by a sudden power outage when placed over ZFS? Any comments? PS: I am aware of UPS's and similar technologies but the customer is still asking those "if ..." questions ... Mertol Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email [EMAIL PROTECTED] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Oracle and ZFS
From my usage, the first question you should ask your customer is how much of a performance hit they can spare when switching to ZFS for Oracle. I've done lots of tweaking (following threads I've read on the mailing list), but I still can't seem to get enough performance out of any databases on ZFS. I've tried using zvols, cooked files on top of ZFS filesystems, everything, but either raw disk devices via the old style DiskSuite tools or cooked files on top of the same are far more performant than anything on ZFS. Your mileage may vary, but so far, that's where I stand. As for the corrupted filesystem, ZFS is much better, but there are still no guarantees that your filesystem won't be corrupted during a hard shutdown. The CoW and checksumming gives you a much lower incidence of corruption, but the customer still needs to be made aware that things like battery backed controllers, managed UPS, redundant power supplies, and the like are the first thing they need to put into place - not the last. On Mon, Jun 23, 2008 at 11:56 AM, Mertol Ozyoney [EMAIL PROTECTED] wrote: Hi All ; One of our customer is suffered from FS being corrupted after an unattanded shutdonw due to power problem. They want to switch to ZFS. From what I read on, ZFS will most probably not be corrupted from the same event. But I am not sure how will Oracle be affected from a sudden power outage when placed over ZFS ? Any comments ? PS: I am aware of UPS's and smilar technologies but customer is still asking those if ... questions ... Mertol *Mertol Ozyoney * Storage Practice - Sales Manager *Sun Microsystems, TR* Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email [EMAIL PROTECTED] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- chris -at- microcozm -dot- net === Si Hoc Legere Scis Nimium Eruditionis Habes ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
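One tuning step that comes up repeatedly in those threads (and again later in this digest) is matching the ZFS recordsize to the database block size before the datafiles are created. A minimal sketch; the dataset name and the 8 KB figure (a common Oracle db_block_size) are assumptions, not details from this post:

  # zfs create tank/oradata
  # zfs set recordsize=8k tank/oradata
  (recordsize only affects files created after the property is set, so do this
   before creating or restoring the datafiles)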
Re: [zfs-discuss] memory hog
Yes, you are all correct. RAM costs next to nothing today, even though prices might be bouncing back to their normal margin. DDR2 RAM is relatively cheap, not to mention DDR3 will bring us double or more memory capacity. Most people could afford 4GB of RAM on their desktop today, with 8GB for prosumers. At today's prices I reckon ALL systems, even entry level, should have 2GB of RAM standard. But the sad thing is Windows XP / Vista is still 32-bit. It doesn't recognize more than 3.x GB of RAM. The 64-bit version is still premature and hardly any OEMs are adopting it. Hardware makers have yet to fully jump on board with 64-bit drivers. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume
Richard Elling wrote: Hi Jan, comments below... jan damborsky wrote: Hi folks, I am member of Solaris Install team and I am currently working on making Slim installer compliant with ZFS boot design specification: http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/ After ZFS boot project was integrated into Nevada and support for installation on ZFS root delivered into legacy installer, some differences occurred between how Slim installer implements ZFS root and how it is done in legacy installer. One part is that we need to change in Slim installer is to create swap dump on ZFS volume instead of utilizing UFS slice for this as defined in design spec and implemented in SXCE installer. When reading through the specification and looking at SXCE installer source code, I have realized some points are not quite clear to me. Could I please ask you to help me clarify them in order to follow the right way as far as implementation of that features is concerned ? Thank you very much, Jan [i] Formula for calculating dump swap size I have gone through the specification and found that following formula should be used for calculating default size of swap dump during installation: o size of dump: 1/4 of physical memory This is a non-starter for systems with 1-4 TBytes of physical memory. There must be a reasonable maximum cap, most likely based on the size of the pool, given that we regularly boot large systems from modest-sized disks. Actually, starting with build 90, the legacy installer sets the default size of the swap and dump zvols to half the size of physical memory, but no more then 32 GB and no less than 512 MB. Those are just the defaults. Administrators can use the zfs command to modify the volsize property of both the swap and dump zvols (to any value, including values larger than 32 GB). o size of swap: max of (512MiB, 1% of rpool size) However, looking at the source code, SXCE installer calculates default sizes using slightly different algorithm: size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB)) Are there any preferences which one should be used or is there any other possibility we might take into account ? zero would make me happy :-) But there are some cases where swap space is preferred. Again, there needs to be a reasonable cap. In general, the larger the system, the less use for swap during normal operations, so for most cases there is no need for really large swap volumes. These can also be adjusted later, so the default can be modest. One day perhaps it will be fully self-adjusting like it is with other UNIX[-like] implementations. [ii] Procedure of creating dump swap -- Looking at the SXCE source code, I have discovered that following commands should be used for creating swap dump: o swap # /usr/sbin/zfs create -b PAGESIZE -V size_in_mbm rpool/swap # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap o dump # /usr/sbin/zfs create -b 128*1024 -V size_in_mbm rpool/dump # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump The above commands for creating the swap and dump zvols match what the legacy installer does, as of build 90. Could you please let me know, if my observations are correct or if I should use different approach ? As far as setting of volume block size is concerned (-b option), how that numbers are to be determined ? Will they be the same in different scenarios or are there plans to tune them in some way in future ? There are no plans to tune this. The block sizes are appropriate for the way the zvols are to be used. 
Setting the swap blocksize to pagesize is interesting, but should be ok for most cases. The reason I say it is interesting is because it is optimized for small systems, but not for larger systems which typically see more use of large page sizes. OTOH larger systems should not swap, so it is probably a non-issue for them. Small systems should see this as the best solution. Dump just sets the blocksize to the default, so it is a no-op. -- richard [iii] Is there anything else I should be aware of ? --- Installation should *not* fail due to running out of space because of large dump or swap allocations. I think the algorithm should first take into account the space available in the pool after accounting for the OS. The Caiman team can make their own decision here, but we decided to be more hard-nosed about disk space requirements in the legacy install. If the pool is too small to accommodate the recommended swap and dump zvols, then maybe this system isn't a good candidate for a zfs root pool. Basically, we decided that since you almost can't buy disks smaller than 60 GB these days, it's not worth much effort to facilitate the setup of zfs root pools on disks that are smaller than
Re: [zfs-discuss] memory hog
On Mon, Jun 23, 2008 at 11:18 AM, Edward [EMAIL PROTECTED] wrote: Yes you are all correct. Ram cost nothing today, even though it might be bouncing back to their normal margin. DDR2 Ram are relatively cheap. Not to mention DDR3 will bring us double or more memory capacity. Not likely. Their *normal margins* were because of their collusion. The anti-trust lawsuit, and subsequent multi-billion dollar settlement assured we won't be seeing that again anytime soon. Most people could afford 4GB Ram on their Desktop today. With 8GB Ram for Prosumers. At todays price i reckon ALL systems, even entry level should have 2GB Ram Standard. And most vista systems do. OEM's slowly learned their lesson. But the sad thing is Windows XP / Vista is still 32Bit. It doesn't recognize more then 3.x GB of Ram. 64Bit version is still premature and hardly OEM are adopting it. Hardware makers have yet to full jump on broad for 64 bit drivers. false, both of them recognize well in excess of 4GB of ram. What they CAN'T do is address it for *ONE* process. That's why applications like oracle were quick to hop on the 64bit bandwagon, they actually need it. I don't know of too many consumer level apps besides maybe photoshop (and firefox ;) ) that come anywhere near 4GB ram usage. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Oracle and ZFS
Mertol Ozyoney wrote: Hi All ; One of our customer is suffered from FS being corrupted after an unattanded shutdonw due to power problem. They want to switch to ZFS. From what I read on, ZFS will most probably not be corrupted from the same event. But I am not sure how will Oracle be affected from a sudden power outage when placed over ZFS ? Any comments ? Most databases have the ability to recover from unscheduled interruptions without causing corruption. ZFS works in the same way -- you will recover to a stable point in time. In-flight transactions will not be completed, as expected. Upon restart, ZFS recovery will happen first, followed by the database recovery. PS: I am aware of UPS’s and smilar technologies but customer is still asking those if ... questions ... UPS's fail, too. When we design highly available services, we will expect that unscheduled interruptions will occur -- that is the only way to design effective solutions. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root finally here in SNV90
Mike Gerdts wrote: On Wed, Jun 4, 2008 at 11:18 PM, Rich Teer [EMAIL PROTECTED] wrote: Why would one do that? Just keep an eye on the root pool and all is good. The only good argument I have for separating out some of /var is for boot environment management. I grew tired of repeating my arguments and suggestions and wrote a blog entry. http://mgerdts.blogspot.com/2008/03/future-of-opensolaris-boot-environment.html Sorry it's taken me so long to weigh in on this. The reason that the install program permits the user to set up a separate /var dataset is because some production environments require it. More exactly, some production environments require that /var have its own slice so that unrestrained growth in /var can't fill up the root file system. (I have no idea whether this is actually a good or sufficient policy. I just know that some customer environments have such a policy.) With zfs, we don't actually have to put /var in its own slice. We can achieve the same goal by putting it in its own dataset and assigning a quota to that dataset. That's really the only reason we offered this option. Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
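A sketch of what Lori describes, using a made-up boot environment dataset name and quota value (the installer creates the /var dataset; the administrator only sets the quota):

  # zfs set quota=8G rpool/ROOT/snv_90/var
  # zfs get quota,used rpool/ROOT/snv_90/var

With the quota in place, runaway growth in /var fills its own dataset rather than the root filesystem, which is the policy goal described above.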
Re: [zfs-discuss] Oracle and ZFS
mo == Mertol Ozyoney [EMAIL PROTECTED] writes: mo One of our customer is suffered from FS being corrupted after mo an unattanded shutdonw due to power problem. mo They want to switch to ZFS. mo From what I read on, ZFS will most probably not be corrupted mo from the same event. It's not supposed to happen with UFS, either. nor XFS, JFS, ext3, reiserfs, FFS+softdep, plain FFS, mac-HFS+journal. All filesystems in popular use for many years except maybe NTFS are supposed to obey fsync and survive kernel crashes and unplanned power outage that happens after fsync returns, without losing any data written before fsync was called. The fact that they don't in practice is a warning that ZFS might not, either, no matter what it promises in theory. I think many cheap PeeCee RAID setups without batteries suffer from ``the RAID5 write hole'' which takes away all the guarantees of no-power-fail-corruption that the filesystems made, and these broken no-battery setups seem to be really popular. If one used ZFS on top of such a no-battery RAID instead of switching it to JBOD mode, ZFS would be vulnerable, too. One interesting part of ZFS's ``in theory'' pitch is that, if you use redundancy with ZFS, the checksums may somewhat address this problem described below: http://linuxmafia.com/faq/Filesystems/reiserfs.html -8- You see, when you yank the power cord out of the wall, not all parts of the computer stop functioning at the same time. As the voltage starts dropping on the +5 and +12 volt rails, certain parts of the system may last longer than other parts. For example, the DMA controller, hard drive controller, and hard drive unit may continue functioning for several hundred of milliseconds, long after the DIMMs, which are very voltage sensitive, have gone crazy, and are returning total random garbage. If this happens while the filesystem is writing critical sections of the filesystem metadata, well, you get to visit the fun Web pages at http://You.Lose.Hard/ . I was actually told about this by an XFS engineer, who discovered this about the hardware. Their solution was to add a power-fail interrupt and bigger capacitors in the power supplies in SGI hardware; and, in Irix, when the power-fail interrupt triggers, the first thing the OS does is to run around frantically aborting I/O transfers to the disk. Unfortunately, PC-class hardware doesn't have power-fail interrupts. Remember, PC-class hardware is cr*p. -8- I would suspect a ZFS mirror might have a better shot of coming through that type of crazy power failure, but I don't know how anything can be robust to a mysterious force that scribbles randomly all over the disk. On the downside there are some things I thought I understood about SVM's ideas of quorum that I do not yet understand in the ZFS world. also...FTR I use his ext3 rather than XFS myself, but I'm a little skeptical of Ted Ts'o ranting above because he is defending a shortcut he took writing his own filesystem. And I'm not sure the cord-pulling problem he describes is really universal, and is really a reason for XFS-users losing data that ext3-users don't---it sounds like it could be a specific-quirk type problem, a blip in history just like ``the 5-volt rail'' he talks about (+5V? what did they used to run on 5 volts, a disk motor or a battery charger or something?). The SGI engineers had the problem on their specific hardware, and solved it, but it may or may not exist on present machines. Maybe current hardware has other equally weird problems when one pulls the power cord. 
-- READ CAREFULLY. By reading this fortune, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (BOGUS AGREEMENTS) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Oracle and ZFS
On Jun 23, 2008, at 11:36 AM, Miles Nordin wrote: unplanned power outage that happens after fsync returns Aye, but isn't that the real rub ... when the power fails after the write but *before* the fsync has occurred... -- Keith H. Bierman [EMAIL PROTECTED] | AIM kbiermank 5430 Nassau Circle East | Cherry Hills Village, CO 80113 | 303-997-2749 speaking for myself* Copyright 2008 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Oracle and ZFS
Miles Nordin wrote: mo == Mertol Ozyoney [EMAIL PROTECTED] writes: mo One of our customer is suffered from FS being corrupted after mo an unattanded shutdonw due to power problem. mo They want to switch to ZFS. mo From what I read on, ZFS will most probably not be corrupted mo from the same event. It's not supposed to happen with UFS, either. nor XFS, JFS, ext3, reiserfs, FFS+softdep, plain FFS, mac-HFS+journal. All filesystems in popular use for many years except maybe NTFS are supposed to obey fsync and survive kernel crashes and unplanned power outage that happens after fsync returns, without losing any data written before fsync was called. The fact that they don't in practice is a warning that ZFS might not, either, no matter what it promises in theory. There is a more common failure mode at work here. Most low-cost disks have their volatile write cache enabled. UFS knows nothing of such caches and believes the disk has committed data when it acks. In other words, even with O_DSYNC and friends doing the right thing in the OS, the disk lies about the persistence of the data. ZFS knows disks lie, so it sends sync commands when necessary to help ensure that the data is flushed to persistent storage. But even if it is not flushed, the ZFS on-disk format is such that you can recover to a point in time where the file system is consistent. This is not the case for UFS which was designed to trust the hardware. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
On 6/23/08 11:59 AM, Tim [EMAIL PROTECTED] wrote: On Mon, Jun 23, 2008 at 11:18 AM, Edward [EMAIL PROTECTED] wrote: But the sad thing is Windows XP / Vista is still 32Bit. It doesn't recognize more then 3.x GB of Ram. 64Bit version is still premature and hardly OEM are adopting it. Hardware makers have yet to full jump on broad for 64 bit drivers. false, both of them recognize well in excess of 4GB of ram. What they CAN'T do is address it for *ONE* process. That's why applications like oracle were quick to hop on the 64bit bandwagon, they actually need it. I don't know of too many consumer level apps besides maybe photoshop (and firefox ;) ) that come anywhere near 4GB ram usage. While Edward is technically incorrect, the ceiling is still 4GB total physical memory: http://msdn.microsoft.com/en-us/library/aa366778.aspx Note that a 25% higher RAM ceiling is one thing, but it's a far cry from the 64-128GB the enterprise target Windows versions can use (yes, some of them are 32-bit but if you pay the extra $, you are allowed to use more RAM). The 3GB per-process limit is the real factor. But then again, who runs Oracle on Windows? :) Charles (ok, I have, but only for testing) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
On Mon, Jun 23, 2008 at 1:26 PM, Charles Soto [EMAIL PROTECTED] wrote: On 6/23/08 11:59 AM, Tim [EMAIL PROTECTED] wrote: On Mon, Jun 23, 2008 at 11:18 AM, Edward [EMAIL PROTECTED] wrote: But the sad thing is Windows XP / Vista is still 32Bit. It doesn't recognize more then 3.x GB of Ram. 64Bit version is still premature and hardly OEM are adopting it. Hardware makers have yet to full jump on broad for 64 bit drivers. false, both of them recognize well in excess of 4GB of ram. What they CAN'T do is address it for *ONE* process. That's why applications like oracle were quick to hop on the 64bit bandwagon, they actually need it. I don't know of too many consumer level apps besides maybe photoshop (and firefox ;) ) that come anywhere near 4GB ram usage. While Edward is technically incorrect, the ceiling is still 4GB total physical memory: http://msdn.microsoft.com/en-us/library/aa366778.aspx Note that even though A 25% higher RAM ceiling is one thing, but it's a far cry from the 64-128GB the enterprise target Windows versions can use (yes, some of them are 32-bit but if you pay the extra $, you are allowed to use more RAM). The 3GB per-process limit is the real factor. But then again, who runs Oracle on Windows? :) Charles (ok, I have, but only for testing) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Read the fine print: Limits on physical memory for 32-bit platforms also depend on the Physical Address Extensionhttp://msdn.microsoft.com/en-us/library/aa366796%28VS.85%29.aspx(PAE), which allows 32-bit Windows systems to use more than 4 GB of physical memory. PAE is enabled by default on XP after SP1, and all builds of vista. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
On Mon, Jun 23, 2008 at 03:16:45PM -0400, Brian H. Nelson wrote: Limits on physical memory for 32-bit platforms also depend on the Physical Address Extension http://msdn.microsoft.com/en-us/library/aa366796%28VS.85%29.aspx (PAE), which allows 32-bit Windows systems to use more than 4 GB of physical memory. PAE is enabled by default on XP after SP1, and all builds of vista. Read the regular-sized print in the XP and Vista tables: Under Windows, the 4GB limit is a LICENSING limit, not a problem of addressability, PAE or otherwise. The 4GB limit is also in place for 32-bit Windows Server Standard editions. If you want to be able to use more memory, you need to pay more money (as Charles already stated). Regardless of licensing issues, PAE is an ugly hack and shouldn't be used it at all possible. ;) -brian -- Coding in C is like sending a 3 year old to do groceries. You gotta tell them exactly what you want or you'll end up with a cupboard full of pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
Brian Hechinger wrote: On Mon, Jun 23, 2008 at 03:16:45PM -0400, Brian H. Nelson wrote: Limits on physical memory for 32-bit platforms also depend on the Physical Address Extension http://msdn.microsoft.com/en-us/library/aa366796%28VS.85%29.aspx (PAE), which allows 32-bit Windows systems to use more than 4 GB of physical memory. PAE is enabled by default on XP after SP1, and all builds of vista. Read the regular-sized print in the XP and Vista tables: Under Windows, the 4GB limit is a LICENSING limit, not a problem of addressability, PAE or otherwise. The 4GB limit is also in place for 32-bit Windows Server Standard editions. If you want to be able to use more memory, you need to pay more money (as Charles already stated). Regardless of licensing issues, PAE is an ugly hack and shouldn't be used it at all possible. ;) -brian But, but, but, PAE works so nice on my Solaris 8 x86 boxes for massive /tmp. :-) To be even more pedantic about XP, here's the FINAL word from microsoft about the PAE and 2+ GB RAM support: http://msdn.microsoft.com/en-us/library/ms791485.aspx http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx Bottom line: Windows XP (any SP) supports a MAXIMUM of 4GB of ram, regardless of the various switches. This is a CODE limit, not a license limit. While there are a bunch of APIs which are nominally available under XP for use of 4+GB address spaces, the OS kernel itself it limited to 4GB of physical RAM. Back on topic: the one thing I haven't tried out is ZFS on a 32-bit-only system with PAE, and more than 4GB of RAM. Anyone? -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [SOLVED] Confusion with snapshot send-receive
I modified the ZFS Admin Guide to show a simple zfs send | zfs recv example, then a more complex example using ssh to another system. Thanks for the feedback... Cindy Andrius wrote: James C. McPherson wrote: Andrius wrote: Boyd Adamson wrote: Andrius [EMAIL PROTECTED] writes: Hi, there is a small confusion with send receive. zfs andrius/sounds was snapshoted @421 and should be copied to new zpool beta that on external USB disk. After /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv beta or usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv beta/sounds answer come ssh: host1: node name or service name not known What has been done bad? Your machine cannot resolve the name host1 into an IP address. This is a network configuration problem, not a zfs problem. You should find that ssh host1 fails too. Second pool is in the same machine. What to write instead of host2? try /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | /usr/sbin/zfs recv beta/sounds You only need to pipe the zfs send output through ssh if you're actually sending it to a different system. James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcphttp://www.jmcp.homeunix.com/blog Thanks, it works. Just strange, why s your simplified sample are hot in ZFS Administration Guide. Somebody wanted to complicate things. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored
Ralf Bertling wrote: Hi list, as this matter pops up every now and then in posts on this list I just want to clarify that the real performance of RaidZ (in its current implementation) is NOT anything that follows from raidz-style data efficient redundancy or the copy-on-write design used in ZFS. In a M-Way mirrored setup of N disks you get the write performance of the worst disk and a read performance that is the sum of all disks (for streaming and random workloads, while latency is not improved) Apart from the write performance you get very bad disk utilization from that scenario. I beg to differ with very bad disk utilization. IMHO you get perfect disk utilization for M-way redundancy :-) In Raid-Z currently we have to distinguish random reads from streaming reads: - Write performance (with COW) is (N-M)*worst single disk write performance since all writes are streaming writes by design of ZFS (which is N-M-1 times faste than mirrored) - Streaming read performance is N*worst read performance of a single disk (which is identical to mirrored if all disks have the same speed) - The problem with the current implementation is that N-M disks in a vdev are currently taking part in reading a single byte from a it, which i turn results in the slowest performance of N-M disks in question. You will not be able to predict real-world write or sequential read performance with this simple analysis because there are many caches involved. The caching effects will dominate for many cases. ZFS actually works well with write caches, so it will be doubly difficult to predict write performance. You can predict small, random read workload performance, though, because you can largely discount the caching effects for most scenarios, especially JBODs. Now lets see if this really has to be this way (this implies no, doesn't it ;-) When reading small blocks of data (as opposed to streams discussed earlier) the requested data resides on a single disk and thus reading it does not require to send read commands to all disks in the vdev. Without detailed knowledge of the ZFS code, I suspect the problem is the logical block size of any ZFS operation always uses the full stripe. If true, I think this is a design error. No, the reason is that the block is checksummed and we check for errors upon read by verifying the checksum. If you search the zfs-discuss archives you will find this topic arises every 6 months or so. Here is a more interesting thread on the subject, dated November 2006: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-November/035711.html You will also note that for fixed record length workloads, we tend to recommend the blocksize be matched with the ZFS recordsize. This will improve efficiency for reads, in general. Without that, random reads to a raid-z are almost as fast as mirrored data. The theoretical disadvantages come from disks that have different speed (probably insignificant in any real-life scenario) and the statistical probability that by chance a few particular random reads do in fact have to access the same disk drive to be fulfilled. (In a mirrored setup, ZFS can choose from all idle devices, whereas in RAID-Z it has to wait for the disk that holds the data to be ready processing its current requests). Looking more closely, this effect mostly affects latency (not performance) as random read-requests coming in should be distributed equally across all devices even bette if the queue of requests gets longer (this would however require ZFS to reorder requests for maximum performance. 
ZFS does re-order I/O. Array controllers re-order the re-ordered I/O. Disks then re-order I/O, just to make sure it was re-ordered again. So it is also difficult to develop meaningful models of disk performance in these complex systems. Since this seems to be a real issue for many ZFS users, it would be nice if someone who has more time than me to look into the code, can comment on the amount of work required to boost RaidZ read performance. Periodically, someone offers to do this... but I haven't seen an implementation. Doing so would level the tradeoff between read- write- performance and disk utilization significantly. Obviously if disk space (and resulting electricity costs) do not matter compared to getting maximum read performance, you will always be best of with 3 or even more way mirrors and a very large number of vdevs in your pool. Space, performance, reliability: pick two. sidebar The ZFS checksum has proven to be very effective at identifying data corruption in systems. In a traditional RAID-5 implementation, like SVM, the data is assumed to be correct if the read operation returned without an error. If you try to make SVM more reliable by adding a checksum, then you will end up at approximately the same place ZFS is: by distrusting the hardware you take a performance penalty, but improve your data
Re: [zfs-discuss] ZFS root finally here in SNV90
On Mon, Jun 23, 2008 at 4:04 PM, Orvar Korvar [EMAIL PROTECTED] wrote: Wouldnt it be nice to break out all file systems in separate zfs file systems? Then you could snapshot each file system individually. Just like each user has his own filesystem, and I can snapshot that filesystem independently from other users. Because of now, if I do a snapshot of /, then everything gets snapshotted, even /var which changes a lot. I dont want to snapshot /var. I only want to snapshot /usr. Some things in /var are likely appropriate to snapshot with /. For example, /var/sadm has lots of information about which packages and patches are installed. There is a lot of other stuff that shouldn't be snapshotted with it. I have proposed /var/share to cope with this. http://mgerdts.blogspot.com/2008/03/future-of-opensolaris-boot-environment.html -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
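The per-dataset snapshot behaviour being discussed looks like this in practice (dataset names are illustrative only):

  Snapshot a single user's home filesystem:
  # zfs snapshot rpool/export/home/alice@today

  Snapshot a dataset and everything beneath it:
  # zfs snapshot -r rpool/export/home@today

A dataset is only captured if it is named explicitly or covered by -r, which is why splitting things like /var into their own datasets gives the finer-grained snapshot control Orvar is asking for.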
Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2
Yeah, that's something I'd love to see. CIFS isn't quite there yet, but it's miles ahead of Samba, and as soon as it is ready we'll want to be rolling it out under Sun Cluster. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2
Marcelo Leal wrote: Thanks all for the answers! Seems like the solution to have a opensolaris storage solution is the CIFS project. And there is no agent to provide HA, so seems like a good project too. Currently, the HA-NFS service requires that you disable the sharenfs property. http://docs.sun.com/app/docs/doc/820-2565/geaov?a=view I'm not sure of the reasoning here, except that the NFS agent monitor currently reads dfstab for configuration information. ZFS offers a different approach. For the alias, Solaris Cluster will know that for a HA-NFS implementation, there will be some devices containing a file system which must be mounted prior to starting the NFS service. This ballet is scheduled based on the configuration of the cluster and its services including IP addresses, storage affinity, etc. With ZFS sharenfs, shareiscsi, and scharesmb, some of the ballet steps are combined with zpool import. IMHO, it would be worthwhile to investigate how to leverage and adjust the NFS and Samba agents to also understand how ZFS works and do the right thing. Adding iSCSI should be a trivial addition. Might be a good project... -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Oracle and ZFS
re == Richard Elling [EMAIL PROTECTED] writes: kb == Keith Bierman [EMAIL PROTECTED] writes: re the disk lies about the persistence of the data. ZFS knows re disks lie, so it sends sync commands when necessary (1) i don't think ``lie'' is a correct characterization given that the sync commands exist, but point taken about the other area of risk. I suspect there may be similar problems in ZFS's write path when one is using iSCSI targets. Or it's just common for iSCSI target implementations to suck (lie). Or maybe it's something else I'm seeing. (2) i thought the recommendation that one give ZFS whole disks and let it put EFI labels on them came from the Solaris behavior that, only in a whole-disk-for-zfs configuration, will the Solaris drivers refrain from explicitly disabling the write cache in these inexpensive disks. so the cache shouldn't be a problem for UFS, but it might be for non-Solaris operating systems (even for ZFS on platforms where ZFS is ported but the SYNCHRONIZE CACHE commands don't make it through some mid-layer or CAM or driver). kb Aye, but isn't that the real rub ... when the power fails kb after the write but *before* the fsync has occurred... no, there is no rub here, I was only speaking precisely. A proper DBMS (anything except MySQL) is also designed to understand that power failures happen. It does its writes in a deliberate order such that it won't return success to the application calling it until it gets the return from fsync(), and also so that the system will never recover such that a transaction is half-completed. re the ZFS on-disk format is such that you can recover to a point re in time where the file system is consistent. do you mean that, ``after a power outage ZFS will always recover the filesystem to a state that it passed through in the moments leading up to the outage,'' while UFS, which logs only metadata, typically recovers to some state the filesystem never passed through---but it never loses fsync()ed data nor data that wasn't written ``recently'' before the crash? For casual filesystem use, or for applications that weren't designed with cord-pulling in mind, ZFS's guarantee is larger and more comforting. But for databases, I don't think the distinction matters, because they call fsync() at deliberate moments and do their own copy-on-write logging above the filesystem, so they provide the same consistency guarantees whether operating on UFS or ZFS. It would be fine to feed a database the type of hacked non-CoW zvol that's used for swap, if fsync could be made to work there, which maybe it can't. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2
On Mon, 23 Jun 2008, Ross wrote: Yeah, that's something I'd love to see. CIFS isn't quite there yet, but it's miles ahead of Samba, and as soon as it is ready we'll want to be rolling it out under Sun Cluster. If Samba has already been there for many people for many years, in what way is native CIFS miles ahead of Samba? Does this apply to CIFS in general or just for HA? This is not meant as a silly question; I would like to understand the benefits (beyond the native ACLs). Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] memory hog
On Mon, Jun 23, 2008 at 01:36:53PM -0700, Erik Trimble wrote: But, but, but, PAE works so nice on my Solaris 8 x86 boxes for massive /tmp. :-) What CPU? If it's a 64-bit CPU, you don't need PAE. ;) Back on topic: the one thing I haven't tried out is ZFS on a 32-bit-only system with PAE, and more than 4GB of RAM. Anyone? Probably poorly. ZFS needs address space, which is lacking in a 32-bit kernel. -brian -- Coding in C is like sending a 3 year old to do groceries. You gotta tell them exactly what you want or you'll end up with a cupboard full of pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
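For anyone wondering how much memory the ARC is actually holding on their box, one way to check (assuming the arcstats kstat is present on your build) is:

  kstat -p zfs:0:arcstats:size     # current ARC size, in bytes
  kstat -p zfs:0:arcstats:c_max    # the ceiling the ARC will try to stay under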
Re: [zfs-discuss] zpool iostat
On Thu, Jun 19, 2008 at 10:06:19AM +0100, Robert Milkowski wrote: Hello Brian, BH A three-way mirror and three disks in a double parity array are going to get you BH the same usable space. They are going to get you the same level of redundancy. BH The only difference is that the RAIDZ2 is going to consume a lot more CPU cycles BH calculating parity for no good cause. And you will also get higher IOPS with 3-way mirror. That's a good point that I completely forgot to make, thanks! -brian -- Coding in C is like sending a 3 year old to do groceries. You gotta tell them exactly what you want or you'll end up with a cupboard full of pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
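To make the comparison concrete (device names invented), the two configurations, and a way to watch their per-vdev behaviour under load:

  zpool create tank mirror c1t0d0 c1t1d0 c1t2d0    # 3-way mirror: one disk of usable space
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0    # raidz2 over 3 disks: also one disk of space
  zpool iostat -v tank 5                           # compare ops/sec per vdev while running a workload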
Re: [zfs-discuss] ZFS root finally here in SNV90
Brian Hechinger wrote: On Mon, Jun 23, 2008 at 11:18:21AM -0600, Lori Alt wrote: Sorry it's taken me so long to weigh in on this. You're busy with important things, we'll forgive you. ;) With zfs, we don't actually have to put /var in its own slice. We can achieve the same goal by putting it in its own dataset and assigning a quota to that dataset. That's really the only reason we offered this option. And thank you for doing so. I will always put /var in its own area even if the definition of that area has changed with the use of ZFS. Rampant writes to /var can *still* run / out of space even on ZFS; being able to keep that from happening is never a bad idea as far as I'm concerned. :) I think the ability to have different policies for file systems is pure goodness -- though you pay for it on the backup/restore side. A side question though: my friends who run Windows, Linux, or OSX don't seem to have this bias towards isolating /var. Is this a purely Solaris phenomenon? If so, how do we fix it? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
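A minimal sketch of the quota approach Lori describes; the dataset name is illustrative and depends on how your boot environment is laid out:

  zfs set quota=8G rpool/ROOT/snv_90/var
  zfs get quota,used,available rpool/ROOT/snv_90/var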
Re: [zfs-discuss] ZFS root finally here in SNV90
On Mon, Jun 23, 2008 at 8:45 PM, Richard Elling [EMAIL PROTECTED] wrote: Brian Hechinger wrote: On Mon, Jun 23, 2008 at 11:18:21AM -0600, Lori Alt wrote: Sorry it's taken me so long to weigh in on this. You're busy with important things, we'll forgive you. ;) With zfs, we don't actually have to put /var in its own slice. We can achieve the same goal by putting it in its own dataset and assigning a quota to that dataset. That's really the only reason we offered this option. And thank you for doing so. I will always put /var in it's own area even if the definition of that area has changed with the use of ZFS. Rampant writes to /var can *still* run / out of space even on ZFS, being able to keep that from happening is never a bad idea as far as I'm concerned. :) I think the ability to have different policies for file systems is pure goodness -- though you pay for it on the backup/ restore side. A side question though, my friends who run Windows, Linux, or OSX don't seem to have this bias towards isolating /var. Is this a purely Solaris phenomenon? If so, how do we fix it? I don't think it's a Solaris phenomenon, and it's not really a /var thing. UNIX heads have always had to contend with the disaster that is a full / filesystem. /var was always the most common culprit for causing it to run out of space. If you talk to the really paranoid among us, we run a read-only root filesystem. The real way to fix it, in zfs terms, is to reserve a minimum amount of space in / - thereby guaranteeing that you don't fill up your root filesystem. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- chris -at- microcozm -dot- net === Si Hoc Legere Scis Nimium Eruditionis Habes ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
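A rough sketch of the reservation idea (again, the dataset name is only an example):

  # guarantee the root dataset some space no matter what its siblings consume
  zfs set reservation=2G rpool/ROOT/snv_90
  zfs get reservation rpool/ROOT/snv_90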
Re: [zfs-discuss] ZFS root finally here in SNV90
On Mon, Jun 23, 2008 at 05:45:45PM -0700, Richard Elling wrote: I think the ability to have different policies for file systems is pure goodness -- though you pay for it on the backup/restore side. That's a price I for one am willing to pay. ;) A side question though, my friends who run Windows, Linux, or OSX don't seem to have this bias towards isolating /var. Is this a purely Solaris phenomenon? If so, how do we fix it? This is not a purely Solaris phenomenon, this is a UNIX phenomenon. People who run Linux or OSX (I can't speak for Windows users) tend to be new to the game and feel that this 40/80/500GB disk will never fill up, and so don't believe that separating /var is needed. It doesn't matter how big your disk is; a rampant process can fill up any disk of any size, it's just a matter of how long it takes. It isn't just /var that can cause trouble either, it's just that /var is the usual suspect since it's the filesystem that tends to get written to by the largest number of different processes. /tmp on a system that doesn't do tmpfs (BSD for example) is another likely candidate. Keeping / as far away from everything else as possible is never a bad idea. ZFS only makes this task easier (IMHO) since you can set quotas and reservations on different filesystems, thus protecting yourself from damage, and also at the same time not wasting disk space that could be better used elsewhere. Opinionated? Me? Yes. ;) -brian -- Coding in C is like sending a 3 year old to do groceries. You gotta tell them exactly what you want or you'll end up with a cupboard full of pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root finally here in SNV90
Hi All, the separating of /var is something that comes from the Unix tradition. Much of the Unix tradition of systems administration is based on making sure systems with many users remain stable, and so administrators are prepared to work to make the system more reliable. Common Windows, Linux and OS X practices are dominated by the concept of a personal computer, i.e. you only hurt yourself, so ease is a priority to them. The original filesystem layout separated /, /var, /tmp and /usr onto separate filesystems. In the bad old days, every time there was a write there was a risk that the filesystem might be made unstable, so the aim was to minimise writes to /, as without / booting to a minimal environment is a serious trial. /tmp was used for data that is not required to persist over reboots. /var was used for data that should persist over reboots. The other filesystems were used to store user files, non-minimal boot programs, etc. By separating the filesystems it is possible to make a far more recoverable system in the event of: - a user deciding to fill up all of one piece of temporary storage (ramdisk /tmp was one of those optimisations that Sun made that had some serious negative consequences; many admins on large shared systems make it back into a disk-backed filesystem) - a high write rate to other filesystems (separation reduces the risk of writes that could affect booting) So keeping /var and /tmp separate makes life much easier. Some of us have even been known to run with a read-only root filesystem. Linux and Windows users appear to value the flexibility of not having to make system-use decisions at installation, i.e. how big /var and /tmp should be, and being able to use the disk as they see fit; however, they are typically not managing systems for others, and so they have made a choice of convenience which can be seriously inconvenient in a shared environment. Maurice Castro On 24/06/2008, at 10:45 AM, Richard Elling wrote: I think the ability to have different policies for file systems is pure goodness -- though you pay for it on the backup/restore side. A side question though, my friends who run Windows, Linux, or OSX don't seem to have this bias towards isolating /var. Is this a purely Solaris phenomenon? If so, how do we fix it? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Oracle and ZFS
Miles Nordin wrote: re == Richard Elling [EMAIL PROTECTED] writes: kb == Keith Bierman [EMAIL PROTECTED] writes: re the disk lies about the persistence of the data. ZFS knows re disks lie, so it sends sync commands when necessary (1) i don't think ``lie'' is a correct characerization given that the sync commands exist, but point taken about the other area of risk. IMNSHO, they lie. Some disks do not disable volatile write caches, even when you ask them. I've got a scar... right there below the ORA-27062 and next to the FC-disk firmware scars... I think Torrey's is on his backside... :-) I suspect there may be similar problems in ZFS's write path when one is using iSCSI targets. Or it's just common for iSCSI target implementations to suck (lie). or maybe it's something else I'm seeing. I hope they aren't making assumptions about volatility... (2) i thought the recommendation that one give ZFS whole disks and let it put EFI labels on them came from the Solaris behavior that, only in a whole-disk-for-zfs configuration, will the Solaris drivers refrain from explicitly disabling the write cache in these inexpensive disks. so the cache shouldn't be a problem for UFS, but it might be for non-Solaris operating systems (even for ZFS on platforms where ZFS is ported but the SYNCHRONIZE CACHE commands don't make it through some mid-layer or CAM or driver). Close. By default, Solaris will try to disable the write cache, ostensibly to protect UFS. But if the whole disk is in use by ZFS, then it will enable the write cache and ZFS uses the synchronize cache commands, as appropriate. Solaris is a bit conservative here, maybe too conservative. In some cases you can override it with format -e. kb Aye, but isn't that the real rub ... when the power fails kb after the write but *before* the fsync has occurred... no, there is no rub here, I was only speaking precisely. A proper DBMS (anything except MySQL) is also designed to understand that power failures happen. It does its writes in a deliberate order such that it won't return success to the application calling it until it gets the return from fsync(), and also so that the system will never recover such that a transaction is half-completed. ZFS has similar protections. The most interesting is that since it is COW, the metadata is (almost) never overwritten. The almost applies to the uberblocks which use a circular queue. re the ZFS on-disk format is such that you can recover to a point re in time where the file system is consistent. do you mean taht, ``after a power outage ZFS will always recover the filesystem to a state that it passed through in the moments leading up to the outage,'' while UFS, which logs only metadata, typically recovers to some state the filesystem never passed through---but it never loses fsync()ed data nor data that wasn't written ``recently'' before the crash? The system can lose fsync()ed data if UFS thinks it wrote to persistent storage, but was actually writing to volatile storage. This may be less common, though. I think the more common symptom is a need to fsck to rebuild the metadata. For casual filesystem use, or for applications that weren't designed with cord-pulling in mind, ZFS's guarantee is larger and more comforting. But for databases, I don't think the distinction matters because they call fsync() at deliberate moments and do their own copy-on-write logging above the filesystem, so they provide the same consistency guarantees whether operating on UFS or ZFS. 
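The whole-disk versus slice distinction above comes down to how the pool is created; a hedged sketch with invented device names:

  zpool create tank c1t0d0     # whole disk: ZFS labels it EFI, enables the write cache, and issues cache flushes as needed
  zpool create tank c1t0d0s0   # slice: write-cache handling is left to the system's conservative default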
It would be fine to feed a database the type of hacked non-CoW zvol that's used for swap, if fsync could be made to work there, which maybe it can't. Hacked non-COW zvol? Since COW occurs at the DMU layer, below ZPL or ZVol, I don't see how to bypass it. AFAIK, the trick to using ZVols for swap was to just fix some bugs in ZFS and rewrite the pertinent parts of the installer(s). The subject of a non-COW volume does come up periodically. I refer to these as raw devices :-) Since many of the features of ZFS depend on COW, if you get rid of COW then you get rid of the features, and you might as well use raw devices, no? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root finally here in SNV90
Maurice Castro wrote: Hi All, the separating of /var is something that comes from the Unix tradition. Much of the Unix tradition of systems administration is based on making sure systems with many users remain stable and so administrators are prepared to work to make the system more reliable. Common Windows, Linux and OS X practices are dominated by the concept of a personal computer ie you only hurt yourself so ease is a priority to them. So the consensus is that if we are to compete with them at the desktop, then a simpler, easier to maintain file system structure is cool. The original filesystem layout separated / /var /tmp /usr onto separate filessytems. In the bad old days every time there is a write there is risk that the filesystem may be made unstable so the aim was to minimise writes to / as without / booting to a minimal environment is a serious trial. Actually, no. There was /. /var didn't show up until SunOS 4 circa 1988. /usr was made separable when it began to grow bigger than the disks available at the time, circa 1986 or so. In any case, well after UNIX was established. /tmp was used for data that is not required to persist over reboots. /var was used for data that should persist over reboots The other filesystems were used to store user files / non-minimal boot programs etc By separating the filesystems it is possible to make a far more recoverable system in the event of: - a user deciding to fill up all of one piece of temporary storage (ramdisk /tmp was one of those optimisations that sun made that had some serious negative consequences; many admins on large shared systems make it back into a disk backed filesystem) - high write rate to other filesystems reduces risk of boot affecting writes from being made The reason for separating was very different, though this was also a side-product. In the days of diskless systems, you could share parts of the OS as read-only. /usr originally contained /usr/tmp and /usr/Richard (or whatever). To make /usr be read-only, user home directories and tmp had to be moved out to /var. / also was unique to each diskless client, so while they could share much of the stuff in /usr, each had to have its own /. Some people also took advantage of the fact that /usr/spool is now in /var/spool, where the printer files and mail collected and separated these out so that you wouldn't have to back them up (tapes were 60MBytes or so at the time). So, there you have it: /, /var, /usr, and /export/home. Each has a different policy, which is the key here. NB. UFS, by default, reserves 10% which only root can write. So a regular user could not directly impact a running UNIX system using UFS. ZFS does not have such reserve, so if you want to implement it, you will end up with a separate file system somewhere. -- richard So keeping /var and /tmp separate make life much easier. Some of us have even been known to run with a read-only root filesystem. Linux and windows users appear to value the flexibility of not having to make system use decisions ie how big /var and /tmp should be at installation and being able to use the disk as they see fit; however, they are typically not managing systems for others and so they have made a choice of convenience which can be seriously inconvenient in a shared environment. Maurice Castro On 24/06/2008, at 10:45 AM, Richard Elling wrote: I think the ability to have different policies for file systems is pure goodness -- though you pay for it on the backup/ restore side. 
A side question though, my friends who run Windows, Linux, or OSX don't seem to have this bias towards isolating /var. Is this a purely Solaris phenomenon? If so, how do we fix it? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs primarycache and secondarycache properties
Moved from PSARC to zfs-code... this discussion is separate from the case. Eric kustarz wrote: On Jun 23, 2008, at 1:20 PM, Darren Reed wrote: eric kustarz wrote: On Jun 23, 2008, at 1:07 PM, Darren Reed wrote: Tim Haley wrote: primarycache=all | none | metadata Controls what is cached in the primary cache (ARC). If set to all, then both user data and metadata is cached. If set to none, then neither user data nor metadata is cached. If set to metadata, then only metadata is cached. The default behavior is all. The description above kind of implies that user data is somehow separate from metadata, but it isn't possible to say cache only user data (with the text given.) Is this just an oversight, or is this really saying you cannot cache only the user data? We couldn't come up with any realistic workload that would want to cache user data but not metadata, so we're not allowing it. We can always add the option later, but if someone has a realistic use case for it, I'd be happy to add it now. It's not so much the why, but maybe I'd like to say the primarycache gets metadata and the secondary cache gets user data (or vice versa.) Does that make sense? Or would that require linkage between metadata and user data (across cache boundaries) in order to maintain sanity? It is the why. If there's no reason to do it, then we shouldn't allow it (adds more complexity, more confusion, more ways for a customer to shoot themselves in the foot). However, if there is a legitimate use case, let's discuss that. In considering the why, being aware of some implementation details seems necessary, such as: - is there a difference in size between the primary and secondary cache - how big is the metadata relative to user data - how many metadata items are there relative to user data So if I'm constantly accessing just a few files, I may prefer to have all of the metadata cached by a smaller (primary?) cache and for user data not to be able to cause it any problems (that capability is there now, ok.) So the question becomes: why does one not want the metadata in the same cache as the user data? So I spent some time thinking about different directions you could build on this in the future, for example: 1) controlling the size of the ARC/L2ARC by controlling the cache size 2) specifying different backing storage for primary/secondary cache 3) having more than two levels of cache ...none of which is precluded by current efforts. With (2), if the backing storage for each cache is different and it is slower to access the secondary cache than the primary, then you may not want metadata to be stored in the secondary cache for performance reasons. As an example, you might be using NVRAM (be it flash or otherwise) for the primary cache and ordinary RAM for the secondary. In this case you probably don't want any metadata to be stored in the secondary cache (power failure issues) but the same may not hold for user data. But I'm probably wrong about that. Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
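Assuming the properties from the case land as described, using them together with an L2ARC cache device might look like this (pool, device, and dataset names are made up):

  zpool add tank cache c2t0d0                 # back the secondary cache with a dedicated device
  zfs set primarycache=metadata tank/scratch  # keep only metadata in the ARC for this dataset
  zfs set secondarycache=all tank/scratch     # but let both data and metadata spill to the L2ARC
  zfs get primarycache,secondarycache tank/scratch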
Re: [zfs-discuss] ZFS root finally here in SNV90
On Mon, Jun 23, 2008 at 8:06 PM, Brian Hechinger [EMAIL PROTECTED] wrote: This is not a purely Solaris phenomenon, this is a UNIX phenomenon. People who run Linux or OSX (I can't speak for Windows users) tend to be new to the game and feel that this 40/80/500GB disk will never fill up and so don't believe that separating /var is needed. Why is having a full /var so much better than having a full /? I've had a number of Solaris systems fail to boot because they couldn't update /var/adm/utmpx, but I've never had one fail to boot because / was full. As best as I can deduce, the idea that the root file system gets corrupted when it fills up is a combination of ancient history and urban legend. I brought this up in a lengthy thread over at sysadmin-discuss a while back and have had no one refute my assertion with credible data. http://mail.opensolaris.org/pipermail/sysadmin-discuss/2007-September/001668.html I've also shared more detailed thoughts on file system sprawl at... http://mail.opensolaris.org/pipermail/sysadmin-discuss/2007-September/001641.html Really, it boils down to this: lots of file systems to hold the OS add administrative complexity and rarely save more work than they create. I believe this especially holds true for enterprise server environments where downtime is really expensive. I much prefer to ask for a 3 hour outage to patch than a 5 hour outage to relayout file systems and then patch. Of course, today's development work will make the 3 hour outage for patching a thing of ancient history as well. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root finally here in SNV90
Mike Gerdts writes: On Mon, Jun 23, 2008 at 8:06 PM, Brian Hechinger [EMAIL PROTECTED] wrote: This is not a purely Solaris phenomenon, this is a UNIX phenomenon. People who run Linux or OSX (I can't speak for Windows users) tend to be new to the game and feel that this 40/80/500GB disk will never fill up and so don't believe that separating /var is needed. With ZFS boot, the point is moot. Why is having a full /var so much better than having a full /? I've had a number of Solaris systems fail to boot because it can't update /var/adm/utmpx, but I've never had one fail to boot because / was full. As best as I can deduce, the root file system corruption when it gets full is a combination of ancient history and urban legend. I've brought this up on a lengthy thread over at sysadmin-discuss a while back and have had no one refute my assertion with credible data. We can only hope that ZFS boot will consign this never-ending layout argument to the dust of history. Ian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mv between ZFSs on same zpool
Yaniv Aknin wrote: Thanks for the reference. I read that thread to the end, and saw there are some complex considerations regarding changing st_dev on an open file, but no decision. Despite this complexity, I think the situation is quite brain-damaged - I'm moving large files between ZFS filesystems all the time, otherwise I can't separate the tree as I'd like to, and it's fairly annoying to think these blocks are being copied at 50MB/s to achieve basically nothing. I think even a hack will do for a start (do I hear 'zmv'?). Thoughts? Objections? It is filed as an RFE - 6650426 and 6483179 are related. Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss