Re: [zfs-discuss] Deduplication Memory Requirements
Hi,

On 05/ 5/11 03:02 PM, Edward Ned Harvey wrote:
>> From: Garrett D'Amore [mailto:garr...@nexenta.com]
>>
>> We have customers using dedup with lots of vm images... in one extreme
>> case they are getting dedup ratios of over 200:1!
>
> I assume you're talking about a situation where there is an initial VM
> image, and then to clone the machine, the customers copy the VM, correct?
> If that is correct, have you considered ZFS cloning instead?
>
> When I said dedup wasn't good for VM's, what I'm talking about is: If
> there is data inside the VM which is cloned... For example if somebody
> logs into the guest OS and then does a "cp" operation... Then dedup of
> the host is unlikely to be able to recognize that data as cloned data
> inside the virtual disk.

ZFS cloning and ZFS dedup solve two problems that are related, but
different:

- Through cloning, a lot of space can be saved in situations where it is
  known beforehand that data is going to be used multiple times from
  multiple different "views". Virtualization is a perfect example of this.

- Through dedup, space can be saved in situations where the duplicate
  nature of data is not known, or not known beforehand. Again, in
  virtualization scenarios, this could be common modifications to VM images
  that are performed multiple times, but not anticipated, such as extra
  software, OS patches, or simply many users saving the same files to their
  local desktops.

To go back to the "cp" example: If someone logs into a VM that is backed by
ZFS with dedup enabled, then copies a file, the extra space that the file
will take will be minimal. The act of copying the file breaks down into a
series of block writes that will be recognized as duplicate blocks. This is
completely independent of the clone nature of the underlying VM's backing
store.

But I agree that the biggest savings are to be expected from cloning first,
as they typically translate into n GB (for the base image) x # of users,
which is a _lot_. Dedup is still the icing on the cake for all those data
blocks that were unforeseen. And that can be a lot, too, as everyone who has
seen cluttered desktops full of downloaded files can probably confirm.

Cheers,
   Constantin

--
Constantin Gonzalez Schmitz, Sales Consultant, Oracle Hardware Presales Germany
Blog: http://constantin.glez.de/ | Twitter: zalez
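To make the distinction concrete, here is a minimal sketch of the
clone-first, dedup-second approach. Pool and dataset names are made up for
illustration:

  # Golden image, created and patched once
  zfs create tank/vm
  zfs set dedup=on tank/vm          # dedup catches the unplanned duplicates
  zfs create tank/vm/golden
  # (install and patch the guest OS inside tank/vm/golden here)
  zfs snapshot tank/vm/golden@v1

  # Each new VM is a clone of the golden snapshot: near-zero extra space
  zfs clone tank/vm/golden@v1 tank/vm/vm01
  zfs clone tank/vm/golden@v1 tank/vm/vm02

  # See what dedup contributes on top of the clones
  zpool get dedupratio tank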
Re: [zfs-discuss] Deleting large amounts of files
Hi,

> Is there a way to see which files have been deduped, so I can copy them
> again and un-dedupe them?

Unfortunately, that's not easy (I've tried it :) ).

The issue is that the dedup table (which knows which blocks have been
deduped) doesn't know about files. And if you pull block pointers for
deduped blocks from the dedup table, you'll need to backtrack from there
through the filesystem structure to figure out what files are associated
with those blocks. (Remember: Deduplication happens at the block level, not
the file level.)

So, in order to compile a list of deduped _files_, one would need to extract
the list of deduped _blocks_ from the dedup table, then chase the pointers
from the root of the zpool to the blocks in order to figure out what files
they're associated with.

Unless there's a different way that I'm not aware of (and I hope someone can
correct me here), the only way to do that is to run a scrub-like process and
build up a table of files and their blocks.

Cheers,
   Constantin

--
Constantin Gonzalez Schmitz | Principal Field Technologist
Oracle Hardware Presales Germany
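What you can see without walking the filesystem are the dedup table
statistics themselves, for example with zdb (pool name is just an example;
this is read-only, but can take a while and some memory on a big pool):

  # Histogram of the dedup table: how many blocks are referenced how often
  zdb -DD tank

  # Simulate the dedup ratio on a pool that doesn't have dedup enabled yet
  zdb -S tank

That tells you how much data is deduplicated, but not which files reference
those blocks. For that, you're back to the scrub-like walk described above.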
Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Hi Tim,

thanks for sharing your dedup experience. Especially for virtualization,
having a good pool of experience will help a lot of people.

So you see a dedup ratio of 1.29 for two installations of Windows Server
2008 on the same ZFS backing store, if I understand you correctly. What
dedup ratios do you see for the third, fourth and fifth server installation?

Also, maybe dedup is not the only way to save space: What compression ratio
do you get? And: Have you tried setting up a Windows system, then setting up
the next one based on a ZFS clone of the first one?

Hope this helps,
   Constantin

On 04/23/10 08:13 PM, tim Kries wrote:
> Dedup is a key element for my purpose, because i am planning a central
> repository for like 150 Windows Server 2008 (R2) servers which would take
> a lot less storage if they dedup right.

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            Blog: constantin.glez.de
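For reference, the relevant numbers are easy to pull from the pool, and a
clone-based setup for the next server would only take a snapshot and a
clone. Dataset names below are invented for the example:

  zfs get compressratio,used,referenced tank/vhd
  zpool get dedupratio tank

  # Install the first Windows server into its own dataset, then base the
  # next ones on a clone of it
  zfs snapshot tank/vhd/base@installed
  zfs clone tank/vhd/base@installed tank/vhd/server02
  zfs clone tank/vhd/base@installed tank/vhd/server03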
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Hi,

I agree 100% with Chris. Notice the "on their own" part of the original
post: Yes, nobody wants to run zfs send or (s)tar by hand. That's why
Chris's script is so useful: You set it up, forget about it, and it gets the
job done for 80% of home users.

On another note, I was positively surprised by the availability of
CrashPlan for OpenSolaris:

  http://crashplan.com/

Their free service allows you to back up your stuff to a friend's system
over the net in an encrypted way; the paid-for service uses CrashPlan's data
centers at less than Amazon S3 pricing.

While this may not be everyone's solution, I find it significant that they
explicitly support OpenSolaris. This either means they're OpenSolaris fans
or that they see potential in OpenSolaris home server users.

Cheers,
   Constantin

On 03/20/10 01:31 PM, Chris Gerhard wrote:
>> I'll say it again: neither 'zfs send' nor (s)tar is an enterprise (or
>> even home) backup system on their own one or both can be components of
>> the full solution.
>
> Up to a point. zfs send | zfs receive does make a very good back up
> scheme for the home user with a moderate amount of storage. Especially
> when the entire back up will fit on a single drive which I think would
> cover the majority of home users.
>
> Using external drives and incremental zfs streams allows for extremely
> quick back ups of large amounts of data. It certainly does for me.
>
> http://chrisgerhard.wordpress.com/2007/06/01/rolling-incremental-backups/

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            Blog: constantin.glez.de
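For reference, the external-drive scheme Chris describes boils down to a
handful of commands. Pool, dataset and device names below are only examples:

  # One-time setup of the backup pool on the external drive
  zpool create backup c5t0d0

  # Full backup
  zfs snapshot -r tank/home@2010-03-20
  zfs send -R tank/home@2010-03-20 | zfs receive -Fd backup

  # A week later: send only the changes since the last snapshot
  zfs snapshot -r tank/home@2010-03-27
  zfs send -R -i @2010-03-20 tank/home@2010-03-27 | zfs receive -Fd backup

  # Export before unplugging the drive
  zpool export backup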
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Hi,

I'm using 2 x 1.5 TB drives from Samsung (EcoGreen, I believe) in my current
home server. One reported 14 read errors a few weeks ago, roughly 6 months
after install, which went away during the next scrub/resilver.

This reminded me to order a 3rd drive, a 2.0 TB WD20EADS from Western
Digital, and I now have a 3-way mirror, which is effectively a 2-way mirror
with its hot spare already synced in.

The idea behind notching up the capacity is threefold:

- No "sorry, this disk happens to have 1 block too few" problems on attach.

- When the 1.5 TB disks _really_ break, I'll just order another 2 TB one and
  use the opportunity to upgrade pool capacity. Since at least one of the
  1.5 TB drives will still be attached, there won't be any "slightly smaller
  drive" problems either when attaching the second 2 TB drive.

- After building in 2 bigger drives, it becomes easy to figure out which of
  the drives to phase out: Just go for the smaller drives. This solves the
  headache of trying to figure out the right drive to swap out when you
  replace drives that aren't hot spares and don't have blinking lights.

Frankly, I don't care whether the Samsung or the WD drives are better or
worse; they're both consumer drives and they're both dirt cheap. Just assume
that they'll break soon (since you're probably using them more intensely
than their designed purpose) and make sure their replacements are already
there. It also helps mixing vendors, so one glitch that affects multiple
disks in the same batch won't affect your setup too much. (And yes, I broke
that rule with my initial 2 Samsung drives, but I'm now glad I have both
vendors :)).

Hope this helps,
   Constantin

Simon Breden wrote:
> I see also that Samsung have very recently released the HD203WI 2TB
> 4-platter model. It seems to have good customer ratings so far at
> newegg.com, but currently there are only 13 reviews so it's a bit early
> to tell if it's reliable. Has anyone tried this model with ZFS?
>
> Cheers,
> Simon
>
> http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
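In case anyone wants to copy the setup: growing the 2-way mirror into a
3-way mirror and later swapping out a small drive is just a couple of
commands. Device names below are made up; check yours with zpool status
first:

  # Attach the new 2 TB drive as a third mirror side
  zpool attach tank c1t0d0 c1t2d0
  zpool status tank                 # wait for the resilver to complete

  # When one of the 1.5 TB drives finally dies, replace it with a 2 TB one
  zpool replace tank c1t1d0 c1t3d0

  # Once only 2 TB drives remain, the pool can grow to the new size
  # (on recent builds this is controlled by the autoexpand pool property)
  zpool set autoexpand=on tank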
Re: [zfs-discuss] Setting default user/group quotas?
Hi,

>> IMHO, it would be useful to have something like:
>>
>>   zfs set userquota=5G tank/home
>> ...
>
> I think that would be a great feature.

Thanks. I just created CR 6902902 to track this. I hope it becomes viewable
soon here:

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6902902

Cheers,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
[zfs-discuss] Setting default user/group quotas?
Hi,

first of all, many thanks to those who made user/group quotas possible. This
is a huge improvement for many users of ZFS!

While presenting on this new feature at the Munich OpenSolaris User Group
meeting yesterday, a question came up that I couldn't find an answer for:
Can you set a default user/group quota?

Apparently,

  zfs set userquota@user1=5G tank/home/user1

is the only way to set user quotas, and the "@user1" part seems to be
mandatory, at least according to the snv_126 version of the ZFS manpage.
According to my attempts with ZFS:

  The {user|group}{used|quota}@ properties must be appended with a user or
  group specifier of one of these forms:

      POSIX name       (eg: "matt")
      POSIX id         (eg: "126829")
      SMB name@domain  (eg: "matt@sun")
      SMB SID          (eg: "S-1-234-567-89")

Imagine a system that needs to handle thousands of users. Setting quotas
individually for all of these users would quickly become unwieldy, in a
similar manner to the unwieldiness that having a filesystem for each user
presented. Which was the reason to introduce user/group quotas in the first
place.

IMHO, it would be useful to have something like:

  zfs set userquota=5G tank/home

and that would mean that all users who don't have an individual user quota
assigned to them would see a default 5G quota.

I haven't found an RFE for this yet. Is this planned? Should I file an RFE?
Or did I overlook something?

Thanks,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
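Until something like a default quota exists, the only workaround I can think
of is to loop over the user database and set the per-user property
explicitly. That is exactly the kind of unwieldiness described above, but
for completeness, a rough sketch (the UID cut-off and the 5G value are
arbitrary):

  #!/bin/sh
  # Give every regular user a 5G quota on tank/home
  for u in `getent passwd | awk -F: '$3 >= 100 { print $1 }'`; do
      zfs set userquota@${u}=5G tank/home
  done

  # Verify per-user usage and quotas
  zfs userspace tank/home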
Re: [zfs-discuss] ZFS commands hang after several zfs receives
Hi,

I think I've run into the same issue on OpenSolaris 2009.06.

Does anybody know when this issue will be solved in OpenSolaris? What's the
Bug ID?

Thanks,
   Constantin

Gary Mills wrote:
> On Tue, Sep 15, 2009 at 08:48:20PM +1200, Ian Collins wrote:
>> Ian Collins wrote:
>>> I have a case open for this problem on Solaris 10u7. The case has been
>>> identified and I've just received an IDR, which I will test next week.
>>> I've been told the issue is fixed in update 8, but I'm not sure if
>>> there is an nv fix target. I'll post back once I've abused a test
>>> system for a while.
>>
>> The IDR I was sent appears to have fixed the problem. I have been
>> abusing the box for a couple of weeks without any lockups. Roll on
>> update 8!
>
> Was that IDR140221-17? That one fixed a deadlock bug for us back in May.

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
Re: [zfs-discuss] ZFS Crypto Updates [PSARC/2009/443 FastTrack timeout 08/24/2009]
Hi,

Brian Hechinger wrote:
> On Tue, Aug 18, 2009 at 12:37:23AM +0100, Robert Milkowski wrote:
>> Hi Darren, Thank you for the update. Have you got any ETA (build number)
>> for the crypto project?
>
> Also, is there any word on if this will support the hardware crypto stuff
> in the VIA CPUs natively? That would be nice. :)

ZFS Crypto uses the Solaris Cryptographic Framework to do the actual
encryption work, so ZFS is agnostic to any hardware crypto acceleration.

The Cryptographic Framework project on OpenSolaris.org is looking for help
in implementing VIA Padlock support for the Solaris Cryptographic Framework:

  http://www.opensolaris.org/os/project/crypto/inprogress/padlock/

Cheers,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
Re: [zfs-discuss] Motherboard for home zfs/solaris file server
Hi,

thank you so much for this post. This is exactly what I was looking for.
I've been eyeing the M3A76-CM board, but will now look at the 78 and M4A
series as well.

Actually, not that many Asus M3A, let alone M4A, boards show up yet on the
OpenSolaris HCL, so I'd like to encourage everyone to share their hardware
experience by clicking on the "submit hardware" link on:

  http://www.sun.com/bigadmin/hcl/data/os/

I've done it a couple of times and it's really just a matter of 5-10 minutes
where you can help others know whether a certain component works or not, or
whether a special driver or /etc/driver_aliases setting is required.

I'm also interested in getting the power consumption down. Right now, I have
the Athlon X2 5050e (45W TDP) on my list, but I'd also like to know more
about the Athlon II X2 250 and whether it has better potential for power
savings.

Neal, the M3A78 seems to have a Realtek RTL8111/8168B NIC chip. I pulled
this off a Gentoo wiki, because strangely this information doesn't show up
on the Asus website.

Also, thanks for the CF-to-PATA hint for the root pool mirror. I will try to
find fast CF cards to boot from. The performance problems you see when
writing may be related to master/slave issues, but I'm not a good enough PC
tweaker to back that up.

Cheers,
   Constantin

F. Wessels wrote:
> Hi, I'm using asus m3a78 boards (with the sb700) for opensolaris and m2a*
> boards (with the sb600) for linux, some of them with 4*1GB and others
> with 4*2GB ECC memory. ECC faults will be detected and reported. I tested
> it with a small tungsten light: By moving the light source slowly towards
> the memory banks you'll heat them up in a controlled way and at a certain
> point bit flips will occur.
>
> I recommend you to go for an m4a board since they support up to 16 GB. I
> don't know if you can run opensolaris without a videocard after
> installation; I think you can disable the "halt on no video card" option
> in the BIOS. But Simon Breden had some trouble with it, see his
> homeserver blog. But you can go for one of the three m4a boards with a
> 780g onboard. Those will give you 2 pci-e x16 connectors. I don't think
> the onboard nic is supported. I always put an intel (e1000) in, just to
> prevent any trouble.
>
> I don't have any trouble with the sb700 in ahci mode. Hotplugging works
> like a charm. Transferring a couple of GBs over esata takes considerably
> less time than via usb. I have a pata to dual cf adapter and two
> industrial 16gb cf cards as mirrored root pool. It takes for ever to
> install nevada, at least 14 hours. I suspect the cf cards lack caches.
> But I don't update that regularly, still on snv104. And I have 2 mirrors
> and a hot spare. The sixth port is an esata port I use to transfer large
> amounts of data.
>
> This system consumes about 73 watts idle and 82 under i/o load. (5 disks,
> a separate nic, 8 gb ram and a be2400 all using just 73 watts!!!) Please
> note that frequency scaling is only supported on the K10 architecture.
> But don't expect too much power saving from it: A lower voltage yields
> far greater savings than a lower frequency. In September I'll do a post
> about the afore mentioned M4A boards and an lsi sas controller in one of
> the pcie x16 slots.

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

Bob Friesenhahn wrote:
> On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>>
>> Yes, we're both aware of this. In this particular situation, the
>> customer would restart his backup job (and thus the client application)
>> in case the server dies.
>
> So it is ok for this customer if their backup becomes silently corrupted
> and the backup software continues running? Consider that some of the
> backup files may have missing or corrupted data in the middle. Your
> customer is quite dedicated in that he will monitor the situation very
> well and remember to reboot the backup system, correct any corrupted
> files, and restart the backup software whenever the server panics and
> reboots.

This is what the customer told me. He uses rsync and he is ok with
restarting the rsync whenever the NFS server restarts.

> A properly built server should be able to handle NFS writes at gigabit
> wire-speed.

I'm advocating for a properly built system, believe me :).

Cheers,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

yes, using slogs is the best solution.

Meanwhile, using mirrored slogs made from other servers' RAM disks running
on UPSs seems like an interesting idea, if the reliability of UPS-backed RAM
is deemed good enough for the purposes of the NFS server.

Thanks for suggesting this!

Cheers,
   Constantin

Ross wrote:
> Well, it might be even more of a bodge than disabling the ZIL, but how
> about:
>
> - Create a 512MB ramdisk, use that for the ZIL
> - Buy a Micro Memory nvram PCI card for £100 or so.
> - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace
>   the Micro Memory card.
>
> The ramdisk isn't an ideal solution, but provided you don't export the
> pool with it offline, it does work. We used it as a stop gap solution for
> a couple of weeks while waiting for a Micro Memory nvram card.
>
> Our reasoning was that our server's on a UPS and we figured if something
> crashed badly enough to take out something like the UPS, the motherboard,
> etc, we'd be losing data anyway. We just made sure we had good backups in
> case the pool got corrupted and crossed our fingers.
>
> The reason I say wait 3-6 months is that there's a huge amount of
> activity with SSDs at the moment. Sun said that they were planning to
> have flash storage launched by Christmas, so I figure there's a fair
> chance that we'll see some supported PCIe cards by next spring.

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
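For the archives: the RAM disk slog Ross describes is only a few commands,
with the big caveat that its contents are gone after a panic or power loss,
and that on current builds a log device cannot be removed from the pool
again once added. Pool and ramdisk names are examples:

  # Create a 512 MB ramdisk and add it to the pool as a log device
  ramdiskadm -a zil0 512m
  zpool add tank log /dev/ramdisk/zil0
  zpool status tank

  # A mirrored slog (e.g. with real NVRAM/SSD devices later) would be:
  #   zpool add tank log mirror <device1> <device2>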
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

Bob Friesenhahn wrote:
> On Wed, 22 Oct 2008, Neil Perrin wrote:
>> On 10/22/08 10:26, Constantin Gonzalez wrote:
>>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out
>>>    to me that if a tar xvf were writing locally to a ZFS file system,
>>>    the writes wouldn't be synchronous either, so there's no point in
>>>    forcing NFS users to having a better availability experience at the
>>>    expense of performance.
>
> The conclusion reached here is quite seriously wrong and no Sun employee
> should suggest it to a customer.

I'm not suggesting it to any customer. Actually, I argued quite a long time
with the customer, trying to convince him that "slow but correct" is better.

The conclusion above is a conscious decision by the customer. He says that
he does not want NFS to turn every write into a synchronous write; he's
happy if all writes are asynchronous, because in this case the NFS server is
a backup-to-disk device, and if power fails he simply restarts the backup
'cause he has the data in multiple copies anyway.

> If the system writing to a local filesystem reboots, then the
> applications which were running are also lost and will see the new
> filesystem state when they are restarted. If an NFS server spontaneously
> reboots, the applications on the many clients are still running and the
> client systems are using cached data. This means that clients could do
> very bad things if the filesystem state (as seen by NFS) is suddenly not
> consistent. One of the joys of NFS is that the client continues
> unhindered once the server returns.

Yes, we're both aware of this. In this particular situation, the customer
would restart his backup job (and thus the client application) in case the
server dies.

Thanks for pointing out the difference, this is indeed an important
distinction.

Cheers,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

>> - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE
>>   already that asks for the ability to disable the ZIL on a per
>>   filesystem basis?
>
> Yes: 6280630 zil synchronicity

Good, thanks for the pointer!

> Though personally I've been unhappy with the exposure that zil_disable
> has got. It was originally meant for debug purposes only. So providing an
> official way to make synchronous behaviour asynchronous is to me
> dangerous.

IMHO, the need here is to give admins control over the way they want their
file servers to behave. In this particular case, the admin argues that he
knows what he's doing, that he doesn't want his NFS server to behave more
strongly than a local filesystem, and that he deserves control of that
behaviour.

Ideally, there would be an NFS option that lets customers choose whether
they want to honor COMMIT requests or not. Disabling the ZIL on a per
filesystem basis is only the second best solution, but since that CR already
exists, it seems to be the more realistic route.

Thanks,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
[zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

On a busy NFS server, performance tends to be very modest for large amounts
of small files due to the well-known effects of ZFS and the ZIL honoring the
NFS COMMIT operation[1].

For the mature sysadmin who knows what (s)he does, there are three
possibilities:

1. Live with it. Hard, if you see 10x less performance than could be and
   your users complain a lot.

2. Use a flash disk for the ZIL, a slog. Can add considerable extra cost,
   especially if you're using an X4500/X4540 and can't swap out fast SAS
   drives for cheap SATA drives to free the budget for flash ZIL drives.[2]

3. Disable the ZIL[1]. This is of course evil, but one customer pointed out
   to me that if a tar xvf were writing locally to a ZFS file system, the
   writes wouldn't be synchronous either, so there's no point in forcing
   NFS users into a better availability experience at the expense of
   performance.

So, if the sysadmin draws the informed and conscious conclusion that (s)he
doesn't want to honor NFS COMMIT operations, what are the options less
disruptive than disabling the ZIL completely?

- I checked the NFS tunables from:

  http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html

  but could not find a tunable that would disable COMMIT honoring. Is there
  already an RFE asking for a share option that disables the translation of
  COMMIT to synchronous writes?

- The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
  that asks for the ability to disable the ZIL on a per filesystem basis?
  Once admins start to disable the ZIL for whole pools because the extra
  performance is too tempting, wouldn't it be the lesser evil to let them
  disable it on a per filesystem basis?

Comments?

Cheers,
   Constantin

[1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
[2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Principal Field Technologist            http://blogs.sun.com/constantin
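For completeness, and with all the "evil" warnings from [1] applied: the
only switch I'm aware of today is the global one. If I remember the tunable
name correctly, it is set like this (it affects every pool on the system,
so treat this as an illustration, not a recommendation):

  # /etc/system -- takes effect at the next reboot
  set zfs:zil_disable = 1

  # or temporarily on a live system, for testing only
  echo zil_disable/W0t1 | mdb -kw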
Re: [zfs-discuss] RFE: Start with desired end state in mind...
Hi,

great, thank you. So ZFS isn't picky about finding the target fs already
created and attributed when replicating data into it. This is very cool!

Best regards,
   Constantin

Darren J Moffat wrote:
> Constantin Gonzalez wrote:
>> Hi Darren,
>>
>> thank you for the clarification, I didn't know that.
>>
>>> See the man page for zfs(1) where the -R option for send is discussed.
>>
>> Back to Brad's RFE, what would one need to do to send a stream from a
>> compressed filesystem to one with a different compression setting, if
>> the source file system has the compression attribute set to a specific
>> algorithm (i.e. not inherited)?
>
> $ zfs create -o compression=gzip-1 tank/gz1
> # put in your data
> $ zfs snapshot tank/[EMAIL PROTECTED]
> $ zfs create -o compression=gzip-9 tank/gz9
> $ zfs send tank/[EMAIL PROTECTED] | zfs recv -d tank/gz9
>
>> Will leaving out -R just create a new, but plain unencrypted fs on the
>> receiving side?
>
> Depends on inheritance.
>
>> What if one wants to replicate a whole package of filesystems via -R,
>> but change properties on the receiving side before it happens?
>
> If they are all getting the same properties, use inheritance; if they
> aren't, then you (by the very nature of what you want to do) need to
> precreate them with the appropriate options.

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering    http://www.sun.de/
Re: [zfs-discuss] RFE: Start with desired end state in mind...
Hi Darren,

thank you for the clarification, I didn't know that.

> See the man page for zfs(1) where the -R option for send is discussed.

oh, this is new. Thank you for bringing us -R.

Back to Brad's RFE, what would one need to do to send a stream from a
compressed filesystem to one with a different compression setting, if the
source file system has the compression attribute set to a specific algorithm
(i.e. not inherited)?

Will leaving out -R just create a new, but plain unencrypted fs on the
receiving side?

What if one wants to replicate a whole package of filesystems via -R, but
change properties on the receiving side before it happens?

Best regards,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering    http://www.sun.de/
Re: [zfs-discuss] RFE: Start with desired end state in mind...
Hi Brad,

this is indeed a good idea. But I assume that it will be difficult to do,
due to the low-level nature of zfs send/receive.

In your compression example, you're asking for zfs send/receive to
decompress the blocks on the fly. But send/receive operates on a lower
level: It doesn't care much what is actually inside the blocks, it just
copies the block structure "as is". So, unless zfs send/receive starts
looking inside the blocks it copies, it is probably better to use good old
tar or cpio.

I would also assume that zfs send/receive doesn't know (and doesn't have to)
anything about ZFS dataset properties. It just starts at the snapshot level
(which works independently of whether it's used for a filesystem, a ZVOL, a
whatever) and copies the subtree.

But for the sake of implementing the RFE, one could extend the ZFS
send/receive framework with a module that permits manipulation of the data
on the fly, specifically in order to allow for things like recompression,
en/decryption, change of attributes at the dataset level, etc.

Best regards,
   Constantin

Brad Diggs wrote:
> I love the send and receive feature of zfs. However, the one feature that
> it lacks is that I can't specify on the receive end how I want the
> destination zfs filesystem to be created before receiving the data being
> sent.
>
> For example, let's say that I would like to do a compression study to
> determine which level of compression of the gzip algorithm would save the
> most space for my data. One of the easiest ways to do that locally or
> remotely would be to use send/receive like so.
>
>   zfs snapshot zpool/[EMAIL PROTECTED]
>   gz=1
>   while [ ${gz} -le 9 ]
>   do
>      zfs send zpool/[EMAIL PROTECTED] | \
>        zfs receive -o compression=gzip-${gz} zpool/gz${gz}data
>      zfs list zpool/gz${gz}data
>      gz=`expr ${gz} + 1`
>   done
>   zfs destroy zpool/[EMAIL PROTECTED]
>
> Another example: Let's assume that the zfs encryption feature was
> available today. Further, let's assume that I have a filesystem that has
> compression and encryption enabled. I want to duplicate that exact zfs
> filesystem on another system through send/receive. Today the receive
> feature does not give me the ability to specify the desired end state
> configuration of the destination zfs filesystem before receiving the
> data. I think that would be a great feature.
>
> Just some food for thought.
>
> Thanks in advance,
> Brad

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering    http://www.sun.de/
Re: [zfs-discuss] ZFS with Memory Sticks
Hi Paul,

> # fdisk -E /dev/rdsk/c7t0d0s2

then

  # zpool create -f Radical-Vol /dev/dsk/c7t0d0

should work. The warnings you see are just there to double-check you don't
overwrite any previously used pool, which you may regret. -f overrules that.

Hope this helps,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering    http://www.sun.de/
Re: [zfs-discuss] ZFS with Memory Sticks
Hi,

> # /usr/sbin/zpool import
>   pool: Radical-Vol
>     id: 3051993120652382125
>  state: FAULTED
> status: One or more devices contains corrupted data.
> action: The pool cannot be imported due to damaged devices or data.
>    see: http://www.sun.com/msg/ZFS-8000-5E
> config:
>
>         Radical-Vol   UNAVAIL   insufficient replicas
>           c7t0d0s0    UNAVAIL   corrupted data

OK, ZFS did recognize the disk, but the pool is corrupted. Did you remove it
without exporting the pool first?

> Following your command:
>
> $ /opt/sfw/bin/sudo /usr/sbin/zpool status
>   pool: Rad_Disk_1
>  state: ONLINE
> status: The pool is formatted using an older on-disk format. The pool can
>         still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on older software versions.
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         Rad_Disk_1  ONLINE       0     0     0
>           c0t1d0    ONLINE       0     0     0
>
> errors: No known data errors

But this pool should be accessible, since you can zpool status it. Have you
checked "zfs get all Rad_Disk_1"? Does it show mount points and whether it
should be mounted?

> But this device works currently on my Solaris PCs, the W2100z and a
> laptop of mine.

Strange. Maybe it's a USB issue. Have you checked:

  http://www.sun.com/io_technologies/usb/USB-Faq.html#Storage

especially #19?

Best regards,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering    http://www.sun.de/
Re: [zfs-discuss] ZFS with Memory Sticks
Hi Paul,

yes, ZFS is platform agnostic and I know it works in SANs. For the USB stick
case, you may have run into labeling issues: Maybe Solaris SPARC did not
recognize the x64 type label on the disk (which is strange, because it
should...).

Did you try making sure that ZFS creates an EFI label on the disk? You can
check this by running zpool status; the devices should then look like
c6t0d0, without the s0 part.

If you want to force this, you can create an EFI label on the USB disk by
hand by saying fdisk -E /dev/rdsk/cxtxdx.

Hope this helps,
   Constantin

Paul Gress wrote:
> OK, I've been putting off this question for a while now, but it's eating
> at me, so I can't hold off any more. I have a nice 8 gig memory stick
> I've formatted with the ZFS file system. Works great on all my Solaris
> PCs, but refuses to work on my Sparc processor. So I've formatted it on
> my Sparc machine (Blade 2500), works great there now, but not on my PCs.
> Re-formatted it on my PC, doesn't work on Sparc, and so on and so on.
>
> I thought it was a file system to go back and forth between both
> architectures. So when will this compatibility be here, or if it's
> possible now, what is the secret?
>
> Paul

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering    http://www.sun.de/
Re: [zfs-discuss] Re: Best practice for moving FS between pool on same machine?
Hi,

Chris Quenelle wrote:
> Thanks, Constantin! That sounds like the right answer for me. Can I use
> send and/or snapshot at the pool level? Or do I have to use it on one
> filesystem at a time? I couldn't quite figure this out from the man
> pages.

The ZFS team is working on a zfs send -r (recursive) option to be able to
recursively send and receive hierarchies of ZFS filesystems in one go,
including pools.

So for now you'll need to do it one filesystem at a time. This is not always
trivial: If you send a full snapshot, then an incremental one, and the
target filesystem is mounted, you'll likely get an error that the target
filesystem was modified. Make sure the target filesystems are unmounted and
ideally marked as unmountable while performing the send/receives. Also, you
may want to use the -F option to receive, which forces a rollback of the
target filesystem to the most recent snapshot.

I've written a script to do all of this, but it's only "works on my system"
certified. I'd like to get some feedback and validation before I post it on
my blog, so anyone, let me know if you want to try it out.

Best regards,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
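For a single filesystem, the sequence that avoids the "destination has been
modified" error looks roughly like this (pool, filesystem and snapshot names
are invented):

  zfs snapshot tank/data@move1
  zfs send tank/data@move1 | zfs receive newpool/data
  zfs unmount newpool/data          # keep the copy from being touched

  # Later: catch up with whatever changed in the meantime
  zfs snapshot tank/data@move2
  zfs send -i @move1 tank/data@move2 | zfs receive -F newpool/data

  # Switch over
  zfs set mountpoint=/export/data newpool/data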
Re: [zfs-discuss] ZFS Scalability/performance
Hi Mike,

> If I was to plan for a 16 disk ZFS-based system, you would probably
> suggest me to configure it as something like 5+1, 4+1, 4+1 all raid-z
> (I don't need the double parity concept)
>
> I would prefer something like 15+1 :) I want ZFS to be able to detect
> and correct errors, but I do not need to squeeze all the performance out
> of it (I'll be using it as a home storage server for my DVDs and other
> audio/video stuff. So only a few clients at the most streaming off of it)

This is possible. ZFS in theory does not significantly limit n, and 15+1 is
indeed possible. But for a number of reasons (among them performance),
people generally advise to use no more than 10+1.

A lot of ZFS configuration wisdom can be found in the Solaris Internals ZFS
Best Practices Guide wiki at:

  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Richard Elling has done a great job of thoroughly analyzing different
reliability concepts for ZFS in his blog. One good introduction is the
following entry:

  http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance

That may help you find the right tradeoff between space and reliability.

Hope this helps,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
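To make the difference concrete, the 5+1 / 4+1 / 4+1 layout for 16 disks
would be created in one go like this (controller and target numbers are
placeholders):

  zpool create tank \
      raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
      raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0

A single wide 15+1 RAID-Z would simply list all 16 disks after one "raidz"
keyword instead. It works, but resilver times and the odds of hitting a
second failure during a resilver grow with the width of the vdev, which is
one of the reasons for the 10+1 advice above.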
Re: [zfs-discuss] ZFS Scalability/performance
Hi,

> How are paired mirrors more flexible?

Well, I'm talking of a small home system. If the pool gets full, the way to
expand with RAID-Z would be to add 3+ disks (typically 4-5). With mirrors
only, you just add two. So in my case it's just about the granularity of
expansion.

The reasoning is that of the three factors reliability, performance and
space, I value them in this order. Space comes last since disk space is
cheap. If I had a bigger number of disks (12+), I'd be using them in RAID-Z2
sets (4+2 plus 4+2 etc.). There, the speed is ok and the reliability is ok,
and so I can use RAID-Z2 instead of mirroring to get some extra space as
well.

> Right now, I have a 3 disk raid 5 running with the linux DM driver. One
> of the most recent additions was raid5 expansion, so I could pop in a
> matching disk and expand my raid5 to 4 disks instead of 3 (which is
> always interesting as you're cutting on your parity loss). I think though
> in raid5 you shouldn't put more than 6 - 8 disks afaik, so I wouldn't be
> expanding this endlessly.
>
> So how would this translate to ZFS? I have learned so far that, ZFS

ZFS does not yet support rearranging the disk configuration. Right now, you
can expand a single disk to a mirror, or an n-way mirror to an n+1-way
mirror. RAID-Z vdevs can't be changed right now. But you can add more disks
to a pool by adding more vdevs (you have a 1+1 mirror, add another 1+1 pair
and get more space; you have a 3+2 RAID-Z2, add another 5+2 RAID-Z2, etc.).

> basically is raid + LVM. e.g. the mirrored raid-z pairs go into the pool,
> just like one would use LVM to bind all the raid pairs. The difference
> being I suppose, that you can't use a zfs mirror/raid-z without having a
> pool to use it from?

Here's the basic idea:

- You first construct vdevs from disks: One disk can be one vdev. A 1+1
  mirror can be a vdev, too. An n+1 or n+2 RAID-Z (RAID-Z2) set can be a
  vdev too.

- Then you concatenate vdevs to create a pool. Pools can be extended by
  adding more vdevs.

- Then you create ZFS file systems that draw their block usage from the
  resources supplied by the pool.

Very flexible.

> Wondering now is if I can simply add a new disk to my raid-z and have it
> 'just work', e.g. the raid-z would be expanded to use the new
> disk(partition of matching size)

You can't grow an existing RAID-Z vdev by a single disk, but if you have a
RAID-Z based pool in ZFS, you can add another group of disks that are
organized in a RAID-Z manner (a vdev) to expand the storage capacity of the
pool.

Hope this clarifies things a bit. And yes, please check out the admin guide
and the other collateral available on ZFS. It's full of new concepts, and
one needs some getting used to to explore all the possibilities.

Cheers,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
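A short example of that growth path (device names invented):

  # Start with a single mirrored pair
  zpool create tank mirror c1t0d0 c1t1d0

  # Pool full? Add another pair: the pool becomes a stripe of two mirrors
  zpool add tank mirror c1t2d0 c1t3d0

  # The same works for RAID-Z based pools: add another whole RAID-Z vdev
  # zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0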
Re: [zfs-discuss] ZFS Scalability/performance
Hi,

> I'm quite interested in ZFS, like everybody else I suppose, and am about
> to install FBSD with ZFS.

welcome to ZFS!

> Anyway, back to business :) I have a whole bunch of different sized
> disks/speeds. E.g. 3 300GB disks @ 40mb, a 320GB disk @ 60mb/s, 3 120gb
> disks @ 50mb/s and so on.
>
> Raid-Z and ZFS claims to be uber scalable and all that, but would it
> 'just work' with a setup like that too?

Yes. If you dump a set of variable-size disks into a mirror or RAID-Z
configuration, you'll get the same result as if they all had the smallest of
their sizes. The pool will then grow when you exchange smaller disks with
larger ones.

I used to run a ZFS pool on 1x250GB, 1x200GB, 1x85GB and 1x80GB the
following way:

- Set up an 80 GB slice on all 4 disks and make a 4 disk RAID-Z vdev.

- Set up a 5 GB slice on the 250, 200 and 85 GB disks and make a 3 disk
  RAID-Z.

- Set up a 115 GB slice on the 200 and the 250 GB disks and make a 2 disk
  mirror.

- Concatenate all 3 vdevs into one pool. (You need zpool add -f for that.)

Not something to be done on a professional production system, but it worked
for my home setup just fine. The remaining 50 GB from the 250 GB drive then
went into a scratch pool. Kinda like playing Tetris with RAID-Z...

Later, I decided that just paired disks as mirrors are really more flexible
and easier to expand, since disk space is cheap.

Hope this helps,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
Re: [zfs-discuss] Best practice for moving FS between pool on same machine?
Hi Chris,

> What is the best (meaning fastest) way to move a large file system from
> one pool to another pool on the same machine. I have a machine with two
> pools. One pool currently has all my data (4 filesystems), but it's
> misconfigured. Another pool is configured correctly, and I want to move
> the file systems to the new pool. Should I use 'rsync' or 'zfs send'?

zfs send/receive is the fastest and most efficient way. I've used it
multiple times on my home server until I had my configuration right :).

> What happens is I forgot I couldn't incrementally add raid devices. I
> want to end up with two raidz(x4) vdevs in the same pool. Here's what I
> have now:

For this reason, I decided to go with mirrors. Yes, they use more raw
storage space, but they are also much more flexible to expand: Just add two
disks when the pool is full and you're done. If you have a lot of disks or
can afford to add 4-5 disks at a time, then RAID-Z may be as easy to do, but
remember that two-disk failures in RAID-5 variants can be quite common; you
may want RAID-Z2 instead.

> 1. move data to dbxpool2
> 2. remount using dbxpool2
> 3. destroy dbxpool1
> 4. create new proper raidz vdev inside dbxpool2 using devices from
>    dbxpool1

Add:

  0. Snapshot data in dbxpool1 so you can use zfs send/receive

Then the above should work fine.

> I'm constrained by trying to minimize the downtime for the group of
> people using this as their file server. So I ended up with an ad-hoc
> assignment of devices. I'm not worried about optimizing my controller
> traffic at the moment.

Ok. If you want to really be thorough, I'd recommend:

0. Run a backup, just in case. It never hurts.
1. Do a snapshot of dbxpool1.
2. zfs send/receive dbxpool1 -> dbxpool2 (this happens while users are
   still using dbxpool1, so no downtime).
3. Unmount dbxpool1.
4. Do a second snapshot of dbxpool1.
5. Do an incremental zfs send/receive of dbxpool1 -> dbxpool2. (This should
   take only a small amount of time.)
6. Mount dbxpool2 where dbxpool1 used to be.
7. Check everything is fine with the newly mounted pool.
8. Destroy dbxpool1.
9. Use the disks from dbxpool1 to expand dbxpool2 (be careful :) ).

You might want to exercise the above steps on an extra spare disk with two
pools, just to gain some confidence before doing it in production.

I have a script that automatically does 1-6 that is looking for beta
testers. If you're interested, let me know.

Hope this helps,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
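Mapped to commands, the procedure looks roughly like this for one of the
four filesystems (the filesystem name fs1, the mount point and the device
names are placeholders; repeat the send/receive part per filesystem):

  zfs snapshot -r dbxpool1@move1
  zfs send dbxpool1/fs1@move1 | zfs receive dbxpool2/fs1   # users still working

  # Downtime starts here
  zfs unmount dbxpool1/fs1
  zfs snapshot -r dbxpool1@move2
  zfs send -i @move1 dbxpool1/fs1@move2 | zfs receive -F dbxpool2/fs1
  zfs set mountpoint=/data/fs1 dbxpool2/fs1                # downtime ends

  # Once everything checks out
  zpool destroy dbxpool1
  zpool add dbxpool2 raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0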
[zfs-discuss] New german white paper on ZFS
Hi,

if you understand German or want to brush it up a little, I have a new ZFS
white paper in German for you:

  http://blogs.sun.com/constantin/entry/new_zfs_white_paper_in

Since there's already so much collateral on ZFS in English, I thought it's
time for some localized material for my country. There are also some new ZFS
slides that go with it, also in German.

Let me know if you have any suggestions.

Hope this helps,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Hi Malachi,

Malachi de Ælfweald wrote:
> I'm actually wondering the same thing because I have b62 w/ the ZFS bits;
> but need the snapshot's "-r" functionality.

you're lucky, it's already there. From my b62 machine's "man zfs":

  zfs snapshot [-r] filesystem@snapname|volume@snapname

      Creates a snapshot with the given name. See the "Snapshots" section
      for details.

      -r    Recursively create snapshots of all descendant datasets.
            Snapshots are taken atomically, so that all recursive snapshots
            correspond to the same moment in time.

Or did you mean send -r?

Best regards,
   Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Hi,

> Our upgrade story isn't great right now. In the meantime, you might check
> out Tim Haley's blog entry on using bfu with zfs root.

thanks. But doesn't live upgrade just start the installer from the new OS
DVD with the right options? Can't I just do that too?

Cheers,
   Constantin

> http://blogs.sun.com/timh/entry/friday_fun_with_bfu_and
>
> lori
>
> Constantin Gonzalez wrote:
>> Hi,
>>
>> I'm a big fan of live upgrade. I'm also a big fan of ZFS boot. The
>> latter is more important for me. And yes, I'm looking forward to both
>> being integrated with each other.
>>
>> Meanwhile, what is the best way to upgrade a post-b61 system that is
>> booted from ZFS?
>>
>> I'm thinking:
>>
>> 1. Boot from ZFS
>> 2. Use Tim's excellent multiple boot datasets script to create a new
>>    cloned ZFS boot environment:
>>    http://blogs.sun.com/timf/entry/an_easy_way_to_manage
>> 3. Loopback mount the new OS ISO image
>> 4. Run the installer from the loopbacked ISO image in upgrade mode on
>>    the clone
>> 5. Mark the clone to be booted the next time
>> 6. Reboot into the upgraded OS.
>>
>> Questions:
>>
>> - How exactly do I do step 4? Before, luupgrade did everything for me,
>>   now what manpage do I need to do this?
>>
>> - Did I forget something above? I'm ok with losing some logfiles and
>>   stuff that maybe changed between the clone and the reboot, but is
>>   there anything else?
>>
>> - Did someone already blog about this and I haven't noticed yet?
>>
>> Cheers,
>>    Constantin

--
Constantin Gonzalez                     Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://blogs.sun.com/constantin/
[zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Hi, I'm a big fan of live upgrade. I'm also a big fan of ZFS boot. The latter is more important for me. And yes, I'm looking forward to both being integrated with each other. Meanwhile, what is the best way to upgrade a post-b61 system that is booted from ZFS? I'm thinking: 1. Boot from ZFS 2. Use Tim's excellent multiple boot datasets script to create a new cloned ZFS boot environment: http://blogs.sun.com/timf/entry/an_easy_way_to_manage 3. Loopback mount the new OS ISO image 4. Run the installer from the loopbacked ISO image in upgrade mode on the clone 5. Mark the clone to be booted the next time 6. Reboot into the upgraded OS. Questions: - How exactly do I do step 4? Before, luupgrade did everything for me, now what manpage do I need to do this? - Did I forget something above? I'm ok with losing some logfiles and stuff that maybe changed between the clone and the reboot, but is there anything else? - Did someone already blog about this and I haven't noticed yet? Cheers, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
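For step 3 above, a minimal sketch of loopback-mounting the install ISO with lofiadm; the ISO path and mount point are made up, and step 4 (pointing the installer in upgrade mode at the clone) is exactly the open question:

# lofiadm -a /export/iso/sol-nv-x86-dvd.iso     # attach the ISO image to a lofi device
/dev/lofi/1
# mount -F hsfs -o ro /dev/lofi/1 /mnt          # mount it read-only as an HSFS filesystem
# ls /mnt                                       # browse the media for the installer bits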
Re: [zfs-discuss] raidz pool with a slice on the boot disk
Hi Russell, Russell Baird wrote: > I created a pool with the following command… > > zpool create -f batches raidz c0t0d0s7 c0t1d0 c0t2d0 c0t3d0 > > Notice that I specified slice 7, which is an unused slice on my boot disk > t0. No part of the operating system exists on slice 7. Is this acceptable? > Will this jeopardize the redundancy of my raidz pool with part of it on the > boot disk? no, this should work fine from a ZFS perspective. I've been running a much more complicated setup at home for over a year and it worked well. That said, you still might want to reconsider: If your boot disk fails, you'll need to replace it and repartition it to fit your RAID scheme before you can replace the s7 part of your RAID-Z. That might be a hassle you want to avoid. Out of a similar situation, I decided to invest in more disks and host my data and my OS on separate disks. The goal is to keep everything that makes your server unique (data + copies of any config changes etc.) on a separate pool that can easily be plugged into a different, plain vanilla server, while leaving the boot filesystem as untouched as possible. This is just a recommendation with the goal of reducing administration and recovery complexity. As said, there should be no real harm from your config. There is a slight performance impact though: ZFS will enable the disk write cache on c0t1-t3 but not on c0t0, so, effectively, c0t0 is going to be the slowest drive in the RAID-Z set under certain circumstances and therefore slightly affect the performance of the RAID-Z vdev. Hope this helps, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
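To reduce the hassle described above, one could save the boot disk's label up front and replay it onto a replacement disk later; a sketch using the pool and device names from the question (the saved-VTOC path is made up):

# prtvtoc /dev/rdsk/c0t0d0s2 > /var/tmp/c0t0d0.vtoc    # save the boot disk's partition table while it is healthy
...later, after the failed boot disk has been physically replaced and the OS restored:
# fmthard -s /var/tmp/c0t0d0.vtoc /dev/rdsk/c0t0d0s2   # replay the saved partition table
# zpool replace batches c0t0d0s7                       # let ZFS resilver the RAID-Z member on slice 7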
[zfs-discuss] A big Thank You to the ZFS team!
Hi, I just got ZFS boot up and running on my laptop. This being a major milestone in the history of ZFS, I thought I'd reflect a bit on what ZFS brought to my life so far: - I'm using ZFS at home and on my laptop since January 2006 for mission critical purposes: - Backups of my wife's and my own Macs. - Storing family photos (I have a baby now, so they _are_ mission critical :) ). - Storing my ca. 400 CDs that were carefully ripped and metadata'ed, which took a lot of work. - Providing fast and reliable storage for my PVR. - And of course all the rough stuff that happens to laptops on the road. - ZFS has already saved me from bit rot once. I could see that it fixed a bad block during a weekly scrub. What a great feeling to know that your data is much safer than it was before and to be able to see how and when it is being protected! It is kinda weird to talk to customers about adopting ZFS while knowing that my family pictures at home are probably stored safer than their company data... - ZFS enabled me to just take a bunch of differently sized drives that have been lying around somewhere and turn them into an easy to manage, consistent and redundant pool of storage that effortlessly handles very diverse workloads (File server, audio streaming, video streaming). - During the frequent migrations (Couldn't make up my mind first on how to slice and dice my 4 disks), zfs send/receive has been my best friend. It enabled me to painlessly migrate whole filesystems between pools in minutes. I'm now writing a script to further automate recursive and updating zfs send/receive orgies for backups and other purposes. - Disk storage is cheap, and thanks to ZFS it became reliable at zero cost. Therefore, I can snapshot a lot, not think about whether to delete stuff or not, or simply delete stuff I don't need know, while knowing it is still preserved in my snapshots. - As a result of all of this, I learned a great deal about Solaris 10 and it's other features, which is a big help in my day-to-day job. I know there's still a lot to do and that we're still working on some bugs, but I can safely say that ZFS is the best thing that happened to my data so far. So here's a big THANK YOU! to the ZFS team for making all of this and more possible for my little home system. Down the road, I've now migrated my pools to external mirrored USB disks (mirrored because it's fast and lowers complexity; USB, because it's pluggable and host-independent) and I'm thinking of how to backup them (I realize I still need a backup) onto other external disks or preferably another system. Again, zfs send/receive will be my friend here. ZFS boot on my home server is the other next big thing, enabling me to mirror my root file system more reliably than SVM can while saving space for live upgrade and enabling other cool stuff. I'm also thinking of using iSCSI zvols as Mac OS X storage for audio/video editing and whole-disk backups, but that requires some waiting until the Mac OS X iSCSI support has matured a bit. And then I can start to really archive stuff: Older backups that sit on CDs and are threatened by CD-rot, old photo CDs that have been sitting there and hopefully haven't begun to rot yet, maybe scan in some older photos, migrating my CD collection to a lossless format, etc. This sounds like I've been drinking too much koolaid, and I've probably have, but I guess all the above points remain valid even if I didn't work for Sun. So please take this email as being written by a private ZFS user and not a Sun employee. 
So, again, thank you so much ZFS team and keep up the good work! Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive question
Hi, Krzys wrote: > Ok, so -F option is not in U3, is there any way to replicate file system > and not be able to mount it automatically? so when I do zfs send/receive > it wont be mounted and changes would not be made so that further > replications could be possible? What I did notice was that if I am doing > zfs send/receive right one after another I am able to replicate all my > snaps, but when I wait a day or even few hours I get notice that file > system got changed, and that is because it was mounted and I guess > because of that I am not able to perform any more snaps to be send... > any idea what I could do meanwhile I am waiting for -F? this should work: zfs unmount pool/filesystem zfs rollback (latest snapshot) zfs send ... | zfs receive zfs mount pool/filesystem Better yet: Assuming you don't actually want to use the filesystem you replicate to, but just use it as a sink for backup purposes, you can mark it unmountable, then just send stuff to it. zfs set canmount=off pool/filesystem zfs rollback (latest snapshot, one last time) Then, whenever you want to access the receiving filesystem, clone it. Hope this helps, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
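A minimal sketch of the receive-side workflow described above, with made-up pool, filesystem and snapshot names:

# zfs unmount backup/data                      # keep the target quiet so nothing modifies it
# zfs rollback backup/data@2007-05-01          # discard any stray changes since the last received snapshot
# zfs send -i tank/data@2007-05-01 tank/data@2007-05-08 | zfs receive backup/data
# zfs mount backup/data                        # only needed if you want to browse the replica

Or, treating the target purely as a sink:

# zfs set canmount=off backup/data
# zfs clone backup/data@2007-05-08 backup/restore-test   # clone a snapshot whenever you need to look inside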
Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random
Hi, >> 2. After going through the zfs-bootification, Solaris complains on >> reboot that >>/etc/dfs/sharetab is missing. Somehow this seems to have been >> fallen through >>the cracks of the find command. Well, touching /etc/dfs/sharetab >> just fixes >>the issue. > > This is unrelated to ZFS boot issues, and sounds like this bug: > > 6542481 No sharetab after BFU from snv_55 > > It's fixed in build 62. hmm, that doesn't fit what I saw: - Upgraded from snv_61 to snv_62 - snv_62 booted with no problems (other than the t_optmgmt bug) - Then migrated to ZFS boot - Now the sharetab issue shows up. So why did the sharetab issue only show up after the ZFSification of the boot process? Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random
Hi, I've now gone through both the opensolaris instructions: http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/ and Tim Foster's script: http://blogs.sun.com/timf/entry/zfs_bootable_datasets_happily_rumbling for making my laptop ZFS bootable. Both work well and here's a big THANK YOU to the ZFS boot team! There seem to be 3 smaller glitches with these approaches: 1. The instructions on opensolaris.org assume that one wants console output to show up in /dev/tty. This may be true for a server, but it isn't for a laptop or workstation user. Therefore, I suggest marking these steps as optional, as not everybody knows that they can be left out. 2. After going through the zfs-bootification, Solaris complains on reboot that /etc/dfs/sharetab is missing. Somehow this seems to have fallen through the cracks of the find command. Well, touching /etc/dfs/sharetab just fixes the issue. 3. But here's a more serious one: While booting, Solaris complains: Apr 19 15:00:37 foeni kcf: [ID 415456 kern.warning] WARNING: No randomness provider enabled for /dev/random. Use cryptoadm(1M) to enable a provider. Somehow, /dev/random and/or its counterpart in /devices seems to have suffered from the migration procedure. Does anybody know how to fix the /dev/random issue? I'm not very fluent in cryptoadm(1M) and some superficial reading of its manpage did not enlighten me too much (cryptoadm list -p claims all is well...). Best regards and again, congratulations to the ZFS boot team! Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Who modified my ZFS receive destination?
Hi Trev, Trevor Watson wrote: > Hi Constantin, > > I had the same problem, and the solution was to make sure that the > filesystem is not mounted on the destination system when you perform the > zfs recv (zfs set mountpoint=none santiago/home). thanks! This time it worked: # zfs unmount santiago/home/constant # zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 # zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive santiago/home/constant # Still, this is kinda strange. This means, that we'll need to zfs unmount, then zfs rollback a lot when doing send/receive on a regular basis (as in weekly, daily, hourly, minutely cron-jobs) to be sure. Or keep any replicated filesystems unmounted _all_ the time. Best regards, Constantin > > Trev > > Constantin Gonzalez wrote: >> Hi, >> >> I'm currently migrating a filesystem from one pool to the other through >> a series of zfs send/receive commands in order to preserve all snapshots. >> >> But at some point, zfs receive says "cannot receive: destination has been >> modified since most recent snapshot". I am pretty sure nobody changed >> anything >> at my destination filesystem and I also tried rolling back to an earlier >> snapshot on the destination filesystem to make it clean again. >> >> Here's an excerpt of the snapshots on my source filesystem: >> >> # zfs list -rt snapshot pelotillehue/constant >> NAME >> USED AVAIL >> REFER MOUNTPOINT >> pelotillehue/[EMAIL PROTECTED] >> 236K - >> 33.6G - >> pelotillehue/[EMAIL PROTECTED] >> 747K - >> 46.0G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06 >> 3.07G - >> 116G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00 >> 18.9M - >> 115G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03 >> 10.9M - >> 115G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00 >> 606M - >> 105G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01 >> 167M - >> 105G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00 >> 5.31M - >> 105G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01 >> 1.90M - >> 105G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 >> 1.26M - >> 105G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 >> 15.2M - >> 109G - >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-15-00:00:00 >> 17.5M - >> 109G - >> >> ... (further lines omitted) >> >> >> On the destination filesystem, snapshots have been replicated through >> zfs send/receive up to the 2007-01-01 snapshot, so I do the following: >> >> # zfs send -i >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 | zfs >> receive >> santiago/home/constant >> >> This worked, but now, only seconds later: >> >> # zfs send -i >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs >> receive >> santiago/home/constant >> cannot receive: destination has been modified since most recent snapshot >> >> Fails. 
So I try rolling back to the 2007-01-08 snapshot on the >> destination >> filesystem to be clean again, but: >> >> # zfs rollback >> santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 >> # zfs send -i >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 >> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs >> receive >> santiago/home/constant >> cannot receive: destination has been modified since most recent snapshot >> >> Hmm, why does ZFS think my destination has been modified, although I >> didn't >> do anything? >> >> Another peculiar thing: zfs list on the destination snapshots says: >> >> # zfs list -rt snapshot santiago/home/constant >> NAME >> USED AVAIL >> REFER MOUNTPOINT >> santiago/home/[EMAIL PROTECTED] >> 189K - >> 33.6G - >> santiago/home/[EMAIL PROTECTED]
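A small sketch of how the unmount/rollback/receive dance could be wrapped for a cron job; the script name, arguments and datasets are hypothetical, and error handling is minimal:

#!/bin/sh
# replicate.sh <srcfs> <dstfs> <prevsnap> <newsnap>  -- incremental replication helper (sketch only)
SRC=$1 DST=$2 PREV=$3 NEW=$4
zfs unmount "$DST"        || exit 1     # make sure nothing touches the destination
zfs rollback "$DST@$PREV" || exit 1     # drop accidental changes (e.g. atime updates after a mount)
zfs send -i "$SRC@$PREV" "$SRC@$NEW" | zfs receive "$DST" || exit 1
zfs mount "$DST"                        # optional: remount the replica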
Summary: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
Hi, here's a quick summary of the answers I've seen so far: - Splitting mirrors is a current practice with traditional volume management. The goal is to quickly and effortlessly create a clone of a storage volume/pool. - Splitting mirrors with ZFS can be done, but it has to be done the hard way by resilvering, then unplugging the disk, then trying to import it somewhere else. zpool detach would render the detached disk unimportable. - Another, cleaner way of splitting a mirror would be to export the pool, then disconnect one drive, then re-import again. After that, the disconnected drive needs to be zpool detach'ed from the mother, while the clone can then be imported and its missing mirrors detached as well. But this involves unmounting the pool so it can't be done without downtime. - The supported alternative would be zfs snapshot, then zfs send/receive, but this introduces the complexity of snapshot management which makes it less simple, thus less appealing to the clone-addicted admin. - There's an RFE for supporting splitting mirrors: 5097228 http://bugs.opensolaris.org/view_bug.do?bug_id=5097228 IMHO, we should investigate if something like zpool clone would be useful. It could be implemented as a script that recursively snapshots the source pool, then zfs send/receives it to the destination pool, then copies all properties, but the actual reason why people do mirror splitting in the first place is because of its simplicity. A zpool clone or a zpool send/receive command would be even simpler and less error-prone than the tradition of splitting mirrors, plus it could be implemented more efficiently and more reliably than a script, thus bringing real additional value to administrators. Maybe zpool clone or zpool send/receive would be the better way of implementing 5097228 in the first place? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list [EMAIL PROTECTED] http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
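To make the script idea concrete, here is a rough sketch of a zpool-clone-style helper on top of today's commands; pool names and the snapshot label are made up, the pool's root dataset and property copying are deliberately left out, and there is no error handling:

#!/bin/sh
# clone-pool.sh: replicate every filesystem of SRCPOOL into DSTPOOL via snapshots (sketch only)
SRCPOOL=tank
DSTPOOL=tankcopy
SNAP=poolclone-`date +%Y-%m-%d`
zfs list -H -o name -t filesystem -r $SRCPOOL | grep "^$SRCPOOL/" | while read fs; do
    zfs snapshot "$fs@$SNAP"
    dst=$DSTPOOL`echo $fs | sed "s|^$SRCPOOL||"`       # map tank/foo/bar -> tankcopy/foo/bar
    zfs send "$fs@$SNAP" | zfs receive "$dst"
done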
[zfs-discuss] Who modified my ZFS receive destination?
Hi, I'm currently migrating a filesystem from one pool to the other through a series of zfs send/receive commands in order to preserve all snapshots. But at some point, zfs receive says "cannot receive: destination has been modified since most recent snapshot". I am pretty sure nobody changed anything at my destination filesystem and I also tried rolling back to an earlier snapshot on the destination filesystem to make it clean again. Here's an excerpt of the snapshots on my source filesystem: # zfs list -rt snapshot pelotillehue/constant NAME USED AVAIL REFER MOUNTPOINT pelotillehue/[EMAIL PROTECTED] 236K - 33.6G - pelotillehue/[EMAIL PROTECTED] 747K - 46.0G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06 3.07G - 116G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00 18.9M - 115G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03 10.9M - 115G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00 606M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01 167M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00 5.31M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01 1.90M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 1.26M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 15.2M - 109G - pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-15-00:00:00 17.5M - 109G - ... (further lines omitted) On the destination filesystem, snapshots have been replicated through zfs send/receive up to the 2007-01-01 snapshot, so I do the following: # zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 | zfs receive santiago/home/constant This worked, but now, only seconds later: # zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive santiago/home/constant cannot receive: destination has been modified since most recent snapshot Fails. So I try rolling back to the 2007-01-08 snapshot on the destination filesystem to be clean again, but: # zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 # zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive santiago/home/constant cannot receive: destination has been modified since most recent snapshot Hmm, why does ZFS think my destination has been modified, although I didn't do anything? 
Another peculiar thing: zfs list on the destination snapshots says: # zfs list -rt snapshot santiago/home/constant NAMEUSED AVAIL REFER MOUNTPOINT santiago/home/[EMAIL PROTECTED] 189K - 33.6G - santiago/home/[EMAIL PROTECTED] 670K - 46.0G - santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06 3.07G - 116G - santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00 18.4M - 115G - santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03 10.5M - 115G - santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00 603M - 105G - santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01 163M - 105G - santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00 4.87M - 105G - santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01 1.79M - 106G - santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 1.16M - 106G - santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:0057K - 109G - Note that the Used column for the 2007-01-08 snapshot says 57K on the destination, but 15.2M on the source. Could it be that the reception of the 2007-01-08 failed and ZFS didn't notice? I've tried this multiple times, including destroying snapshots and rolling back on the destination to the 2007-01-01 state, so what you see above is already a second try of the same. The other values vary too, but only slightly. Compression is turned on on both pools. The source pool has been scrubbed on Monday with no known data errors and the destination pool is brand new and I'm scrubbing it as we speak. Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marce
Re: [zfs-discuss] Re: Poor man's backup by attaching/detaching mirror
Hi, >> How would you access the data on that device? > > Presumably, zpool import. yes. > This is basically what everyone does today with mirrors, isn't it? :-) sure. This may not be pretty, but it's what customers are doing all the time with regular mirrors, 'cause it's quick, easy and reliable. Cheers, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
Hi Mark, Mark J Musante wrote: > On Tue, 10 Apr 2007, Constantin Gonzalez wrote: > >> Has anybody tried it yet with a striped mirror? What if the pool is >> composed out of two mirrors? Can I attach devices to both mirrors, let >> them resilver, then detach them and import the pool from those? > > You'd want to export them, not detach them. Detaching will overwrite the > vdev labels and make it un-importable. thank you for the export/import idea, it does sound cleaner from a ZFS perspective, but comes at the expense of temporarily unmounting the filesystems. So, instead of detaching, would unplugging, then detaching work? I'm thinking something like this: - zpool create tank mirror dev1 dev2 dev3 - {physically move dev3 to the new box} - zpool detach tank dev3 On the new box: - zpool import tank - zpool detach tank dev1 - zpool detach tank dev2 This should work for one disk, and I assume this would also work for multiple disks? Thinking along similar lines, would it be a useful RFE to allow asymmetric mirroring like this: - dev1 and dev2 are both 250GB, dev3 is 500GB - zpool create tank mirror dev1,dev2 dev3 This means that half of dev3 would mirror dev1, the other half would mirror dev2, and dev1,dev2 is a regular stripe. The utility of this would be for cases where customers have set up mirrors, then need to replace disks or upgrade the mirror after a long time, when bigger disks are easier to get than smaller ones and while reusing older disks. Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS vs. rmvolmgr
Hi, sorry, I needed to be more clear: Here's what I did: 1. Connect USB storage device (a disk) to machine 2. Find USB device through rmformat 3. Try zpool create on that device. It fails with: can't open "/dev/rdsk/cNt0d0p0", device busy 4. svcadm disable rmvolmgr 5. Now zpool create works with that device and the pool gets created. 6. svcadm enable rmvolmgr 7. After that, everything works as expected, the device stays under control of the pool. >> can't open "/dev/rdsk/cNt0d0p0", device busy > > Do you remember exactly what command/operation resulted in this error? See above, it comes right after trying to create a zpool on that device. > It is something that tries to open device exclusively. So after ZFS opens the device exclusively, hald and rmvolmgr will ignore it? What happens at boot time, is zfs then quicker in grabbing the device than hald and rmvolmgr are? >> So far, I've just said svcadm disable -t rmvolmgr, did my thing, then >> said svcadm enable rmvolmgr. > > This can't possibly be true, because rmvolmgr does not open devices. Hmm. I really remember to have done the above. Actually, I've been pulling some hairs out trying to do zpools on external devices until I got the idea of diasbling the rmvolmgr, then it worked. > You'd need to also disable the 'hal' service. Run fuser on your device > and you'll see it's one of the hal addons that keeps it open: Perhaps something depended on rmvolmgr which release the device after I disabled the service? >> For instance, I'm now running several USB disks with ZFS pools on >> them, and >> even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr >> get along with each other just fine. > > I'm confused here. In the beginning you said that something got in the > way, but now you're saying they get along just fine. Could you clarify. After creating the pool, the device now belongs to ZFS. Now, ZFS seems to be able to grab the device before anybody else. > One possible workaround would be to match against USB disk's serial > number and tell hal to ignore it using fdi(4) file. For instance, find > your USB disk in lshal(1M) output, it will look like this: > > udi = '/org/freedesktop/Hal/devices/pci_0_0/pci1028_12c_1d_7/storage_5_0' > usb_device.serial = 'DEF1061F7B62' (string) > usb_device.product_id = 26672 (0x6830) (int) > usb_device.vendor_id = 1204 (0x4b4) (int) > usb_device.vendor = 'Cypress Semiconductor' (string) > usb_device.product = 'USB2.0 Storage Device' (string) > info.bus = 'usb_device' (string) > info.solaris.driver = 'scsa2usb' (string) > solaris.devfs_path = '/[EMAIL PROTECTED],0/pci1028,[EMAIL > PROTECTED],7/[EMAIL PROTECTED]' (string) > > You want to match an object with this usb_device.serial property and set > info.ignore property to true. The fdi(4) would look like this: thanks, this sounds just like what I was looking for. So the correct way of having a zpool out of external USB drives is to: 1. Attach the drives 2. Find their USB serial numbers with lshal 3. Set up an fdi file that matches the disks and tells hal to ignore them The naming of the file /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi sounds like init.d style directory and file naming, ist this correct? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. 
Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
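A sketch of what such an fdi(4) file could look like for the disk from the lshal output quoted above, based on hal's documented match/merge syntax; treat it as an unverified example rather than the exact snippet referenced in the thread:

# cat > /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <!-- match the USB disk by its serial number and hide it from hal -->
    <match key="usb_device.serial" string="DEF1061F7B62">
      <merge key="info.ignore" type="bool">true</merge>
    </match>
  </device>
</deviceinfo>
EOF
# svcadm restart hal     # assuming the hal service needs a restart to pick up new fdi files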
[zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
Hi, one quick&dirty way of backing up a pool that is a mirror of two devices is to zpool attach a third one, wait for the resilvering to finish, then zpool detach it again. The third device then can be used as a poor man's simple backup. Has anybody tried it yet with a striped mirror? What if the pool is composed out of two mirrors? Can I attach devices to both mirrors, let them resilver, then detach them and import the pool from those? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
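For the plain two-way mirror case, the attach/resilver/detach cycle sketched above would look roughly like this (pool and device names made up):

# zpool attach tank c1t0d0 c3t0d0     # add a third side to the existing c1t0d0/c2t0d0 mirror
# zpool status tank                   # wait here until the resilver of c3t0d0 has completed
# zpool detach tank c3t0d0            # split off the freshly silvered copy

As discussed elsewhere in this thread, a detached disk is not importable on its own; exporting the pool before pulling the disk is what keeps the copy usable.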
[zfs-discuss] ZFS vs. rmvolmgr
Hi, while playing around with ZFS and USB memory sticks or USB harddisks, rmvolmgr tends to get in the way, which results in a can't open "/dev/rdsk/cNt0d0p0", device busy error. So far, I've just said svcadm disable -t rmvolmgr, did my thing, then said svcadm enable rmvolmgr. Is there a more elegant approach that tells rmvolmgr to leave certain devices alone on a per disk basis? For instance, I'm now running several USB disks with ZFS pools on them, and even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr get along with each other just fine. What and how does ZFS tell rmvolmgr that a particular set of disks belongs to ZFS and should not be treated as removable? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Setting up for zfsboot
Hi, >>> - RAID-Z is _very_ slow when one disk is broken. >> Do you have data on this? The reconstruction should be relatively cheap >> especially when compared with the initial disk access. > > Also, what is your definition of "broken"? Does this mean the device > appears as FAULTED in the pool status, or that the drive is present and > not responding? If it's the latter, this will be fixed by my upcoming > FMA work. sorry, the _very_ may be exaggerated and depends much on the load of the system and the config. I'm referring to a couple of posts and anecdotal experience from colleagues. This means that indeed "slow" or "very slow" may be a mixture of reconstruction overhead and device timeout issues. So, it's nice to see that the upcoming FMA code will fix some of the slowness issues. Did anybody measure how much CPU overhead RAID-Z and RAID-Z2 parity computation induces, both for writes and for reads (assuming a data disk is broken)? This data would be useful when arguing for a "software RAID" scheme in front of hardware-RAID addicted customers. Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
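One quick way to get a feeling for the read-side reconstruction overhead would be to fault a member of a test pool and compare CPU usage while streaming reads; a sketch with made-up names:

# zpool offline tank c2t0d0                       # simulate a broken data disk
# dd if=/tank/bigfile of=/dev/null bs=128k &      # reads now have to reconstruct from parity
# mpstat 5                                        # compare usr/sys CPU against the healthy case
# zpool online tank c2t0d0                        # bring the disk back and let it resilver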
Re: [zfs-discuss] Setting up for zfsboot
Hi, Manoj Joseph wrote: > Can write-cache not be turned on manually as the user is sure that it is > only ZFS that is using the entire disk? yes, it can be turned on, but I don't know whether ZFS would then know about it. I'd still feel more comfortable with it being turned off unless ZFS itself does it. But maybe someone from the ZFS team can clarify this. Cheers, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
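For reference, on many disks the write cache can be inspected and toggled by hand from the expert mode of format(1M); whether ZFS then accounts for it is exactly the open question above. A sketch (disk selection omitted, and the cache menu may not be available for every driver):

# format -e
format> cache
cache> write_cache
write_cache> display        # show whether the write cache is currently enabled
write_cache> enable         # enable it manually, at your own risk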
Re: [zfs-discuss] Setting up for zfsboot
Hi, > Now that zfsboot is becoming available, I'm wondering how to put it to > use. Imagine a system with 4 identical disks. Of course I'd like to use you lucky one :). > raidz, but zfsboot doesn't do raidz. What if I were to partition the > drives, such that I have 4 small partitions that make up a zfsboot > partition (4 way mirror), and the remainder of each drive becomes part > of a raidz? Sounds good. Performance will suffer a bit, as ZFS thinks it has two pools with 4 spindels each, but it should still perform better than the same on a UFS basis. You may also want to have two 2-way mirrors and keep the second for other purposes such as a scratch space for zfs migration or as spare disks for other stuff. > Do I still have the advantages of having the whole disk > 'owned' by zfs, even though it's split into two parts? I'm pretty sure that this is not the case: - ZFS has no guarantee that someone will do something else with that other partition, so it can't assume the right to turn on disk cache for the whole disk. - Yes, it could be smart and realize that it does have the whole disk, only split up across two pools, but then I assume that this is not your typical enterprise class configuration and so it probably didn't get implemented that way. I'd say that not being able to benefit from the disk drive's cache is not as bad in the face of ZFS' other advantages, so you can probably live with that. > Swap would probably have to go on a zvol - would that be best placed on > the n-way mirror, or on the raidz? I'd place it onto the mirror for performance reasons. Also, it feels cleaner to have all your OS stuff on one pool and all your user/app/data stuff on another. This is also recommended by the ZFS Best Practices Wiki on www.solarisinternals.com. Now back to the 4 disk RAID-Z: Does it have to be RAID-Z? Maybe you might want to reconsider using 2 2-way mirrors: - RAID-Z is slow when writing, you basically get only one disk's bandwidth. (Yes, with variable block sizes this might be slightly better...) - RAID-Z is _very_ slow when one disk is broken. - Using mirrors is more convenient for growing the pool: You run out of space, you add two disks, and get better performance too. No need to buy 4 extra disks for another RAID-Z set. - When using disks, you need to consider availability, performance and space. Of all the three, space is the cheapest. Therefore it's best to sacrifice space and you'll get better availability and better performance. Hope this helps, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
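A sketch of the resulting two-pool layout with four disks sliced as discussed; pool names, device names, slice numbers and sizes are all made up:

# zpool create -f boot4 mirror c0t0d0s0 c0t1d0s0 c0t2d0s0 c0t3d0s0          # small 4-way mirrored zfsboot pool
# zpool create -f tank mirror c0t0d0s7 c0t1d0s7 mirror c0t2d0s7 c0t3d0s7    # data: two 2-way mirrors, striped
# zfs create -V 2g boot4/swap                                               # swap as a zvol on the mirrored pool
# swap -a /dev/zvol/dsk/boot4/swap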
Re: [zfs-discuss] Pathological ZFS performance
ut watching I/O throughput. > > For yet-another-fallback, I am thinking about using SATA-to-IDE > converters here: > > http://www.newegg.com/product/product.asp?item=N82E16812156010 > > It feels kind of nuts, but I have to think this would perform > better than what I have now. This would cost me the one SATA > drive I'm using now in a smaller pool. > > Rob T > _______ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating a pool
Hi Matt, cool, thank you for doing this! I'll still write my script since today my two shiny new 320GB USB disks will arrive :). I'll add to that the feature to first send all current snapshots, then bring down the services that depend on the filesystem, unmount the old fs, send a final incremental snapshot then zfs set mountpoint=x to the new filesystem, then bring up the services again. Hope this works as I imagine. Cheers, Constantin Matthew Ahrens wrote: > Constantin Gonzalez wrote: >> What is the most elegant way of migrating all filesystems to the new >> pool, >> including snapshots? >> >> Can I do a master snapshot of the whole pool, including >> sub-filesystems and >> their snapshots, then send/receive them to the new pool? >> >> Or do I have to write a script that will individually snapshot all >> filesystems >> within my old pool, then run a send (-i) orgy? > > Unfortunately, you will need to make/find a script to do the various > 'zfs send -i' to send each snapshot of each filesystem. > > I am working on 'zfs send -r', which will make this a snap: > > # zfs snapshot -r [EMAIL PROTECTED] > # zfs send -r [EMAIL PROTECTED] | zfs recv ... > > You'll also be able to do 'zfs send -r -i @yesterday [EMAIL PROTECTED]'. > > See RFE 6421958. > > --matt -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
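Until 'zfs send -r' arrives, the per-filesystem part of such a script boils down to replaying the snapshot list with incremental sends; a stripped-down sketch with made-up dataset names (no error handling, and the final unmount/last-increment/mountpoint switch described above is left out):

#!/bin/sh
# migrate all snapshots of one filesystem from OLD to NEW (sketch only)
OLD=oldpool/data
NEW=newpool/data
prev=""
for snap in `zfs list -H -o name -s creation -t snapshot -r $OLD | grep "^$OLD@"`; do
    if [ -z "$prev" ]; then
        zfs send "$snap" | zfs receive "$NEW"             # first snapshot: full stream creates the target
    else
        zfs send -i "$prev" "$snap" | zfs receive "$NEW"  # later snapshots: incremental streams
    fi
    prev="$snap"
done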
Re: [zfs-discuss] Migrating a pool
Hi, > Today I have about half a dozen filesystems in the old pool plus dozens of > snapshots thanks to Tim Bray's excellent SMF snapshotting service. I'm sorry I mixed up Tim's last name. The fine guy who wrote the SMF snapshot service is Tim Foster. And here's the link: http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_8 There doesn't seem to be an easy answer to the original question of how to migrate a complete pool. Writing a script with a snapshot send/receive party seems to be the only approach. I wish I could zfs snapshot pool then zfs send pool | zfs receive dest and all blocks would be transferred as they are, including all embedded snapshots. Is that already an RFE? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Migrating a pool
Hi, soon it'll be time to migrate my patchwork pool onto a real pair of mirrored (albeit USB-based) external disks. Today I have about half a dozen filesystems in the old pool plus dozens of snapshots thanks to Tim Bray's excellent SMF snapshotting service. What is the most elegant way of migrating all filesystems to the new pool, including snapshots? Can I do a master snapshot of the whole pool, including sub-filesystems and their snapshots, then send/receive them to the new pool? Or do I have to write a script that will individually snapshot all filesystems within my old pool, then run a send (-i) orgy? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] CSI:Munich - How to save the world with ZFS and 12 USB Sticks
Hi, a few weeks ago, Richard Elling noticed our ZFS video: http://www.opensolaris.org/jive/thread.jspa?threadID=23220&tstart=120 Finally, we got the english version done. Many thanks to Marc Baumann, our brave video editor for making this possible. Here's the video and some comments, both in english: http://blogs.sun.com/constantin/entry/csi_munich_how_to_save The video alone is available here: http://video.google.com/videoplay?docid=8100808442979626078 Please forgive the occasional soundless lip-movements, it turns out that the english language has less redundancy than the german one :). Have fun, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2-way mirror or RAIDZ?
Hi, > I have a shiny new Ultra 40 running S10U3 with 2 x 250Gb disks. congratulations, this is a great machine! > I want to make best use of the available disk space and have some level > of redundancy without impacting performance too much. > > What I am trying to figure out is: would it be better to have a simple > mirror of an identical 200Gb slice from each disk or split each disk > into 2 x 80Gb slices plus one extra 80Gb slice on one of the disks to > make a 4 + 1 RAIDZ configuration? you probably want to mirror the OS slice of the disk to protect your OS and its configuration from the loss of a whole disk. Do it with SVM today and upgrade to a bootable ZFS mirror in the future. The OS slice only needs to be 5 GB in size if you follow the standard recommendation, but 10 GB is probably a safe and easy to remember bet, leaving you some extra space for apps etc. Plan to be able to live upgrade into new OS versions. You may break up the mirror to do so, but this is kinda complicated and error-prone. Disk space is cheap, so I'd rather recommend you save two slices per disk for creating 2 mirrored boot environments where you can LU back and forth. For swap, allocate an extra slice per disk and of course mirror swap too. 1GB swap should be sufficient. Now, you can use the rest for ZFS. Having only two physical disks, there is no good reason to do something other than mirroring. If you created 4+1 slices for RAID-Z, you would always lose the whole pool if one disk broke. Not good. You could play Russian roulette by having 2+3 slices and RAID-Z2 and hoping that the right disk fails, but that isn't a good practice either, and it wouldn't buy you any redundant space; it would just leave an extra unprotected scratch slice. So, go for the mirror, it gives you good performance and fewer headaches. If you can spare the money, try increasing the number of disks. You'd still need to mirror boot and swap slices, but then you would be able to use a real RAID-Z config for the rest, enabling you to leverage more disk capacity at a good redundancy/performance compromise. Hope this helps, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)
Hi, Artem: Thanks. And yes, Peter S. is a great actor! Christian Mueller wrote: > who is peter stormare? (sorry, i'm from old europe...) as usual, Wikipedia knows it: http://en.wikipedia.org/wiki/Peter_Stormare and he's european too :). Great actor, great movies. I particularly like Constantine, not just because of the name, of course :) Out budget is quite limited at the moment, but after the 1.000.000th view on YouTube/Google Video we might want to reconsider our cast for the next episode :). But first, we need to get the english version finished... Cheers, Constantin > > thx & bye > christian > > Artem Kachitchkine schrieb: >> >>> Brilliant video, guys. >> >> Totally agreed, great work. >> >> Boy, would I like to see Peter Stormare in that video %) >> >> -Artem. > -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)
Hi Richard, Richard Elling wrote: > FYI, > here is an interesting blog on using ZFS with a dozen USB drives from > Constantin. > http://blogs.sun.com/solarium/entry/solaris_zfs_auf_12_usb thank you for spotting it :). We're working on translating the video (hope we get the lip-syncing right...) and will then re-release it in an english version. BTW, we've now hosted the video on YouTube so it can be embedded in the blog. Of course, I'll then write an english version of the blog entry with the tech details. Please hang on for a week or two... :). Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Hi, I need to be a little bit more precise in how I formulate comments: 1. Yes, zpool remove is a desirable feature, no doubt about that. 2. Most of the cases where customers ask for "zpool remove" can be solved with zfs send/receive or with zpool replace. Think Pareto's 80-20 rule. 2a. The cost of doing 2., including extra scratch storage space or scheduling related work into planned downtimes is smaller than the cost of not using ZFS at all. 2b. Even in the remaining 20% of cases (figuratively speaking, YMMV) where zpool remove would be the only solution, I feel that the cost of sacrificing the extra storage space that would have become available through "zpool remove" is smaller than the cost of the project not benefitting from the rest of ZFS' features. 3. Bottom line: Everybody wants zpool remove as early as possible, but IMHO this is not an objective barrier to entry for ZFS. Note my use of the word "objective". I do feel that we have to implement zpool remove for subjective reasons, but that is a non technical matter. Is this an agreeable summary of the situation? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Client Solutionshttp://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: can I use zfs on just a partition?
Hi, > When you do the initial install, how do you do the slicing? > > Just create like: > / 10G > swap 2G > /altroot 10G > /zfs restofdisk yes. > Or do you just create the first three slices and leave the rest of the > disk untouched? I understand the concept at this point, just trying to > explain to a third party exactly what they need to do to prep the system > disk for me :) No. You need to be able to tell ZFS what to use. Hence, if your pool is created at the slice level, you need to create a slice for it. So the above is the way to go. And yes, you should only do this on laptops and other machines where you have only 1 disk or are otherwise very disk-limited :). Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: How much do we really want zpool remove?
Hi, I do agree that zpool remove is a _very_ desirable feature, no doubt about that. Here are a couple of thoughts and workarounds, in random order, that might give us some more perspective: - My home machine has 4 disks and a big zpool across them. Fine. But what if a controller fails or worse, a CPU? Right, I need a second machine, if I'm really honest with myself and serious with my data. Don't laugh, ZFS on a Solaris server is becoming my mission-critical home storage solution that is supposed to last beyond CDs and DVDs and other vulnerable media. So, if I were an enterprise, I'd be willing to keep enough empty LUNs available to facilitate at least the migration of one or more filesystems if not complete pools. With a little bit of scripting, this can be done quite easily and efficiently through zfs send/receive and some LUN juggling. If I were an enterprise's server admin and the storage guys didn't have enough space for migrations, I'd be worried. - We need to avoid customers thinking "Veritas can shrink, ZFS can't". That is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the pools below them that can only grow. And Veritas does not even have pools. People have started to follow a one-pool-to-store-them-all approach, which I think is not always appropriate. Some alternatives: - One pool per zone might be a good idea if you want to migrate zones across systems, which then becomes easy through zpool export/import in a SAN. - One pool per service level (mirror, RAID-Z2, fast, slow, cheap, expensive) might be another idea. Keep some cheap mirrored storage handy for your pool migration needs and you could wiggle your life around zpool remove. Switching between Mirror, RAID-Z, RAID-Z2 then becomes just a zfs send/receive pair. Shrinking a pool requires some more zfs send/receiving and maybe some scripting, but these are IMHO less painful than living without ZFS' data integrity and the other joys of ZFS. Sorry if I'm stating the obvious or stuff that has been discussed before, but the more I think about zpool remove, the more I think it's a question of willingness to plan/work/script/provision vs. a real show stopper. Best regards, Constantin P.S.: Now, with my big mouth, I hope I'll survive a confcall next week with a customer asking for exactly zpool remove :). -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: can I use zfs on just a partition?
Hi Tim, > Essentially I'd like to have the / and swap on the first 60GB of the disk. > Then use the remaining 100GB as a zfs partition to setup zones on. Obviously > the snapshots are extremely useful in such a setup :) > > Does my plan sound feasible from both a usability and performance standpoint? yes, it works, I do it on my laptop all the time: # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c0d0 /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 Specify disk (enter its number): 0 selecting c0d0 Controller working list found [disk formatted, defect list found] Warning: Current Disk has mounted partitions. /dev/dsk/c0d0s0 is currently mounted on /. Please see umount(1M). /dev/dsk/c0d0s1 is currently used by swap. Please see swap(1M). /dev/dsk/c0d0s3 is part of active ZFS pool poolchen. Please see zpool(1M). /dev/dsk/c0d0s4 is in use for live upgrade /. Please see ludelete(1M). c0d0s5 is also free and can be used as a third live upgrade partition. My recommendation: Use at least 2 slices for the OS so you can enjoy live upgrade, one for swap and the rest for ZFS. Performance-wise, this is of course not optimal, but perfectly feasible. I have an Acer Ferrari 4000 which is known to have a slow disk, but it still works great for what I do (email, web, Solaris demos, presentations, occasional video). More complicated things are possible as well. The following blog entry: http://blogs.sun.com/solarium/entry/tetris_spielen_mit_zfs (sorry it's german) ilustrates how my 4 disks at home are sliced in order to get OS partitions on multiple disks, Swap and as much ZFS space as possible at acceptable redundancy despite differently-sized disks. Check out the graphic in the above entry to see what I mean. Works great (but I had to use -f to zpool create :) ) and gives me enough performance for all my home-serving needs. Hope this helps, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Client Solutionshttp://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] poor NFS/ZFS performance
Hi Roch, thanks, now I understand the issue better :).

> Nope. NFS is slow for single threaded tar extract. The
> conservative approach of NFS is needed with the NFS protocol
> in order to ensure client's side data integrity. Nothing ZFS
> related.
...
> NFS is plenty fast in a throughput context (not that it does
> not need work). The complaints we have here are about single
> threaded code.

ok, then it's "just" a single-threaded, per-request client latency issue, which (as is increasingly often the case) software vendors need to realize. The proper way to deal with this, then, is to multi-thread at the application layer. Reminds me of many UltraSPARC T1 issues, which sit neither in the hardware nor in the OS, but in the way applications have been developed for years :).

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] poor NFS/ZFS performance
Hi, I haven't followed all the details in this discussion, but it seems to me that it all breaks down to:

- NFS on ZFS is slow because NFS is very conservative, sending ACKs to clients only after writes have definitely committed to disk.

- Therefore, the problem is not that much ZFS-specific; it's just a conscious focus on data correctness vs. speed on ZFS/NFS' part.

- Currently known workarounds include:
  - Sacrifice correctness for speed by disabling the ZIL or using a less conservative network file system.
  - Optimize NFS/ZFS to get as much speed as possible within the constraints of the NFS protocol.

But one aspect I haven't seen so far is: How can we optimize ZFS on a more hardware-oriented level to both achieve good NFS speeds and still preserve the NFS level of correctness?

One possibility might be to give the ZFS pool enough spindles so it can comfortably handle many small IOs fast enough for them not to become NFS commit bottlenecks. This may require some tweaking on the ZFS side so it doesn't queue up write IOs for too long, so as not to delay commits more than necessary.

Has anyone investigated this branch, or am I too simplistic in my view of the underlying root of the problem?

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
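[Editor's note: the "sacrifice correctness for speed" workaround mentioned above was, at the time of this thread, usually done with a kernel tunable. The exact tunable name is stated here from memory and should be verified before use; it trades away NFS commit semantics for the whole system, so it is only suitable for scratch data.]

    * in /etc/system, then reboot: disables the ZIL globally, so synchronous
    * NFS commits are acknowledged before the data reaches stable storage
    set zfs:zil_disable = 1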
Re: [zfs-discuss] Newbie questions about drive problems
Hi,

> I have 3 drives.
> The first one will be the primary/boot drive under UFS. The 2 others will
> become a mirrored pool with ZFS.
> Now, I have problem with the boot drive (hardware or software), so all the
> data on my mirrored pool are ok?
> How can I restore this pool? When I create the pool, do I need to save the
> properties?

All metadata for the pool is stored inside the pool. If the boot disk fails in any way, all pool data is safe. Worst case might be that you have to reinstall everything on the boot disk. After that, you just say "zpool import" to get your pool back and everything will be ok.

> What happend when a drive crash when ZFS write some data on a raidz pool?

If the crash occurs in the middle of a write operation, then the new data blocks will not be valid. ZFS will then revert back to the state before writing the new set of blocks. Therefore you'll have 100% data integrity, but of course the new blocks that were being written to the pool will be lost.

> Do the pool go to the degraded state or faulted state?

No, the pool will come up as online. The degraded state is only for devices that aren't accessible any more, and the faulted state is for pools that do not have enough valid devices to be complete.

Hope this helps, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Encryption on ZFS / Disk Usage
Hi, Thomas Deutsch wrote:
> Hi
>
> I'm thinking about to change from Linux/Softwareraid to
> OpenSolaris/ZFS. During this, I've got some (probably stupid)
> questions:

don't worry, there are no stupid questions :).

> 1. Is ZFS able to encrypt all the data? If yes: How safe is this
> encryption? I'm currently using dm-crypt on linux for doing this.

Encryption for ZFS is a planned feature, but not available now. See: http://www.opensolaris.org/os/project/zfs-crypto/

Another project is an encrypted loopback device called xlofi which can be used on top of ZFS: http://www.opensolaris.org/os/community/security/projects/xlofi/

I understand that both approaches are independent of the encryption mechanism, so one would be free to choose a suitably safe cipher that is supported by the Solaris Cryptographic Framework.

> 2. How big is the usable diskspace? I know that a rai5 is using the
> space of one disk for parity informations. A raid5 with four disk of
> 300GB has 900GB Space. How is it with ZFS? How much space do I have in
> this case?

ZFS' RAIDZ1 uses one disk's worth of parity per RAIDZ set, similarly to RAID-5. ZFS' RAIDZ2 uses two disks' worth of parity per RAIDZ set. So the usable space is (number of disks in the RAIDZ set minus 1 or 2, depending on the algorithm) times the capacity of the smallest disk. Same calculation as with traditional RAID. But there are advantages for RAIDZ over traditional RAID-5:

- No RAID-5 write hole.
- Better performance through serialization of write requests.
- Better performance through eliminating the need for read-modify-write.
- Better data integrity through end-to-end checksums.
- Faster re-syncing of replaced disks, since you only need to recreate used blocks.
- Compression can easily be switched on for some extra space, depending on the nature of the data.

See also: http://blogs.sun.com/roller/page/bonwick?entry=raid_z

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
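[Editor's note: making the arithmetic concrete for the numbers in the question above, 4 disks of 300 GB each: RAIDZ1 yields (4 - 1) x 300 GB = 900 GB of usable space, the same as the RAID-5 figure mentioned; RAIDZ2 yields (4 - 2) x 300 GB = 600 GB. Pool metadata comes on top of that, so the figures in practice are slightly lower.]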
Re: [zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?
Hi Eric,

>> This means that we have one pool with 3 vdevs that access up to 3 different
>> slices on the same physical disk.

minor correction: 1 pool, 3 vdevs, 3 slices per disk on 4 disks.

>> Question: Does ZFS consider the underlying physical disks when load-balancing
>> or does it only load-balance across vdevs thereby potentially overloading
>> physical disks with up to 3 parallel requests per physical disk at once?
>
> ZFS only does dynamic striping across the (top-level) vdevs.
>
> I understand why you setup your pool that way, but ZFS really likes
> whole disks instead of slices.

ok, understood. When I run out of storage, I'll try to get 4 cheap SATA drives of equal size and migrate everything over.

> Trying to interpret that the devices are really slices and part of other
> vdevs seems overly complicated for the gain achieved.

So what data does ZFS base its dynamic striping on? Does it count IOPs per vdev, or does it try to sense the load on the vdevs by measuring, say, response times, queue lengths, etc.?

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?
Hi, my ZFS pool for my home server is a bit unusual:

  pool: pelotillehue
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Aug 21 06:10:13 2006
config:

        NAME          STATE     READ WRITE CKSUM
        pelotillehue  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0d1s5    ONLINE       0     0     0
            c1d0s5    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0d0s3    ONLINE       0     0     0
            c0d1s3    ONLINE       0     0     0
            c1d0s3    ONLINE       0     0     0
            c1d1s3    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0d1s4    ONLINE       0     0     0
            c1d0s4    ONLINE       0     0     0
            c1d1s4    ONLINE       0     0     0

The reason is simple: I have 4 differently-sized disks (80, 80, 200 and 250 GB; it's a home server and so I crammed whatever I could find elsewhere into that box :) ) and my goal was to create the biggest pool possible while retaining some level of redundancy. The above config therefore groups the biggest slices that can be created on all four disks into the 4-disk RAID-Z vdev, then the biggest slices that can be created on 3 disks into the 3-disk RAID-Z, and then two large slices remain which are mirrored. It's like playing Tetris with disk slices... But the pool can tolerate 1 broken disk and it gave me maximum storage capacity, so be it.

This means that we have one pool with 3 vdevs that access up to 3 different slices on the same physical disk.

Question: Does ZFS consider the underlying physical disks when load-balancing, or does it only load-balance across vdevs, thereby potentially overloading physical disks with up to 3 parallel requests per physical disk at once?

I'm pretty sure ZFS is very intelligent and will do the right thing, but a confirmation would be nice here.

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
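[Editor's note: for reference, a pool with the layout shown above would be assembled with a single zpool create along these lines. This is reconstructed from the status output, not the actual command history; -f may be needed when the slices previously held other data, as mentioned elsewhere in this archive.]

    zpool create -f pelotillehue \
        mirror c0d1s5 c1d0s5 \
        raidz  c0d0s3 c0d1s3 c1d0s3 c1d1s3 \
        raidz  c0d1s4 c1d0s4 c1d1s4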
Re: [zfs-discuss] Home Server with ZFS
Hi,

>> What i dont know is what happens if the boot disk dies? can i replace
>> is, install solaris again and get it to see the zfs mirror?
>
> As I understand it, this be possible, but I haven't tried it and I'm
> not an expert Solaris admin. Some ZFS info is stored in a persistent
> file on your system disk, and you may have to do a little dance to get
> around that. It's worth researching and practicing in advance :-).

IIRC, ZFS has all relevant information stored inside the pool. So you should be able to install a new OS onto the replacement disk, then say "zpool import" (possibly with -d and the devices where the mirror lives) to re-import the pool. But I haven't really tried it myself :).

All in all, ZFS is an excellent choice for a home server. I use ZFS as video storage for a digital set-top box (quotas are really handy here), as storage for my music collection, as backup storage for important data (including photos), etc.

I'm currently juggling around 4 differently-sized disks into a new config with the goal of getting as much storage as possible out of them at a minimum level of redundancy. An interesting, Tetris-like calculation exercise that I'd be happy to blog about when I'm done.

Feel free to visit my blog for how to set up your home server as a ZFS iTunes streaming server :).

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
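[Editor's note: a minimal sketch of the re-import step described above, after reinstalling the OS on the replacement boot disk. The pool name and device directory are examples.]

    zpool import                      # scan attached devices and list importable pools
    zpool import tank                 # import the pool found, under its old name
    zpool import -d /dev/dsk tank     # or point the scan at a specific device directory

    # if the pool was not cleanly exported before the boot disk died,
    # the import may additionally require -f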
Re: [zfs-discuss] Proposal: user-defined properties
Hi Eric, this is a great proposal and I'm sure it is going to help administrators a lot. One small question below:

> Any property which contains a colon (':') is defined as a 'user
> property'. The name can contain alphanumeric characters, plus the
> following special characters: ':', '-', '.', '_'. User properties are
> always strings, and are always inherited. No additional validation is
> done on the contents. Properties are set and retrieved through the
> standard mechanisms: 'zfs set', 'zfs get', and 'zfs inherit'.

> # zfs list -o name,local:department
> NAME       LOCAL:DEPARTMENT
> test       12345
> test/foo   12345
> # zfs set local:department=67890 test/foo
> # zfs inherit local:department test
> # zfs get -s local -r all test
> NAME       PROPERTY           VALUE    SOURCE
> test/foo   local:department   12345    local
> # zfs list -o name,local:department
> NAME       LOCAL:DEPARTMENT
> test       -
> test/foo   12345

the example suggests that properties may be case-insensitive. Is that the case (sorry for the pun)? If so, that should be noted in the user-defined property definition, just for clarity.

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Enabling compression/encryption on a populated filesystem
Hi, there might be value in a "zpool scrub -r" (as in "re-write blocks") beyond the prior discussion on encryption and compression.

For instance, a bit that is just about to rot might not be detected by a regular zfs scrub, but it would be rewritten by a re-writing scrub. It would also exercise the writing "muscles" on disks that don't see a lot of writing, such as archives or system disks, thereby detecting any degradation that affects writing of data.

Of course the re-writing must be 100% safe, but that can be done with COW quite easily.

Then, admins would for instance run a "zpool scrub" every week and maybe a "zpool scrub -r" every month or so.

Just my 2 cents, Constantin

Luke Scharf wrote:
> Darren J Moffat wrote:
>> But the real thing is how do you tell the admin "it's done, now the
>> filesystem is safe". With compression you don't generally care if
>> some old stuff didn't compress (and with the current implementation it
>> has to compress a certain amount or it gets written uncompressed
>> anyway). With encryption the human admin really needs to be told.
> As a sysadmin, I'd be happy with another scrub-type command. Something
> with the following meaning:
>
> "Reapply all block-level properties such as compression, encryption,
> and checksum to every block in the volume. Have the admin come back
> tomorrow and run 'zpool status' to see if it's done."
>
> Mad props if I can do this on a live filesystem (like the other ZFS
> commands, which also get mad props for being good tools).
>
> A natural command for this would be something like "zfs blockscrub
> tank/volume". Also, "zpool blockscrub tank" would make sense to me as
> well, even though it might touch more data.
>
> Of course, it's easy for me to just say this, since I'm not thinking
> about the implementation very deeply...
>
> -Luke

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
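[Editor's note: to make the proposal concrete, the intended admin workflow would look something like the following. The -r flag is hypothetical and does not exist in ZFS; zpool scrub and zpool status themselves are real.]

    zpool scrub tank        # existing behaviour: verify every block against its checksum
    zpool scrub -r tank     # proposed: additionally re-write each block (COW) while scrubbing
    zpool status tank       # check progress and completion, as with a normal scrub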
Re: [zfs-discuss] [raidz] file not removed: No space left on device
Hi Eric, Eric Schrock wrote:
> You don't need to grow the pool. You should always be able to truncate the
> file without consuming more space, provided you don't have snapshots.
> Mark has a set of fixes in testing which do a much better job of
> estimating space, allowing us to always unlink files in full pools
> (provided there are no snapshots, of course). This provides much more
> logical behavior by reserving some extra slop.

is this planned but not yet implemented functionality, or why did Tatjana see the "not able to rm" behaviour? Or should she use unlink(1M) in these cases?

Best regards, Constantin

> - Eric
>
> On Mon, Jul 03, 2006 at 02:23:06PM +0200, Constantin Gonzalez wrote:
>> Hi,
>>
>> of course, the reason for this is the copy-on-write approach: ZFS has
>> to write new blocks first before the modification of the FS structure
>> can reflect the state with the deleted blocks removed.
>>
>> The only way out of this is of course to grow the pool. Once ZFS learns
>> how to free up vdevs this may become a better solution because you can then
>> shrink the pool again after the rming.
>>
>> I expect many customers to run into similar problems and I've already gotten
>> a number of "what if the pool is full" questions. My answer has always been
>> "No file system should be used up more than 90% for a number of reasons", but
>> in practice this is hard to ensure.
>>
>> Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
>> blocks in a pool in order to always be able to rm and destroy stuff.
>>
>> Best regards,
>> Constantin
>>
>> P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
>> if the really good answers take some time :).
>
> --
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [raidz] file not removed: No space left on device
Hi, of course, the reason for this is the copy-on-write approach: ZFS has to write new blocks first before the modification of the FS structure can reflect the state with the deleted blocks removed.

The only way out of this is of course to grow the pool. Once ZFS learns how to free up vdevs this may become a better solution because you can then shrink the pool again after the rming.

I expect many customers to run into similar problems and I've already gotten a number of "what if the pool is full" questions. My answer has always been "No file system should be used up more than 90% for a number of reasons", but in practice this is hard to ensure.

Perhaps this is a good opportunity for an RFE: ZFS should reserve enough blocks in a pool in order to always be able to rm and destroy stuff.

Best regards, Constantin

P.S.: Most US Sun employees are on vacation this week, so don't be alarmed if the really good answers take some time :).

Tatjana S Heuser wrote:
> On a system still running nv_30, I've a small RaidZ filled to the brim:
>
> 2 3 [EMAIL PROTECTED] pts/9 ~ 78# uname -a
> SunOS mir 5.11 snv_30 sun4u sparc SUNW,UltraAX-MP
>
> 0 3 [EMAIL PROTECTED] pts/9 ~ 50# zfs list
> NAME               USED  AVAIL  REFER  MOUNTPOINT
> mirpool1          33.6G      0   137K  /mirpool1
> mirpool1/home     12.3G      0  12.3G  /export/home
> mirpool1/install  12.9G      0  12.9G  /export/install
> mirpool1/local    1.86G      0  1.86G  /usr/local
> mirpool1/opt      4.76G      0  4.76G  /opt
> mirpool1/sfw       752M      0   752M  /usr/sfw
>
> Trying to free some space is meeting a lot of reluctance, though:
> 0 3 [EMAIL PROTECTED] pts/9 ~ 51# rm debug.log
> rm: debug.log not removed: No space left on device
> 0 3 [EMAIL PROTECTED] pts/9 ~ 55# rm -f debug.log
> 2 3 [EMAIL PROTECTED] pts/9 ~ 56# ls -l debug.log
> -rw-r--r--   1 th12242027048 Jun 29 23:24 debug.log
> 0 3 [EMAIL PROTECTED] pts/9 ~ 58# :> debug.log
> debug.log: No space left on device.
> 0 3 [EMAIL PROTECTED] pts/9 ~ 63# ls -l debug.log
> -rw-r--r--   1 th12242027048 Jun 29 23:24 debug.log
>
> There are no snapshots, so removing/clearing the files /should/
> be a way to free some space there.
>
> Of course this is the same filesystem where zdb dumps core
> - see:
>
> *Synopsis*: zdb dumps core - bad checksum
> http://bt2ws.central.sun.com/CrPrint?id=6437157
> *Change Request ID*: 6437157
>
> (zpool reports the RaidZ pool as healthy while
> zdb crashes with a 'bad checksum' message.)
>
> This message posted from opensolaris.org

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] add_install_client and ZFS and SMF incompatibility
Hi, I just set up an install server on my notebook and of course all the installer data is on a ZFS volume. I love the "zfs set compression=on" command!

It seems that the standard ./add_install_client script from the S10U2 Tools directory creates an entry in /etc/vfstab for a loopback mount of the Solaris miniroot into the /tftpboot directory.

Unfortunately, at boot time (I'm using Nevada build 39), the mount_all script tries to perform the loopback mount from /etc/vfstab before ZFS gets its filesystems mounted. So the SMF filesystem/local method fails and I have to either mount all ZFS filesystems by hand and then re-run mount_all, or replace the vfstab entry with a simple symlink. Which only works until the next time you run add_install_client.

Is this a known issue?

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
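[Editor's note: a sketch of the symlink workaround mentioned above. The paths are made-up examples; the actual entry and boot-archive name created by add_install_client will differ on any given install server.]

    # 1. remove the lofs line that add_install_client put into /etc/vfstab
    #    for the miniroot mount under /tftpboot
    # 2. replace the mountpoint with a symlink into the ZFS-backed install tree,
    #    e.g. (example paths only):
    ln -s /export/install/s10u2/boot /tftpboot/I86PC.Solaris_10-1

    # the symlink survives reboots, but the next add_install_client run
    # will re-create the vfstab entry and the problem returns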
[zfs-discuss] ZFS and Flash archives
Hi, I'm currently setting up a demo machine. It would be nice to set up everything the way I like it, including a number of ZFS filesystems, then create a flash archive, then install from that archive.

Will there be any issues with Web Start Flash and ZFS? Does flar create need to be ZFS-aware, and if so, is it ZFS-aware in S10u2b09a?

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] user undo
Hi, so we have two questions:

1. Is it really ZFS' job to provide an undo functionality?

2. If it turns out to be a feature that needs to be implemented by ZFS, what is the better approach: snapshot-based or file-based?

My personal opinion on 1) is:

- The purpose of any undo-like action is to provide a safety net to the user in case she commits an error that she wants to undo.

- So, it depends on how we define "user" here. If by user we mean your regular file system user with a GUI etc., then of course it's a matter of the application.

- But if user=sysadmin, I guess a more fundamental way of implementing "undo" is in order. We could restrict the undo functionality to some admin interface and force admins to use just that, but then it would still be a feature that the admin interface needs to implement. In order to keep admins from shooting themselves in the foot, the best way would be to provide an admin-savvy safety net.

- Now, coming from the other side, ZFS provides a nice and elegant way of implementing snapshots. That's where I put one and one together: If ZFS knew how to take snapshots right before any significant administrator or user action, and if ZFS had a way of managing those snapshots so admins and users could easily undo any action (including zfs destroy, zpool destroy, or just rm -rf /*), then the benefit/investment ratio for implementing such a feature should be extremely interesting. One more step towards a truly foolproof filesystem.

But: If it turns out that providing an undo function via snapshots is not possible/elegantly feasible/cheap, or if there's any significant roadblock that prevents ZFS from providing an undo feature in an elegant way, then it might not be a good idea after all and we should just forget it.

So I guess it boils down to: Can the ZFS framework be used to implement an undo feature much more elegantly than your classic file manager, while extending the range of undo customers to even the CLI-based admin?

Best regards, Constantin

Erik Trimble wrote: Once again, I hate to be a harpy on this one, but are we really convinced that having an "undo" (I'm going to call it RecycleBin from now on) function for file deletion built into ZFS is a good thing? Since I've seen nothing to the contrary, I'm assuming that we're doing this by changing the actual effects of an "unlink(2)" sys lib call against a file in ZFS, and having some other library call added to take care of actual deletion. Even with it being a ZFS option parameter, I can see so many places that it breaks assumptions and causes problems that I can't think it's a good thing to blindly turn on for everything. And, I've still not seen a good rebuttal to the idea of moving this up to the application level, and using a new library to implement the functionality (which requires apps to specifically (and explicitly) support RecycleBin in the design). You will notice that Windows does this. The Recycle Bin is usable from within Windows Explorer, but if you use "del" from a command prompt, it actually deletes the file. I see no reason why we shouldn't support the same functionality (i.e. RecycleBin from within Nautilus (as it already does), and true deletion via "rm"). -Erik

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Backup/Restore of ZFS Properties
Hi,

Yes, a trivial wrapper could:
1. Store all property values in a file in the fs
2. zfs send...
3. zfs receive...
4. Set all the properties stored in that file

IMHO 3. and 4. need to be swapped - otherwise e.g. files will not be compressed when restored.

hmm, I assumed that the ZFS stream format would take the blocks as they are (compressed) and then restore them in a 1:1 fashion (compressed) no matter what the target fs' compression setting is. Then, the missing compression attribute would only affect new files, while old files are still compressed (just like ZFS doesn't unpack everything if you just turn off compression).

Can anybody clarify?

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
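[Editor's note: a minimal sketch of the wrapper discussed above, in the 1-2-3-4 order as listed. Dataset names are placeholders, values containing whitespace are not handled, and whether the properties must be applied before or after the receive (so that e.g. compression applies to restored files) is exactly the open question in this thread.]

    #!/bin/sh
    FS=tank/data
    TARGET=backup/data

    # 1. save the locally-set properties of the source filesystem
    zfs get -H -s local -o property,value all "$FS" > /tmp/props.$$

    # 2./3. send a snapshot and receive it on the target
    zfs snapshot "$FS@propbackup"
    zfs send "$FS@propbackup" | zfs receive "$TARGET"

    # 4. re-apply the saved properties on the target
    while read prop value; do
        zfs set "$prop=$value" "$TARGET"
    done < /tmp/props.$$
    rm /tmp/props.$$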
Re: [zfs-discuss] A .zfs/info file
Hi,

Darren J Moffat wrote: Over coffee with a colleague (cc'd) we were talking about the problem of taking advantage of ZFS over NFS (or CIFS) from a non-Solaris machine. We already have the .zfs/snapshot dir and this is great. One of the other areas was knowing what the settings on your data set are. So enter .zfs/info, which would be an ASCII representation of the information from `zfs get all`. I can see some problems with this, and it reminds me a little too much of what happened to /proc on Linux, and so I'm a bit uncomfortable about suggesting it.

I share the discomfort with the /proc analogy. But Wes' scripting approach seems to be just fine for me. The timestamping would communicate the SLA of "just a script" versus the magically hacked nature of pseudo-files.

But being able to poll data out of ZFS over NFS is probably just a minor issue. In Germany, we say: "Give 'em the little finger and they'll want the whole hand". So, I assume the next thing a ZFS-over-NFS user would want is to change stuff over NFS, which would then become difficult. Resisting the pseudo-files-as-a-broken-API-for-changing-stuff urge might then be even more appropriate.

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] user undo
Hi, the current discussion on how to implement "undo" seems to circulate around concepts and tweaks for replacing any "rm"-like action with "mv" and then fixing the problems associated with namespaces, ACLs etc.

Why not use snapshots? A snapshot-oriented implementation of undo would:

- Create a snapshot of the FS whenever anything is attempted that someone might want to undo. This could be done even at the most fundamental level (i.e. before any "zpool" or "zfs" command, where the potential damage to be undone is biggest).

- The undo feature would then exchange the live FS with the snapshot taken prior to the revoked action. Just tweak one or two pointers and the undo is done.

- This would transparently work with any app, user action, even admin action, depending on where the snapshotting code would be hooked up.

- As an alternative to undo, the user can browse the .zfs hierarchy in search of that small file which got lost in an rm -rf orgy, without having to restore the snapshot with all the other unwanted files.

- When ZFS wants to reclaim blocks, it would start deleting the oldest undo-snapshots.

- To separate undo-snapshots from user-triggered ones, the undo code could place its snapshots in .zfs/snapshots/undo .

Did I miss something about why undo can't be implemented with snapshots?

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
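[Editor's note: a minimal sketch of the first bullet above as a user-level wrapper, rather than anything built into ZFS. The filesystem name, snapshot naming scheme and wrapped command are all just examples.]

    #!/bin/sh
    # usage: undo-wrap <filesystem> <command> [args...]
    # take an "undo point" snapshot, then run the potentially destructive command
    FS="$1"; shift
    SNAP="$FS@undo-`date '+%Y%m%d-%H%M%S'`"
    zfs snapshot "$SNAP" || exit 1
    "$@"
    echo "To undo the effects of that command: zfs rollback $SNAP"

Rolling back only works as long as no newer snapshot exists (or with zfs rollback -r), and the point of the post is of course to do this inside ZFS, so that even zpool-level mistakes could be caught.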
Re: [zfs-discuss] ZFS Snapshot management proposal idea
Hi Tim, thank you for your comments. Bringing in SMF is an excellent idea and should make the things admins want to do much more elegant.

I guess the question here is to find out:

- What degree of canned functionality is needed to address 80% of every admin's needs.

- Who should provide the functionality: the ZFS core team, a community of people (inside OpenSolaris or outside), or some person/group who take this as their personal project and maybe publish it.

We're probably more than halfway through the first. Maybe it's time to create a project under the OpenSolaris community to nail down the second and start thinking about what functionality really needs to be inside ZFS to make integrating SMF support for managing snapshots easier.

I'm not sure user-defined properties in ZFS are the solution. Depending on the apps, scripts and users using such properties, pools, filesystems and snapshots can easily be polluted with lots of properties that someone thought might be useful. Perhaps it would be more beneficial to go through the process of discussing and deciding which set of properties would make snapshot management much easier, then go with it.

Best regards, Constantin

Tim Foster wrote: Hey Constantin, On Mon, 2006-05-08 at 11:37 +0200, Constantin Gonzalez wrote: I took the liberty of renaming the thread to $SUBJECT because I think that what we really are looking for is an ability for ZFS to automatically manage snapshots after they have been created. Wow, nice summary of the problem! I thought I'd add a few ideas into the fray. Here's what I was thinking could be fairly quick to implement, which administrators could build upon if necessary. I believe we certainly should provide a way for users to schedule automatic snapshots, but probably not build it into the filesystem itself (imho - ZFS is a filesystem dammit Jim, not a backup solution!) This could be easily implemented via a set of SMF instances which create/destroy cron jobs, which would themselves call a simple script responsible for taking the snapshots. Of course, this isn't as flexible as an administrator writing their own scripts, but it could be enough for most users, with those that want more functionality being able to build on this functionality. So, it's not as intelligent as the daemon Bill was suggesting; we wouldn't poll the FS to reap snapshots when space is limited. For that functionality, I'd hope for an as-yet-nonexistent ZFS FMA event to report that some pools are getting short on space, which could be the trigger for deleting these auto-snapshots if necessary (I'd also imagine lots of other things would be interested in keying off such an event as well...)

The service that we could have for taking auto snapshots could be called /system/filesystem/zfs/auto-snapshot

We'd have one instance per set of automatic snapshots taken. Which isn't to say we need one instance per filesystem, as we could define instances that snapshot all child filesystems contained in this top-level filesystem.

/system/filesystem/zfs/auto-snapshot:[fs-name]

The properties we'd have for each instance are:

- interval = minutes | hours | days | months | years
- period = take snapshots every how many [interval]s
- keep = number of snapshots to keep before rolling over (delete the oldest when we hit the limit)
- offset = # seconds into the start of the period at which we take the snapshot ( < period * interval)
- snapshot-children = true | false

Here's some examples of SMF instances that would implement auto-snapshots.
The following instance takes a snapshot every 4 days, at 01:00, keeping 30 snapshots into the past:

/system/filesystem/zfs/auto-snapshot:tank
    interval = days
    period = 4
    keep = 30
    offset = (60 * 60)
    snapshot-children = false

This instance takes a weekly snapshot, keeping the last two, and will snapshot all children of export/home/timf[1]:

/system/filesystem/zfs/auto-snapshot:export/home/timf
    interval = days
    period = 7
    keep = 2
    offset = 0
    snapshot-children = true

Essentially, I'm really just suggesting a glorified interface to cron, so why not just use cron? Well, I suspect having a service like this would be easier to manage than a heap of cron jobs: at a glance, I can tell which auto-snapshots are being taken, when and how. I also like the idea of tying into SMF, since that means other options, like the GUI interfaces in the Visual Panels project, may become available in the future.

Anyway, that's what I was thinking of (and it wouldn't be too hard to implement). I've no doubt this could be refined - but does anyone think this is the right direction to go in?

cheers, tim

[1] I'm not yet sure if SMF instanc
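[Editor's note: a sketch of the simple script such a cron job could call, for the weekly export/home/timf example above. The snapshot naming and keep-count handling are illustrative only and not part of Tim's proposal; the snapshot-children recursion is ignored here.]

    #!/bin/sh
    # take-auto-snapshot.sh <filesystem> <keep>
    FS="$1"
    KEEP="$2"

    # take a new timestamped auto-snapshot
    zfs snapshot "$FS@auto-`date '+%Y%m%d%H%M'`"

    # prune: destroy the oldest auto-snapshots beyond the keep limit
    SNAPS=`zfs list -H -o name -t snapshot | grep "^$FS@auto-" | sort`
    COUNT=`echo "$SNAPS" | wc -l | tr -d ' '`
    if [ "$COUNT" -gt "$KEEP" ]; then
        echo "$SNAPS" | head -`expr $COUNT - $KEEP` | while read snap; do
            zfs destroy "$snap"
        done
    fi

A crontab entry of "0 0 * * 0 /path/to/take-auto-snapshot.sh export/home/timf 2" would then roughly correspond to the weekly instance above.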
[zfs-discuss] ZFS Snapshot management proposal idea (was: Properties of ZFS snapshots I'd like to see...)
Hi, thank you for the excellent comments, thoughts and ideas on the "Properties of ZFS snapshots I'd like to see..." thread.

I took the liberty of renaming the thread to $SUBJECT because I think that what we are really looking for is an ability for ZFS to automatically manage snapshots after they have been created.

To summarize:

- "Snapshot management in ZFS" can be defined as an automatic way of:
  - Using free space on disk for automatically or manually generated snapshots, but giving priority to new data on the disk at the expense of destroying old snapshots that are considered less useful.
  - Implementing policies that decide which snapshots to keep, even at the cost of not having enough space for new data. Possible policies include:
    - Prioritize recent snapshots over older ones. (Assuming that the older the snapshot, the less the user cares.)
    - Prioritize older snapshots over recent ones. (Assuming that the older the snapshots, the more errors can be corrected.)
    - Any combination of the above. (I.e. keep at least one yearly, one monthly, one weekly and one daily snapshot.)
  - Giving users the possibility to decide what policies to apply to snapshots when they are created.
  - Giving users the possibility to configure automatic snapshots at regular intervals (similar to NetApps).
  - Automatically snapshotting a pool or a filesystem before any administrative action in order to facilitate a "zfs undo" functionality.

- Much or all of the above can be implemented today with user- or admin-level scripts. The question therefore is whether this should be incorporated into ZFS or not. Here are pros and cons:

Pros:
- Make it easier for users and admins to enjoy the benefits of snapshots without having to write scripts.
- Make advanced functionality available to users and admins that would take a lot of complex scripting and can therefore be implemented more elegantly inside ZFS than outside it ("zfs undo" and free space management, for instance).
- Reduce the risk of user and admin errors when scripting by providing a single point of development for a critical functionality. (Example: dealing with different time standards is non-trivial; scripts may be less robust than OS-level code, etc.)

Cons:
- ZFS is a file system, not a backup management system. Leave that to the application and 3rd party vendors.
- Deleting snapshots is a difficult question and each user/admin/site may have very different policies about when to delete them and when not. This makes a one-size-fits-all approach either insufficient or not generic enough for all users to be really useful.

Feel free to add to the lists so we can make up our minds. Maybe this can evolve into something the ZFS team may be interested in.

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Properties of ZFS snapshots I'd like to see...
Hi Wes,

Wes Williams wrote: Interesting idea Constantin. However, perhaps instead of or in addition to your idea, I'd like to have a mechanism or script that would overwrite the older snapshots _only if_ some more current snapshot were created.

Ideally this mechanism would prevent your idea of expired snapshots being removed in some case where the creation of new snapshots somehow failed.

yeah, that could be another pair of snapshot properties: the minimum/maximum number of snapshots to keep.

Additionally, by only removing the snapshots after the creation of their replacement is successful, this should prevent the possibility of data loss if there were a major system problem during the creation of a new snapshot as well.

Yes, snapshot replacement should always be split up into creating a successor, checking that it was successful, and only then deleting the old one.

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Properties of ZFS snapshots I'd like to see...
Hi Al,

1) But is this something that belongs in ZFS or is this a backup/restore type tool that is simply a "user" of zfs? ... Again - this looks like an operational backup/restore policy. Not a ZFS function.

So the question is: Is advanced management of snapshots (aging, expiring, etc.) something left to the domain of a ZFS user (backup/restore application, administrator, script), or should these concepts be adopted by ZFS as a filesystem (which BTW is already much more)?

IMHO, backup/restore is much more than playing with snapshots. The dividing line starts when you copy your data to a different medium. As soon as the data stays on the disk, I wouldn't say it's backup/restore related. As long as it's just snapshots, it should definitely not be called "backup/restore".

But you're right in that my desired functionality can "easily" be implemented with scripts. Then I would still argue for including this functionality as part of the ZFS user interface, because of ease of use and minimization of possible errors for the administrator. If it ain't simple to use, chances are people won't use it. Same goes for snapshots: If admins don't have a really easy way to get rid of them, chances are they will use them less.

Another point of view might be ease of implementation. A few person-months spent at Sun (or in the OpenSolaris developer community) might come up with a more robust, clean, efficient, bug-free, elegant way of achieving the task of snapshot management than millions of person-months spent by many admins creating scripts and re-inventing wheels that may be half-baked.

But yes, it is a matter of interpretation who should take care of managing snapshots after they've been created: ZFS or some application/script/user action.

Thinking further, ZFS could start doing automatic snapshots (invisible to the user) by just keeping every uber-block at each interval. Then, when the admin panics, ZFS could say "hmm, here's a couple of leftover snapshots that happen to still exist because you had a lot of space left on the disks that you may find useful".

Now you're describing a form of filesystem snapshotting function that might have to be closely integrated with zfs. This is in addition to the other data replication features that are already in the pipeline for zfs.

Yes, this is where the above discussed features definitively cross the line towards ZFS' responsibilities. Actually, it would be cool if ZFS took a hidden snapshot each time a zfs or zpool command is issued. Then an admin could say "zfs undo" after she/he discovered that she/he just made a horrible mistake.

The basic idea behind this whole thinking is to maximize utilization of free blocks. If your disk utilization is only 50%, why not use the other 50% for snapshots by default? They could save your life.

IMHO the majority of the functionality you're describing belongs in a backup/restore tool that is simply a consumer of zfs functionality. And this functionality could be easily scripted using your scripting tool of choice.

yes and no, depending on the interpretation. The potential of having a "zfs undo" subcommand and the automatic exploitation of free space on disk for keeping snapshots as part of overall snapshot management are definitely things that ZFS can do much better internally, as opposed to having to implement it with some other app.
Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Properties of ZFS snapshots I'd like to see...
Hi, (apologies if this has been discussed before, I hope not)

while setting up a script at home to do automatic snapshots, a number of wishes popped into my mind:

The basic problem with regular snapshotting is that you end up managing so many of them. Wouldn't it be nice if you could assign an expiration date to a snapshot? For instance:

zfs snapshot -e 3d tank/[EMAIL PROTECTED]

would create a regular snapshot with an expiration date of 3 days from the date it was created. You could then change the expiration date with zfs set if you want to keep it longer. "0" would mean no expiration date, and so on.

Then, ZFS would be free to destroy the snapshot to free up space, but only if it must: Just like the yogurt in your fridge, you may or may not be able to eat it after the best-before date, but you are guaranteed to be able to eat it (or sue the yogurt company) if it's inside the best-before date.

Another property could control the rigidity of this policy: Hard expiration would destroy the snapshot as soon as the expiry time arrives; soft expiration would work like the yogurt example above.

The benefit of this approach would be reduced complexity: Imagine you take a snapshot every week; you'll have 52 snapshots by the end of one year. This means that sysadmins will start writing scripts to automatically delete snapshots they don't need (I'm about to do just that) at the risk of deleting the wrong snapshot. Or they won't, because it takes too much thinking (you really want to make that script robust).

Another set of expiration-related properties could allow for more complex snapshot management:

- Multiple layers of snapshots: Keep one yearly, one monthly, one weekly and the snapshot from yesterday always available.
- Multiple priorities: Assign priorities to snapshots so less important ones get destroyed first.
- Specify date ranges to destroy/modify attributes on multiple snapshots at once.

Is this something we're already looking at, or should we start looking at this as an RFE?

Thinking further, ZFS could start doing automatic snapshots (invisible to the user) by just keeping every uber-block at each interval. Then, when the admin panics, ZFS could say "hmm, here's a couple of leftover snapshots that happen to still exist because you had a lot of space left on the disks that you may find useful".

The basic idea behind this whole thinking is to maximize utilization of free blocks. If your disk utilization is only 50%, why not use the other 50% for snapshots by default? They could save your life.

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
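[Editor's note: to make the proposed interface concrete, usage might look like the following. All of this is hypothetical syntax taken from the post above; none of it exists in ZFS, and the property name "expiry" is made up here purely for illustration.]

    zfs snapshot -e 3d tank/home@tuesday     # snapshot that may be reclaimed after 3 days
    zfs set expiry=2w tank/home@tuesday      # decide later to keep it around longer
    zfs set expiry=0 tank/home@tuesday       # never expire
    zfs get expiry tank/home@tuesday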
[zfs-discuss] ZFS: More information on ditto blocks?
Hi, (apologies if this was discussed before, I _did_ some research, but this one may have slipped for me...)

Looking through the current Sun ZFS technical presentation, I found a ZFS feature that was new to me: ditto blocks. In search of more information, I asked Google, but there seems to be no real information on ditto blocks other than the source code.

From the ditto block slide, I conclude that:

- ZFS blocks can have multiple copies (up to 3), even on the same disk, but preferably on multiple disks, if possible.
- The uber-block has an additional 3 copies (we already knew that).
- The ZFS metadata structure has 2 or more copies (that was new to me).
- In the future, users will be able to ask for multiple copies of their data (wow, what a great feature for laptop users with big, but single, disks!).

Can someone elaborate more on ditto blocks? Perhaps that would be a great blog entry (Google didn't find anything for "site:blogs.sun.com zfs ditto blocks"). In particular:

- Are regular data blocks multiplied by default if the disk isn't mirrored/raid-z'ed and there's enough space?
- What are the general rules on which blocks get multiplied how often?

Best regards, Constantin

-- Constantin Gonzalez          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions          http://www.sun.de/
Tel.: +49 89/4 60 08-25 91          http://blogs.sun.com/constantin/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
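[Editor's note: the user-visible side of the feature anticipated in the last bullet above later surfaced in ZFS as the per-dataset "copies" property. A minimal illustration, assuming a ZFS version that has it; the dataset name is an example:]

    zfs set copies=2 tank/laptop      # keep two ditto copies of every user data block
    zfs get copies tank/laptop        # verify the setting; applies to newly written data only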