Hi all, I put together the attached one-pager on the ZFS Automatic Snapshots service, which I've been maintaining on my blog to date.
I would like to see if this could be integrated into ON, and believe that a first step towards this is a project one-pager, so I've attached a draft version. I'm happy to defer judgement to the ZFS team as to whether this would be a suitable addition to OpenSolaris - if the consensus is that it's better for the service to remain in its current un-integrated state and be discovered through BigAdmin or web searches, that's okay by me. [ just thought I'd ask ]

cheers,
tim

--
Tim Foster, Sun Microsystems Inc, Solaris Engineering Ops
http://blogs.sun.com/timf
Template Version: @(#)onepager.txt 1.31 07/08/08 SMI

[ timf note: this is still a Draft, last updated 02/04/2008 using the
  template at http://www.opensolaris.org/os/community/arc/handbook/onepager/ ]

This information is Copyright 2008 Sun Microsystems

1. Introduction

1.1. Project/Component Working Name: ZFS Automatic Snapshots

1.2. Name of Document Author/Supplier: Tim Foster

1.3. Date of This Document: 02/04/2008

1.4. Name of Major Document Customer(s)/Consumer(s):

1.4.1. The Community you expect to review your project: ZFS OpenSolaris Community

    [ editor's note - I'm not sure what was expected for 1.4.1 above ]

1.4.2. The ARC(s) you expect to review your project: PSARC

1.5. Email Aliases:

1.5.2. Responsible Engineer: [EMAIL PROTECTED]

1.5.4. Interest List: zfs-discuss@opensolaris.org

2. Project Summary

2.1. Project Description:

    This project delivers an SMF service which allows the administrator to
    perform regular, periodic snapshots of user/administrator-specified ZFS
    filesystems. It is loosely coupled with the ZFS codebase, using only the
    ZFS CLI, cron and SMF to perform its functionality.

2.2. Risks and Assumptions:

    The current prototype has been implemented entirely in Korn shell -
    performance/scalability testing has not yet been carried out to determine
    whether this implementation is fast enough. If much tighter integration
    into the ZFS codebase is required, then this project will need additional
    resources.

    This project is not officially Sun funded - the engineer is doing this in
    his spare time. This could be mitigated by additional resources if a
    significant amount of additional engineering is recommended by the ARC
    and those resources become available.

3. Business Summary

3.1. Problem Area:

    This adds one more feature to the capabilities ZFS brings to Solaris,
    integrating ZFS more tightly with the operating system and providing a
    feature that some expect ZFS to have already.

3.2. Market/Requester:

    No specific person has asked for this feature, but it appears to be a
    common feature of many NAS boxes. The idea for such a system in ZFS came
    from a discussion on the zfs-discuss@opensolaris.org mailing list:
    http://www.opensolaris.org/jive/thread.jspa?messageID=37190

3.3. Business Justification:

    Not providing scheduled periodic ZFS snapshots on Solaris out of the box
    means there is one more set of scripts a system administrator needs to
    write and debug before putting a Solaris system into production in order
    to best exercise the features ZFS can provide. Having a common facility
    in Solaris that does this would prevent duplication of effort at user
    sites, reduce the time needed to deploy a Solaris system, and make life
    easier for support staff when users request this feature, or when they
    have to troubleshoot a user's homemade solution.

3.4. Competitive Analysis:

    Many other NAS products and operating systems that support snapshots
    already do this. These include:

    http://www.emc.com/products/software/snapview2.jsp
    http://www.microsoft.com/windows/products/windowsvista/features/details/shadowcopy.mspx
    http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=relnotes&fname=/usr/relnotes/nasmgr
    http://www.netapp.com/ftp/snapshot-brochure.pdf
    http://www.real-storage.com/nas-snapshots.html
    http://people.freebsd.org/~rse/snapshot/

3.5. Opportunity Window/Exposure:

    We're playing catch-up.

3.6. How will you know when you are done?:

    The major features have already been implemented in Korn shell, but we
    need to perform more testing and get additional code reviews. Community
    feedback can be used to determine whether we've implemented enough of the
    functionality for this to be useful.

    [ editor's note: yes, that's pretty vague. I don't know what specific
      metrics I could use here - any suggestions? ]

4. Technical Description:

4.1. Details:

    The service works by having a separate SMF service instance per group of
    filesystems, each instance denoting a separate schedule of periodic
    snapshots. The SMF method script is responsible for adding and removing
    the snapshot cron job, which corresponds to enabling and disabling the
    service. The method script is also called directly from cron according
    to the crontab entries - in which case it is responsible for taking the
    snapshot.

    Filesystems are grouped together either by setting their names as a
    space-separated list in an SMF instance property, or dynamically, with
    the method script searching all ZFS filesystems for an instance-specific
    ZFS user property. With ZFS Delegated Administration (PSARC 2006/465),
    users can set this property on their own filesystems, and need not
    reconfigure the SMF service.
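    For illustration, tagging a filesystem for a given schedule and having
    the method script discover it could look roughly like the sketch below.
    The property name, the "daily" instance label, and the user and
    filesystem names are examples only, not the committed interface of the
    service.

        # Tag a filesystem for a hypothetical "daily" instance
        # (the property name below is illustrative):
        zfs set com.sun:auto-snapshot:daily=true tank/home/timf

        # With delegated administration, a user granted the "userprop"
        # permission (e.g. "zfs allow timf userprop tank/home/timf")
        # can set this themselves, without reconfiguring the service.

        # The method script could then collect the tagged filesystems
        # with something like:
        zfs list -H -o name,com.sun:auto-snapshot:daily -t filesystem | \
            nawk '$2 == "true" { print $1 }'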
    The service can also be responsible for destroying older snapshots taken
    by the service, allowing the administrator to keep a given number of
    snapshots into the past. The service can perform a backup command at
    each invocation of the cron job - the administrator specifies what
    command to run at the end of a pipe that starts with
    "zfs send <filesystem>@<snapshot>", with the option of sending an
    incremental stream from the previous periodic snapshot.

    What does this offer that a simple "zfs snapshot <filesystem>@snap"
    entry in crontab doesn't? Using SMF allows the administrator to easily
    see when snapshots fail for some reason, allows them to easily enable or
    disable snapshots for groupings of filesystems, and adds additional
    features, such as performing backups of their filesystems.

    In the default configuration, we have daily, weekly, hourly, monthly and
    yearly snapshots, each managed under a different SMF instance. The
    administrator could add instances to take more frequent snapshots of
    some filesystems and less frequent snapshots of others, and have the
    service manage the complexity of dealing with cron for them.

    This has been a personal project up until now, with code (licensed under
    the CDDL) and implementation notes posted on the engineer's blog. The
    README documentation for the project is at:

    http://blogs.sun.com/timf/resource/README.zfs-auto-snapshot.txt

    The "SEE ALSO" section of the README has a list of links showing the
    various stages of the project to date. To summarize, the project has
    evolved through 10 versions from May 2006 to the present date. Users
    have been running the code and providing feedback, which has been
    integrated into each subsequent version.

    Two known bugs are worth calling out here. One is to do with our
    reliance on cron: to properly allow the administrator to have snapshots
    taken every 3 days, we'd need to re-write the crontab entry whenever the
    number of days in the month isn't evenly divisible by 3. At the moment,
    the crontab day-of-month field for such a schedule would look like:

    1,4,7,10,13,16,19,22,25,28,31

    After taking the snapshot on the 31st, our next snapshot should be taken
    on the 3rd of the following month, but as implemented it will be taken
    on the 1st instead. Other time periods are similarly affected.
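    To make that limitation concrete, the generated crontab entry for an
    "every 3 days" schedule would look something like the sketch below; the
    method-script path, its argument and the minute/hour fields are
    placeholders, not the service's actual interface.

        # minute hour day-of-month month day-of-week command
        0 0 1,4,7,10,13,16,19,22,25,28,31 * * /path/to/zfs-auto-snapshot-method <instance>

        # In a 31-day month, cron fires on the 31st and then again on the
        # 1st of the next month; a true 3-day interval would next fire on
        # the 3rd, so the day-of-month list would have to be rewritten at
        # the month boundary.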
    The other bug is 6474294 "Need to be able to better control who can read
    files in a snapshot". This service doesn't change the implications of
    that bug, but having automatic snapshots could result in more people
    running into the situation.

4.2. Bug/RFE Number(s): TBD

4.3. In Scope:

    Everything discussed in this one-pager is in scope.

4.4. Out of Scope:

    While this service does provide a means for a snapshot stream to be
    stored remotely (the "backup" option allows a ZFS send-stream to be
    piped to an administrator-specified command), it doesn't provide the
    equivalent "restore" command. This is not a general-purpose backup tool
    (i.e. it does not fix 5004379). This is also not a general-purpose
    remote replication facility (5036182), although some users have already
    started using it as a "poor man's cluster".

    [ editor's note - with that in mind, could this service ultimately end
      up confusing people who are expecting the above? Should we postpone
      work on this part of the big picture until the above facilities are
      available? ]

4.5. Interfaces:

    The interface will be the SMF service, allowing users to create
    instances of the service to perform work. We suspect the stability level
    will be Evolving, but would like advice. Over the course of the
    prototype development we've added, but never removed, several service
    properties - a 0.1 manifest will work correctly with a 0.10 version of
    the service.

4.6. Doc Impact:

    The ZFS Administration Guide could be modified to reference this
    service.

4.7. Admin/Config Impact:

    Adding this SMF service will introduce no change to the way Solaris is
    currently installed or administered. Out of the box, the included
    service instances can be installed as "disabled". The administrator
    would need to enable each instance they wanted to use, then mark
    filesystems for inclusion under the snapshot schedule set by each of the
    now-enabled instances.

4.8. HA Impact:

    // What new requirements does this proposal place on the High
    // Availability or Clustering aspects of the component?

    [ editor's note: I'm not sure of the answer here - I assume HA clusters
      already have some form of SMF synchronisation to ensure that failover
      nodes have the same SMF configuration applied automatically, should
      the running node change its SMF configuration? ]

4.9. I18N/L10N Impact:

    Additional translation of the ZFS Administration Guide could be
    required.

4.10. Packaging & Delivery:

    One additional package, which delivers the default instance, the
    included instances and the method script. No impact during
    Install/Upgrade.

4.11. Security Impact:

    If periodic snapshots are taken of sensitive data, then 6474294 may be
    worth visiting prior to integration; however, this service only
    highlights that problem - it exists without the service as well.

4.12. Dependencies:

    Cron, SMF and ZFS. The service works with ZFS from s10u2 and later -
    later ZFS versions include faster recursive snapshots, which the method
    script detects and uses if the feature is available.
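    For example, one way such a runtime check could be written (a sketch of
    the general approach, not the method script's actual code) is to test
    whether the installed zfs(1M) advertises a -r flag in its "zfs snapshot"
    usage message, falling back to per-filesystem snapshots otherwise; $FS
    and $SNAPNAME below are placeholders.

        # placeholders for illustration only:
        FS=tank/home
        SNAPNAME=myschedule-2008-04-02

        # does this zfs binary support recursive snapshots?
        # (assumes the usage message shows a "[-r]" flag when it does)
        if zfs snapshot 2>&1 | grep '\[-r\]' > /dev/null ; then
            # one command snapshots $FS and all of its descendants
            zfs snapshot -r "$FS@$SNAPNAME"
        else
            # otherwise snapshot each filesystem individually
            for fs in $(zfs list -H -o name -r -t filesystem "$FS") ; do
                zfs snapshot "$fs@$SNAPNAME"
            done
        fi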
5. Reference Documents:

    The following bug ids have been mentioned in this one-pager, under
    sections 4.4 and 4.11:

    5004379 want comprehensive backup strategy
    5036182 want remote replication (intent-log based)
    6474294 Need to be able to better control who can read files in a snapshot

6. Resources and Schedule:

6.1. Projected Availability: TBD

6.2. Cost of Effort:

    // Order of magnitude people and time for the *whole* project, not
    // just the development engineering part.
    // You may wish to split the estimate between feature
    // implementation, implementing administrative interfaces, unit
    // tests, documentation, support training material, i18n, etc.

    [ editor's note - any ideas? The prototype is done - there's additional
      work to integrate it into ON and Install, and to properly use RBAC;
      perhaps a few weeks' work in my spare time? ]

6.4. Product Approval Committee requested information:

6.4.1. Consolidation or Component Name: ON

6.4.7. Target RTI Date/Release: TBD

    // List target release & build and/or date.
    // RTI = Request to Integrate - when does *this* project
    // expect to be ready to integrate its changes back into
    // the master source tree? We are not asking when the
    // component wants to ship, but instead, when the
    // gatekeeper/PM needs to expect your changes to show up.
    // examples: S8u7_1, S9_45, Aug 2002...

6.4.8. Target Code Design Review Date: TBD

6.5. ARC review type: Standard

6.6. ARC Exposure: open

6.6.1. Rationale: Part of OpenSolaris

7. Prototype Availability:

7.1. Prototype Availability:

    An evolving prototype has been available since May 2006. More work is
    needed to add RBAC SMF authorisations to manage the service instances.

7.2. Prototype Cost: $0