[zfs-discuss] zfs space efficiency
Hello! I'm thinking of using ZFS for backing up large binary data files (i.e. VMware VMs, Oracle databases). I want to rsync them at regular intervals from other systems to one central ZFS system with compression on. I'd like to have historical versions, and thus want to make a snapshot before each backup, i.e. before each rsync.

Now I wonder: if I have one large datafile on ZFS, make a snapshot of the ZFS filesystem holding it, and then overwrite that file with a newer version with slight differences inside, what about the real disk consumption on the ZFS side? Do I need to handle this in a special way to make it space-efficient? Do I need to use rsync --inplace? Typically, rsync writes a complete new (temporary) file based on the existing one and on what has changed at the remote site, and then replaces the old one with the new one via delete/rename. I assume this will eat up my backup space very quickly, even when using snapshots, and even if only small parts of the large file are changing.

Comments?

regards
roland
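The backup cycle roland describes (snapshot first, then rsync over the previous copy) would look roughly like this; a minimal sketch, assuming a dataset named tank/backup and made-up host and path names:

    #!/bin/ksh
    # Hypothetical backup cycle: snapshot the dataset, then rsync over it.
    # "tank/backup" and the source host/path are placeholders, not from
    # the original post.
    stamp=$(date +%Y%m%d-%H%M)
    zfs snapshot tank/backup@$stamp
    # --inplace rewrites changed regions inside the existing file rather
    # than building a temp file and renaming it over the original; see
    # the replies below for why that matters for snapshot space usage.
    rsync -a --inplace oraclehost:/u01/oradata/ /tank/backup/oradata/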
Re: [zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> So it is expected behavior on my Nexenta alpha 7 server for Sun's nfsd
> to stop responding after 2 hours of running a bittorrent client over
> NFSv4 from a Linux client, causing ZFS snapshots to hang and requiring
> a hard reboot to get the world back in order?
>
> Thomas

There is no NFS-over-ZFS issue (IMO/FWIW). If ZFS is talking to a JBOD, then the slowness is a characteristic of NFS (not related to ZFS). So, FWIW, on JBOD there is no ZFS+NFS issue, in the sense that I don't know how we could change ZFS to be significantly better at NFS, nor do I know of a change to NFS that would help ZFS in particular. That doesn't mean there is none; I just don't know about any. So please ping me if you run into such an issue. And if one replaces ZFS with some other filesystem and gets a large speedup, I'm interested (make sure the other filesystem either runs with the write cache off, or flushes it on NFS commit).

That leaves us with a Samba vs. NFS issue (not related to ZFS). We know that NFS can create files at most at the rate of one file per server I/O latency. Samba appears better, and this is what we need to investigate. It might be better in a way that NFS can borrow (maybe through better NFSv4 delegation code), or Samba might be better by being careless with data. If we find such an NFS improvement, it will help all backend filesystems, not just ZFS. Which is why I say: there is no NFS-over-ZFS issue.
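As an aside, the one-file-per-server-I/O-latency ceiling mentioned above is easy to observe yourself; a rough sketch (the NFS mount point /mnt/nfs is a placeholder):

    # Time creating 1000 empty files on an NFS mount.
    # Each create must reach stable storage before the server replies,
    # so the loop runs at roughly one file per server I/O latency.
    time (i=0; while [[ $i -lt 1000 ]]; do touch /mnt/nfs/f$i; i=$((i+1)); done)

Run the same loop on a local directory for comparison; the gap is the NFS synchronous-create cost, independent of the backend filesystem.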
[zfs-discuss] zfs receive
Hi,

As part of a disk subsystem upgrade I am thinking of using ZFS, but there are two issues at present:

1) The current filesystems are mounted as /hostname/mountpoint, except for one directory where the mount point is storage dir/storage application dir. Is it possible to mount a ZFS filesystem as /hostname/storage dir/storage application, so that /hostname/storage dir contains only the directory storage application? The storage dir is empty apart from the storage application directory, which contains all the files.

2) Is there any possibility of having a "zfs ireceive" for an interactive receive, similar to the ufsrestore -i command? After twenty-one years of working with Sun kit, my experience is that I either have to restore a complete filesystem (three disks failing in a RAID5 set) or I have to restore an individual file or directory. I have been told that zfs receive is very quick at restoring a filesystem; unfortunately it does not permit an interactive restore of selected files and directories. This is why I would like to see a "zfs ireceive" if possible, which would work on a data stream created by zfs send but would allow interactive or specified files and directories to be restored. It does not matter if it is 10x slower than restoring a complete filesystem; what matters is the ability to selectively restore directories and files.

TIA
Russell
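On question 1: this should be doable with the mountpoint property; a minimal sketch, with pool and path names invented for illustration:

    # Mount a dataset at an arbitrary path; "tank" and the path
    # components are placeholders for your actual names.
    zfs create -o mountpoint=/hostname/storagedir/storageapp tank/storageapp

As far as I know, zfs mount creates the mountpoint directory (including parent directories) if it does not exist, so /hostname/storagedir need not be a dataset itself and will contain only the storageapp directory.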
[zfs-discuss] ZFS delegation script
Couldn't wait for ZFS delegation, so I cobbled something together; see attachment.

Nico
--

#!/bin/ksh

ARG0=$0
PROG=${0##*/}
OIFS=$IFS

# grep -q rocks, but it lives in xpg4...
OPATH=$PATH
PATH=/usr/xpg4/bin:/bin:/sbin

# Configuration (see usage message below)
#
# This is really based on how a particular server on SWAN is configured,
# with datasets named tank/<zone>-export that are intended to be
# administered by the zone admins, not just the global zone admins.
#
# Maybe it would be better to just use user props to track delegation.
#
USER_ZFS_BASE=tank/users
ZONE_ZFS_BASE=tank
ZONE_ZFS_SUFFIX=-export
PROF_PREFIX="Zoned NFS Mgmt Hack for "
DELEG_ZFS_PROF="ZFS Delegation Hack"

usage () {
    cat <<EOF
Usage:  pfexec $PROG [-x] [-n] zfs <zfs arguments>
        pfexec $PROG [-x] [-n] chown <chown args> <dataset>
        pfexec $PROG [-x] [-n] add-zone-profile <zonename>
        pfexec $PROG [-x] [-n] setup

    Options:

    -x  debug
    -n  dry-run
EOF
    fmt <<EOF

With this program you can execute with privilege any zfs command that
operates on a filesystem or snapshot named $USER_ZFS_BASE/<username>[/*]
or $ZONE_ZFS_BASE/<zonename>$ZONE_ZFS_SUFFIX[/*], where <username> is
the user running $PROG or where <zonename> is the name of a zone for
which the user has administrative authority.

You can also delegate administration of ZFS datasets by using properties
called :owner_user_<username>: (any value will do) or :owner_profiles:
with a comma-separated list of profiles as its value -- any user with
one of those profiles can admin the given dataset.

Users must have an RBAC profile granted which allows them to execute
this command with all privileges (privs=all).

Administrative authority for a zone is granted by granting a profile
named "${PROF_PREFIX}<zonename>". The add-zone-profile sub-command adds
such profiles.

The chown sub-command allows users to chown to themselves any dataset
for which they have authority.

The setup sub-command creates a profile, "Delegated ZFS Hack", which you
can grant to users (e.g., to all users via PROFS_GRANTED in
policy.conf(4)).

This script must be executed with euid=0 via pfexec(1) or a profile
shell.

<dataset> is always a ZFS dataset name (i.e., no leading '/'!).
EOF
    exit 1
}

err () {
    print -u2 -- "Error: $@"
    exit 1
}

realpath () {
    typeset dir dirs
    if [[ $1 != */* ]]
    then
        IFS=:
        set -A dirs -- $PATH
        IFS=$OIFS
        for dir in "${dirs[@]}"
        do
            if [[ -x ${dir}/$1 ]]
            then
                print -- "${dir}/$1"
                return 0
            fi
        done
    elif [[ $1 = /* || $1 = */* ]]
    then
        (cd "${1%/*}" > /dev/null && print -- "$(/bin/pwd)/${1##*/}")
        return $?
    fi
    err "Can't resolve path to $PROG"
}

validate_object () {
    typeset i j prop op val user zone profs
    if [[ $1 = ${USER_ZFS_BASE}/* ]]
    then
        # A user's dataset
        user=${1#$USER_ZFS_BASE/}
        user=${user%%/*}
        [[ $username = $user ]] && return 0
    elif [[ $1 = ${ZONE_ZFS_BASE}/* ]]
    then
        # A zone's dataset
        zone=${1#$ZONE_ZFS_BASE/}
        zone=${zone%${ZONE_ZFS_SUFFIX}*}
        for i in "${zones[@]}"
        do
            [[ $zone = $i ]] && return 0
        done
    fi
    # More fun: if the dataset has a property of the form
    # :owner_user_<username>: or :owner_profiles:, the latter having
    # a comma-separated list of profile names as a value
    zfs get -H -o value type "$1" 2>/dev/null | read val
    [[ -z $val ]] && err "Dataset $1 does not exist"
    zfs get -H -o value ":owner_user_${username}:" "$1" | read val
    [[ $val != - ]] && return 0
    zfs get -H -o value :owner_profiles: "$1" | read val
    for i in "${profiles[@]}"
    do
        IFS=,
        set -A profs -- $val
        IFS=$OIFS
        for j in "${profs[@]}"
        do
            [[ $i = $j ]] && return 0
        done
    done
    # Snapshot names are validated against their underlying dataset
    [[ $1 = *@* ]] && validate_object "${1%@*}" && return 0
    # usage() exits
    usage
}

validate_prop () {
    typeset prop
    prop=${1%%=*}
    case $prop in
    mountpoint|quota|zoned|reservation|volsize|devices|setuid|:owner_*)
        err "Cannot set $prop properties";;
    *)  return 0;;
    esac
}

zfs_create_opts () {
    typeset opt OPTARG prop arg
    # KSH getopts bug workaround
    OPTIND=1
    set -A zfs_args create
    while getopts sb:o:V: opt
    do
        case $opt in
        s|b|V)  err "$PROG does not support volumes";;
        o)  validate_prop "$OPTARG"
            zfs_args[${#zfs_args[@]}]=-o
            zfs_args[${#zfs_args[@]}]=$OPTARG
            ;;
        [?])    usage;;
        esac
    done
    shift $((OPTIND - 1))
    [[ $# -eq 1 ]] || usage
    # The user creating this should have
Re: [zfs-discuss] ZFS delegation script
On Sat, Jun 23, 2007 at 12:18:05PM -0500, Nicolas Williams wrote:
> Couldn't wait for ZFS delegation, so I cobbled something together; see
> attachment.

I forgot to slap on the CDDL header...

#!/bin/ksh
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#

(The rest of the script is the same as in the previous posting.)
Re: [zfs-discuss] ZFS delegation script
On Sat, Jun 23, 2007 at 12:31:28PM -0500, Nicolas Williams wrote:
> On Sat, Jun 23, 2007 at 12:18:05PM -0500, Nicolas Williams wrote:
> > Couldn't wait for ZFS delegation, so I cobbled something together;
> > see attachment.
>
> I forgot to slap on the CDDL header...

And I forgot to add a -p option here:

#!/bin/ksh

That should be:

#!/bin/ksh -p

Note that this script is not intended to be secure, just to keep honest people honest and keep them from making certain mistakes. Setuid scripts (which this isn't quite) are difficult to make secure.

Nico
--
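For anyone who wants to wire this up: the usage text says users need an RBAC profile that executes the script with privilege. Purely as a sketch from memory (the install path /opt/local/bin/zfsdeleg is a placeholder; double-check the syntax against prof_attr(4) and exec_attr(4)):

    # /etc/security/prof_attr -- define the profile named in the script
    ZFS Delegation Hack:::Delegated ZFS administration hack:
    # /etc/security/exec_attr -- run the script with euid=0 under pfexec
    ZFS Delegation Hack:suser:cmd:::/opt/local/bin/zfsdeleg:euid=0

    # Grant the profile to a user:
    usermod -P "ZFS Delegation Hack" someuser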
Re: [zfs-discuss] zfs space efficiency
Erik Trimble wrote:
> roland wrote:
> > now i wonder: if i have one large datafile on zfs, make a snapshot
> > of the zfs fs holding it, and then overwrite that file with a newer
> > version with slight differences inside, what about the real disk
> > consumption on the zfs side? do i need to handle this in a special
> > way to make it space-efficient? do i need to use rsync --inplace?
> > typically, rsync writes a complete new (temporary) file based on the
> > existing one and on what has changed at the remote site, and then
> > replaces the old one with the new one via delete/rename. i assume
> > this will eat up my backup space very quickly, even when using
> > snapshots, and even if only small parts of the large file are
> > changing.

You are correct: when you write a new file, we will allocate space for that entire new file, even if some of its blocks happen to have the same content as blocks in the previous file.

This is one of the reasons that we implemented zfs send. If only a few blocks of a large file were modified on the sending side, then only those blocks will be sent, and we will find those blocks extremely quickly (in O(modified blocks) time; using the POSIX interfaces, as rsync does, would take O(filesize) time). Of course, if the system you're backing up from is not running ZFS, this does not help you.

> Under ZFS, any equivalent to 'cp A B' takes up no extra space. The
> metadata is updated so that B points to the blocks in A. Should anyone
> begin writing to B, only the updated blocks are added on disk, with
> the metadata for B now containing the proper block list to be used
> (some from A, and the new blocks in B). So, in your case, you get
> maximum space efficiency, where only the new blocks are stored, and
> the old blocks simply are referenced.

That is not correct; what led you to believe that? With ZFS (and UFS, ext2, WAFL, VxFS, etc.), cp a b will copy the contents of the file, resulting in two copies stored on disk.

--matt
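To make the zfs send point concrete: if both ends ran ZFS, the incremental replication would look roughly like this (dataset, snapshot, and host names are invented for illustration):

    # On the source: snapshot, then send only blocks changed since the
    # previous snapshot. "tank/db", "yesterday"/"today", and
    # "backuphost" are placeholders.
    zfs snapshot tank/db@today
    zfs send -i tank/db@yesterday tank/db@today | \
        ssh backuphost zfs receive tank/backup/db
    # The -i stream contains only the blocks modified between the two
    # snapshots, located in O(modified blocks) time from the block tree.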
Re: [zfs-discuss] zfs space efficiency
roland wrote:
> now i wonder: if i have one large datafile on zfs, make a snapshot of
> the zfs fs holding it, and then overwrite that file with a newer
> version with slight differences inside, what about the real disk
> consumption on the zfs side? do i need to handle this in a special way
> to make it space-efficient? do i need to use rsync --inplace? [...]
> i assume this will eat up my backup space very quickly, even when
> using snapshots, and even if only small parts of the large file are
> changing.

I'm pretty sure about this answer, but others should correct me if I'm wrong. :-)

Under ZFS, any equivalent to 'cp A B' takes up no extra space. The metadata is updated so that B points to the blocks in A. Should anyone begin writing to B, only the updated blocks are added on disk, with the metadata for B now containing the proper block list to be used (some from A, and the new blocks in B). So, in your case, you get maximum space efficiency, where only the new blocks are stored, and the old blocks simply are referenced.

What I'm not sure of is exactly how ZFS does this. Does the metadata for B contain an entire list of all blocks (in order) for that file? Or does each block effectively contain a pointer to the next (and possibly previous) block, in effect a doubly-linked list? I'd hope for the former, since that seems most efficient.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
[zfs-discuss] Re: zfs space efficiency
> So, in your case, you get maximum space efficiency, where only the new
> blocks are stored, and the old blocks simply are referenced.

So I assume that whenever some block is read from file A and written unchanged to file B, ZFS recognizes this and just creates a new reference to the block in file A? That would be great. I shouldn't ask so much, but try it on my own instead. :)
RE: [zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> From: Thomas Garner
>
> So it is expected behavior on my Nexenta alpha 7 server for Sun's nfsd
> to stop responding after 2 hours of running a bittorrent client over
> NFSv4 from a Linux client, causing ZFS snapshots to hang and requiring
> a hard reboot to get the world back in order?

We have seen this behavior, but it appears to be entirely related to the hardware: the Intel IPMI stuff swallows the NFS traffic on port 623 directly in the network hardware, so it never reaches the host.

http://blogs.sun.com/shepler/entry/port_623_or_the_mount

--
paul
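If you want to check whether this is biting you, one approach (a sketch; the interface name and client host are assumptions) is to capture on both ends and compare:

    # On the Solaris server: watch for traffic involving port 623.
    # If the NIC/IPMI firmware is intercepting it, a capture on the
    # client will show the packets leaving, but snoop on the server
    # will never see them arrive. "bge0" and "linuxclient" are
    # placeholders.
    snoop -d bge0 host linuxclient and port 623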
Re: [zfs-discuss] zfs space efficiency
> if i have one large datafile on zfs, make a snapshot of the zfs fs
> holding it and then overwrite that file with a newer version with
> slight differences inside, what about the real disk consumption on
> the zfs side?

If all the blocks are rewritten, then they're all new blocks as far as ZFS knows.

> do i need to handle this in a special way to make it space-efficient?
> do i need to use rsync --inplace?

I would certainly try that to see if it works, and whether your access patterns can cope with files being partially edited at times.

> typically, rsync writes a complete new (temporary) file based on the
> existing one and on what has changed at the remote site, and then
> replaces the old one with the new one via delete/rename. i assume this
> will eat up my backup space very quickly, even when using snapshots
> and even if only small parts of the large file are changing.

Yes, I think so. I believe this is even more of a problem for a server with Windows clients (via CIFS), because many of the apps tend to rewrite the entire file on save. Network Appliance eventually added an option to their software to do additional work and save space if files are substantially similar to the last snapshot. Theirs works on file close, so it's only a CIFS option. ZFS could conceivably do the same for local access as well, but I don't think anyone's tried to work on it yet.

--
Darren Dunham
Senior Technical Consultant, TAOS    http://www.taos.com/
Got some Dr Pepper?                  San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
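A quick experiment along these lines would show the difference (a sketch; dataset, host, and file names are made up):

    # Snapshot, then rsync the updated file in place, then compare space.
    # "tank/test" and the paths are placeholders.
    zfs snapshot tank/test@before
    rsync -a --inplace remote:/data/bigfile /tank/test/bigfile
    zfs list -r -t snapshot tank/test   # USED = space held by @before
    # Without --inplace, the delete/rename leaves the snapshot holding
    # every block of the old file; with --inplace, the snapshot holds
    # only the blocks rsync actually rewrote.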
Re: [zfs-discuss] zfs space efficiency
Matthew Ahrens wrote:
> Erik Trimble wrote:
> > Under ZFS, any equivalent to 'cp A B' takes up no extra space. [...]
> > So, in your case, you get maximum space efficiency, where only the
> > new blocks are stored, and the old blocks simply are referenced.
>
> That is not correct; what led you to believe that? With ZFS (and UFS,
> ext2, WAFL, VxFS, etc.), cp a b will copy the contents of the file,
> resulting in two copies stored on disk.
>
> --matt

Basically, the descriptions of copy-on-write. Or does this apply only to snapshots? My original understanding was that CoW applied whenever you were making a duplicate of an existing file. I can understand that 'cp' might not do that (given that there must be some system-call mechanism for ZFS to distinguish that we are replicating an existing file, not just creating a whole new one).

Now that I think about it, I'm not sure that I can see any way to change the behavior of POSIX calls to allow for this type of mechanism. You'd effectively have to create a whole new system call with multiple file arguments. *sigh* Wishful thinking, I guess.

Now, wouldn't it be nice to have syscalls which would implement cp and mv, thus abstracting them away from the userland app?

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] zfs space efficiency
On 6/23/07, Erik Trimble wrote:
> Basically, the descriptions of copy-on-write. Or does this apply only
> to snapshots? My original understanding was that CoW applied whenever
> you were making a duplicate of an existing file.

CoW happens all the time. If you overwrite a file, instead of writing to the same location on disk, ZFS allocates a new block, writes to that, and then creates a new tree in parallel (all on new, previously unused blocks). Then it changes the root of the tree to point to the newly allocated blocks.

> Now that I think about it, I'm not sure that I can see any way to
> change the behavior of POSIX calls to allow for this type of
> mechanism. You'd effectively have to create a whole new system call
> with multiple file arguments.

Files that are mostly the same, or exactly the same? If they're exactly the same, it's called a hardlink. ;) If they're mostly the same, I guess you could come up with a combination of a sparse file and a symlink. But I don't think the needed functionality is commonly enough used to bother implementing in kernel space. If you really want it in your application, do it yourself: make a little file with two filenames, and a bitmap indicating which of them each application block should come from.

> Now, wouldn't it be nice to have syscalls which would implement cp and
> mv, thus abstracting them away from the userland app?

Not really. Different apps want different behavior in their copying, so you'd have to expose a whole lot of things (how much of the copy has completed? how fast is it going?) even if they never get used by the userspace app. And it duplicates functionality: you can do everything necessary in userspace with stat(), read(), write() and friends.
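An easy way to see the hardlink-versus-copy difference on disk (a small sketch; the dataset and file names are placeholders):

    # On a dataset "tank/demo" containing a large file:
    zfs list tank/demo                       # note USED
    cp /tank/demo/bigfile /tank/demo/copy    # USED grows by the file size
    ln /tank/demo/bigfile /tank/demo/link    # USED barely changes
    zfs list tank/demo                       # compare after each step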
[zfs-discuss] Re: ZFS Scalability/performance
Oliver Schinagl wrote:
> so basically, what you are saying is that on FreeBSD there's no
> performance issue, whereas on Solaris there can be (if write caches
> aren't enabled)?

Solaris plays it safe by default. You can, of course, override that safety. FreeBSD plays it safe too; it's just that UFS, and other filesystems on FreeBSD, understand write caches and flush them at appropriate times. If Solaris UFS is updated to flush the write cache when necessary (it's not only at sync time, of course), it too can enable the write cache! :-)
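For anyone who wants to check their own drives: from memory (so treat this as a sketch, not gospel), Solaris exposes the write cache through the expert mode of format(1M):

    # format -e      (select the disk, then:)
    format> cache
    cache> write_cache
    write_cache> display    # show current state
    write_cache> enable     # or disable

Note also that when you give zpool a whole disk rather than a slice, ZFS enables the disk's write cache itself, since it knows to flush the cache at the right times.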