Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Sep 10, 2007, at 13:40, [EMAIL PROTECTED] wrote:

> I am not against refactoring solutions, but zfs quotas and the lack of
> user quotas in general either leave people trying to use zfs quotas in
> lieu of user quotas, suggesting weak end runs against the problem (a
> cron to calculate hogs), or belittling the need to actually limit disk
> usage per user id.

And let's not forget group ID.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 10 Sep 2007, at 16:41, Brian H. Nelson wrote:
> Stephen Usher wrote:
>> Brian H. Nelson:
>>
>> I'm sure it would be interesting for those on the list if you could
>> outline the gotchas so that the rest of us don't have to re-invent
>> the wheel... or at least not fall down the pitfalls.
>
> Also, here's a link to the ufs on zvol blog where I originally found
> the idea:
> http://blogs.sun.com/scottdickson/entry/fun_with_zvols_-_ufs

Not everything I've seen blogged about UFS and zvols fills me with warm
fuzzies. For instance, the above takes no account of the fact that the
UFS filesystem needs to be in a consistent state before a snapshot is
taken - e.g. using lockfs(1M). Example:

Preparation ...

basket# zfs create -V 10m pool0/v1
basket# newfs /dev/zvol/rdsk/pool0/v1
newfs: /dev/zvol/rdsk/pool0/v1 last mounted as /tmp/v1
newfs: construct a new file system /dev/zvol/rdsk/pool0/v1: (y/n)? y
Warning: 4130 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/pool0/v1: 20446 sectors in 4 cylinders of 48 tracks, 128 sectors
        10.0MB in 1 cyl groups (14 c/g, 42.00MB/g, 20160 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32,
basket# mount -r /dev/zvol/dsk/pool0/v1 /tmp/v1

Scenario 1 ...

basket# date >/tmp/v1/f1; zfs snapshot pool0/[EMAIL PROTECTED]
basket# cat /tmp/v1/f1
Mon Sep 10 23:07:42 BST 2007
basket# mount -r /dev/zvol/dsk/pool0/[EMAIL PROTECTED] /tmp/v1s1
basket# ls /tmp/v1s1
f1  lost+found/
basket# cat /tmp/v1s1/f1
basket# date >/tmp/v1/f1; zfs snapshot pool0/[EMAIL PROTECTED]
basket# mount -r /dev/zvol/dsk/pool0/[EMAIL PROTECTED] /tmp/v1s2
basket# cat /tmp/v1s2/f1
Mon Sep 10 23:07:42 BST 2007
basket# cat /tmp/v1/f1
Mon Sep 10 23:09:19 BST 2007

Note: the first snapshot sees the file but not the contents, while the
second snapshot sees stale data.

Scenario 2 ...

basket# date >/tmp/v1/f2; lockfs -wf /tmp/v1; zfs snapshot pool0/[EMAIL PROTECTED]; lockfs -u /tmp/v1
basket# mount -r /dev/zvol/dsk/pool0/[EMAIL PROTECTED] /tmp/v1s3
mount: Mount point /tmp/v1s3 does not exist.
basket# mkdir /tmp/v1s3
basket# mount -r /dev/zvol/dsk/pool0/[EMAIL PROTECTED] /tmp/v1s3
basket# cat /tmp/v1s3/f2
Mon Sep 10 23:18:17 BST 2007
basket# cat /tmp/v1/f2
Mon Sep 10 23:18:17 BST 2007
basket#

Note: the snapshot is consistent because of the lockfs(1M) calls.

Phil
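Phil's Scenario 2 above can be wrapped in a small script so the lock is
always released; this is only a sketch (the pool, volume, and mount
point names are placeholders, not anything from a real setup):

```shell
#!/bin/sh
# Sketch: take a consistent ZFS snapshot of a UFS filesystem that lives
# on a zvol.  lockfs -wf flushes and write-locks the UFS mount, the
# snapshot is taken against the underlying zvol, and lockfs -u unlocks
# it again whether or not the snapshot succeeded.
POOL=pool0          # placeholder pool name
VOL=v1              # placeholder zvol name
MNT=/tmp/v1         # where the UFS filesystem is mounted
SNAP=backup-`date +%Y%m%d`

lockfs -wf $MNT || exit 1       # flush and write-lock UFS
zfs snapshot $POOL/$VOL@$SNAP   # snapshot the zvol while UFS is quiescent
STATUS=$?
lockfs -u $MNT                  # always unlock, even on snapshot failure
exit $STATUS
```

The window in which the filesystem is write-locked is only as long as
the `zfs snapshot` call itself, which is near-instant.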
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
[EMAIL PROTECTED] wrote:
> All of these threads to this point have not answered the needs
> anywhere close to the solution that user quotas allow.

I thought I did answer that... for some definition of "answer"...

>> The main gap for .edu sites is quotas, which will likely be solved
>> some other way in the long run... Meanwhile, pile on
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6501037

Or, if you're so inclined,
http://cvs.opensolaris.org/source/

The point being that either it isn't a high priority for the ZFS team,
or there are other solutions to the problem (which may not require
changes to ZFS), or you can fix it on your own. You can impact any or
all of these things.
 -- richard
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
[EMAIL PROTECTED] wrote on 09/10/2007 12:13:18 PM:
> [EMAIL PROTECTED] wrote:
>> Very true, you could even pay people to track down heavy users and
>> bonk them on the head. Why is everyone responding with alternate
>> routes to a simple need?
>
> For the simple reason that sometimes it is good to challenge existing
> practice and try to find the real need rather than "I need X because
> I've always done it using X".

I am not against refactoring solutions, but zfs quotas and the lack of
user quotas in general either leave people trying to use zfs quotas in
lieu of user quotas, suggesting weak end runs against the problem (a
cron to calculate hogs), or belittling the need to actually limit disk
usage per user id. None of these threads to this point have answered
the need anywhere close to the solution that user quotas allow.

> We always used a vfstab and dfstab (or exportfs) file before, and
> used a separate software RAID and filesystem before too.

Yes, and the replacements (when talking ZFS) are either parity or
better -- that makes switching a win-win. ENOSUCH when talking user
quotas.

>> User quotas have been used in the past, and will be used in the
>> future, because they work (well), are simple, tied into many
>> existing workflows/systems, and very understandable for both end
>> users and administrators. You can come up with 100 other ways to
>> accomplish pseudo user quotas or end runs around the core issue (did
>> we really have google space farming suggested -- we are reading a FS
>> mailing list here?), but quotas are tested and well understood fixes
>> to these problems. Just because someone decided to call ZFS pool
>> reservations quotas does not mean the need for real user quotas is
>> gone.
>
> Reservations in ZFS are quite different to quotas; ZFS has both
> concepts. A reservation is a guaranteed minimum; a quota in ZFS is a
> guaranteed maximum.

Reservations (the general term when talking about most of the disk
virtualizing and pooling technologies in play today) usually cover both
the floor (guaranteed space) and the ceiling (max allocated space) for
the pool volume, dynamic store, or backing store. ZFS quotas can be
called whatever you want -- it has just become frustrating when people
start pushing ZFS quotas as a drop-in replacement for user quotas. They
are tools for different issues with some overlap. Even though one can
pound in a nail with a screwdriver, I would rather have a hammer.
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
[EMAIL PROTECTED] wrote:
> Very true, you could even pay people to track down heavy users and
> bonk them on the head. Why is everyone responding with alternate
> routes to a simple need?

For the simple reason that sometimes it is good to challenge existing
practice and try to find the real need rather than "I need X because
I've always done it using X".

We always used a vfstab and dfstab (or exportfs) file before, and used
a separate software RAID and filesystem before too.

> User quotas have been used in the past, and will be used in the
> future, because they work (well), are simple, tied into many existing
> workflows/systems, and very understandable for both end users and
> administrators. You can come up with 100 other ways to accomplish
> pseudo user quotas or end runs around the core issue (did we really
> have google space farming suggested -- we are reading a FS mailing
> list here?), but quotas are tested and well understood fixes to these
> problems. Just because someone decided to call ZFS pool reservations
> quotas does not mean the need for real user quotas is gone.

Reservations in ZFS are quite different to quotas; ZFS has both
concepts. A reservation is a guaranteed minimum; a quota in ZFS is a
guaranteed maximum.

-- Darren J Moffat
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
[EMAIL PROTECTED] wrote on 09/10/2007 11:40:16 AM:
> Richard Elling wrote:
>> There is also a long tail situation here, which is how I approached
>> the problem at eng.Auburn.edu. 1% of the users will use > 90% of the
>> space. For them, I had special places. For everyone else, they were
>> lumped into large-ish buckets. A daily cron job easily identifies
>> the 1% and we could proactively redistribute them, as needed. Of
>> course, quotas are also easily defeated and the more clever students
>> played a fun game of hide-and-seek, but I digress. There is more
>> than one way to solve these allocation problems.
>
> Ah, I remember those games well, and they are one of the reasons I'm
> now a Solaris developer! Though at Glasgow Uni's Comp Sci department
> it wasn't disk quotas (peer pressure was used for us) but print
> quotas, which were much more fun to try and bypass and
> environmentally responsible to quota in the first place.

Very true, you could even pay people to track down heavy users and bonk
them on the head. Why is everyone responding with alternate routes to a
simple need? User quotas have been used in the past, and will be used
in the future, because they work (well), are simple, tied into many
existing workflows/systems, and very understandable for both end users
and administrators. You can come up with 100 other ways to accomplish
pseudo user quotas or end runs around the core issue (did we really
have google space farming suggested -- we are reading a FS mailing list
here?), but quotas are tested and well understood fixes to these
problems. Just because someone decided to call ZFS pool reservations
quotas does not mean the need for real user quotas is gone.

User quotas are a KISS solution to space hogs. Zpool quotas (really
pool reservations) are not, unless you can divvy up data slices into
small fs mounts and have no user overlap in the partition.

user quotas + zfs quotas > zfs quotas;

-Wade
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Richard Elling wrote:
> There is also a long tail situation here, which is how I approached
> the problem at eng.Auburn.edu. 1% of the users will use > 90% of the
> space. For them, I had special places. For everyone else, they were
> lumped into large-ish buckets. A daily cron job easily identifies the
> 1% and we could proactively redistribute them, as needed. Of course,
> quotas are also easily defeated and the more clever students played a
> fun game of hide-and-seek, but I digress. There is more than one way
> to solve these allocation problems.

Ah, I remember those games well, and they are one of the reasons I'm
now a Solaris developer! Though at Glasgow Uni's Comp Sci department it
wasn't disk quotas (peer pressure was used for us) but print quotas,
which were much more fun to try and bypass and environmentally
responsible to quota in the first place.

-- Darren J Moffat
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:
> On 9/8/07, Richard Elling <[EMAIL PROTECTED]> wrote:
>> Changing the topic slightly, the strategic question is:
>> why are you providing disk space to students?
>
> For most programming and productivity (e.g. word processing, etc.)
> people will likely be better suited by having network access for
> their personal equipment with local storage.

Most students today are carrying around more storage in their pocket
than they'll get from the university.

> For cases when specialized expensive tools ($10k+ per seat) are used,
> it is not practical to install them on hundreds or thousands of
> personal devices for a semester or two of work. The typical computing
> lab that provides such tools is not well equipped to deal with
> removable media such as flash drives.

I disagree; any lab machine bought in the past 5 years or so has a USB
port, even SunRays.

> Further, such tools will often times be used to do designs that
> require simulations to run as batch jobs under grid computing tools
> such as Grid Engine, Condor, LSF, etc.

Yes, but you won't have 15,000 students running grid engine. But even
if you do, you can adopt the services models now prevalent in the
industry. For example, rather than providing storage for a class, let
Google or Yahoo do it.

> Then, of course, there are files that need to be shared, have
> reliable backups, etc. Pushing that out to desktop or laptop machines
> is not really a good idea.

Clearly the business of a university has different requirements than
student instruction. But even then, it seems we're stuck in the 1960s
rather than the 21st century. I think I might have some home directory
somewhere at USC, where I currently attend, but I'm not really sure. I
know I have a (Sun-based :-) email account with some sort of quota, but
that isn't implemented as a file system quota. I keep my stuff in my
pocket. This won't work entirely for situations like Steve's compute
cluster, but it will for many.

There is also a long tail situation here, which is how I approached the
problem at eng.Auburn.edu. 1% of the users will use > 90% of the space.
For them, I had special places. For everyone else, they were lumped
into large-ish buckets. A daily cron job easily identifies the 1% and
we could proactively redistribute them, as needed. Of course, quotas
are also easily defeated and the more clever students played a fun game
of hide-and-seek, but I digress. There is more than one way to solve
these allocation problems. The real PITA was cost accounting,
especially for government contracts :-(

The cost of managing the storage is much greater than the cost of the
storage, so the trend will inexorably be towards eliminating the
management costs -- hence the management structure of ZFS is simpler
than the previous solutions. The main gap for .edu sites is quotas,
which will likely be solved some other way in the long run...
Meanwhile, pile on
http://bugs.opensolaris.org/view_bug.do?bug_id=6501037
 -- richard
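The "daily cron job easily identifies the 1%" step Richard mentions can
be approximated in a few lines; this is a hypothetical sketch (the real
job at eng.Auburn.edu isn't shown in this thread, and the per-user
usage data is assumed to come from something like du or quot):

```python
# Sketch: given a {user: blocks-used} mapping collected elsewhere,
# report the heaviest 1% of users -- the long tail Richard describes.
import math

def top_hogs(usage, fraction=0.01):
    """Return the heaviest `fraction` of users (at least one), largest first."""
    ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    n = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:n]

# Fake survey of 200 users where u199 uses the most space.
usage = {"u%03d" % i: i for i in range(200)}
print(top_hogs(usage))  # → [('u199', 199), ('u198', 198)]
```

A cron job would feed real usage numbers in and mail the resulting list
to the admins instead of printing it.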
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote:
> Brian H. Nelson:
>
> I'm sure it would be interesting for those on the list if you could
> outline the gotchas so that the rest of us don't have to re-invent
> the wheel... or at least not fall down the pitfalls.

Also, here's a link to the ufs on zvol blog where I originally found
the idea:
http://blogs.sun.com/scottdickson/entry/fun_with_zvols_-_ufs

-Brian

--
---
Brian H. Nelson         Youngstown State University
System Administrator    Media and Academic Computing
bnelson[at]cis.ysu.edu
---
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:
> The UFS on zvols option sounds intriguing to me, but I would guess
> that the following could be problems:
>
> 1) Double buffering: Will ZFS store data in the ARC while UFS uses
> traditional file system buffers?

This is probably an issue. You also have the journal+COW combination
issue. I'm guessing that both would be performance concerns. My
application is relatively low bandwidth, so I haven't dug deep into
this area.

> 2) Boot order dependencies. How does the startup of zfs compare to
> processing of /etc/vfstab? I would guess that this is OK due to the
> legacy mount type supported by zfs. If this is OK, then dfstab
> processing is probably OK.

Zvols by nature are not available under ZFS automatic mounting. You
would need to add the /dev/zvol/dsk/... lines to /etc/vfstab just as
you would for any other /dev/dsk/... or /dev/md/dsk/... devices. If you
are not using the pool for anything else, I would remove the automatic
mount point for it.

-Brian

--
---
Brian H. Nelson         Youngstown State University
System Administrator    Media and Academic Computing
bnelson[at]cis.ysu.edu
---
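For the vfstab side of this, an entry for a zvol-backed UFS filesystem
might look like the following; the pool, volume, and mount point names
are made up for illustration:

```
#device to mount         device to fsck            mount point  FS type  fsck pass  mount at boot  options
/dev/zvol/dsk/pool0/v1   /dev/zvol/rdsk/pool0/v1   /export/v1   ufs      2          yes            logging
```

Brian's suggestion of removing the pool's automatic mount point would
presumably be something like `zfs set mountpoint=none pool0` (exact
property handling may vary by release).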
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote:
> Brian H. Nelson:
>
> I'm sure it would be interesting for those on the list if you could
> outline the gotchas so that the rest of us don't have to re-invent
> the wheel... or at least not fall down the pitfalls.

I believe I ran into one or both of these bugs:

6429996 zvols don't reserve enough space for requisite meta data
6430003 record size needs to affect zvol reservation size on RAID-Z

Basically, what happened was that the zpool filled to 100% and broke
UFS with 'no space left on device' errors. This was quite strange to
sort out, since the UFS zvol had 30GB of free space.

I never got any replies to my request for more info and/or workarounds
for the above bugs. My workaround and recommendation is to leave a
'healthy' amount of un-allocated space in the zpool. I don't know what
a good level for 'healthy' is. Currently I've left about 1% (2GB) on a
200GB raid-z pool.

-Brian

--
---
Brian H. Nelson         Youngstown State University
System Administrator    Media and Academic Computing
bnelson[at]cis.ysu.edu
---
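One way to keep that headroom from being eaten accidentally is to park
it in a reservation on an otherwise unused dataset; a sketch only, with
the dataset name and size invented for illustration:

```shell
# Reserve ~2GB of pool0 so the pool can never fill to 100%.
# "pool0/headroom" is a placeholder dataset created just for this.
zfs create pool0/headroom
zfs set reservation=2g pool0/headroom
zfs set mountpoint=none pool0/headroom   # nothing should ever write here
```

If the pool later gets into trouble, the reservation can be shrunk to
free the emergency space.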
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Alec Muffett wrote:
>>> Mounts under /net are derived from the filesystems actually shared
>>> from the servers; the automount daemon uses the MOUNT protocol to
>>> determine this. If you're looking at a path not already seen, the
>>> information will be fresh, but that's where the good news ends.
>>
>> I know that, yes, but why can't we put such an abstraction elsewhere
>> in the name space? One thing I have always disliked about /net
>> mounts is that they're too magical; it should be possible to
>> replicate them in some form in other mount maps.
>
> In short, you're proposing a solution to the zillions-of-nfs-exports
> issue, which instead of using a "wait for v4 to implement server-side
> export consolidation" thingy, would instead be a "better, smarter
> /net-alike on v2/v3, but give it a sensible name and better namespace
> semantics"?
>
> I could go for that...

The only problem with this approach is that, for current systems, you
would have to make sure that all vendors implemented the new scheme in
their automounter, and that you could retrospectively add the ability
to old systems still in use which have been orphaned by their vendors.
I don't see that this is realistically possible. The only other option,
therefore, is to somehow fudge it at the server end.

Steve
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Sep 9, 2007, at 5:14 AM, [EMAIL PROTECTED] wrote:
>> Mounts under /net are derived from the filesystems actually shared
>> from the servers; the automount daemon uses the MOUNT protocol to
>> determine this. If you're looking at a path not already seen, the
>> information will be fresh, but that's where the good news ends.
>> We don't refresh this information reliably, so if you add a new
>> share in a directory we've already scanned, you won't see it until
>> the mounts time out and are removed. We should refresh this data
>> more readily, no matter what the source of data.
>
> I know that, yes, but why can't we put such an abstraction elsewhere
> in the name space? One thing I have always disliked about /net mounts
> is that they're too magical; it should be possible to replicate them
> in some form in other mount maps.

There is nothing that would get in the way of this type of approach. A
simple migration of the -hosts map (/net) functionality would be to
take the prefix used for a regular mount (e.g. server:/a/b); any
share/export found at the server with the same prefix would then be
available at the client's mount point (e.g. /a/b/c -and- /a/b/d). This
would allow the client to mount server:/export/home and all subordinate
shares/exports under that single mount.

The upcoming NFSv4 client mirror-mounts project will provide exactly
this functionality without the need for automount changes (as has been
mentioned).

Spencer
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
>> Mounts under /net are derived from the filesystems actually shared
>> from the servers; the automount daemon uses the MOUNT protocol to
>> determine this. If you're looking at a path not already seen, the
>> information will be fresh, but that's where the good news ends.
>
> I know that, yes, but why can't we put such an abstraction elsewhere
> in the name space? One thing I have always disliked about /net mounts
> is that they're too magical; it should be possible to replicate them
> in some form in other mount maps.

In short, you're proposing a solution to the zillions-of-nfs-exports
issue, which instead of using a "wait for v4 to implement server-side
export consolidation" thingy, would instead be a "better, smarter
/net-alike on v2/v3, but give it a sensible name and better namespace
semantics"?

I could go for that...

-a
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Sep 7, 2007, at 18:25, Stephen Usher wrote:
> (I still have many-many machines on Solaris 8) I can see it being at
> least a decade until all the machines we have are at a level to
> handle NFSv4.

If you need to have a Solaris 8 environment, but want to minimize the
number of machines you have to manage, the recently announced Project
Etude may be of some interest to you:

http://blogs.sun.com/dp/entry/project_etude_revealed

It creates a Solaris 8 environment in a Solaris 10 container/zone.
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
> Mounts under /net are derived from the filesystems actually shared
> from the servers; the automount daemon uses the MOUNT protocol to
> determine this. If you're looking at a path not already seen, the
> information will be fresh, but that's where the good news ends.
> We don't refresh this information reliably, so if you add a new
> share in a directory we've already scanned, you won't see it until
> the mounts time out and are removed. We should refresh this data
> more readily, no matter what the source of data.

I know that, yes, but why can't we put such an abstraction elsewhere in
the name space? One thing I have always disliked about /net mounts is
that they're too magical; it should be possible to replicate them in
some form in other mount maps.

Casper
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:
> On 9/8/07, Richard Elling <[EMAIL PROTECTED]> wrote:
>> Changing the topic slightly, the strategic question is:
>> why are you providing disk space to students?
>
> For most programming and productivity (e.g. word processing, etc.)
> people will likely be better suited by having network access for
> their personal equipment with local storage.

Local storage would be a nightmare for secure back-ups. Having said
that, for those using Windows PCs and MacOS X we do let them have
control of their machine and store things locally, but at their own
risk. The central service merely provides a (smallish) home directory
which we guarantee to back up. Quotas are needed in this case because
users can't be trusted to play fair, especially if they don't realise
how big the files that they are dragging and dropping are. These
machines are also firewalled to hell and back.

For the rest of the researchers, who have Linux or Solaris machines, we
do not allow them administrative access. All software and home
directories are NFS mounted from the central server, so that any
machine a user logs into will give them the same set of tools and they
can do their work anywhere they need to. Their home directories need to
be policed by the system because, firstly, users can't be fully trusted
to play fair and, secondly, some software will try to cache lots of
data in their home directories without the user knowing.

Now, in our current set-up all these users have a soft quota and a hard
quota. Every night a cron job parses the output of repquota -a and
informs those people who have gone over their soft quota or hard quota.
The difference in size between the soft and hard quotas is enough that,
in general, it doesn't affect the user's work and allows them to
remediate the problem before it becomes critical (and important files
suddenly get emptied or the user can't log in).

For large datasets the research groups have their own servers from
which data etc. is available. As said previously, the central
allocation of space is merely enough for day-to-day
documents/theses/papers etc.

Oh, and our HPC grid is fully integrated into this set-up as well, the
idea being a consistent experience throughout the research network.

Steve

--
---
Computer Systems Administrator,     E-Mail: [EMAIL PROTECTED]
Department of Earth Sciences,       Tel: +44 (0)1865 282110
University of Oxford, Parks Road,   Fax: +44 (0)1865 272072
Oxford, UK.
---
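The nightly repquota-parsing job Steve describes could be sketched as
follows. This is a hypothetical reimplementation, not his actual cron
job, and the column layout in the sample is an assumption loosely
modelled on Solaris repquota(1M) output:

```python
# Sketch of a nightly quota report: parse `repquota -a`-style output
# and list the users over their soft limit, worst offenders first.
def parse_repquota(text):
    """Yield (user, used, soft, hard) tuples from repquota-like output."""
    for line in text.splitlines():
        fields = line.split()
        # Expected data rows: user  [+-][+-]  used  soft  hard  [timeleft]
        if len(fields) < 5 or not fields[2].isdigit():
            continue  # skip headers and anything that doesn't parse
        yield fields[0], int(fields[2]), int(fields[3]), int(fields[4])

def over_quota(text):
    """Return users over their soft quota, sorted by how far over they are."""
    hogs = [(u, used, soft, hard)
            for u, used, soft, hard in parse_repquota(text)
            if soft and used > soft]
    return sorted(hogs, key=lambda t: t[1] - t[2], reverse=True)

sample = """\
Block limits
User        used  soft  hard  timeleft
alice   +- 12000 10000 15000  6days
bob     --  4000 10000 15000
carol   ++ 16000 10000 15000  EXPIRED
"""
print(over_quota(sample))
# → [('carol', 16000, 10000, 15000), ('alice', 12000, 10000, 15000)]
```

In the real job, each tuple would be turned into a nagging email rather
than printed.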
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
[EMAIL PROTECTED] wrote:
>> For NFSv2/v3, there's no easy answers. Some have experimented with
>> executable automounter maps that build a list of filesystems on the
>> fly, but ick. At some point, some of the global namespace ideas we
>> kick around may benefit NFSv2/v3 as well.
>
> The question for me is: why does this work for /net mounts (to a
> point, of course) and why can't we emulate this for other mount
> points?

Mounts under /net are derived from the filesystems actually shared from
the servers; the automount daemon uses the MOUNT protocol to determine
this. If you're looking at a path not already seen, the information
will be fresh, but that's where the good news ends. We don't refresh
this information reliably, so if you add a new share in a directory
we've already scanned, you won't see it until the mounts time out and
are removed. We should refresh this data more readily, no matter what
the source of data.

Rob T
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
> why are you providing disk space to students?
>
> When you solve this problem, the quota problem is moot.
>
> NB. I managed a large University network for several years, and am
> fully aware of the costs involved. I do not believe that the 1960s
> timeshare model will survive in such environments.

So are you saying you don't believe the network is the computer? :-)
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Richard Elling wrote:
> Stephen Usher wrote:
>> I've just subscribed to this list after Alec's posting and reading
>> the comments in the archive, and I have a couple of comments:
>
> Welcome Steve,
> I think you'll find that we rehash this about every quarter, with an
> extra kicker just before school starts in the fall.
>
> Changing the topic slightly, the strategic question is:
> why are you providing disk space to students?

This is actually the research network, so this is for faculty,
post-doctoral fellows and post-graduate students to do their research
jobs. The only undergraduates involved are 4th-year ones doing research
projects within the research teams. The space being allocated is the
basic resource supplied centrally by the Department, and for some it is
the only resource they have, as they don't get any money for their own
computing systems in their grants.

> When you solve this problem, the quota problem is moot.

Not really, not when you have few resources but have to give them out
fairly.

Steve
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/8/07, Richard Elling <[EMAIL PROTECTED]> wrote:
> Changing the topic slightly, the strategic question is:
> why are you providing disk space to students?

For most programming and productivity (e.g. word processing, etc.)
people will likely be better suited by having network access for their
personal equipment with local storage.

For cases when specialized expensive tools ($10k+ per seat) are used,
it is not practical to install them on hundreds or thousands of
personal devices for a semester or two of work. The typical computing
lab that provides such tools is not well equipped to deal with
removable media such as flash drives. Further, such tools will often be
used to do designs that require simulations to run as batch jobs under
grid computing tools such as Grid Engine, Condor, LSF, etc.

Then, of course, there are files that need to be shared, have reliable
backups, etc. Pushing that out to desktop or laptop machines is not
really a good idea.

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote:
> I've just subscribed to this list after Alec's posting and reading
> the comments in the archive, and I have a couple of comments:

Welcome Steve,
I think you'll find that we rehash this about every quarter, with an
extra kicker just before school starts in the fall.

Changing the topic slightly, the strategic question is:
why are you providing disk space to students?

When you solve this problem, the quota problem is moot.

NB. I managed a large University network for several years, and am
fully aware of the costs involved. I do not believe that the 1960s
timeshare model will survive in such environments.
 -- richard
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
> For NFSv2/v3, there's no easy answers. Some have experimented with
> executable automounter maps that build a list of filesystems on the
> fly, but ick. At some point, some of the global namespace ideas we
> kick around may benefit NFSv2/v3 as well.

The question for me is: why does this work for /net mounts (to a point,
of course) and why can't we emulate this for other mount points?

Casper
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Fri, Sep 07, 2007 at 06:19:34PM -0500, Mike Gerdts wrote:
> backups and restores. Snapshots of the zvols could be mounted as
> other UFS file systems that could allow for self-service restores.
> Perhaps this would make it so that you can write data to tape a bit
> less frequently.

This would be a huge win, I think. We do something similar with our
mail system (NFS mounted to a NetApp). We quiesce all the dbs (bdb,
essentially) and execute a snapshot. Takes mere moments. Then we back
up from the snapshot. This allows us to perform a multi-hour backup
without having to take the mail system offline at all.

To be able to apply this to other systems, especially ones that
wouldn't even know any better (UFS, NTFS, etc.), would certainly be a
nice way to go. In fact, I'll have to try this on the XP box on my desk
that mounts iSCSI zvols the next time I'm in the office. ;)

-brian

--
"Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly."
                                               -- Jonathan Patschke
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Agreed on the quota issue. When you have 50K users, having a filesystem per user becomes unwieldy and effectively unusable.
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/7/07, Stephen Usher <[EMAIL PROTECTED]> wrote:
>
> Brian H. Nelson:
>
> I'm sure it would be interesting for those on the list if you could
> outline the gotchas so that the rest of us don't have to re-invent the
> wheel... or at least not fall down the pitfalls.

The UFS on zvols option sounds intriguing to me, but I would guess that the following could be problems:

1) Double buffering: Will ZFS store data in the ARC while UFS uses traditional file system buffers?

2) Boot order dependencies: How does the startup of zfs compare to processing of /etc/vfstab? I would guess that this is OK due to the legacy mount type supported by zfs. If this is OK, then dfstab processing is probably OK.

I say intriguing because it could give you improved data integrity checks and a bit more flexibility in how you do things like backups and restores. Snapshots of the zvols could be mounted as other UFS file systems that could allow for self-service restores. Perhaps this would make it so that you can write data to tape a bit less frequently.

If deduplication comes into zfs, you may be able to get to a point where course project instructions that say "cp ~course/hugefile ~" become not so expensive - you would be charging quota to each user but only storing one copy.

Depending on the balance of CPU power vs. I/O bandwidth, compressed zvols could be a real win, more than paying back the space required to have a few snapshots around.

Mike
--
Mike Gerdts
http://mgerdts.blogspot.com/
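The boot-order arrangement in point 2 can be sketched as follows. The pool, volume, and mount-point names are made up for illustration; the vfstab entry uses the ordinary Solaris column layout, and works because zvol device links exist by the time vfstab is processed:

```shell
# Create a zvol and put UFS on it (hypothetical names):
zfs create -V 100g pool0/d_users
newfs /dev/zvol/rdsk/pool0/d_users

# /etc/vfstab entry -- columns are: device to mount, device to fsck,
# mount point, FS type, fsck pass, mount at boot, options:
#
# /dev/zvol/dsk/pool0/d_users  /dev/zvol/rdsk/pool0/d_users  /export/users  ufs  2  yes  logging
```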
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On Fri, Sep 07, 2007 at 11:25:38PM +0100, Stephen Usher wrote:
> Nicolas Williams:
>
> Unfortunately for us at the coal face it's very rare that we can do the
> ideal thing. Quotas are part of the problem but the main problem is that
> there is currently no way of overcoming the interoperability problems
> using the toolset offered by ZFS.

Understood. I'll let the ZFS team answer this.

> One way around this for NFSv2/3 clients would be if the ZFS NFS server
> could "consolidate" a tree of filesystems so that to the clients it
> looks like one filesystem. From outside the development group this
> seems like the 90% solution which would probably take less engineering
> effort than the full implementation of a user quota system. I'm not sure
> why the OS (outside the ZFS subsystem) would need to know that the
> directory tree it's seeing is composed of separate "filesystems" and is
> not just one big filesystem. (Unless, of course, there are tape archival
> programs which require to save and recreate ZFS sub-filesystems.) It
> would also have the added benefit of making df(1) usable again. ;-)

Unfortunately there's no way to do this and preserve NFS and POSIX semantics (those preserved by NFS). Think of hard links, to name but one very difficult problem. Just the task of creating a uniform, persistent inode number space out of a multitude of distinct filesystems would be daunting indeed. That is, there are good technical reasons why what you propose is non-trivial.

The reason "why the OS ... would need to know that the directory tree it's seeing is composed of separate filesystems" lies in POSIX semantics. And it's as true on the client side as on the server side. The problem you're running into is a limitation of the *client*, not of the server. The quota support you're asking for is to enable a server-side workaround for a client-side problem.
> Believe me when I say that I'd love to use ZFS and would love to be able
> to recommend it to everyone as, other than this particular set of
> problems, it seems such a great system. My posting on Slashdot was the
> culmination of frustration and disappointment after a number of days
> trying every trick I could think of to get it working and failing.

My view (remember, I'm not in the ZFS team) is that ZFS may simply not be applicable to your use case, and that you may find other use cases where it is. If adding quota support is easy, if it's all you need to work around the automounter issue, and if my opinion mattered, then I'd say that we should have ZFS quotas.
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:
> Having worked in academia and multiple Fortune 100's, the problem
> seems to be most prevalent in academia, although possibly a minor
> inconvenience in some engineering departments in industry. In the
> .edu where I used to manage the UNIX environment, I would have a tough
> time weighing the complexities of quotas he mentions vs. the other
> niceties. My guess is that unless I had something that was really
> broken, I would stay with UFS or VxFS waiting for a fix.

UFS on a zvol is a pretty good compromise. You get lots of the nice ZFS stuff (checksums, raidz/z2, snapshots, growable pool, etc.) with no changes in userland. There are a couple of gotchas, but as long as you're aware of them it works pretty well. We've been using it since January.

-Brian
--
---
Brian H. Nelson
Youngstown State University
System Administrator
Media and Academic Computing
bnelson[at]cis.ysu.edu
---
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote:
> It appears as though the author has not yet tried out snapshots. The
> fact that space used by a snapshot for the sysadmin's convenience
> counts against the user's quota is the real killer.

Very soon there will be another way to specify quotas (and reservations) such that they only apply to the space used by the active dataset. This should make the effect of quotas more obvious to end users while allowing them to remain blissfully unaware of any snapshot activity by the sysadmin.

-Chris
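The behaviour Chris describes is what later shipped as the refquota and refreservation properties, which count only the space referenced by the active dataset and ignore snapshots. A sketch, with hypothetical dataset names:

```shell
# Limit the user's live data to 10G; admin snapshots of
# pool/home/alice do not count against this limit:
zfs set refquota=10G pool/home/alice

# Guarantee the same amount of space regardless of snapshot growth:
zfs set refreservation=10G pool/home/alice
```

The older quota and reservation properties remain available for capping a dataset together with all of its snapshots and descendants.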
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/7/07, Mike Gerdts <[EMAIL PROTECTED]> wrote:
> For me, quotas are likely to be a pain point that prevents me from
> making good use of snapshots. Getting changes in application teams'
> understanding and behavior is just too much trouble. Others are:

not to mention there are smaller-scale users that want the data protection, checksumming and scalability that ZFS offers (although the whole zdev/zpool/etc. thing might wind up causing me to have to buy more disks to add more space, if i were to use it)

it would be nice to have a ZFS lite(tm) for those of us that just want easily expandable filesystems (as in, add a new disk/device and not have to think of some larger geometry) with inline checksumming/COW/metadata/ditto blocks/etc/etc goodness. basically like a home edition. i don't care about LUNs, send/receive, quotas, snapshots (for the most part), setting up different zpools to gain specific performance benefits, etc. i just want raid-z/raid-z2 with an easy way to add disks.

i have not actually used ZFS yet because i've been waiting for opensolaris/solaris (or even freebsd possibly) to support eSATA hardware or something related. the hardware support front for SOHO users has also been slow. that's not a shortcoming of ZFS though... but does make me wish i had the basic protection features of ZFS with hardware support like linux.

- my two cents
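For what it's worth, the "add disks without thinking about geometry" case is close to what zpool already does, with one caveat: you grow a pool by adding a whole new raidz group, not by widening an existing one. A sketch with invented device names:

```shell
# Create a pool from one raidz group of three disks:
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0

# Later, grow the pool by adding a second raidz group; existing
# data stays where it is and new writes are striped across both:
zpool add tank raidz c2t0d0 c2t1d0 c2t2d0
```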
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
The complaint is not new, and the problem isn't quotas or lack thereof. The problem is that remote filesystem clients can't cope with frequent changes to a server's share list, which is just what ZFS's "filesystems are cheap" approach promotes.

Basically ZFS was ahead of everyone's implementation of NFSv4 client-side mount mirroring, which would very much help with the dynamic nature of ZFS usage. It does not help that no NFSv3 automounter is sufficiently dynamic to reasonably cope with filesystems coming and going.

Given the automounter pain, this customer would like to have one large filesystem and quotas. And that's how quotas are a secondary problem.
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/7/07, Alec Muffett <[EMAIL PROTECTED]> wrote:
> > The main bugbear is what the ZFS development team laughably call
> > quotas. They aren't quotas, they are merely filesystem size
> > restraints. To get around this the developers use the "let them eat
> > cake" mantra, "creating filesystems is easy", so create a new
> > filesystem for each user, with a "quota" on it. This is the ZFS way.

Having worked in academia and multiple Fortune 100's, the problem seems to be most prevalent in academia, although possibly a minor inconvenience in some engineering departments in industry. In the .edu where I used to manage the UNIX environment, I would have a tough time weighing the complexities of quotas he mentions vs. the other niceties. My guess is that unless I had something that was really broken, I would stay with UFS or VxFS waiting for a fix.

It appears as though the author has not yet tried out snapshots. The fact that space used by a snapshot for the sysadmin's convenience counts against the user's quota is the real killer. This would force me into a disk-to-disk (rsync, because "zfs send | zfs recv" would require snapshots to stay around for incrementals) backup + snapshot scenario to be able to keep snapshots while minimizing their impact on users. That means double the disk space. Doubling the quota is not an option because without soft quotas there is no way to keep people from using all of their space. Frankly, that would be so much trouble I would be better off using tape for restores, just like with UFS or VxFS.

> > Now, with each user having a separate filesystem this breaks. The
> > automounter will mount the parent filesystem as before but all you
> > will see are the stub directories ready for the ZFS daughter
> > filesystems to mount onto, and there's no way of consolidating the
> > ZFS filesystem tree into one NFS share or rules in automount map
> > files to be able to do sub-directory mounting.
While NFS4 holds some promise here, it is not a solution today. It won't be until all OS's that came out before 2008 are gone. That will be a while. Use of macros (e.g. * server:/home/&) can go a long way. If that doesn't do it, an executable map that does the appropriate munging may be in order.

> > The problem here is one of legacy code, which you'll find
> > throughout the academic, and probably commercial world. Basically,
> > there's a lot of user generated code which has hard coded paths so
> > any new system has to replicate what has gone before. (The current
> > system here has automount map entries which map new disks to the
> > names of old disks on machines long gone, e.g. /home/eeyore_data/ )

Put such entries before the * entry and things should be OK.

For me, quotas are likely to be a pain point that prevents me from making good use of snapshots. Getting changes in application teams' understanding and behavior is just too much trouble. Others are:

1. There seems to be no integration with backup tools that are time+space+I/O efficient. If my storage is on a NetApp, I can use NDMP to do incrementals between snapshots. No such thing exists with ZFS.

2. Use of clones is out because I can't do a space-efficient restore.

3. The ARC messes up my knowledge of how much RAM my machine is making good use of. After the first backup, vmstat says that I am just at the brink of not having enough RAM and that paging (file system and pager) will begin soon. This may be fine on a file server, but it really messes with me if it is a J2EE server and I'm trying to figure out how many more app servers I can add.

I have a lot of hopes for ZFS and have used it with success (and failures) in limited scope. I'm sure that with time the improvements will come that make that scope increase dramatically, but for now it is confined to the lab.
:(

Mike
--
Mike Gerdts
http://mgerdts.blogspot.com/
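The map layout Mike describes, with legacy hard-coded entries taking precedence over the wildcard macro, could look like this auto_home fragment (server names and paths are invented; & substitutes the lookup key):

```shell
# /etc/auto_home
# Legacy entries mapping old disk names come before the wildcard,
# so they win the lookup:
eeyore_data   -rw,intr  newserver:/export/eeyore_data

# Wildcard entry: the key (username) replaces each &:
*             -rw,intr  homeserver:/export/home/&
```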
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Alec Muffett wrote:
>> But finally, and this is the critical problem, each user's home
>> directory is now a separate NFS share.
>>
>> At first look that final point doesn't seem to be much of a worry
>> until you look at the implications that brings. To cope with a
>> distributed system with a large number of users, the only manageable
>> way of handling NFS mounts is via an automounter. The only
>> alternative would be to have an fstab/vfstab file holding every
>> filesystem any user might want. In the past this has been no
>> problem at all; for all your user home directories on a server you
>> could just export the parent directory holding all the user home
>> directories, put in a line "users -rw,intr myserver:/disks/users",
>> and it would work happily.
>>
>> Now, with each user having a separate filesystem this breaks. The
>> automounter will mount the parent filesystem as before, but all you
>> will see are the stub directories ready for the ZFS daughter
>> filesystems to mount onto, and there's no way of consolidating the
>> ZFS filesystem tree into one NFS share or rules in automount map
>> files to be able to do sub-directory mounting.

Sun's NFS team is close to putting back a fix to the Nevada NFS client for this, where a single mount of the root of a ZFS tree lets you wander into the daughter filesystems on demand, without automounter configuration. You have to be using NFSv4, since it relies on the server namespace protocol feature. Some other NFSv4 clients already do this. This has always been a part of the plan to cope with more right-sized filesystems; we're just not there yet.

For NFSv2/v3, there's no easy answers. Some have experimented with executable automounter maps that build a list of filesystems on the fly, but ick. At some point, some of the global namespace ideas we kick around may benefit NFSv2/v3 as well.

Rob T
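An executable automounter map of the sort Rob mentions could be as small as the following sketch. The server name and export path are invented; the automounter runs the map with the lookup key (the username) as its argument and uses whatever the script prints as the map entry:

```shell
# Hypothetical executable map helper. A real map would typically
# first verify that the per-user filesystem actually exists on the
# server before emitting an entry.
emit_entry() {
    # $1 is the automounter lookup key (the username).
    printf -- '-rw,intr homeserver:/export/home/%s\n' "$1"
}

# As an installed executable map the whole script would be:
#   #!/bin/sh
#   emit_entry "$1"
```

The "ick" stands: the script runs on every lookup, and building a current filesystem list from the server on the fly is exactly the expensive part.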