Jorgen Lundman wrote:

In the style of a discussion over a beverage, I recently pondered a design for implementing user quotas on ZFS after having far too little sleep.

It is probably nothing new, but I would be curious what you experts think of the feasibility of such a system, and whether it would realistically work.

I'm not suggesting that someone should do the work, or even that I will, but rather in the interest of chatting about it.

As it turns out, I'm working on ZFS user quotas presently, and expect to integrate in about a month. My implementation is in-kernel, integrated with the rest of ZFS, and does not have the drawbacks you mention below.

Feel free to ridicule me as required! :)

Thoughts:

Here at work we would like to have user quotas based on uid (and presumably gid) to be able to fully replace the NetApps we run. Current ZFS is not good enough for our situation. We simply can not mount 500,000 file-systems on all the NFS clients. Nor do all the servers we run support mirror-mounts. Nor does the automounter see newly created directories without a full remount.

Current UFS-style user quotas are very exact -- to the byte, even. We do not need this precision. If a user has 50MB of quota and is able to reach 51MB of usage, that is acceptable to us, especially since they have to go back under 50MB to be able to write new data anyway.

Good, that's the behavior that user quotas will have -- delayed enforcement.

Instead of having complicated code in the kernel layer, slowing down the file-system with locking and semaphores (and perhaps avoiding learning in-depth ZFS code?), I was wondering if a more simplistic setup could be designed that would still be acceptable. I will use the word 'acceptable' a lot. Sorry.

My thoughts are that the ZFS file-system would simply write a 'transaction log' to a pipe. By transaction log I mean uid, gid and 'byte count changed'. And by pipe I don't necessarily mean pipe(2); it could be a fifo, pipe or socket, but currently I'm thinking '/dev/quota' style.
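
Something like this is what I have in mind for a log record -- the names and layout are made up for illustration, nothing like this exists in ZFS today:

    /*
     * Hypothetical record format for the '/dev/quota' transaction log.
     * One record would be emitted per change to a file's space usage.
     */
    #include <sys/types.h>
    #include <stdint.h>

    typedef struct quota_rec {
            uid_t   qr_uid;    /* owner of the file that changed */
            gid_t   qr_gid;    /* group of the file that changed */
            int64_t qr_delta;  /* bytes allocated (+) or freed (-) */
    } quota_rec_t;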

User-land will then have a daemon; whether it is one daemon per file-system or just one daemon overall does not matter. This process will open '/dev/quota' and constantly drain the transaction log entries, taking the uid,gid entries and updating the byte-counts in its database. How we store this database is up to us, but since it is in user-land it has more flexibility, and it is not as critical for it to be fast as it would be in the kernel.
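
The consumer loop could be as simple as this sketch, assuming the quota_rec_t layout above; update_usage() stands in for whatever database the daemon keeps, and is purely hypothetical:

    #include <sys/types.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical hook into the daemon's usage database. */
    extern void update_usage(uid_t uid, gid_t gid, int64_t delta);

    int
    main(void)
    {
            quota_rec_t rec;
            int fd = open("/dev/quota", O_RDONLY);

            if (fd == -1) {
                    perror("open /dev/quota");
                    return (1);
            }
            /* Drain records as fast as the kernel produces them. */
            while (read(fd, &rec, sizeof (rec)) == sizeof (rec))
                    update_usage(rec.qr_uid, rec.qr_gid, rec.qr_delta);

            return (0);
    }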

The daemon process can also grow its number of threads as demand increases.

Once a user's usage reaches the limit (note that /the/ write() call which goes over the limit will succeed, and probably a couple more after it -- this is acceptable), the daemon will "blacklist" the uid in the kernel. Future calls to creat/open(O_CREAT)/write/(insert list of calls) will be denied. Naturally, calls to unlink/read etc. should still succeed. If the uid goes back under the limit, the blacklisting is removed.
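
The kernel-side hook could be tiny. A sketch, with everything hypothetical -- quota_uid_blacklisted() would consult a table the daemon populates, perhaps via an ioctl on '/dev/quota':

    #include <sys/types.h>
    #include <sys/errno.h>

    /* Hypothetical lookup in the daemon-maintained blacklist. */
    extern boolean_t quota_uid_blacklisted(uid_t uid);

    static int
    quota_check_write(uid_t uid)
    {
            if (quota_uid_blacklisted(uid))
                    return (EDQUOT);  /* over quota: deny new allocations */
            return (0);               /* under quota: proceed */
    }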

If the user-land process crashes or dies, for whatever reason, the buffer of the pipe will grow in the kernel. If the daemon is restarted sufficiently quickly, all is well; it merely needs to catch up. If the pipe does ever fill up and items have to be discarded, a full scan of the file-system will be required. Since even with UFS quotas we occasionally need to run 'quotacheck', this too seems acceptable (if undesirable).

My implementation does not have this drawback. Note that you would need to use the recovery mechanism in the case of a system crash / power loss as well. Adding potentially hours to the crash recovery time is not acceptable.

If no daemon process is running at all, you have no quotas at all. But the same can be said of quite a few daemons; administrators need to adjust how they operate accordingly.

I can see a complication with doing a rescan: how could this be done efficiently? I don't know if there is a neat way to make this happen internally to ZFS, but from a user-land-only point of view, perhaps a snapshot could be created (synchronised with the '/dev/quota' pipe reading?) and a scan started on the snapshot while the kernel log is still being processed. Once the scan is complete, merge the two sets.
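
In outline, something like this -- every function here is a made-up placeholder (there is no such libzfs API); the point is only the ordering, i.e. that the snapshot and the log cut-off coincide, so that the snapshot tally plus the deltas logged since add up to a consistent total:

    typedef struct usage_db usage_db_t;  /* opaque per-uid byte tally */

    extern usage_db_t *db_create(void);
    extern void snapshot_create(const char *fs, const char *snap);
    extern void snapshot_tally(const char *fs, const char *snap,
        usage_db_t *db);
    extern void drain_live_log(usage_db_t *side);
    extern void db_merge(usage_db_t *dst, usage_db_t *src);

    static usage_db_t *
    rescan(const char *fs)
    {
            usage_db_t *full = db_create();
            usage_db_t *side = db_create();

            /* Cut the log and take the snapshot at the same instant. */
            snapshot_create(fs, "quota-rescan");
            snapshot_tally(fs, "quota-rescan", full); /* frozen image */
            drain_live_log(side);                     /* deltas since */
            db_merge(full, side);                     /* consistent sum */
            return (full);
    }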

Advantages are that only small hooks are required in ZFS: the byte-count updates, and the checks against the blacklist.

Disadvantages are the loss of precision, and possibly slower rescans? Sanity?

Not to mention that this information needs to get stored somewhere, and dealt with when you zfs send the fs to another system.

But I do not really know the internals of ZFS, so I might be completely wrong, and everyone is laughing already.

Discuss?

--matt
