Re: [zfs-discuss] Re: ZFS RAID10
RM: I do not understand - why, in some cases with smaller blocks, could writing a block twice actually be faster than writing it once every time? I definitely am missing something here...

In addition to what Neil said, I want to add that when an application's O_DSYNC write covers only part of a file record, you have the choice of issuing a log I/O that contains only the newly written data, or doing a full record I/O (using the up-to-date cached record) along with a small log I/O to match.

So if you do 8K writes to a file stored using 128K records, you really want each 8K write to go to the log and then, every txg, take the state of the record and I/O that. You certainly don't want to I/O 128K for every 8K write. But if you do a 100K write, it's not as clear a win: should I cough up the full 128K I/O now, hoping that the record will not be modified further before the txg clock hits? That's part of what goes into zfs_immediate_write_sz. And even for full record writes, there are some block allocation issues that come into play and complicate things further.

-r
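(If you want to experiment with that tunable on a live system, mdb can read and write it; a minimal sketch, assuming a 64-bit kernel where ssize_t is 8 bytes - the 64K value is just an example, not a recommendation:)

  # echo 'zfs_immediate_write_sz/E' | mdb -k          # print the current value
  # echo 'zfs_immediate_write_sz/Z 0x10000' | mdb -kw # set it to 64K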
Re: Re[2]: [zfs-discuss] Re: ZFS RAID10
Robert Milkowski writes:

  Hello Neil, Thursday, August 10, 2006, 7:02:58 PM, you wrote:

  NP> Robert Milkowski wrote:
  NP>   Hello Matthew, Thursday, August 10, 2006, 6:55:41 PM, you wrote:
  NP>   MA> On Thu, Aug 10, 2006 at 06:50:45PM +0200, Robert Milkowski wrote:
  NP>   MA>   btw: wouldn't it be possible to write the block only once (for
  NP>   MA>   synchronous I/O) and then just point to that block instead of
  NP>   MA>   copying it again?
  NP>   MA> We actually do exactly that for larger (>32k) blocks.
  NP>   Why such a limit (32k)?
  NP> By experimentation that was the cutoff where it was found to be more
  NP> efficient. It was recently reduced from 64K with a more efficient
  NP> dmu_sync() implementation.
  NP> Feel free to experiment with the dynamically changeable tunable:
  NP>   ssize_t zfs_immediate_write_sz = 32768;

  I've just checked using dtrace on one of our production NFS servers that
  90% of the time arg5 in zfs_log_write() is exactly 32768, and the rest is
  always smaller.

Those should not be O_DSYNC though. Are they? The I/O should be deferred to a subsequent COMMIT, but then I'm not sure how it's handled.

-r
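(For reference, a probe along the lines Robert describes might look like this - a sketch assuming the fbt provider, and that arg5 is the write length as stated in the thread:)

  # dtrace -n 'fbt::zfs_log_write:entry { @lengths = quantize(arg5); }'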
[zfs-discuss] Re: Removing a device from a zfs pool
Hi there,

Has any consideration been given to this feature...? I would also argue that this will be not only a testing feature, but will find its way into production. It would probably work on the same principle as swap -a and swap -d ;) Just a little bit more complex.
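(For comparison, the swap(1M) interface alluded to - a device can be added to and removed from the swap pool on a live system:)

  # swap -a /dev/dsk/c0t1d0s1     # add a device
  # swap -d /dev/dsk/c0t1d0s1     # remove it; in-use pages migrate elsewhere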
Re: [zfs-discuss] ZFS performance using slices vs. entire disk?
Darren: With all of the talk about performance problems due to ZFS doing a sync to force the drives to commit data to disk, how much of a benefit is this - especially for NFS?

I would not call those things problems, more like setting proper expectations. My understanding is that enabling the write cache helps by providing I/O concurrency for drives that do not implement some other form of command queuing. In other cases, WCE should not buy much, if anything. I'd be interested in analysing any cases that show otherwise...

-r
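(To check or toggle a drive's write cache by hand, the usual route is format(1M) in expert mode - a sketch; the cache menus are only offered for some drive types:)

  # format -e
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> enable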
[zfs-discuss] user quotas vs filesystem quotas?
Hi,

I'm looking at moving two UFS quota'ed filesystems to ZFS under Solaris 10 release 6/06, and the quota issue is gnarly.

One filesystem is user home directories, and I'm aiming towards the one-ZFS-filesystem-per-user model, attempting to use Casper Dik's auto_home script for on-the-fly ZFS filesystem creation. I'm having problems there, but that is an automounter issue, not ZFS.

The other filesystem is /var/mail on my mail server. I've traditionally run (big) user quotas on mailboxes just to keep some malicious emailer from filling up /var/mail, maybe. The notion of having one ZFS filesystem per mailbox seems unwieldy, just to run quotas per user.

Are there any plans/schemes for per-user quotas within a ZFS filesystem, akin to the UFS quotaon(1M) mechanism? I take it that quotaon won't work with a ZFS filesystem, right? Suggestions please? My notion right now is to drop quotas for /var/mail.

Jeff Earickson
Colby College
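(For the home-directory side, the filesystem-per-user model looks roughly like this - the pool and user names here are made up:)

  # zfs create pool/home/jae
  # zfs set quota=500m pool/home/jae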
Re: [zfs-discuss] Re: Lots of seeks?
On Aug 9, 2006, at 8:18 AM, Roch wrote:

>> So while I'm feeling optimistic :-) we really ought to be able to do
>> this in two I/O operations. If we have, say, 500K of data to write
>> (including all of the metadata), we should be able to allocate a
>> contiguous 500K block on disk and write that with a single operation.
>> Then we update the überblock.
>
> Hi Anton, Optimistic a little, yes. The data blocks should have
> aggregated quite well into near-recordsize I/Os, are you sure they did
> not? No O_DSYNC in here, right?

When I repeated this with just 512K written in 1K chunks via dd, I saw six 16K writes. Those were the largest. The others were around 1K-4K. No O_DSYNC:

  dd if=/dev/zero of=xyz bs=1k count=512

So some writes are being aggregated, but we're missing a lot.

> Once the data blocks are on disk we have the information necessary to
> update the indirect blocks iteratively up to the überblock. Those are
> the smaller I/Os; I guess that because of ditto blocks they go to
> physically separate locations, by design.

We shouldn't have to wait for the data blocks to reach disk, though. We know where they're going in advance. One of the key advantages of the überblock scheme is that we can, in a sense, speculatively write to disk. We don't need the tight ordering that UFS requires to avoid security exposures and allow the file system to be repaired. We can lay out all of the data and metadata, write them all to disk, choose new locations if the writes fail, etc., and not worry about any ordering or state issues, because the on-disk image doesn't change until we commit it.

You're right, the ditto block mechanism will mean that some writes will be spread around (at least when using a non-redundant pool like mine), but then we should have at most three writes followed by the überblock update, assuming three degrees of replication.

> All of these though are normally done asynchronously to applications,
> unless the disks are flooded.

Which is a good thing (I think they're asynchronous anyway, unless the cache is full).

> But I follow you in that it may be remotely possible to reduce the
> number of iterations in the process by assuming that the I/O will all
> succeed, then if some fails, fix up the consequences and, when all done,
> update the überblock. I would not hold my breath quite yet for that.

Hmmm. I guess my point is that we shouldn't need to iterate at all. There are no dependencies between these writes; only between the complete set of writes and the überblock update.

-- Anton
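(One way to reproduce the observation - a sketch using the DTrace io provider to get the distribution of physical write sizes while the dd runs; since B_WRITE is a pseudo-flag, writes are matched by the absence of B_READ:)

  # dtrace -n 'io:::start /!(args[0]->b_flags & B_READ)/ { @["write bytes"] = quantize(args[0]->b_bcount); }'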
[zfs-discuss] Re: ZFS LVM and EVMS
Thanks for replying (I thought nobody would bother.) So, if I understand correctly, I won't give up ANYTHING available in EVMS, LVM, or Linux RAID by going to ZFS and RAID-Z. Right?
[zfs-discuss] Proposal: zfs create -o
Following up on earlier mail, here's a proposal for create-time properties. As usual, any feedback or suggestions are welcome.

For those curious about the implementation, this finds its way all the way down to the create callback, so that we can pick out true create-time properties (e.g. volblocksize, future crypto properties). The remaining properties are handled by the generic creation code.

- Eric

A. INTRODUCTION

A complicated ZFS installation will typically create a number of datasets, each with their own property settings. Currently, this requires several steps: one to create the dataset, and one for each property that must be configured:

  # zfs create pool/fs
  # zfs set compression=on pool/fs
  # zfs set mountpoint=/export pool/fs
  ...

This has several drawbacks, the first of which is simply unnecessary steps. For these complicated setups, it would be simpler to create the dataset and all its properties at the same time. This has been requested by the ZFS community, and resulted in the following RFE:

  6367103 create-time properties

More importantly, it forces the user to instantiate (and often mount) the dataset before assigning properties. In the case of the 'mountpoint' property, it means that we create an inherited mountpoint, only to change it later when the property is modified. This also makes setting the 'canmount' property (PSARC 2006/XXX) more intuitive.

This RFE is also required for crypto support, as the encryption algorithm must be known when the filesystem is created. It also has the benefit of cleaning up the implementation of other creation-time properties (volsize and volblocksize) that were previously special cases.

B. DESCRIPTION

This case adds a new option, 'zfs create -o', which allows any ZFS property to be set at creation time. Multiple '-o' options can appear in the same subcommand. Specifying the same property multiple times in the same command results in an error. For example:

  # zfs create -o compression=on -o mountpoint=/export pool/fs

The option '-o' was chosen over '-p' (for 'property') to reserve the latter for a future RFE:

  6290249 zfs {create,clone,rename} -p to create parents

The functionality of 'zfs create -b' has been superseded by this new option, though it will be retained for backwards compatibility. There is no plan to formally obsolete or remove this option. For example:

  # zfs create -b 16k -V 10M pool/vol

is equivalent to

  # zfs create -o volblocksize=16k -V 10M pool/vol

If '-o volblocksize' is specified in addition to '-b', the resulting behavior is undefined.

C. MANPAGE CHANGES

TBD
Re: [zfs-discuss] Re: ZFS LVM and EVMS
No, there are some features we haven't implemented that may or may not be available in other RAID solutions. In particular:

- A ZFS storage pool cannot be 'shrunk', i.e. you cannot remove an entire toplevel device (mirror, RAID group, etc). Devices can be removed by attaching and detaching to existing mirrors, but you cannot shrink the overall size of the pool.

- ZFS RAID-Z stripes cannot be expanded. ZFS storage pools are dynamically striped across all device groups, so you can add a new RAID-Z group ((5+1) -> 2x(5+1), for example), but you cannot expand an existing stripe ((5+1) -> (6+1)).

There are likely other features that are different and/or missing from other solutions, so it's a little extreme to say you won't give up ANYTHING. But in terms of large-scale features, there's not much besides the two above, and remember that you have a lot to gain ;-)

- Eric

On Fri, Aug 11, 2006 at 09:28:58AM -0700, Humberto Ramirez wrote:
> Thanks for replying (I thought nobody would bother.) So, if I understand
> correctly, I won't give up ANYTHING available in EVMS, LVM, or Linux
> RAID by going to ZFS and RAID-Z. Right?

-- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
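(For the curious, growing a pool by adding a second RAID-Z group looks roughly like this - device names are made up:)

  # zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0   # 5+1
  # zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0      # now 2x(5+1)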
[zfs-discuss] Difficult to recursive-move ZFS filesystems to another server
Just wanted to point this out -- I have a large web tree that used to have UFS user quotas on it. I converted to ZFS using the model that each user has their own ZFS filesystem quota instead. I worked around some NFS/automounter issues, and it now seems to be working fine.

Except now I have to move it to another server. The problem is that there doesn't appear to be any recursive dump/restore command that lets me do this easily. 'zfs send' and 'zfs receive' only appear to work within filesystem boundaries. What I want to do is move all of zfspool/www from server A to server B. Each user filesystem underneath zfspool/www:

  zfspool/www/user-joe
  zfspool/www/user-john
  zfspool/www/user-mary
  ...

has a unique quota assigned to it. There doesn't appear to be a way to move zfspool/www and its descendants en masse to a new machine with those quotas intact. I have to script the recreation of all of the descendant filesystems by hand. I can move the *data* with tar or rsync easily enough, but it seems silly that I have to recreate all the descendant filesystems and their characteristics by hand.

I know the comprehensive dump subject has been brought up before... I'd like to reiterate a suggestion that it'd be nice if the various commands (zfs send/receive, zfs snapshot) could optionally include a filesystem's descendants. If zfs send could do this and included the filesystem quotas, it might solve this issue. Or maybe I'm missing something?
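(The hand-scripting can at least be generated rather than typed - a rough sketch, assuming the -H and -r flags to 'zfs list' and 'zfs get' behave this way on the poster's build:)

  zfs list -H -r -o name zfspool/www | while read fs; do
      q=`zfs get -H -o value quota $fs`
      echo "zfs create $fs && zfs set quota=$q $fs"
  done > recreate-www.sh    # review, copy to server B, and run there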
Re: [zfs-discuss] Re: Lots of seeks?
On Fri, Aug 11, 2006 at 11:04:06AM -0500, Anton Rang wrote:

>> Once the data blocks are on disk we have the information necessary to
>> update the indirect blocks iteratively up to the überblock. Those are
>> the smaller I/Os; I guess that because of ditto blocks they go to
>> physically separate locations, by design.
>
> We shouldn't have to wait for the data blocks to reach disk, though. We
> know where they're going in advance. One of the key advantages of the
> überblock scheme is that we can, in a sense, speculatively write to
> disk. We don't need the tight ordering that UFS requires to avoid
> security exposures and allow the file system to be repaired. We can lay
> out all of the data and metadata, write them all to disk, choose new
> locations if the writes fail, etc., and not worry about any ordering or
> state issues, because the on-disk image doesn't change until we commit
> it.
>
> You're right, the ditto block mechanism will mean that some writes will
> be spread around (at least when using a non-redundant pool like mine),
> but then we should have at most three writes followed by the überblock
> update, assuming three degrees of replication.

The problem is that you don't know the actual *contents* of the parent block until *all* of its children have been written to their final locations. (This is because the block pointer's value depends on the final location.) The ditto blocks don't really affect this, since they can all be written out in parallel. So you end up with the current N phases: data, its parents, its parents' parents, ..., überblock.

>> But I follow you in that it may be remotely possible to reduce the
>> number of iterations in the process by assuming that the I/O will all
>> succeed, then if some fails, fix up the consequences and, when all
>> done, update the überblock. I would not hold my breath quite yet for
>> that.
>
> Hmmm. I guess my point is that we shouldn't need to iterate at all.
> There are no dependencies between these writes; only between the
> complete set of writes and the überblock update.

Again, there is; if a block write fails, you have to re-write it and all of its parents. So the best you could do would be:

1. Assign locations for all blocks, and update the space bitmaps as
   necessary.
2. Update all of the non-überblock blocks with their actual contents
   (which requires calculating checksums on all of the child blocks).
3. Write everything out in parallel.
3a. If any write fails, re-do 1+2 for that block, and 2 for all of its
    parents, then start over at 3 with all of the changed blocks.
4. Once everything is on stable storage, update the überblock.

That's a lot more complicated than the current model, but certainly seems possible.

Cheers,
- jonathan

(this is only my understanding of how ZFS works; I could be mistaken)

-- Jonathan Adams, Solaris Kernel Development
Re: [zfs-discuss] Re: Lots of seeks?
On Aug 11, 2006, at 12:38 PM, Jonathan Adams wrote:

> The problem is that you don't know the actual *contents* of the parent
> block until *all* of its children have been written to their final
> locations. (This is because the block pointer's value depends on the
> final location.)

But I know where the children are going before I actually write them. There is a dependency of the parent's contents on the *address* of its children, but not on the actual write. We can compute everything that we are going to write before we start to write. (Yes, in the event of a write failure we have to recover; but that's very rare, and can easily be handled - we just start over, since no visible state has been changed.)

> The ditto blocks don't really affect this, since they can all be
> written out in parallel.

The reason they affect my desire of turning the update into a two-phase commit (make all the changes, then update the überblock) is that the ditto blocks are deliberately spread across the disk, so we can't collect them into a single write (for a non-redundant pool, or at least a one-disk pool - presumably they wind up on different disks for a two-disk pool, in which case we can still do a single write per disk).

> Again, there is; if a block write fails, you have to re-write it and
> all of its parents. So the best you could do would be:
>
> 1. Assign locations for all blocks, and update the space bitmaps as
>    necessary.
> 2. Update all of the non-überblock blocks with their actual contents
>    (which requires calculating checksums on all of the child blocks).
> 3. Write everything out in parallel.
> 3a. If any write fails, re-do 1+2 for that block, and 2 for all of its
>     parents, then start over at 3 with all of the changed blocks.
> 4. Once everything is on stable storage, update the überblock.
>
> That's a lot more complicated than the current model, but certainly
> seems possible.

(3a could actually be simplified to simply mark the bad blocks as unallocatable and go to 1, but it's more efficient as you describe.)

The eventual advantage, though, is that we get the performance of a single write (plus, always, the überblock update). In a heavily loaded system, the current approach (lots of small writes) won't scale so well. (Actually we'd probably want to limit the size of each write to some small value, like 16 MB, simply to allow the first write to start earlier under fairly heavy loads.)

As I pointed out earlier, this would require getting scatter/gather support through the storage subsystem, but the potential win should be quite large. Something to think about for the future. :-)

Incidentally, this is part of how QFS gets its performance for streaming I/O. We use an allocate-forward policy, allow very large allocation blocks, and separate the metadata from data. This allows us to write (or read) data in fairly large I/O requests, without unnecessary disk head motion.

Anton
Re: [zfs-discuss] SPEC SFS97 benchmark of ZFS,UFS,VxFS
Leon Koll wrote:

> On 8/11/06, eric kustarz [EMAIL PROTECTED] wrote:
>> So having 4 pools isn't a recommended config - i would destroy those 4
>> pools and just create 1 RAID-0 pool:
>>
>>   # zpool create sfsrocks c4t00173801014Bd0 c4t00173801014Cd0 \
>>       c4t001738010140001Cd0 c4t0017380101400012d0
>>
>> Each of those devices is a 64GB lun, right?
>
> I did it - created one pool, 4*64GB size, and am running the benchmark
> now. I'll update you on results, but one pool is definitely not what I
> need. My target is SunCluster with HA-ZFS, where I need 2 or 4 pools
> per node.
>
>> Why do you need 2 or 4 pools per node? If you're doing HA-ZFS (which is
>> SunCluster 3.2 - only available in beta right now), then you should
>> divide your storage up to the number of
>
> I know, I run the 3.2 now.
>
>> *active* pools. So say you have 2 nodes and 4 luns (each lun being
>> 64GB), and only need one active node - then you can create one pool of
>
> To have one active node doesn't look smart to me. I want to distribute
> load between 2 nodes, not to have 1 active and 1 standby. The LUN size
> in this test is 64GB, but in the real configuration it will be 6TB.
>
>> all 4 luns, and attach the 4 luns to both nodes. The way HA-ZFS
>> basically works is that when the active node fails, it does a 'zpool
>> export', and the takeover node does a 'zpool import'. So both nodes
>> are using the same storage, but they cannot use the same storage at
>> the same time, see:
>> http://www.opensolaris.org/jive/thread.jspa?messageID=49617
>
> Yes, it works this way.
>
>> If however, you have 2 nodes, 4 luns, and wish both nodes to be active,
>> then you can divvy up the storage into two pools. So each node has one
>> active pool of 2 luns. All 4 luns are doubly attached to both nodes,
>> and when one node fails, the takeover node then has 2 active pools.
>
> I agree with you - I can have 2 active pools, not 4, in the case of a
> dual-node cluster.
>
>> So how many nodes do you have? And how many do you wish to be active
>> at a time?
>
> Currently - 2 nodes, both active. If I define 4 pools, I can easily
> expand the cluster to a 4-node configuration; that may be a good reason
> to have 4 pools.

Ok, that makes sense.

>> And what was your configuration for VxFS and SVM/UFS?
>
> 4 SVM concat volumes (I need a concatenation of 1TB LUNs if I am on
> SC3.1, which doesn't support EFI labels) with UFS or VxFS on top.

So you have 2 nodes, 2 file systems (of either UFS or VxFS) on each node? I'm just trying to make sure it's a fair comparison between ZFS, UFS, and VxFS.

> And now come the questions - my short test showed that the 1-pool
> config doesn't behave better than the 4-pool one: with the first, the
> box hung; with the second, it didn't. Why do you think the 1-pool
> config is better?

I suggested the 1-pool config before I knew you were doing HA-ZFS :) Purposely dividing up your storage (by creating separate pools) in a non-clustered environment usually doesn't make sense (root being one notable exception).

eric
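(The handoff eric describes is the ordinary export/import pair, run on the two nodes - a sketch with a made-up pool name:)

  nodeA# zpool export sfspool      # release the pool on the failing/old node
  nodeB# zpool import sfspool      # take it over on the surviving node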
[zfs-discuss] Re: Proposal expand raidz
Just a data point -- our NetApp filer actually creates additional raid groups that are added to the greater pool when you add disks, much as zfs does now. They aren't simply used to expand the one large raid group of the volume. I've been meaning to rebuild the whole thing to get use of the multiple parity disks back. Ours is a few years old and isn't running the latest software rev, so maybe they've overcome this now, but thought I'd mention it.
Re: [zfs-discuss] Difficult to recursive-move ZFS filesystems to another server
On Fri, Aug 11, 2006 at 10:02:41AM -0700, Brad Plecs wrote:
> There doesn't appear to be a way to move zfspool/www and its descendants
> en masse to a new machine with those quotas intact. I have to script the
> recreation of all of the descendant filesystems by hand.

Yep, you need:

  6421959 want zfs send to preserve properties ('zfs send -p')
  6421958 want recursive zfs send ('zfs send -r')

--matt
[zfs-discuss] Looking for motherboard/chipset experience, again
What about the Asus M2N-SLI Deluxe motherboard? It has 7 SATA ports, supports ECC memory and socket AM2, and generally looks very attractive for my home storage server. Except that it, and the nvidia nForce 570-SLI it's built on, don't seem to be on the HCL. I'm hoping that's just because nobody has reported it yet.

Anybody run Solaris on it? Or at least on any nForce 570-SLI board? Would you risk buying it to find out yourself?

I've heard rumors of ZFS in one of the more obscure Linuxes, perhaps Ubuntu; I suppose that could be a backup plan if I try and Solaris doesn't work. I have the general feeling that Linux runs on pretty much anything I can buy today, since I've been using it for over a decade and am somewhat plugged into the community. I don't yet have the impression that Solaris runs on most anything, possibly after tracking down a few drivers. Does it, really? Should I be not worrying about this so much?

-- David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
   RKBA: http://www.dd-b.net/carry/
   Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
   Dragaera/Steven Brust: http://dragaera.info/
Re: [zfs-discuss] user quotas vs filesystem quotas?
On August 11, 2006 10:31:50 AM -0400 Jeff A. Earickson [EMAIL PROTECTED] wrote:
> Suggestions please?

Ideally you'd be able to move to mailboxes in $HOME instead of /var/mail.

-frank
Re: [zfs-discuss] Question on Zones and memory usage (65120349)
Follow-up: it looks to me like prstat displays the portion of the system's physical memory in use by the processes in that zone. How much memory does that system have? Something seems amiss, as a V490 can hold up to 32GB, and prstat is showing 163GB of physical memory just for fmtest.

Irma Garcia wrote:

  Hi All,

  Sun Fire V440, Solaris 10, Solaris Resource Manager. Customer wrote the
  following:

  I have a v490 with 4 zones:

    tsunami:/# zoneadm list -iv
      ID NAME     STATUS   PATH
       0 global   running  /
       4 fmstage  running  /fmstage
      12 fmprod   running  /fmprod
      15 fmtest   running  /fmtest

  fmtest has a pool assigned to it with access to 2 cpus. When I run
  prstat -Z in the fmtest zone I see:

    ZONEID NPROC  SIZE   RSS MEMORY     TIME  CPU ZONE
        15   192  169G  163G   100%  0:29:55  96% fmtest

  On the global zone (tsunami) I see with prstat -Z:

    ZONEID NPROC  SIZE   RSS MEMORY     TIME  CPU ZONE
        15   188  169G  163G   100%  0:46:00  48% fmtest
         0    54  708M  175M   0.1%  2:23:40 0.1% global
        12    27  112M   51M   0.0%  0:02:48 0.0% fmprod
         4    27  281M   66M   0.0%  0:14:13 0.0% fmstage

  Questions: Does the 100% memory usage mean that the fmtest zone is
  using all the memory? How come when I run the top command I see
  different results for memory usage? What is the best method to tie a
  certain percentage of memory to certain zones - rcapd?

  Thanks in advance,
  Irma

--
Jeff VICTOR, Sun Microsystems          jeff.victor @ sun.com
OS Ambassador / Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
Re: [zfs-discuss] Unreliable ZFS backups or....
On August 11, 2006 5:25:11 PM -0700 Peter Looyenga [EMAIL PROTECTED] wrote:
> I looked into backing up ZFS, and quite honestly I can't say I am
> convinced about its usefulness here when compared to the traditional
> ufsdump/restore. While snapshots are nice, they can never substitute
> for offline backups. It doesn't seem to me that they are meant to.
> However, while you can make one using 'zfs send', it somewhat worries
> me that the only way to perform a restore is by restoring the entire
> filesystem (/snapshot). I somewhat shudder at the thought of having to
> restore /export/home this way to retrieve but a single file/directory.

You can mitigate this by creating more granular filesystems, e.g. a filesystem per user homedir. This has other advantages, like per-user quotas.

-frank
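(A sketch of what the granular layout buys you at restore time - pool, user, and path names are made up; only the affected user's filesystem has to be sent back:)

  # zfs snapshot pool/home/alice@tue
  # zfs send pool/home/alice@tue > /backup/alice-tue.zfs          # offline copy
  # zfs receive pool/home/alice_restored < /backup/alice-tue.zfs  # restore just alice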
Re: [zfs-discuss] Question on Zones and memory usage (65120349)
On 8/11/06, Irma Garcia [EMAIL PROTECTED] wrote:
>   ZONEID NPROC  SIZE   RSS MEMORY     TIME  CPU ZONE
>       15   188  169G  163G   100%  0:46:00  48% fmtest
>        0    54  708M  175M   0.1%  2:23:40 0.1% global
>       12    27  112M   51M   0.0%  0:02:48 0.0% fmprod
>        4    27  281M   66M   0.0%  0:14:13 0.0% fmstage
>
> Questions: Does the 100% memory usage mean that the fmtest zone is
> using all the memory? How come when I run the top command I see
> different results for memory usage?

The %mem column is the sum of the %mem that each process uses. Unfortunately, that value seems to include the pages that are shared between many processes (e.g. database files, libc, etc.) without dividing by the number of processes that have that memory mapped. In other words, if you have 50 database processes that have used mmap() on the same 1 GB database, prstat will think that 50 GB of RAM is used when only 1 GB really is. I have seen prstat report that oracle workloads on a 15k domain are using well over a terabyte of memory. This is kinda hard to do on a domain with ~300 GB of RAM and 50 GB of swap.

> What is the best method to tie a certain percentage of memory to
> certain zones - rcapd?

I *think* that rcapd suffers from the same problem that prstat does, and may cause undesirable behavior. Because of the way that it works, I fully expect that if rcapd begins to force pages out, the paging activity for the piggy workload will cause severe performance degradation for everything on the machine. My personal opinion (not backed by extensive testing) is that rcapd is more likely to do harm than good. Some alternatives (sketched below):

- If the workload that you are trying to control is java-based, consider
  using the various java flags to limit heap size. This will not protect
  you against memory leaks in the JVM, but it will protect against a
  misbehaving app. The same is likely true for the stack size.
- If the workload you are trying to control is some other single process,
  consider using ulimit to limit the stack and heap size.
- Set the size= option for all tmpfs file systems.
- Bug the folks that are working on memory sets and swap sets to get this
  code out sooner rather than later.
- If running on sun4v, consider LDOMs when they are available (November?).

Mike

-- Mike Gerdts http://mgerdts.blogspot.com/
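(A sketch of the first three alternatives above - the class name and all limit values are made up, not recommendations:)

  $ java -Xmx2g -Xss512k BigApp    # cap JVM heap and per-thread stack
  $ ulimit -d 2097152              # cap the data (heap) segment, in KB
  $ ulimit -s 16384                # cap the stack segment, in KB

(and the tmpfs cap goes in /etc/vfstab:)

  swap  -  /tmp  tmpfs  -  yes  size=512m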