Re: [zfs-discuss] SXCE build 90 vs S10U6?
On Thu, Jun 12, 2008 at 10:12 PM, Tim [EMAIL PROTECTED] wrote: I guess I find the difference between b90 and opensolaris trivial given we're supposed to be getting constant updates following the sxce builds.

But the supported version of OpenSolaris will not be on the same schedule as sxce. OpenSolaris 2008.05 is based on snv_86. The supported version will only have bug fixes until 2008.11. That is, it follows much the same type of schedule that sxde did. Additionally, OpenSolaris has completely redone the installation and packaging bits.

When you are running a bunch of servers with an aggregate storage capacity of over 100 TB, you are probably doing something that is rather important to the company that shelled out well over $100,000 for the hardware. In most (not all) environments that I have worked in, this says that you don't want to be relying too heavily on 1.0 software[1] or external web services[2] whose maintainers have not shown a track record[3] of maintaining them in a way that meets typical enterprise-level requirements.

1. The non-live CD installer has not even made it into the unstable Mercurial repository. The pkg and beadm commands and associated libraries have less than a month of existence in anything that any vendor is claiming to support.

2. AFAIK, pkg.sun.com does not serve packages yet. pkg.opensolaris.org serves up packages from snv_90 by default even though snv_86 is the variant that is supposedly supported.

3. There were numerous complaints of repeated timeouts when the snv_90 packages were released, resulting in having to restart the upgrade from the start.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs promote and ENOSPC (+panic with dtrace)
On Wed, Jun 11, 2008 at 12:58 AM, Robin Guo [EMAIL PROTECTED] wrote: Hi, Mike, It's like 6452872, it need enough space for 'zfs promote'

Not really - in 6452872 a file system is at its quota before the promote is issued. I expect that a promote may cause several KB of metadata changes that require some space and as such would require more space than the quota. In my case, quotas are not in use. I had over 1.8 GB free before I issued the zfs promote and fully expected to have roughly the same amount of space free after the promote. It seems as though a wrong comparison about the amount of required free space is being made.

I have been able to reproduce - but then when I started poking at it with dtrace (no destructive actions) I got a panic.

# mdb *.0
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc scsi_vhci zfs random ip hook neti sctp arp usba fctl md lofs sppp crypto ptm ipc fcp fcip cpc logindmux sv nsctl sdbc ufs rdc ii nsmb ]
::status
debugging crash dump vmcore.0 (32-bit) from indy2
operating system: 5.11 snv_86 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=e0620d38 addr=200 occurred in module unknown due to a NULL pointer dereference
dump content: kernel pages only
::stack
0x200(eb1ea000)
zfs_ioc_promote+0x3b()
zfsdev_ioctl+0xd8(2d8, 5a23, 8045e40, 13, e8b3a020, e0620f78)
cdev_ioctl+0x2e(2d8, 5a23, 8045e40, 13, e8b3a020, e0620f78)
spec_ioctl+0x65(ddfb6c00, 5a23, 8045e40, 13, e8b3a020, e0620f78)
fop_ioctl+0x49(ddfb6c00, 5a23, 8045e40, 13, e8b3a020, e0620f78)
ioctl+0x155()
sys_call+0x10c()

The dtrace command that I was running was:

dtrace -n 'fbt:zfs:dsl_dataset_promote:return { trace(arg0); stack() }'

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs promote and ENOSPC
I needed to free up some space to be able to create and populate a new upgrade. I was caught off guard by the amount of free space required by zfs promote.

bash-3.2# uname -a
SunOS indy2 5.11 snv_86 i86pc i386 i86pc
bash-3.2# zfs list
NAME                              USED  AVAIL  REFER  MOUNTPOINT
rpool                            5.49G  1.83G    55K  /rpool
[EMAIL PROTECTED]                46.5K      -  49.5K  -
rpool/ROOT                       5.39G  1.83G    18K  none
rpool/ROOT/2008.05               2.68G  1.83G  3.38G  legacy
rpool/ROOT/2008.05/opt            814M  1.83G  22.3M  legacy
rpool/ROOT/2008.05/[EMAIL PROTECTED]  43K     -  22.3M  -
rpool/ROOT/2008.05/opt/SUNWspro   739M  1.83G   739M  legacy
rpool/ROOT/2008.05/opt/netbeans  52.9M  1.83G  52.9M  legacy
rpool/ROOT/preview2              2.71G  1.83G  2.71G  /mnt
rpool/ROOT/[EMAIL PROTECTED]     6.13M      -  2.71G  -
rpool/ROOT/preview2/opt            27K  1.83G  22.3M  legacy
rpool/export                     89.8M  1.83G    19K  /export
rpool/export/home                89.8M  1.83G  89.8M  /export/home
bash-3.2# zfs promote rpool/ROOT/2008.05
cannot promote 'rpool/ROOT/2008.05': out of space

Notice that I have 1.83 GB of free space and the snapshot from which the clone was created (rpool/ROOT/[EMAIL PROTECTED]) is 2.71 GB. It was not until I had more than 2.71 GB of free space that I could promote rpool/ROOT/2008.05. This behavior does not seem to be documented. Is it a bug in the documentation or zfs?

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
On Fri, Jun 06, 2008 at 06:27:01PM -0400, Brian Hechinger wrote: On Fri, Jun 06, 2008 at 02:58:09PM -0700, eric kustarz wrote: clients do not. Without per-filesystem mounts, 'df' on the client will not report correct data though. I expect that mirror mounts will be coming Linux's way too. They should already have them: http://blogs.sun.com/erickustarz/en_US/entry/linux_support_for_mirror_mounts

Where does that leave those of us who need to deal with OSX clients? Does apple have any plans to get in on this?

Apple plans on supporting NFSv4... including mirror mounts (barring any unseen, insurmountable hurdles). HTH --macko

Not speaking officially for Apple, but just as an engineer who works on this stuff.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
On Fri, Jun 06, 2008 at 03:43:29PM -0700, eric kustarz wrote: On Jun 6, 2008, at 3:27 PM, Brian Hechinger wrote: On Fri, Jun 06, 2008 at 02:58:09PM -0700, eric kustarz wrote: clients do not. Without per-filesystem mounts, 'df' on the client will not report correct data though. I expect that mirror mounts will be coming Linux's way too. They should already have them: http://blogs.sun.com/erickustarz/en_US/entry/linux_support_for_mirror_mounts

Where does that leave those of us who need to deal with OSX clients? Does apple have any plans to get in on this?

They need to implement NFSv4 in general first :)

Technically, Mac OS X 10.5 Leopard has some basic NFSv4.0 support in it. But just enough to make it look like it passes all the Connectathon tests. Not enough to warrant use by anyone but the terminally curious (or masochistic). This is mentioned briefly in the mount_nfs(8) man page. It would be reasonable to expect that future Mac OS X releases will include increasing levels of functionality and that NFSv4 will eventually be made the default NFS version.

But you'd have to ask them on their lists what the status of that is... i know i would like it...

Or get lucky and happen to have one of their engineers catch the question on this list and reply... ;-) --macko
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] LiveUpgrade Bug? -- ZFS root finally here in SNV90
If you jumpstart a system, and have it default to a shared / and /var, you can do the following:

lucreate -n lu1
lustatus

Due to a bug you now have to edit: /bootpool/boot/menu.lst

Once that's done, you're in good shape and both environments are bootable (review boot -L).

--

If we do the same thing with a separate / and /var using the jumpstart profile entry:

bootenv installbe bename zfsboot dataset /var

things appear to be working well. Lucreate appears to do all the snapshots/clones, and sets some special parameters for /var. But when you try to boot from that BE, things die pretty early on in the boot process, likely related to the fact that it didn't actually mount /var.

Is this a known bug? (Is this the proper way to validate, and potentially file bugs for, ZFS boot?)

Thanks, -- MikeE

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, June 05, 2008 1:56 PM To: Ellis, Mike Cc: ZFS discuss Subject: Re: [zfs-discuss] ZFS root finally here in SNV90

Mike, As we discussed, you can't currently break out other datasets besides /var. I'll add this issue to the FAQ. Thanks, Cindy
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root finally here in SNV90
On Wed, Jun 4, 2008 at 11:18 PM, Rich Teer [EMAIL PROTECTED] wrote: Why would one do that? Just keep an eye on the root pool and all is good. The only good argument I have for separating out some of /var is for boot environment management. I grew tired of repeating my arguments and suggestions and wrote a blog entry. http://mgerdts.blogspot.com/2008/03/future-of-opensolaris-boot-environment.html -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Get your SXCE on ZFS here!
The FAQ document ( http://opensolaris.org/os/community/zfs/boot/zfsbootFAQ/ ) has a jumpstart profile example:

install_type initial_install
pool newpool auto auto auto mirror c0t0d0 c0t1d0
bootenv installbe bename sxce_xx

The B90 jumpstart check program (SPARC) flags that the disks should be specified as: c0t0d0s0 c0t1d0s0 (slices).

Can someone confirm the FAQ is indeed incorrect and perhaps make the adjustment to the FAQ if so warranted?

Thanks, -- MikeE

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Cindy Swearingen Sent: Wednesday, June 04, 2008 6:50 PM To: Tim Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Get your SXCE on ZFS here!

Tim, Start at the zfs boot page, here: http://www.opensolaris.org/os/community/zfs/boot/ Review the information and follow the links to the docs. Cindy

- Original Message - From: Tim [EMAIL PROTECTED] Date: Wednesday, June 4, 2008 4:29 pm Subject: Re: [zfs-discuss] Get your SXCE on ZFS here! To: Kyle McDonald [EMAIL PROTECTED] Cc: zfs-discuss@opensolaris.org, andrew [EMAIL PROTECTED]

On Wed, Jun 4, 2008 at 5:01 PM, Kyle McDonald [EMAIL PROTECTED] wrote: andrew wrote: With the release of the Nevada build 90 binaries, it is now possible to install SXCE directly onto a ZFS root filesystem, and also put ZFS swap onto a ZFS filesystem without worrying about having it deadlock. ZFS now also supports crash dumps! To install SXCE to a ZFS root, simply use the text-based installer, after choosing Solaris Express from the boot menu on the DVD. DVD download link: http://www.opensolaris.org/os/downloads/sol_ex_dvd_1/

This release also (I believe) supports installing on ZFS through JumpStart. Does anyone have a pointer for Docs on what the syntax is for a JumpStart profile to configure ZFS root? -Kyle

Does this mean zfs boot/root on sparc is working as well? If so... FINALLY :)
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
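[For illustration only: if the check program is right about requiring slices, the corrected profile would presumably look like the sketch below. This is not taken from the FAQ; the pool and BE names are just the placeholders used above.]

install_type initial_install
pool newpool auto auto auto mirror c0t0d0s0 c0t1d0s0
bootenv installbe bename sxce_xx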
Re: [zfs-discuss] ZFS root finally here in SNV90
In addition to the standard "containing the carnage" arguments used to justify splitting /var/tmp, /var/mail, /var/adm (process accounting etc.), is there an interesting use-case where one would split out /var for compression reasons (as in, turn on compression for /var so that process accounting, network flow, and other such fun logs can be kept on a compressed filesystem, while keeping / (and thereby /usr etc.) uncompressed)?

The ZFSBOOT-FAQ document doesn't really show how to break out multiple filesystems with jumpstart profiles... An example there might be helpful... (as it's clear this is a frequently asked question :-) Also a compression on, ditto-data-bits on, (or perhaps a generic place to insert zpool/zfs parameters) as part of the jumpstart profiles could also be useful...

If SSD is coming fast and furious, being able to use compression and shared free-space (quotas etc.) to keep the boot-images small enough so they'll fit and accommodate live-upgrade patching will become increasingly important. http://www.networkworld.com/news/2008/060308-sun-flash-storage.html?page=1

Rock on guys, -- MikeE

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rich Teer Sent: Thursday, June 05, 2008 12:19 AM To: Bob Friesenhahn Cc: ZFS discuss Subject: Re: [zfs-discuss] ZFS root finally here in SNV90

On Wed, 4 Jun 2008, Bob Friesenhahn wrote: Did you actually choose to keep / and /var combined? Is there any

That's what I'd do...

reason to do that with a ZFS root since both are sharing the same pool and so there is no longer any disk space advantage? If / and /var are not combined can they have different assigned quotas without one inheriting limits from the other?

Why would one do that? Just keep an eye on the root pool and all is good.

-- Rich Teer, SCSA, SCNA, SCSECA CEO, My Online Home Inventory URLs: http://www.rite-group.com/rich http://www.linkedin.com/in/richteer http://www.myonlinehomeinventory.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] /var/sadm on zfs?
On Sun, Jun 1, 2008 at 3:53 AM, Enda O'Connor [EMAIL PROTECTED] wrote: Jim Litchfield at Sun wrote: I think you'll find that any attempt to make zones (certainly whole root ones) will fail after this. right, zoneadm install actually copies in the global zones undo.z into the local zone, so that patchrm of an existing patch will work. haven't tried out what happens when the undo is missing,

My guess is it works just fine - based upon the fact that patchadd -d does not create the undo.Z file. Admittedly, it is sloppy to just get rid of the undo.Z file - the existence of the other related directories (save/patchid) may trip something up.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic: avl_find() succeeded inside avl_add()
On Sat, May 31, 2008 at 9:38 PM, Mike Gerdts [EMAIL PROTECTED] wrote:

$ find /ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix
/ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix
/ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix/.make.state.lock
/ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix/debug64
panic

The stack from this one is...

::stack
vpanic(128d918, 300093c3778, 2a1010c7418, 0, 300093c39a8, 1229000)
avl_add+0x38(300091da548, 300093c3778, 649e740, 30005f1a180, 800271d6, 128d800)
mzap_open+0x18c(cf, 300091da538, 300091df998, 30005f1a180, 300091da520, 300091da508)
zap_lockdir+0x54(30003ac6b88, 26b32, 0, 0, 1, 2a1010c78f8)
zap_cursor_retrieve+0x40(2a1010c78f0, 2a1010c77d8, 0, 1, 2a1010c78f0, 2)
zfs_readdir+0x224(3, 2a1010c7aa0, 30009173308, 2, 2000, 2a1010c77f0)
fop_readdir+0x44(300091fe940, 2a1010c7aa0, 30005f403b0, 2a1010c7a9c, 2000, 111dd48)
getdents64+0x90(4, 2a1010c7ad0, 2000, 0, 30008245dd0, 0)
syscall_trap32+0xcc(4, ff1a, 2000, 0, 0, 0)

It tripped up on:

300091fe940::print vnode_t v_path
v_path = 0x300082608c0 /ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix/debug64

Which is a subdirectory of where it tripped up before. I am able to do "find /ws/mount -name serengeti -prune" without problems.

To make it so that I can hopefully proceed with the build I have moved the directory out of the way, then did an hg update so that I can hopefully get the build I was trying to do to complete.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs equivalent of ufsdump and ufsrestore
On Sat, May 31, 2008 at 9:18 AM, David Magda [EMAIL PROTECTED] wrote: On May 31, 2008, at 06:03, Joerg Schilling wrote: The other method works as root if you use -atime (see man page) and is available since 13 years. Would it be possible to assign an RBAC role to a regular user to accomplish this? If so, would you know which one?

You can use "ppriv -D -e star ..." to figure out which privileges you lack to be able to reset the atime. I suspect that in order to perform backups (and reset atime), you would need to have file_dac_read and file_dac_write. A backup program that has those privileges has everything it needs to gain full root access.

I wish that there was a flag to open(2) to say not to update the atime and that there was a privilege that could be granted to allow this flag without granting file_dac_write.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
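[To illustrate the RBAC angle discussed above, a minimal sketch of granting the suspected privileges directly to a backup user. The user name is a placeholder, and as noted above these privileges effectively let that user read and write any file on the system.]

# usermod -K defaultpriv=basic,file_dac_read,file_dac_write backupuser

Then, running the backup command under ppriv as that user shows whether any privilege is still missing:

$ ppriv -D -e star ...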
Re: [zfs-discuss] /var/sadm on zfs?
On Sat, May 31, 2008 at 5:16 PM, Bob Friesenhahn [EMAIL PROTECTED] wrote: On my heavily-patched Solaris 10U4 system, the size of /var (on UFS) has gotten way out of hand due to the remarkably large growth of /var/sadm. Can this directory tree be safely moved to a zfs filesystem? How much of /var can be moved to a zfs filesystem without causing boot or runtime issues?

/var/sadm is not used during boot. If you have been patching regularly, you probably have a bunch of undo.Z files that are used only in the event that you want to back out. If you don't think you will be backing out any patches that were installed 90 or more days ago the following commands may be helpful:

To understand how much space would be freed up by whacking the old undo files:

# find /var/sadm/pkg -mtime +90 -name undo.Z | xargs du -k \
    | nawk '{t += $1; print $0} END {printf("Total: %d MB\n", t / 1024)}'

Copy the old backout files somewhere else:

# cd /var/sadm
# find pkg -mtime +90 -name undo.Z \
    | cpio -pdv /somewhere/else

Remove the old (90+ days) undo files:

# find /var/sadm/pkg -mtime +90 -name undo.Z | xargs rm -f

Oops, I needed those files to back out 123456-01:

# cd /somewhere/else
# find pkg -name undo.Z | grep 123456-01 \
    | cpio -pdv /var/sadm
# patchrm 123456-01

Before you do this, test it and convince yourself that it works. I have not seen Sun documentation (either docs.sun.com or sunsolve.sun.com) that says that this is a good idea - but I haven't seen any better method for getting rid of the cruft that builds up in /var/sadm either.

I suspect that further discussion on this topic would be best directed to [EMAIL PROTECTED] or the sun-managers mailing list (see http://www.sunmanagers.org/).

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] /var/sadm on zfs?
On Sat, May 31, 2008 at 7:37 PM, Jim Litchfield at Sun [EMAIL PROTECTED] wrote: I think you'll find that any attempt to make zones (certainly whole root ones) will fail after this.

In no way am I speaking authoritatively on this (I really wish all the install, lu, patch code was open...), but are you perhaps confusing the patch backout data with the CAS and modifiable (e, v) files that have pristine values stashed under /var/sadm/pkg/pkgname/save/pspool/pkgname?

My understanding was that files of type f are installed into non-global zones by copying the file from the installed location in the global zone. So as to not pick up changes that occurred to editable (e - e.g. /etc/passwd) and volatile (v - e.g. /var/adm/messages) files in the global zone, the pristine copy is picked up from the pspool directories I mention above.

To try things out, let's play with an alternate root installation of studio 11 at /opt/studio11 - on ZFS. Playing is best done on clones...

zfs snapshot pool0/opt/[EMAIL PROTECTED]
zfs clone pool0/opt/[EMAIL PROTECTED] pool0/junk

I have patch 121015-06 installed - which patches SPROcc. It was installed with patchadd -R /opt/studio11 -M `pwd` * with several patches in the cwd while running the command. The contents of /pool0/junk/var/sadm/pkg/SPROcc look like:

var/sadm/pkg/SPROcc
var/sadm/pkg/SPROcc/install
var/sadm/pkg/SPROcc/install/copyright
var/sadm/pkg/SPROcc/install/depend
var/sadm/pkg/SPROcc/pkginfo
var/sadm/pkg/SPROcc/save
var/sadm/pkg/SPROcc/save/121015-06
var/sadm/pkg/SPROcc/save/121015-06/undo.Z
var/sadm/pkg/SPROcc/save/pspool
var/sadm/pkg/SPROcc/save/pspool/SPROcc
var/sadm/pkg/SPROcc/save/pspool/SPROcc/install
var/sadm/pkg/SPROcc/save/pspool/SPROcc/install/copyright
var/sadm/pkg/SPROcc/save/pspool/SPROcc/install/depend
var/sadm/pkg/SPROcc/save/pspool/SPROcc/pkginfo
var/sadm/pkg/SPROcc/save/pspool/SPROcc/pkgmap
var/sadm/pkg/SPROcc/save/pspool/SPROcc/save
var/sadm/pkg/SPROcc/save/pspool/SPROcc/save/121015-06
var/sadm/pkg/SPROcc/save/pspool/SPROcc/save/121015-06/undo.Z

I back out the patch (patchrm -R /pool0/junk 121015-06), then reinstall it without backout info (patchadd -R /pool0/junk -d 121015-06). Things have changed to...

var/sadm/pkg/SPROcc
var/sadm/pkg/SPROcc/install
var/sadm/pkg/SPROcc/install/checkinstall
var/sadm/pkg/SPROcc/install/copyright
var/sadm/pkg/SPROcc/install/depend
var/sadm/pkg/SPROcc/install/patch_checkinstall
var/sadm/pkg/SPROcc/install/patch_postinstall
var/sadm/pkg/SPROcc/pkginfo
var/sadm/pkg/SPROcc/save
var/sadm/pkg/SPROcc/save/pspool
var/sadm/pkg/SPROcc/save/pspool/SPROcc
var/sadm/pkg/SPROcc/save/pspool/SPROcc/install
var/sadm/pkg/SPROcc/save/pspool/SPROcc/install/copyright
var/sadm/pkg/SPROcc/save/pspool/SPROcc/install/depend
var/sadm/pkg/SPROcc/save/pspool/SPROcc/pkginfo
var/sadm/pkg/SPROcc/save/pspool/SPROcc/pkgmap

Notice the lack of undo.Z files (and associated patch directories), but the rest looks the same.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] panic: avl_find() succeeded inside avl_add()
I just experienced a zfs-related crash. I have filed a bug (don't know number - grumble). I have a crash dump but little free space. If someone would like some more info from the core, please let me know in the next few days.

::status
debugging crash dump /pool0/vmcore.0 (64-bit) from sun
operating system: 5.11 snv_76 (sun4u)
panic message: avl_find() succeeded inside avl_add()
dump content: kernel pages only

::stack
vpanic(128d918, 3000c1daab0, 2a101673418, 0, 3000b6a3770, 1229000)
avl_add+0x38(300106ee398, 3000c1daab0, 649e740, 3001b377980, 800271d6, 128d800)
mzap_open+0x18c(cf, 300106ee388, 3000c94caa0, 3001b377980, 300106ee370, 300106ee358)
zap_lockdir+0x54(300039bce68, 26b32, 0, 0, 1, 2a1016738f8)
zap_cursor_retrieve+0x40(2a1016738f0, 2a1016737d8, 0, 1, 2a1016738f0, 2)
zfs_readdir+0x224(3, 2a101673aa0, 3000dfc7980, 2, 2000, 2a1016737f0)
fop_readdir+0x44(3000df541c0, 2a101673aa0, 3000cb58dc8, 2a101673a9c, 2000, 111dd48)
getdents64+0x90(8, 2a101673ad0, 2000, 2004, 3001e54cac8, ff0b)
syscall_trap32+0xcc(8, ff0f4000, 2000, 2004, 0, ff0b)

# zpool status pool0
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        pool0         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s7  ONLINE       0     0     0
            c0t0d0s7  ONLINE       0     0     0

errors: No known data errors

# zpool get all pool0
NAME   PROPERTY     VALUE                SOURCE
pool0  size         27.2G                -
pool0  used         24.9G                -
pool0  available    2.38G                -
pool0  capacity     91%                  -
pool0  altroot      -                    default
pool0  health       ONLINE               -
pool0  guid         8395455814253440113  -
pool0  version      8                    default
pool0  bootfs       -                    default
pool0  delegation   on                   default
pool0  autoreplace  off                  default
pool0  temporary    off                  default

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic: avl_find() succeeded inside avl_add()
On Sat, May 31, 2008 at 8:48 PM, Mike Gerdts [EMAIL PROTECTED] wrote: I just experienced a zfs-related crash. I have filed a bug (don't know number - grumble). I have a crash dump but little free space. If someone would like some more info from the core, please let me know in the next few days.

And I am able to reproduce... From a fresh crash:

::status
debugging crash dump vmcore.6 (64-bit) from sun
operating system: 5.11 snv_76 (sun4u)
panic message: avl_find() succeeded inside avl_add()
dump content: kernel pages only

::stack
vpanic(128d918, 30011ba6638, 2a101594fb8, 0, 30011ba6868, 1229000)
avl_add+0x38(30011bad320, 30011ba6638, 649e740, 3000d2d8180, 800271d6, 128d800)
mzap_open+0x18c(cf, 30011bad310, 30011b2b480, 3000d2d8180, 30011bad2f8, 30011bad2e0)
zap_lockdir+0x54(30004910c08, 26b32, 0, 0, 1, 2a1015951e8)
zap_lookup+0x18(30004910c08, 26b32, 2a101595680, 8, 1, 2a1015952a8)
zfs_dirent_lock+0x2f8(2a101595370, 3000b859518, 2a101595680, 2a101595378, 6, 4)
zfs_dirlook+0x19c(3000b859518, 2a101595680, 2a101595678, 2a101595680, 0, 0)
zfs_lookup+0x188(3000b855d00, 2a101595680, 2a101595678, 2a101595940, 0, 30004c32440)
fop_lookup+0x4c(3000b855d00, 2a101595680, 2a101595678, 2a101595940, 0, 3000101fa40)
lookuppnvp+0x324(2a101595940, 0, 0, 3000b855d00, 30008c61b70, 3000101fa40)
lookuppnat+0x10c(3000c864600, 0, 0, 0, 2a101595ad8, 0)
lookupnameat+0x5c(c461c, 0, 0, 0, 2a101595ad8, 0)
cstatat_getvp+0x16c(18bd000, c461c, 1, 0, 2a101595ad8, 0)
cstatat64_32+0x58(ffd19553, c461c, 1, ffbfbcc0, 1000, 0)
syscall_trap32+0xcc(c461c, ffbfbcc0, c462c, 0, ff00, 80808080)

3000c864600::print vnode_t
{
    ...
    v_path = 0x3000c837458 /ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix

$ ls /ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix
Makefile  debug64/

$ find /ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix
/ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix
/ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix/.make.state.lock
/ws/mount/onnv-gate/usr/src/uts/sun4u/serengeti/unix/debug64
panic

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Create ZFS now, add mirror later
Is there a way to create a zfs file system (e.g. zpool create boot /dev/dsk/c0t0d0s1) and then, after vacating the old boot disk, add another device and make the zpool a mirror? (as in: zpool create boot mirror /dev/dsk/c0t0d0s1 /dev/dsk/c1t0d0s1)

Thanks! emike
This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
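[A sketch of the usual approach to the question above, with the same example device names: create the pool on one device, and later use zpool attach to turn that single-disk top-level vdev into a mirror. This is illustrative only.]

# zpool create boot c0t0d0s1
  ... (migrate data, vacate the old boot disk) ...
# zpool attach boot c0t0d0s1 c1t0d0s1
# zpool status boot        (watch the resilver complete)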
Re: [zfs-discuss] ZFS in S10U6 vs openSolaris 05/08
On Tue, May 27, 2008 at 12:44 PM, Rob Logan [EMAIL PROTECTED] wrote: There is something more to consider with SSDs uses as a cache device. why use SATA as the interface? perhaps http://www.tgdaily.com/content/view/34065/135/ would be better? (no experience) cards will start at 80 GB and will scale to 320 and 640 GB next year. By the end of 2008, Fusion io also hopes to roll out a 1.2 TB card. 160 parallel pipelines that can read data at 800 megabytes per second and write at 600 MB/sec 4K blocks and then streaming eight simultaneous 1 GB reads and writes. In that test, the ioDrive clocked in at 100,000 operations per second... beat $30 dollars a GB, These could be rather interesting as swap devices. On the face of it, $30/GB is pretty close to the list price of taking a T5240 from 32 GB to 64 GB. However, it is *a lot* less than feeding system-board DIMM slots to workloads that use a lot of RAM but are fairly inactive. As such, a $10k PCIe card may be able to allow a $42k 64 GB T5240 handle 5+ times the number of not-too-busy J2EE instances. If anyone's done any modelling or testing of such an idea, I'd love to hear about it. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
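[If such a card shows up to Solaris as an ordinary block device, pressing it into service as extra swap is just standard swap administration. A sketch with a purely hypothetical device path, for illustration only:]

# swap -a /dev/dsk/c3t0d0s1     (add the device as additional swap space)
# swap -l                       (list swap devices to verify it was added)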
Re: [zfs-discuss] ZFS: A general question
I like the link you sent along... They did a nice job with that. (but it does show that mixing and matching vastly different drive-sizes is not exactly optimal...) http://www.drobo.com/drobolator/index.html Doing something like this for ZFS allowing people to create pools by mixing/matching drives, raid1, and raidz/z2 drives in a zpool makes for a pretty cool page. If one of the statistical gurus can add MTBF MTTdataLoss etc. to that as a calculator at the bottom that would be even better. (someone did some static graphs for different thumper configurations for this in the past... This would just make that more general purpose/GUI driven... Sounds like a cool project) -- No mention anywhere of removing drives thereby reducing capacity though... Raid-re-striping isn't all that much fun, especially with larger drives... (and even ZFS lacks some features in this area for now) See the answer to you other question below. (from their FAQ) -- MikeE What file systems does drobo support? RESOLUTION: Drobo is a usb external disk array that is formatted by the host operating system (Windows or OS X). We currently support NTFS, HFS+, and FAT32 file systems with firmware revision 1.0.2. Drobo is not a ZFS file system. STATUS: Current specification 1.0.2 Applies to: Drobo DRO4D-U -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Hull Sent: Saturday, May 24, 2008 7:00 PM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] ZFS: A general question OK so in my (admittedly basic) understanding of raidz and raidz2, these technologies are very similar to raid5 and raid6. BUT if you set up one disk as a raidz vdev, you (obviously) can't maintain data after a disk failure, but you are protected against data corruption that is NOT a result of disk failure. Right? So is there a resource somewhere that I could look at that clearly spells out how many disks I could have vs. how much resulting space I would have that would still protect me against disk failure (a la the Drobolator http://www.drobo.com/drobolator/index.html)? I mean, if I have a raidz vdev with one disk, then I add a disk, am I protected from disk failure? Is it the case that I need to have disks in groups of 4 to maintain protection against single disk failure with raidz and in groups of 5 for raidz2? It gets even more confusing if I wanted to add disks of varying sizes... And you said I could add a disk (or disks) to a mirror -- can I force add a disk (or disks) to a raidz or raidz2? Without destroying and rebuilding as I read would be required somewhere else? And if I create a zpool and add various single disks to it (without creating raidz/mirror/etc), is it the case that the zpool is essentially functioning like spanning raid? Ie, no protection at all?? Please either point me to an existing resource that spells this out a little clearer or give me a little more explanation around it. And... do you think that the Drobo (www.drobo.com) product is essentially just a box with OpenSolaris and ZFS on it? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs iostat
On Sun, May 18, 2008 at 7:34 AM, Karsten L. [EMAIL PROTECTED] wrote: Hi guys, is there a way to find out the current i/o-stats for a zfs-partition? I know of zpool iostat, but it only lists the i/o-stats of the whole pool. I need something like zfs iostat, or how can I get the stats with general systemtools of a particular directory? any idea would be appreciated karsten Have you tried fsstat? I think it will do what you are looking for whether it is zfs, ufs, tmpfs, etc. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs! mirror and break
I currently have a zpool with two 8Gbyte disks in it. I need to replace them with a single 56Gbyte disk. with veritas I would just add the disk in as a mirror and break off the other plex then destroy it. I see no way of being able to do this with zfs. Being able to migrate data without having to unmount and remount filesystems is very important to me. Can anyone say when such functionality will be implemented? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How many ZFS pools is it sensible to use on a single server?
On Tue, Apr 15, 2008 at 9:22 AM, David Collier-Brown [EMAIL PROTECTED] wrote: We've discussed this in considerable detail, but the original question remains unanswered: if an organization *must* use multiple pools, is there an upper bound to avoid or a rate of degradation to be considered? I have a keen interest in this as well. I would really like zones to be able to independently fail over between hosts in a zone farm. The work coming out of the Indiana, IPS, Caiman, etc. projects imply that zones will have to be on zfs. In order to fail zones over between systems independently either I need to have a zpool per zone or I need to have per-dataset replication. Considering that with some workloads 20+ zones on a T2000 is quite feasible, a T5240 could be pushing 80+ zones and as such a relatively large number of zpools. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZVOL access permissions?
Could someone kindly provide some details on using a zvol in sparse-mode? Wouldn't the COW nature of zfs (assuming COW still applies on ZVOLS) quickly erode the sparse nature of the zvol? Would sparse data-presentation only work by delegating a part of a zpool to a zone, but that's at the file-level, not raw? Thanks -- mikee - Original Message - From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org Sent: Sat Apr 12 10:02:18 2008 Subject: [zfs-discuss] ZVOL access permissions? How can I set up a ZVOL that's accessible by non-root users, too? The intent is to use sparse ZVOLs as raw disks in virtualization (reducing overhead compared to file-based virtual volumes). Thanks, -mg This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
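[For reference, a sparse ("thin") zvol is simply a volume created without a reservation. A minimal sketch, with placeholder pool and volume names, showing how the allocated space can be compared to the advertised size as the guest writes data:]

# zfs create -s -V 100G pool0/vm-disk0
# zfs get volsize,used,reservation pool0/vm-disk0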
Re: [zfs-discuss] [storage-discuss] Preventing zpool imports on boot
On Thu, Feb 14, 2008 at 11:17 PM, Dave [EMAIL PROTECTED] wrote: I don't want Solaris to import any pools at bootup, even when there were pools imported at shutdown/at crash time. The process to prevent importing pools should be automatic and not require any human intervention. I want to *always* import the pools manually. Hrm... what if I deleted zpool.cache after importing/exporting any pool? Are these the only times zpool.cache is created? I wish zpools had a property of 'atboot' or similar, so that you could mark a zpool to be imported at boot or not.

Like this?

     temporary
         By default, all pools are persistent and are automatically
         opened when the system is rebooted. Setting this boolean
         property to on causes the pool to exist only while the
         system is up. If the system is rebooted, the pool has to be
         manually imported by using the zpool import command. Setting
         this property is often useful when using pools on removable
         media, where the devices may not be present when the system
         reboots. This property can also be referred to by its
         shortened column name, temp.

(I am trying to move this thread over to zfs-discuss, since I originally posted to the wrong alias) storage-discuss trimmed in my reply.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL controls in Solaris 10 U4?
On Jan 30, 2008 2:27 PM, Jonathan Loran [EMAIL PROTECTED] wrote: Before ranting any more, I'll do the test of disabling the ZIL. We may have to build out these systems with Open Solaris, but that will be hard as they are in production. I would have to install the new OS on test systems and swap out the drives during scheduled down time. Ouch. Live upgrade can be very helpful here, either for upgrading or applying a flash archive. Once you are comfortable that Nevada performs like you want, you could prep the new OS on alternate slices or broken mirrors. Activating the updated OS should take only a few seconds longer than a standard init 6. Failback is similarly easy. I can't remember the last time I swapped physical drives to minimize the outage during an upgrade. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
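[A rough sketch of the Live Upgrade flow described above. The BE name, disk slice, and image path are placeholders, and the exact lucreate options depend on the current layout; this is illustrative, not a recipe.]

# lucreate -n nevada -m /:/dev/dsk/c0t1d0s0:ufs     (build the alternate BE on a spare slice or broken mirror)
# luupgrade -u -n nevada -s /net/installserver/export/nv_b90
# luactivate nevada
# init 6
(fail back later by running luactivate on the original BE and rebooting again)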
Re: [zfs-discuss] Resizing a mirror
On Jan 29, 2008 5:55 PM, Andrew Gabriel [EMAIL PROTECTED] wrote: Having attached new bigger disks to a mirror, and detached all the older smaller disks, how to I tell ZFS to expand the size of the mirror to match that of the bigger disks? I had a look through the system admin guide, but couldn't find this anywhere. In SVM, you just say metattach mirror with no devices listed to achieve this, but the equivalent in zpool gives a syntax error. I thought I saw something on the list lately saying that there is a bug that requires you to export the zpool and then import it to get the additional space to be seen. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
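[If the export/import workaround mentioned above applies, the sketch is simply the following; the pool name is an example.]

# zpool list tank          (note the old size)
# zpool export tank
# zpool import tank
# zpool list tank          (the size should now reflect the larger disks)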
Re: [zfs-discuss] Moving zfs to an iscsci equallogic LUN
Use zpool replace to swap one side of the mirror with the iscsi lun. -- mikee - Original Message - From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org Sent: Tue Jan 15 08:46:40 2008 Subject: Re: [zfs-discuss] Moving zfs to an iscsci equallogic LUN What would be the commands for the three way mirror or an example of what your describing. I thought the 200gb would have to be the same size to attach to the existing mirror and you would have to attach two LUN disks vs one LUN. Once it attaches it automatically reslivers or syncs the disk then if I wanted to I could remove the two 73 GB disks or still keep them in the pool and expand the pool later if I want? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
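[A sketch of that suggestion, with made-up device names for the local disk being swapped out and the iSCSI LUN as it appears to Solaris; replace each side of the mirror in turn and let the resilver finish in between.]

# zpool replace tank c0t1d0 c2t600A0B8000123400d0
# zpool status tank        (wait for the resilver to complete before replacing the other side)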
Re: [zfs-discuss] hardware for zfs home storage
On 1/14/08, eric kustarz [EMAIL PROTECTED] wrote: On Jan 14, 2008, at 11:08 AM, Tim Cook wrote: www.mozy.com appears to have unlimited backups for 4.95 a month. Hard to beat that. And they're owned by EMC now so you know they aren't going anywhere anytime soon. mozy's been okay, but only for windows/OS X. uploading can be slow sometimes... i do like rsync.net since it is a totally standards based solution, not proprietary. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hardware for zfs home storage
except in my experience it is piss poor slow... but yes it is another option that is -basically- built on standards (i say that only because it's not really a traditional filesystem concept) On 1/14/08, David Magda [EMAIL PROTECTED] wrote: On Jan 14, 2008, at 17:15, mike wrote: On 1/14/08, eric kustarz [EMAIL PROTECTED] wrote: On Jan 14, 2008, at 11:08 AM, Tim Cook wrote: www.mozy.com appears to have unlimited backups for 4.95 a month. Hard to beat that. And they're owned by EMC now so you know they aren't going anywhere anytime soon. mozy's been okay, but only for windows/OS X. uploading can be slow sometimes... i do like rsync.net since it is a totally standards based solution, not proprietary. There's also Amazon's S3. Published APIs so you can use already available utilities / libraries into whatever scripted solution you can think of. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can't access my data
On Jan 4, 2008 2:42 PM, George Shepherd - Sun Microsystems Home system [EMAIL PROTECTED] wrote: Hi Folks.. I have/had a zpool containing one filesystem. I had to change my hostid and needed to import my pool (I've done this OK in the past). After the import the mount of my filesystem failed.

# zpool import homespool
cannot mount 'homespool/homes': mountpoint or dataset is busy
                                ^^
# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
homespool        9.91G   124G    18K  /homespool
homespool/homes  9.91G   124G  9.91G  /homes
                                      ^^

Is something else already mounted at /homes?

# df -k /homes

Did you or someone else cd /homes before trying this, thus causing the mount point to be busy?

# fuser /homes

If you still can't resolve it:

# zfs set mountpoint=/somewhere_else homespool/homes
# zfs mount -a     (not sure this is needed)
# cd /somewhere_else

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help needed ZFS vs Veritas Comparison
On Dec 28, 2007 8:40 AM, Sengor [EMAIL PROTECTED] wrote: Real comparison of features should include scenarios such as:

- how ZFS/VxVM compare in BCV like environments (eg. when volumes are presented back to the same host)
- how they all cope with various multipathing solutions out there
- Filesystem vs Volume snapshots
- Portability within cluster like environments (SCSI reserves LUN visibility to multiple synchronous hosts)
- Disaster recovery scenarios
- Ease/Difficulty with data migrations across physical arrays
- Boot volumes
- Online vs Offline attribute/parameter changes

Very good list!

I can't think of more right now, it's way past midnight here ;)

How about these?

- Integration with backup system
- Active-active cluster (parallel file system) capabilities
- Integration with OS maintenance activities (install, upgrade, patching, etc.)
- Relative performance on anticipated workload
- Staffing issues (what do people know, how many hours to train, how long before proficiency)
- Supportability on multiple platforms at the site (e.g. Solaris, Linux, HP-UX, AIX, ...)
- Impact of failure modes (missing license key especially major system changes, on-disk corruption)
- Opportunities to do things previously not possible

ZFS doesn't win on many of those, but with the improvements that I have seen throughout the storage stack it is somewhat likely that the required improvements are already on the roadmap.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] fclose failing at 2G on a ZFS filesystem
On Dec 25, 2007 1:33 PM, K [EMAIL PROTECTED] wrote: if (fclose (file)) { fprintf (stderr, "fatal: unable to close temp file: %s\n", strerror (errno)); exit (1); I don't understand why the above piece of code is failing...

What command line is used to compile the code? I would guess that you don't have large file support. A variant of the following would probably be good:

cc -c $CFLAGS `getconf LFS_CFLAGS` myprog.c
cc -o myprog $LDFLAGS `getconf LFS_LDFLAGS`

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does Oracle support ZFS as a file system with Oracle RAC?
On Dec 18, 2007 11:01 AM, David Runyon [EMAIL PROTECTED] wrote: Does anyone know this? There are multiple file system usages involved in Oracle RAC: 1) Oracle Home - This is where the oracle software lives. This can be on a file system shared among all nodes or a per-host file system. ZFS should work fine in the per-host configuration, but I don't know about an official support statement. This is likely not very important because of... 2) Database files - I'll lump redo logs, etc. in with this. In Oracle RAC these must live on a shared-rw (e.g. clustered VxFS, NFS) file system. ZFS does not do this. If you drink the Oracle kool-aid and are using 10g or later the database files will go into ASM, which seems to share a number of characteristics with (but is largely complementary to) ZFS. That is, it spreads writes among all allocated disks, provides redundancy without an underlying volume manager or hardware RAID, is transaction safe, etc. I am pretty sure that ASM also supports per-block checksums, space efficient snapshots, block level incremental backups, etc. Although ASM is a relatively new technology, I think it has many more hours of runtime and likely more space in production use than ZFS. I think that ZFS holds a lot of promise for shared-nothing database clusters, such as is being done by Greenplumb with their extended variant of Postgres. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] /usr/bin and /usr/xpg4/bin differences
On Dec 16, 2007 1:16 AM, Sasidhar Kasturi [EMAIL PROTECTED] wrote: Yes .. but have a look at the bug i am working on .. Bug id:6493125 http://bugs.opensolaris.org/view_bug.do?bug_id=6493125 Thank you, Sasidhar.

I'm not sure what question you are asking...

1) Why are there two variants of df? If this is the question, it is likely because Solaris already had a df that had some behavior that people had grown to depend on. A standards committee came along and said "We need the -P and -v option and the formatting of these headers needs to be ... and these columns need to be like ..." Sun needed to not disrupt existing customers and needed XPG4 compliance to satisfy a potentially different set of customers. Thus, the XPG4 variant got some small changes that (at the time?) weren't appropriate for the traditional version. There is some likelihood that the bug you are working on is not the first time that the differences between the XPG4 variant and the /usr/bin variant have decreased.

2) How could the same source code produce different output? It looks as though enabling the -P option is pretty straightforward - modification of the getopts() string at line 593 and getting rid of the #ifdef and corresponding #endif at lines 605 and 607 should be sufficient.

590 #ifdef XPG4
591         while ((arg = getopt(argc, argv, "F:o:abehkVtgnlPZ")) != EOF) {
592 #else
593         while ((arg = getopt(argc, argv, "F:o:abehkVtgnlvZ")) != EOF) {
594 #endif
595                 if (arg == 'F') {
596                         if (F_option)
597                                 errmsg(ERR_FATAL + ERR_USAGE,
598                                     "more than one FSType specified");
599                         F_option = 1;
600                         FSType = optarg;
601                 } else if (arg == 'V' && !V_option) {
602                         V_option = TRUE;
603                 } else if (arg == 'v' && !v_option) {
604                         v_option = TRUE;
605 #ifdef XPG4
606                 } else if (arg == 'P' && !P_option) {
607                         SET_OPTION(P);
608 #endif

Of course, updating the usage error, man page, etc. would be appropriate too. You can see a few other #ifdef XPG4 blocks that show the quite small differences between the two variants.

Also... since there is nothing zfs-specific here, opensolaris-code may be a more appropriate forum.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] /usr/bin and /usr/xpg4/bin differences
On Dec 15, 2007 11:31 PM, KASTURI VENKATA SESHA SASIDHAR [EMAIL PROTECTED] wrote: Hello, I am working on open solaris bugs .. and need to change the code of df in the above two folders.. I would like to know why there are two df's with diff options in the respective folders.. /usr/bin/df is different from /usr/xpg4/bin/df!!

The code for both variants of df comes from the same source (usr/src/cmd/fs.d/df.c). The xpg4 variant is compiled with -DXPG4. After a build in usr/src/cmd/fs.d is complete you will see the following:

$ ls df*
df       df.o      df.po.xpg4  df.xpg4
df.c     df.po     df.xcl      df.xpg4.o

It looks to me as though df becomes /usr/bin/df and df.xpg4 becomes /usr/xpg4/bin/df.

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Error in zpool man page?
On Fri, 2007-12-07 at 08:02 -0800, jonathan soons wrote: The man page gives this form: zpool create [-fn] [-R root] [-m mountpoint] pool vdev ... however, lower down, there is this command: # zpool create mirror c0t0d0 c0t1d0 mirror c1t0d0 c1t1d0 Isn't the pool element missing in the command? In the command you pasted above yes, however, looking at the man pages I have, I see the correct command line. What OS and rev was this from? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Error in zpool man page?
On Fri, 2007-12-07 at 08:24 -0800, jonathan soons wrote: SunOS 5.10 Last change: 25 Apr 2006 Yes, I see that my other server is more up to date. SunOS 5.10 Last change: 13 Feb 2007 This one was recently installed. What OS rev? (more /etc/release) I don't have any systems later than update 3 patched to January 2007 and have the correct man page. Looks like perhaps bug 6419899 which was fixed in patch 119246-16 and 119246-21 was released on 11-DEC-2006 and included in Solaris 10 11/06 (update 3). Latest is rev 27 of patch 119246. Is there a patch that was not included with 10_Recommended? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Thanks... Mike Dotson Area System Support Engineer - ACS West Phone: (503) 343-5157 [EMAIL PROTECTED] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500 w/ small random encrypted text files
On Nov 29, 2007 11:41 AM, Richard Elling [EMAIL PROTECTED] wrote: It depends on the read pattern. If you will be reading these small files randomly, then there may be a justification to tune recordsize. In general, backup/restore workloads are not random reads, so you may be ok with the defaults. Try it and see if it meets your performance requirements. -- richard

It seems as though backup/restore of small files would be a random pattern, unless you are using zfs send/receive. Since no enterprise backup solution that I am aware of uses zfs send/receive, most people doing backups of zfs are using something that does something along the lines of:

while readdir ; do
    open file
    read from file
    write to backup stream
    close file
done

Since files are unlikely to be on disk in a contiguous manner, this looks like a random read operation to me. Am I wrong?

-- Mike Gerdts http://mgerdts.blogspot.com/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic receiving incremental snapshots
On Aug 25, 2007 8:36 PM, Stuart Anderson [EMAIL PROTECTED] wrote: Before I open a new case with Sun, I am wondering if anyone has seen this kernel panic before? It happened on an X4500 running Sol10U3 while it was receiving incremental snapshot updates. Thanks. Aug 25 17:01:50 ldasdata6 ^Mpanic[cpu0]/thread=fe857d53f7a0: Aug 25 17:01:50 ldasdata6 genunix: [ID 895785 kern.notice] dangling dbufs (dn=fe82a3532d10, dbuf=fe8b4e338b90) I saw dangling dbufs panics beginning with S10U4 beta and the then current (May '07) nevada builds. If you are running a kernel newer than the x86 equivalent of 125100-10, you may be seeing the same thing. The panics I saw were not triggered by zfs receive, so you may be seeing something different. An IDR was produced for me. If you have Sun support search for my name, you can likely get the same IDR (errr, an IDR with the same fix - mine was SPARC) to see if it addresses your problem. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Home Motherboard
I actually have a related motherboard, chassis, dual power-supplies and 12x400 gig drives already up on ebay too. If I recall Areca cards are supported in OpenSolaris... http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItemitem=300172982498 On 11/22/07, Jason P. Warr [EMAIL PROTECTED] wrote: If you want a board that is a steal look at this one: http://www.ascendtech.us/itemdesc.asp?ic=MBTAS2882G3NR Tyan S2882, Dual Socket 940 Opteron, 8 DDR slots, 2 PCI-X 133 busses with 2 slots each, Dual Core support. $80. Pair is with a couple of Opteron 270's from ebay for $195: http://cgi.ebay.com/MATCH-PAIR-AMD-Opteron-270-64Bit-DualCore-940pin-2Ghz_W0QQitemZ290182379420QQihZ019QQcategoryZ80142QQssPageNameZWDVWQQrdZ1QQcmdZViewItem Granted, you need an E-ATX case but those are not that expensive and an EPS12V power supply. For less than $600 you can have a hell of a base server grade system with 4 cores and 2-4G of ram. - Original Message - From: Rob Logan [EMAIL PROTECTED] To: zfs-discuss@opensolaris.org Sent: Wednesday, November 21, 2007 11:17:19 PM (GMT-0600) America/Chicago Subject: [zfs-discuss] Home Motherboard grew tired of the recycled 32bit cpus in http://www.opensolaris.org/jive/thread.jspa?messageID=127555 and bought this to put the two marvell88sx cards in: $255 http://www.supermicro.com/products/motherboard/Xeon3000/3210/X7SBE.cfm http://www.supermicro.com/manuals/motherboard/3210/MNL-0970.pdf $195 1333FSB 2.6GHz Xeon 3075 (basicly a E6750) Any Core 2 Quad/Duo in LGA775 will work, including 45nm dies: http://rob.com/sun/x7sbe/45nm-pricing.jpg $270 Four 1G PC2-6400 DDRII 800MHz 240-pin ECC Unbuffered SDRAM $ 55 LOM (IPMI and Serial over LAN) http://www.supermicro.com/manuals/other/AOC-SIMSOLC-HTC.pdf # /usr/X11/bin/scanpci pci bus 0x cardnum 0x00 function 0x00: vendor 0x8086 device 0x29f0 Intel Corporation Server DRAM Controller pci bus 0x cardnum 0x01 function 0x00: vendor 0x8086 device 0x29f1 Intel Corporation Server Host-Primary PCI Express Bridge pci bus 0x cardnum 0x1a function 0x00: vendor 0x8086 device 0x2937 Intel Corporation USB UHCI Controller #4 pci bus 0x cardnum 0x1a function 0x01: vendor 0x8086 device 0x2938 Intel Corporation USB UHCI Controller #5 pci bus 0x cardnum 0x1a function 0x02: vendor 0x8086 device 0x2939 Intel Corporation USB UHCI Controller #6 pci bus 0x cardnum 0x1a function 0x07: vendor 0x8086 device 0x293c Intel Corporation USB2 EHCI Controller #2 pci bus 0x cardnum 0x1c function 0x00: vendor 0x8086 device 0x2940 Intel Corporation PCI Express Port 1 pci bus 0x cardnum 0x1c function 0x04: vendor 0x8086 device 0x2948 Intel Corporation PCI Express Port 5 pci bus 0x cardnum 0x1c function 0x05: vendor 0x8086 device 0x294a Intel Corporation PCI Express Port 6 pci bus 0x cardnum 0x1d function 0x00: vendor 0x8086 device 0x2934 Intel Corporation USB UHCI Controller #1 pci bus 0x cardnum 0x1d function 0x01: vendor 0x8086 device 0x2935 Intel Corporation USB UHCI Controller #2 pci bus 0x cardnum 0x1d function 0x02: vendor 0x8086 device 0x2936 Intel Corporation USB UHCI Controller #3 pci bus 0x cardnum 0x1d function 0x07: vendor 0x8086 device 0x293a Intel Corporation USB2 EHCI Controller #1 pci bus 0x cardnum 0x1e function 0x00: vendor 0x8086 device 0x244e Intel Corporation 82801 PCI Bridge pci bus 0x cardnum 0x1f function 0x00: vendor 0x8086 device 0x2916 Intel Corporation Device unknown pci bus 0x cardnum 0x1f function 0x02: vendor 0x8086 device 0x2922 Intel Corporation 6 port SATA AHCI Controller pci bus 0x cardnum 0x1f function 0x03: vendor 0x8086 device 0x2930 Intel Corporation 
SMBus Controller pci bus 0x cardnum 0x1f function 0x06: vendor 0x8086 device 0x2932 Intel Corporation Thermal Subsystem pci bus 0x0001 cardnum 0x00 function 0x00: vendor 0x8086 device 0x0329 Intel Corporation 6700PXH PCI Express-to-PCI Bridge A pci bus 0x0001 cardnum 0x00 function 0x01: vendor 0x8086 device 0x0326 Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A pci bus 0x0001 cardnum 0x00 function 0x02: vendor 0x8086 device 0x032a Intel Corporation 6700PXH PCI Express-to-PCI Bridge B pci bus 0x0001 cardnum 0x00 function 0x03: vendor 0x8086 device 0x0327 Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B pci bus 0x0003 cardnum 0x02 function 0x00: vendor 0x11ab device 0x6081 Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller pci bus 0x000d cardnum 0x00 function 0x00: vendor 0x8086 device 0x108c Intel Corporation 82573E Gigabit Ethernet Controller (Copper) pci bus 0x000f cardnum 0x00 function 0x00: vendor 0x8086 device 0x109a Intel Corporation 82573L Gigabit Ethernet Controller pci bus 0x0011 cardnum 0x04 function 0x00: vendor 0x1002 device 0x515e ATI Technologies Inc ES1000 # cfgadm -a Ap_Id Type
Re: [zfs-discuss] How to create ZFS pool ?
On Thu, 2007-11-15 at 05:25 -0800, Boris Derzhavets wrote: Thank you very much Mike for your feedback. Just one more question. I noticed five devices under /dev/rdsk:- c1t0d0p0 c1t0d0p1 c1t0d0p2 c1t0d0p3 c1t0d0p4 been created by the system immediately after installation completed. I believe it's an x86 limitation (no more than 4 primary partitions). If I've got your point right, in the case when the Other OS partition gets number 3, I am supposed to run:- # zpool create pool c1t0d0p3

Yes. Just make sure it's the correct partition, i.e. partition 3 is actually where you want the zpool, otherwise you'll corrupt/lose whatever data is on that partition. You also need to make sure that partition 3 is defined and you can see it in fdisk, as Solaris creates these p? devices whether they exist or not. So if I read your previous email correctly, you'll need to run format, select your first disk, then run fdisk again. Empty/unused space doesn't mean a partition has been created. From there, you'll want to create a new partition, and if you're not familiar with Solaris fdisk, it's a PITA until you get really used to it. You'll want to start one (1) cylinder past the end of your last partition so there's no overlap, then calculate the size of the partition. I usually use cylinders for this. So on one of my systems:

Total disk size is 17849 cylinders
Cylinder size is 16065 (512 byte) blocks

                                      Cylinders
  Partition   Status    Type          Start    End    Length    %
  =========   ======    ========      =====    ===    ======   ===
      1       Active    Solaris2          1   5224      5224    29

SELECT ONE OF THE FOLLOWING:
  1. Create a partition
  2. Specify the active partition
  3. Delete a partition
  4. Change between Solaris and Solaris2 Partition IDs
  5. Exit (update disk configuration and exit)
  6. Cancel (exit without updating disk configuration)
Enter Selection:

So the last cylinder is 5224, so we'll start on 5225, and to use the rest of the disk you'll want to take the max cylinders (17849 from the top line) and subtract 5225, which gives you 12624. Select 1 to create a new partition:

Select the partition type to create:
  1=SOLARIS2    2=UNIX       3=PCIXOS     4=Other      5=DOS12
  6=DOS16       7=DOSEXT     8=DOSBIG     9=DOS16LBA   A=x86 Boot
  B=Diagnostic  C=FAT32      D=FAT32LBA   E=DOSEXTLBA  F=EFI
  0=Exit?

Select 4 for Other OS. Specify the percentage of disk to use for this partition (or type c to specify the size in cylinders). Now select c for cylinders (I've never been much one for trusting percentages ;)

Enter starting cylinder number: 5225
Enter partition size in cylinders: 12624

(It'll ask you about making it the active partition - say no here)

Total disk size is 17849 cylinders
Cylinder size is 16065 (512 byte) blocks

                                      Cylinders
  Partition   Status    Type          Start    End    Length    %
  =========   ======    ========      =====    ===    ======   ===
      1       Active    Solaris2          1   5224      5224    29
      2                 Other OS       5225  17848     12624    71

SELECT ONE OF THE FOLLOWING:
  1. Create a partition
  2. Specify the active partition
  3. Delete a partition
  4. Change between Solaris and Solaris2 Partition IDs
  5. Exit (update disk configuration and exit)
  6. Cancel (exit without updating disk configuration)

Double check you're not overlapping any of the partitions and select 5 to save the partition. In this case, the pool would be c1t0d0p2. Not the most technically accurate, but think of p0 as the entire disk and your first partition starts with p1 and so forth. Hope that helps. If you want, post your fdisk partition table if you want a second set of eyes. Boris. 
This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to create ZFS pool ?
On Wed, 2007-11-14 at 21:23 +, A Darren Dunham wrote: On Wed, Nov 14, 2007 at 09:40:59AM -0800, Boris Derzhavets wrote: I was able to create second Solaris partition by running #fdisk /dev/rdsk/c1t0d0p0 I'm afraid that won't do you much good. Solaris only works with one Solaris partition at a time (on any one disk). If you have free space that you want to play with, it should be within the existing partition (or be on another disk). Is it possible to create zfs pool with third partition ? I doubt it, but I think it's more of a general Solaris limitation than anything to do with ZFS specifically.

You can't use another Solaris partition but you could use a different partition ID:

Total disk size is 9729 cylinders
Cylinder size is 16065 (512 byte) blocks

                                      Cylinders
  Partition   Status    Type          Start    End    Length    %
  =========   ======    ============  =====    ===    ======   ===
      1                 IFS: NTFS         0   1043      1044    11
      2                 Linux native   1044   2348      1305    13
      3       Active    Solaris2       2349   4959      2611    27
      4                 Other OS       4960   9728      4769    49

SELECT ONE OF THE FOLLOWING:
  1. Create a partition
  2. Specify the active partition
  3. Delete a partition
  4. Change between Solaris and Solaris2 Partition IDs
  5. Exit (update disk configuration and exit)
  6. Cancel (exit without updating disk configuration)

Notice partition 4 is Other OS which is where I have my zfs pool:

helios(2): zpool status
  pool: lpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME      STATE   READ WRITE CKSUM
        lpool     ONLINE     0     0     0
          c0d0p4  ONLINE     0     0     0

errors: No known data errors

So to create the pool in my case would be: zpool create lpool c0d0p4 -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Hierarchal zfs mounts
Looking for a way to mount a zfs filesystem on top of another zfs filesystem without resorting to legacy mode. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hierarchal zfs mounts
Mike DeMarco wrote: Looking for a way to mount a zfs filesystem on top of another zfs filesystem without resorting to legacy mode. doesn't simply 'zfs set mountpoint=...' work for you? -- Michael Schuster Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Well, if you create, let's say, a local/apps and a local/apps-bin and then
zfs set mountpoint=/apps local/apps
zfs set mountpoint=/apps/bin local/apps-bin
now if you reboot the system there is no mechanism to tell zfs to mount /apps first and /apps/bin second, so you could get /apps/bin mounted first and then /apps will either mount over the top of it or won't mount. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
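For what it's worth, a minimal sketch of the usual workaround (the pool and dataset names are made up): make the second dataset a child of the first instead of a sibling, so its mountpoint is inherited and ZFS mounts the parent before the child at boot:

# zfs create local/apps
# zfs set mountpoint=/apps local/apps
# zfs create local/apps/bin      <- inherits mountpoint /apps/bin and mounts after local/apps

The dataset hierarchy then matches the mount hierarchy, which avoids the ordering problem without resorting to legacy mounts.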
Re: [zfs-discuss] Mounting ZFS Pool to a different server
In the ideal situation it would go like:
host1# zpool export pool
host2# zpool import pool
If you know (really know) that it is offline on the other server (e.g. you can verify the host is dead), you can use: # zpool import -f pool Mike On 10/19/07, Mertol Ozyoney [EMAIL PROTECTED] wrote: Hi; One of my customers is using ZFS on IBM DS4800 LUNs. They use one LUN for each ZFS pool, if it matters. They want to take the pool offline from one server and take it online from another server. In summary, they want to take control of a ZFS pool if the primary server fails for some reason. I know we can do it with Sun Cluster, however this is pretty complex and expensive. How can this be achieved? Regards Mertol http://www.sun.com/ *Mertol Ozyoney * Storage Practice - Sales Manager *Sun Microsystems, TR* Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email [EMAIL PROTECTED] [EMAIL PROTECTED] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] meet import error after reinstall the OS
Hi, I just reinstall my machine with onnv build75. When I try to import the zfs, I meet an error. The pool is created with a slice c1t1d0s7 and a whole disk c1t2d0s0. How do I fix the error? Below is the output from zpool and zdb. # zpool import pool: tank id: 8219303556773256880 state: UNAVAIL status: One or more devices are missing from the system. action: The pool cannot be imported. Attach the missing devices and try again. see: http://www.sun.com/msg/ZFS-8000-6X config: tankUNAVAIL missing device c1t2d0ONLINE Additional devices are known to be part of this pool, though their exact configuration cannot be determined. volams:/ 112 # zdb -l /dev/dsk/c1t1d0s7 LABEL 0 failed to unpack label 0 LABEL 1 LABEL 2 version=3 name='tank' state=0 txg=44 pool_guid=8219303556773256880 top_guid=1191595199136351517 guid=1191595199136351517 vdev_tree type='disk' id=1 guid=1191595199136351517 path='/dev/dsk/c1t1d0s7' devid='id1,[EMAIL PROTECTED]/h' whole_disk=0 metaslab_array=18 metaslab_shift=29 ashift=9 asize=58707935232 LABEL 3 version=3 name='tank' state=0 txg=44 pool_guid=8219303556773256880 top_guid=1191595199136351517 guid=1191595199136351517 vdev_tree type='disk' id=1 guid=1191595199136351517 path='/dev/dsk/c1t1d0s7' devid='id1,[EMAIL PROTECTED]/h' whole_disk=0 metaslab_array=18 metaslab_shift=29 ashift=9 asize=58707935232 volams:/ 113 # zdb -l /dev/dsk/c1t2d0s2 cannot open '/dev/dsk/c1t2d0s2': I/O error volams:/ 114 # zdb -l /dev/dsk/c1t2d0s0 LABEL 0 version=3 name='tank' state=0 txg=43 pool_guid=8219303556773256880 top_guid=4844356610838567439 guid=4844356610838567439 vdev_tree type='disk' id=0 guid=4844356610838567439 path='/dev/dsk/c1t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 metaslab_array=14 metaslab_shift=29 ashift=9 asize=73394552832 LABEL 1 version=3 name='tank' state=0 txg=43 pool_guid=8219303556773256880 top_guid=4844356610838567439 guid=4844356610838567439 vdev_tree type='disk' id=0 guid=4844356610838567439 path='/dev/dsk/c1t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 metaslab_array=14 metaslab_shift=29 ashift=9 asize=73394552832 LABEL 2 version=3 name='tank' state=0 txg=43 pool_guid=8219303556773256880 top_guid=4844356610838567439 guid=4844356610838567439 vdev_tree type='disk' id=0 guid=4844356610838567439 path='/dev/dsk/c1t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 metaslab_array=14 metaslab_shift=29 ashift=9 asize=73394552832 LABEL 3 version=3 name='tank' state=0 txg=43 pool_guid=8219303556773256880 top_guid=4844356610838567439 guid=4844356610838567439 vdev_tree type='disk' id=0 guid=4844356610838567439 path='/dev/dsk/c1t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 metaslab_array=14 metaslab_shift=29 ashift=9 asize=73394552832 thanks Mike This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] HAMMER
On 10/18/07, Darren J Moffat [EMAIL PROTECTED] wrote: zfs send | ssh -C | zfs recv I was going to suggest this, but I think (I could be wrong...) that ssh would then use zlib for compression and that ssh is still a single-threaded process. This has two effects: 1) gzip compression instead of compress - may or may not be right for the application 2) encryption + compression happens in same thread. While this may be fine for systems that can do both at wire or file system speed, it is not ideal if transfer rates are already constrained by CPU speed. The Niagara 2 CPU likely changes the importance of 2 a bit. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
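To make the trade-off concrete, a rough sketch (host, pool and snapshot names are invented, and whether it wins depends entirely on your data and CPUs): taking compression out of ssh puts it in its own process, so it can run on a different CPU than the encryption:

# zfs send tank/fs@monday | gzip -1 | ssh -o Compression=no host2 "gunzip | zfs recv tank/fs"

gzip -1 sits at the cheap/fast end of the spectrum; on a box that is already CPU-bound even that may not pay for itself.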
Re: [zfs-discuss] HAMMER
On 10/18/07, Darren J Moffat [EMAIL PROTECTED] wrote: Unfortunately it doesn't yet because ssh can't yet use the N2 crypto - because it uses OpenSSL's libcrypto without using the ENGINE API. Marketing needs to get in line with the technology. The word I received was that any application that linked against the included version of OpenSSL automatically gets to take advantage of the N2 crypto engine, so long as it is using one of the algorithms supported by N2 engine. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] HAMMER
On 10/18/07, Darren J Moffat [EMAIL PROTECTED] wrote: Which marketing documentation (not person) says that ? It was a person giving a technology brief in the past 6 weeks or so. It kinda went like "so long as they link against the bundled openssl and not a private copy of openssl, they will automatically take advantage of the offload engine." It isn't actually false but it has a caveat that the application must be using the OpenSSL ENGINE API, which Apache mod_ssl does, and it must use the EVP_ interfaces in OpenSSL's libcrypto (not the lower level direct software algorithm ones). Remember, marketing info is very high level; the devil, as always, is in the code. Yeah, I know. It's oftentimes difficult to find the right code when you know what you are looking for. When you don't know that you should be fact-checking, the code rarely finds its way in front of you. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On 10/18/07, Bill Sommerfeld [EMAIL PROTECTED] wrote: that sounds like a somewhat mangled description of the cross-calls done to invalidate the TLB on other processors when a page is unmapped. (it certainly doesn't happen on *every* update to a mapped file). I've seen systems running Veritas Cluster Oracle Cluster Ready Services idle at about 10% sys due to the huge number of monitoring scripts that kept firing. This was on a 12 - 16 CPU 25k domain. A quite similar configuration on T2000's had negligible overhead. Lesson learned: cross-calls (and thread migrations, and ...) are much cheaper on systems with lower latency between CPUs. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] UC Davis Cyrus Incident September 2007
On 10/18/07, Gary Mills [EMAIL PROTECTED] wrote: What's the command to show cross calls? mpstat will show it on a system basis. xcallsbypid.d from the DTraceToolkit (ask google) will tell you which PID is responsible. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
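For reference, hedged examples of both (the DTrace one-liner is roughly what xcallsbypid.d does in aggregate):

# mpstat 5                     <- watch the xcal column per CPU
# dtrace -n 'sysinfo:::xcalls { @[pid, execname] = count(); }'

The aggregation prints pid/execname counts when you interrupt it, which is usually enough to spot the offender.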
Re: [zfs-discuss] df command in ZFS?
On 10/17/07, David Runyon [EMAIL PROTECTED] wrote: I was presenting to a customer at the EBC yesterday, and one of the people at the meeting said using df in ZFS really drives him crazy (no, that's all the detail I have). Any ideas/suggestions? I suspect that this is related to the notion that file systems are cheap and the traditional notion of quotas is replaced by cheap file systems. This makes it so that a system with 1000 users that previously had a small number of file systems now has over 1000 file systems. What used to be relatively simple output from df now turns into 40+ screens[1] on a default-sized terminal window. 1. If you are in this situation, there is a good chance that the formatting of df causes line folding or wrapping that doubles the number of lines to 80+ screens of df output. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
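If the complaint is really the volume of df output, a couple of hedged alternatives (dataset and user names are made up): ask ZFS directly and scope the query to what you care about, or point df at a single mountpoint:

# zfs list -o name,used,avail,refer,mountpoint -r tank/home
# df -h /tank/home/alice

Neither fixes the underlying 1000-filesystem sprawl, but it keeps the day-to-day output readable.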
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/24/07, Paul B. Henson [EMAIL PROTECTED] wrote: but checking the actual release notes shows no ZFS mention. 3.0.26 to 3.2.0? That seems an odd version bump... 3.0.x and before are GPLv2. 3.2.0 and later are GPLv3. http://news.samba.org/announcements/samba_gplv3/ -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/20/07, Paul B. Henson [EMAIL PROTECTED] wrote: Again though, that would imply two different storage locations visible to the clients? I'd really rather avoid that. For example, with our current Samba implementation, a user can just connect to '\\files.csupomona.edu\username' to access their home directory or '\\files.csupomona.edu\groupname' to access a shared group directory. They don't need to worry on which physical server it resides or determine what server name to connect to. MS-DFS could be helpful here. You could have a virtual samba instance that generates MS-DFS redirects to the appropriate spot. At one point in the past I wrote a script (long since lost - at a different job) that would automatically convert automounter maps into the appropriately formatted symbolic links used by the Samba MS-DFS implementation. It worked quite well for giving one place to administer the location mapping while providing transparency to the end-users. Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
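A minimal sketch of that setup from memory (server, share and path names are made up; check the smb.conf documentation before trusting it): the referral server exports an msdfs root, and each entry is a symlink in Samba's msdfs syntax pointing at the real file server:

# in smb.conf on the referral server
[global]
   host msdfs = yes
[files]
   path = /export/dfsroot
   msdfs root = yes

# ln -s 'msdfs:server1\username' /export/dfsroot/username

A script can regenerate those symlinks from the automounter maps, which is essentially what the converter mentioned above did.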
Re: [zfs-discuss] zoneadm clone doesn't support ZFS snapshots in
On 9/20/07, Matthew Flanagan [EMAIL PROTECTED] wrote: Mike, I followed your procedure for cloning zones and it worked well up until yesterday when I tried applying the S10U4 kernel patch 12001-14 and it wouldn't apply because I had my zones on zfs :( Thanks for sharing. That sucks. I'm still figuring out how to fix this other than moving all of my zones onto UFS. How about a dtrace script that changes the fstyp in statvfs() returns to say that it is ufs? :) I bet someone comes along and says that isn't supported either... -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zoneadm clone doesn't support ZFS snapshots in
On 9/21/07, Christine Tran [EMAIL PROTECTED] wrote: patch and install tools can't figure out pools yet. If you have a 1GB pool and 10 filesystems on it, du reports each having 1GB, do you have 10GB capacity? The tools can't tell. Please check the archives, this subject has been extensively discussed. Two responses come immediately to mind... 1) Thanks for protecting stupid/careless people from doing bad things. 2) UNIX has a longstanding tradition of adding a -f flag for cases when the sysadmin realizes there is additional risk but feels that appropriate precautions have been taken. I would really like to ask Sun for a roadmap as to when this is going to be supported. Since this is the zfs list (not zones or install list) and it is OpenSolaris (not Solaris) I guess I should probably find a more appropriate forum. So, for now I will use OpenSolaris where I can and wait patiently for the new installer + snap upgrade basket and wait for it to find its way into Solaris in about a year or two. In the meantime, I'll probably end up putting most zones on a particular competitor's NAS devices and looking into how well their file system cloning capabilities play in coordination with iSCSI. irony Oh, wait! What if the NAS device runs out of space while I'm patching? Better rule out the thin provisioning capabilities of the HDS storage that Sun sells as well. /irony -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would a device list output be a reasonable feature for zpool(1)?
Yup... With Leadville/MPXIO targets in the 32-digit range, identifying the new storage/LUNs is not a trivial operation. -- MikeE -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Russ Petruzzelli Sent: Monday, September 17, 2007 1:51 PM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Would a device list output be a reasonable feature for zpool(1)? Seconded! MC wrote: With the arrival of ZFS, the format command is well on its way to deprecation station. But how else do you list the devices that zpool can create pools out of? Would it be reasonable to enhance zpool to list the vdevs that are available to it? Perhaps as part of the help output to zpool create? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and Live Upgrade
On 9/15/07, Coy Hile [EMAIL PROTECTED] wrote: Is there any update/work-around/patch/etc as of the S10u4 WOS for the bugs that existed with respect to LU, Zones, and ZFS? More specifically, the following: 6359924 live upgrade needs to include support for zfs I bet that Live Upgrade never does, but Snap Upgrade does. http://opensolaris.org/os/project/caiman/Snap_Upgrade/ It is likely worth considering more of the roadmap when reading that page. http://opensolaris.org/os/project/caiman/Roadmap/ -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] space allocation vs. thin provisioning
Short question: I'm curious as to how ZFS manages space (free and used) and how its usage interacts with thin provisioning provided by HDS arrays. Is there any effort to minimize the number of provisioned disk blocks that get writes so as to not negate any space benefits that thin provisioning may give? Background more detailed questions: In Jeff Bonwick's blog[1], he talks about free space management and metaslabs. Of particular interest is the statement: ZFS divides the space on each virtual device into a few hundred regions called metaslabs. 1. http://blogs.sun.com/bonwick/entry/space_maps In Hu Yoshida's (CTO, Hitachi Data Systems) blog[2] there is a discussion of thin provisioning at the enterprise array level. Of particular interest is the statement: Dynamic Provisioning is not a panacea for all our storage woes. There are applications that do a hard format or write across the volume when they do an allocation and that would negate the value of thin provisioning. In another entry[3] he goes on to say: Capacity is allocated to 'thin' volumes from this pool in units of 42 MB pages 2. http://blogs.hds.com/hu/2007/05/dynamic_or_thin_provisioning.html 3. http://blogs.hds.com/hu/2007/05/thin_and_wide_.html This says that any time that a 42 MB region gets one sector written, 42 MB of storage is permanently[4] allocated to the virtual LUN. 4. Until the LUN is destroyed, that is. I know that ZFS does not do a write across all of the disk as part of formatting. Does it, however, drop some sort of metaslab data structures on each of those few hundred regions? When space is allocated, does it make an attempt to spread the allocations across all of the metaslabs, or does it more or less fill up one metaslab before moving to the next? As data is deleted, do the freed blocks get reused before never used blocks? Is there any collaboration between the storage vendors and ZFS developers to allow the file system to tell the storage array this range of blocks is unused so that the array can reclaim the space? I could see this as useful when doing re-writes of data (e.g. crypto rekey) to concentrate data that had become scattered into contiguous space. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] space allocation vs. thin provisioning
On 9/14/07, Moore, Joe [EMAIL PROTECTED] wrote: I was trying to compose an email asking almost the exact same question, but in the context of array-based replication. They're similar in the sense that you're asking about using already-written space, rather than to go off into virgin sectors of the disks (in my case, in the hope that the previous write is still waiting to be replicated and thus can be replaced by the current data) At one point, I thought this was how data replication should happen too. However, unless you have two consecutive writes to the same space, coalescing the writes could make it so that the data (generically, including fs metadata) on the replication target may be corrupt. Generally speaking, you need to have in-order writes to ensure that you maintain crash consistent data integrity in the event of a various failure modes. Of course, I can see how writes could be batched coalesced and applied in a journaled manner such that each batch fully applies or is rolled back on the target. I haven't heard of this being done. Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How do I get my pool back?
have you tried zpool clear? Peter Tribble wrote: On 9/13/07, Solaris [EMAIL PROTECTED] wrote: Try exporting the pool then import it. I have seen this after moving disks between systems, and on a couple of occasions just rebooting. Doesn't work. (How can you export something that isn't imported anyway?) -- http://www.sun.com/solaris * Michael Lee * Area System Support Engineer *Sun Microsystems, Inc.* Phone x40782 / 866 877 8350 Email [EMAIL PROTECTED] http://www.sun.com/solaris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression=on and zpool attach
On 11/09/2007, Mike DeMarco [EMAIL PROTECTED] wrote: I've got 12Gb or so of db+web in a zone on a ZFS filesystem on a mirrored zpool. Noticed during some performance testing today that it's I/O bound but using hardly any CPU, so I thought turning on compression would be a quick win. If it is I/O bound won't compression make it worse? Well, the CPUs are sat twiddling their thumbs. I thought reducing the amount of data going to disk might help I/O - is that unlikely? I/O bottlenecks are usually caused by a slow disk or one that has heavy workloads reading many small files. Two factors that need to be considered are head seek latency and spin latency. Head seek latency is the amount of time it takes for the head to move to the track that is to be written; this is an eternity for the system (usually around 4 or 5 milliseconds). Spin latency is the amount of time it takes for the spindle to spin the track to be read or written over the head. Ideally you only want to pay the latency penalty once. If you have large reads and writes going to the disk then compression may help a little, but if you have many small reads or writes it will do nothing more than burden your CPU with a no-gain amount of work, since you are going to be paying Mr. Latency for each read or write. Striping several disks together with a stripe width that is tuned for your data model is how you could get your performance up. Striping has been left out of the ZFS model for some reason. While it is true that RAIDZ will stripe the data across a given drive set, it does not give you the option to tune the stripe width. Due to the write performance problems of RAIDZ you may not get a performance boost from its striping if your write to read ratio is too high, since the driver has to calculate parity for each write. benefit of compression on the blocks that are copied by the mirror being resilvered? No! Since you are doing a block for block mirror of the data, this would not (could not) compress the data. No problem, another job for rsync then :) -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression=on and zpool attach
On 9/12/07, Mike DeMarco [EMAIL PROTECTED] wrote: Striping several disks together with a stripe width that is tuned for your data model is how you could get your performance up. Striping has been left out of the ZFS model for some reason. While it is true that RAIDZ will stripe the data across a given drive set, it does not give you the option to tune the stripe width. Due to the write performance problems of RAIDZ you may not get a performance boost from its striping if your write to read ratio is too high, since the driver has to calculate parity for each write. I am not sure why you think striping has been left out of the ZFS model. If you create a ZFS pool without the raidz or mirror keywords, the pool will be striped. Also, the recordsize tunable can be useful for matching up application I/O to physical I/O. Thanks, - Ryan -- UNIX Administrator http://prefetch.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Oh... How right you are. I dug into the PDFs and read up on Dynamic striping. My bad. ZFS rocks. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
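To make both points concrete, a hedged sketch (pool, dataset and device names are all made up): a pool built from plain vdevs is dynamically striped across them, and recordsize is set per dataset to line up with the application's I/O size:

# zpool create tank c1t0d0 c1t1d0 c1t2d0      <- no raidz/mirror keyword, so writes stripe across all three
# zfs create tank/db
# zfs set recordsize=8k tank/db               <- e.g. to match an 8 KB database block size

recordsize only affects newly written blocks, so it is worth setting before loading the data.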
Re: [zfs-discuss] compression=on and zpool attach
I've got 12Gb or so of db+web in a zone on a ZFS filesystem on a mirrored zpool. Noticed during some performance testing today that it's I/O bound but using hardly any CPU, so I thought turning on compression would be a quick win. If it is I/O bound won't compression make it worse? I know I'll have to copy files for existing data to be compressed, so I was going to make a new filesystem, enable compression and rsync everything in, then drop the old filesystem and mount the new one (with compressed blocks) in its place. But I'm going to be hooking in faster LUNs later this week. The plan was to remove half of the mirror, attach a new disk, remove the last old disk and attach the second half of the mirror (again on a faster disk). Will this do the same job? i.e. will I see the benefit of compression on the blocks that are copied by the mirror being resilvered? No! Since you are doing a block for block mirror of the data, this would not (could not) compress the data. -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/8/07, Richard Elling [EMAIL PROTECTED] wrote: Changing the topic slightly, the strategic question is: why are you providing disk space to students? For most programming and productivity (e.g. word processing, etc.) people will likely be better suited by having network access for their personal equipment with local storage. For cases when specialized expensive tools ($10k + per seat) are used, it is not practical to install them on hundreds or thousands of personal devices for a semester or two of work. The typical computing lab that provides such tools is not well equipped to deal with removable media such as flash drives. Further, such tools will often times be used to do designs that require simulations to run as batch jobs that run under grid computing tools such as Grid Engine, Condor, LSF, etc. Then, of course, there are files that need to be shared, have reliable backups, etc. Pushing that out to desktop or laptop machines is not really a good idea. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/7/07, Alec Muffett [EMAIL PROTECTED] wrote: The main bugbear is what the ZFS development team laughably call quotas. They aren't quotas, they are merely filesystem size restraints. To get around this the developers use the let them eat cake mantra, creating filesystems is easy so create a new filesystem for each user, with a quota on it. This is the ZFS way. Having worked in academia and multiple Fortune 100's, the problem seems to be most prevalent in academia, although possibly a minor inconvenience in some engineering departments in industry. In the .edu where I used to manage the UNIX environment, I would have a tough time weighing the complexities of quotas he mentions vs. the other niceties. My guess is that unless I had something that was really broken, I would stay with UFS or VxFS waiting for a fix. It appears as though the author has not yet tried out snapshots. The fact that space used by a snapshot for the sysadmin's convenience counts against the user's quota is the real killer. This would force me into a disk to disk (rsync, because zfs send | zfs recv would require snapshots to stay around for incrementals) backup + snapshot scenario to be able to keep snapshots while minimizing their impact on users. That means double the disk space. Doubling the quota is not an option because without soft quotas there is no way to keep people from using all of their space. Frankly, that would be so much trouble I would be better off using tape for restores, just like with UFS or VxFS. Now, with each user having a separate filesystem this breaks. The automounter will mount the parent filesystem as before but all you will see are the stub directories ready for the ZFS daughter filesystems to mount onto and there's no way of consolidating the ZFS filesystem tree into one NFS share or rules in automount map files to be able to do sub-directory mounting. While NFS4 holds some promise here, it is not a solution today. It won't be until all OS's that came out before 2008 are gone. That will be a while. Use of macros (e.g. * server:/home/) can go a long ways. If that doesn't do it, an executable map that does the appropriate munging may be in order. The problem here is one of legacy code, which you'll find throughout the academic, and probably commercial world. Basically, there's a lot of user generated code which has hard coded paths so any new system has to replicate what has gone before. (The current system here has automount map entries which map new disks to the names of old disks on machines long gone, e.g. /home/eeyore_data/ ) Put such entries before the * entry and things should be OK. For me, quotas are likely to be a pain point that prevents me from making good use of snapshots. Getting changes in application teams' understanding and behavior is just too much trouble. Others are: 1. There seems to be no integration with backup tools that are time+space+I/O efficient. If my storage is on Netapp, I can use NDMP to do incrementals between snapshots. No such thing exists with ZFS. 2. Use of clones is out because I can't do a space-efficient restore. 3. ARC messes up my knowledge of how much RAM my machine is making good use of. After the first backup, vmstat says that I am just at the brink of not having enough RAM that paging (file system and pager) will begin soon. This may be fine on a file server, but it really messes with me if it is a J2EE server and I'm trying to figure out how many more app servers I can add. 
I have a lot of hopes for ZFS and have used it with success (and failures) in limited scope. I'm sure that with time the improvements will come that make that scope increase dramatically, but for now it is confined to the lab. :( Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
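On the automounter point above, a small sketch (server and path names invented): legacy entries stay pinned as explicit lines ahead of the wildcard, and the * plus the & macro picks up the per-user ZFS filesystems without listing each one:

# /etc/auto_home
eeyore_data     oldserver:/export/eeyore_data
*               newserver:/export/home/&

With the wildcard last, as suggested above, the hard-coded legacy paths keep working while everything else maps to the new per-user filesystems.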
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/7/07, Mike Gerdts [EMAIL PROTECTED] wrote: For me, quotas are likely to be a pain point that prevents me from making good use of snapshots. Getting changes in application teams' understanding and behavior is just too much trouble. Others are: not to mention there are smaller-scale users that want the data protection, checksumming and scalability that ZFS offers (although the whole zdev/zpool/etc. thing might wind up causing me to have to buy more disks to add more space, if i were to use it) it would be nice to have a ZFS lite(tm) for those of us that just want easily expandable filesystems (as in, add a new disk/device and not have to think of some larger geometry) with inline checksumming/COW/metadata/ditto blocks/etc/etc goodness. basically like a home edition. i don't care about LUNs, send/receive, quotas, snapshots (for the most part), setting up different zpools to gain specific performance benefits, etc. i just want raid-z/raid-z2 with a easy way to add disks. i have not actually used ZFS yet because i've been waiting for opensolaris/solaris (or even freebsd possibly) to support eSATA hardware or something related. the hardware support front for SOHO users has also been slow. that's not a shortcoming of ZFS though... but does make me wish i had the basic protection features of ZFS with hardware support like linux. - my two cents ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
On 9/7/07, Stephen Usher [EMAIL PROTECTED] wrote: Brian H. Nelson: I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall into the pitfalls. The UFS on zvols option sounds intriguing to me, but I would guess that the following could be problems: 1) Double buffering: Will ZFS store data in the ARC while UFS uses traditional file system buffers? 2) Boot order dependencies. How does the startup of zfs compare to processing of /etc/vfstab? I would guess that this is OK due to the legacy mount type supported by zfs. If this is OK, then dfstab processing is probably OK. I say intriguing because it could give you the improved data integrity checks and a bit more flexibility in how you do things like backups and restores. Snapshots of the zvols could be mounted as other UFS file systems that could allow for self-service restores. Perhaps this would make it so that you can write data to tape a bit less frequently. If deduplication comes into zfs, you may be able to get to a point where course project instructions that say cp ~course/hugefile ~ become not so expensive - you would be charging quota to each user but only storing one copy. Depending on the balance of CPU power vs. I/O bandwidth, compressed zvols could be a real win, more than paying back the space required to have a few snapshots around. Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
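For anyone who wants to try it, a minimal sketch of the UFS-on-zvol idea (pool, volume name, size and mountpoint are all invented; untested):

# zfs create -V 20g pool/homevol
# newfs /dev/zvol/rdsk/pool/homevol
# mount /dev/zvol/dsk/pool/homevol /export/home
# zfs snapshot pool/homevol@nightly

UFS user quotas then work as usual on /export/home, while compression and snapshots happen underneath at the zvol level.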
Re: [zfs-discuss] ZFS/WAFL lawsuit
On 9/6/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: This is my personal opinion and all, but even knowing that Sun encourages open conversations on these mailing lists and blogs it seems to falter common sense for people from @sun.com to be commenting on this topic. It seems like something users should be aware of, but if I were working at Sun I would feel a very strong urge to clear any public conversation about the topic with management. As always, I do appreciate the frank insight given from the sun folks -- I am just worried that you may be doing yourself a disservice talking about it. i completely disagree. i work for a fortune 50 company and we have a hell of a time with the legal department or other people who refuse to think it's okay to speak frankly about things in their company. obviously trade secrets and other things aside, i think it is ultimately beneficial and helps a company feel more accountable when it allows direct public exchange with employees and not through spin-educated marketeers or public relation folk. i don't expect anyone from sun on the zfs list would tell us anything other than their personal opinion. i appreciate it too. from reading forums and mailing lists, to having sun volunteer 6? people to help memcached continue to flourish, i think sun is a role model for a company who continues to profit but has figured out that certain things can be free and ultimately they are helping make more mature products and encourage innovation. they also would get the bonus of having things like memcached run better on their platforms then too :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] (politics) Sharks in the waters
On 9/5/07, Joerg Schilling [EMAIL PROTECTED] wrote: As I wrote before, my wofs (designed and implemented 1989-1990 for SunOS 4.0, published May 23th 1991) is copy on write based, does not need fsck and always offers a stable view on the media because it is COW. Side question: If COW is such an old concept, why haven't there been many filesystems that have become popular that use it? ZFS, BTRFS (I think) and maybe WAFL? At least that I know of. It seems like an excellent guarantee of disk commitment, yet we're all still fussing with journalled filesystems, filesystems that fragment, buffer lags (or whatever you might call it) etc. Just stirring the pot, seems like a reasonable question (perhaps one to take somewhere else or start a new thread...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] The Dangling DBuf Strikes Back
On 9/3/07, Dale Ghent [EMAIL PROTECTED] wrote: I saw a putback this past week from M. Maybee regarding this, but I thought I'd post here that I saw what is apparently an incarnation of 6569719 on a production box running s10u3 x86 w/ latest (on sunsolve) patches. I have 3 other servers configured the same way WRT work load, zfs pools and hardware resources, so if this occurs again I'll see about logging a case and getting a relief patch. Anyhow, perhaps a backport to s10 may be in order [note: the patches I mention are s10 sparc specific. Translation to x86 required.] As of a few weeks ago s10u3 with latest patches did not have this problem for me, but s10u4 beta and snv69 did. My situation was on sun4v, not i386. More specifically:

S10 118833-36, 118833-07, 118833-10:
# zpool import
  pool: zfs
    id: 679728171331086542
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:
        zfs       FAULTED  corrupted data
        c0d1s3    FAULTED  corrupted data

snv_69, s10u4beta:
Boot device: /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]:dhcp File and args: -s
SunOS Release 5.11 Version snv_69 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.
Booting to milestone milestone/single-user:default.
Configuring /dev
Using DHCP for network configuration information.
Requesting System Maintenance Mode
SINGLE USER MODE
# zpool import
panic[cpu0]/thread=300028943a0: dangling dbufs (dn=3000392dbe0, dbuf=3000392be08)
02a10076f270 zfs:dnode_evict_dbufs+188 (3000392dbe0, 0, 1, 1, 2a10076f320, 7b729000)
  %l0-3: 03000392ddf0 03000392ddf8 %l4-7: 02a10076f320 0001 03000392bf20 0003
02a10076f3e0 zfs:dmu_objset_evict_dbufs+100 (2, 0, 0, 7b722800, 0, 3516900)
  %l0-3: 7b72ac00 7b724510 7b724400 03516a70 %l4-7: 03000392dbe0 03516968 7b7228c1 0001
...

Sun offered me an IDR against 125100-07, but since I could not reproduce the problem on that kernel, I never tested it. This does imply that they think there is a dangling dbufs problem in 125100-07 and that they have a fix for support-paying customers. Perhaps this is the problem and related solution that you would be interested in. The interesting thing with my case is that the backing store for this device is a file on a ZFS file system, served up as a virtual disk in an LDOM. From the primary LDOM, there is no corruption. An unexpected reset (panic, I believe) of the primary LDOM seems to have caused the corruption in the guest LDOM. What was that about having the redundancy as close to the consumer as possible? :) -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] do zfs filesystems isolate corruption?
On 8/11/07, Stan Seibert [EMAIL PROTECTED] wrote: I'm not sure if that answers the question you were asking, but generally I found that damage to a zpool was very well confined. But you can't count on it. I currently have an open case where a zpool became corrupt and put the system into a panic loop. As this case has progressed, I found that the panic loop part of it is not present in any released version of S10 tested (S10U3 + 118833-36, 125100-07, 125100-10) but does exist in snv69. The test mechanism is whether zpool import (no pool name) causes the system to panic or not. If that happens, I'm going on the assumption that if this causes panic, having the appropriate zpool.cache in place will cause it to panic during every boot. Oddly enough, I know I can't blame the storage subsystem on this - it is ZFS as well. :) It goes like this: HDS 99xx T2000 primary ldom S10u3 with a file on zfs presented as a block device for an ldom T2000 guest ldom zpool on slice 3 of block device mentioned above Depending on the OS running on the guest LDOM zpool import gives different results: S10U3 118833-36 - 125100-10: zpool is corrupt restore from backups S10u4 Beta, snv69 and I think snv59: panic - S10u4 backtrace is very different from snv* -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] It is that time again... ZFS + Firewire/USB - and a specific enclosure
I am looking into getting something like this: http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItemrd=1item=120069356632ssPageName=STRK:MEWA:ITih=002 For a home storage server. I would like to run ZFS. Preferrably FreeBSD (if basic functionality is completely bug-free and I won't lose any data :)) as I am more comfortable with it. However, I have been subscribed to the ZFS lists for some time now and done a Google check up for topics relating to this for some time, and the jury still seems to be out on whether or not this would be a good idea. I need it mainly for DVD storage (for easy playback) and personal archiving. So maybe 3-4 read streams and 1-2 write streams - probably at peak time. Is this a viable option? I want capacity more than speed; but I want at least single parity redundancy with RAID-Z. Would it be an option to add 8 drive units as needed, making them each a single RAID-Z device (so 7 drives usable?) and add them into the same zpool to increase available storage when I would get low? I might be missing some concepts here... but I keep querying this list hoping someone has done some of the footwork already (I don't have as many funds available to me as in the past to mess around with experimental/untested ideas) not to mention I don't want to be experimenting with my data... or maybe someone at least has some advice. I have seen a couple comments about Firewire, not many about USB. I don't care about the hotplugging, I can power down to replace any drives or do maintenance. It's mainly for cheap, quiet enclosures that can export JBOD... Thanks, mike ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
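As a rough sketch of the grow-as-you-go idea (pool and device names are invented): each 8-drive enclosure becomes its own raidz vdev, so 7 of the 8 drives' worth of capacity is usable, and a later enclosure is simply added as a second raidz vdev striped into the same pool:

# zpool create media raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
# zfs create media/dvds
... later, when the second enclosure shows up:
# zpool add media raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0

Existing data stays where it was written, but new writes spread across both vdevs; whether the USB/Firewire drives actually enumerate with names like that is a separate question.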
Re: [zfs-discuss] Will de-duplication be added?
On 7/29/07, Lance Brown [EMAIL PROTECTED] wrote: Does anybody know if native block level replication or block level de-duplication as NetApp calls it will be added? This has been discussed a bit in recent threads. http://www.google.com/search?hl=enq=%22zfs-discuss%22+site%3Amail.opensolaris.org+%28dedup+OR+%22de-duplication%22+OR+deduplication%29btnG=Google+Search -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Any fix for zpool import kernel panic (reboot loop)?
On 7/25/07, Andre Wenas [EMAIL PROTECTED] wrote: Hi Rodney, I have been using zfs root/boot for a few months without any problem. I can also import the pool from another environment. Do you have a problem importing the zfs boot pool only? Or can't you use zfs boot at all? I have a zfs pool (on a T2000) for which zpool import (with or without a pool name) will cause a dangling dbufs panic on S10u3, S10u4beta, and Nevada b61ish. Prior to booting from my jumpstart server and disabling zpool.cache, the machine was stuck in a panic loop. A support call is open and it is a known problem that (I'm told) is being worked on. I only mention this to say that this type of problem is not restricted to zfs boot. Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] import a group
One last question, when it comes to patching these zones, is it better to patch it normally or destroy all the local zones and patch only the global zone and use sh file to recreate all the zones. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] import a group
Greetings, Given zfs pools, how does one import these pools to another node in the cluster. Mike This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] import a group
Sorry, my question is not clear enough. These pools contain a zone each. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pseudo file system access to snapshots?
On 7/11/07, Matthew Ahrens [EMAIL PROTECTED] wrote: This restore problem is my key worry in deploying ZFS in the area where I see it as most beneficial. Another solution that would deal with the same problem is block-level deduplication. So far my queries in this area have been met with silence. I must have missed your messages on deduplication. That's OK... I think that I probably spill my guts adequately on the topic at: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-August/034065.html http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/040034.html http://mail.opensolaris.org/pipermail/storage-discuss/2007-May/002711.html The only previous thread that seemed to have much discussion was http://mail.opensolaris.org/pipermail/zfs-discuss/2007-January/ But did you see this thread on it? zfs space efficiency, 6/24 - 7/7? I stopped reading the zfs space efficiency thread just before it got interesting. :) The dedup methods suggested there are quite consistent with those that I had previously suggested. We've been thinking about ZFS dedup for some time, and want to do it but have other priorities at the moment. This is very good to hear. Hmmm... I just ran into another snag with this. I had been assuming that clones and snapshots were more closely related. But when I tried to send the differences between the source of a clone and a snapshot within that clone I got this message: I'm not sure what you mean by more closely related. The only reason we don't support that is because we haven't gotten around to adding the special cases and error checking for it (and I think you're the first person to notice its omission). But it's actually in the works now so stay tuned for an update in a few months. I was trying to say (but didn't) that I thought that there should be nearly (very important word) no differences between finding the differences from a clone's origin to a snapshot of the clone and finding the differences between snapshots between snapshots of the same file system. I'm also glad to see this is in the works. Most of my use cases for ZFS involve use of clones. Lack of space-efficient backups and especially restores makes me wait to use ZFS outside of the lab. Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pseudo file system access to snapshots?
On 7/11/07, Darren J Moffat [EMAIL PROTECTED] wrote: Mike Gerdts wrote: Perhaps a better approach is to create a pseudo file system that looks like: mntpt/pool /@@ /@today /@yesterday /fs /@@ /@2007-06-01 /otherfs /@@ How is this different from cd mntpt/.zfs/snapshot/ ? mntpt/.zfs/snapshot provides file-level access to the contents of the snapshot. If you back those up, then restore every snapshot, you will potentially be using way more disk space. What I am proposing is that cat mntpt/pool/@snap1 delivers a data stream corresponding to the output of zfs send and that cat mntpnt/pool/@[EMAIL PROTECTED] delivers a data stream corresponding to zfs send -i snap1 snap2. This would allow existing backup tools to perform block level incremental backups. Assuming that writing to the various files is the equivalent of the corresponding zfs receive commands, it provides for block level restores that preserve space efficiency as well. Why? Suppose I have a server with 50 full root zones on it. Each zone has a zonepath at /zones/zonename that is about 8 GB. This implies that I need 400 GB just for zone paths. Using ZFS clones, I can likely trim that down to far less than 100 and probably less than 20. I can't trim it down that far if I don't have a way to restore the system. This restore problem is my key worry in deploying ZFS in the area where I see it as most beneficial. Another solution that would deal with the same problem is block-level deduplication. So far my queries in this area have been met with silence. Hmmm... I just ran into another snag with this. I had been assuming that clones and snapshots were more closely related. But when I tried to send the differences between the source of a clone and a snapshot within that clone I got this message: incremental source must be in same filesystem usage: send [-i snapshot] snapshot Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Pseudo file system access to snapshots?
As I have been grappling with how I will manage ZFS backups using existing enterprise backup tools (e.g. Netbackup), it seems as though two approaches continue to dominate: 1) Just use the POSIX interface like is used for UFS, VxFS, etc. This has the key disadvantage that it is not efficient (space, time, performance, etc.) during backup, and restore could be impossible due to the space inefficiency. Other disadvantages exist as well. 2) Use zfs send (but not receive) to do disk-to-disk backups, then back up the zfs send images to tape. This is also inefficient due to extra space, time, etc., but ACLs, snapshots, clones, etc. seem as though they will be preserved on restore. The interface to the backup software will require some scripting, much like anything else that requires a quiesce before backup. For a while I was thinking that zfs send data streams would be nice to work with NDMP. However, this solution will only play well with the commercial products that have been going after the storage appliance market for quite some time. I'm not aware of free tools that speak NDMP. Perhaps a better approach is to create a pseudo file system that looks like:

mntpt/pool
      /@@
      /@today
      /@yesterday
      /fs
         /@@
         /@2007-06-01
      /otherfs
         /@@

As you might imagine, reading from pool/@today would be equivalent to zfs send [EMAIL PROTECTED]. Some sort of notation (pool/@[EMAIL PROTECTED] ?) would be needed to represent zfs send -i [EMAIL PROTECTED] [EMAIL PROTECTED]. Reading from pool/fs/@@ would be equivalent to zfs snapshot pool/fs@timestamp; zfs send pool/fs@timestamp. Writing to a particular path would have the same effect as zfs receive. Is this something that is maybe worth spending a few more cycles on, or is it likely broken from the beginning? Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
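For comparison, a hedged sketch of approach 2 as it can be scripted today (pool, filesystem and snapshot names are invented): snapshot, dump the stream to a staging area the backup software already knows how to pick up, and keep the previous snapshot so the next run can be incremental:

# zfs snapshot pool/fs@2007-07-10
# zfs send pool/fs@2007-07-10 > /backup/pool_fs.2007-07-10.full
# zfs snapshot pool/fs@2007-07-11
# zfs send -i 2007-07-10 pool/fs@2007-07-11 > /backup/pool_fs.2007-07-11.incr
# zfs receive pool/fs_restored < /backup/pool_fs.2007-07-10.full

The pseudo file system proposed above would essentially let the backup product drive these same streams through its normal file interface instead of a wrapper script.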
Re: [zfs-discuss] Take Three: PSARC 2007/171 ZFS Separate Intent Log
On 7/7/07, Cyril Plisko [EMAIL PROTECTED] wrote: Hello, This is a third request to open the materials of the PSARC case 2007/171 ZFS Separate Intent Log. I am not sure why two previous requests were completely ignored (even when seconded by another community member). In any case, that is absolutely unacceptable practice. The past week of inactivity is likely related to most of Sun in the US being on mandatory vacation. Sun typically shuts down for the week that contains July 4 and (I think) the week between Christmas and Jan 1. Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS usb keys
I had a similar situation between x86 and SPARC involving the pool version number. When I created the pool on the LOWER rev machine, it was seen by the HIGHER rev machine. This was a USB HDD, not a stick. I can now move the drive between boxes. HTH, Mike Dick Davies wrote: Thanks to everyone for the sanity check - I think it's a platform issue, but not an endian one. The stick was originally DOS-formatted, and the zpool was built on the first fdisk partition. So SPARCs aren't seeing it, but the x86/x64 boxes are. -- http://www.sun.com/solaris * Michael Lee * Area System Support Engineer *Sun Microsystems, Inc.* Phone x40782 / 866 877 8350 Email [EMAIL PROTECTED] http://www.sun.com/solaris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
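A sketch of how the portable-pool approach might look from the command line; usbpool and the c5t0d0 device name are made up for illustration, and the whole-disk step is the key difference from building on an fdisk partition:

    zpool upgrade -v              # list the pool versions this machine understands
    zpool create usbpool c5t0d0   # whole disk: ZFS writes an EFI label that both SPARC and x86 can read
    # a pool built on an fdisk partition (e.g. c5t0d0p1) sits inside a PC-style label that SPARC won't parse
    zpool export usbpool          # before unplugging the drive
    zpool import usbpool          # on the other machine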
RE: [zfs-discuss] ZFS - DB2 Performance
At what Solaris 10 level (patch/update) was the single-threaded compression situation resolved? Could you be hitting that one? -- MikeE -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roch - PAE Sent: Tuesday, June 26, 2007 12:26 PM To: Roshan Perera Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] ZFS - DB2 Performance Possibly the storage is flushing the write caches when it should not. Until we get a fix, cache flushing could be disabled in the storage (ask the vendor for the magic incantation). If that's not forthcoming and if all pools are attached to NVRAM-protected devices, then these evil /etc/system tunables might help: In older Solaris releases we have set zfs:zil_noflush = 1 On newer releases set zfs:zfs_nocacheflush = 1 If you implement this, do place a comment that this is a temporary workaround waiting for bug 6462690 to be fixed. About compression, I don't have the numbers, but a reasonable guess would be that it consumes roughly 1 GHz of CPU to compress 100 MB/sec. This will of course depend on the type of data being compressed. -r Roshan Perera writes: Hi all, I am after some help/feedback on the issue explained below. We are in the process of migrating a big DB2 database from a 6900 (24 x 200MHz CPUs) with Veritas FS and 8TB of storage on Solaris 8 to a 25K (12 dual-core CPUs at 1800MHz) with ZFS and 8TB of SAN storage (compressed RAID-Z) on Solaris 10. Unfortunately, we are having massive performance problems with the new solution. It all points towards IO and ZFS. A couple of questions relating to ZFS. 1. What is the impact of using ZFS compression? What percentage of system resources is required, and how much of an overhead is this as opposed to non-compression? In our case DB2 does a similar amount of reads and writes. 2. Unfortunately we are using RAID twice (SAN-level RAID and RAID-Z) to overcome the panic problem in my previous post (for which I had a good response). 3. Any way of monitoring ZFS performance other than iostat? 4. Any help on ZFS tuning in this kind of environment, like caching etc.? Would appreciate any feedback/help on where to go next. If this cannot be resolved we may have to go back to VxFS, which would be a shame. Thanks in advance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
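If the storage really is NVRAM-protected, the workaround Roch describes would land in /etc/system roughly as below; only one of the two set lines applies, depending on the release, so check which tunable your build actually recognizes:

    * Temporary workaround for NVRAM-protected arrays that honor cache flushes.
    * Remove once bug 6462690 is fixed.
    * Older releases:
    set zfs:zil_noflush = 1
    * Newer releases (the tunable was renamed):
    set zfs:zfs_nocacheflush = 1

A reboot is needed for /etc/system changes to take effect.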
Re: [zfs-discuss] ZFS Scalability/performance
On 6/20/07, Constantin Gonzalez [EMAIL PROTECTED] wrote: One disk can be one vdev. A 1+1 mirror can be a vdev, too. An n+1 or n+2 RAID-Z (RAID-Z2) set can be a vdev too. - Then you concatenate vdevs to create a pool. Pools can be extended by adding more vdevs. - Then you create ZFS file systems that draw their block usage from the resources supplied by the pool. Very flexible. This actually brings up something I was wondering about last night: If I were to plan for a 16 disk ZFS-based system, you would probably suggest that I configure it as something like 5+1, 4+1, 4+1, all raid-z (I don't need the double parity concept). I would prefer something like 15+1 :) I want ZFS to be able to detect and correct errors, but I do not need to squeeze all the performance out of it (I'll be using it as a home storage server for my DVDs and other audio/video stuff, so only a few clients at the most streaming off of it). I would be interested in hearing if there are any other configuration options to squeeze the most space out of the drives. I have no issue with powering down to replace a bad drive, and I expect that I'll have at most one fail at a time. If I really do need room for two to fail, then I suppose I can look at a setup with 14 drives of usable space and use raidz2. Thanks, mike ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
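For concreteness, a 5+1, 4+1, 4+1 layout is simply three raidz vdevs in one pool; the tank name and the 16 device names below are placeholders:

    zpool create tank \
        raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
        raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
        raidz c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0

That spends three disks on parity; a single 15+1 raidz would be one "zpool create tank raidz ..." line over all 16 disks, spending only one disk on parity but leaving the whole pool exposed if a second disk fails during a resilver.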
Re: [zfs-discuss] ZFS Scalability/performance
On 6/20/07, Paul Fisher [EMAIL PROTECTED] wrote: I would not risk raidz on that many disks. A nice compromise may be 14+2 raidz2, which should perform nicely for your workload and be pretty reliable when the disks start to fail. Would anyone on the list not recommend this setup? I could live with 2 drives being used for parity (or the parity concept). I would be able to reap the benefits of ZFS - self-healing and corrupted-file reconstruction (since it has some parity to read from) - and I should have decent performance (obviously not smokin', since I am not configuring this to try for the fastest possible). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
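A sketch of the 14+2 suggestion as a single raidz2 vdev (pool name and device names are again placeholders):

    zpool create tank raidz2 \
        c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
        c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
    zpool status tank             # shows which disk has faulted
    zpool replace tank c1t3d0     # after powering down and swapping a new drive into the same slot

Any two of the sixteen disks can fail before data is lost, which is what makes it more forgiving than a wide single-parity raidz.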
Re: [zfs-discuss] Re: Karma Re: Re: Best use of 4 drives?
On 6/15/07, Brian Hechinger [EMAIL PROTECTED] wrote: Hmmm, that's an interesting point. I remember the old days of having to stagger startup for large drives (physically large, not capacity large). Can that be done with SATA? I had to link 2 600w power supplies together to be able to power on 12 drives... I believe it is up to the controller (and possibly the drives) to support staggering. But it is allowed in SATA if the controller/drives support it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Btrfs, COW for Linux [somewhat OT]
it's about time. this hopefully won't spark another license debate, etc... ZFS may never get into linux officially, but there's no reason a lot of the same features and ideologies can't make it into a linux-approved-with-no-arguments filesystem... as a more SOHO user i like ZFS mainly for its COW and integrity, and being able to add more storage later on. the latter is nothing new though. but telling the world "who needs hardware raid? software can do it much better" is a concept that excites me; it would be great for linux to have something like that as well that could be merged into the kernel without any debate. On 6/14/07, David Magda [EMAIL PROTECTED] wrote: Hello, Somewhat off topic, but it seems that someone released a COW file system for Linux (currently in 'alpha'): * Extent based file storage (2^64 max file size) * Space efficient packing of small files * Space efficient indexed directories * Dynamic inode allocation * Writable snapshots * Subvolumes (separate internal filesystem roots) - Object level mirroring and striping * Checksums on data and metadata (multiple algorithms available) - Strong integration with device mapper for multiple device support - Online filesystem check * Very fast offline filesystem check - Efficient incremental backup and FS mirroring ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Btrfs, COW for Linux [somewhat OT]
because i don't want bitrot to destroy the thousands of pictures and memories i keep? i keep important personal documents, etc. filesystem corruption is not a feature to me. perhaps i spoke incorrectly but i consider COW to be one of the reasons a filesystem can keep itself in check, the disk write will be transactional and guaranteed if it says successful, correct? i still plan on using offsite storage services to maintain the physical level of redundancy (house burns down, equipment is stolen, HDs will always die at some point, etc.) but as a user who has had corruption happen many times (FAT32, NTFS, XFS, JFS) it is encouraging to see more options that put emphasis on integrity... On 6/14/07, Frank Cusack [EMAIL PROTECTED] wrote: On June 14, 2007 3:57:55 PM -0700 mike [EMAIL PROTECTED] wrote: as a more SOHO user i like ZFS mainly for it's COW and integrity, and huh. As a SOHO user, why do you care about COW? -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Btrfs, COW for Linux [somewhat OT]
On 6/14/07, Frank Cusack [EMAIL PROTECTED] wrote: Yes, but there are many ways to get transactions, e.g. journalling. ext3 is journaled. it doesn't seem to always be able to recover data. it also takes forever to fsck. i thought COW might alleviate some of the fsck needs... it just seems like a more efficient (or guaranteed?) method of disk commitment. but i am speaking purely from the sidelines. i don't know all the internals of filesystems, just the ones that have bitten me in the past. ps. top posting especially sucks when you ask multiple questions. yes, sir! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Holding disks for home servers
looks like you used 3 for a total of 15 disks, right? I have a CM stacker too - I used the CM 4-disks-in-3-5.25-slots though. I am currently trying to sell it too, as it is bulky and I would prefer using eSATA/maybe Firewire/USB enclosures and a small controller machine (like a Shuttle) so it is much easier to move around, and much easier to expand. You'll hit a ceiling real quick with those big chassis (I already did, it only holds 12 disks in current fashion and I have a 16 port Areca card) and I don't want to get stuck once again running out of space, ZFS or not. Not to mention I had to custom bind two 600w power supplies together to give it enough juice to run... I want something not as insane. I just want storage :) On 6/7/07, Rob Logan [EMAIL PROTECTED] wrote: On the third upgrade of the home nas, I chose http://www.addonics.com/products/raid_system/ae4rcs35nsa.asp to hold the disks. each hold 5 disks, in the space of three slots and 4 fit into a http://www.google.com/search?q=stacker+810 case for a total of 20 disks. But if given a chance to go back in time, the http://www.supermicro.com/products/accessories/mobilerack/CSE-M35TQ.cfm has LEDs next to the drive, and doesn't vibrate as much. photos in http://rob.com/sun/zfs/ Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] current state of play with ZFS boot and install?
I've been using the zfsbootkit to modify my jumpstart images. As far as I know, the kit is the current process for zfs boot until further notice. http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 See readme in the package. On Thu, 2007-05-31 at 02:06 -0700, Marko Milisavljevic wrote: I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
RE: [zfs-discuss] Re: ZFS - Use h/w raid or not?Thoughts. Considerations.
Also the unmirrored memory for the rest of the system has ECC and ChipKill, which provides at least SOME protection against random bit-flips. -- Question: It appears that CF and friends would make a decent live-boot (but don't run on me like I'm a disk) type of boot media, given the limited write/re-write cycles of flash media (at least the non-exotic type of flash media). Would something like future zfs-booting on a pair of CF devices reduce/lift that limitation? (does the COW nature of ZFS automatically spread WRITES across the entire CF device?) [[ is tmp-fs/swap going to remain a problem till zfs-swap adds some COW leveling to the swap area? ]] Thanks, -- MikeE -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carson Gaspar Sent: Tuesday, May 29, 2007 8:05 PM To: Richard Elling Cc: zfs-discuss@opensolaris.org; Anton B. Rang Subject: Re: [zfs-discuss] Re: ZFS - Use h/w raid or not?Thoughts. Considerations. Richard Elling wrote: But I am curious as to why you believe 2x CF are necessary? I presume this is so that you can mirror. But the remaining memory in such systems is not mirrored. Comments and experiences are welcome. CF == bit-rot-prone disk, not RAM. You need to mirror it for all the same reasons you need to mirror hard disks, and then some. -- Carson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
RE: [zfs-discuss] Re: ZFS - Use h/w raid or not?Thoughts.Considerations.
Hey Richard, thanks for sparking the conversation... This is a very interesting topic (especially if you take it out of the HPC "we need 1000 servers to have this minimal boot image" space into general purpose/enterprise computing) -- Based on your earlier note, it appears you're not planning to use cheapo "free after rebate" CF cards :-) (The cheap ones would probably be perfect for ZFS a-la cheap-o-JBOD). Having boot disks mirrored across controllers has let sys-admins sleep better over the years (especially in FC-loop cases with both drives on the same loop... Sigh). If the USB bus one might hang these fancy CF cards on is robust enough, then perhaps a single battle-hardened CF card will suffice... (although zfs ditto blocks or some form of protection might still be considered a good thing?) Having 2 cards would certainly make the unlikely replacement of a card a LOT more straightforward than a single-card failure... Much of this would depend on the quality of these CF cards and how they hold up under load/stress/time -- If we're going down this CF-boot path, many of us are going to have to re-think our boot environment quite a bit. We've been spoiled with 36+ GB mirrored boot drives for some time now (if you do a lot of PATCHING, you'll find that even those can get tight. But that's a discussion for a different day). I don't think most enterprise boot disk layouts are going to fit (even unmirrored) onto a single 4GB CF card. So we'll have to play some games where we start splitting off /opt, /var, (which is fairly read-write intensive when you have process accounting etc. running) onto some other non-CF filesystem (likely a SAN of some variety). At some point the hackery a 4GB CF card is going to force us to do is going to become more complex than just biting the bullet, doing a full multipathed SAN boot and calling it a day (or perhaps some future iSCSI/NFS boot for the SAN-averse). Seriously though... If (say in some HPC/grid space?) you can stick your ENTIRE boot environment onto a 4GB CF card, why not just do the SAN, NFS/iSCSI boot thing instead? (what ever happened to: http://blogs.sun.com/dweibel/entry/sprint_snw_2006#comments ) -- But lets explore the CF thing some more... There is something there, although I think Sun might have to provide some best-practices/suggestions as to how customers that don't run a minimum-config-no-local-apps, pacct, monitoring, etc. solaris environment are best to use something like this. Use it as a pivot boot onto the real root image? That would delegate the CF card to little more than a rescue/utility image. Kinda cool, but not earth-shattering I would think (especially for those already utilizing wanboot for such purposes) -- Splitting off /var and friends from the boot environment (and still packing the boot env, say, on a ditto-block 4GB CF card) is still going to leave a pretty tight boot env. Obviously you want to be able to do some fancy live-upgrade stuff in this space too, and all of a sudden a single 4GB flash card doesn't look so big anymore. Two of them, with some ZFS (and compression?) or even SDS mirroring between them, would possibly go a long way to make replacement easier, give you redundancy (zfs/sds mirrors), some wiggle room for live-upgrade scenarios, and who knows what else. Still tight though -- If it's a choice between 1 CF or NONE, we'll take 1 CF, I guess. Fear of the unknown (and field data showing how these guys hold up over time) would really determine uptake I guess. 
(( as you said, real data regarding these specialized CF cards will be required... Is it going to vary greatly from vendor to vendor? Use case to use case? I'm not looking forward to blazing the trail here. Something doesn't seem right, especially without the safety net of a mirrored environment... But maybe that's just old-school sys-admin superstition. Let's get some data, set me straight...)) -- Right now we can stick 4x 4GB memory sticks into an x4200 (creating a cactus-looking device :-) A single built-in CF is obviously cleaner/safer, but also somewhat limiting in terms of redundancy or even just capacity. Has anyone considered taking say 2x 4G CF cards, and sticking them inside one of the little sas-drive-enclosures? Customers could purchase up to 4 of those for certain servers (t2000/x4200 etc.) and treat these as if they were really fast, lower-power/heat, (never fails, no need to mirror?) ~9GB drives. In the long run, is that easier and more flexible? -- It would be really interesting to hear how others out there might try to use a CF boot option in their environment. Good thread, let's bat this around some more. -- MikeE -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 29, 2007 9:48 PM To: Ellis, Mike Cc: Carson Gaspar; zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Re: ZFS - Use h/w raid or not?Thoughts.Considerations. Ellis, Mike wrote
Re: [zfs-discuss] zfs root: legacy mount or not?
On Fri, 2007-05-25 at 14:29 -0600, Lori Alt wrote: Bill Sommerfeld wrote: IMHO, there should be no need to put any ZFS filesystems in /etc/vfstab, but (this is something of a digression based on discussion kicked up by PSARC 2007/297) it's become clear to me that ZFS filesystems *should* be mounted by mountall and mount -a rather than via a special-case invocation of zfs mount at the end of the fs-local method script. in other words: teach mount how to find the list of filesystems in attached pools and mix them in to the dependency graph it builds to mount filesystems in the right order, rather than mounting everything-but-zfs first and then zfs later. I agree with this. This seems like a necessary response to both PSARC/2007/297 and also necessary for eliminating legacy mounts for zfs root file systems. The problem of the interaction between legacy and non-legacy mounts will just get worse once we are using non-legacy mounts for the file systems in the BE. Could we also look into why system-console insists on waiting for ALL the zfs mounts to be available? Shouldn't the main file system food groups be mounted and then allow console-login (much like single user or safe-mode)? Would help in many cases where an admin needs to work on a system but doesn't need, say 20k users home directories mounted, to do this work. Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
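For readers following along, the legacy-versus-native distinction being discussed can be sketched like this (rpool/export/home is just an example dataset, not something from the thread):

    # native ZFS management: the dataset is mounted by the zfs mount machinery
    zfs set mountpoint=/export/home rpool/export/home

    # legacy management: the dataset is mounted by mount(1M)/mountall via /etc/vfstab
    zfs set mountpoint=legacy rpool/export/home
    # /etc/vfstab line (device to mount, device to fsck, mount point, type, fsck pass, mount-at-boot, options):
    rpool/export/home  -  /export/home  zfs  -  yes  -

Teaching mountall to handle the first case directly, as suggested above, would let boot-critical ZFS filesystems participate in the normal mount ordering without falling back to legacy vfstab entries.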
Re: [zfs-discuss] zfs root: legacy mount or not?
On Fri, 2007-05-25 at 15:50 -0600, Lori Alt wrote: Mike Dotson wrote: On Fri, 2007-05-25 at 14:29 -0600, Lori Alt wrote: Would help in many cases where an admin needs to work on a system but doesn't need, say 20k users home directories mounted, to do this work. So single-user mode is not sufficient for this? Not all work needs to be done in single user:) And I wouldn't consider a 4+ hour boot time just for mounting file systems a good use of cpu time when an admin could be doing other things - preparation for next patching, configuring changes to webserver, etc. Or just monitoring the status of the file system mounts to give an update to management on how many file systems are mounted and how many are left. Point is, why is console-login dependent on *all* the file systems being mounted in *multiboot*. Does it really need to depend on *all* the file systems being mounted? Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Thanks... Mike Dotson Area System Support Engineer - ACS West Phone: (503) 343-5157 [EMAIL PROTECTED] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs root: legacy mount or not?
On Fri, 2007-05-25 at 15:19 -0700, Eric Schrock wrote: This has been discussed many times in smf-discuss, for all types of login. Basically, there is no way to say console login for root only. As long as any user can log in, we need to have all the filesystems mounted because we don't know what dependencies there may be. Simply changing the definition of console-login isn't a solution because it breaks existing assumptions and software. <devils_advocate> So how are you guaranteeing the NFS server and automount with autofs are up, running and working for the user for console-login? </devils_advocate> I don't buy this argument, and you don't have to say console-login is for root only; you just have to have console-login, and the services available are minimal and may not include *all* services, much like when an NFS server is down, etc. If the software depends on a file system or all the file systems being mounted, it adds that as a dependency (filesystem/local). console-login does not require this - only non-root users. (I remember an SMF config bug with apache not requiring filesystem/local and failing to start) What software is dependent on console-login? helios(3): svcs -D console-login STATE STIME FMRI In fact console-login depends on filesystem/minimal, which to me means minimal file systems, not all file systems, and there is no software dependent on console-login - where's the disconnect? From what I see, the problem is that auditd is dependent on filesystem/local, which is where we possibly have the hangup. A much better option is the 'trigger mount' RFE that would allow ZFS to quickly 'mount' a filesystem but not pull all the necessary data off disk until it's first accessed. Agreed, but there's still the issue with console-login being dependent on all file systems instead of minimal file systems. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
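The SMF dependency graph being argued about can be inspected directly; these commands exist on stock Solaris/OpenSolaris builds, though the exact output will vary by release:

    svcs -d svc:/system/console-login:default     # services console-login waits on (filesystem/minimal among them)
    svcs -D svc:/system/console-login:default     # services that wait on console-login
    svcs -l svc:/system/filesystem/local:default  # long listing, including that service's own dependencies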
Re: [zfs-discuss] zfs root: legacy mount or not?
On Fri, 2007-05-25 at 15:46 -0700, Eric Schrock wrote: On Fri, May 25, 2007 at 03:39:11PM -0700, Mike Dotson wrote: In fact console-login depends on filesystem/minimal, which to me means minimal file systems, not all file systems, and there is no software dependent on console-login - where's the disconnect? You're correct - I thought console-login depended on filesystem/local, not filesystem/minimal. ZFS filesystems are not mounted as part of filesystem/minimal, so remind me what the problem is? Create 20k zfs file systems and reboot. Console login waits for all the zfs file systems to be mounted (on a fully loaded 880, you're looking at about 4 hours, so have some coffee ready). The *only* place I can see the filesystem/local dependency is in svc:/system/auditd:default, however, on my systems it's disabled. Haven't had a chance to really prune out the dependency tree to find the disconnect, but once /, /var, /tmp and /usr are mounted, the conditions for console-login should be met. As you mentioned, the best solution for this number of filesystems in ZFS land is the *automount* fs option where it mounts the filesystems as needed to reduce the *boot time*. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock -- Thanks... Mike Dotson Area System Support Engineer - ACS West Phone: (503) 343-5157 [EMAIL PROTECTED] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
RE: [zfs-discuss] DBMS on zpool
This is probably a good place to start. http://blogs.sun.com/realneel/entry/zfs_and_databases Please post back to the group with your results, I'm sure many of us are interested. Thanks, -- MikeE -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of homerun Sent: Friday, May 18, 2007 8:42 AM To: zfs-discuss@opensolaris.org Subject: [zfs-discuss] DBMS on zpool Hi, Just playing around with zfs, trying to put DBMS data files on a zpool. The DBMSs I mean here are Oracle and Informix. I have noticed that read performance is excellent, but write operations are not, and write performance also varies a lot. My guess for the not-so-good write performance and its variation is double buffering: DBMS buffers and ZFS caching together. Has anyone seen or tested best practices for how a DBMS setup should be implemented using a zpool; zfs or zvol? Thanks This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
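A common starting point for databases on ZFS, and the kind of tuning discussed in posts like the one referenced above, is matching the recordsize to the database block size. A minimal sketch, assuming an 8 KB database block size and made-up dataset names:

    zfs create tank/db
    zfs set recordsize=8k tank/db     # match the DBMS data-file block size
    zfs create tank/dblogs            # keep transaction logs on their own filesystem (default recordsize)

recordsize only affects blocks written after it is set, so it is worth setting before loading the data files.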