Re: [zfs-discuss] Zones on shared storage - a warning
hey mike/cindy,
i've gone ahead and filed a zfs rfe on this functionality:

  6915127 need full support for zfs pools on files

implementing this rfe is a requirement for supporting encapsulated
zones on shared storage.
ed

On Thu, Jan 07, 2010 at 03:26:17PM -0700, Cindy Swearingen wrote:
> Hi Mike,
> I can't really speak for how virtualization products are using files
> for pools, but we don't recommend creating pools on files, much less
> NFS-mounted files, and then building zones on top. File-based pool
> configurations might be used for limited internal testing of some
> features, but our product testing does not include testing storage
> pools on files or NFS-mounted files. Unless Ed's project gets
> refunded, I'm not sure how much farther you can go with this
> approach.
> Thanks,
> Cindy
>
> On 01/07/10 15:05, Mike Gerdts wrote:
>> [removed zones-discuss after sending heads-up that the conversation
>> will continue at zfs-discuss]
>>
>> On Mon, Jan 4, 2010 at 5:16 PM, Cindy Swearingen
>> cindy.swearin...@sun.com wrote:
>>> Hi Mike,
>>> It is difficult to comment on the root cause of this failure since
>>> the several interactions of these features are unknown. You might
>>> consider seeing how Ed's proposal plays out and let him do some
>>> more testing...
>>
>> Unfortunately Ed's proposal is not funded, last I heard. Ops Center
>> uses many of the same mechanisms for putting zones on ZFS. This is
>> where I saw the problem initially.
>>
>>> If you are interested in testing this with NFSv4 and it still
>>> fails the same way, then also consider testing this with a local
>>> file instead of an NFS-mounted file and let us know the results.
>>> I'm also unsure of using the same path for the pool and the zone
>>> root path, rather than one path for the pool and a pool/dataset
>>> path for the zone root path. I will test this myself if I get some
>>> time.
>>
>> I have been unable to reproduce this with a local file. I have been
>> able to reproduce it with NFSv4 on build 130.
Rather surprisingly, the actual checksums found in the ereports are
sometimes "0x0 0x0 0x0 0x0" or contain 0xbaddcafe00. Here's what I did:

- Install OpenSolaris build 130 (ldom on T5220)
- Mount some NFS space at /nfszone:

    mount -F nfs -o vers=4 $file:/path /nfszone

- Create a 10gig sparse file:

    cd /nfszone
    mkfile -n 10g root

- Create a zpool:

    zpool create -m /zones/nfszone nfszone /nfszone/root

- Configure and install a zone:

    zonecfg -z nfszone
      set zonepath = /zones/nfszone
      set autoboot = false
      verify
      commit
      exit
    chmod 700 /zones/nfszone
    zoneadm -z nfszone install

- Verify that the nfszone pool is clean. First, pkg history in the
  zone shows the timestamp of the last package operation:

    2010-01-07T20:27:07  install  pkg  Succeeded

  At 20:31 I ran:

    # zpool status nfszone
      pool: nfszone
     state: ONLINE
     scrub: none requested
    config:

        NAME             STATE   READ WRITE CKSUM
        nfszone          ONLINE     0     0     0
          /nfszone/root  ONLINE     0     0     0

    errors: No known data errors

I booted the zone. By 20:32 it had accumulated 132 checksum errors:

    # zpool status nfszone
      pool: nfszone
     state: DEGRADED
    status: One or more devices has experienced an unrecoverable
            error. An attempt was made to correct the error.
            Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear
            the errors using 'zpool clear' or replace the device with
            'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: none requested
    config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          DEGRADED     0     0     0
          /nfszone/root  DEGRADED     0     0   132  too many errors

    errors: No known data errors

fmdump has some very interesting things to say about the actual
checksums.
The 0x0 and 0xbaddcafe00 seem to shout that these checksum errors are
not due to a couple of flipped bits.

    # fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
       2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
       3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
       3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
       4 cksum_actual = 0x0 0x0 0x0 0x0
       4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
       5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
       6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
       6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
      16 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
      48 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

I halted the zone, exported the pool, imported the pool, then did a
scrub. Everything seemed
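For reference, the reproduction above condenses into one function. This is a sketch assembled from the commands in the message, not a script from the thread; `$RUN` defaults to `echo` (dry run), and the NFS source `$file:/path` is the thread's placeholder:

```shell
# Sketch of the reproduction steps above.  By default commands are
# only echoed; set RUN= (empty) to really execute on a disposable
# test system.  "$file:/path" is the thread's placeholder NFS source.
RUN=${RUN:-echo}

repro_nfszone() {
    $RUN mount -F nfs -o vers=4 "$file:/path" /nfszone
    $RUN mkfile -n 10g /nfszone/root        # 10g sparse backing file
    $RUN zpool create -m /zones/nfszone nfszone /nfszone/root
    $RUN chmod 700 /zones/nfszone
    $RUN zoneadm -z nfszone install
    $RUN zpool status nfszone               # clean before booting
    $RUN zoneadm -z nfszone boot
    $RUN zpool status nfszone               # checksum errors appear here
}
repro_nfszone
```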
Re: [zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool
hey richard,
so i just got a bunch of zfs checksum errors after replacing some
mirrored disks on my desktop (u27). i originally blamed the new
disks, until i saw this thread, at which point i started digging in
bugster. i found the following related bugs (i'm not sure which one
adam was referring to):

  6847180 Status of newly replaced drive became faulted due to
          checksum errors after scrub in raidz1 pool
  http://bugs.opensolaris.org/view_bug.do?bug_id=6847180

  6869090 on thumper with ZFS (snv_120) raidz causes checksum errors
          from all drives
  http://bugs.opensolaris.org/view_bug.do?bug_id=6869090

i think the issue i'm seeing may be 6847180. reading through the bug,
i get the impression that it can affect disks in mirrors as well as
raidz configurations.

to complicate the situation, i just upgraded from snv_121 to snv_122.
the initial checksum errors i saw after resilvering were on snv_121.
i'm not seeing any new errors with snv_122. (of course i haven't
tried a new resilvering operation since upgrading to snv_122; i'll
probably do that tomorrow.) i've zpool clear'ed the problem, did a
scrub, and things look ok. i'm currently testing the pool by doing
more scrubs + builds on it to see if i get any more errors.
ed

On Wed, Sep 02, 2009 at 09:19:03AM -0700, Richard Elling wrote:
> On Sep 2, 2009, at 2:38 AM, Daniel Carosone wrote:
>> Furthermore, this clarity needs to be posted somewhere much, much
>> more visible than buried in some discussion thread.
>
> I've added a note in the ZFS Troubleshooting Guide wiki. However, I
> could not find a public CR. If someone inside Sun can provide a CR
> number, I'll add that to the reference.
> http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Resolving_Software_Problems
> -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cleaning up cloned zones
hey anil,
given that things work, i'd recommend leaving them alone. if you
really insist on cleaning things up aesthetically, then you'll need
to do multiple zfs operations and you'll need to shut down the zones.
assuming you haven't cloned any zones (because if you did, that
complicates things), you could do:

- shut down your zones
- zfs promote the latest zbe
- destroy all of the new snapshots of the promoted zbe (and the old
  zbe filesystems, which are now dependants of those snapshots)
- rename the promoted zbe to whatever name you want to standardize on

note that i haven't tested any of this, but in theory it should work.
it may be the case that some of the zfs operations above fail due to
the zoned bit being set on the zbes. if that's the case, then you'll
need to clear the zoned bit, do the operations, and then reset the
zoned bit. please don't come crying to me if this doesn't work. ;)
ed

On Wed, Jul 29, 2009 at 07:44:37PM -0700, Anil wrote:
> I create a couple of zones. I have a zone path like this:
>
> root@vps1:~# zfs list -r zones/fans
> NAME                 USED  AVAIL  REFER  MOUNTPOINT
> zones/fans          1.22G  3.78G    22K  /zones/fans
> zones/fans/ROOT     1.22G  3.78G    19K  legacy
> zones/fans/ROOT/zbe 1.22G  3.78G  1.22G  legacy
>
> I then upgrade the global zone; this creates the zfs
> clones/snapshots for the zones:
>
> root@vps1:~# zfs list -r zones/fans
> NAME                   USED  AVAIL  REFER  MOUNTPOINT
> zones/fans            4.78G  5.22G    22K  /zones/fans
> zones/fans/ROOT       4.78G  5.22G    19K  legacy
> zones/fans/ROOT/zbe   2.64G  5.22G  2.64G  legacy
> zones/fans/ROOT/zbe-1 2.13G  5.22G  3.99G  legacy
>
> I create a couple of new zones; the mounted zfs tree looks like this:
>
> root@vps1:~# zfs list -r zones/cars
> NAME                 USED  AVAIL  REFER  MOUNTPOINT
> zones/cars          1.22G  3.78G    22K  /zones/cars
> zones/cars/ROOT     1.22G  3.78G    19K  legacy
> zones/cars/ROOT/zbe 1.22G  3.78G  1.22G  legacy
>
> So, now the problem is, I have some zones that have a zbe-1 and some
> that have a zfs clone with just the zbe name. After making sure
> everything works for a month now, I want to clean that up.
> I want to promote all of them to be just zbe. I understand I won't
> be able to revert back to the original zone bits, but I could have
> 40+ zones on this system, and I prefer them all to be
> consistent-looking. Here is a full hierarchy now:
>
> root@vps1:~# zfs get -r mounted,origin,mountpoint zones/fans
> NAME                       PROPERTY    VALUE                      SOURCE
> zones/fans                 mounted     yes                        -
> zones/fans                 origin      -                          -
> zones/fans                 mountpoint  /zones/fans                default
> zones/fans/ROOT            mounted     no                         -
> zones/fans/ROOT            origin      -                          -
> zones/fans/ROOT            mountpoint  legacy                     local
> zones/fans/ROOT/zbe        mounted     no                         -
> zones/fans/ROOT/zbe        origin      -                          -
> zones/fans/ROOT/zbe        mountpoint  legacy                     local
> zones/fans/ROOT/zbe@zbe-1  mounted     -                          -
> zones/fans/ROOT/zbe@zbe-1  origin      -                          -
> zones/fans/ROOT/zbe@zbe-1  mountpoint  -                          -
> zones/fans/ROOT/zbe-1      mounted     yes                        -
> zones/fans/ROOT/zbe-1      origin      zones/fans/ROOT/zbe@zbe-1  -
> zones/fans/ROOT/zbe-1      mountpoint  legacy                     local
>
> How do I go about renaming and destroying the original zbe fs? I
> believe this will involve promoting the zbe-1, then destroying zbe,
> followed by renaming zbe-1 to zbe. But this is a live system; I
> don't have something to play with first. Any tips? Thanks!
> --
> This message posted from opensolaris.org
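The steps Ed outlines can be sketched for one zone as a dry-run script. This is untested (as the message itself warns), uses the `zones/fans` names from Anil's example, and only echoes commands unless `RUN` is cleared:

```shell
# Dry-run sketch of the cleanup steps above for one zone (the
# zones/fans example).  $RUN defaults to "echo"; nothing is promoted,
# destroyed, or renamed unless you set RUN= (empty).  Untested.
RUN=${RUN:-echo}

cleanup_zbe() {
    $RUN zoneadm -z fans halt                     # shut the zone down
    $RUN zfs promote zones/fans/ROOT/zbe-1        # zbe-1 becomes the origin
    $RUN zfs destroy -r zones/fans/ROOT/zbe       # old zbe, now a dependant
    $RUN zfs destroy zones/fans/ROOT/zbe-1@zbe-1  # leftover upgrade snapshot
    $RUN zfs rename zones/fans/ROOT/zbe-1 zones/fans/ROOT/zbe
}
cleanup_zbe
```

If any step fails with a "dataset is zoned" style error, the zoned-bit dance Ed mentions (clear the bit, operate, reset it) would wrap each of the middle commands.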
[zfs-discuss] snapshot management issues
hey all,
so recently i wrote some zones code to manage zones on zfs datasets.
the code i wrote did things like rename snapshots and promote
filesystems. while doing this work, i found a few zfs behaviours
that, if changed, could greatly simplify my work.

the primary issue i hit was that when renaming a snapshot, any clones
derived from that snapshot are unmounted/remounted. promoting a
dataset (which results in snapshots being moved from one dataset to
another) doesn't result in any clones being unmounted/remounted. this
made me wonder if the mount cycle caused by renames is actually
necessary, or if it is just an artifact of the current
implementation. removing this unmount/remount would greatly simplify
my dataset management code. (the snapshot rename can also fail if any
clones are zoned or in use, so eliminating these mount operations
would remove one potential failure mode for zone administration
operations.)

this problem was compounded by the fact that all the clone
filesystems i was dealing with were zoned. the code i wrote runs in
the global zone, and zfs prevents the global zone from mounting or
unmounting zoned filesystems. (so my code additionally had to
manipulate the zoned attribute for cloned datasets.) hence, if
there's no way to eliminate the need to unmount/remount filesystems
when renaming snapshots, how would people feel about adding an option
to zfs/libzfs to be able to override the restrictions imposed by the
zoned attribute?
ed
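For concreteness, the zoned-attribute dance described above looks roughly like this. The dataset names are hypothetical (not from the message), and `$RUN` defaults to `echo`, so this is a dry-run sketch of the workaround, not the actual zones code:

```shell
# Sketch of the global-zone workaround described above: clear the
# zoned bit on a clone so the rename's unmount/remount can proceed,
# then restore it.  Dataset names are hypothetical; $RUN defaults to
# "echo" so nothing is modified.
RUN=${RUN:-echo}

rename_snap_of_zoned_clone() {
    $RUN zfs set zoned=off tank/zones/z1/ROOT/zbe   # clone of the snapshot
    $RUN zfs rename tank/zones/gold@snap1 tank/zones/gold@snap2
    $RUN zfs set zoned=on tank/zones/z1/ROOT/zbe    # restore the zoned bit
}
rename_snap_of_zoned_clone
```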
Re: [zfs-discuss] zfs promote/destroy enhancements?
On Thu, Apr 23, 2009 at 11:31:07AM -0500, Nicolas Williams wrote:
> On Thu, Apr 23, 2009 at 09:59:33AM -0600, Matthew Ahrens wrote:
>> zfs destroy [-r] -p sounds great. I'm not a big fan of the -t
>> template. Do you have conflicting snapshot names due to the way
>> your (zones) software works, or are you concerned about sysadmins
>> creating these conflicting snapshots? If it's the former, would it
>> be possible to change the zones software to avoid it?
>
> I think the -t option -- automatic snapshot name conflict resolution
> -- makes a lot of sense in the context of snapshots and clones
> mostly managed by a system component (zoneadm, beadm) but where
> users can also create snapshots (e.g., for time slider, backups):
> you don't want the users to create snapshot names that will later
> prevent zoneadm/beadm destroy. Making the users responsible for
> resolving such conflicts seems not user-friendly to me.
>
> However, if we could just avoid the conflicts in the first place
> then we'd not need an option for automatic snapshot name conflict
> resolution. Conflicts could be avoided by requiring that all
> snapshot names of a dataset, and of clones of snapshots of that
> dataset, and so on, be unique. Snapshot name uniqueness could be a
> property of the root dataset of a snapshot/clone tree.

an interesting idea. i can file an RFE on this as well, but there are
a couple of side effects to consider with this approach.

setting this property would break zfs snapshot -r if there are
multiple snapshots and clones of a single filesystem.

callers that create snapshots (i.e., zones) usually have simple
naming schemes. they look at what snapshots are there and pick a new
name that isn't used yet. with this approach, picking a new name
becomes harder because iterating over the existing snapshot namespace
just became harder. (i guess that callers could adopt a policy of
creating snapshots with incrementing names until they get some return
code other than EEXIST.)
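The "look at what's there and pick the next name" policy can be illustrated with a small helper. This is a hypothetical sketch (not zones code): given existing snapshot names on stdin, it prints the first `zbe-N` name not already taken, which is exactly the iteration that a uniqueness property would make harder:

```shell
# Hypothetical helper illustrating the naming scheme above: read a
# dataset's existing snapshot names (one per line on stdin) and print
# the first "zbe-N" not already taken.  Pure shell, no zfs calls.
next_snap_name() {
    existing=$(cat)
    n=1
    while printf '%s\n' "$existing" | grep -qx "zbe-$n"; do
        n=$((n + 1))            # name taken, try the next integer
    done
    echo "zbe-$n"
}

printf 'zbe-1\nuser1\n' | next_snap_name    # prints "zbe-2"
```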
ed
Re: [zfs-discuss] zfs promote/destroy enhancements?
On Thu, Apr 23, 2009 at 09:59:33AM -0600, Matthew Ahrens wrote:
> Ed, zfs destroy [-r] -p sounds great. I'm not a big fan of the -t
> template. Do you have conflicting snapshot names due to the way your
> (zones) software works, or are you concerned about sysadmins
> creating these conflicting snapshots? If it's the former, would it
> be possible to change the zones software to avoid it?

conflicting names are pretty common for zones. the zones
infrastructure uses SUNWzoneXXX for zone cloning and zbe-XXX for zone
BE management. i guess we could switch to using timestamps...
ed
[zfs-discuss] zfs promote/destroy enhancements?
hey all,
in both nevada and opensolaris, the zones infrastructure tries to
leverage zfs wherever possible. we take advantage of snapshotting and
cloning for things like zone cloning and zone be management. because
of this, we've recently run into multiple scenarios where a zoneadm
uninstall fails:

  6787557 zoneadm uninstall fails when zone has zfs clones
  http://bugs.opensolaris.org/view_bug.do?bug_id=6787557

  7491 problems destroying zones with cloned dependents
  http://defect.opensolaris.org/bz/show_bug.cgi?id=7491

these failures occur when we try to destroy the zfs filesystem
associated with a zone, but that filesystem has been snapshotted and
cloned. the way we're fixing these problems is by doing a promotion
before the destroy. jerry has fixed 6787557 for nevada in zoneadm,
but now i'm looking at having to re-implement a similar fix for
opensolaris in the ipkg brand for 7491. hence, i'm wondering if it
would make more sense just to add this functionality directly into
zfs(1m)/libzfs. this would involve enhancements to the zfs promote
and destroy subcommands. here's what i'm thinking.

the first component would be a new "-t template" option to zfs
promote. this option would instruct zfs promote to check for snapshot
naming collisions between the origin and promotion target
filesystems, and to rename any origin filesystem snapshots with
conflicting names before attempting the promotion. the conflicting
snapshots would be renamed to templateXXX, where XXX is an integer
used to make the snapshot name unique. today users have to do this
renaming manually if they want the promotion to succeed.
to illustrate how this new functionality would work, say i have the
following filesystems/snapshots:

  tank/zones/zone1
  tank/zones/zone1@SUNWzone1
  tank/zones/zone1@user1
  tank/zones/zone2            (clone of tank/zones/zone1@SUNWzone1)
  tank/zones/zone2@SUNWzone1

if i do a "zfs promote -t SUNWzone tank/zones/zone2", then this would
involve a rename of zone1@SUNWzone1 to zone1@SUNWzone2, and a
promotion of tank/zones/zone2. the @user1 snapshot would not be
renamed because there was no naming conflict with the filesystem
being promoted. hence i would end up with:

  tank/zones/zone2
  tank/zones/zone2@SUNWzone1
  tank/zones/zone2@SUNWzone2
  tank/zones/zone2@user1
  tank/zones/zone1            (clone of tank/zones/zone2@SUNWzone2)

if i did a "zfs promote -t user tank/zones/zone2", then this would
involve a rename of zone1@SUNWzone1 to zone1@user2, and then a
promotion of tank/zones/zone2. hence i would end up with:

  tank/zones/zone2
  tank/zones/zone2@SUNWzone1
  tank/zones/zone2@user1
  tank/zones/zone2@user2
  tank/zones/zone1            (clone of tank/zones/zone2@user2)

the second component would be two new flags to zfs destroy:

  zfs destroy [-p [-t template]]

the -p would instruct zfs destroy to try to promote the oldest clone
of the youngest snapshot of the filesystem being destroyed before
doing the destroy. if the youngest snapshot doesn't have a clone, the
command will fail unless -r was specified. if -r was specified, we
will continue to look through the snapshots from youngest to oldest,
looking for the first one with a clone. if a snapshot with a clone is
found, the oldest clone will be promoted before the destroy. if a
template was specified via -t, it will be passed through to the
promote operation.

thoughts?
ed
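Today's manual equivalent of the first example above, written out as a dry-run sketch (snapshot names spelled out in full; `$RUN` defaults to `echo`, so nothing is modified):

```shell
# Dry-run sketch of what "zfs promote -t SUNWzone tank/zones/zone2"
# would do today by hand, using the first example above.  $RUN
# defaults to "echo" so the commands are only printed.
RUN=${RUN:-echo}

promote_with_template() {
    # rename the conflicting origin snapshot out of the way...
    $RUN zfs rename tank/zones/zone1@SUNWzone1 tank/zones/zone1@SUNWzone2
    # ...so the promotion no longer hits a snapshot name collision
    $RUN zfs promote tank/zones/zone2
}
promote_with_template
```

After the promotion the renamed snapshot migrates to the promoted dataset, which is why it shows up as tank/zones/zone2@SUNWzone2 in the resulting layout.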
Re: [zfs-discuss] strange zfs recieve behavior
On Sun, Oct 14, 2007 at 09:37:42PM -0700, Matthew Ahrens wrote:
> Edward Pilatowicz wrote:
>> hey all,
>> so i'm trying to mirror the contents of one zpool to another using
>> zfs send / receive while maintaining all snapshots and clones.
>
> You will enjoy the upcoming zfs send -R feature, which will make
> your script unnecessary.

sweet. while working on it i realized that this just really needed to
be built-in functionality. :)

i assume that this will allow for backups by doing zfs snap -r and
zfs send -R one day, then sometime later doing the same thing and
just sending the deltas for every filesystem? will this also include
any other random snapshots that were created in between when the two
zfs send -R commands are run? (not just snapshots that were used for
clones.) is there a bugid/psarc case number?

>> [EMAIL PROTECTED] zfs send -i 070221 export/ws/[EMAIL PROTECTED] | zfs receive -v -d export2
>> receiving incremental stream of export/ws/[EMAIL PROTECTED] into export2/ws/[EMAIL PROTECTED]
>> cannot receive: destination has been modified since most recent snapshot
>
> You may be hitting 6343779 ZPL's delete queue causes 'zfs restore'
> to fail. To work around it, use zfs recv -F.
> --matt
[zfs-discuss] strange zfs recieve behavior
hey all,
so i'm trying to mirror the contents of one zpool to another using
zfs send / receive while maintaining all snapshots and clones.
essentially i'm taking a recursive snapshot. then i'm mirroring the
oldest snapshots first and working my way forward. to deal with
clones i have a hack that uses zfs promote. i've scripted it and
things seem to work... except of course for one thing. ;)

there's one snapshot on my system that i can't seem to transfer.
here's the problem:

---8<---
[EMAIL PROTECTED] zfs send export/ws/[EMAIL PROTECTED] | zfs receive -v -d export2
receiving full stream of export/ws/[EMAIL PROTECTED] into export2/ws/[EMAIL PROTECTED]
received 134MB stream in 28 seconds (4.77MB/sec)
[EMAIL PROTECTED] zfs send -i 070221 export/ws/[EMAIL PROTECTED] | zfs receive -v -d export2
receiving incremental stream of export/ws/[EMAIL PROTECTED] into export2/ws/[EMAIL PROTECTED]
cannot receive: destination has been modified since most recent snapshot
---8<---

as far as i know, there's nothing special about these two snapshots.

---8<---
[EMAIL PROTECTED] zfs list | grep export/ws/xen-1
export/ws/xen-1           105M  3.09G   104M  /export/ws/xen-1
export/ws/[EMAIL PROTECTED]  570K      -   103M  -
export/ws/[EMAIL PROTECTED]     0      -   104M  -
[EMAIL PROTECTED] zfs get -Hp -o value creation export/ws/[EMAIL PROTECTED]
1172088367
[EMAIL PROTECTED] zfs get -Hp -o value creation export/ws/[EMAIL PROTECTED]
1192301172
---8<---

any idea what might be wrong here? it seems that the problem is on
the receive side. i've even tried doing a zfs rollback of
export2/ws/[EMAIL PROTECTED] before doing the second send, but that
didn't make any difference.

i'm currently running snv_74. both pools are currently at zfs v8, but
the source pool has seen lots of zfs and live upgrades.
ed
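The mirroring approach described above (full stream first, then deltas) can be sketched as follows. Snapshot names here are illustrative (`snap1`/`snap2`, since the real names are redacted in the archive), `$RUN` defaults to `echo`, and the `-F` on the incremental receive is the 6343779 workaround Matt suggests in the reply:

```shell
# Dry-run sketch of the send/receive mirroring described above.
# snap1/snap2 are illustrative snapshot names; $RUN defaults to
# "echo" so the pipelines are only printed, not run.
RUN=${RUN:-echo}

mirror_step() {
    $RUN zfs snapshot -r export@snap2
    # full stream for the first replication of a filesystem...
    $RUN sh -c 'zfs send export/ws/xen-1@snap1 | zfs receive -v -d export2'
    # ...then deltas only; -F rolls the target back first, working
    # around "destination has been modified" (bug 6343779)
    $RUN sh -c 'zfs send -i snap1 export/ws/xen-1@snap2 | zfs receive -F -v -d export2'
}
mirror_step
```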
Re: [zfs-discuss] Re: ZFS over a layered driver interface
hey swetha,
i don't think there is any easy answer for you here.

i'd recommend watching all device operations (open, read, write,
ioctl, strategy, prop_op, etc) that happen to the ramdisk device when
you don't use your layered driver, and then again when you do. then
you could compare the two to see what the differences are. or
optionally, you could do some more analysis into that ioctl failure
to see if it's an important ioctl that should be succeeding and
instead is causing a cascading failure. to investigate both of these
possibilities i'd recommend using dtrace fbt probes; the answer won't
be easy to find, but dtrace will make finding it easier.

lastly, you could start analyzing what zio_wait() is waiting for. for
this you'd want to use mdb -k to look at the current kernel state and
compare that to the source to see what zfs is trying to do and what
it's blocked on. (you might actually consider forcing a crash dump
and analyzing it offline, since then the state is not changing and
you can always come back to it.)
ed

On Tue, May 15, 2007 at 01:12:06PM -0700, Shweta Krishnan wrote:
> With what Edward suggested, I got rid of the ldi_get_size() error by
> defining the prop_op entry point appropriately. However, the zpool
> create still fails - with zio_wait() returning 22.
> bash-3.00# dtrace -n 'fbt::ldi_get_size:entry{self->t=1;} fbt::ldi_get_size:entry/self->t/{} fbt::ldi_get_size:return/self->t/{trace((int)arg1);} fbt::ldi_get_size:return{self->t=0;}' -c 'zpool create adsl-pool /dev/layerzfsminor1'
> dtrace: description 'fbt::ldi_get_size:entry' matched 4 probes
> cannot create 'adsl-pool': invalid argument for this pool operation
> dtrace: pid 2487 has exited
> CPU     ID           FUNCTION:NAME
>   0  21606    ldi_get_size:entry
>   0  21607   ldi_get_size:return                 0
>
> bash-3.00# dtrace -n 'fbt:zfs:zfs_ioc_pool_create:entry{self->t=1;} fbt:zfs::return/self->t && arg1 == 22/{stack(); exit(0);} fbt:zfs:zfs_ioc_pool_create:return{self->t=0;}'
> dtrace: description 'fbt:zfs:zfs_ioc_pool_create:entry' matched 1317 probes
> CPU     ID           FUNCTION:NAME
>   0  63848       zio_wait:return
>               zfs`vdev_label_init+0x4ed
>               zfs`vdev_label_init+0x4e
>               zfs`vdev_create+0x4b
>               zfs`spa_create+0x233
>               zfs`zfs_ioc_pool_create+0x4a
>               zfs`zfsdev_ioctl+0x119
>               genunix`cdev_ioctl+0x48
>               specfs`spec_ioctl+0x86
>               genunix`fop_ioctl+0x37
>               genunix`ioctl+0x16b
>               unix`sys_syscall32+0x101
>
> I see the strategy routine of my layered driver being invoked, and
> reads and writes are being done. (in the more detailed dtrace dump,
> I see zio_vdev_io_start and other zio functions being invoked.) Is
> there a way to figure out where exactly this is breaking? Could it
> be due to an ioctl failure, since the kernel log shows a failure for
> the ioctl to the real device?
> Thanks,
> Swetha.
Re: [zfs-discuss] Re: ZFS over a layered driver interface
i've seen this ldi_get_size() failure before, and it usually occurs
with drivers that don't implement their prop_op(9E) entry point
correctly or that don't implement the dynamic [Nn]blocks/[Ss]ize
property correctly. what does your layered driver do in its
prop_op(9E) entry point? also, what driver is your layered driver
layered over?
ed

On Mon, May 14, 2007 at 09:37:51AM -0700, Eric Schrock wrote:
> This is likely because ldi_get_size() is failing for your device.
> We've seen this before on 3rd party devices, and have been meaning
> to create a special errno (instead of EINVAL) to give a more
> helpful message in this case.
> - Eric
>
> On Sun, May 13, 2007 at 11:54:45PM -0700, Shweta Krishnan wrote:
>> I ran zpool with truss, and here is the system call trace. (again,
>> zfs_lyr is the layered driver I am trying to use to talk to the
>> ramdisk driver.)
>> When I compared it to a successful zpool creation, the culprit is
>> the last failing ioctl, i.e. ioctl(3, ZFS_IOC_POOL_CREATE, address).
>> I tried looking at the source code for the failing ioctl, but
>> didn't get any hints there. Guess I must try dtrace (which I am
>> about to learn!).
>> bash-3.00# truss -f zpool create adsl-pool /devices/pseudo/[EMAIL PROTECTED]:zfsminor1 2> /var/tmp/zpool.truss
>> bash-3.00# grep Err /var/tmp/zpool.truss
>> 2232:    open(/var/ld/ld.config, O_RDONLY)              Err#2 ENOENT
>> 2232:    xstat(2, /lib/libdiskmgt.so.1, 0x080469C8)     Err#2 ENOENT
>> 2232:    xstat(2, /lib/libxml2.so.2, 0x08046868)        Err#2 ENOENT
>> 2232:    xstat(2, /lib/libz.so.1, 0x08046868)           Err#2 ENOENT
>> 2232:    stat64(/devices/pseudo/[EMAIL PROTECTED]:zfsminor1s2, 0x080429E0)  Err#2 ENOENT
>> 2232:    modctl(MODSIZEOF_DEVID, 0x03740001, 0x080429BC, 0x08071714, 0x)  Err#22 EINVAL
>> 2232:    mkdir(/var/run/sysevent_channels/syseventd_channel, 0755)  Err#17 EEXIST
>> 2232:    unlink(/var/run/sysevent_channels/syseventd_channel/17)  Err#2 ENOENT
>> 2232/1:  umount2(/var/run/sysevent_channels/syseventd_channel/17, 0x)  Err#22 EINVAL
>> 2232/1:  ioctl(7, I_CANPUT, 0x)                         Err#89 ENOSYS
>> 2232/1:  stat64(/adsl-pool, 0x08043330)                 Err#2 ENOENT
>> 2232/1:  ioctl(3, ZFS_IOC_POOL_CREATE, 0x08041BC4)      Err#22 EINVAL
>
> --
> Eric Schrock, Solaris Kernel Development
> http://blogs.sun.com/eschrock
Re: [zfs-discuss] Netapp to Solaris/ZFS issues
On Wed, Dec 06, 2006 at 07:28:53AM -0700, Jim Davis wrote:
> We have two aging Netapp filers and can't afford to buy new Netapp
> gear, so we've been looking with a lot of interest at building NFS
> fileservers running ZFS as a possible future approach. Two issues
> have come up in the discussion:
>
> - Adding new disks to a RAID-Z pool (Netapps handle adding new
>   disks very nicely). Mirroring is an alternative, but when you're
>   on a tight budget losing N/2 disk capacity is painful.
>
> - The default scheme of one filesystem per user runs into problems
>   with linux NFS clients; on one linux system, with 1300 logins, we
>   already have to do symlinks with amd because linux systems can't
>   mount more than about 255 filesystems at once. We can of course
>   just have one filesystem exported, and make /home/student a
>   subdirectory of that, but then we run into problems with quotas
>   -- and on an undergraduate fileserver, quotas aren't optional!

well, if the mount limitation is imposed by the linux kernel, you
might consider running linux in a zone on solaris (via BrandZ). since
BrandZ allows you to execute linux programs on a solaris kernel, you
shouldn't have a problem with limits imposed by the linux kernel.
brandz currently ships in solaris express (or the solaris express
community release), build snv_49 or later. you can find more info on
brandz here:
http://opensolaris.org/os/community/brandz/
ed
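For reference, the filesystem-per-user scheme the thread assumes pairs each home dataset with a ZFS quota. A dry-run sketch (pool/dataset names and the 2g quota are hypothetical, not from the message; `$RUN` defaults to `echo`):

```shell
# Dry-run sketch of the one-filesystem-per-user scheme discussed
# above, with a per-student ZFS quota.  Pool name and quota size are
# hypothetical; $RUN defaults to "echo" so nothing is created.
RUN=${RUN:-echo}

make_student_home() {
    user=$1
    $RUN zfs create "tank/home/$user"
    $RUN zfs set quota=2g "tank/home/$user"    # per-user disk quota
    $RUN zfs set sharenfs=on "tank/home/$user" # export over NFS
}
make_student_home student1
```

It's exactly this one-dataset-per-user layout that runs into the ~255-mount limit on the Linux NFS clients, which is what motivates the BrandZ suggestion above.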
Re: [zfs-discuss] zfs: zvols minor #'s changing and causing probs w/ volumes
if you're running solaris 10 or an early nevada build, then it's
possible you're hitting this bug (which i fixed in build 35):

  4976415 devfsadmd for zones could be smarter when major numbers change

if you're running a recent nevada build, then this could be a new
issue. so what version of solaris are you running?
thanks
ed

On Tue, Oct 31, 2006 at 03:26:06PM -0700, Jason Gallagher - Sun
Microsystems wrote:
> Team,
> **Please respond to me and my coworker listed in the Cc, since
> neither one of us are on this alias**
>
> QUICK PROBLEM DESCRIPTION:
> Cu created a dataset which contains all the zvols for a particular
> zone. The zone is then given access to all the zvols in the dataset
> using a match statement in the zonecfg (see long problem description
> for details). After the initial boot of the zone everything appears
> fine and the localzone zvol dev files match the globalzone zvol dev
> files. Upon the reboot of the box (following the initial boot,
> which had no problems) the minor numbers of the zvols are different
> between the local zone and the global zone, some of the volumes are
> not mounted in the correct location, and some volumes can't even be
> mounted.
>
> LONG PROBLEM DESCRIPTION:
> All the details are listed below, authored by the customer:
>
> Here is a summary of the problem we are experiencing. Just a quick
> little background. We had hoped to use ZFS for our filesystems, but
> as a result of our backup system not fully supporting ZFS yet we
> are stuck with using UFS for now. In an effort to make migrating to
> ZFS in the future that much easier, and to be able to take
> advantage of some of the other features ZFS gives us, we have
> decided to use ZFS volumes and create UFS filesystems on top of
> them. We have created a dataset which contains all the zvols for a
> particular zone. The zone is then given access to all the zvols in
> the dataset using a match statement in the zonecfg.
> [/users/mdey] [EMAIL PROTECTED] zonecfg -z biscotti info
> zonepath: /zones/biscotti
> autoboot: false
> pool:
> inherit-pkg-dir:
>         dir: /lib
> inherit-pkg-dir:
>         dir: /platform
> inherit-pkg-dir:
>         dir: /sbin
> inherit-pkg-dir:
>         dir: /usr
> inherit-pkg-dir:
>         dir: /opt
> net:
>         address: 10.1.33.91
>         physical: hme0
> device
>         match: /dev/zvol/rdsk/d1000pool/biscotti-vols/*
> device
>         match: /dev/zvol/dsk/d1000pool/biscotti-vols/*
>
> There are 4 volumes in the dataset:
>
> [/] [EMAIL PROTECTED] zfs list -r d1000pool/biscotti-vols
> NAME                           USED  AVAIL  REFER  MOUNTPOINT
> d1000pool/biscotti-vols        400M   197G    49K  none
> d1000pool/biscotti-vols/vol1  11.2M   197G  11.2M  -
> d1000pool/biscotti-vols/vol2  10.7M   197G  10.7M  -
> d1000pool/biscotti-vols/vol3  11.0M   197G  11.0M  -
> d1000pool/biscotti-vols/vol4  10.5M   197G  10.5M  -
>
> The volumes are mounted in the zone via the zone's vfstab:
>
> /dev/zvol/dsk/d1000pool/biscotti-vols/vol1 /dev/zvol/rdsk/d1000pool/biscotti-vols/vol1 /vol1 ufs 2 yes -
> /dev/zvol/dsk/d1000pool/biscotti-vols/vol2 /dev/zvol/rdsk/d1000pool/biscotti-vols/vol2 /vol2 ufs 2 yes -
> /dev/zvol/dsk/d1000pool/biscotti-vols/vol3 /dev/zvol/rdsk/d1000pool/biscotti-vols/vol3 /vol3 ufs 2 yes -
> /dev/zvol/dsk/d1000pool/biscotti-vols/vol4 /dev/zvol/rdsk/d1000pool/biscotti-vols/vol4 /vol4 ufs 2 yes -
>
> After the initial boot of the zone everything appears fine and the
> localzone zvol dev files match the globalzone zvol dev files.
>
> [/] [EMAIL PROTECTED] ls -lL /zones/biscotti/dev/zvol/rdsk/d1000pool/biscotti-vols
> total 0
> crw-------   1 root     sys      256,  2 Oct 23 21:23 vol1
> crw-------   1 root     sys      256,  3 Oct 23 21:23 vol2
> crw-------   1 root     sys      256,  4 Oct 23 21:23 vol3
> crw-------   1 root     sys      256,  5 Oct 23 21:23 vol4
>
> [/] [EMAIL PROTECTED] ls -lL /dev/zvol/rdsk/d1000pool/biscotti-vols
> total 0
> crw-------   1 root     sys      256,  2 Oct 23 21:02 vol1
> crw-------   1 root     sys      256,  3 Oct 23 21:02 vol2
> crw-------   1 root     sys      256,  4 Oct 23 21:02 vol3
> crw-------   1 root     sys      256,  5 Oct 23 21:02 vol4
>
> I login to the zone and create a file in each mount to keep track of
> which volume is which.
[/] [EMAIL PROTECTED] ls -l /vol?/vol?
-rw-------   1 root     root     10485760 Oct 23 21:38 /vol1/vol1
-rw-------   1 root     root     10485760 Oct 23 21:38 /vol2/vol2
-rw-------   1 root     root     10485760 Oct 23 21:38 /vol3/vol3
-rw-------   1 root     root     10485760 Oct 23 21:38 /vol4/vol4

I then create a new volume, vol5, shut down the zone, and reboot the box with an init 6. Upon the reboot of the box the minor numbers of the zvols are different between the local zone and the global zone.

[/zones/biscotti/dev/zvol/rdsk/d1000pool/biscotti-vols] [EMAIL PROTECTED] ls -lL
total 0
crw-------   1 root     sys
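For reference, the setup the customer describes can be sketched as the following command sequence. This is a non-authoritative reconstruction from the zonecfg output and vfstab entries above, not the customer's actual procedure; it assumes the pool d1000pool already exists and must be run as root in the global zone.

```shell
# Sketch (reconstruction): group the zone's volumes under one dataset,
# create one 10g zvol per filesystem, and put UFS on each zvol.
zfs create -o mountpoint=none d1000pool/biscotti-vols
for v in vol1 vol2 vol3 vol4; do
    zfs create -V 10g d1000pool/biscotti-vols/$v
    newfs /dev/zvol/rdsk/d1000pool/biscotti-vols/$v
done

# Expose every zvol in the dataset to the zone with match statements,
# mirroring the "device match" lines shown in the zonecfg info output.
zonecfg -z biscotti <<'EOF'
add device
set match=/dev/zvol/rdsk/d1000pool/biscotti-vols/*
end
add device
set match=/dev/zvol/dsk/d1000pool/biscotti-vols/*
end
commit
EOF
```

The wildcard match is what makes a newly created zvol (like vol5 later in the thread) appear in the zone without a zonecfg change - and also what exposes the zone to the minor-number renumbering described below.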
Re: [zfs-discuss] zfs: zvols minor #'s changing and causing probs w/ volumes
i think that this fix may be being backported as part of the brandz project backport, but i don't think anyone is backporting it outside of that. you might want to add a new call record and open a subCR if you need this to be backported.

the workaround is just what you've already discovered: delete any old nodes before booting the zone.

ed

On Tue, Oct 31, 2006 at 07:27:44PM -0700, David I Radden wrote:

Thanks Ed. The ticket shows the customer running Solaris 10. Do you know if the fix will be incorporated in an S10 update or patch? Or possibly an S10 workaround made available?

Thanks again!

Dave Radden
x74861
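The workaround mentioned in this thread - delete any old nodes before booting the zone - might look like the sketch below. The zone name comes from the thread; treating the whole of the zone's zvol device tree as disposable is an assumption on my part, and this must run as root in the global zone.

```shell
# Sketch of the workaround: remove stale zvol device nodes from the
# zone's /dev tree so they are recreated with the current minor
# numbers when the zone boots. Paths match this thread's example.
zoneadm -z biscotti halt
rm -rf /zones/biscotti/dev/zvol
zoneadm -z biscotti boot
```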
Re: [zfs-discuss] zfs questions from Sun customer
zfs should work fine with disks under the control of solaris mpxio. i don't know about any of the other multipathing solutions. if you're trying to use a device that's controlled by another multipathing solution, you might want to try specifying the full path to the device, ex:

    zpool create -f extdisk /dev/foo2/vpath1c

ed

On Wed, Jul 26, 2006 at 09:47:03AM -0600, David Curtis wrote:

Please reply to [EMAIL PROTECTED]

** Background / configuration **

zpool will not create a storage pool on fibre channel storage. I'm attached to an IBM SVC using the IBMsdd driver. I have no problem using SVM metadevices and UFS on these devices.

List of steps to reproduce the problem (if applicable):

- Build a Solaris 10 Update 2 server
- Attach to an external storage array via IBM SVC
- Load the lpfc driver (6.02h)
- Load the IBMsdd software (1.6.1.0-2)
- Attempt to use zpool create to make a storage pool:

    # zpool create -f extdisk vpath1c
    internal error: unexpected error 22 at line 446 of ../common/libzfs_pool.c

** Reply to customer **

It looks like you have an additional unwanted software layer between Solaris and the disk hardware. Currently ZFS needs to access the physical device to work correctly. Something like:

    # zpool create -f extdisk c5t0d0 c5t1d0 ..

Let me know if this works for you.

** Follow-up question from customer **

Yes, using the c#t#d# disks works, but anyone using fibre-channel storage on something like an IBM Shark or EMC Clariion will want multiple paths to disk using either IBMsdd, EMCpower, or Solaris native MPxIO. Does ZFS work with any of these fibre channel multipathing drivers? Thanks for any assistance you can provide.
--
David Curtis - TSE                       Sun Microsystems
303-272-6628                             Enterprise Services
[EMAIL PROTECTED]                        OS / Installation Support
Monday to Friday 9:00 AM to 6:00 PM Mountain

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
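For the Solaris-native MPxIO route Ed recommends, the usual approach is to let scsi_vhci present a single multipathed device node and build the pool on that. This is a hedged sketch, not the thread's actual procedure; the long device name is illustrative, and stmsboot behavior varies by release and HBA.

```shell
# Sketch: enable Solaris native multipathing (MPxIO) on the FC HBAs,
# then create the pool on the resulting scsi_vhci device, which
# exports the size/Nblocks properties ZFS needs.
stmsboot -e          # enable MPxIO (prompts for a reboot)

# after reboot, the multipathed device appears under /dev/dsk with a
# long WWN-based name (illustrative):
zpool create extdisk c5t60050768018182B32000000000000099d0
```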
Re: [zfs-discuss] zfs questions from Sun customer
zfs depends on ldi_get_size(), which depends on the device being accessed exporting one of the properties below. i guess the devices generated by IBMsdd and/or EMCpower don't generate these properties.

ed

On Wed, Jul 26, 2006 at 01:53:31PM -0700, Eric Schrock wrote:

On Wed, Jul 26, 2006 at 02:11:44PM -0600, David Curtis wrote:

Eric,

Here is the output:

# ./dtrace2.dtr
dtrace: script './dtrace2.dtr' matched 4 probes
CPU     ID               FUNCTION:NAME
  0  17816   ldi_open_by_name:entry    /dev/dsk/vpath1c
  0  16197       ldi_get_otyp:return   0
  0  15546    ldi_prop_exists:entry    Nblocks
  0  15547   ldi_prop_exists:return    0
  0  15546    ldi_prop_exists:entry    nblocks
  0  15547   ldi_prop_exists:return    0
  0  15546    ldi_prop_exists:entry    Size
  0  15547   ldi_prop_exists:return    0
  0  15546    ldi_prop_exists:entry    size
  0  15547   ldi_prop_exists:return    0

OK, this definitely seems to be a driver bug. I'm no driver expert, but it seems that exporting none of the above properties is a problem - ZFS has no idea how big this disk is! Perhaps someone more familiar with the DDI/LDI interfaces can explain the appropriate way to implement these on the driver end. But at this point it's safe to say that ZFS isn't doing anything wrong. The layered driver is exporting a device in /dev/dsk, but not exporting basic information (such as the size or number of blocks) that ZFS (and potentially the rest of Solaris) needs to interact with the device.

- Eric

--
Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock
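A one-liner along the following lines could reproduce the sort of trace shown above. To be clear, this is a reconstruction, not the actual dtrace2.dtr script from the thread, and the fbt argument positions (arg0 as the pathname for ldi_open_by_name, arg2 as the property name for ldi_prop_exists) are assumptions about the kernel function signatures.

```shell
# Sketch (reconstruction): watch which LDI size properties a device
# exports while zpool tries to open it. Requires root on Solaris.
dtrace -n '
fbt::ldi_open_by_name:entry { trace(stringof(arg0)); }
fbt::ldi_prop_exists:entry  { trace(stringof(arg2)); }
fbt::ldi_prop_exists:return { trace(arg1); }
' -c 'zpool create -f extdisk vpath1c'
```

If every ldi_prop_exists:return shows 0 for Nblocks, nblocks, Size, and size, ldi_get_size() has no way to determine the device's capacity, which is consistent with Eric's diagnosis.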
Re: [zfs-discuss] 'zpool history' proposal
On Wed, May 03, 2006 at 03:05:25PM -0700, Eric Schrock wrote:

On Wed, May 03, 2006 at 02:47:57PM -0700, eric kustarz wrote:

Jason Schroeder wrote:

eric kustarz wrote:

The following case is about to go to PSARC. Comments are welcome.

eric

To piggyback on earlier comments re: adding hostname and user: What is the need for zpool history to distinguish zfs commands that were executed by privileged users in non-global zones for those datasets under ngz ownership?

I personally don't see a need to distinguish between zones. However, with delegated administration, it would be nice to know who did (say) destroy that file system - the local root or some remote user.

Keep in mind that one username (or uid) in a local zone is different from the same username in the global zone, since they can be running different name services. In the simplest example, you could have an entry that said something like:

    root zfs destroy tank/foo

And if you were using datasets delegated to local zones, you wouldn't know if that was 'root' in the global zone or 'root' in the local zone. If you are going to log a user at all, you _need_ to log the zone name as well.

Even without usernames, it would probably be useful to know that a particular action was done in a particular zone. Imagine a service provider with several zones delegated to different users, where each user has their own portion of the namespace. At some point you get a service call from a customer saying "someone deleted my filesystems". You could look at the zpool history, but without a zone name you wouldn't know if it was your fault (from the global zone) or theirs (from the local zone).

- Eric

why don't you see a need to distinguish between zones? in most cases (but not all) a zone administrator doesn't deal with pools.
they deal with datasets allocated to their zone, and for the same reasons that the global zone administrator might want access to zfs command histories, a zone administrator might want access to the zfs command histories that apply to datasets allocated to their zone. which makes me wonder if perhaps zfs command history buffers should also be supported on datasets allocated to zones? or perhaps a zone administrator should be able to view a subset of the zfs command history, specifically the transactions that affect datasets allocated to their zone?

ed
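As a postscript: shipped versions of ZFS did grow a long record format for pool history that annotates each command with the issuing user and host, which is the kind of attribution this thread argues for. A minimal sketch of inspecting it (pool name illustrative; the exact fields shown vary by release):

```shell
# 'zpool history' prints the plain command log; the -l flag requests
# the long format, which appends who ran each command and from where.
zpool history -l tank
```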