Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
We just picked up the fastest SSD we could at the local Bic Camera, which turned out to be a CSSD-SM32NI, with a claimed 95MB/s write speed. I put it in place and moved the slog over:

0m49.173s
0m48.809s

So it is slower than the CF test, which is disappointing. Everyone else seems to use the Intel X25-M, which has a write speed of 170MB/s (2nd generation), so perhaps that is why it works better for them. It is curious that it is slower than the CF card. Perhaps because it shares the controller with so many other SATA devices?

Oh, and we'll probably have to get a 3.5" frame for it, as I doubt it'll stay standing after the next earthquake. :)

Lund

Jorgen Lundman wrote:

This thread started over in nfs-discuss, as it appeared to be an NFS problem initially, or at the very least an interaction between NFS and the ZIL. Here is a summary of the speeds we found when untarring something, always into a new/empty directory. We are only looking at write speed; reads are always very fast.

The reason we started to look at this was that the 7-year-old NetApp being phased out could untar the test file in 11 seconds, while the x4500/x4540 Suns took 5 minutes.

For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I had lying around, but it can be downloaded here if you want the same test: http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz

The command executed, generally, is:

# mkdir .test34
# time gtar --directory=.test34 -zxf /tmp/MTOS-4.261-ja.tar.gz

Solaris 10 1/06 intel client : netapp 6.5.1 FAS960 server : NFSv3 : 0m11.114s
Solaris 10 6/06 intel client : x4500 OpenSolaris snv_117 server : NFSv4 : 5m11.654s
Solaris 10 6/06 intel client : x4500 Solaris 10 10/08 server : NFSv3 : 8m55.911s
Solaris 10 6/06 intel client : x4500 Solaris 10 10/08 server : NFSv4 : 10m32.629s

Just untarring the tarball on the x4500 itself:

x4500 OpenSolaris snv_117 server : 0m0.478s
x4500 Solaris 10 10/08 server : 0m1.361s

So ZFS itself is very fast.

Next, replacing NFS with different protocols: identical setup, just swapping tar for rsync and nfsd for sshd. The baseline test, using:

rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX

Solaris 10 6/06 intel client : x4500 OpenSolaris snv_117 : rsync over nfsv4 : 3m44.857s
Solaris 10 6/06 intel client : x4500 OpenSolaris snv_117 : rsync+ssh : 0m1.387s

So, get rid of nfsd and it goes from 3 minutes to 1 second!

Let's share it with SMB, and mount it:

OS X 10.5.6 intel client : x4500 OpenSolaris snv_117 : smb+untar : 0m24.480s

Neat: even SMB at default settings can beat NFS. This would indicate to me that nfsd is broken somehow, but then we tried again after disabling only the ZIL:

Solaris 10 6/06 : x4500 OpenSolaris snv_117, ZIL disabled : nfsv4 : 0m8.453s 0m8.284s 0m8.264s

Nice. So is this theoretically the fastest NFS speed we can reach? We run postfix+dovecot for mail, which would probably be safe without a ZIL. The other workload is FTP/WWW/CGI, which has more active writes/updates and is probably not as good a candidate. Comments?

Next, enable the ZIL but disable zfscacheflush (just as a test; I have been told disabling the cache flush is far more dangerous):

Solaris 10 6/06 : x4500 OpenSolaris snv_117, zfscacheflush disabled : nfsv4 : 0m45.139s

Interesting. Anyway, enable ZIL and zfscacheflush again, and learn a whole lot about slogs. First I tried creating a 2G slog on the boot mirror:

Solaris 10 6/06 : x4500 OpenSolaris snv_117, slog on boot pool : nfsv4 : 1m59.970s

Some improvement. For a lark, I created a 2GB file in /tmp/ and changed the slog to that. (I know, having the slog in volatile RAM is pretty much the same as disabling the ZIL.
But it should give me the theoretical maximum speed with the ZIL enabled, right?)

Solaris 10 6/06 : x4500 OpenSolaris snv_117, slog in /tmp/junk : nfsv4 : 0m8.916s

Nice! The same speed as with the ZIL disabled. Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So we bought a 300X (40MB/s) card.

Solaris 10 6/06 : x4500 OpenSolaris snv_117, slog on 300X CF card : nfsv4 : 0m26.566s

Not too bad, really. But you have to reboot to see a CF card, fiddle with the BIOS boot order, etc. It is just not an easy addition on a live system, whereas a SATA-attached SSD disk can be hot-swapped. Also, I learned an interesting lesson about rebooting with the slog at /tmp/junk. I am hoping to pick up a SATA SSD device today and see what speeds we get out of that.

The rsync (1s) vs NFS (8s) gap I can accept as overhead of a much more complicated protocol, but why does it take 3 minutes to write the same data to the same pool with rsync (1s) vs NFS (3m)? The ZIL was on and the slog at its default in both cases, and both write the same data. Does nfsd add FD_SYNC to every close, regardless of whether the application requested it? This I have not yet wrapped my head around. For example, I know rsync
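For reference, the commands behind these slog experiments are simple. A minimal sketch with made-up pool and device names (and note that zil_disable is an unsupported, test-only tunable):

  # attach a dedicated slog device to an existing pool
  zpool add tank log c4t0d0
  # swap the slog for another device, e.g. a RAM-backed file in /tmp (test only!)
  mkfile 2g /tmp/junk
  zpool replace tank c4t0d0 /tmp/junk
  # disable the ZIL entirely for testing; takes effect after a reboot
  echo "set zfs:zil_disable = 1" >> /etc/system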
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Everyone else should be using the Intel X25-E. There's a massive difference between the M and E models, and for a slog it's IOPS and low latency that you need. I've heard that Sun uses X25-Es, but I'm sure the original reports had them using STEC. I have a feeling the 2nd-generation X25-Es are going to give STEC a run for their money, though.

If I were you, I'd see if you can get your hands on an X25-E for evaluation purposes. Also, if you're just running NFS over gigabit ethernet, a single X25-E may be enough, but at around 90MB/s sustained performance each, you might need to stripe a few of them to match the speeds your Thumper is capable of.

We're not running an x4500, but we were lucky enough to get our hands on some PCI 512MB NVRAM cards a while back, and I can confirm they make a huge difference to NFS speeds; for our purposes they're identical to ramdisk slog performance.
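Striping or mirroring slogs, as suggested above, is just a matter of how the devices are listed. A sketch with placeholder device names:

  # two SSDs striped as the intent log (ZFS load-balances across them)
  zpool add tank log c2t0d0 c2t1d0
  # or mirrored, if slog redundancy matters more than bandwidth
  zpool add tank log mirror c2t0d0 c2t1d0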
[zfs-discuss] zpool import hangs up forever...
After several errors on a QLogic HBA the pool cache was damaged, and ZFS cannot import the pool. There is no disk or CPU activity during the import...

# uname -a
SunOS orion 5.11 snv_111b i86pc i386 i86pc
# zpool import
  pool: data1
    id: 6305414271646982336
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:
        data1      ONLINE
          c14t0d0  ONLINE

and after

# zpool import -f data1

the terminal waits forever. Likewise, after

# zdb -e data1
Uberblock
        magic = 00bab10c
        version = 6
        txg = 2682808
        guid_sum = 14250651627001887594
        timestamp = 1247866318 UTC = Sat Jul 18 01:31:58 2009
Dataset mos [META], ID 0, cr_txg 4, 27.1M, 3050 objects
Dataset data1 [ZPL], ID 5, cr_txg 4, 5.74T, 52987 objects

the terminal waits forever too, and there are no helpful messages in the system log. I am out of ideas on how to recover this pool; any help will be greatly appreciated...
Re: [zfs-discuss] zpool import hangs up forever...
On 29.07.09 13:04, Pavel Kovalenko wrote:
> after several errors on QLogic HBA pool cache was damaged and zfs cannot import pool, there is no disk or cpu activity during import...
> # zpool import
>   pool: data1
>     id: 6305414271646982336
>  state: ONLINE
> status: The pool was last accessed by another system.
> action: The pool can be imported using its name or numeric identifier and the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>         data1      ONLINE
>           c14t0d0  ONLINE
> and after
> # zpool import -f data1
> terminal still waiting forever.

Try using

echo "0t<pid of zpool>::pid2proc|::walk thread|::findstack -v" | mdb -k

to find out what it is doing. Also see the fmdump -eV output for fresh error reports from ZFS.

> also after
> # zdb -e data1
> [uberblock and dataset summary as above]
> terminal still waiting forever too, and there are no helpful messages in system log,

Does zdb -e -t 2682807 data1 make any difference?

victor
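Spelled out, that diagnostic pipeline looks like this (the PID below is made up for illustration):

  # find the PID of the stuck import
  pgrep -lf "zpool import"       # suppose it prints: 1234 zpool import -f data1
  # walk the kernel stacks of all its threads
  echo "0t1234::pid2proc|::walk thread|::findstack -v" | mdb -k
  # and check FMA for recent ZFS error telemetry
  fmdump -eV | more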
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Hi James,

I'll not reply inline, since the forum software is completely munging your post.

On the X25-E: I believe there is a cache, and it's not backed up. While I haven't tested it, I would expect the X25-E to have the cache turned off while used as a ZIL. The 2nd-generation X25-E announced by Intel does have 'safe storage', as they term it. I believe it has more cache, a faster write speed, and is able to guarantee that the contents of the cache will always make it to stable storage. My guess would be that since it's designed for the server market, the cache on the current X25-E is irrelevant: the device is going to honor flush requests and the ZIL will be stable. I suspect the X25-E G2 will ignore flush requests, with Intel's engineers confident that the data in the cache is safe.

The NVRAM card we're using is an MM-5425, identical to the one used in the famous 'blog on slogs'; I was lucky to get my hands on a pair and some drivers :-)

I think the RAID controller approach is a nice idea too, and should work just as well. I'd love an 80GB ioDrive to use as our ZIL, as I think that's the best hardware solution out there right now, but until Fusion-io release Solaris drivers I'm going to have to stick with my 512MB...
Re: [zfs-discuss] Motherboard for home zfs/solaris file server
Hi,

thank you so much for this post. This is exactly what I was looking for. I've been eyeing the M3A76-CM board, but will now look at the 78 and M4A as well.

Actually, not that many Asus M3A, let alone M4A, boards show up yet on the OpenSolaris HCL, so I'd like to encourage everyone to share their hardware experience by clicking on the "submit hardware" link on: http://www.sun.com/bigadmin/hcl/data/os/

I've done it a couple of times, and it's really just a matter of 5-10 minutes in which you can help others know whether a certain component works, or whether a special driver or /etc/driver_aliases setting is required.

I'm also interested in getting the power consumption down. Right now I have the Athlon X2 5050e (45W TDP) on my list, but I'd also like to know more about the Athlon II X2 250 and whether it has better potential for power savings.

Neal, the M3A78 seems to have a Realtek RTL8111/8168B NIC chip. I pulled this off a Gentoo wiki, because strangely this information doesn't show up on the Asus website.

Also, thanks for the CF-to-PATA hint for the root pool mirror. I will try to find fast CF cards to boot from. The performance problems you see when writing may be related to master/slave issues, but I'm not a good enough PC tweaker to back that up.

Cheers,
Constantin

F. Wessels wrote:
> Hi, I'm using Asus M3A78 boards (with the SB700) for OpenSolaris and M2A* boards (with the SB600) for Linux, some of them with 4x1GB and others with 4x2GB of ECC memory. ECC faults will be detected and reported; I tested it with a small tungsten light. By moving the light source slowly towards the memory banks you heat them up in a controlled way, and at a certain point bit flips will occur.
> I recommend you go for an M4A board, since they support up to 16GB. I don't know if you can run OpenSolaris without a video card after installation; I think you can disable "halt on no video card" in the BIOS, but Simon Breden had some trouble with that, see his home server blog. You can go for one of the three M4A boards with a 780G onboard; those will give you 2 PCIe x16 connectors. I don't think the onboard NIC is supported. I always put an Intel (e1000) in, just to prevent any trouble.
> I don't have any trouble with the SB700 in AHCI mode. Hotplugging works like a charm, and transferring a couple of GBs over eSATA takes considerably less time than via USB. I have a PATA-to-dual-CF adapter and two industrial 16GB CF cards as a mirrored root pool. It takes forever to install Nevada, at least 14 hours; I suspect the CF cards lack caches. But I don't update that regularly, and am still on snv_104. I also have 2 mirrors and a hot spare; the sixth port is an eSATA port I use to transfer large amounts of data. This system consumes about 73 watts idle and 82 under I/O load (5 disks, a separate NIC, 8GB RAM and a BE-2400, all using just 73 watts!!!).
> Please note that frequency scaling is only supported on the K10 architecture, and don't expect too much power saving from it: a lower voltage yields far greater savings than a lower frequency. In September I'll do a post about the aforementioned M4A boards and an LSI SAS controller in one of the PCIe x16 slots.

-- 
Constantin Gonzalez           Sun Microsystems GmbH, Germany
Principal Field Technologist  http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91    http://google.com/search?q=constantin+gonzalez
Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
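To check whether a NIC such as that Realtek is bound to a driver, the /etc/driver_aliases dance looks roughly like this. The PCI ID and the rge binding below are assumptions; verify the ID with prtconf on your own box:

  # what compatible names does the card report?
  prtconf -pv | grep -i pci10ec
  # is a driver already claiming that ID?
  grep "pci10ec,8168" /etc/driver_aliases
  # if not, bind the ID to the Realtek GbE driver
  update_drv -a -i '"pci10ec,8168"' rge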
Re: [zfs-discuss] zpool import hangs up forever...
fortunately, after several hours the terminal came back --

# zdb -e data1
Uberblock
        magic = 00bab10c
        version = 6
        txg = 2682808
        guid_sum = 14250651627001887594
        timestamp = 1247866318 UTC = Sat Jul 18 01:31:58 2009
Dataset mos [META], ID 0, cr_txg 4, 27.1M, 3050 objects
Dataset data1 [ZPL], ID 5, cr_txg 4, 5.74T, 52987 objects

                    capacity   operations   bandwidth  ---- errors ----
description       used avail  read  write  read write  read write cksum
data1            5.74T 6.99T   772      0 96.0M     0     0     0    91
/dev/dsk/c14t0d0 5.74T 6.99T   772      0 96.0M     0     0     0   223
#

i've tried to run zdb -e -t 2682807 data1, and

# echo 0t::pid2proc|::walk thread|::findstack -v | mdb -k

shows

stack pointer for thread fbc2cca0: fbc4d980
[ fbc4d980 _resume_from_idle+0xf1() ]
  fbc4d9b0 swtch+0x147()
  fbc4da40 sched+0x3fd()
  fbc4da70 main+0x437()
  fbc4da80 _locore_start+0x92()

and fmdump -eV shows checksum errors, such as

Jul 28 2009 11:17:35.386268381 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x1baa23c52ce01c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x578154df5f3260c0
                vdev = 0x6e4327476e17daaa
        (end detector)
        pool = data1
        pool_guid = 0x578154df5f3260c0
        pool_context = 2
        pool_failmode = wait
        vdev_guid = 0x6e4327476e17daaa
        vdev_type = disk
        vdev_path = /dev/dsk/c14t0d0p0
        vdev_devid = id1,s...@n2661000612646364/q
        parent_guid = 0x578154df5f3260c0
        parent_type = root
        zio_err = 50
        zio_offset = 0x2313d58000
        zio_size = 0x4000
        zio_objset = 0x0
        zio_object = 0xc
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4a6ea60f 0x1705fcdd

and a second, nearly identical ereport at Jul 28 2009 11:17:35.386268179, differing only in zio_offset = 0x5c516eac000 and __tod = 0x4a6ea60f 0x1705fc13.

Can I hope that some data will be recovered after several more hours with the zpool import -f data1 command?
[zfs-discuss] resizing zpools by growing LUN
Hi all,

I need to know whether it is possible to expand the capacity of a zpool without loss of data, by growing the LUN (2TB) presented from an HP EVA to a Solaris 10 host.

I know there is a way in Solaris Express Community Edition b117 with the autoexpand property, but I still work with Solaris 10 U7. When will this feature be integrated into Solaris 10? And is there a workaround in the meantime? I have tried with the format tool, without effect.

Thanks for any info.
Jan
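No authoritative answer appears in this thread, but the workaround usually cited for pre-autoexpand releases is an export/relabel/import cycle. A rough, untested sketch (back up first; the pool name is a placeholder):

  # after the array has grown the LUN:
  zpool export tank
  format -e            # select the LUN and write a new label covering the added space
  zpool import tank    # vdev sizes are re-read at import time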
Re: [zfs-discuss] Set New File/Folder ZFS ACLs Automatically through Samba?
Jeff,

On Tue, 28 Jul 2009, Jeff Hulen wrote:
> Do any of you know how to set the default ZFS ACLs for newly created files and folders when those files and folders are created through Samba? I want all new files and folders to inherit only the extended (non-trivial) ACLs that are set on the parent folders. But when a file is created through Samba on the ZFS file system, it gets mode 744 (trivial) added to it. For directories, it gets mode 755 added to it.
>
> I've tried everything I could find and think of:
> 1.) Setting a umask.
> 2.) Editing /etc/sfw/smb.conf 'force create mode' and 'force directory mode', then `svcadm restart samba`.
> 3.) Adding trivial inheritable ACLs to the parent folder.
>
> Changes 1 and 2 had no effect. With number 3 I got folders to effectively do what I want, but not files. I set the ACLs of the parent to:
>
> drwx--+ 24 AD+administrator AD+records 2132 Jul 28 12:01 records/
>     user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
>     user:AD+administrator:rwxpdDaARWcCos:--:allow
>     group:AD+records:rwxpd-aARWc--s:fdi---:allow
>     group:AD+records:rwxpd-aARWc--s:--:allow
>     group:AD+release:r-x---a-R-c---:--:allow
>     owner@:rwxp---A-W-Co-:fd:allow
>     group@:rwxp--:fd:deny
>     everyone@:rwxp---A-W-Co-:fd:deny
>
> Then new directories and files get created like this from a Windows workstation connected to the server:
>
> drwx--+ 2 AD+testuser AD+domain users 2 Jul 28 12:01 test
>     user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
>     user:AD+administrator:rwxpdDaARWcCos:--:allow
>     group:AD+records:rwxpd-aARWc--s:fdi---:allow
>     group:AD+records:rwxpd-aARWc--s:--:allow
>     owner@:rwxp---A-W-Co-:fdi---:allow
>     owner@:---A-W-Co-:--:allow
>     group@:rwxp--:fdi---:deny
>     group@:--:--:deny
>     everyone@:rwxp---A-W-Co-:fdi---:deny
>     everyone@:---A-W-Co-:--:deny
>     owner@:--:--:deny
>     owner@:rwxp---A-W-Co-:--:allow
>     group@:-w-p--:--:deny
>     group@:r-x---:--:allow
>     everyone@:-w-p---A-W-Co-:--:deny
>     everyone@:r-x---a-R-c--s:--:allow
>
> -rwxr--r--+ 1 AD+testuser AD+domain users 0 Jul 28 12:01 test.txt
>     user:AD+administrator:rwxpdDaARWcCos:--:allow
>     group:AD+records:rwxpd-aARWc--s:--:allow
>     owner@:---A-W-Co-:--:allow
>     group@:--:--:deny
>     everyone@:---A-W-Co-:--:deny
>     owner@:--:--:deny
>     owner@:rwxp---A-W-Co-:--:allow
>     group@:-wxp--:--:deny
>     group@:r-:--:allow
>     everyone@:-wxp---A-W-Co-:--:deny
>     everyone@:r-a-R-c--s:--:allow
>
> I need group AD+release to have read-only access to only specific files within records. I could set that up, but any new files or folders that are created would be viewable by AD+release, and that would not be acceptable.
>
> Do any of you know how to set the Samba file/folder creation ACLs on ZFS file systems? Or do you have something I could try?

The following setup works quite well for us with a self-compiled Samba 3.0.34 taken from the SFW source tree. The only problem we ran into was that Microsoft Office sometimes seems to set permissions on files in an, at least for me, unpredictable way.

smb.conf:
...
[data]
    ;
    ; public fileserver share
    ;
    path = /smb/data
    comment = user and group directories
    public = no
    writable = yes
    browseable = yes
    vfs objects = zfsacl
    inherit permissions = yes
    inherit acls = yes
    store dos attributes = yes
    hide dot files = no
    nfs4: mode = simple
    nfs4: acedup = merge
    zfsacl: acesort = dontcare
    ; delete readonly = yes
    ;
    ; set to no else Microsoft Excel/Word cause permission problems
    ;
    map archive = no
    map hidden = no
    map read only = no
    map system = no

Some ZFS properties of the top-level zfs, which get inherited by the children:

NAME  PROPERTY         VALUE       SOURCE
smb   snapdir          visible     local
smb   aclmode          groupmask   default
smb   aclinherit       restricted  default
smb   casesensitivity  sensitive   -

Now, for every group directory reflecting a particular department, such as kizinfra, we set permissions as

# ls -ldV kizinfra
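For experimenting with the inheritable-ACL approach from point 3, the Solaris NFSv4 ACL syntax on chmod looks like this (the group name and permission set here are illustrative only):

  # grant a group read-only access that new files and directories inherit
  chmod A+group:AD+release:read_data/read_attributes/read_acl:file_inherit/dir_inherit:allow records
  # inspect the resulting ACL
  ls -dV records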
Re: [zfs-discuss] zpool import hangs up forever...
On 29.07.09 14:42, Pavel Kovalenko wrote:
> fortunately, after several hours terminal went back --
> # zdb -e data1
> [uberblock and dataset summary]
>                     capacity   operations   bandwidth  ---- errors ----
> description       used avail  read  write  read write  read write cksum
> data1            5.74T 6.99T   772      0 96.0M     0     0     0    91
> /dev/dsk/c14t0d0 5.74T 6.99T   772      0 96.0M     0     0     0   223

So we know that there are some checksum errors there, but at least zdb was able to open the pool in read-only mode.

> i've tried to run zdb -e -t 2682807 data1 and
> # echo 0t::pid2proc|::walk thread|::findstack -v | mdb -k

This is wrong: you need to put the PID of the 'zpool import data1' process right after '0t'.

> and fmdump -eV shows checksum errors, such as
> Jul 28 2009 11:17:35.386268381 ereport.fs.zfs.checksum
> [...]
>         zio_err = 50
>         zio_offset = 0x2313d58000
>         zio_size = 0x4000
>         zio_objset = 0x0
>         zio_object = 0xc
>         zio_level = 0
>         zio_blkid = 0x0

This tells us that object 0xc in the meta objset (objset 0x0) is corrupted. To get more details you can do the following:

zdb -e - data1
zdb -e -bbcs data1

victor
Re: [zfs-discuss] zpool import hangs up forever...
I recently noticed that importing larger pools occupied by large amounts of data can keep zpool import busy for several hours, with zpool iostat showing only some random reads now and then and iostat -xen showing quite busy disk usage. It's almost as if it goes through every bit in the pool before it finishes. Somebody said that zpool import got faster in snv_118, but I don't have real information on that yet.

Yours
Markus Kovero

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Victor Latushkin
Sent: 29 July 2009 14:05
To: Pavel Kovalenko
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zpool import hangs up forever...

On 29.07.09 14:42, Pavel Kovalenko wrote:
> fortunately, after several hours terminal went back [...]

So we know that there are some checksum errors there, but at least zdb was able to open the pool in read-only mode.
[...]
This tells us that object 0xc in the meta objset (objset 0x0) is corrupted. To get more details you can do the following:

zdb -e - data1
zdb -e -bbcs data1

victor
Re: [zfs-discuss] zpool import hangs up forever...
Victor, after

# ps -ef | grep zdb | grep -v grep
    root  3281  1683   1 14:22:09 pts/2  8:57 zdb -e -t 2682807 data1

I inserted the PID after 0t:

# echo 0t3281::pid2proc|::walk thread|::findstack -v | mdb -k

and got a couple of records:

stack pointer for thread ff02017ad700: ff0008ce8a50
[ ff0008ce8a50 _resume_from_idle+0xf1() ]
  ff0008ce8a80 swtch+0x147()
  ff0008ce8ac0 sema_p+0x1d9(ff01df7a9b70)
  ff0008ce8af0 biowait+0x76(ff01df7a9ab0)
  ff0008ce8bf0 default_physio+0x3d3(f7a148f0, 0, d70207, 40, f7a14130, ff0008ce8e80)
  ff0008ce8c30 physio+0x25(f7a148f0, 0, d70207, 40, f7a14130, ff0008ce8e80)
  ff0008ce8c80 sdread+0x150(d70207, ff0008ce8e80, ff01e94927c8)
  ff0008ce8cb0 cdev_read+0x3d(d70207, ff0008ce8e80, ff01e94927c8)
  ff0008ce8d30 spec_read+0x270(ff020de0aa00, ff0008ce8e80, 0, ff01e94927c8, 0)
  ff0008ce8da0 fop_read+0x6b(ff020de0aa00, ff0008ce8e80, 0, ff01e94927c8, 0)
  ff0008ce8f00 pread+0x22c(8, 3e98000, 2, 291cf9a)
  ff0008ce8f10 sys_syscall+0x17b()

stack pointer for thread ff01e92dd8a0: ff000904cd00
[ ff000904cd00 _resume_from_idle+0xf1() ]
  ff000904cd30 swtch+0x147()
  ff000904cd90 cv_wait_sig_swap_core+0x170(ff01e92dda76, ff01e92dda78, 0)
  ff000904cdb0 cv_wait_sig_swap+0x18(ff01e92dda76, ff01e92dda78)
  ff000904ce20 cv_waituntil_sig+0x135(ff01e92dda76, ff01e92dda78, 0, 0)
  ff000904cec0 lwp_park+0x157(0, 0)
  ff000904cf00 syslwp_park+0x31(0, 0, 0)
  ff000904cf10 sys_syscall+0x17b()

stack pointer for thread ff01e92c4e60: ff0008ceed00
[ ff0008ceed00 _resume_from_idle+0xf1() ]
  ff0008ceed30 swtch+0x147()
  ff0008ceed90 cv_wait_sig_swap_core+0x170(ff01e92c5036, ff01e92c5038, 0)
  ff0008ceedb0 cv_wait_sig_swap+0x18(ff01e92c5036, ff01e92c5038)
  ff0008ceee20 cv_waituntil_sig+0x135(ff01e92c5036, ff01e92c5038, 0, 0)
  ff0008ceeec0 lwp_park+0x157(0, 0)
  ff0008ceef00 syslwp_park+0x31(0, 0, 0)
  ff0008ceef10 sys_syscall+0x17b()

and

# zdb -e - data1 > zdb-e-_list.txt

lists a lot of the data objects that were on the pool:

# ls -la zdb-e-_list.txt
-rw-r--r-- 1 root root 28863781 Jul 28 21:41 zdb-e-_list.txt

I can provide more detailed information by email (pkovalenko at mtv.ru), as I don't know any ZFS specialist who can help me recover the pool.
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
On Tue, 28 Jul 2009, Glen Gunselman wrote:
> # zpool list
> NAME     SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> zpool1  40.8T   176K  40.8T   0%  ONLINE  -
> # zfs list
> NAME    USED  AVAIL  REFER  MOUNTPOINT
> zpool1  364K  32.1T  28.8K  /zpool1

This is normal, and admittedly somewhat confusing (see CR 6308817). Even if you had not created the additional zfs datasets, it still would have listed 40T and 32T.

Here's an example using five 1G disks in a raidz:

-bash-3.2# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  4.97G   132K  4.97G   0%  ONLINE  -
-bash-3.2# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  98.3K  3.91G  28.8K  /tank

The AVAIL column in the zpool output shows 5G, whereas it shows 4G in the zfs list. The difference is the 1G of parity. If we use raidz2, we'd expect 2G to be used for parity, and this is borne out in a quick test using the same disks:

-bash-3.2# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  4.97G   189K  4.97G   0%  ONLINE  -
-bash-3.2# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   105K  2.91G  32.2K  /tank

Contrast that with a five-way mirror:

-bash-3.2# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  1016M  73.5K  1016M   0%  ONLINE  -
-bash-3.2# zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
tank   69K   984M    18K  /tank

Now they both show the pool capacity to be around 1G.

Regards,
markm
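The pool-creation step isn't shown above; a throwaway way to reproduce the comparison with file-backed vdevs (scratch files only, not for real data) would be:

  # five 1G scratch "disks"
  for i in 1 2 3 4 5; do mkfile 1g /var/tmp/disk$i; done
  zpool create tank raidz /var/tmp/disk[1-5]
  zpool list tank ; zfs list tank
  zpool destroy tank
  # repeat with 'raidz2' or 'mirror' in place of 'raidz' for the other cases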
Re: [zfs-discuss] [indiana-discuss] zfs issues?
On 29/07/2009, at 12:00 AM, James Lever wrote:
> CR 6865661 *HOT* Created, P1 opensolaris/triage-queue zfs scrub rpool causes zpool hang

This bug I logged has been marked as related to CR 6843235, which is fixed in snv_119.

cheers,
James
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
IIRC zpool list includes the parity drives in the disk space calculation and zfs list doesn't. Terabyte drives are really 900-something-GB drives, thanks to that base-2 vs. base-10 confusion the HD manufacturers introduced. Using that 900GB figure I get to both 40TB and 32TB, with and without parity drives. Spares aren't counted.

I see format/verify shows the disk size as 931GB:

Volume name        = <        >
ascii name         = <ATA-HITACHI HUA7210S-A90A-931.51GB>
bytes/sector       = 512
sectors            = 1953525166
accessible sectors = 1953525133

Part      Tag    Flag     First Sector       Size       Last Sector
  0        usr    wm             256      931.51GB      1953508749
  1 unassigned    wm               0         0                   0
  2 unassigned    wm               0         0                   0
  3 unassigned    wm               0         0                   0
  4 unassigned    wm               0         0                   0
  5 unassigned    wm               0         0                   0
  6 unassigned    wm               0         0                   0
  8   reserved    wm      1953508750        8.00MB      1953525133

I totally overlooked the count-the-spares/don't-count-the-spares issue. When they (the manufacturers) round up and then multiply by 48, the difference between what the sales brochure shows and what you end up with becomes significant. There was a time when manufacturers knew about base-2, but those days are long gone.

Thanks for the reply,
Glen
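As a sanity check on that 931.51GB figure, the sector arithmetic works out like this:

  # 1953525166 sectors * 512 bytes = 1,000,204,884,992 bytes, the brochure "1TB"
  echo '1953525166 * 512 / (1024^3)' | bc    # prints 931, i.e. ~931 binary GB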
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
> Here is the output from my J4500 with 48 x 1TB disks. It is almost exactly the same configuration as yours. This is used for NetBackup. As Mario just pointed out, zpool list includes the parity drives in the space calculation whereas zfs list doesn't.
> [r...@xxx /]# zpool status

Scott,

Thanks for the sample zpool status output. I will be using the storage for NetBackup also. (I am booting the X4500 from a SAN, a 6140, and using an SL48 w/2 LTO4 drives.)

Glen
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
> This is normal, and admittedly somewhat confusing (see CR 6308817). Even if you had not created the additional zfs datasets, it still would have listed 40T and 32T.

Mark,

Thanks for the examples. Where would I see CR 6308817? My usual search tools aren't finding it.

Glen
[zfs-discuss] feature proposal
What do you think about the following feature?

A "subdirectory is automatically a new filesystem" property: an administrator turns on this magic property of a filesystem, and after that every mkdir *in the root* of that filesystem creates a new filesystem. The new filesystems have default/inherited properties, except for the magic property, which is off.

Right now I see this as being mostly useful for /home. The main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user. But I am sure that there could be other interesting uses for this.

-- 
Andriy Gapon
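In other words, the property would automate what an administrator types by hand today; a sketch with illustrative dataset names:

  # today: a separate filesystem per user has to be created explicitly
  zfs create -o quota=10g rpool/export/home/alice
  # with the proposed property set on rpool/export/home,
  # 'mkdir /export/home/alice' would do the equivalent automatically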
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, Andriy Gapon wrote:
> Subdirectory is automatically a new filesystem property - an administrator turns on this magic property of a filesystem, after that every mkdir *in the root* of that filesystem creates a new filesystem.

It's a nice idea, but zfs filesystems consume memory and have overhead. This would make it trivial for a non-root user (assuming they have permissions) to crush the host under the weight of .. mkdir.

$ mkdir -p waste/resources/now/waste/resources/now/waste/resources/now

(now make that much longer and put it in a loop)

Also, will rmdir call zfs destroy? Snapshots interacting with that could be somewhat unpredictable. What about rm -rf?

It'd either require major surgery to userland tools, including every single program that might want to create a directory, or major surgery to the kernel. The former is unworkable, the latter .. scary.

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
On Wed, 29 Jul 2009, Glen Gunselman wrote:
> Where would I see CR 6308817? My usual search tools aren't finding it.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6308817

Regards,
markm
Re: [zfs-discuss] feature proposal
Andriy Gapon wrote:
> What do you think about the following feature?
> Subdirectory is automatically a new filesystem property - an administrator turns on this magic property of a filesystem, after that every mkdir *in the root* of that filesystem creates a new filesystem. The new filesystems have default/inherited properties except for the magic property which is off.

This has been brought up before, and I thought there was an open CR for it, but I can't find it.

> Right now I see this as being mostly useful for /home. Main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user. But I am sure that there could be other interesting uses for this.

A good use case. Another good one is a shared build machine, which is similar to the home dir case.

-- 
Darren J Moffat
Re: [zfs-discuss] feature proposal
On Wed, July 29, 2009 10:24, Andre van Eyssen wrote:
> It'd either require major surgery to userland tools, including every single program that might want to create a directory, or major surgery to the kernel. The former is unworkable, the latter .. scary.

How about: add a flag (-Z?) to useradd(1M) and usermod(1M) so that if base_dir is on ZFS, then the user's homedir is created as a new file system (assuming -m).

Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?
Re: [zfs-discuss] feature proposal
David Magda wrote:
> Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?

The st_fstype field of struct stat.

-- 
Darren J Moffat
Re: [zfs-discuss] feature proposal
on 29/07/2009 17:24 Andre van Eyssen said the following:
> It's a nice idea, but zfs filesystems consume memory and have overhead. This would make it trivial for a non-root user (assuming they have permissions) to crush the host under the weight of .. mkdir.

Well, I specifically stated that this property should not be recursive, i.e. it should work only in the root of a filesystem. When setting this property on a filesystem, an administrator should carefully set permissions to make sure that only trusted entities can create directories there.

The 'rmdir' question requires some thinking; my first reaction is that it should do a zfs destroy...

-- 
Andriy Gapon
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, David Magda wrote:
> Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?

statvfs(2)

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, David Magda wrote:
> Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?

Yes, if it's local. Just use df -n $path and it'll spit out the filesystem type. If it's mounted over NFS, it'll just say something like nfs or autofs, though.

Regards,
markm
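With the stock Solaris df, that should look roughly like the following (sample output, not captured from a real box):

  $ /usr/bin/df -n /tank
  /tank              : zfs
  $ /usr/bin/df -n /home/joe
  /home/joe          : nfs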
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, Andriy Gapon wrote:
> Well, I specifically stated that this property should not be recursive, i.e. it should work only in the root of a filesystem. When setting this property on a filesystem, an administrator should carefully set permissions to make sure that only trusted entities can create directories there.

Even limited to the root of a filesystem, it still gives a user the ability to consume resources rapidly. While I appreciate that it would be restricted by permissions, I can think of a number of usage cases where it could suddenly tank a host. One that might pop up, for example, would be cache spools, which often contain *many* directories. One runaway and kaboom.

We generally use hosts now with plenty of RAM, and the per-filesystem overhead for ZFS doesn't cause much concern. However, on a scratch box, try creating a big stack of filesystems: you can end up with a pool that consumes so much memory you can't import it!

> 'rmdir' question requires some thinking, my first reaction is it should do zfs destroy...

.. which will fail if there's a snapshot, for example. The problem seems to be reasonably complex, compounded by the fact that many programs that create or remove directories do so directly, not by calling externals that would be ZFS-aware.

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
[zfs-discuss] Strange errors in zpool scrub, Solaris 10u6 x86_64
I did a zpool scrub recently, and while it was running it reported errors and warned about restoring from backup. When the scrub completes, though, it reports finishing with 0 errors. On the next scrub, other errors are reported, in different files. iostat -xne does report a few errors (1 s/w error on each of the 2 mirrored drives, and 2 h/w errors on one of the drives).

Any ideas? Is it a cosmetic problem, or a creeping, hidden bug in my hardware, meaning I should go about replacing something somewhere? I don't see such behavior on any other servers around...

Thanks for ideas,
//Jim

zpool status -v while the scrub is running:

  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h1m, 29.45% done, 0h2m to go
config:

        NAME          STATE   READ WRITE CKSUM
        rpool         ONLINE     0     0     0
          mirror      ONLINE     0     0     0
            c3t2d0s0  ONLINE     0     0     0
            c3t3d0s0  ONLINE     0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk/c3t2d0s0
        //dev/dsk/c3t3d0s0

It has previously complained about //dev/dsk having problems, although both the directory and the device files are perfectly accessible.

zpool status -v when the scrub has finished:

  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h4m with 0 errors on Wed Jul 29 18:23:43 2009
config:

        NAME          STATE   READ WRITE CKSUM
        rpool         ONLINE     0     0     0
          mirror      ONLINE     0     0     0
            c3t2d0s0  ONLINE     0     0     0
            c3t3d0s0  ONLINE     0     0     0

errors: No known data errors
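No resolution appears in this message; a generic triage sequence for this kind of situation (standard commands, nothing specific to Jim's machine) would be:

  # look for driver/transport errors logged around the scrub window
  fmdump -eV | more
  iostat -xne
  # clear the counters, scrub again, and see whether the errors move or persist
  zpool clear rpool
  zpool scrub rpool
  zpool status -v rpool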
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, Mark J Musante wrote:
> Yes, if it's local. Just use df -n $path and it'll spit out the filesystem type. If it's mounted over NFS, it'll just say something like nfs or autofs, though.

$ df -n /opt
Filesystem             kbytes     used      avail  capacity  Mounted on
/dev/md/dsk/d24      33563061 11252547   21974884       34%  /opt
$ df -n /sata750
Filesystem             kbytes     used      avail  capacity  Mounted on
sata750             2873622528       77  322671575        1%  /sata750

Not giving the filesystem type. It's easy to spot the zfs by the lack of a recognisable device path, though.

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
[zfs-discuss] LVM and ZFS
I'm curious whether there are any potential problems with using LVM metadevices as ZFS zpool targets. I have a couple of situations where using a device directly with ZFS causes errors on the console (shown below) and lots of stalled I/O, but as soon as I wrap that device inside an LVM metadevice and then use it in the ZFS zpool, things work perfectly fine and smoothly (no stalls).

Situation 1 is when trying to use Intel X25-E or X25-M SSD disks in a Sun X4240 server with the LSI SAS controller. I never could get things to run without errors, no matter what (I tried multiple LSI controllers and multiple SSD disks):

Jul 8 09:43:31 merope scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@f/pci1000,3...@0 (mpt0):
Jul 8 09:43:31 merope    Log info 31126000 received for target 15.
Jul 8 09:43:31 merope    scsi_status=0, ioc_status=804b, scsi_state=c
Jul 8 09:43:31 merope scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@f/pci1000,3...@0 (mpt0):
Jul 8 09:43:31 merope    Log info 31126000 received for target 15.
Jul 8 09:43:31 merope    scsi_status=0, ioc_status=804b, scsi_state=c
Jul 8 09:43:31 merope scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@f/pci1000,3...@0/s...@f,0 (sd32):
Jul 8 09:43:31 merope    Error for Command: write    Error Level: Retryable
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  Requested Block: 64256  Error Block: 64256
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  Vendor: ATA  Serial Number: CVEM8493 00BM
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Situation 2 is when I installed an X25-E in an X4500 Thumper. Here I didn't see any errors on the console of the server, but performance would at regular intervals drop to zero (it felt the same as the LSI case above, just without the console errors). (In situation 1, things would work perfectly fine when I was using an Adaptec controller instead.)

Anyway, when I put the 4GB partition of the SSD disk that I was using for testing inside a simple LVM metadevice, all errors vanished and performance increased many times over, with no hiccups.

But I wonder... Is there anything in a setup like this that might be dangerous, something that might come back and bite me in the future? LVM (disksuite) is really mature technology and something I've been using without problems on many servers for many years, so I think it can be trusted, but anyway...?

(I use that SSD-partition-in-an-LVM-metadevice as a SLOG device for the ZFS zpools on those servers, and performance is now really *really* good.)
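For anyone wanting to try the same wrapping trick, the SVM/disksuite steps are roughly as follows (slice names are placeholders, and state database replicas must already exist or be created first):

  # state database replicas, once per host
  metadb -a -f -c 3 c0t0d0s7
  # wrap the SSD slice in a simple one-way concat/stripe metadevice
  metainit d100 1 1 c1t15d0s0
  # hand the metadevice to ZFS as the slog
  zpool add tank log /dev/md/dsk/d100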
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
On 29.07.09 16:59, Mark J Musante wrote:
> [...]
> Contrast that with a five-way mirror:
> -bash-3.2# zpool list
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> tank  1016M  73.5K  1016M   0%  ONLINE  -
> -bash-3.2# zfs list
> NAME  USED  AVAIL  REFER  MOUNTPOINT
> tank   69K   984M    18K  /tank

The mirror case shows one more thing worth mentioning: here the difference between the available space reported by zpool and by zfs is explained by a reservation that ZFS sets aside for internal purposes. It is 32MB or 1/64 of the pool capacity, whichever is bigger (32MB in this example). The same reservation applies in the RAID-Z case as well, though there it is harder to see ;-)

victor
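Checking that against the mirror numbers above, as a quick worked example:

  echo '1016 / 64' | bc    # prints 15 (MB), under the 32MB floor, so 32MB is reserved
  echo '1016 - 32' | bc    # prints 984 (MB), matching the AVAIL that zfs list reports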
Re: [zfs-discuss] feature proposal
Andriy Gapon wrote:
> Right now I see this as being mostly useful for /home. Main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user.

But now that quotas are working properly, why would you want to continue the hack of one FS per user? I'm seriously curious here. In my view it's just more work, a more cluttered zfs list and share output, and a lot less straightforward and simple, too.

Why bother? What's the benefit?

-Kyle
Re: [zfs-discuss] feature proposal
Andre van Eyssen wrote:
> Even limited to the root of a filesystem, it still gives a user the ability to consume resources rapidly. While I appreciate that it would be restricted by permissions, I can think of a number of usage cases where it could suddenly tank a host. One use that might pop up, for example, would be cache spools, which often contain *many* directories. One runaway and kaboom.

No worse than any other use case; if you can create datasets, you can do that anyway. If you aren't running with restrictive resource controls, you can tank the host in so many easier ways. Note that the proposal is that this be off by default, something you have to explicitly enable.

> The problem seems to be reasonably complex, compounded by the fact that many programs that create or remove directories do so directly, not by calling externals that would be ZFS-aware.

I don't understand how you came to that conclusion. This wouldn't be implemented in /usr/bin/mkdir but in the ZFS implementation of the mkdir(2) syscall.

-- 
Darren J Moffat
Re: [zfs-discuss] feature proposal
Kyle McDonald wrote:
> But now that quotas are working properly, why would you want to continue the hack of one FS per user?

A hack? They are different usage cases!

> Why bother? What's the benefit?

The benefit is that users can control their own snapshot policy; they can create and destroy their own sub-datasets, send and recv them, etc. We can also delegate specific properties to users if we want. This is exactly how I have the builds area set up on our ONNV build machines for the Solaris security team. Sure, the output of zfs list is long, but I don't care about that.

When encryption comes along, having a separate filesystem per user is a useful deployment case, because it means we can deploy with separate keys for each user (granted, maybe less interesting if they only access their home dir over NFS/CIFS, but still useful). I have a prototype PAM module that uses the user's login password as the ZFS dataset wrapping key and keeps that in sync with the user's login password on password change.

-- 
Darren J Moffat
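The delegation Darren mentions is done with zfs allow; a plausible per-user setup (user and dataset names are illustrative) looks like:

  # let a user manage snapshots and child datasets under her home
  zfs allow alice create,destroy,mount,snapshot,send,receive rpool/home/alice
  # individual properties can be delegated too
  zfs allow alice compression,quota rpool/home/alice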
Re: [zfs-discuss] zpool export taking hours
fyleow wrote:
> fyleow wrote:
>> I have a raidz1 tank of 5x 640GB hard drives on my newly installed OpenSolaris 2009.06 system. I did a zpool export tank, and the process has been running for 3 hours now, taking 100% CPU. When I do a zfs list, tank is still shown as mounted. What's going on here? Should it really be taking this long?
>>
>> $ zfs list tank
>> NAME   USED  AVAIL  REFER  MOUNTPOINT
>> tank  1.10T  1.19T  36.7K  /tank
>> $ zpool status tank
>>   pool: tank
>>  state: ONLINE
>>  scrub: none requested
>> config:
>>         NAME        STATE   READ WRITE CKSUM
>>         tank        ONLINE     0     0     0
>>           raidz1    ONLINE     0     0     0
>>             c7t0d0  ONLINE     0     0     0
>>             c7t1d0  ONLINE     0     0     0
>>             c7t2d0  ONLINE     0     0     0
>>             c7t3d0  ONLINE     0     0     0
>>             c7t4d0  ONLINE     0     0     0
>> errors: No known data errors
>
>> Can you run the following command and post the output:
>> # echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
>> Thanks, George
>
> Here's what I get:
>
> # echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
> stack pointer for thread ff00f717b020: ff0003684cf0
>   ff0003684d60 restore_mstate+0x129(fb8568ee)

It might be best to generate a live crash dump so we can see what might be hanging. You can also try running the command above multiple times, and even run 'pstack <pid of zpool>' to see if we get additional information.

Thanks,
George
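On Solaris, a live crash dump is captured with savecore (assuming dumpadm already has a dump device configured):

  # capture a kernel dump from the running system, without panicking it
  savecore -L
  # the dump directory and device are shown by:
  dumpadm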
Re: [zfs-discuss] feature proposal
On Wed, Jul 29, 2009 at 03:35:06PM +0100, Darren J Moffat wrote:
> This has been brought up before and I thought there was an open CR for it but I can't find it.

I'd want this to be something one could set per-directory, and I'd want it to not be inheritable (or to have control over whether it is inheritable).

Nico
--
Re: [zfs-discuss] feature proposal
Darren J Moffat wrote:
> The benefit is that users can control their own snapshot policy; they can create and destroy their own sub-datasets, send and recv them, etc. We can also delegate specific properties to users if we want. This is exactly how I have the builds area set up on our ONNV build machines for the Solaris security team. Sure, the output of zfs list is long, but I don't care about that.

I can imagine a use for builds. One FS per build? I don't know. But why link it to mkdir? Why not make the build scripts do the zfs create outright?

> When encryption comes along, having a separate filesystem per user is a useful deployment case, because it means we can deploy with separate keys for each user (granted, maybe less interesting if they only access their home dir over NFS/CIFS, but still useful). I have a prototype PAM module that uses the user's login password as the ZFS dataset wrapping key and keeps that in sync with the user's login password on password change.

Encryption is an interesting case. User snapshots I'd need to think about more. Couldn't the other properties be delegated on directories?

Maybe I'm just getting old. ;) I still think having the zpool not automatically include a filesystem, and having ZFS containers, was a useful concept. And I still use share (and now sharemgr) to manage my shares, not ZFS share. Oh well. :)

-Kyle
Re: [zfs-discuss] feature proposal
I can think of a different feature where this would be useful - storing virtual machines. With an automatic 1 fs per folder, each virtual machine would be stored in its own filesystem, allowing for rapid snapshots and instant restores of any machine. One big limitation of zfs for me is that although I can restore an entire filesystem in seconds, restoring any individual folder takes much, much longer, as it's treated as a standard copy.
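A sketch of what per-VM filesystems make possible; the dataset names here are made up:

# zfs snapshot tank/vm/guest01@pre-upgrade       # near-instant, per machine
# zfs rollback tank/vm/guest01@pre-upgrade       # restore just this VM in seconds
# zfs clone tank/vm/guest01@pre-upgrade tank/vm/guest02   # or spin up a copy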
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Wed, 29 Jul 2009, Jorgen Lundman wrote:
> So, it is slower than the CF test. This is disappointing. Everyone else seems to use Intel X25-M, which have a write-speed of 170MB/s (2nd generation) so perhaps that is why it works better for them. It is curious that it is slower than the CF card. Perhaps because it shares with so many other SATA devices?

Something to be aware of is that not all SSDs are the same. In fact, some "faster" SSDs may use a RAM write cache (they all do) and then ignore a cache sync request, while not including the hardware/firmware support to ensure that the data is persisted if there is power loss. Perhaps your fast CF device does that. If so, that would be really bad for zfs if your server was to spontaneously reboot or lose power. This is why you really want a true enterprise-capable SSD device for your slog.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] zfs send/recv syntax
> I apologize for replying in the middle of this thread, but I never saw the initial snapshot syntax of mypool2, which needs to be recursive (zfs snapshot -r mypo...@snap) to snapshot all the datasets in mypool2. Then, use zfs send -R to pick up and restore all the dataset properties. What was the original snapshot syntax?

Cindy,

You figured it out! I forgot the -r :) I don't have the room to try the send locally, so I reran the ssh and it's showing what would get transferred with Ian's syntax. I just ran the following:

zfs send -vR mypo...@snap | ssh j...@host pfexec /usr/sbin/zfs recv -Fdnv mypool/zfsname

Looking at the man page, it doesn't explicitly state the behavior I am noticing, but looking at the switches I can see a _lot_ of traffic going from the sending host to the receiving host. Does the -n just not write it, but still allow it to be sent? The command has not returned...

Thanks everyone!
jlc
Re: [zfs-discuss] feature proposal
On 29.07.09 07:56, Andre van Eyssen wrote:
> On Wed, 29 Jul 2009, Mark J Musante wrote:
>> Yes, if it's local. Just use df -n $path and it'll spit out the filesystem type. If it's mounted over NFS, it'll just say something like nfs or autofs, though.
>
> $ df -n /opt
> Filesystem            kbytes     used      avail  capacity  Mounted on
> /dev/md/dsk/d24     33563061  11252547  21974884       34%  /opt
>
> $ df -n /sata750
> Filesystem            kbytes     used      avail  capacity  Mounted on
> sata750           2873622528        77  322671575        1%  /sata750
>
> Not giving the filesystem type. It's easy to spot the zfs with the lack of a recognisable device path, though.

which df are you using?

Michael
--
Michael Schuster    http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
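Michael's question is presumably because the output above is what plain df prints. For comparison, a sketch of what Solaris /usr/bin/df -n is documented to print - just the mount point and vfstype - with example paths:

$ /usr/bin/df -n /opt
/opt               : ufs
$ /usr/bin/df -n /sata750
/sata750           : zfs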
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
On Jul 28, 2009, at 6:34 PM, Eric D. Mudama wrote:
> On Mon, Jul 27 at 13:50, Richard Elling wrote:
>> On Jul 27, 2009, at 10:27 AM, Eric D. Mudama wrote:
>>> Can *someone* please name a single drive+firmware or RAID controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT commands? Or worse, responds ok when the flush hasn't occurred?
>>
>> two seconds with google shows
>> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=183771&NewLang=en&Hilite=cache+flush
>> Give it up. These things happen. Not much you can do about it, other than design around it.
>> -- richard
>
> That example is windows-specific, and is a software driver, where the data integrity feature must be manually disabled by the end user. The default behavior was always maximum data protection.

I don't think you read the post. It specifically says, "Previous versions of the Promise drivers ignored the flush cache command until system power down." Promise makes RAID controllers and has a firmware fix for this. This is the kind of thing we face: some performance engineer tries to get an edge by assuming there is only one case where cache flush matters. Another 2 seconds with google shows:
http://sunsolve.sun.com/search/document.do?assetkey=1-66-27-1
(interestingly, for this one, fsck also fails)
http://sunsolve.sun.com/search/document.do?assetkey=1-21-103622-06-1
http://forums.seagate.com/stx/board/message?board.id=freeagent&message.id=5060&query.id=3999#M5060

But they also get cache flush code wrong in the opposite direction. A good example of that is the notorious Seagate 1.5 TB disk "stutter" problem. NB, for the most part, vendors do not air their dirty laundry (eg bug reports) on the internet for those without support contracts. If you have a support contract, your search may show many more cases.

> While perhaps analogous at some level, the perpetual "your hardware must be crappy/cheap/not-as-expensive-as-mine" doesn't seem to be a sufficient explanation when things go wrong, like complete loss of a pool.

As I said before, it is a systems engineering problem. If you do your own systems engineering, then you should make sure the components you select work as you expect.
-- richard
[zfs-discuss] Tunable iSCSI timeouts - ZFS over iSCSI fix
Anyone (Ross?) creating ZFS pools over iSCSI connections will want to pay attention to snv_121, which fixes the 3-minute hang after iSCSI disk problems: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=649

Yay!
Re: [zfs-discuss] Tunable iSCSI timeouts - ZFS over iSCSI fix
Yup, somebody pointed that out to me last week and I can't wait :-)

On Wed, Jul 29, 2009 at 7:48 PM, Dave <dave-...@dubkat.com> wrote:
> Anyone (Ross?) creating ZFS pools over iSCSI connections will want to pay attention to snv_121, which fixes the 3-minute hang after iSCSI disk problems: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=649
>
> Yay!
Re: [zfs-discuss] zfs send/recv syntax
Joseph L. Casale wrote:
>> I apologize for replying in the middle of this thread, but I never saw the initial snapshot syntax of mypool2, which needs to be recursive (zfs snapshot -r mypo...@snap) to snapshot all the datasets in mypool2. Then, use zfs send -R to pick up and restore all the dataset properties. What was the original snapshot syntax?
>
> Cindy,
>
> You figured it out! I forgot the -r :) I don't have the room to try the send locally, so I reran the ssh and it's showing what would get transferred with Ian's syntax. I just ran the following:
>
> zfs send -vR mypo...@snap | ssh j...@host pfexec /usr/sbin/zfs recv -Fdnv mypool/zfsname
>
> Looking at the man page, it doesn't explicitly state the behavior I am noticing, but looking at the switches I can see a _lot_ of traffic going from the sending host to the receiving host. Does the -n just not write it, but still allow it to be sent? The command has not returned...

Correct, the sending side will be happily sending into a void. Kill it and re-run without the -n.

-- Ian.
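In other words, recv -n is a dry run on the receiving side only: the full stream is still generated and shipped over ssh, it just never gets written. A sketch of the final form, with made-up names standing in for the obfuscated ones above:

# on the sending host (pool, user, and host names are hypothetical)
zfs snapshot -r mypool2@snap
zfs send -vR mypool2@snap | ssh user@host pfexec /usr/sbin/zfs recv -Fdv mypool/zfsname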
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
Glen Gunselman wrote:
>> Here is the output from my J4500 with 48 x 1 TB disks. It is almost the exact same configuration as yours. This is used for Netbackup. As Mario just pointed out, zpool list includes the parity drive in the space calculation whereas zfs list doesn't.
>>
>> [r...@xxx /]# zpool status
>
> Scott,
>
> Thanks for the sample zpool status output. I will be using the storage for NetBackup, also. (I am booting the X4500 from a SAN - 6140 - and using a SL48 w/2 LTO4 drives.)
>
> Glen

Glen,

If you want any more info about our configuration, drop me a line. It works very well and we have had no issues at all. This system is a T5220 (323 GB RAM) with the 48 TB J4500 connected via SAS. The system also has 3 dual-port fibre channel HBAs feeding 6 LTO4 drives in a 540-slot SL500. The server is 10 gig attached straight to our network core routers and, needless to say, achieves very high throughput. I have seen it pushing the full capacity of the SAS link to the J4500 quite commonly. This is probably the choke point for this system.

/Scott
--
Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006
Manukau City
Auckland
New Zealand

Phone : +64 09 968 7611
Fax : +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
[zfs-discuss] Install and boot from USB stick?
Hello,

I've tried to find any hard information on how to install, and boot, opensolaris from a USB stick. I've seen a few people write successful stories about this, but I can't seem to get it to work.

The procedure: Boot from LiveCD, insert USB drive, find it using `format', start installer. The USB stick is not found (the installer just sits at "Finding disks"). Remove USB stick, hit back in installer, insert USB stick again, USB stick found, start installing. At 19%, it just stands there. Have no idea why.

Suggestions?
Re: [zfs-discuss] feature proposal
on 29/07/2009 17:52 Andre van Eyssen said the following:
> On Wed, 29 Jul 2009, Andriy Gapon wrote:
>> Well, I specifically stated that this property should not be recursive, i.e. it should work only in the root of a filesystem. When setting this property on a filesystem, an administrator should carefully set permissions to make sure that only trusted entities can create directories there.
>
> Even limited to the root of a filesystem, it still gives a user the ability to consume resources rapidly. While I appreciate the fact that it would be restricted by permissions, I can think of a number of usage cases where it could suddenly tank a host. One use that might pop up, for example, would be cache spools - which often contain *many* directories. One runaway and kaboom.

Well, the feature would not be on by default. So careful evaluation and planning should prevent abuses.

> We generally use hosts now with plenty of RAM, and the per-filesystem overhead for ZFS doesn't cause much concern. However, on a scratch box, try creating a big stack of filesystems - you can end up with a pool that consumes so much memory you can't import it!
>
>> The 'rmdir' question requires some thinking; my first reaction is that it should do a zfs destroy...
>
> .. which will fail if there's a snapshot, for example. The problem seems to be reasonably complex - compounded by the fact that many programs that create or remove directories do so directly - not by calling externals that would be ZFS aware.

Well, snapshots could be destroyed too; nothing stops us from doing that. BTW, I am not proposing to implement this feature in the mkdir/rmdir userland utilities; I am proposing to implement the feature in the ZFS kernel code responsible for directory creation/removal.

--
Andriy Gapon
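For what it's worth, removing a filesystem together with its snapshots is already a single operation today; a sketch, with a made-up dataset name:

# zfs destroy -r tank/spool/dir042   # destroys the fs, its snapshots, and any children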
Re: [zfs-discuss] feature proposal
On Wed, 2009-07-29 at 15:06 +0300, Andriy Gapon wrote:
> What do you think about the following feature? [proposal snipped - quoted in full earlier in the thread]

This feature request touches upon a very generic observation that my group made a long time ago: ZFS is a wonderful filesystem; the only trouble is that (almost) all the cool features have to be asked for using non-filesystem (POSIX) APIs. Basically, every time you have to do anything with ZFS, you have to do it on a host where ZFS runs.

The sole exception to this rule is the .zfs subdirectory, which lets you have access to snapshots without explicit calls to zfs(1M). Basically, the .zfs subdirectory is your POSIX FS way to request two bits of ZFS functionality. In general, however, we all want more.

On the read-only front: wouldn't it be cool to *not* run zfs sends explicitly, but have:
    .zfs/send/<snap-name>
    .zfs/sendr/<from-snap-name>-to-<snap-name>
give you the same data automagically?

On the read-write front: wouldn't it be cool to be able to snapshot things by:
    $ mkdir .zfs/snapshot/<snap-name>
?

The list goes on...

Thanks,
Roman.
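A purely hypothetical sketch of what Roman's proposed read-only interface could look like in use - none of these paths exist today, and the names are made up:

$ cat /tank/home/.zfs/send/mysnap > /backup/home-mysnap.zfs              # hypothetical full stream
$ cat /tank/home/.zfs/sendr/snap1-to-snap2 | ssh host zfs recv -d pool   # hypothetical incremental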
[zfs-discuss] cleaning up cloned zones
I created a couple of zones. I have a zone path like this:

r...@vps1:~# zfs list -r zones/fans
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zones/fans          1.22G  3.78G    22K  /zones/fans
zones/fans/ROOT     1.22G  3.78G    19K  legacy
zones/fans/ROOT/zbe 1.22G  3.78G  1.22G  legacy

I then upgraded the global zone; this creates the zfs clones/snapshots for the zones:

r...@vps1:~# zfs list -r zones/fans
NAME                   USED  AVAIL  REFER  MOUNTPOINT
zones/fans            4.78G  5.22G    22K  /zones/fans
zones/fans/ROOT       4.78G  5.22G    19K  legacy
zones/fans/ROOT/zbe   2.64G  5.22G  2.64G  legacy
zones/fans/ROOT/zbe-1 2.13G  5.22G  3.99G  legacy

I then created a couple of new zones; the mounted zfs tree looks like this:

r...@vps1:~# zfs list -r zones/cars
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zones/cars          1.22G  3.78G    22K  /zones/cars
zones/cars/ROOT     1.22G  3.78G    19K  legacy
zones/cars/ROOT/zbe 1.22G  3.78G  1.22G  legacy

So, now the problem is, I have some zones that have a zbe-1, and some that have a zfs clone with just the zbe name. After making sure everything has worked for a month now, I want to clean that up. I want to promote all of them to be just zbe. I understand I won't be able to revert back to the original zone bits, but I could have 40+ zones on this system, and I prefer them all to be consistent looking. Here is a full hierarchy now:

r...@vps1:~# zfs get -r mounted,origin,mountpoint zones/fans
NAME                        PROPERTY    VALUE                       SOURCE
zones/fans                  mounted     yes                         -
zones/fans                  origin      -                           -
zones/fans                  mountpoint  /zones/fans                 default
zones/fans/ROOT             mounted     no                          -
zones/fans/ROOT             origin      -                           -
zones/fans/ROOT             mountpoint  legacy                      local
zones/fans/ROOT/zbe         mounted     no                          -
zones/fans/ROOT/zbe         origin      -                           -
zones/fans/ROOT/zbe         mountpoint  legacy                      local
zones/fans/ROOT/z...@zbe-1  mounted     -                           -
zones/fans/ROOT/z...@zbe-1  origin      -                           -
zones/fans/ROOT/z...@zbe-1  mountpoint  -                           -
zones/fans/ROOT/zbe-1       mounted     yes                         -
zones/fans/ROOT/zbe-1       origin      zones/fans/ROOT/z...@zbe-1  -
zones/fans/ROOT/zbe-1       mountpoint  legacy                      local

How do I go about renaming and destroying the original zbe fs? I believe this will involve promoting the zbe-1 and then destroying zbe, followed by renaming zbe-1 to zbe. But this is a live system; I don't have something to play with first. Any tips?

Thanks!
Re: [zfs-discuss] cleaning up cloned zones
hey anil,

given that things work, i'd recommend leaving them alone. if you really want to insist on cleaning things up aesthetically, then you need to do multiple zfs operations and you'll need to shut down the zones. assuming you haven't cloned any zones (because if you did, that complicates things), you could do:

- shutdown your zones
- zfs promote the latest zbe
- destroy all of the new snapshots of the promoted zbe (and the old zbe filesystems, which are now dependants of those snapshots)
- rename the promoted zbe to whatever name you want to standardize on

(in theory, something like the sketch below.)

note that i haven't tested any of this, but in theory it should work. it may be the case that some of the zfs operations above may fail due to the "zoned" bit being set for zbes. if this is the case, then you'll need to clear the zoned bit, do the operations, and then reset the zoned bit. please don't come crying to me if this doesn't work. ;)

ed

On Wed, Jul 29, 2009 at 07:44:37PM -0700, Anil wrote:
> [original message quoted in full; snipped - see above]
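A sketch of ed's sequence for one zone, using the dataset names from Anil's mail - untested, exactly as ed warns:

# zoneadm -z fans halt                       # zone must be down first
# zfs promote zones/fans/ROOT/zbe-1          # the @zbe-1 snapshot migrates to zbe-1
# zfs destroy zones/fans/ROOT/zbe            # the old BE is now a clone of zbe-1@zbe-1
# zfs destroy zones/fans/ROOT/zbe-1@zbe-1    # then drop the snapshot itself
# zfs rename zones/fans/ROOT/zbe-1 zones/fans/ROOT/zbe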
[zfs-discuss] Solaris10+ and Online Media services
It seems like a lot of media services are starting to catch on about ZFS. I knew Last.fm makes use of it, and I also found out about grooveshark (see this post: http://www.facebook.com/notes.php?id=7354446700&start=200&hash=fb219332a992a64f12d200435b3d24f2 ). Grooveshark looks nice for end users as well (which is why I sent it to desktop-discuss); I may start using it also for songs I own that I cannot listen to on Last.fm. Unfortunately, I couldn't upload mp3s or play the available songs in OpenSolaris (the software is a java applet). Did anyone have better luck? Do you know of other media services that have big investments in Solaris or ZFS?
Re: [zfs-discuss] feature proposal
On Wed, Jul 29, 2009 at 05:34:53PM -0700, Roman V Shaposhnik wrote:
> [...]
> On the read-write front: wouldn't it be cool to be able to snapshot things by:
>     $ mkdir .zfs/snapshot/<snap-name>
> ?

Are you sure this doesn't work on Solaris/OpenSolaris? From looking at the code, you should be able to do exactly that, as well as destroy a snapshot by rmdir'ing this entry.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
p...@freebsd.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
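If Pawel is right, it would look something like this sketch - the filesystem name is made up, and suitable privileges (or zfs allow delegation) are assumed:

$ cd /tank/home/alice
$ mkdir .zfs/snapshot/mysnap      # creates snapshot tank/home/alice@mysnap
$ ls .zfs/snapshot
mysnap
$ rmdir .zfs/snapshot/mysnap      # destroys the snapshot again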