Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
Darren,

On 02/12/2013 11:25 AM, Darren J Moffat wrote:
> On 02/10/13 12:01, Koopmann, Jan-Peter wrote:
>> Why should it? Unless you do a shrink on the vmdk and use a ZFS variant with
>> SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am
>> wrong) the blocks will not be freed, will they?
>
> Solaris 11.1 has ZFS with SCSI UNMAP support.

Seems I skipped that one... Are there any related tools, e.g. to release all-zero blocks or the like? Of course it's then up to the admin to know what all this is about, or to wreck the data.

Thomas
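For anyone wanting to experiment while waiting for end-to-end UNMAP support, a rough sketch of the usual workaround: fill the guest's free space with zeros, delete the file, and rely on compression on the backing zvol (or a later vmdk shrink) to give the space back. The paths and dataset name below are made up, and whether blocks are actually returned to the pool depends on the guest, the hypervisor and the storage stack:

  # on the storage server: make all-zero blocks compress away to holes (hypothetical dataset)
  zfs set compression=on tank/zvols/vm01

  # inside the guest whose virtual disk lives on that zvol (hypothetical mount point)
  dd if=/dev/zero of=/some/guest/fs/zerofill bs=1M   # fill free space with zeros
  rm /some/guest/fs/zerofill
  sync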
Re: [zfs-discuss] iSCSI access patterns and possible improvements?
Thanks for all the answers, more inline.

On 01/18/2013 02:42 AM, Richard Elling wrote:
> On Jan 17, 2013, at 7:04 AM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
>> On Wed, 16 Jan 2013, Thomas Nau wrote:
>>> Dear all
>>> I've a question concerning possible performance tuning for both iSCSI access
>>> and replicating a ZVOL through zfs send/receive. We export ZVOLs with the
>>> default volblocksize of 8k to a bunch of Citrix Xen Servers through iSCSI.
>>> The pool is made of SAS2 disks (11 x 3-way mirrored) plus mirrored STEC RAM
>>> ZIL SSDs and 128G of main memory. The iSCSI access pattern (1 hour daytime
>>> average) looks like the following (thanks to Richard Elling for the dtrace script)
>>
>> If almost all of the I/Os are 4K, maybe your ZVOLs should use a volblocksize
>> of 4K? This seems like the most obvious improvement.
>
> 4k might be a little small. 8k will have less metadata overhead. In some cases
> we've seen good performance on these workloads up through 32k. Real pain is
> felt at 128k :-)

My only pain so far is the time a send/receive takes without really loading the network at all. VM performance is nothing I worry about at all as it's pretty good. So the key question for me is whether going from 8k to 16k or even 32k would have some benefit for that problem.

[ stuff removed ]

>>> For disaster recovery we plan to sync the pool as often as possible to a
>>> remote location. Running send/receive after a day or so seems to take a
>>> significant amount of time wading through all the blocks and we hardly see
>>> average network traffic going over 45MB/s (almost idle 1G link). So here's
>>> the question: would increasing/decreasing the volblocksize improve the
>>> send/receive operation and what influence might show for the iSCSI side?
>>
>> Matching the volume block size to what the clients are actually using (due to
>> their filesystem configuration) should improve performance during normal
>> operations and should reduce the number of blocks which need to be sent in
>> the backup by reducing write amplification due to overlap blocks.
>
> compression is a good win, too

Thanks for that. I'll use the tools you mentioned to drill down.

Thomas

> --
> richard.ell...@richardelling.com +1-760-896-4422
[zfs-discuss] iSCSI access patterns and possible improvements?
Dear all

I've a question concerning possible performance tuning for both iSCSI access and replicating a ZVOL through zfs send/receive. We export ZVOLs with the default volblocksize of 8k to a bunch of Citrix Xen Servers through iSCSI. The pool is made of SAS2 disks (11 x 3-way mirrored) plus mirrored STEC RAM ZIL SSDs and 128G of main memory.

The iSCSI access pattern (1 hour daytime average) looks like the following (thanks to Richard Elling for the dtrace script), I/O size buckets and counts:

  R    value      count
       256            0
       512        22980
       1024         663
       2048        1075
       4096      433819
       8192       40876
       16384      37218
       32768      82584
       65536      34784
       131072     25968
       262144     14884
       524288        69
       1048576        0

  W    value      count
       256            0
       512        35961
       1024       25108
       2048       10222
       4096     1243634
       8192      521519
       16384     218932
       32768     146519
       65536        112
       131072        15
       262144        78
       524288         0

For disaster recovery we plan to sync the pool as often as possible to a remote location. Running send/receive after a day or so seems to take a significant amount of time wading through all the blocks and we hardly see average network traffic going over 45MB/s (almost idle 1G link). So here's the question: would increasing/decreasing the volblocksize improve the send/receive operation, and what influence might it have on the iSCSI side?

Thanks for any help
Thomas
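One practical note if the 4k route is tried: volblocksize is fixed when a zvol is created, so the usual approach is to create a new zvol with the desired block size and copy the data across at the block level (a plain zfs send/receive would carry the old 8k volblocksize along). A minimal sketch with hypothetical names and sizes, assuming the iSCSI target is quiesced during the copy:

  # create a new zvol with the smaller block size (names/sizes are made up)
  zfs create -V 500G -o volblocksize=4k tank/vm-store-4k
  # block-copy the old volume into it
  dd if=/dev/zvol/rdsk/tank/vm-store of=/dev/zvol/rdsk/tank/vm-store-4k bs=1M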
Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Jamie,

We ran into the same and had to migrate the pool while imported read-only. On top of that we were advised NOT to use an L2ARC. Maybe you should consider that as well.

Thomas

On 12.12.2012 at 19:21, Jamie Krier <jamie.kr...@gmail.com> wrote:
> I've hit this bug on four of my Solaris 11 servers. Looking for anyone else
> who has seen it, as well as comments/speculation on cause.
>
> This bug is pretty bad. If you are lucky you can import the pool read-only and
> migrate it elsewhere. I've also tried setting zfs:zfs_recover=1,aok=1 with
> varying results.
>
> http://docs.oracle.com/cd/E26502_01/html/E28978/gmkgj.html#scrolltoc
>
> Hardware platform:
>   Supermicro X8DAH, 144GB RAM
>   Supermicro SAS2 JBODs
>   LSI 9200-8e controllers (Phase 13 fw)
>   ZeusRAM log
>   ZeusIOPS SAS L2ARC
>   Seagate ST33000650SS SAS drives
>
> All four servers are running the same hardware, so at first I suspected a
> problem there. I opened a ticket with Oracle which ended with this email:
>
> - We strongly expect that this is a software issue because this problem does
> not happen on Solaris 10. On Solaris 11, it happens with both the SPARC and
> the X64 versions of Solaris. We have quite a few customers who have seen this
> issue and we are in the process of working on a fix. Because we do not know
> the source of the problem yet, I cannot speculate on the time to fix. This
> particular portion of Solaris 11 (the virtual memory sub-system) is quite
> different than in Solaris 10. We re-wrote the memory management in order to
> get ready for systems with much more memory than Solaris 10 was designed to
> handle. Because this is the memory management system, there is not expected to
> be any work-around. Depending on your company's requirements, one possibility
> is to use Solaris 10 until this issue is resolved. I apologize for any
> inconvenience that this bug may cause. We are working on it as a Sev 1
> Priority 1 in sustaining engineering. -
>
> I am thinking about switching to an Illumos distro, but wondering if this
> problem may be present there as well.
>
> Thanks
> - Jamie
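For reference, the settings mentioned above go into /etc/system and take effect after a reboot; treat them as a last-resort recovery aid (with good backups) rather than a fix:

  * /etc/system fragment, as referenced in the thread
  set zfs:zfs_recover=1
  set aok=1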
[zfs-discuss] Changing a VDEV GUID?
Hi ZFS fellows,

I have already seen in the archives of this list some of you changing the GUID of a pool's VDEV, to allow a cloned disk to be imported on the same system as the source. Can someone explain in detail how to achieve that? Has someone already invented the wheel, so I would not have to rewrite a tool to do it?

Subsidiary question: is there an official response from Oracle for such a case? How do they officially deal with binary-copied disks, as it's common to do such copies with UFS to clone SAP environments or databases...

Thanks in advance,
Thomas
Re: [zfs-discuss] ZFS best practice for FreeBSD?
On Thu, 11 Oct 2012, Freddie Cash wrote:
> On Thu, Oct 11, 2012 at 2:47 PM, andy thomas <a...@time-domain.co.uk> wrote:
>> According to a Sun document called something like 'ZFS best practice' I read
>> some time ago, best practice was to use the entire disk for ZFS and not to
>> partition or slice it in any way. Does this advice hold good for FreeBSD as well?
>
> Solaris disabled the disk cache if the disk was partitioned, thus the
> recommendation to always use the entire disk with ZFS. FreeBSD's GEOM
> architecture allows the disk cache to be enabled whether you use the full disk
> or partition it.
>
> Personally, I find it nicer to use GPT partitions on the disk. That way, you
> can start the partition at 1 MB (gpart add -b 2048 on 512B disks, or
> gpart add -b 512 on 4K disks), leave a little wiggle-room at the end of the
> disk, and use GPT labels to identify the disk (using gpt/label-name for the
> device when adding to the pool).

This is apparently what had been done in this case:

  gpart add -b 34 -s 600 -t freebsd-swap da0
  gpart add -b 634 -s 1947525101 -t freebsd-zfs da1

gpart show (stuff relating to a compact flash/SATA boot disk deleted):

  =>  34  1953525101  da0  GPT  (932G)
      34         600    1  freebsd-swap  (2.9G)
     634  1947525101    2  freebsd-zfs   (929G)

  =>  34  1953525101  da2  GPT  (932G)
      34         600    1  freebsd-swap  (2.9G)
     634  1947525101    2  freebsd-zfs   (929G)

  =>  34  1953525101  da1  GPT  (932G)
      34         600    1  freebsd-swap  (2.9G)
     634  1947525101    2  freebsd-zfs   (929G)

Is this a good scheme? The server has 12 GB of memory (upped from 4 GB last year after it kept crashing with out-of-memory reports on the console screen) so I doubt the swap would actually be used very often.

Running Bonnie++ on this pool comes up with some very good results for sequential disk writes, but the latency of over 43 seconds for block reads is terrible and is obviously impacting performance as a mail server, as shown here:

  Version 1.96          ------Sequential Output------  --Sequential Input-  --Random-
  Concurrency   1       -Per Chr- --Block-- -Rewrite-  -Per Chr- --Block--  --Seeks--
  Machine          Size K/sec %CP K/sec %CP K/sec %CP  K/sec %CP K/sec %CP   /sec %CP
  hsl-main.hsl.of   24G    63  67 80584  20 70568  17    314  98 554226 60  410.1  13
  Latency              77140us    43145ms   28872ms      171ms     212ms     232ms

  Version 1.96          ------Sequential Create------  --------Random Create--------
  hsl-main.hsl.office   -Create-- --Read--- -Delete--  -Create-- --Read--- -Delete--
                files    /sec %CP  /sec %CP  /sec %CP   /sec %CP  /sec %CP  /sec %CP
                   16   19261  93 +++++ +++ 18491  97  21542  92 +++++ +++ 20691  94
  Latency              15399us     488us     226us     27733us    103us     138us

The other issue with this server is it needs to be rebooted every 8-10 weeks as disk I/O slows to a crawl over time and the server becomes unusable. After a reboot, it's fine again. I'm told ZFS 13 on FreeBSD 8.0 has a lot of problems so I was planning to rebuild the server with FreeBSD 9.0 and ZFS 28, but I didn't want to make any basic design mistakes in doing this.

Another point about the Sun ZFS paper - it mentioned optimum performance would be obtained with RAIDz pools if the number of disks was between 3 and 9. So I've always limited my pools to a maximum of 9 active disks plus spares, but the other day someone here was talking of seeing hundreds of disks in a single pool! So what is the current advice for ZFS in Solaris and FreeBSD?

> You can have multiple disks in a vdev. And you can have multiple vdevs in a
> pool. Thus, you can have hundreds of disks in a pool. :) Just split the disks
> up into multiple vdevs, where each vdev is under 9 disks each. :)
>
> For example, we have 25 disks in the following pool, but only 6 disks in each
> vdev (plus log/cache):
>
> [root@alphadrive ~]# zpool list -v
> NAME             SIZE  ALLOC   FREE    CAP  DEDUP    HEALTH  ALTROOT
> storage         24.5T  20.7T  3.76T    84%  3.88x  DEGRADED  -
>   raidz2        8.12T  6.78T  1.34T      -
>     gpt/disk-a1     -      -      -      -
>     gpt/disk-a2     -      -      -      -
>     gpt/disk-a3     -      -      -      -
>     gpt/disk-a4     -      -      -      -
>     gpt/disk-a5     -      -      -      -
>     gpt/disk-a6     -      -      -      -
>   raidz2        5.44T  4.57T   888G      -
>     gpt/disk-b1     -      -      -      -
>     gpt/disk-b2     -      -      -      -
>     gpt/disk-b3     -      -      -      -
>     gpt/disk-b4     -      -      -      -
>     gpt/disk-b5     -      -      -      -
>     gpt/disk-b6     -      -      -      -
>   raidz2        5.44T  4.60T   863G      -
>     gpt/disk-c1
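A minimal sketch of the GPT layout Freddie describes, for a new disk; the device name and labels are hypothetical, and -b 2048 gives 1 MB alignment on 512B-sector disks:

  gpart create -s gpt da3                            # hypothetical new disk
  gpart add -b 2048 -t freebsd-zfs -l disk-d1 da3    # 1 MB-aligned, GPT-labelled partition
  zpool create tank mirror gpt/disk-d1 gpt/disk-d2   # refer to disks by label, not device node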
Re: [zfs-discuss] ZFS best practice for FreeBSD?
On Thu, 11 Oct 2012, Richard Elling wrote:
> On Oct 11, 2012, at 2:58 PM, Phillip Wagstrom <phillip.wagst...@gmail.com> wrote:
>> On Oct 11, 2012, at 4:47 PM, andy thomas wrote:
>>> According to a Sun document called something like 'ZFS best practice' I read
>>> some time ago, best practice was to use the entire disk for ZFS and not to
>>> partition or slice it in any way. Does this advice hold good for FreeBSD as well?
>>
>> My understanding of the best practice was that with Solaris prior to ZFS, it
>> disabled the volatile disk cache.
>
> This is not quite correct. If you use the whole disk, ZFS will attempt to
> enable the write cache. To understand why, remember that UFS (and ext, by
> default) can die a horrible death (+fsck) if there is a power outage and
> cached data is not flushed to disk. So by default, Sun shipped some disks with
> write cache disabled by default. For non-Sun disks, they are most often
> shipped with write cache enabled and the most popular file systems (NTFS)
> properly issue cache flush requests as needed (for the same reason ZFS issues
> cache flush requests).

Out of interest, how do you enable the write cache on a disk? I recently replaced a failing Dell-branded disk on a Dell server with an HP-branded disk (both disks were the identical Seagate model) and on running the EFI diagnostics just to check all was well, it reported the write cache was disabled on the new HP disk but enabled on the remaining Dell disks in the server. I couldn't see any way of enabling the cache from the EFI diags so I left it as it was - probably not ideal.

>> With ZFS, the disk cache is used, but after every transaction a cache-flush
>> command is issued to ensure that the data made it to the platters.
>
> Write cache is flushed after uberblock updates and for ZIL writes. This is
> important for uberblock updates, so the uberblock doesn't point to a garbaged
> MOS. It is important for ZIL writes, because they must be guaranteed written
> to media before ack.

Thanks for the explanation, that all makes sense now.

Andy

>> If you slice the disk, enabling the disk cache for the whole disk is
>> dangerous because other file systems (meaning UFS) wouldn't do the
>> cache-flush and there was a risk of data loss should the cache fail due to,
>> say, a power outage. Can't speak to how BSD deals with the disk cache.
>>
>>> I looked at a server earlier this week that was running FreeBSD 8.0 and had
>>> 2 x 1 TB SAS disks in a ZFS 13 mirror with a third identical disk as a
>>> spare. Large file I/O throughput was OK but the mail jail it hosted had
>>> periods when it was very slow with accessing lots of small files. All three
>>> disks (the two in the ZFS mirror plus the spare) had been partitioned with
>>> gpart so that partition 1 was a 6 GB swap and partition 2 filled the rest of
>>> the disk and had a 'freebsd-zfs' partition on it. It was these second
>>> partitions that were part of the mirror. This doesn't sound like a very good
>>> idea to me as surely disk seeks for swap and for ZFS file I/O are bound to
>>> clash, aren't they?
>
> It surely would make a slow, memory-starved swapping system even slower. :)
>
>>> Another point about the Sun ZFS paper - it mentioned optimum performance
>>> would be obtained with RAIDz pools if the number of disks was between 3 and
>>> 9. So I've always limited my pools to a maximum of 9 active disks plus
>>> spares but the other day someone here was talking of seeing hundreds of
>>> disks in a single pool! So what is the current advice for ZFS in Solaris and
>>> FreeBSD?
>>
>> That number was drives per vdev, not per pool.
>>
>> -Phil
>
> --
> richard.ell...@richardelling.com +1-760-896-4422

-
Andy Thomas, Time Domain Systems
Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk
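To the write-cache question, at least on the Solaris side: the per-disk write cache can usually be inspected and toggled from the expert mode of format(1M). This is a rough outline of the menu path rather than a script, and whether the setting survives a power cycle depends on the drive:

  # Solaris, as root; menu-driven
  format -e
  #  -> select the disk
  #  -> cache -> write_cache -> display   (show the current state)
  #  -> cache -> write_cache -> enable    (turn it on)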
[zfs-discuss] ZFS best practice for FreeBSD?
According to a Sun document called something like 'ZFS best practice' I read some time ago, best practice was to use the entire disk for ZFS and not to partition or slice it in any way. Does this advice hold good for FreeBSD as well?

I looked at a server earlier this week that was running FreeBSD 8.0 and had 2 x 1 TB SAS disks in a ZFS 13 mirror with a third identical disk as a spare. Large file I/O throughput was OK but the mail jail it hosted had periods when it was very slow with accessing lots of small files. All three disks (the two in the ZFS mirror plus the spare) had been partitioned with gpart so that partition 1 was a 6 GB swap and partition 2 filled the rest of the disk and had a 'freebsd-zfs' partition on it. It was these second partitions that were part of the mirror. This doesn't sound like a very good idea to me as surely disk seeks for swap and for ZFS file I/O are bound to clash, aren't they?

Another point about the Sun ZFS paper - it mentioned optimum performance would be obtained with RAIDz pools if the number of disks was between 3 and 9. So I've always limited my pools to a maximum of 9 active disks plus spares, but the other day someone here was talking of seeing hundreds of disks in a single pool! So what is the current advice for ZFS in Solaris and FreeBSD?

Andy
[zfs-discuss] Question about ZFS snapshots
I have a ZFS filesystem and create weekly snapshots over a period of 5 weeks, called week01, week02, week03, week04 and week05 respectively. My question is: how do the snapshots relate to each other - does week03 contain the changes made since week02, or does it contain all the changes made since the first snapshot, week01, and therefore include those in week02?

To roll back to week03, it's necessary to delete snapshots week04 and week05 first, but what if week01 and week02 have also been deleted - will the rollback still work, or is it necessary to keep earlier snapshots?

Andy
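On the mechanics behind the question: each snapshot only stores references to the blocks that changed since the previous one, but logically every snapshot is a complete point-in-time view of the filesystem, so destroying week01/week02 does not affect a later rollback to week03. A small sketch, with hypothetical dataset and snapshot names:

  zfs snapshot tank/home@week06        # take a new weekly snapshot
  zfs list -t snapshot -r tank/home    # list snapshots and the space each one holds
  zfs rollback -r tank/home@week03     # -r destroys the newer week04/week05 snapshots for you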
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
I tried to use cylinder 0 for root on x86, back in the UFS days, and I lost the vtoc on both mirrored disks. The installer had selected cylinder 1 as the starting cylinder for the first disk, and I thought I should be able to use cylinder 0 as well, so for the mirror I partitioned it to start from 0. I then removed the first disk, changed the starting cylinder to 0, and added it back. When I later tried to reboot the system, both vtocs were lost. I had to whip up a program that scanned the disk to find my UFS filesystems so that I could put a proper vtoc back, boot the system and then change it back to start at cylinder 1. I have always left cylinder 0 alone since then.

Thomas

On 2012-06-16 18:23, Richard Elling wrote:
> On Jun 15, 2012, at 7:37 AM, Hung-Sheng Tsao Ph.D. wrote:
>> by the way, when you format, start with cylinder 1, do not use 0
>
> There is no requirement for skipping cylinder 0 for root on Solaris, and there
> never has been.
> -- richard
Re: [zfs-discuss] ZFS error accessing past end of object
Dear all

I'm about to answer my own question with some really useful hints from Steve, thanks for that!!!

On 03/02/2012 07:43 AM, Thomas Nau wrote:
> Dear all
> I asked before but without much feedback. As the issue is persistent I want to
> give it another try. We disabled panicing for such kind of error in /etc/system
> but still see messages such as
>
>   zfs: accessing past end of object 5b1aa/21a8008 (size=60416 access=32603+32768)
>
> in the logs. Is there any way to identify which object (file?) causes this?

In case your system crashed, there's an option in /etc/system to return an I/O error instead of panicing; just add

  set zfs:zfs_recover=1

After rebooting the system will issue a warning such as

  Mar 4 15:04:16 neo genunix: [ID 762447 kern.warning] WARNING: zfs: accessing
  past end of object bab/21a8008 (size=60416 access=32603+32768)

to syslog whenever the problem shows. The important numbers are

  dataset ID: bab
  object ID:  21a8008

Those would also show up in the kernel panic message. Assuming the ZFS datasets still exist and the file is also unaltered, we don't need crash dump analysis but can use zdb instead. I'm running the latest S11 bits by the way.

First use some bash magic to turn the hex numbers into decimals, as zdb deals with those:

  # printf "%d %d\n" 0xbab 0x21a8008
  2987 35291144

Now look up the dataset; I assume we already have a pretty good idea about which pool to check:

  # zdb -d -r pool1 | grep "ID 2987"
  Dataset pool1/.../backup-clone [ZPL], ID 2987, cr_txg 190496, 1.36T, 15590871 objects

Now look up the actual object. Add more -v to get even more data:

  # zdb -vvv pool1/backup/nfs/home/student1/backup-clone 35291144
  Dataset pool1/.../backup-clone [ZPL], ID 2987, cr_txg 190496, 1.36T, 15590871 objects,
  rootbp DVA[0]=2:2a000cada00:c00:RZM:4 [L0 DMU objset] fletcher4 lzjb LE contiguous
  unique unencrypted 4-copy size=800L/200P birth=190500L/190500P fill=15590871
  cksum=1abc142ec9:88286e5b5e3:18d73114fd4d4:34d14f5e348c05

      Object  lvl  iblk   dblk  dsize  lsize   %full  type
    35291144    1   16K  59.0K    32K  59.0K  100.00  ZFS plain file
                                        168   bonus  System attributes
      dnode flags: USED_BYTES USERUSED_ACCOUNTED
      dnode maxblkid: 0
      path    /zep13/.mozilla/firefox/jlonp9fm.default/cookies.sqlite
      uid     63883
      gid     400
      atime   Tue Jan 10 14:15:34 2012
      mtime   Tue Jan 10 14:23:01 2012
      ctime   Tue Jan 10 14:23:01 2012
      crtime  Wed Oct 19 09:43:54 2011
      gen     15760229
      mode    100644
      size    2228224
      parent  34303712
      links   1
      pflags  4080004

So here comes the funny stuff: according to the object data the size is 2228224 bytes, which of course matches the ls -l output. On the other hand the ZFS read complained after about 32k, which fits the dsize/lsize columns as we use compression. Strange, isn't it? But wait, it gets even more confusing...

The initial panic, now turned into warnings, was caused by the TSM backup client trying to back up the file. We use ZFS clones to get a consistent backup as much as possible. Let's truss the client (cut some path elements):

  access("/backup/pool1/.../zep13/.mozilla/firefox/jlonp9fm.default/cookies.sqlite", R_OK) = 0
  open64("/backup/pool1/.../zep13/.mozilla/firefox/jlonp9fm.default/cookies.sqlite", O_RDONLY|O_NONBLOCK) = 6
  acl("/backup/pool1/.../zep13/.mozilla/firefox/jlonp9fm.default/cookies.sqlite", ACE_GETACL, 1024, 0x0846D780) = 3
  read(6, " S Q L i t e   f o r m a".., 32603)  = 32603
  ...
  read(6, 0x086B0C98, 32768)                    Err#5 EIO

Now let's just cat the file and see what happens:

  # cat /backup/pool1/.../zep13/.mozilla/firefox/jlonp9fm.default/cookies.sqlite > TEST
  # ls -l TEST
  -rw-r--r-- 1 root root 2228224 Mar 4 15:50 TEST

No complaints. Observing the appropriate routine through

  # dtrace -n '::zfs_panic_recover:entry { stack(); }'

does not trigger. Checking the backup client again... no more errors, as truss also confirms:

  open64("/backup/pool1/.../zep13/.mozilla/firefox/jlonp9fm.default/cookies.sqlite", O_RDONLY|O_NONBLOCK) = 6
  read(6, " S Q L i t e   f o r m a".., 32603)  = 32603
  read(6, " 3 4 1 0 8 . 2 . 2 . u t".., 32768)  = 32768
  read(6, "\0\0\0\0\0\0\0\0\0\0\0\0".., 32768)  = 32768
  ...

Double-checking with zdb and ls -i shows the same object ID. I'm really puzzled!!! Any more ideas what's going on?

Thomas
Re: [zfs-discuss] Server upgrade
On Thu, 16 Feb 2012, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org]
>> On Behalf Of andy thomas
>>
>> One of my most vital servers is a Netra 150 dating from 1997 - still going
>> strong, crammed with 12 x 300 Gb disks and running Solaris 9. I think one
>> ought to have more faith in Sun hardware.
>
> If it's one of your most vital, I think you should have less faith in Sun
> hardware. If it's one of your "nobody really cares, I can easily replace it"
> servers then... sounds good. Keep it on as long as it's alive.

Well, it's used as an off-site backup server whose content is in addition mirrored to another Linux server internally, and as all the Netra's disks are UFS, if I ever had a problem with it I'd just pull them all out, transfer them to an E450 and power that on in its place.

Andy

-
Andy Thomas, Time Domain Systems
Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk
Re: [zfs-discuss] Server upgrade
On Wed, 15 Feb 2012, David Dyer-Bennet wrote:
> While I'm not in need of upgrading my server at an emergency level, I'm
> starting to think about it -- to be prepared (and an upgrade could be
> triggered by a failure at this point; my server dates to 2006).

One of my most vital servers is a Netra 150 dating from 1997 - still going strong, crammed with 12 x 300 Gb disks and running Solaris 9. I think one ought to have more faith in Sun hardware.

Andy
[zfs-discuss] Failing disk(s) or controller in ZFS pool?
On one of our servers, we have a RAIDz1 ZFS pool called 'maths2' consisting of 7 x 300 Gb disks, which in turn contains a single ZFS filesystem called 'home'. Yesterday, using the 'ls' command to list the directories within this pool caused the command to hang for a long period, followed by an 'i/o error' message. 'zpool status -x maths2' reports the pool is healthy but 'iostat -en' shows a rather different story:

  root@e450:~# iostat -en
    ---- errors ---
    s/w  h/w  trn  tot device
      0    0    0    0 fd0
      0    0    0    0 c2t3d0
      0    0    0    0 c2t0d0
      0    0    0    0 c2t1d0
      0    0    0    0 c5t3d0
      0    0    0    0 c4t0d0
      0    0    0    0 c4t1d0
      0    0    0    0 c2t2d0
      0    0    0    0 c4t2d0
      0    0    0    0 c4t3d0
      0    0    0    0 c5t0d0
      0    0    0    0 c5t1d0
      0    0    0    0 c8t0d0
      0    0    0    0 c8t1d0
      0    0    0    0 c8t2d0
      0  503 1658 2161 c9t0d0
      0 2515 6260 8775 c9t1d0
      0    0    0    0 c8t3d0
      0  492 2024 2516 c9t2d0
      0  444 1810 2254 c9t3d0
      0    0    0    0 c5t2d0
      0    1    0    1 rmt/2

Obviously it looks like controller c9, or the cabling associated with it, is in trouble (the server is an Enterprise 450 with multiple disk controllers). On taking the server down and running the 'probe-scsi-all' command from the OBP, one disk, c9t1d0, was reported as being faulty (no media present) but the others seemed fine.

After booting back up, I started scrubbing the maths2 pool and for a long time only disk c9t1d0 reported it was being repaired. After a few hours, another disk on this controller reported being repaired:

        NAME        STATE     READ WRITE CKSUM
        maths2      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0  21K repaired
            c9t1d0  ONLINE       0     0     0  938K repaired
            c9t2d0  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0

  errors: No known data errors

Now, does this point to a controller/cabling/backplane problem, or could all 4 disks on this controller have been corrupted in some way? The O/S is OSOL snv_134 for SPARC and the server has been up and running for nearly a year with no problems to date - there are two other RAIDz1 pools on this server but these are working fine.

Andy

-
Andy Thomas, Time Domain Systems
Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk
Re: [zfs-discuss] Failing disk(s) or controller in ZFS pool?
On Tue, 14 Feb 2012, Richard Elling wrote:
> Hi Andy
>
> On Feb 14, 2012, at 10:37 AM, andy thomas wrote:
>> On one of our servers, we have a RAIDz1 ZFS pool called 'maths2' consisting
>> of 7 x 300 Gb disks, which in turn contains a single ZFS filesystem called
>> 'home'. Yesterday, using the 'ls' command to list the directories within this
>> pool caused the command to hang for a long period, followed by an 'i/o error'
>> message. 'zpool status -x maths2' reports the pool is healthy but 'iostat -en'
>> shows a rather different story:
>>
>>   root@e450:~# iostat -en
>>     ---- errors ---
>>     s/w  h/w  trn  tot device
>>       0    0    0    0 fd0
>>       0    0    0    0 c2t3d0
>>       0    0    0    0 c2t0d0
>>       0    0    0    0 c2t1d0
>>       0    0    0    0 c5t3d0
>>       0    0    0    0 c4t0d0
>>       0    0    0    0 c4t1d0
>>       0    0    0    0 c2t2d0
>>       0    0    0    0 c4t2d0
>>       0    0    0    0 c4t3d0
>>       0    0    0    0 c5t0d0
>>       0    0    0    0 c5t1d0
>>       0    0    0    0 c8t0d0
>>       0    0    0    0 c8t1d0
>>       0    0    0    0 c8t2d0
>>       0  503 1658 2161 c9t0d0
>>       0 2515 6260 8775 c9t1d0
>>       0    0    0    0 c8t3d0
>>       0  492 2024 2516 c9t2d0
>>       0  444 1810 2254 c9t3d0
>>       0    0    0    0 c5t2d0
>>       0    1    0    1 rmt/2
>>
>> Obviously it looks like controller c9, or the cabling associated with it, is
>> in trouble (the server is an Enterprise 450 with multiple disk controllers).
>> On taking the server down and running the 'probe-scsi-all' command from the
>> OBP, one disk c9t1d0 was reported as being faulty (no media present) but the
>> others seemed fine.
>
> We see similar symptoms when a misbehaving disk (usually SATA) disrupts the
> other disks in the same fault zone.

OK, I will replace the disk.

>> After booting back up, I started scrubbing the maths2 pool and for a long
>> time, only disk c9t1d0 reported it was being repaired. After a few hours,
>> another disk on this controller reported being repaired:
>>
>>        NAME        STATE     READ WRITE CKSUM
>>        maths2      ONLINE       0     0     0
>>          raidz1-0  ONLINE       0     0     0
>>            c5t2d0  ONLINE       0     0     0
>>            c5t3d0  ONLINE       0     0     0
>>            c8t3d0  ONLINE       0     0     0
>>            c9t0d0  ONLINE       0     0     0  21K repaired
>>            c9t1d0  ONLINE       0     0     0  938K repaired
>>            c9t2d0  ONLINE       0     0     0
>>            c9t3d0  ONLINE       0     0     0
>>
>>   errors: No known data errors
>>
>> Now, does this point to a controller/cabling/backplane problem or could all 4
>> disks on this controller have been corrupted in some way? The O/S is OSOL
>> snv_134 for SPARC and the server has been up and running for nearly a year
>> with no problems to date - there are two other RAIDz1 pools on this server
>> but these are working fine.
>
> Not likely. More likely the faulty disk causing issues elsewhere.

It seems odd that 'zpool status' is not reporting a degraded status and 'zpool status -x' is still saying all pools are healthy. This is a little worrying as I use remote monitoring to keep an eye on all the servers I admin (many of which run Solaris, OpenIndiana and FreeBSD) and one thing that is checked every 15 minutes is the pool status using 'zpool status -x'. But this seems to result in a false sense of security and I could be blissfully unaware that half a pool has dropped out!

> NB, for file and RAID systems that do not use checksums, such corruptions can
> be catastrophic. Yea ZFS!

Yes indeed!

cheers, Andy

-
Andy Thomas, Time Domain Systems
Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk
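Since 'zpool status -x' alone did not catch this, a cron check that also looks at the raw driver error counters may help. A naive sketch only; the mail address is made up and the awk column index assumes the 'iostat -en' layout shown above:

  #!/bin/sh
  # report pools that are not healthy plus any device with a non-zero total error count
  OUT=$( zpool status -x | grep -v 'all pools are healthy'
         iostat -en | awk 'NR > 2 && $4 > 0' )
  [ -n "$OUT" ] && echo "$OUT" | mailx -s "disk/pool warning on `hostname`" admin@example.com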
Re: [zfs-discuss] need hint on pool setup
Bob,

On 01/31/2012 09:54 PM, Bob Friesenhahn wrote:
> On Tue, 31 Jan 2012, Thomas Nau wrote:
>> Dear all
>> We have two JBODs with 20 or 21 drives available per JBOD hooked up to a
>> server. We are considering the following setups:
>>
>>   RAIDZ2 made of 4 drives
>>   RAIDZ2 made of 6 drives
>>
>> The first option wastes more disk space but can survive a JBOD failure,
>> whereas the second is more space-effective but the system goes down when a
>> JBOD goes down. Each of the JBODs comes with dual controllers, redundant fans
>> and power supplies, so do I need to be paranoid and use option #1? Of course
>> it also gives us more IOPS, but high-end logging devices should take care of
>> that.
>
> I think that the answer depends on the impact to your business if data is
> temporarily not available. If your business can not survive data being
> temporarily not available (for hours or even a week) then the more
> conservative approach may be warranted.

We are talking about home directories at a university, so some downtime is OK, but for sure not hours or even days. We do regular backups plus snapshot send/receive to a remote location. The main thing I was wondering about is whether it's better to have a downtime if a JBOD fails (rare, I assume) or to keep going without any redundancy left.

> If you have a service contract which assures that a service tech will show up
> quickly with replacement hardware in hand, then this may also influence the
> decision which should be made.

The replacement hardware is kind of on-site, as we use it for disaster recovery at the remote location.

> Another consideration is that since these JBODs connect to a server, the data
> will also be unavailable when the server is down. The server being down may in
> fact be a more significant factor than a JBOD being down.

I skipped that, sorry. Of course all JBODs are connected through multiple SAS HBAs to two servers, so server failure is easy to handle.

Thanks for the thoughts
Thomas
[zfs-discuss] need hint on pool setup
Dear all

We have two JBODs with 20 or 21 drives available per JBOD hooked up to a server. We are considering the following setups:

  RAIDZ2 vdevs made of 4 drives
  RAIDZ2 vdevs made of 6 drives

The first option wastes more disk space but can survive a JBOD failure, whereas the second is more space-effective but the system goes down when a JBOD goes down. Each of the JBODs comes with dual controllers, redundant fans and power supplies, so do I need to be paranoid and use option #1? Of course it also gives us more IOPS, but high-end logging devices should take care of that.

Thanks for any hint
Thomas
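To make option #1 concrete: with 4-disk raidz2 vdevs and two disks of each vdev in each JBOD, losing a whole JBOD removes exactly two disks per vdev, which raidz2 tolerates. A sketch with hypothetical controller/device names (c1 = JBOD A, c2 = JBOD B), showing only the first few vdevs:

  zpool create tank \
    raidz2 c1t0d0 c1t1d0 c2t0d0 c2t1d0 \
    raidz2 c1t2d0 c1t3d0 c2t2d0 c2t3d0 \
    raidz2 c1t4d0 c1t5d0 c2t4d0 c2t5d0
  # ...and so on for the remaining drives, plus log/spare devices as needed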
Re: [zfs-discuss] Stress test zfs
Hi Grant

On 01/06/2012 04:50 PM, Richard Elling wrote:
> Hi Grant,
>
> On Jan 4, 2012, at 2:59 PM, grant lowe wrote:
>> Hi all,
>>
>> I've got a solaris 10 running 9/10 on a T3. It's an oracle box with 128GB
>> memory. Right now oracle... I've been trying to load test the box with
>> bonnie++. I can seem to get 80 to 90 K writes, but can't seem to get more
>> than a couple K for writes. Any suggestions? Or should I take this to a
>> bonnie++ mailing list? Any help is appreciated. I'm kinda new to load testing.
>
> I was hoping Roch (from Oracle) would respond, but perhaps he's not hanging
> out on zfs-discuss anymore? Bonnie++ sux as a benchmark. The best analysis of
> this was done by Roch and published online in the seminal blog post:
> http://137.254.16.27/roch/entry/decoding_bonnie
>
> I suggest you find a benchmark that more closely resembles your expected
> workload and do not rely on benchmarks that provide a summary metric.
> -- richard

I had good experience with filebench. It resembles your workload as well as you are able to describe it, but takes some time to get things set up if you cannot find your workload in one of the many provided examples.

Thomas
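For anyone who has not used it: filebench ships with canned workload personalities and is driven from a small interactive shell. A minimal sketch; the target directory and run time are made up, and option names may differ slightly between filebench versions:

  # filebench
  filebench> load fileserver          # pick a shipped workload personality
  filebench> set $dir=/tank/fbtest    # hypothetical test directory
  filebench> run 60                   # run for 60 seconds and print the per-op summary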
[zfs-discuss] SUNWsmbs SUNWsmbskr for Sparc OSOL snv_134?
Does anyone know where I can still find the SUNWsmbs and SUNWsmbskr packages for the SPARC version of OpenSolaris? I wanted to experiment with ZFS/CIFS on my SPARC server but the ZFS share command fails with:

  # zfs set sharesmb=on tank1/windows
  cannot share 'tank1/windows': smb add share failed

modinfo reports that the nsmb driver is loaded but I think smbsrv also needs to be loaded. The available documentation suggests that SUNWsmbs and SUNWsmbskr need to be installed. My system has SUNWsmbfskr installed, and according to pkginfo this provides 'SMB/CIFS File System client support (Kernel)' - is this the same package as SUNWsmbskr?

Thanks in advance for any suggestions,

Andy
[zfs-discuss] does log device (ZIL) require a mirror setup?
Dear all

We use a STEC ZeusRAM as a log device for a 200TB RAID-Z2 pool. As log devices are supposed to be read only after a crash or when booting, and those nice things are pretty expensive, I'm wondering if mirroring the log devices is a must / highly recommended.

Thomas
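If the decision later falls in favour of mirroring, a single existing log device can be turned into a mirror in place. A minimal sketch with hypothetical pool and device names:

  # attach a second SSD to the existing log device, making it a mirrored log
  zpool attach tank c4t0d0 c4t1d0
  # or, when building the pool from scratch:
  # zpool create tank raidz2 <disks...> log mirror c4t0d0 c4t1d0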
Re: [zfs-discuss] ZFS performance question over NFS
Hi Bob

> I don't know what the request pattern from filebench looks like but it seems
> like your ZEUS RAM devices are not keeping up or else many requests are
> bypassing the ZEUS RAM devices. Note that very large synchronous writes will
> bypass your ZEUS RAM device and go directly to a log in the main store. Small
> (<= 128K) writes should directly benefit from the dedicated zil device.
>
> Find a copy of zilstat.ksh and run it while filebench is running in order to
> understand more about what is going on.
>
> Bob

The pattern looks like:

   N-Bytes  N-Bytes/s  N-Max-Rate   B-Bytes  B-Bytes/s  B-Max-Rate  ops  <=4kB  4-32kB  >=32kB
   9588656    9588656     9588656  88399872   88399872    88399872   90      0       0      90
   6662280    6662280     6662280  87031808   87031808    87031808   83      0       0      83
   6366728    6366728     6366728  72790016   72790016    72790016   79      0       0      79
   6316352    6316352     6316352  83886080   83886080    83886080   80      0       0      80
   6687616    6687616     6687616  84594688   84594688    84594688   92      0       0      92
   4909048    4909048     4909048  69238784   69238784    69238784   73      0       0      73
   6605280    6605280     6605280  81924096   81924096    81924096   79      0       0      79
   6895336    6895336     6895336  81625088   81625088    81625088   85      0       0      85
   6532128    6532128     6532128  87486464   87486464    87486464   90      0       0      90
   6925136    6925136     6925136  86118400   86118400    86118400   83      0       0      83

So does it look good, bad or ugly ;)

Thomas
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
You're probably hitting bug 7056738 - http://wesunsolve.net/bugid/id/7056738

Looks like it's not fixed yet @ oracle anyway...

Were you using crypto on your datasets?

Regards,
Thomas

On Tue, 16 Aug 2011 09:33:34 -0700 (PDT), Stu Whitefish <swhitef...@yahoo.com> wrote:
> ----- Original Message -----
>> From: Alexander Lesle <gro...@tierarzt-mueller.de>
>> To: zfs-discuss@opensolaris.org
>> Sent: Monday, August 15, 2011 8:37:42 PM
>> Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
>>
>> Hello Stu Whitefish and List,
>>
>> On August, 15 2011, 21:17 Stu Whitefish wrote in [1]:
>>> 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a kernel
>>> panic, even when booted from different OS versions
>
> Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest from
> Oracle) several times each, as well as 2 new installs of Update 8.
>
>> When I understand you right, your primary interest is to recover your data on
>> the tank pool. Have you checked the way to boot from a Live-DVD, mount your
>> safe place and copy the data to another machine?
>
> Hi Alexander,
>
> Yes of course... the problem is no version of Solaris can import the pool.
> Please refer to the first message in the thread.
>
> Thanks,
> Jim
[zfs-discuss] ZFS performance question over NFS
Dear all.

We finally got all the parts for our new fileserver, following several recommendations we got over this list. We use:

  Dell R715, 96GB RAM, dual 8-core Opterons
  1 10GE Intel dual-port NIC
  2 LSI 9205-8e SAS controllers
  2 DataON DNS-1600 JBOD chassis
  46 Seagate Constellation SAS drives
  2 STEC ZEUS RAM

The base zpool config utilizes 42 drives plus the STECs as mirrored log devices. The Seagates are set up as a stripe of 7 times 6-drive RAIDZ2 chunks, plus as said a dedicated ZIL made of the mirrored STECs. As a quick'n'dirty check we ran filebench with the fileserver workload.

Running locally we get:

  statfile1         5476 ops/s    0.0 mb/s    0.6 ms/op    179 us/op-cpu
  deletefile1       5476 ops/s    0.0 mb/s    1.0 ms/op    454 us/op-cpu
  closefile3        5476 ops/s    0.0 mb/s    0.0 ms/op      5 us/op-cpu
  readfile1         5476 ops/s  729.5 mb/s    0.2 ms/op    128 us/op-cpu
  openfile2         5477 ops/s    0.0 mb/s    0.8 ms/op    204 us/op-cpu
  closefile2        5477 ops/s    0.0 mb/s    0.0 ms/op      5 us/op-cpu
  appendfilerand1   5477 ops/s   42.8 mb/s    0.3 ms/op    184 us/op-cpu
  openfile1         5477 ops/s    0.0 mb/s    0.9 ms/op    209 us/op-cpu
  closefile1        5477 ops/s    0.0 mb/s    0.0 ms/op      6 us/op-cpu
  wrtfile1          5477 ops/s  688.4 mb/s    0.4 ms/op    220 us/op-cpu
  createfile1       5477 ops/s    0.0 mb/s    2.7 ms/op   1068 us/op-cpu

with a single remote client (similar Dell system) using NFS:

  statfile1           90 ops/s    0.0 mb/s   27.6 ms/op    145 us/op-cpu
  deletefile1         90 ops/s    0.0 mb/s   64.5 ms/op    401 us/op-cpu
  closefile3          90 ops/s    0.0 mb/s   25.8 ms/op     40 us/op-cpu
  readfile1           90 ops/s   11.4 mb/s    3.1 ms/op    363 us/op-cpu
  openfile2           90 ops/s    0.0 mb/s   66.0 ms/op    263 us/op-cpu
  closefile2          90 ops/s    0.0 mb/s   22.6 ms/op    124 us/op-cpu
  appendfilerand1     90 ops/s    0.7 mb/s    0.5 ms/op    101 us/op-cpu
  openfile1           90 ops/s    0.0 mb/s   72.6 ms/op    269 us/op-cpu
  closefile1          90 ops/s    0.0 mb/s   43.6 ms/op    189 us/op-cpu
  wrtfile1            90 ops/s   11.2 mb/s    0.2 ms/op    211 us/op-cpu
  createfile1         90 ops/s    0.0 mb/s  226.5 ms/op    709 us/op-cpu

and the same remote client with sync disabled on the server:

  statfile1          479 ops/s    0.0 mb/s    6.2 ms/op    130 us/op-cpu
  deletefile1        479 ops/s    0.0 mb/s   13.0 ms/op    351 us/op-cpu
  closefile3         480 ops/s    0.0 mb/s    3.0 ms/op     37 us/op-cpu
  readfile1          480 ops/s   62.7 mb/s    0.8 ms/op    174 us/op-cpu
  openfile2          480 ops/s    0.0 mb/s   14.1 ms/op    235 us/op-cpu
  closefile2         480 ops/s    0.0 mb/s    6.0 ms/op    123 us/op-cpu
  appendfilerand1    480 ops/s    3.7 mb/s    0.2 ms/op     53 us/op-cpu
  openfile1          480 ops/s    0.0 mb/s   13.7 ms/op    235 us/op-cpu
  closefile1         480 ops/s    0.0 mb/s   11.1 ms/op    190 us/op-cpu
  wrtfile1           480 ops/s   60.3 mb/s    0.2 ms/op    233 us/op-cpu
  createfile1        480 ops/s    0.0 mb/s   35.6 ms/op    683 us/op-cpu

Disabling the ZIL is no option, but I expected a much better performance; especially as the ZEUS RAM only gets us a speed-up of about 1.8x. Is this test realistic for a typical fileserver scenario, or does it require many more clients to push the limits?

Thanks
Thomas
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Have you already extracted the core file of the kernel crash? (And by the way, activated a dump device so that such a dump can happen at the next reboot...) Have you also tried applying the latest kernel/zfs patches and importing the pool afterwards?

Thomas

On 08/18/2011 06:40 PM, Stu Whitefish wrote:
> Hi Thomas,
>
> Thanks for that link. That's very similar but not identical. There's a
> different line number in zfs_ioctl.c; mine and Preston's fail on line 1815.
> It could be because of a difference in levels in that module of course, but
> the traceback is not identical either. Ours show brand_sysenter and the one
> you linked to shows brand_sys_syscall. I don't know what all that means but it
> is different. Anyway, at least two of us have identical failures.
>
> I was not using crypto, just a plain jane mirror on 2 drives. Possibly I had
> compression on a few file systems but everything else was allowed to default.
>
> Here are our screenshots in case anybody doesn't want to go through the thread.
>
> http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/
> http://prestonconnors.com/zvol_get_stats.jpg
>
> I hope somebody can help with this. It's not a good feeling having so much
> data gone. Thanks for your help. Oracle, are you listening?
>
> Jim
>
>> ----- Original Message -----
>> From: Thomas Gouverneur <t...@ians.be>
>> To: zfs-discuss@opensolaris.org
>> Cc: Stu Whitefish <swhitef...@yahoo.com>
>> Sent: Thursday, August 18, 2011 1:57:29 PM
>> Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
>>
>> You're probably hitting bug 7056738 - http://wesunsolve.net/bugid/id/7056738
>> Looks like it's not fixed yet @ oracle anyway...
>> Were you using crypto on your datasets?
>> Regards,
>> Thomas
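For reference, a rough sketch of getting a usable crash dump on Solaris/OpenSolaris so the panic stack can be inspected; the device path is hypothetical and the exact dump file names depend on the release:

  # check / set the dump device and the savecore directory
  dumpadm
  dumpadm -d /dev/zvol/dsk/rpool/dump
  # after the next panic and reboot, extract the dump and open it with mdb
  savecore
  mdb unix.0 vmcore.0
  # inside mdb: ::status and $C show the panic message and stack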
Re: [zfs-discuss] ZFS performance question over NFS
Tim,

The client is identical to the server but with no SAS drives attached. Also, right now only one 1 Gbit Intel NIC is available.

Thomas

On 18.08.2011 at 17:49, Tim Cook <t...@cook.ms> wrote:
> What are the specs on the client?
>
> On Aug 18, 2011 10:28 AM, Thomas Nau <thomas@uni-ulm.de> wrote:
>> Dear all.
>>
>> We finally got all the parts for our new fileserver, following several
>> recommendations we got over this list. We use:
>>
>>   Dell R715, 96GB RAM, dual 8-core Opterons
>>   1 10GE Intel dual-port NIC
>>   2 LSI 9205-8e SAS controllers
>>   2 DataON DNS-1600 JBOD chassis
>>   46 Seagate Constellation SAS drives
>>   2 STEC ZEUS RAM
>>
>> The base zpool config utilizes 42 drives plus the STECs as mirrored log
>> devices. The Seagates are set up as a stripe of 7 times 6-drive RAIDZ2
>> chunks, plus as said a dedicated ZIL made of the mirrored STECs. As a
>> quick'n'dirty check we ran filebench with the fileserver workload.
>>
>> Running locally we get:
>>
>>   statfile1         5476 ops/s    0.0 mb/s    0.6 ms/op    179 us/op-cpu
>>   deletefile1       5476 ops/s    0.0 mb/s    1.0 ms/op    454 us/op-cpu
>>   closefile3        5476 ops/s    0.0 mb/s    0.0 ms/op      5 us/op-cpu
>>   readfile1         5476 ops/s  729.5 mb/s    0.2 ms/op    128 us/op-cpu
>>   openfile2         5477 ops/s    0.0 mb/s    0.8 ms/op    204 us/op-cpu
>>   closefile2        5477 ops/s    0.0 mb/s    0.0 ms/op      5 us/op-cpu
>>   appendfilerand1   5477 ops/s   42.8 mb/s    0.3 ms/op    184 us/op-cpu
>>   openfile1         5477 ops/s    0.0 mb/s    0.9 ms/op    209 us/op-cpu
>>   closefile1        5477 ops/s    0.0 mb/s    0.0 ms/op      6 us/op-cpu
>>   wrtfile1          5477 ops/s  688.4 mb/s    0.4 ms/op    220 us/op-cpu
>>   createfile1       5477 ops/s    0.0 mb/s    2.7 ms/op   1068 us/op-cpu
>>
>> with a single remote client (similar Dell system) using NFS:
>>
>>   statfile1           90 ops/s    0.0 mb/s   27.6 ms/op    145 us/op-cpu
>>   deletefile1         90 ops/s    0.0 mb/s   64.5 ms/op    401 us/op-cpu
>>   closefile3          90 ops/s    0.0 mb/s   25.8 ms/op     40 us/op-cpu
>>   readfile1           90 ops/s   11.4 mb/s    3.1 ms/op    363 us/op-cpu
>>   openfile2           90 ops/s    0.0 mb/s   66.0 ms/op    263 us/op-cpu
>>   closefile2          90 ops/s    0.0 mb/s   22.6 ms/op    124 us/op-cpu
>>   appendfilerand1     90 ops/s    0.7 mb/s    0.5 ms/op    101 us/op-cpu
>>   openfile1           90 ops/s    0.0 mb/s   72.6 ms/op    269 us/op-cpu
>>   closefile1          90 ops/s    0.0 mb/s   43.6 ms/op    189 us/op-cpu
>>   wrtfile1            90 ops/s   11.2 mb/s    0.2 ms/op    211 us/op-cpu
>>   createfile1         90 ops/s    0.0 mb/s  226.5 ms/op    709 us/op-cpu
>>
>> and the same remote client with sync disabled on the server:
>>
>>   statfile1          479 ops/s    0.0 mb/s    6.2 ms/op    130 us/op-cpu
>>   deletefile1        479 ops/s    0.0 mb/s   13.0 ms/op    351 us/op-cpu
>>   closefile3         480 ops/s    0.0 mb/s    3.0 ms/op     37 us/op-cpu
>>   readfile1          480 ops/s   62.7 mb/s    0.8 ms/op    174 us/op-cpu
>>   openfile2          480 ops/s    0.0 mb/s   14.1 ms/op    235 us/op-cpu
>>   closefile2         480 ops/s    0.0 mb/s    6.0 ms/op    123 us/op-cpu
>>   appendfilerand1    480 ops/s    3.7 mb/s    0.2 ms/op     53 us/op-cpu
>>   openfile1          480 ops/s    0.0 mb/s   13.7 ms/op    235 us/op-cpu
>>   closefile1         480 ops/s    0.0 mb/s   11.1 ms/op    190 us/op-cpu
>>   wrtfile1           480 ops/s   60.3 mb/s    0.2 ms/op    233 us/op-cpu
>>   createfile1        480 ops/s    0.0 mb/s   35.6 ms/op    683 us/op-cpu
>>
>> Disabling the ZIL is no option, but I expected a much better performance;
>> especially as the ZEUS RAM only gets us a speed-up of about 1.8x. Is this
>> test realistic for a typical fileserver scenario or does it require many
>> more clients to push the limits?
>>
>> Thanks
>> Thomas
[zfs-discuss] Possible ZFS problem
We are using ZFS on a Sun E450 server (4 x 400 MHz CPU, 1 Gb memory, 18 Gb system disk and 19 x 300 Gb disks running OSOL snv_134) for archive storage where speed is not important. We have 2 RAID-Z1 pools of 8 disks plus one spare disk shared between the two pools, and this has apparently worked well since it was set up several months ago.

However, one of our users recently put a 35 Gb tar.gz file on this server and uncompressed it to a 215 Gb tar file. But when he tried to untar it, after about 43 Gb had been extracted we noticed the disk usage reported by df for that ZFS pool wasn't changing much. Using du -sm on the extracted archive directory showed that the size would increase over a period of 30 seconds or so and then suddenly drop back about 50 Mb and start increasing again. In other words it seems to be going into some sort of a loop, and all we could do was to kill tar and try again, when exactly the same thing happened after 43 Gb had been extracted.

Thinking the tar file could be corrupt, we successfully untarred the file on a Linux system (1 Tb disk with a plain ext3 filesystem). I suspect my problem may be due to limited memory on this system, but are there any other things I should take into consideration? It's not a major problem as the system is intended for storage and users are not supposed to go in and untar huge tarfiles on it as it's not a fast system ;-)

Andy

Andy Thomas, Time Domain Systems
Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk
Re: [zfs-discuss] Possible ZFS problem
On Sat, 13 Aug 2011, Bob Friesenhahn wrote:
> On Sat, 13 Aug 2011, andy thomas wrote:
>> However, one of our users recently put a 35 Gb tar.gz file on this server and
>> uncompressed it to a 215 Gb tar file. But when he tried to untar it, after
>> about 43 Gb had been extracted we noticed the disk usage reported by df for
>> that ZFS pool wasn't changing much. Using du -sm on the extracted archive
>> directory showed that the size would increase over a period of 30 seconds or
>> so and then suddenly drop back about 50 Mb and start increasing again. In
>> other words it seems to be going into some sort of a loop and all we could do
>> was to kill tar and try again when exactly the same thing happened after
>> 43 Gb had been extracted.
>
> What 'tar' program were you using? Make sure to also try using the
> Solaris-provided tar rather than something like GNU tar.

I was using GNU tar actually, as the original archive was created on a Linux machine. I will try it again using Solaris tar.

> 1GB of memory is not very much for Solaris to use. A minimum of 2GB is
> recommended for zfs.

We are going to upgrade the system to 4 Gb as soon as possible.

Thanks for the quick response,

Andy
Re: [zfs-discuss] Possible ZFS problem
On Sat, 13 Aug 2011, Joerg Schilling wrote:
> andy thomas <a...@time-domain.co.uk> wrote:
>>> What 'tar' program were you using? Make sure to also try using the
>>> Solaris-provided tar rather than something like GNU tar.
>>
>> I was using GNU tar actually as the original archive was created on a Linux
>> machine. I will try it again using Solaris tar.
>
> GNU tar does not follow the standard when creating archives, so Sun tar may be
> unable to unpack the archive correctly.

So it is GNU tar that is broken and not Solaris tar? I always thought it was the other way round. Thanks for letting me know.

> But GNU tar makes strange things when unpacking symlinks. I recommend to use
> star, it understands GNU tar archives.

I've just installed this (version 1.5a78) from Sunfreeware and am having a play. Danke!

Andy
Re: [zfs-discuss] 512b vs 4K sectors
Richard

On 07/04/2011 03:58 PM, Richard Elling wrote:
> On Jul 4, 2011, at 6:42 AM, Lanky Doodle wrote:
>> Hiya,
>>
>> I've been doing a lot of research surrounding this and ZFS, including some
>> posts on here, though I am still left scratching my head. I am planning on
>> using slow RPM drives for a home media server, and it's these that seem to
>> 'suffer' from a few problems;
>>
>>   Seagate Barracuda LP - looks to be the only true 512b-sector hard disk; serious firmware issues
>>   Western Digital Caviar Green - 4K sectors = crap write performance
>>   Hitachi 5K3000 - variable sector sizing (according to tech. specs)
>>   Samsung SpinPoint F4 - just plain old problems with them
>>
>> What is the best drive of the above 4, and are 4K drives really a no-no with
>> ZFS? Are there any alternatives in the same price bracket?
>
> 4K drives are fine, especially if the workload is read-mostly. Depending on
> the OS, you can tell ZFS to ignore the incorrect physical sector size reported
> by some drives. Today, this is easiest in FreeBSD, a little bit more tricky in
> OpenIndiana (patches and source are available for a few different
> implementations). Or you can just trick them out by starting the pool with a
> 4K sector device that doesn't lie (eg, iscsi target).

Are you referring to the ashift patches, and what do you mean by tricking them by using an iSCSI target?

Thanks,
Thomas
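For context on the ashift question: each top-level vdev records an ashift at creation time (9 for 512-byte alignment, 12 for 4K), and it can be inspected afterwards. A quick check, with a hypothetical pool name:

  # show the ashift recorded for each vdev in the cached pool config
  zdb -C tank | grep ashift
  # ashift: 9  -> 512-byte alignment; ashift: 12 -> 4K alignment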
[zfs-discuss] JBOD recommendation for ZFS usage
Dear all

Sorry if it's kind of off-topic for the list, but after talking to lots of vendors I'm running out of ideas... We are looking for JBOD systems which

  (1) hold 20+ 3.5" SATA drives
  (2) are rack mountable
  (3) have all the nice hot-swap stuff
  (4) allow 2 hosts to connect via SAS (4+ lines per host) and see all available drives as disks, no RAID volume

In a perfect world both hosts would connect, each using two independent SAS connectors. The box will be used in a ZFS Solaris-based fileserver in a fail-over cluster setup. Only one host will access a drive at any given time. It seems that a lot of vendors offer JBODs, but so far I haven't found one in Germany which handles (4). Any hints?
Re: [zfs-discuss] JBOD recommendation for ZFS usage
Thanks Jim and all the others who have replied so far.

On 05/30/2011 11:37 AM, Jim Klimov wrote:
> ... So if your application can live with the unit of failover being a bunch of
> 21 or 24 disks - that might be a way to go. However each head would only have
> one connection to each backplane, and I'm not sure if you can STONITH the
> non-leading head to enforce failovers (and enable the specific PRI/SEC chip of
> the backplane).

That's exactly my point. I don't need any internal failover which restricts which disks a host can see. We want to fail over between hosts, not connections. For the latter we would use another JBOD and let ZFS do the dirty job of mirroring. We have run a similar setup for years, but with FC-connected RAID systems. Over time they are kind of limited when it comes to price/performance.

> Also one point was stressed many times in the docs: these failover backplanes
> require use of SAS drives, no SATA (while the single-path BPs are okay with
> both SAS and SATA). Still, according to the forums, SATA disks on shared
> backplanes often give too much headache and may give too little performance in
> comparison...

I would be fine with SAS as well.

Thomas
Re: [zfs-discuss] raidz DEGRADED state
So there is no current way to specify the creation of a 3-disk raid-z array with a known missing disk?

On 12/5/06, David Bustos <david.bus...@sun.com> wrote:
> Quoth Thomas Garner on Thu, Nov 30, 2006 at 06:41:15PM -0500:
>> I currently have a 400GB disk that is full of data on a linux system. If I
>> buy 2 more disks and put them into a raid-z'ed zfs under solaris, is there a
>> generally accepted way to build a degraded array with the 2 disks, copy the
>> data to the new filesystem, and then move the original disk to complete the
>> array?
>
> No, because we currently can't add disks to a raidz array. You could create a
> mirror instead and then add in the other disk to make a three-way mirror,
> though. Even doing that would be dicey if you only have a single machine,
> though, since Solaris can't natively read the popular Linux filesystems. I
> believe there is freeware to do it, but nothing supported.
>
> David
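There is a frequently mentioned (and entirely unsupported) workaround: stand in a sparse file for the missing disk, build the raidz with it, then take the file offline so the pool runs degraded until the real disk arrives. A rough sketch with hypothetical device names and sizes; use at your own risk, as the pool has no redundancy until the replace completes:

  # create a sparse file the size of the real disks to act as a placeholder
  mkfile -n 400g /var/tmp/fakedisk
  zpool create tank raidz c1t0d0 c1t1d0 /var/tmp/fakedisk
  # run degraded: take the placeholder out of service
  zpool offline tank /var/tmp/fakedisk
  # ...copy the data over, then swap in the real third disk later:
  # zpool replace tank /var/tmp/fakedisk c1t2d0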
[zfs-discuss] nfs issues
I'm having some very strange NFS issues that are driving me somewhat mad. I'm running b134 and have been for months now, without issue. Recently I enabled 2 services to get Bonjour notifications working in OS X:

  /network/dns/multicast:default
  /system/avahi-bridge-dsd:default

and I added a few .service files to /etc/avahi/services/.

Ever since doing this, NFS keeps crashing (I'd say every 30 minutes or so) and falling into the maintenance state. On top of that, I see a TON of core files in /, mixed in with the usual top-level directories (bin boot cdrom dev devices etc export home kernel lib lost+found media mnt net opt platform proc root rpool sbin tank tmp usr):

  core.mountd.1286564233  core.mountd.1286564239  core.mountd.1286574167
  core.mountd.1286574170  core.mountd.1286574173  core.mountd.1286576077
  core.mountd.1286576084  core.mountd.1286579150  core.mountd.1286579153
  core.mountd.1286579221  core.mountd.1286579228  core.mountd.1286583275
  core.mountd.1286583278  core.mountd.1286583281  core.mountd.1286583284
  core.mountd.1286583355  core.mountd.1286583418  core.mountd.1286583420
  core.mountd.1286583423  core.mountd.1286586406  core.mountd.1286586409
  core.mountd.1286586498  core.mountd.1286586501

Running pstack on them shows:

  wonsl...@wonslung-raidz2:/tank/nas/dump/Done# pstack /core.mountd.1286564233
  core '/core.mountd.1286564233' of 22940: /usr/lib/nfs/mountd
  ----------------- lwp# 1 / thread# 1 -----------------
  feeefd1b __lwp_park (fee12a00, 0, fe5f9588, 0) + b
  feee7beb mutex_lock_impl (fe5f9588, 0, 8047d58, 80e6f50, 80e7030, fe5f3000) + 163
  feee7d28 mutex_lock (fe5f9588) + 10
  fe5c3cf5 _svc_run_mt (fe5f4838, fe5f4848, fe5f4858, fe5f4858, 805552d, feeca118) + 69
  fe5c38eb svc_run (1, 2328, 1, 0, 8047dfc, feffb804) + 77
  0805552d main (1, 8047e40, 8047e48, 8047dfc) + 4f9
  080548ed _start (1, 8047ee0, 0, 8047ef4, 8047f11, 8047f22) + 7d
  ----------------- lwp# 2 / thread# 2 -----------------
  feef4367 __pause (8, 200, 8, 5, fe000, fef82000) + 7
  08054de7 nfsauth_svc (0, fef82000, fed4efe8, feeef9fe) + 3b
  feeefa53 _thrp_setup (fedf0a00) + 9b
  feeefce0 _lwp_start (fedf0a00, 0, 0, 0, 0, 0)
  ----------------- lwp# 3 / thread# 3 -----------------
  feef4367 __pause (8, 200, 8, 6, fe000, fef82000) + 7
  08054e43 cmd_svc (0, fef82000, fe46efe8, feeef9fe) + 3b
  feeefa53 _thrp_setup (fedf1200) + 9b
  feeefce0 _lwp_start (fedf1200, 0, 0, 0, 0, 0)
  ----------------- lwp# 4 / thread# 4 -----------------
  08054f49 do_logging_queue (8071110, 806e888, fe36ffc8, 805501a) + 45
  0805502e logging_svc (0, fef82000, fe36ffe8, feeef9fe) + 52
  feeefa53 _thrp_setup (fedf1a00) + 9b
  feeefce0 _lwp_start (fedf1a00, 0, 0, 0, 0, 0)
  ----------------- lwp# 5 / thread# 5 -----------------
  feef4ca1 __door_return (fe270d2c, 8, 0, 0) + 21
  08059280 nfsauth_func (0, fe270dc4, 3c, 0, 0, 8059108) + 178
  feef4cbe __door_return () + 3e
  ----------------- lwp# 6 / thread# 6 -----------------
  feef4ca1 __door_return (0, 0, 0, 0) + 21
  feedb63f door_create_func (0, fef82000, fe171fe8, feeef9fe) + 2f
  feeefa53 _thrp_setup (fedf2a00) + 9b
  feeefce0 _lwp_start (fedf2a00, 0, 0, 0, 0, 0)
  ----------------- lwp# 8 / thread# 8 -----------------
  feef4387 __pollsys (80ca168, 9, 0, 0, fe5f8e38, fef82000) + 7
  fee987f4 poll (80ca168, 9, , fe5c3fc7) + 4c
  fe5c3e49 _svc_run_mt (0, fef82000, fdf73fe8, feeef9fe) + 1bd
  feeefa53 _thrp_setup (fedf3a00) + 9b
  feeefce0 _lwp_start (fedf3a00, 0, 0, 0, 0, 0)

Anyway, I am not an expert and don't really know how to troubleshoot this, so if someone could help, I'd really appreciate it.
[zfs-discuss] Data transfer taking a longer time than expected (Possibly dedup related)
Hi all I'm currently moving a fairly big dataset (~2TB) within the same zpool. Data is being moved from one dataset to another, which has dedup enabled. The transfer started at a fairly slow speed (maybe 12MB/s), but it has now crawled to a near halt. Only 800GB has been moved in 48 hours. I looked for similar problems on the forums and other places, and it seems dedup needs much more RAM than the server currently has (3GB) to perform smoothly for such an operation. My question is, how can I gracefully stop the ongoing operation? What I did was simply mv temp/* new/ in an ssh session (which is still open). Can I disable dedup on the dataset while the transfer is going on? Can I simply Ctrl-C the process to stop it? Should I be careful of anything? Help would be appreciated -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data transfer taking a longer time than expected (Possibly dedup related)
Thanks, I'm going to do that. I'm just worried about corrupting my data, or other problems. I wanted to make sure there is nothing I really should be careful with. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
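For reference, the two safest moves discussed in this thread look roughly like this; a sketch, with pool and dataset names as placeholders (zpool status -D is only present on builds new enough to have dedup):

# stop deduplicating new writes; blocks already written stay deduped
zfs set dedup=off tank/new
# show how large the dedup table (DDT) has grown, to judge the RAM pressure
zpool status -D tank

Interrupting the mv with Ctrl-C should only stop the copy: files already moved stay moved, and the file in flight remains intact in the source, since mv only unlinks a source file after its copy completes.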
Re: [zfs-discuss] dedup and handling corruptions - impossible?
You are saying ZFS will detect and rectify this kind of corruption in a deduped pool automatically if enough redundancy is present? Can that fail sometimes? Under what conditions? I would hate to restore a 1.5TB pool from backup just because one 5MB file is gone bust. And I have a known good copy of the file. I raised a technical question and you are going all personal on me. -- This message posted from opensolaris.org zfs checksums every block. When you access a file, it checks that the checksums match. If they do not (corruption) and you have redundancy, it repairs the corruption. It can detect and correct corruption in this way. It didn't seem like anyone got personal with you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
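A minimal sketch of how that detection and repair is usually exercised by hand (pool name is a placeholder):

# read every block in the pool and repair anything that fails its checksum, where redundancy allows
zpool scrub tank
# afterwards, list any files with permanent (unrepairable) errors
zpool status -v tank

If a file shows up as permanently damaged and a known good copy exists, overwriting it with that copy and clearing the error counters is generally all that is needed, not a restore of the whole pool.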
Re: [zfs-discuss] Upgrade Nevada Kernel
you can upgrade by changing to the dev repository, or if you don't mind re-installing you can download the b134 image at genunix http://www.genunix.org/ On Sat, Aug 21, 2010 at 1:25 AM, Long Tran opensolaris.stor...@gmail.com wrote: Hi, I hit a ZFS bug that should be resolved in snv_134 or later. I'm running snv_111. How do I upgrade to the latest version? Thanks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
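A sketch of the switch-to-dev-repository path, assuming the stock opensolaris.org publisher name and the dev repository URL that was current for these builds:

# point the publisher at the dev repository
pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
# update everything into a new boot environment, then reboot into it
pfexec pkg image-update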
Re: [zfs-discuss] Halcyon ZFS and system monitoring software for OpenSolaris (beta)
On Thu, Aug 19, 2010 at 4:33 PM, Mike Kirk mike.k...@halcyoninc.com wrote: Hi all, Halcyon recently started to add ZFS pool stats to our Solaris Agent, and because many people were interested in the previous OpenSolaris beta* we've rolled it into our OpenSolaris build as well. I've already heard some great feedback about supporting ZIL and ARC stats, which we're hoping to add soon. If you'd like to see what we have now, and maybe try it on your OpenSolaris system, please see the download/screenshot page here: http://forums.halcyoninc.com/showthread.php?p=1018 I know this isn't the best time to be posting about legacy OpenSolaris: we're keeping our eyes on Solaris 11 Express / Illumos and aim to support the more advanced features of Solaris 11 the day it's pushed out the door. Thanks for your time! Regards, Mike dot Kirk at HalcyonInc dot com I just tried this, and i'm getting an error on install. I've also posted in your forums but i thought perhaps someone else on list might know the solutions. anyways, I'm runniong Opensolaris b134, this is the error i receive Seeding the new agent ... ERROR: Failed to run command /opt/Neuron/bin/na usm-seed -s xxx agent. STDOUT/STDERR: /opt/Neuron/bin/na[1009]: eval: line 1: 6470: Memory fault(coredump) Moving log file /tmp/HALNeuronSolaris-install_20100820-29.log to /var/opt/Neuron/install/HALNeuronSolaris-install_20100820-29.log ... any help would be greatly appreciated, i really love the screenshots for this software. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] make df have accurate out upon zfs?
df serves a purpose though. There are other commands which output that information. On Thu, Aug 19, 2010 at 3:01 PM, Fred Liu fred_...@issi.com wrote: Not sure if there were similar threads in this list before. Three scenarios: 1): df cannot count snapshot space in a file system with quota set. 2): df cannot count sub-filesystem space in a file system with quota set. 3): df cannot count space saved by de-dup in a file system with quota set. Are they possible? Btw, what is the difference between /usr/gnu/bin/df and /bin/df? Thanks. Fred ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] make df have accurate out upon zfs?
can't the zfs command provide that information? 2010/8/20 Fred Liu fred_...@issi.com Can you shed more lights on **other commands** which output that information? Appreciations. Fred *From:* Thomas Burgess [mailto:wonsl...@gmail.com] *Sent:* 星期五, 八月 20, 2010 17:34 *To:* Fred Liu *Cc:* ZFS Discuss *Subject:* Re: [zfs-discuss] make df have accurate out upon zfs? df serves a purpose though. There are other commands which output that information.. On Thu, Aug 19, 2010 at 3:01 PM, Fred Liu fred_...@issi.com wrote: Not sure if there was similar threads in this list before. Three scenarios: 1): df cannot count snapshot space in a file system with quota set. 2): df cannot count sub-filesystem space in a file system with quota set. 3): df cannot count space saved by de-dup in a file system with quota set. Are they possible? Btw, what is the difference between /usr/gnu/bin/df and /bin/df? Thanks. Fred ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] make df have accurate out upon zfs?
as for the difference between the two df's, one is the gnu df (liek you'd have on linux) and the other is the solaris df. 2010/8/20 Thomas Burgess wonsl...@gmail.com can't the zfs command provide that information? 2010/8/20 Fred Liu fred_...@issi.com Can you shed more lights on **other commands** which output that information? Appreciations. Fred *From:* Thomas Burgess [mailto:wonsl...@gmail.com] *Sent:* 星期五, 八月 20, 2010 17:34 *To:* Fred Liu *Cc:* ZFS Discuss *Subject:* Re: [zfs-discuss] make df have accurate out upon zfs? df serves a purpose though. There are other commands which output that information.. On Thu, Aug 19, 2010 at 3:01 PM, Fred Liu fred_...@issi.com wrote: Not sure if there was similar threads in this list before. Three scenarios: 1): df cannot count snapshot space in a file system with quota set. 2): df cannot count sub-filesystem space in a file system with quota set. 3): df cannot count space saved by de-dup in a file system with quota set. Are they possible? Btw, what is the difference between /usr/gnu/bin/df and /bin/df? Thanks. Fred ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] make df have accurate out upon zfs?
try something like zfs list -o space zfs -t snapshot stuff like that 2010/8/20 Fred Liu fred_...@issi.com Sure, I know this. What I want to say is following: r...@cn03:~# /usr/gnu/bin/df -h /cn03/3 FilesystemSize Used Avail Use% Mounted on cn03/3298G 154K 298G 1% /cn03/3 r...@cn03:~# /bin/df -h /cn03/3 Filesystem size used avail capacity Mounted on cn03/3 800G 154K 297G 1%/cn03/3 r...@cn03:~# zfs get all cn03/3 NAMEPROPERTY VALUE SOURCE cn03/3 type filesystem - cn03/3 creation Sat Jul 10 9:35 2010 - cn03/3 used 503G - cn03/3 available 297G - cn03/3 referenced 154K - cn03/3 compressratio 1.00x - cn03/3 mountedyes- cn03/3 quota 800G local cn03/3 reservationnone default cn03/3 recordsize 128K default cn03/3 mountpoint /cn03/3default cn03/3 sharenfs rw,root=nfsrootlocal cn03/3 checksum on default cn03/3 compressionoffdefault cn03/3 atime on default cn03/3 deviceson default cn03/3 exec on default cn03/3 setuid on default cn03/3 readonly offdefault cn03/3 zoned offdefault cn03/3 snapdirhidden default cn03/3 aclmodegroupmask default cn03/3 aclinherit restricted default cn03/3 canmount on default cn03/3 shareiscsi offdefault cn03/3 xattr on default cn03/3 copies 1 default cn03/3 version4 - cn03/3 utf8only off- cn03/3 normalization none - cn03/3 casesensitivitysensitive - cn03/3 vscan offdefault cn03/3 nbmand offdefault cn03/3 sharesmb offdefault cn03/3 refquota none default cn03/3 refreservation none default cn03/3 primarycache alldefault cn03/3 secondarycache alldefault cn03/3 usedbysnapshots46.8G - cn03/3 usedbydataset 154K - cn03/3 usedbychildren 456G - cn03/3 usedbyrefreservation 0 - cn03/3 logbiaslatencydefault cn03/3 dedup offdefault cn03/3 mlslabel none default cn03/3 com.sun:auto-snapshot true inherited from cn03 Thanks. Fred *From:* Thomas Burgess [mailto:wonsl...@gmail.com] *Sent:* 星期五, 八月 20, 2010 18:44 *To:* Fred Liu *Cc:* ZFS Discuss *Subject:* Re: [zfs-discuss] make df have accurate out upon zfs? as for the difference between the two df's, one is the gnu df (liek you'd have on linux) and the other is the solaris df. 2010/8/20 Thomas Burgess wonsl...@gmail.com can't the zfs command provide that information? 2010/8/20 Fred Liu fred_...@issi.com Can you shed more lights on **other commands** which output that information? Appreciations. Fred *From:* Thomas Burgess [mailto:wonsl...@gmail.com] *Sent:* 星期五, 八月 20, 2010 17:34 *To:* Fred Liu *Cc:* ZFS Discuss *Subject:* Re: [zfs-discuss] make df have accurate out upon zfs? df serves a purpose though. There are other commands which output that information.. On Thu, Aug 19, 2010 at 3:01 PM, Fred Liu fred_...@issi.com wrote: Not sure if there was similar threads in this list before. Three scenarios: 1): df cannot count snapshot space in a file system with quota set. 2): df cannot count sub-filesystem space in a file system with quota set. 3): df cannot count space saved by de-dup in a file system with quota set. Are they possible? Btw, what is the difference between /usr/gnu/bin/df and /bin/df? Thanks. Fred ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
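Spelled out, the commands being suggested here are roughly the following; a sketch, using the cn03/3 dataset from the output above:

# per-dataset breakdown: space used by the dataset itself, its snapshots, children and reservations
zfs list -o space cn03/3
# list the snapshots under the dataset and the space each one holds
zfs list -r -t snapshot cn03/3

This is where the difference from df shows up: the usedbysnapshots and usedbychildren figures above (46.8G and 456G) account for most of the space under the 800G quota that df cannot attribute to anything.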
[zfs-discuss] lots of errors in logs?
I've been running opensolaris for months, and today while poking around, i noticed a ton of errors in my logs...I'm wondering what they mean and if it's anything to worry about. I've found a few things on google but not a whole lot...anyways, here's a pastie of the log http://pastie.org/1104916 any help would be greatly appreciated ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Mon, Aug 16, 2010 at 11:17 PM, Frank Cusack frank+lists/z...@linetwo.netwrote: On 8/16/10 9:57 AM -0400 Ross Walker wrote: No, the only real issue is the license and I highly doubt Oracle will re-release ZFS under GPL to dilute it's competitive advantage. You're saying Oracle wants to keep zfs out of Linux? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss why would Oracle want ZFS in linux when it makes the value of Solaris greater? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Raidz - what is stored in parity?
On Wed, Aug 11, 2010 at 12:57 AM, Peter Taps ptr...@yahoo.com wrote: Hi Eric, Thank you for your help. At least one part is clear now. I still am confused about how the system is still functional after one disk fails. Consider my earlier example of a 3-disk zpool configured for raidz-1. To keep it simple let's not consider block sizes. Let's say I send a write value abcdef to the zpool. As the data gets striped, we will have 2 characters per disk. disk1 = ab + some parity info disk2 = cd + some parity info disk3 = ef + some parity info Now, if disk2 fails, I lost cd. How will I ever recover this? The parity info may tell me that something is bad but I don't see how my data will get recovered. The only good thing is that any newer data will now be striped over two disks. Perhaps I am missing some fundamental concept about raidz. Regards, Peter I find the best way to understand how parity works is to think back to your algebra class when you'd have something like 1x + 2 = 3 and you could solve for x. It's not EXACTLY like that, but solving the parity stuff is similar to solving for x -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
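To make the analogy concrete, here is a tiny sketch of how single parity (as in raidz1) can recover a lost chunk; the byte values are made up, and real raidz computes parity per block across the actual stripe layout, but the principle is the same XOR relationship:

# parity is the XOR of the data chunks
d1=0xab; d2=0xcd; d3=0xef
p=$(( d1 ^ d2 ^ d3 ))
# if the disk holding d2 dies, XOR of the survivors with the parity gives d2 back
printf 'recovered d2 = %x\n' $(( d1 ^ d3 ^ p ))   # prints cd

So the parity is not just a check value: combined with the surviving data it algebraically "solves for" the missing piece.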
Re: [zfs-discuss] Best usage of SSD-disk in ZFS system
On Fri, Aug 6, 2010 at 6:44 AM, P-O Yliniemi p...@bsd-guide.net wrote: Hello! I have built an OpenSolaris / ZFS based storage system for one of our customers. The configuration is about this: Motherboard/CPU: SuperMicro X7SBE / Xeon (something, sorry - can't remember and do not have my specification nearby) RAM: 8GB ECC (X7SBE won't take more) Drives for storage: 16*1.5TB Seagate ST31500341AS, connected to two AOC-SAT2-MV8 controllers Drives for operating system: 2*80GB Intel X25-M (mirror) ZFS configuration: Two vdevs, raid-z of 7+1 disks per set, striped together (gives a zpool with about 21TB storage space) Disk performance: around 700-800MB/s, tested and timed with 'mkfile' and 'time' (a 40GB file is created in just about a minute) I have a spare X25-M drive of 40GB to use for cache or log (or both), but since the disk array is a lot faster than the SSD-disk, I can not see the advantage in using it as a cache device. Are there any advantages to using a separate log or cache device in this case? Regards, PeO ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss I can tell you for sure that there can be a really nice advantage for sequential writes. to see this yourself, do the following: create a filesystem, share it out over NFS, create a really big tar.gz file and put it in the filesystem, log in from a network client via nfs and extract the tarball using something like: time tar xzfv some.tar.gz do this a few times to get an average, then add the SSD as a log device. I have the exact same motherboard with a very similar setup, and i noticed a 400% nfs performance boost by doing this. try it yourself =) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
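A sketch of that before/after test, assuming the pool is named tank and the spare X25-M shows up as c2t0d0 (both placeholders):

# baseline: from an NFS client, time a small-file-heavy, sync-heavy workload
time tar xzf some.tar.gz
# on the server, add the SSD as a separate intent log (slog) device
zpool add tank log c2t0d0
# repeat the same extraction from the client and compare the elapsed time
time tar xzf some.tar.gz

The win comes from synchronous NFS writes landing on the SSD instead of the spinning disks, so it shows up even when the array's streaming bandwidth is higher than the SSD's.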
Re: [zfs-discuss] Confused about consumer drives and zfs can someone help?
I've found the Seagate 7200.12 1tb drives and Hitachi 7k2000 2TB drives to be by far the best. I've read lots of horror stories about any WD drive with 4k sectors...it's best to stay away from them. I've also read plenty of people say that the green drives are terrible. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC and ZIL on same SSD?
On Wed, Jul 21, 2010 at 12:42 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: Are there any drawbacks to partitioning an SSD in two parts and using L2ARC on one partition, and ZIL on the other? Any thoughts? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss It's not going to be as good as having separate devices, but i can tell you that i did this on my home system and it was WELL worth it. I used one of the sandforce 1500 based SSD's (50 gb). i used 9 gb for ZIL, and the rest for L2ARC adding the zil gave me about a 400-500% nfs write performance improvement. Seeing as you can't ever use more than half your ram for ZIL anyways, the only real downside to doing this is that i/o becomes split between zil and L2arc but realistically it depends on your workload...for mine, i noticed a HUGE benefit from doing this. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
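A sketch of the slice layout being described; the pool name is a placeholder, and s0/s1 are two slices laid out on the SSD beforehand with format (roughly 9 GB for the log, the remainder for cache):

# small slice as the separate intent log, the rest as L2ARC
zpool add tank log c6t5d0s0
zpool add tank cache c6t5d0s1

The ZIL slice only ever needs to hold a few seconds of synchronous writes, so giving most of the device to L2ARC is usually the sensible split.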
Re: [zfs-discuss] NFS performance?
On Fri, Jul 23, 2010 at 3:11 AM, Sigbjorn Lie sigbj...@nixtra.com wrote: Hi, I've been searching around on the Internet to find some help with this, but have been unsuccessful so far. I have some performance issues with my file server. I have an OpenSolaris server with a Pentium D 3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate (ST31500341AS) 1,5TB SATA drives. If I compile or even just unpack a tar.gz archive with source code (or any archive with lots of small files), on my Linux client onto an NFS mounted disk from the OpenSolaris server, it's extremely slow compared to unpacking this archive locally on the server. A 22MB .tar.gz file containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS. Unpacking the same file locally on the server is just under 2 seconds. Between the server and client I have a gigabit network, which at the time of testing had no other significant load. My NFS mount options are: rw,hard,intr,nfsvers=3,tcp,sec=sys. Any suggestions as to why this is? Regards, Sigbjorn ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss as someone else said, adding an ssd log device can help hugely. I saw about a 500% nfs write increase by doing this. I've heard of people getting even more. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] NFS performance?
On Fri, Jul 23, 2010 at 5:00 AM, Sigbjorn Lie sigbj...@nixtra.com wrote: I see I have already received several replies, thanks to all! I would not like to risk losing any data, so I believe a ZIL device would be the way for me. I see these exist at different prices. Any reason why I would not buy a cheap one? Like the Intel X25-V SSD 40GB 2,5? What size of ZIL device would be recommended for my pool consisting of 4 x 1,5TB drives? Any brands I should stay away from? Regards, Sigbjorn Like i said, i bought a 50 gb OCZ Vertex Limited Edition...it's like 200 dollars, up to 15,000 random iops (iops is what you want for a fast zil). I've gotten excellent performance out of it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Maximum zfs send/receive throughput
On 25.06.2010 14:32, Mika Borner wrote: It seems we are hitting a boundary with zfs send/receive over a network link (10Gb/s). We can see peak values of up to 150MB/s, but on average about 40-50MB/s are replicated. This is far from the bandwidth that a 10Gb link can offer. Is it possible that ZFS is giving replication too low a priority/throttling it too much? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss you can probably improve overall performance by using mbuffer [1] to stream the data over the network. At least some people have reported increased performance. mbuffer will buffer the datastream and disconnect zfs send operations from network latencies. Get it here: original source: http://www.maier-komor.de/mbuffer.html binary package: http://www.opencsw.org/packages/CSWmbuffer/ - Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
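A sketch of the usual mbuffer pipeline; host name, port, snapshot name and buffer sizes are all illustrative:

# on the receiving host: listen, buffer, and feed zfs receive
mbuffer -s 128k -m 1G -I 9090 | zfs receive -d tank
# on the sending host: stream the snapshot into mbuffer pointed at the receiver
zfs send tank/fs@today | mbuffer -s 128k -m 1G -O recvhost:9090

Because both ends keep a large buffer in memory, zfs send can keep reading from disk while the network catches up, which is where the reported speedups come from.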
Re: [zfs-discuss] OCZ Vertex 2 Pro performance numbers
Conclusion: This device will make an excellent slog device. I'll order them today ;) I have one and i love it...I sliced it though, used 9 gb for ZIL and the rest for L2ARC (my server is on a smallish network with about 10 clients) It made a huge difference in NFS performance and other stuff as well (for instance, doing something like du will run a TON faster than before) For the money, it's a GREAT deal. I am very impressed --Arne ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Erratic behavior on 24T zpool
On Fri, Jun 18, 2010 at 4:42 AM, Pasi Kärkkäinen pa...@iki.fi wrote: On Fri, Jun 18, 2010 at 01:26:11AM -0700, artiepen wrote: Well, I've searched my brains out and I can't seem to find a reason for this. I'm getting bad to medium performance with my new test storage device. I've got 24 1.5T disks with 2 SSDs configured as a zil log device. I'm using the Areca raid controller, the driver being arcmsr. Quad core AMD with 16 gig of RAM, OpenSolaris upgraded to snv_134. The zpool has 2 11-disk raidz2's and I'm getting anywhere between 1MB/sec and 40MB/sec with zpool iostat. On average, though, it's more like 5MB/sec if I watch while I'm actively doing some r/w. I know that I should be getting better performance. How are you measuring the performance? Do you understand that raidz2 with that many disks in it will give you really poor random write performance? -- Pasi i have a media server with 2 raidz2 vdevs 10 drives wide myself without a ZIL (but with a 64 gb l2arc). I can write to it about 400 MB/s over the network, and scrubs show 600 MB/s but it really depends on the type of i/o you have...random i/o across 2 vdevs will be REALLY slow (as slow as the slowest 2 drives in your pool basically) 40 MB/s might be right if it's random...though i'd still expect to see more. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
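One quick way to see whether the workload really is random, and which vdev is the bottleneck, is to watch per-vdev activity while the test runs; a sketch, with the pool name as a placeholder:

# per-vdev operations and bandwidth, refreshed every 5 seconds
zpool iostat -v tank 5
# per-device service times and queue depths as seen by the OS
iostat -xn 5

Lots of small ops with low bandwidth on every disk points at random I/O (where two raidz2 vdevs deliver roughly two disks' worth of IOPS), while one device with much higher service times points at a slow or failing drive.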
Re: [zfs-discuss] Erratic behavior on 24T zpool
On Fri, Jun 18, 2010 at 6:34 AM, Curtis E. Combs Jr. ceco...@uga.eduwrote: Oh! Yes. dedup. not compression, but dedup, yes. dedup may be your problem...it requires some heavy ram and/or decent L2ARC from what i've been reading. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool is wrong size in b134
Also, the disks were replaced one at a time last year from 73GB to 300GB to increase the size of the pool. Any idea why the pool is showing up as the wrong size in b134 and have anything else to try? I don't want to upgrade the pool version yet and then not be able to revert back... thanks, Ben ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss sometimes when you upgrade a pool by replacing drives with bigger ones, you have to export the pool, then import it. Or at least that's what i've always done ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
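A sketch of the export/import step described above, plus the pool property that makes the expansion automatic on builds that have it; the pool name is a placeholder:

zpool export tank
zpool import tank
# on builds with the autoexpand property, this avoids the export/import dance for future disk swaps
zpool set autoexpand=on tank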
Re: [zfs-discuss] size of slog device
On Mon, Jun 14, 2010 at 4:41 AM, Arne Jansen sensi...@gmx.net wrote: Hi, I known it's been discussed here more than once, and I read the Evil tuning guide, but I didn't find a definitive statement: There is absolutely no sense in having slog devices larger than then main memory, because it will never be used, right? ZFS will rather flush the txg to disk than reading back from zil? So there is a guideline to have enough slog to hold about 10 seconds of zil, but the absolute maximum value is the size of main memory. Is this correct? I thought it was half the size of memory. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] panic after zfs mount
Dear all We ran into a nasty problem the other day. One of our mirrored zpool hosts several ZFS filesystems. After a reboot (all FS mounted at that time an in use) the machine paniced (console output further down). After detaching one of the mirrors the pool fortunately imported automatically in a faulted state without mounting the filesystems. Offling the unplugged device and clearing the fault allowed us to disable auto-mounting the filesystems. Going through them one by one all but one mounted OK. The one again triggered a panic. We left mounting on that one disabled for now to be back in production after pulling data from the backup tapes. Scrubbing didn't show any error so any idea what's behind the problem? Any chance to fix the FS? Thomas --- panic[cpu3]/thread=ff0503498400: BAD TRAP: type=e (#pf Page fault) rp=ff001e937320 addr=20 occurred in module zfs due to a NULL pointer dereference zfs: #pf Page fault Bad kernel fault at addr=0x20 pid=27708, pc=0xf806b348, sp=0xff001e937418, eflags=0x10287 cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de cr2: 20cr3: 4194a7000cr8: c rdi: ff0503aaf9f0 rsi:0 rdx:0 rcx: 155cda0b r8: eaa325f0 r9: ff001e937480 rax: 7ff rbx:0 rbp: ff001e937460 r10: 7ff r11:0 r12: ff0503aaf9f0 r13: ff0503aaf9f0 r14: ff001e9375d0 r15: ff001e937610 fsb:0 gsb: ff04e7e5c040 ds: 4b es: 4b fs:0 gs: 1c3 trp:e err:0 rip: f806b348 cs: 30 rfl:10287 rsp: ff001e937418 ss: 38 ff001e937200 unix:die+dd () ff001e937310 unix:trap+177e () ff001e937320 unix:cmntrap+e6 () ff001e937460 zfs:zap_leaf_lookup_closest+40 () ff001e9374f0 zfs:fzap_cursor_retrieve+c9 () ff001e9375b0 zfs:zap_cursor_retrieve+19a () ff001e937780 zfs:zfs_purgedir+4c () ff001e9377d0 zfs:zfs_rmnode+52 () ff001e937810 zfs:zfs_zinactive+b5 () ff001e937860 zfs:zfs_inactive+ee () ff001e9378b0 genunix:fop_inactive+af () ff001e9378d0 genunix:vn_rele+5f () ff001e937ac0 zfs:zfs_unlinked_drain+af () ff001e937af0 zfs:zfsvfs_setup+fb () ff001e937b50 zfs:zfs_domount+16a () ff001e937c70 zfs:zfs_mount+1e4 () ff001e937ca0 genunix:fsop_mount+21 () ff001e937e00 genunix:domount+ae3 () ff001e937e80 genunix:mount+121 () ff001e937ec0 genunix:syscall_ap+8c () ff001e937f10 unix:brand_sys_sysenter+1eb () - GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED ___ cifs-discuss mailing list cifs-disc...@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/cifs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic after zfs mount
Thanks for the link Arne. On 06/13/2010 03:57 PM, Arne Jansen wrote: Thomas Nau wrote: Dear all We ran into a nasty problem the other day. One of our mirrored zpool hosts several ZFS filesystems. After a reboot (all FS mounted at that time an in use) the machine paniced (console output further down). After detaching one of the mirrors the pool fortunately imported automatically in a faulted state without mounting the filesystems. Offling the unplugged device and clearing the fault allowed us to disable auto-mounting the filesystems. Going through them one by one all but one mounted OK. The one again triggered a panic. We left mounting on that one disabled for now to be back in production after pulling data from the backup tapes. Scrubbing didn't show any error so any idea what's behind the problem? Any chance to fix the FS? We had the same problem. Victor pointed my to http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788 with a workaround to mount the filesystem read-only to save the data. I still hope to figure out the chain of events that causes this. Did you use any extended attributes on this filesystem? -- Arne To my knowledge we haven't used any extended attributes but I'll double check after mounting the filesystem read-only. As it's one that's exported using Samba it might be indeed the case. For sure a lot of ACLs are used Thomas - GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic after zfs mount
Arne, On 06/13/2010 03:57 PM, Arne Jansen wrote: Thomas Nau wrote: Dear all We ran into a nasty problem the other day. One of our mirrored zpool hosts several ZFS filesystems. After a reboot (all FS mounted at that time an in use) the machine paniced (console output further down). After detaching one of the mirrors the pool fortunately imported automatically in a faulted state without mounting the filesystems. Offling the unplugged device and clearing the fault allowed us to disable auto-mounting the filesystems. Going through them one by one all but one mounted OK. The one again triggered a panic. We left mounting on that one disabled for now to be back in production after pulling data from the backup tapes. Scrubbing didn't show any error so any idea what's behind the problem? Any chance to fix the FS? We had the same problem. Victor pointed my to http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788 with a workaround to mount the filesystem read-only to save the data. I still hope to figure out the chain of events that causes this. Did you use any extended attributes on this filesystem? -- Arne Mounting the FS read-only worked, thanks again. I checked the attributes and the set for all files is: {archive,nohidden,noreadonly,nosystem,noappendonly,nonodump,noimmutable,av_modified,noav_quarantined,nonounlink} so just the default ones Thomas - GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Reconfiguring a RAID-Z dataset
Yeah, this is what I was thinking too... Is there anyway to retain snapshot data this way? I've read about the ZFS replay/mirror features, but my impression was that this was more so for a development mirror for testing rather than a reliable backup? This is the only way I know of that one could do something like this. Is there some other way to create a solid clone, particularly with a machine that won't have the same drive configuration? I recently used zfs send/recv to copy a bunch of datasets from a raidz2 box to a box made on mirrors. It works fine. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Reconfiguring a RAID-Z dataset
On Sun, Jun 13, 2010 at 12:18 AM, Joe Auty j...@netmusician.org wrote: Thomas Burgess wrote: Yeah, this is what I was thinking too... Is there any way to retain snapshot data this way? I've read about the ZFS replay/mirror features, but my impression was that this was more so for a development mirror for testing rather than a reliable backup? This is the only way I know of that one could do something like this. Is there some other way to create a solid clone, particularly with a machine that won't have the same drive configuration? I recently used zfs send/recv to copy a bunch of datasets from a raidz2 box to a box made on mirrors. It works fine. ZFS send/recv looks very cool and very convenient. I wonder what it was that I read that suggested not relying on it for backups? Maybe this was alluding to the notion that like relying on RAID for a backup, if there is corruption your mirror (i.e. machine you are using with zfs recv) will be corrupted too? At any rate, thanks for answering this question! At some point if I go this route I'll test send and recv functionality to give all of this a dry run. well, it's not considered to be an enterprise-ready backup solution. I think this is due to the fact that you can't recover a single file from a zfs send stream, but despite this limitation it's still VERY handy. Another reason, from what i understand by reading this list, is that the zfs send streams aren't resilient. If you do not pipe it directly into a zfs receive, it might get corrupted and be worthless (basically don't save the output of zfs send and expect to receive it later). again, this is not relevant if you are doing a zfs send into a zfs receive at the other end. I think the 2 reasons i just gave are the reasons people have warned against it...but still, it's damn amazing. -- Joe Auty, NetMusician NetMusician helps musicians, bands and artists create beautiful, professional, custom designed, career-essential websites that are easy to maintain and to integrate with popular social networks. www.netmusician.org j...@netmusician.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
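A sketch of the recommended pattern, piping send straight into receive instead of saving the stream; snapshot names, dataset names and the backup host are placeholders:

# initial full replication of a dataset tree, properties included
zfs snapshot -r tank/data@backup1
zfs send -R tank/data@backup1 | ssh backuphost zfs receive -Fd backup
# later runs only send the changes since the last common snapshot
zfs snapshot -r tank/data@backup2
zfs send -R -i tank/data@backup1 tank/data@backup2 | ssh backuphost zfs receive -Fd backup

Because the receive side ends up with real, browsable filesystems and snapshots, single files can still be recovered there, even though they can't be pulled out of a raw stream.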
Re: [zfs-discuss] zfs corruptions in pool
On 06.06.2010 08:06, devsk wrote: I had an unclean shutdown because of a hang and suddenly my pool is degraded (I realized something is wrong when python dumped core a couple of times). This is before I ran scrub: pool: mypool state: DEGRADED status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub repaired 0 in 0h7m with 0 errors on Mon May 31 09:00:27 2010 config: NAMESTATE READ WRITE CKSUM mypool DEGRADED 0 0 0 c6t0d0s0 DEGRADED 0 0 0 too many errors errors: Permanent errors have been detected in the following files: mypool/ROOT/May25-2010-Image-Update:0x3041e mypool/ROOT/May25-2010-Image-Update:0x31524 mypool/ROOT/May25-2010-Image-Update:0x26d24 mypool/ROOT/May25-2010-Image-Update:0x37234 //var/pkg/download/d6/d6be0ef348e3c81f18eca38085721f6d6503af7a mypool/ROOT/May25-2010-Image-Update:0x25db3 //var/pkg/download/cb/cbb0ff02bcdc6649da3763900363de7cff78ec72 mypool/ROOT/May25-2010-Image-Update:0x26cf6 I ran scrub and this is what it has to say afterwards. pool: mypool state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scan: scrub repaired 0 in 0h11m with 0 errors on Sat Jun 5 22:43:54 2010 config: NAMESTATE READ WRITE CKSUM mypool DEGRADED 0 0 0 c6t0d0s0 DEGRADED 0 0 0 too many errors errors: No known data errors Few of questions: 1. Have the errors really gone away? Can I just clear and be content that errors are really gone? 2. Why did the errors occur anyway if ZFS guarantees on-disk consistency? I wasn't writing anything. Those files were definitely not being touched when the hang and unclean shutdown happened. I mean I don't mind if I create or modify a file and it doesn't land on disk because on unclean shutdown happened but a bunch of unrelated files getting corrupted, is sort of painful to digest. 3. The action says Determine if the device needs to be replaced. How the heck do I do that? Is it possible that this system runs on a virtual box? At least I've seen such a thing happen on a Virtual Box but never on a real machine. The reason why the error have gone away might be that meta data has three copies IIRC. So if your disk only had corruptions in the meta data area these errors can be repaired by scrubbing the pool. The smartmontools might help you figuring out if the disk is broken. But if you only had an unexpected shutdown and now everything is clean after a scrub, I wouldn't expect the disk to be broken. You can get the smartmontools from opencsw.org. If your system is really running on a Virtual Box I'd recommend that you turn of disk write caching of Virtual Box. Search the OpenSolaris forum of Virtual Box. There is an article somewhere how to do this. IIRC the subject is somethink like 'zfs pool curruption'. But it is also somewhere in the docs. HTH, Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Snapshots, txgs and performance
Very interesting. This could be useful for a number of us. Would you be willing to share your work? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS
On Wed, May 26, 2010 at 5:47 PM, Brandon High bh...@freaks.com wrote: On Sat, May 15, 2010 at 4:01 AM, Marc Bevand m.bev...@gmail.com wrote: I have done quite some research over the past few years on the best (ie. simple, robust, inexpensive, and performant) SATA/SAS controllers for ZFS. I've spent some time looking at the capabilities of a few controllers based on the questions about the SiI3124 and PMP support. According to the docs, the Marvell 88SX6081 driver doesn't support NCQ or PMP, though the card does. While I'm not really performance bound on my system, I imagine NCQ would help performance a bit, at least for scrubs or resilvers. Even more so because I'm using the slow WD10EADS drives. This raises the question of whether a SAS controller supports NCQ for sata drives. Would an LSI 1068e based controller? What about a LSI 2008 based card? If that is the chip on the AOC-SAT2-MV8 then i'm pretty sure it does support NCQ. I'm also pretty sure the LSI supports NCQ. I'm not 100% sure though ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS
I thought it did...I couldn't imagine sun using that chip in the original thumper if it didn't support NCQ...also, i've read where people have had to DISABLE ncq on this driver to fix one bug or another (as a workaround) On Wed, May 26, 2010 at 8:40 PM, Marty Faltesek marty.falte...@oracle.com wrote: On Wed, 2010-05-26 at 17:18 -0700, Brandon High wrote: If that is the chip on the AOC-SAT2-MV8 then i'm pretty sure it does support NCQ Not according to the driver documentation: http://docs.sun.com/app/docs/doc/819-2254/marvell88sx-7d In addition, the 88SX6081 device supports the SATA II Phase 1.0 specification features, including SATA II 3.0 Gbps speed, SATA II Port Multiplier functionality and SATA II Port Selector. Currently the driver does not support native command queuing, port multiplier or port selector functionality. The driver source isn't available (or I couldn't find it) so it's not easy to confirm. marvell88sx does support NCQ. This man page error was corrected in nevada build 138. Marty ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] question about zpool iostat output
I was just wondering: I added a SLOG/ZIL to my new system today...i noticed that the L2ARC shows up under its own heading...but the SLOG/ZIL doesn't...is this correct? see:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       15.3G  44.2G      0      0      0      0
  c6t4d0s0  15.3G  44.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
tank        10.9T  7.22T      0  2.43K      0   300M
  raidz2    10.9T  7.22T      0  2.43K      0   300M
    c4t6d0      -      -      0    349      0  37.6M
    c4t5d0      -      -      0    350      0  37.6M
    c5t7d0      -      -      0    350      0  37.6M
    c5t3d0      -      -      0    350      0  37.6M
    c8t0d0      -      -      0    354      0  37.6M
    c4t7d0      -      -      0    351      0  37.6M
    c4t3d0      -      -      0    350      0  37.6M
    c5t8d0      -      -      0    349      0  37.6M
    c5t0d0      -      -      0    348      0  37.6M
    c8t1d0      -      -      0    353      0  37.6M
  c6t5d0s0       0  8.94G      0      0      0      0
cache           -      -      -      -      -      -
  c6t5d0s1  37.5G      0      0    158      0  19.6M
----------  -----  -----  -----  -----  -----  -----

It seems sort of strange to me that it doesn't look like this instead:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       15.3G  44.2G      0      0      0      0
  c6t4d0s0  15.3G  44.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
tank        10.9T  7.22T      0  2.43K      0   300M
  raidz2    10.9T  7.22T      0  2.43K      0   300M
    c4t6d0      -      -      0    349      0  37.6M
    c4t5d0      -      -      0    350      0  37.6M
    c5t7d0      -      -      0    350      0  37.6M
    c5t3d0      -      -      0    350      0  37.6M
    c8t0d0      -      -      0    354      0  37.6M
    c4t7d0      -      -      0    351      0  37.6M
    c4t3d0      -      -      0    350      0  37.6M
    c5t8d0      -      -      0    349      0  37.6M
    c5t0d0      -      -      0    348      0  37.6M
    c8t1d0      -      -      0    353      0  37.6M
log             -      -      -      -      -      -
  c6t5d0s0       0  8.94G      0      0      0      0
cache           -      -      -      -      -      -
  c6t5d0s1  37.5G      0      0    158      0  19.6M
----------  -----  -----  -----  -----  -----  -----

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] USB Flashdrive as SLOG?
The last couple times i've read this question, people normally responded with: It depends...you might not even NEED a slog; there is a script floating around which can help determine that... If you could benefit from one, it's going to be IOPS which help you...so if the usb drive has more iops than your pool configuration does, then it might give some benefit. but then again, usb might not be as safe either, and on an older pool version you may want to mirror it. On Tue, May 25, 2010 at 8:11 AM, Kyle McDonald kmcdon...@egenera.com wrote: Hi, I know the general discussion is about flash SSD's connected through SATA/SAS or possibly PCI-E these days. So excuse me if I'm asking something that makes no sense... I have a server that can hold 6 U320 SCSI disks. Right now I put in 5 300GB for a data pool, and 1 18GB for the root pool. I've been thinking lately that I'm not sure I like the root pool being unprotected, but I can't afford to give up another drive bay. So recently the idea occurred to me to go the other way. If I were to get 2 USB Flash Thumb drives say 16 or 32 GB each, not only would i be able to mirror the root pool, but I'd also be able to put a 6th 300GB drive into the data pool. That led me to wonder whether partitioning out 8 or 12 GB on a 32GB thumb drive would be beneficial as a slog?? I bet the USB bus won't be as good as SATA or SAS, but will it be better than the internal ZIL on the U320 drives? This seems like at least a win-win, and possibly a win-win-win. Is there some other reason I'm insane to consider this? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
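If the thumb drives do go in, mirroring the log guards against exactly the failure mode being worried about here; a minimal sketch, with placeholder device names:

# add the two USB sticks as a mirrored log device
zpool add tank log mirror c7t0d0 c8t0d0

On pool versions 19 and later a log device can also be removed again with zpool remove if the experiment doesn't pay off.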
Re: [zfs-discuss] can you recover a pool if you lose the zil (b134+)
Is there a best practice on keeping a backup of the zpool.cache file? Is it possible? Does it change with changes to vdevs? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] question about zpool iostat output
i am running the last release from the genunix page uname -a output: SunOS wonslung-raidz2 5.11 snv_134 i86pc i386 i86pc Solaris On Tue, May 25, 2010 at 10:33 AM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Hi Thomas, This looks like a display bug. I'm seeing it too. Let me know which Solaris release you are running and I will file a bug. Thanks, Cindy On 05/25/10 01:42, Thomas Burgess wrote: I was just wondering: I added a SLOG/ZIL to my new system today...i noticed that the L2ARC shows up under it's own headingbut the SLOG/ZIL doesn'tis this correct? see: capacity operationsbandwidth poolalloc free read write read write -- - - - - - - rpool 15.3G 44.2G 0 0 0 0 c6t4d0s0 15.3G 44.2G 0 0 0 0 -- - - - - - - tank10.9T 7.22T 0 2.43K 0 300M raidz210.9T 7.22T 0 2.43K 0 300M c4t6d0 - - 0349 0 37.6M c4t5d0 - - 0350 0 37.6M c5t7d0 - - 0350 0 37.6M c5t3d0 - - 0350 0 37.6M c8t0d0 - - 0354 0 37.6M c4t7d0 - - 0351 0 37.6M c4t3d0 - - 0350 0 37.6M c5t8d0 - - 0349 0 37.6M c5t0d0 - - 0348 0 37.6M c8t1d0 - - 0353 0 37.6M c6t5d0s0 0 8.94G 0 0 0 0 cache - - - - - - c6t5d0s1 37.5G 0 0158 0 19.6M It seems sort of strange to me that it doesn't look like this instead: capacity operationsbandwidth poolalloc free read write read write -- - - - - - - rpool 15.3G 44.2G 0 0 0 0 c6t4d0s0 15.3G 44.2G 0 0 0 0 -- - - - - - - tank10.9T 7.22T 0 2.43K 0 300M raidz210.9T 7.22T 0 2.43K 0 300M c4t6d0 - - 0349 0 37.6M c4t5d0 - - 0350 0 37.6M c5t7d0 - - 0350 0 37.6M c5t3d0 - - 0350 0 37.6M c8t0d0 - - 0354 0 37.6M c4t7d0 - - 0351 0 37.6M c4t3d0 - - 0350 0 37.6M c5t8d0 - - 0349 0 37.6M c5t0d0 - - 0348 0 37.6M c8t1d0 - - 0353 0 37.6M log - - - - - - c6t5d0s0 0 8.94G 0 0 0 0 cache - - - - - - c6t5d0s1 37.5G 0 0158 0 19.6M ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] questions about zil
On Tue, May 25, 2010 at 11:27 AM, Edward Ned Harvey solar...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Nicolas Williams I recently got a new SSD (ocz vertex LE 50gb) It seems to work really well as a ZIL performance wise. I know it doesn't have a supercap so let's say dataloss occurs...is it just dataloss or is it pool loss? Just dataloss. WRONG! The correct answer depends on your version of solaris/opensolaris. More specifically, it depends on the zpool version. The latest fully updated sol10 and the latest opensolaris release (2009.06) only go up to zpool 14 or 15. But zpool 19 is when a ZIL loss doesn't permanently offline the whole pool. I know this is available in the developer builds. The best answer to this, I think, is in the ZFS Best Practices Guide: (uggh, it's down right now, so I can't paste the link) If you have zpool < 19, and you lose an unmirrored ZIL, then you lose your pool. Also, as a configurable option apparently, I know on my systems, it also meant I needed to power cycle. If you have zpool >= 19, and you lose an unmirrored ZIL, then performance will be degraded, but everything continues to work as normal. Apparently the most common mode of failure for SSD's is also failure to read. To make it worse, a ZIL is only read after system crash, which means the possibility of having a failed SSD undetected must be taken into consideration. If you do discover a failed ZIL after crash, with zpool < 19 your pool is lost. But with zpool >= 19 only the unplayed writes are lost. With zpool >= 19, your pool will be intact, but you would lose up to 30sec of writes that occurred just before the crash. I didn't ask about losing my zil. I asked about power loss taking out my pool. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
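To check which of those cases applies to a given system, the pool and software versions can be queried directly; a sketch, with the pool name as a placeholder:

# show this pool's on-disk version
zpool get version tank
# report any pools formatted with an older version than the software supports
zpool upgrade
# list every pool version the installed bits support, with a one-line description of each
zpool upgrade -v

With no pool argument, zpool upgrade only reports; it changes nothing until a pool name (or -a) is given.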
Re: [zfs-discuss] questions about zil
At least to me, this was not clearly not asking about losing zil and was not clearly asking about power loss. Sorry for answering the question you thought you didn't ask. I was only responding to your response of WRONG!!! The guy wasn't wrong in regards to my questions. I'm sorry for not making THAT more clear in my post. I would suggest clarifying your question, by saying instead: so lets' say *power*loss occurs Then it would have been clear what you were asking. I'm pretty sure i did ask about power lossor at least it was implied by my point about the UPS. You're right, i probably should have been a little more clear. Since this is a SSD you're talking about, unless you have enabled nonvolatile write cache on that disk (which you should never do), and the disk incorrectly handles cache flush commands (which it should never do), then the supercap is irrelevant. All ZIL writes are to be done synchronously. This SSD doesn't use nonvolatile write cache (at least i don't think it does, it's a SF-1500 based ssd) I might be wrong about this, but i thought one of the biggest things about the sandforce was that it doesn't use DRAM If you have a power loss, you don't lose your pool, and you also don't lose any writes in the ZIL. You do, however, lose any async writes that were not yet flushed to disk. There is no way to prevent that, regardless of ZIL configuration. Yes, I know that i lose async writesi just wasn't sure if that resulted in an issue...I might be somewhat confused to how the ZIL works but i thought the point of the ZIL was to pretend a write actually happened when it may not have actually been flushed to disk yet...in this case, a write to the zil might not make it to diski just didn't know if this could result in a loss of a pool due to some sort of corruption of the uberblock or something.I'm not entirely up to speed on the voodoo that is ZFS. I wasn't trying to be rude, sorry if it came off like that. I am aware of the issue regarding removing the ZIL on non-dev versions of opensolarisi am on b134 so that doesnt' apply to me. Thanks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] questions about zil
On Tue, May 25, 2010 at 12:38 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Mon, 24 May 2010, Thomas Burgess wrote: It's a sandforce sf-1500 model but without a supercap...here's some info on it: Maximum Performance * Max Read: up to 270MB/s * Max Write: up to 250MB/s * Sustained Write: up to 235MB/s * Random Write 4k: 15,000 IOPS * Max 4k IOPS: 50,000 Isn't there a serious problem with these specifications? It seems that the minimum assured performance values (and the median) are much more interesting than some maximum performance value which might only be reached during a brief instant of the device lifetime under extremely ideal circumstances. It seems that toilet paper may be of much more practical use than these specifications. In fact, I reject them as being specifications at all. The Apollo reentry vehicle was able to reach amazing speeds, but only for a single use. Bob What exactly do you mean? Every review i've read about this device has been great. Every review i've read about the sandforce controllers has been good too...are you saying they have shorter lifetimes? Everything i've read has made them sound like they should last longer than typical ssds because they write less actual data -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] questions about zil
Also, let me note, it came with a 3 year warranty so I expect it to last at least 3 years...but if it doesn't, i'll just return it under the warranty. On Tue, May 25, 2010 at 1:26 PM, Thomas Burgess wonsl...@gmail.com wrote: On Tue, May 25, 2010 at 12:38 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Mon, 24 May 2010, Thomas Burgess wrote: It's a sandforce sf-1500 model but without a supercapheres some info on it: Maximum Performance * Max Read: up to 270MB/s * Max Write: up to 250MB/s * Sustained Write: up to 235MB/s * Random Write 4k: 15,000 IOPS * Max 4k IOPS: 50,000 Isn't there a serious problem with these specifications? It seems that the minimum assured performance values (and the median) are much more interesting than some maximum performance value which might only be reached during a brief instant of the device lifetime under extremely ideal circumstances. It seems that toilet paper may of much more practical use than these specifications. In fact, I reject them as being specifications at all. The Apollo reentry vehicle was able to reach amazing speeds, but only for a single use. Bob What exactly do you mean? Every review i've read about this device has been great. Every review i've read about the sandforce controllers has been good toare you saying they have shorter lifetimes? Everything i've read has made them sound like they should last longer than typical ssds because they write less actual data -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] questions about zil
I recently got a new SSD (ocz vertex LE 50gb). It seems to work really well as a ZIL performance wise. My question is, how safe is it? I know it doesn't have a supercap so let's say dataloss occurs...is it just dataloss or is it pool loss? also, does the fact that i have a UPS matter? the numbers i'm seeing are really nice...these are some nfs tar times before zil:

real 2m21.498s
user 0m5.756s
sys 0m8.690s

real 2m23.870s
user 0m5.756s
sys 0m8.739s

and these are the same ones after.

real 0m32.739s
user 0m5.708s
sys 0m8.515s

real 0m35.580s
user 0m5.707s
sys 0m8.526s

I also sliced it...i have 16 gb ram so i used a 9 gb slice for zil and the rest for L2ARC. this is for a single 10 drive raidz2 vdev so far...i'm really impressed with the performance gains ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] questions about zil
ZFS is always consistent on-disk, by design. Loss of the ZIL will result in loss of the data in the ZIL which hasn't been flushed out to the hard drives, but otherwise, the data on the hard drives is consistent and uncorrupted. This is what i thought. I have read this list on and off for awhile now but i'm not a guru...I see a lot of stuff about the intel ssd and disabling the write cache...so i just wasn't sure. This is good news. It avoids the scenario of losing data in your ZIL due to power loss (and, of course, the rest of your system). So, yes, if you actually care about your system, I'd recommend at least a minimal UPS to allow for quick shutdown after a power loss. yes, i have a nice little UPS. I've tested it a few times and it seems to work well. It gives me about 20 minutes of power and can even send commands via a script to shut down the system before the battery goes dry. That's going to pretty much be the best-case use for the ZIL - NFS writes being synchronous. Of course, using the rest of the SSD for L2ARC is likely to be almost (if not more) helpful for performance for a wider variety of actions. yes, i have another machine without a zil (i bought a kingston 64 gb ssd on sale and intended to try it as a zil but ultimately decided to just use it as l2arc because of the performance numbers...) but the l2arc helps a ton for my uses. I did slice this ssd...i used 9 gb for zil and the rest for l2arc (about 36 gb). I'm really impressed with this ssd...for only 160 dollars (180 - 20 mail in rebate) it's a killer deal. it can do 235 MB/s sustained writes and has something like 15,000 iops -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] questions about zil
Not familiar with that model. It's a sandforce sf-1500 model but without a supercap...here's some info on it: Maximum Performance - Max Read: up to 270MB/s - Max Write: up to 250MB/s - Sustained Write: up to 235MB/s - Random Write 4k: 15,000 IOPS - Max 4k IOPS: 50,000 per http://www.ocztechnology.com/products/solid-state-drives/2-5--sata-ii/performance-enterprise-solid-state-drives/ocz-vertex-limited-edition-sata-ii-2-5--ssd.html Wow. That's a pretty huge improvement. :-) - Garrett (newly of Nexenta) yes, i love it. I'm really impressed with this ssd for the money...160 usd (180 - 20 rebate) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New SSD options
From earlier in the thread, it sounds like none of the SF-1500 based drives even have a supercap, so it doesn't seem that they'd necessarily be a better choice than the SLC-based X-25E at this point unless you need more write IOPS... Ray I think the upcoming OCZ Vertex 2 Pro will have a supercap. I just bought an OCZ Vertex LE, it doesn't have a supercap but it DOES have some awesome specs otherwise. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] confused
did this come out? http://cr.opensolaris.org/~gman/opensolaris-whats-new-2010-05/ i was googling trying to find info about the next release and ran across this. Does this mean it's actually about to come out before the end of the month or is this something else? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] confused
never mind...just found more info on this...should have held back from asking On Mon, May 24, 2010 at 1:26 AM, Thomas Burgess wonsl...@gmail.com wrote: did this come out? http://cr.opensolaris.org/~gman/opensolaris-whats-new-2010-05/ i was googling trying to find info about the next release and ran across this. Does this mean it's actually about to come out before the end of the month or is this something else? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
yah, unfortunately this is the first send. i'm trying to send 9 TB of data. It really sucks because i was at 6 TB when it lost power. On Sat, May 22, 2010 at 2:34 AM, Brandon High bh...@freaks.com wrote: You can resume a send if the destination has a snapshot in common with the source. If you don't, there's nothing you can do. It's probably taking a while to restart because the sends that were interrupted need to be rolled back. Sent from my Nexus One. On May 21, 2010 9:44 PM, Thomas Burgess wonsl...@gmail.com wrote: I can't tell you for sure. For some reason the server lost power and it's taking forever to come back up. (i'm really not sure what happened) anyways, this leads me to my next couple questions: Is there any way to resume a zfs send/recv? Why is it taking so long for the server to come up? it's stuck on Reading ZFS config and there is a FLURRY of hard drive lights blinking (all 10 in sync) On Sat, May 22, 2010 at 12:26 AM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 201... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
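(For what it's worth, the resume Brandon describes isn't a true resume of the interrupted stream; it means restarting an incremental from the newest snapshot both machines already share. A rough sketch, with tank/media, @base and the ssh target as placeholder names:)

# find the newest snapshot that exists on both sides
zfs list -t snapshot -o name -s creation | grep tank/media
# snapshot the current state and send only the delta since the shared one
pfexec zfs snapshot tank/media@resume
pfexec zfs send -i tank/media@base tank/media@resume | ssh user@newserver pfexec /usr/sbin/zfs recv -F tank/media

With no snapshot on the destination at all, as in the 9 TB case above, the whole stream has to be sent again from the beginning.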
Re: [zfs-discuss] HDD Serial numbers for ZFS
install smartmontools. There is no package for it but it's EASY to install. once you do, you can get output like this:

pfexec /usr/local/sbin/smartctl -d sat,12 -a /dev/rdsk/c5t0d0
smartctl 5.39.1 2010-01-28 r3054 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12 family
Device Model:     ST31000528AS
Serial Number:    6VP06FF5
Firmware Version: CC34
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat May 22 11:15:50 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 609) seconds.
Offline data collection capabilities:  (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:       (   1) minutes.
Extended self-test routine recommended polling time:    ( 192) minutes.
Conveyance self-test routine recommended polling time:  (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail Always       -       55212722
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age  Always       -       132
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always       -       1
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail Always       -       136183285
  9 Power_On_Hours          0x0032   091   091   000    Old_age  Always       -       7886
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age  Always       -       132
183 Runtime_Bad_Block       0x       100   100   000    Old_age  Offline      -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age  Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age  Always       -       0
189 High_Fly_Writes         0x003a   085   085   000    Old_age  Always       -       15
190 Airflow_Temperature_Cel 0x0022   063   054   045    Old_age  Always       -       37 (Lifetime Min/Max 32/40)
194 Temperature_Celsius     0x0022   037   046   000    Old_age  Always       -       37 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a   048   025   000    Old_age  Always       -       55212722
197 Current_Pending_Sector  0x0012   100   100   000    Old_age  Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age  Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age  Always       -       0
240 Head_Flying_Hours       0x       100   253   000    Old_age  Offline      -       23691039612915
241 Total_LBAs_Written      0x       100   253   000    Old_age  Offline      -       263672243
242 Total_LBAs_Read         0x       100   253   000    Old_age  Offline      -       960644151

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

On Sat, May 22, 2010 at 3:09 AM, Andreas Iannou andreas_wants_the_w...@hotmail.com wrote: I
Re: [zfs-discuss] HDD Serial numbers for ZFS
i don't think there is but it's dirt simple to install. I followed the instructions here: http://cafenate.wordpress.com/2009/02/22/setting-up-smartmontools-on-opensolaris/ On Sat, May 22, 2010 at 3:19 AM, Andreas Iannou andreas_wants_the_w...@hotmail.com wrote: Thanks Thomas, I thought there'd already be a package in the repo for it. Cheers, Andre -- Date: Sat, 22 May 2010 03:17:38 -0400 Subject: Re: [zfs-discuss] HDD Serial numbers for ZFS From: wonsl...@gmail.com To: andreas_wants_the_w...@hotmail.com CC: zfs-discuss@opensolaris.org [ quoted smartctl output removed - it duplicates the listing earlier in the thread ]
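(The linked write-up is essentially a build-from-source recipe. A rough sketch of the steps, assuming a working gcc toolchain is already installed and that 5.39.1 is the tarball on smartmontools.sourceforge.net; version and paths are illustrative only:)

# after downloading and unpacking the smartmontools-5.39.1 source tarball
cd smartmontools-5.39.1
./configure
make
pfexec make install    # installs smartctl under /usr/local/sbin by default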
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
i only care about the most recent snapshot, as this is a growing video collection. i do have snapshots, but i only keep them for when/if i accidentally delete something, or rename something wrong. On Sat, May 22, 2010 at 3:43 AM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 10:22 PM, Thomas Burgess wonsl...@gmail.com wrote: yah, it seems that rsync is faster for what i need anyways, at least right now... If you don't have snapshots you want to keep in the new copy, then probably... -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Understanding ZFS performance.
If you install Opensolaris with the AHCI settings off, then switch them on, it will fail to boot. I had to reinstall with the settings correct. the best way to tell if ahci is working is to use cfgadm. if you see your drives there, ahci is on. if not, then you may need to reinstall with it on (for the rpool at least). On Sat, May 22, 2010 at 4:43 PM, Brian broco...@vt.edu wrote: Is there a way within opensolaris to detect if AHCI is being used by various controllers? I suspect you may be accurate and AHCI is not turned on. The bios for this particular motherboard is fairly confusing on the AHCI settings. The only setting I have is actually in the raid section, and it seems to let me select between IDE/AHCI/RAID as an option. However, I can't tell if it applies only if one is using software RAID. If I set it to AHCI, another screen appears prior to boot that is titled AMD AHCI BIOS. However, opensolaris hangs during booting with this enabled. Is there a way from the grub menu to request opensolaris boot without the splashscreen, but instead boot with debug information printed to the console? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
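(To spell out the cfgadm check: with the ahci driver bound, the SATA ports show up as attachment points. The sample output line below is illustrative only; controller and device names will differ per system.)

cfgadm -al | grep sata
# sata0/0::dsk/c5t0d0    disk    connected    configured   ok
prtconf -D | grep -i ahci    # also confirms the ahci driver is attached to the controller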
Re: [zfs-discuss] Understanding ZFS performance.
just to make sure i understand what is going on here, you have an rpool which is having performance issues, and you discovered ahci was disabled? you enabled it, and now it won't boot. correct? This happened to me and the solution was to export my storage pool and reinstall my rpool with the ahci settings on. Then i imported my storage pool and all was golden. On Sat, May 22, 2010 at 5:25 PM, Brian broco...@vt.edu wrote: Thanks - I can give reinstalling a shot. Is there anything else I should do first? Should I export my tank pool? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
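(The same recovery sequence as commands; tank is the data pool named in this thread, and the middle step is the normal installer run after flipping the BIOS to AHCI.)

pfexec zpool export tank     # detach the data pool before the BIOS change / reinstall
# ...enable AHCI in the BIOS and reinstall the OS (rpool) from the live CD...
pfexec zpool import tank     # bring the data pool back into the fresh install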
Re: [zfs-discuss] Understanding ZFS performance.
This didn't work for me. I had the exact same issue a few days ago. My motherboard had the following: Native IDE, AHCI, RAID, Legacy IDE. so naturally i chose AHCI, but it ALSO had a mode called IDE/SATA combined mode. I thought i needed this to use both the ide and any sata ports, turns out it was basically an ide emulation mode for sata. long story short, i ended up with opensolaris installed in IDE mode. I had to reinstall. I tried the livecd/import method and it still failed to boot. On Sat, May 22, 2010 at 5:30 PM, Ian Collins i...@ianshome.com wrote: On 05/23/10 08:52 AM, Thomas Burgess wrote: If you install Opensolaris with the AHCI settings off, then switch them on, it will fail to boot. I had to reinstall with the settings correct. Well you probably didn't have to. Booting from the live CD and importing the pool would have put things right. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Understanding ZFS performance.
this old thread has info on how to switch from ide to sata mode: http://opensolaris.org/jive/thread.jspa?messageID=448758#448758 On Sat, May 22, 2010 at 5:32 PM, Ian Collins i...@ianshome.com wrote: On 05/23/10 08:43 AM, Brian wrote: Is there a way within opensolaris to detect if AHCI is being used by various controllers? I suspect you may be accurate and AHCI is not turned on. The bios for this particular motherboard is fairly confusing on the AHCI settings. The only setting I have is actually in the raid section, and it seems to let me select between IDE/AHCI/RAID as an option. However, I can't tell if it applies only if one is using software RAID. [answered in other post] If I set it to AHCI, another screen appears prior to boot that is titled AMD AHCI BIOS. However, opensolaris hangs during booting with this enabled. Is there a way from the grub menu to request opensolaris boot without the splashscreen, but instead boot with debug information printed to the console? Just hit a key once the bar is moving. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
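(On the verbose-boot question: from memory of the stock OpenSolaris menu.lst layout, so treat the exact line contents as an assumption. At the GRUB menu press e, pick the kernel$ line, press e again, drop the ,console=graphics part of the -B argument and append -v, then Enter and b to boot that one time with kernel messages on the console:)

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -v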
Re: [zfs-discuss] Understanding ZFS performance.
GREAT, glad it worked for you! On Sat, May 22, 2010 at 7:39 PM, Brian broco...@vt.edu wrote: Ok. What worked for me was booting with the live CD and doing:

pfexec zpool import -f rpool
reboot

After that I was able to boot with AHCI enabled. The performance issues I was seeing are now also gone. I am getting around 100 to 110 MB/s during a scrub. Scrubs are completing in 20 minutes for 1TB of data rather than 1.2 hours. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] snapshots send/recv
I'm confused. I have a filesystem on server 1 called tank/nas/dump. I made a snapshot called first: zfs snapshot tank/nas/d...@first then i did a zfs send/recv like: zfs send tank/nas/d...@first | ssh wonsl...@192.168.1.xx /bin/pfexec /usr/sbin/zfs recv tank/nas/dump this worked fine. next, today, i wanted to send what has changed. i did zfs snapshot tank/nas/d...@second now, here's where i'm confused. from reading the man page i thought this command would work: pfexec zfs send -i tank/nas/d...@first tank/nas/d...@second | ssh wonsl...@192.168.1.15 /bin/pfexec /usr/sbin/zfs recv -vd tank/nas/dump but i get an error: cannot receive incremental stream: destination tank/nas/dump has been modified since most recent snapshot why is this? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshots send/recv
On Sat, May 22, 2010 at 9:26 PM, Ian Collins i...@ianshome.com wrote: On 05/23/10 01:18 PM, Thomas Burgess wrote: this worked fine. next, today, i wanted to send what has changed. i did zfs snapshot tank/nas/d...@second now, here's where i'm confused. from reading the man page i thought this command would work: pfexec zfs send -i tank/nas/d...@first tank/nas/d...@second | ssh wonsl...@192.168.1.15 /bin/pfexec /usr/sbin/zfs recv -vd tank/nas/dump It should (you can shorten the first snap to first). but i get an error: cannot receive incremental stream: destination tank/nas/dump has been modified since most recent snapshot Well has it? Even wandering around the filesystem with atime enabled will cause this error. Add -F to the receive to force a roll-back to the state after the original snap. Ahh, this i didn't know. Yes, i DID cd to the dir and check some stuff and atime IS enabled. this is NOT very intuitive. adding -F worked...thanks -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
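(Two usual ways around the "modified since most recent snapshot" error, sketched with the same tank/nas/dump names used above; the property changes are suggestions rather than something from the thread, and user@backuphost stands in for the real target:)

# 1) force the receive to roll the destination back to the last snapshot first
pfexec zfs send -i tank/nas/dump@first tank/nas/dump@second | ssh user@backuphost /usr/sbin/zfs recv -vdF tank/nas/dump
# 2) keep casual reads from dirtying the received copy in the first place (run on the receiving side)
pfexec zfs set atime=off tank/nas/dump
pfexec zfs set readonly=on tank/nas/dump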
Re: [zfs-discuss] snapshots send/recv
ok, so forcing just basically makes it drop whatever changes were made. That's what i was wondering...this is what i expected. On Sun, May 23, 2010 at 12:05 AM, Ian Collins i...@ianshome.com wrote: On 05/23/10 03:56 PM, Thomas Burgess wrote: let me ask a question though. Let's say i have a filesystem tank/something. i make the snapshot tank/someth...@one. i send/recv it. then i do something (add a file...remove something, whatever) on the send side, then i do a send/recv and force it of the next filesystem What do you mean force it of the next filesystem? will the new recv'd filesystem be identical to the original forced snapshot or will it be a combination of the 2? The received filesystem will be identical to the sending one. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
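(A small worked example of that answer, with made-up dataset and host names, showing why the forced receive ends up identical to the sender rather than a blend:)

# initial snapshot and full send
pfexec zfs snapshot tank/something@one
pfexec zfs send tank/something@one | ssh user@host pfexec /usr/sbin/zfs recv tank/something
# change something on the sending side only, snapshot again
touch /tank/something/newfile
pfexec zfs snapshot tank/something@two
# -F first rolls the destination back to @one, then applies the @one -> @two delta,
# so the receiver ends up byte-for-byte as the sender's @two - any local edits on
# the receiving side are simply discarded
pfexec zfs send -i @one tank/something@two | ssh user@host pfexec /usr/sbin/zfs recv -F tank/something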
Re: [zfs-discuss] New SSD options
On the PCIe side, I noticed there's a new card coming from LSI that claims 150,000 4k random writes. Unfortunately this might end up being an OEM-only card. I also noticed on the ddrdrive site that they now have an opensolaris driver and are offering it in a beta program. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/recv over ssh
I seem to be getting decent speed with arcfour (this was what i was using to begin with). Thanks for all the help. this honestly was just me being stupid...looking back on yesterday, i can't even remember what i was doing wrong now. i was REALLY tired when i asked this question. On Fri, May 21, 2010 at 2:43 PM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 11:28 AM, David Dyer-Bennet d...@dd-b.net wrote: I thought I remembered a none cipher, but couldn't find it the other year and decided I must have been wrong. I did use ssh-1, so maybe I really WAS remembering after all. It may have been in ssh2 as well, or at least the commercial version... I thought it used to be a compile time option for openssh too. Seems a high price to pay to try to protect idiots from being idiots. Anybody who doesn't understand that encryption = none means it's not encrypted and hence not safe isn't safe as an admin anyway. Well, it won't expose your passwords since the key exchange is still encrypted ... That's good, right? Circling back to the original topic, you can use ssh to start up mbuffer on the remote side, then start the send. Something like:

#!/bin/bash
ssh -f r...@${recv_host} mbuffer -q -I ${SEND_HOST}:1234 | zfs recv puddle/tank
sleep 1
zfs send -R tank/foo/bar | mbuffer -O ${RECV_HOST}:1234

When I was moving datasets between servers, I was on the console of both, so manually starting the send/recv was not a problem. I've tried doing it with netcat rather than mbuffer but it was painfully slow, probably due to network buffers. ncat (from the nmap devs) may be a suitable alternative, and can support ssl and certificate based auth. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
8:10:12:15:20
        supported_max_cstates           0
        vendor_id                       AuthenticAMD

module: cpu_info                        instance: 7
name:   cpu_info7                       class:    misc
        brand                           AMD Opteron(tm) Processor 6128
        cache_id                        7
        chip_id                         0
        clock_MHz                       2000
        clog_id                         7
        core_id                         7
        cpu_type                        i386
        crtime                          9171.560266487
        current_clock_Hz                20
        current_cstate                  0
        family                          16
        fpu_type                        i387 compatible
        implementation                  x86 (chipid 0x0 AuthenticAMD 100F91 family 16 model 9 step 1 clock 2000 MHz)
        model                           9
        ncore_per_chip                  8
        ncpu_per_chip                   8
        pg_id                           11
        pkg_core_id                     7
        snaptime                        113230.737322698
        socket_type                     G34
        state                           on-line
        state_begin                     1274377645
        stepping                        1
        supported_frequencies_Hz        8:10:12:15:20
        supported_max_cstates           0
        vendor_id                       AuthenticAMD

On Mon, May 17, 2010 at 5:55 PM, Dennis Clarke dcla...@blastwave.org wrote: On 05-17-10, Thomas Burgess wonsl...@gmail.com wrote: psrinfo -pv shows: The physical processor has 8 virtual processors (0-7) x86 (AuthenticAMD 100F91 family 16 model 9 step 1 clock 200 MHz) AMD Opteron(tm) Processor 6128 [ Socket: G34 ] That's odd. Please try this :

# kstat -m cpu_info -c misc
module: cpu_info                        instance: 0
name:   cpu_info0                       class:    misc
        brand                           VIA Esther processor 1200MHz
        cache_id                        0
        chip_id                         0
        clock_MHz                       1200
        clog_id                         0
        core_id                         0
        cpu_type                        i386
        crtime                          3288.24125364
        current_clock_Hz                1199974847
        current_cstate                  0
        family                          6
        fpu_type                        i387 compatible
        implementation                  x86 (CentaurHauls 6A9 family 6 model 10 step 9 clock 1200 MHz)
        model                           10
        ncore_per_chip                  1
        ncpu_per_chip                   1
        pg_id                           -1
        pkg_core_id                     0
        snaptime                        1526742.97169617
        socket_type                     Unknown
        state                           on-line
        state_begin                     1272610247
        stepping                        9
        supported_frequencies_Hz        1199974847
        supported_max_cstates           0
        vendor_id                       CentaurHauls

You should get a LOT more data. Dennis ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
Something i've been meaning to ask. I'm transferring some data from my older server to my newer one. the older server has a socket 775 intel Q9550, 8 gb ddr2 800, and 20 1TB drives in raidz2 (3 vdevs, 2 with 7 drives, one with 6) connected to 3 AOC-SAT2-MV8 cards, spread as evenly across them as i could. The new server is socket g34 based with the opteron 6128 8 core cpu, 16 gb ddr3 1333 ECC ram, and 10 2TB drives (so far) in a single raidz2 vdev connected to 3 LSI SAS3081E-R cards (flashed with IT firmware). I'm sure this is due to something i don't understand, but during zfs send/recv from the old server to the new server (3 send/recv streams) I'm noticing the loadavg on the old server is much less than the new one. this is from top on the old server: load averages: 1.58, 1.57, 1.37; up 5+05:13:17 04:52:42 and this is the newer server: load averages: 6.20, 5.98, 5.30; up 1+05:03:02 18:49:57 shouldn't the newer server have LESS load? Please forgive my ubernoobness. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
is 3 zfs recv's random? On Fri, May 21, 2010 at 10:03 PM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 5:54 PM, Thomas Burgess wonsl...@gmail.com wrote: shouldn't the newer server have LESS load? Please forgive my ubernoobness. Depends on what it's doing! Load average is really how many processes are waiting to run, so it's not always a useful metric. If there are processes waiting on disk, you can have high load with almost no cpu use. Check the iowait with iostat or top. You've got a pretty wide stripe, which isn't going to give the best performance, especially for random write workloads. Your old 3 vdev config will have better random write performance. Check to see what's using the CPU with top or prstat. prstat gives better info for threads, imo. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
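(The checks Brandon mentions, as concrete commands; the intervals are arbitrary:)

iostat -xcn 30    # per-device %b and wait columns show whether the load is threads stuck on disk
prstat -mL 5      # per-thread microstates (USR/SYS/SLP/LAT) show what the load average is made of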
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
yeah, i'm aware of the performance aspects. I use these servers mostly as hd video servers for my house...they don't need to perform amazingly. I originally went with the setup on the old server because of everything i had read about performance with wide stripes...in all honesty it performed amazingly well, much more than i truly need...i plan to have 2 raidz2 stripes of 10 drives in this server (the new one). At most it will be serving 4-5 HD streams (mostly 720p mkv files, with some 1080p as well). The older server can EASILY max out 2 Gb/s links...i imagine the new server will be able to do this as well...i think a scrub of the old server takes 4-5 hours. i'm not sure what this equates to in MB/s but it's WAY more than i ever really need. This is what led me to use wider stripes in the new server, and i'm honestly considering redoing the old server as well. if i switched to 2 wider stripes instead of 3 i'd gain another TB or two. for my use i don't think that would be a horrible thing. On Fri, May 21, 2010 at 10:03 PM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 5:54 PM, Thomas Burgess wonsl...@gmail.com wrote: shouldn't the newer server have LESS load? Please forgive my ubernoobness. Depends on what it's doing! Load average is really how many processes are waiting to run, so it's not always a useful metric. If there are processes waiting on disk, you can have high load with almost no cpu use. Check the iowait with iostat or top. You've got a pretty wide stripe, which isn't going to give the best performance, especially for random write workloads. Your old 3 vdev config will have better random write performance. Check to see what's using the CPU with top or prstat. prstat gives better info for threads, imo. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
I can't tell you for sure. For some reason the server lost power and it's taking forever to come back up. (i'm really not sure what happened) anyways, this leads me to my next couple questions: Is there any way to resume a zfs send/recv? Why is it taking so long for the server to come up? it's stuck on Reading ZFS config and there is a FLURRY of hard drive lights blinking (all 10 in sync) On Sat, May 22, 2010 at 12:26 AM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 7:57 PM, Thomas Burgess wonsl...@gmail.com wrote: is 3 zfs recv's random? It might be. What do a few reports of 'iostat -xcn 30' look like? -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
yah, it seems that rsync is faster for what i need anyways, at least right now... On Sat, May 22, 2010 at 1:07 AM, Ian Collins i...@ianshome.com wrote: On 05/22/10 04:44 PM, Thomas Burgess wrote: I can't tell you for sure. For some reason the server lost power and it's taking forever to come back up. (i'm really not sure what happened) anyways, this leads me to my next couple questions: Is there any way to resume a zfs send/recv? Nope. Why is it taking so long for the server to come up? it's stuck on Reading ZFS config and there is a FLURRY of hard drive lights blinking (all 10 in sync) It's cleaning up the mess. If you had a lot of data copied over, it'll take a while deleting it! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
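(For this "mirror the current files only" use case, the rsync equivalent looks something like the following; the host name and paths are placeholders. Unlike an interrupted send/recv on these builds, a killed rsync can simply be re-run and it picks up roughly where it left off.)

rsync -av --partial --progress /tank/media/ newserver:/tank/media/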
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
0.24.21.5 13 17 c6t4d0 55.72.0 3821.3 91.1 0.3 0.24.73.0 6 10 c6t5d0 81.22.0 5866.7 91.2 0.2 0.41.95.2 5 14 c6t6d0 0.9 227.2 23.4 28545.1 4.7 0.6 20.42.8 63 64 c8t5d0 0.00.00.00.0 0.0 0.00.00.0 0 0 c4t7d0 cpu us sy wt id 39 32 0 29 extended device statistics r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.00.00.00.0 0.0 0.00.00.0 0 0 fd0 1.52.4 35.4 33.6 0.0 0.03.61.0 0 0 c8t1d0 105.81.9 5560.1 95.5 0.3 0.32.72.9 8 16 c5t0d0 109.62.5 5546.4 95.6 0.0 0.50.04.3 0 13 c4t0d0 110.82.6 5504.7 95.4 0.3 0.32.22.6 7 15 c4t1d0 104.62.4 5596.9 95.5 0.0 0.60.05.4 0 15 c5t1d0 109.92.2 5522.1 86.1 0.2 0.32.02.5 7 14 c4t2d0 104.61.9 5533.6 86.2 0.3 0.32.53.1 7 16 c5t2d0 109.22.7 5498.4 86.1 0.2 0.32.12.4 7 14 c4t3d0 105.32.9 5593.8 95.5 0.0 0.60.05.1 0 15 c5t3d0 57.81.9 3938.4 90.7 0.2 0.13.51.5 6 9 c4t5d0 50.82.3 3298.6 90.8 0.0 0.30.05.2 0 8 c5t4d0 105.02.6 5541.2 86.1 0.4 0.23.71.4 11 15 c5t5d0 90.82.3 6376.7 90.7 0.2 0.32.43.1 6 13 c5t6d0 87.41.8 6085.2 90.6 0.0 0.50.05.4 0 13 c5t7d0 104.22.4 5550.8 86.1 0.0 0.50.05.0 0 14 c6t0d0 106.82.4 5543.6 95.5 0.0 0.60.05.5 0 16 c6t1d0 105.42.5 5517.5 86.1 0.4 0.23.81.4 12 16 c6t2d0 106.62.4 5569.1 95.6 0.0 0.50.05.0 0 15 c6t3d0 107.22.2 5536.4 86.1 0.2 0.32.12.8 7 15 c6t4d0 61.22.4 4085.2 90.7 0.0 0.30.05.4 0 10 c6t5d0 70.31.8 5018.2 90.7 0.3 0.14.71.7 9 12 c6t6d0 0.8 203.3 12.3 25514.5 3.9 0.6 19.22.7 54 55 c8t5d0 0.00.00.00.0 0.0 0.00.00.0 0 0 c4t7d0 cpu us sy wt id 38 30 0 32 extended device statistics r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.00.00.00.0 0.0 0.00.00.0 0 0 fd0 2.22.5 64.2 35.2 0.0 0.03.30.9 0 0 c8t1d0 98.63.1 5441.3 110.3 0.0 0.60.05.9 0 16 c5t0d0 102.13.7 5392.7 110.2 0.0 0.50.04.3 0 13 c4t0d0 104.13.3 5390.7 110.4 0.0 0.50.05.0 0 15 c4t1d0 98.23.0 5437.3 110.2 0.0 0.50.05.1 0 14 c5t1d0 104.73.8 5437.3 104.5 0.0 0.50.04.8 0 15 c4t2d0 97.73.4 5481.1 104.6 0.0 0.60.06.0 0 16 c5t2d0 103.13.4 5468.4 104.6 0.0 0.60.05.2 0 15 c4t3d0 98.73.0 5415.2 110.3 0.0 0.50.05.1 0 14 c5t3d0 55.73.1 3883.4 93.7 0.1 0.12.02.5 4 8 c4t5d0 44.52.9 3141.2 93.6 0.0 0.30.05.5 0 7 c5t4d0 99.23.3 5464.0 104.5 0.4 0.24.21.5 12 15 c5t5d0 82.32.8 6119.3 93.4 0.0 0.50.06.4 0 14 c5t6d0 75.22.7 5601.1 93.4 0.1 0.41.74.8 3 13 c5t7d0 97.83.1 5458.8 104.5 0.0 0.50.05.2 0 14 c6t0d0 99.23.2 5441.5 110.2 0.0 0.60.05.8 0 16 c6t1d0 98.43.0 5475.7 104.6 0.3 0.43.03.5 8 17 c6t2d0 99.83.0 5434.4 110.1 0.0 0.50.05.1 0 14 c6t3d0 100.63.2 5453.9 104.6 0.0 0.60.05.5 0 15 c6t4d0 54.93.0 3878.1 93.5 0.1 0.21.54.2 3 9 c6t5d0 68.42.9 5128.3 93.5 0.2 0.33.14.2 6 13 c6t6d0 0.9 201.9 34.2 25338.0 3.8 0.5 18.92.6 51 52 c8t5d0 0.00.00.00.0 0.0 0.00.00.0 0 0 c4t7d0 On Sat, May 22, 2010 at 12:26 AM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 7:57 PM, Thomas Burgess wonsl...@gmail.com wrote: is 3 zfs recv's random? It might be. What do a few reports of 'iostat -xcn 30' look like? -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss