Re: [zfs-discuss] raidz DEGRADED state
So there is no current way to specify the creation of a 3-disk raid-z array with a known missing disk?

On 12/5/06, David Bustos david.bus...@sun.com wrote:
> Quoth Thomas Garner on Thu, Nov 30, 2006 at 06:41:15PM -0500:
> > I currently have a 400GB disk that is full of data on a linux system.
> > If I buy 2 more disks and put them into a raid-z'ed zfs under solaris,
> > is there a generally accepted way to build a degraded array with the
> > 2 disks, copy the data to the new filesystem, and then move the
> > original disk to complete the array?
>
> No, because we currently can't add disks to a raidz array. You could
> create a mirror instead and then add in the other disk to make a
> three-way mirror, though. Even doing that would be dicey if you only
> have a single machine, since Solaris can't natively read the popular
> Linux filesystems. I believe there is freeware to do it, but nothing
> supported.
>
> David

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
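For what it's worth, the workaround people usually describe for this situation is to stand up the raidz against a sparse file acting as the missing disk, offline the file, copy the data over, then replace the file with the real disk. A rough sketch, with hypothetical device names (c0t0d0, c0t1d0, c0t2d0) and a hypothetical pool name; note the pool has no redundancy at all until the final replace finishes resilvering, and this is not a supported procedure:

```shell
# 1. Create a sparse file the same size as the real disks (400GB here).
mkfile -n 400g /var/tmp/fakedisk

# 2. Build the raidz from the two real disks plus the placeholder file.
zpool create tank raidz c0t0d0 c0t1d0 /var/tmp/fakedisk

# 3. Offline the placeholder immediately so ZFS stops writing to it.
#    The pool is now DEGRADED, exactly as if a disk had failed.
zpool offline tank /var/tmp/fakedisk

# 4. Copy the data from the old 400GB disk into the pool, then hand
#    that disk to ZFS; the resilver rebuilds its share of the parity.
zpool replace tank /var/tmp/fakedisk c0t2d0
```

Until step 4 completes, any single-disk failure loses the pool, so this trades safety for convenience during the migration.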
Re: [zfs-discuss] Poor ZIL SLC SSD performance
> These are the same as the acard devices we've discussed here
> previously; earlier hyperdrive models were their own design. Very
> interesting, and my personal favourite, but I don't know of anyone
> actually reporting results yet with them as ZIL.

Here's one report:
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg27739.html
Re: [zfs-discuss] Aggregate Pool I/O
Are you looking for something like:

  kstat -c disk sd:::

Someone can correct me if I'm wrong, but I think the documentation for the above should be at:
http://src.opensolaris.org/source/xref/zfs-crypto/gate/usr/src/uts/common/avs/ns/sdbc/cache_kstats_readme.txt

I'm not sure about the file I/O vs disk I/O, but would love to hear how to measure it.

Thomas

On Sat, Jan 17, 2009 at 4:07 AM, Brad bst...@aspirinsoftware.com wrote:
> I'd like to track a server's ZFS pool I/O throughput over time. What's
> a good data source to use for this? I like zpool iostat for this, but
> if I poll at two points in time I would get a number since boot (e.g.
> 1.2M) and a current number (e.g. 1.3K). If I use the current number
> then I've lost data between polling intervals. But if I use the number
> since boot it's not precise enough to be useful. Is there a kstat
> equivalent to the I/O since boot? Some other good data source? And
> then is there a similar kstat equivalent to iostat? Would both data
> values then allow me to trend file I/O versus physical disk I/O?
> Thanks.
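Since the sd kstats are cumulative counters since boot, the usual pattern for trending is to sample them twice and difference the readings yourself. A rough sketch, assuming instance sd:0 (pick yours from `kstat -c disk -l`) and a 10-second interval; the `-p` output format is name<TAB>value, which is why awk's second field is the counter:

```shell
# Sample cumulative byte counters for one sd instance twice, 10 seconds
# apart, and print approximate per-second rates. sd:0 is a placeholder.
r1=$(kstat -p sd:0::nread    | awk '{print $2}')
w1=$(kstat -p sd:0::nwritten | awk '{print $2}')
sleep 10
r2=$(kstat -p sd:0::nread    | awk '{print $2}')
w2=$(kstat -p sd:0::nwritten | awk '{print $2}')
echo "read  B/s: $(( (r2 - r1) / 10 ))"
echo "write B/s: $(( (w2 - w1) / 10 ))"
```

Because the counters are monotonic, nothing is lost between polls; a monitoring tool can sample on any schedule and compute rates after the fact.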
Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang
Are these machines 32-bit by chance? I ran into similar seemingly unexplainable hangs, which Marc correctly diagnosed and which have since not reappeared:

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-August/049994.html

Thomas
Re: [zfs-discuss] ZFS on 32bit.
For what it's worth, I see this as well on 32-bit Xeons, 1GB RAM, and dual AOC-SAT2-MV8 (large amounts of I/O sometimes resulting in a lockup requiring a reboot, though my setup is Nexenta b85). Nothing in the logging, nor loadavg increasing significantly. It could be the regular Marvell driver issues, but it is definitely not cool when it happens.

Thomas

On Wed, Aug 6, 2008 at 1:31 PM, Bryan Allen [EMAIL PROTECTED] wrote:
> Good afternoon,
>
> I have a ~600GB zpool living on older Xeons. The system has 8GB of
> RAM. The pool is hanging off two LSI Logic SAS3041X-Rs (no RAID
> configured). When I put a moderate amount of load on the zpool (like,
> say, copying many files locally, or deleting a large number of ZFS
> fs), the system hangs and becomes completely unresponsive, requiring a
> reboot. The ARC never gets over ~40MB. The system is running Sol10u4.
>
> Are there any suggested tunables for running big zpools on 32bit?
>
> Cheers.
> --
> bda
> Cyberpunk is dead. Long live cyberpunk.
> http://mirrorshades.org
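For anyone else hitting this: the tunable most often suggested for 32-bit kernels is capping the ARC via zfs_arc_max in /etc/system, so the cache cannot exhaust the limited 32-bit kernel address space. A hedged sketch; the 256MB cap is an illustrative assumption, not a value recommended anywhere in this thread, and a reboot is required for it to take effect:

```shell
# Illustrative only: cap the ARC at 256MB (0x10000000 bytes) on a
# 32-bit Solaris box by appending a tunable to /etc/system.
# Run as root; takes effect after the next reboot.
echo 'set zfs:zfs_arc_max = 0x10000000' >> /etc/system
```

Whether this helps depends on whether the hang is really kernel-memory exhaustion rather than a driver bug, so treat it as an experiment, not a fix.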
[zfs-discuss] Unbalanced write patterns
If I have 2 raidz's, 5x400G and a later-added 5x1T, should I expect that streaming writes would go primarily to only 1 of the raidz sets? Or is this some side effect of my non-ideal hardware setup? I thought that adding additional capacity to a pool would automatically balance writes across both raidz's, but that does not seem to fit what I've seen empirically. What am I missing?

Note that the following is a snapshot of time in the middle of a large streaming write, not the initial output from zpool iostat.

Thomas

  zpool iostat -v tank 1

                 capacity     operations    bandwidth
  pool         used  avail   read  write   read  write
  ----------  -----  -----  -----  -----  -----  -----
  tank        1.72T  4.65T      9    379   156K  30.7M
    raidz1    1.66T   166G      9    329   156K  30.5M
      c0t0d0      -      -      4    152   282K  7.64M
      c0t4d0      -      -      4    155   282K  7.64M
      c1t0d0      -      -      2    153   188K  7.64M
      c1t4d0      -      -      3    161   220K  7.64M
      c1t1d0      -      -      1    158  94.1K  7.64M
    raidz1    65.2G  4.48T      0     50      0   158K
      c0t5d0      -      -      0     72      0  82.4K
      c1t2d0      -      -      0     69      0  80.4K
      c1t6d0      -      -      0     72      0  83.3K
      c0t2d0      -      -      0      0      0      0
      c0t6d0      -      -      0     73      0  87.3K
  ----------  -----  -----  -----  -----  -----  -----
Re: [zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
Thanks, Roch! Much appreciated knowing what the problem is and that a fix is in a forthcoming release.

Thomas

On 6/25/07, Roch - PAE [EMAIL PROTECTED] wrote:
> Sorry about that; looks like you've hit this:
>
>   6546683 marvell88sx driver misses wakeup for mv_empty_cv
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6546683
>
> Fixed in snv_64.
>
> -r
Re: [zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
We have seen this behavior, but in that case it was entirely related to hardware where the Intel IPMI firmware swallows NFS traffic on port 623 directly in the network hardware, so the packets never reach the host:

http://blogs.sun.com/shepler/entry/port_623_or_the_mount

Unfortunately, this NFS hang occurs across 3 separate machines, none of which should have this IPMI issue. It did spur me on to dig a little deeper, though, so thanks for the encouragement that all may not be well. Can anyone debug this? Remember that this is Nexenta Alpha 7, so it should be b61. nfsd is totally hung (RPC timeouts), and zfs would be having problems taking snapshots if I hadn't disabled the hourly snapshots.

Thanks!
Thomas

[EMAIL PROTECTED] ~]$ rpcinfo -t filer0 nfs
rpcinfo: RPC: Timed out
program 13 version 0 is not available

# echo '::pgrep nfsd | ::walk thread | ::findstack -v' | mdb -k

stack pointer for thread 821cda00: 822d6e28
  822d6e5c swtch+0x17d()
  822d6e8c cv_wait_sig_swap_core+0x13f(8b8a9232, 8b8a9200, 0)
  822d6ea4 cv_wait_sig_swap+0x13(8b8a9232, 8b8a9200)
  822d6ee0 cv_waituntil_sig+0x100(8b8a9232, 8b8a9200, 0)
  822d6f44 poll_common+0x3e1(8069480, a, 0, 0)
  822d6f84 pollsys+0x7c()
  822d6fac sys_sysenter+0x102()

stack pointer for thread 821d2e00: 8c279d98
  8c279dcc swtch+0x17d()
  8c279df4 cv_wait_sig+0x123(8988796e, 89887970)
  8c279e2c svc_wait+0xaa(1)
  8c279f84 nfssys+0x423()
  8c279fac sys_sysenter+0x102()

stack pointer for thread a9f88800: 8c92e218
  8c92e244 swtch+0x17d()
  8c92e254 cv_wait+0x4e(8a4169ea, 8a4169e0)
  8c92e278 mv_wait_for_dma+0x32()
  8c92e2a4 mv_start+0x278(88252c78, 89833498)
  8c92e2d4 sata_hba_start+0x79(8987d23c, 8c92e304)
  8c92e308 sata_txlt_synchronize_cache+0xb7(8987d23c)
  8c92e334 sata_scsi_start+0x1b7(8987d1e4, 8987d1e0)
  8c92e368 scsi_transport+0x52(8987d1e0)
  8c92e3a4 sd_start_cmds+0x28a(8a2710c0, 0)
  8c92e3c0 sd_core_iostart+0x158(18, 8a2710c0, 8da3be70)
  8c92e3f8 sd_uscsi_strategy+0xe8(8da3be70)
  8c92e414 sd_send_scsi_SYNCHRONIZE_CACHE+0xd4(8a2710c0, 8c50074c)
  8c92e4b0 sdioctl+0x48e(1ac0080, 422, 8c50074c, 8010, 883cee68, 0)
  8c92e4dc cdev_ioctl+0x2e(1ac0080, 422, 8c50074c, 8010, 883cee68, 0)
  8c92e504 ldi_ioctl+0xa4(8a671700, 422, 8c50074c, 8010, 883cee68, 0)
  8c92e544 vdev_disk_io_start+0x187(8c500580)
  8c92e554 vdev_io_start+0x18(8c500580)
  8c92e580 zio_vdev_io_start+0x142(8c500580)
  8c92e59c zio_next_stage+0xaa(8c500580)
  8c92e5b0 zio_ready+0x136(8c500580)
  8c92e5cc zio_next_stage+0xaa(8c500580)
  8c92e5ec zio_wait_for_children+0x46(8c500580, 1, 8c50076c)
  8c92e600 zio_wait_children_ready+0x18(8c500580)
  8c92e614 zio_next_stage_async+0xac(8c500580)
  8c92e624 zio_nowait+0xe(8c500580)
  8c92e660 zio_ioctl+0x94(9c6f8300, 89557c80, 89556400, 422, 0, 0)
  8c92e694 zil_flush_vdev+0x54(89557c80, 0, 0, 8c92e6e0, 9c6f8500)
  8c92e6e4 zil_flush_vdevs+0x6b(8bbe46c0)
  8c92e734 zil_commit_writer+0x35f(8bbe46c0, 3497c, 0, 4af5, 0)
  8c92e774 zil_commit+0x96(8bbe46c0, , , 4af5, 0)
  8c92e7e8 zfs_putpage+0x1e4(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e824 vhead_putpage+0x95(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e86c fop_putpage+0x27(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e91c rfs4_op_commit+0x153(82141dd4, b28c3100, 8c92ed8c, 8c92e948)
  8c92ea48 rfs4_compound+0x1ce(8c92ead0, 8c92ea7c, 0, 8c92ed8c, 0)
  8c92eaac rfs4_dispatch+0x65(8bf9b248, 8c92ed8c, b28c5a40, 8c92ead0)
  8c92ed10 common_dispatch+0x6b0(8c92ed8c, b28c5a40, 2, 4, 8bf9c01c, 8bf9b1f0)
  8c92ed34 rfs_dispatch+0x1f(8c92ed8c, b28c5a40)
  8c92edc4 svc_getreq+0x158(b28c5a40, 842952a0)
  8c92ee0c svc_run+0x146(898878e8)
  8c92ee2c svc_do_run+0x6e(1)
  8c92ef84 nfssys+0x3fb()
  8c92efac sys_sysenter+0x102()

[snipping out a bunch of other threads]
Re: [zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
So it is expected behavior on my Nexenta alpha 7 server for Sun's nfsd to stop responding after 2 hours of running a bittorrent client over nfs4 from a linux client, causing zfs snapshots to hang and requiring a hard reboot to get the world back in order?

Thomas

> There is no NFS over ZFS issue (IMO/FWIW). If ZFS is talking to a
> JBOD, then the slowness is a characteristic of NFS (not related to
> ZFS). So FWIW on JBOD, there is no ZFS+NFS issue in the sense that I
> don't know how we could change ZFS to be significantly better at NFS,
> nor do I know how to change NFS in a way that would help
> _particularly_ ZFS. That doesn't mean there are none; I just don't
> know about them. So please ping me if you highlight such an issue. And
> if one replaces ZFS with some other filesystem and gets a large
> speedup, I'm interested (make sure the other filesystem either runs
> with the write cache off, or flushes it on NFS commit).
>
> So that leaves us with a Samba vs NFS issue (not related to ZFS). We
> know that NFS can create files _at most_ at one file per server I/O
> latency. Samba appears better, and this is what we need to
> investigate. It might be better in a way that NFS can borrow (maybe
> through some better NFSv4 delegation code), or Samba might be better
> by being careless with data. If we find such an NFS improvement it
> will help all backend filesystems, not just ZFS. Which is why I say:
> there is no NFS over ZFS issue.
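The "one file per server I/O latency" bound quoted above is easy to observe with a crude microbenchmark. A sketch, where /mnt/nfs is a placeholder for any NFS mount and the file count is arbitrary; run the same loop on a local filesystem to see the gap:

```shell
# Create 1000 empty files and time it. Over NFS, each create is a
# synchronous round trip to the server, so elapsed time / 1000
# approximates the server's per-operation commit latency.
mkdir -p /mnt/nfs/createtest
time sh -c 'i=0; while [ $i -lt 1000 ]; do
  : > /mnt/nfs/createtest/f$i
  i=$((i+1))
done'
```

This is why untar-style workloads (many small file creates, like a bittorrent client's piece files) are the worst case for NFS regardless of the backend filesystem.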
[zfs-discuss] pool resilver oddity
Perhaps someone on this mailing list can shed some light onto some odd zfs circumstances I encountered this weekend.

I have an array of 5 400GB drives in a raidz, running on Nexenta. One of these drives showed a SMART error (HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=10]). I preemptively replaced it, using zpool replace tank c3t0d0 c3t5d0. The resilver started, but quickly hung at "scrub: resilver in progress, 0.37% done, 132h14m to go". NFS stopped working and I think the system had some responsiveness issues. I did have automatic hourly/daily/weekly snapshots running on the filesystem at the time.

I rebooted it, but it would not come up in any sane state, sometimes becoming pingable, but never becoming ssh-able or consolable over serial (as it is configured to do). I tried using various live cds, to no avail. I eventually got it to boot, after much gnashing of teeth, in Nexenta's single user mode into a login prompt, but only after both of the drives affected by the replacement were physically removed. Having both drives removed allowed the system to give a login prompt. The resilver proceeded normally, and I watched it complete. I physically reattached the drives and rebooted the system, at which time the pool was online and no longer in a degraded state. The system now boots normally.

So, after all that, my primary question is how did the resilvering (which I liken to a rebuilding of the 5 drive array) take place with only 4 drives online? Shouldn't it have been writing data/parity to the replacement drive? Is this normal and the expected behavior?

Thanks for any insight!
Thomas
Re: [zfs-discuss] Re: .zfs snapshot directory in all directories
> for what purpose ?

Darren's correct: it's a simple case of ease of use. Not show-stopping by any means, but it would be nice to have.

Thomas
[zfs-discuss] .zfs snapshot directory in all directories
Since I have been unable to find the answer online, I thought I would ask here. Is there a knob to turn on a zfs filesystem to put the .zfs snapshot directory into all of the child directories of the filesystem, like the .snapshot directories of NetApp systems, instead of just the root of the filesystem?

Thanks!
Thomas
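For reference, the closest existing knob is the snapdir property, which only controls whether .zfs at the filesystem *root* shows up in directory listings; it does not replicate .zfs into child directories the way NetApp's .snapshot does. A sketch, with a hypothetical dataset name tank/home:

```shell
# Check the current setting; the default is "hidden", meaning .zfs
# exists at the root but is invisible to ls and readdir().
zfs get snapdir tank/home

# Make the root .zfs directory visible in listings.
zfs set snapdir=visible tank/home

# Snapshots are then browsable under the filesystem root only.
ls /tank/home/.zfs/snapshot
```

Even with snapdir=hidden, the root .zfs directory can still be entered by name (cd /tank/home/.zfs), so scripts can reach snapshots without changing the property.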
Re: [zfs-discuss] raidz DEGRADED state
In the same vein... I currently have a 400GB disk that is full of data on a linux system. If I buy 2 more disks and put them into a raid-z'ed zfs under solaris, is there a generally accepted way to build a degraded array with the 2 disks, copy the data to the new filesystem, and then move the original disk to complete the array?

Thanks!
Thomas

On 11/30/06, Krzys [EMAIL PROTECTED] wrote:
> Ah, did not see your follow up. Thanks.
>
> Chris
>
> On Thu, 30 Nov 2006, Cindy Swearingen wrote:
>
> Sorry, Bart is correct:
>
>   If new_device is not specified, it defaults to old_device. This form
>   of replacement is useful after an existing disk has failed and has
>   been physically replaced. In this case, the new disk may have the
>   same /dev/dsk path as the old device, even though it is actually a
>   different disk. ZFS recognizes this.
>
> cs
>
> Cindy Swearingen wrote:
> One minor comment is to identify the replacement drive, like this:
>
>   # zpool replace mypool2 c3t6d0 c3t7d0
>
> Otherwise, zpool will error...
>
> cs
>
> Bart Smaalders wrote:
> Krzys wrote:
> my drive did go bad on me, how do I replace it? I am running solaris
> 10 U2 (by the way, I thought U3 would be out in November; will it be
> out soon? does anyone know?)
>
> [11:35:14] server11: /export/home/me  zpool status -x
>   pool: mypool2
>  state: DEGRADED
> status: One or more devices could not be opened. Sufficient replicas
>         exist for the pool to continue functioning in a degraded
>         state.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-D3
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         mypool2     DEGRADED     0     0     0
>           raidz     DEGRADED     0     0     0
>             c3t0d0  ONLINE       0     0     0
>             c3t1d0  ONLINE       0     0     0
>             c3t2d0  ONLINE       0     0     0
>             c3t3d0  ONLINE       0     0     0
>             c3t4d0  ONLINE       0     0     0
>             c3t5d0  ONLINE       0     0     0
>             c3t6d0  UNAVAIL      0   679     0  cannot open
>
> errors: No known data errors
>
> Shut down the machine, replace the drive, reboot and type:
>
>   zpool replace mypool2 c3t6d0
>
> On earlier versions of ZFS I found it useful to do this at the login
> prompt; it seemed fairly memory intensive.
>
> - Bart