Re: [zfs-discuss] Finding disks [was: # disks per vdev]
Thanks for all the replies. I have a pretty good idea how the disk enclosure
assigns slot locations, so I should be OK.

One last thing - I see that Supermicro has just released a newer version of
the card I mentioned in the first post that supports SATA 6Gbps. From what I
can see it uses the Marvell 9480 controller, which I don't think is supported
in Solaris Express 11 yet. Does this mean it strictly won't work (i.e. no
available drivers), or just that it wouldn't be supported if there are
problems?
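(For checking this sort of thing from a running system, the usual commands
are below -- a minimal sketch, assuming 0x1b4b is the Marvell PCI vendor ID
to search for; verify that against your own prtconf output. If no installed
driver claims the ID, the card simply never attaches, which is different from
attaching but being unsupported when problems arise.)

    # show devices and the drivers bound to them; an unclaimed card
    # appears as "(driver not attached)"
    prtconf -D

    # dump PCI vendor/device IDs to identify the controller
    prtconf -pv | grep -i pci1b4b

    # see whether any installed driver claims that ID
    grep -i 1b4b /etc/driver_aliases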
Re: [zfs-discuss] Zpool with data errors
On 21 June, 2011 - Todd Urie sent me these 5,9K bytes:

> I have a zpool that shows the following from a zpool status -v <zpool name>:
>
> brsnnfs0104 [/var/spool/cron/scripts]# zpool status -v ABC0101
>   pool: ABC0101
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME                              STATE     READ WRITE CKSUM
>         ABC0101                           ONLINE       0     0    10
>           /dev/vx/dsk/ABC01dg/ABC0101_01  ONLINE       0     0     2
>           /dev/vx/dsk/ABC01dg/ABC0101_02  ONLINE       0     0     8
>           /dev/vx/dsk/ABC01dg/ABC0101_03  ONLINE       0     0    10
>
> errors: Permanent errors have been detected in the following files:
>
>   /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/rscache/717b52282ea059452621587173561360
>   /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/rscache/6e6a9f37c4d13fdb3dcb8649272a2a49
>   /clients/ABC0101/rep/d0/prod1/reports/ReutersCMOLoad/ReutersCMOLoad.ABCntss001.20110620.141330.26496.ROLLBACK_FOR_UPDATE_COUPONS.html
>   /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/G2_0.related_detail_loader.1308593666.54643.n5cpoli3355.data
>   /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/F_OLPO82_A.gp.ABCIM_GA.nlaf.xml.gz
>   /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/UNVLXCIAFI.gp.ABCIM_GA.nlaf.xml.gz
>   /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/UNIVLEXCIA.gp.BARCRATING_ABC.nlaf.xml.gz
>
> I think that a scrub at least has the possibility to clear this up. A
> quick search suggests that others have had some good experience with
> using scrub in similar circumstances. I was wondering if anyone could
> share some of their experiences, good and bad, so that I can assess the
> risk and probability of success with this approach. Also, any other
> ideas would certainly be appreciated.

As you have no ZFS-based redundancy, ZFS can only detect that some blocks
delivered from the devices (SAN, I guess?) were broken according to the
checksum. If you had a raidz/mirror in ZFS, it would have corrected the
problems and written correct data back to the malfunctioning device. As it
stands, it cannot. A scrub only reads the data and verifies that it matches
the checksums.

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
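(For reference, the mechanics of the scrub approach look like this, using the
pool name from the post. On a pool with no ZFS-level redundancy a scrub can
only re-detect damage, not repair it.)

    # re-read every block and re-verify checksums
    zpool scrub ABC0101

    # watch progress and list the files still flagged as damaged
    zpool status -v ABC0101

    # after restoring the affected files from backup, reset the error
    # counters so new errors stand out
    zpool clear ABC0101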
Re: [zfs-discuss] Zpool with data errors
Todd,

Is that ZFS on top of VxVM? Are those volumes okay? I wonder if this is
really a sensible combination?

..Remco

On 6/21/11 7:36 AM, Todd Urie wrote:
> I have a zpool that shows the following from a zpool status -v:
> [zpool status output and file list quoted in full above]
>
> I think that a scrub at least has the possibility to clear this up. A
> quick search suggests that others have had some good experience with
> using scrub in similar circumstances. I was wondering if anyone could
> share some of their experiences, good and bad, so that I can assess the
> risk and probability of success with this approach. Also, any other
> ideas would certainly be appreciated.
>
> -RTU
Re: [zfs-discuss] Zpool with data errors
The volumes sit on HDS SAN. The only reason for the volumes is to prevent
inadvertent import of the zpool on two nodes of a cluster simultaneously.
Since we're on SAN with RAID internally, it didn't seem that we would need
ZFS to provide that redundancy as well.

On Tue, Jun 21, 2011 at 4:17 AM, Remco Lengers re...@lengers.com wrote:
> Todd,
>
> Is that ZFS on top of VxVM? Are those volumes okay? I wonder if this is
> really a sensible combination?
>
> ..Remco
>
> On 6/21/11 7:36 AM, Todd Urie wrote:
>> I have a zpool that shows the following from a zpool status -v:
>> [zpool status output and file list quoted in full above]
Re: [zfs-discuss] Zpool with data errors
On 21/06/11 7:54 AM, Todd Urie wrote:
> The volumes sit on HDS SAN. The only reason for the volumes is to
> prevent inadvertent import of the zpool on two nodes of a cluster
> simultaneously. Since we're on SAN with RAID internally, it didn't seem
> that we would need ZFS to provide that redundancy as well.

You do if you want self-healing, as Tomas points out. A non-redundant pool,
even on mirrored or RAID storage, offers no ability to recover from errors
detected anywhere on the data path. To gain this benefit of ZFS, it needs to
manage the redundancy itself.

On the upside, ZFS at least *detected* the errors, while other systems would
not.

--Toby

> On Tue, Jun 21, 2011 at 4:17 AM, Remco Lengers re...@lengers.com wrote:
>> Todd,
>>
>> Is that ZFS on top of VxVM? Are those volumes okay? I wonder if this
>> is really a sensible combination?
>>
>> ..Remco
>>
>> On 6/21/11 7:36 AM, Todd Urie wrote:
>>> I have a zpool that shows the following from a zpool status -v ...
Re: [zfs-discuss] Zpool with data errors
> didn't seem that we would need ZFS to provide that redundancy also.

There was a time when I fell for this line of reasoning too. The problem (if
you want to call it that) with ZFS is that it will show you, front and
center, the corruption taking place in your stack.

> Since we're on SAN with RAID internally

Your situation would suggest that your RAID silently corrupted data and
didn't even know about it. Until you can trust the volumes behind ZFS (and I
don't trust any of them anymore, regardless of the brand name on the
cabinet), give ZFS at least some redundancy so that it can pick up the slack.

By the way, I used to trust storage because I didn't believe it was
corrupting data, but I had no proof one way or the other, so I gave it the
benefit of the doubt. Since I have been using ZFS, my standards have gone up
considerably. Now I trust storage because I can *prove* it's correct. If
someone can't prove that a volume is returning correct data, don't trust it.
Let ZFS manage it.
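(To make that concrete: giving ZFS redundancy just means building the pool
from mirror or raidz vdevs instead of single LUNs. A minimal sketch with
hypothetical device names -- ideally each side of a mirror is a LUN from a
different array group or controller, so one back-end fault can only corrupt
one copy.)

    # hypothetical LUN names; substitute your own
    zpool create tank mirror c7t0d0 c8t0d0 mirror c7t1d0 c8t1d0

    # with redundancy, a scrub repairs bad blocks from the good copy
    # instead of merely reporting them
    zpool scrub tank

The price is half the raw capacity, which is exactly why people are tempted
to lean on the array's RAID instead.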
Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)
On Sun, 19 Jun 2011, Richard Elling wrote:

> Yes. I've been looking at what the value of zfs_vdev_max_pending should
> be. The old value was 35 (a guess, but a really bad guess) and the new
> value is 10 (another guess, but a better guess). I observe that data
> from a fast, modern [...]

I am still using 5 here. :-)

I haven't formed an opinion yet, but I'm inclined towards wanting overall
better latency. Most properly implemented systems are not running at maximum
capacity, so decreased latency is definitely desirable: applications obtain
the best CPU usage and short-lived requests do not clog the system. Typical
benchmark scenarios (maximum sustained or peak throughput) do not represent
most real-world usage. The 60% or 80% solution (with assured reasonable
response time) is definitely better than the 99% solution when it comes to
user satisfaction.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
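(For anyone who wants to experiment with the same knob, this is roughly how
it is set on Solaris/OpenSolaris of this vintage; 5 is the value in use
above. Check that the symbol exists on your build before writing to it.)

    # persistent across reboots -- add to /etc/system:
    #   set zfs:zfs_vdev_max_pending = 5

    # change it on a live kernel ("0t" marks decimal in mdb)
    echo zfs_vdev_max_pending/W0t5 | mdb -kw

    # read back the current value
    echo zfs_vdev_max_pending/D | mdb -k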
Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
Hello Jim! I understood that ZFS doesn't like slices, but from your reply
maybe I should reconsider. I have a few older servers with 4 bays x 73G. If I
make a root mirror pool and swap on the other 2 as you suggest, then I would
have about 63G x 4 left over.

If so, then I am back to wondering what to do with 4 drives. Is raidz1
worthwhile in this scenario? That is less redundancy than a mirror and much
less than a 3-way mirror, isn't it? Is it even possible to do raidz2 on 4
slices? Or would two 2-way mirrors be better? I don't understand what RAID10
is - is it simply a stripe of two mirrors? Or would it be best to do a 3-way
mirror and a hot spare? I would like to be able to tolerate losing one drive
without loss of integrity.

I will be doing new installs of Solaris 10. Is there an option in the
installer for me to issue ZFS commands and set up pools, or do I need to
format the disks before installing, and if so, how do I do that?

Thank you.
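(For reference, the layouts in question look like this in zpool terms -- a
sketch with hypothetical whole-disk names c1t0d0..c1t3d0; slice names such as
c1t0d0s3 work the same way. Each command is an alternative, not a sequence.)

    # raidz1: ~3 disks usable, survives any single failure
    zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0

    # raidz2: ~2 disks usable, survives any two failures -- so yes,
    # it is possible on 4 slices
    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0

    # RAID10-style: a stripe of two 2-way mirrors, ~2 disks usable,
    # survives one failure per mirror; usually the best random-I/O choice
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

    # 3-way mirror plus hot spare: ~1 disk usable, survives two failures
    zpool create tank mirror c1t0d0 c1t1d0 c1t2d0 spare c1t3d0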
Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
Hello Marty!

> With four drives you could also make a RAIDZ3 set, allowing you to have
> the lowest usable space, poorest performance and worst resilver times
> possible.

That's not funny. I was actually considering this :p

But you have to admit, it would probably be somewhat reliable!
Re: [zfs-discuss] Zpool with data errors
On Jun 21, 2011, at 2:54 PM, Todd Urie wrote:
> The volumes sit on HDS SAN. The only reason for the volumes is to
> prevent inadvertent import of the zpool on two nodes of a cluster
> simultaneously. Since we're on SAN with RAID internally, it didn't seem
> that we would need ZFS to provide that redundancy as well.

Not a wise way of building a pool. Your HDS SAN does not give any protection
against data corruption, and without redundancy at the ZFS level, ZFS can
only report data corruption, not correct it. Also, VxVM does not give you any
more protection against importing the LUNs/volumes/pools than ZFS does. Both
warn admins who are about to shoot themselves in the leg, but let them do it
if they use force.

Time to rebuild your pool without VxVM involved and restore the data from
backups.

Sami
Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
On 21 June, 2011 - Nomen Nescio sent me these 0,4K bytes:

> Hello Marty!
>
>> With four drives you could also make a RAIDZ3 set, allowing you to
>> have the lowest usable space, poorest performance and worst resilver
>> times possible.
>
> That's not funny. I was actually considering this :p

A 4-way mirror would be way more useful.

> But you have to admit, it would probably be somewhat reliable!

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)
On Jun 21, 2011, at 8:18 AM, Garrett D'Amore wrote:

>> Does that also go through disksort? Disksort doesn't seem to have any
>> concept of priorities (but I haven't looked in detail where it plugs
>> in to the whole framework). So it might make better sense for ZFS to
>> keep the disk queue depth small for HDDs.
>>  -- richard
>
> disksort is much further down than zio priorities... by the time
> disksort sees them they have already been sorted in priority order.

Yes, disksort is at sd. So ZFS schedules I/Os, disksort reorders them, and
the drive reorders them again. To get the best advantage out of the ZFS
priority ordering, I can make an argument to disable disksort and keep
vdev_max_pending low to limit the reordering work done by the drive. I am not
convinced that traditional benchmarks show the effects of ZFS priority
ordering, though.
 -- richard
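(For the curious: on Solaris/illumos builds whose sd driver supports
per-device tuning through sd-config-list, disksort can be switched off per
vendor/product pair. A sketch under that assumption -- the vendor/product
string below is a made-up example; the VID field is 8 characters,
space-padded, and your sd(7D) man page lists the tunables your build actually
supports.)

    # /kernel/drv/sd.conf -- hypothetical entry
    sd-config-list = "HITACHI HUA72101", "disksort:false";

After editing, "update_drv -f sd" (or a reboot) re-reads the configuration.
Combined with a low zfs_vdev_max_pending, this leaves the ZFS priority
ordering largely intact all the way down to the device.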
[zfs-discuss] dskinfo utility
Hello,

I got tired of gathering disk information from different places when working
with Solaris disks, so I wrote a small utility that summarizes the most
commonly used information. It is especially tricky to work with a large set
of SAN disks using MPxIO: you do not even see the logical unit number in the
name of the disk, so you have to use other commands to acquire that
information per disk.

The focus of the first version is ZFS, so it understands which disks are part
of pools; later versions might add other volume managers or filesystems.
Besides the name, size and usage of a disk, it can also show the number of FC
paths to the disk, whether it is labeled, the driver type, logical unit
number, vendor, serial and product names.

Examples - mind the format, it looks good with 80 columns:

$ dskinfo list
disk                               size  use      type
c0t600144F8288C50B55BC58DB70001d0  499G  -        iscsi
c5t0d0                             149G  rpool    disk
c5t2d0                              37G  -        disk
c6t0d0                             1.4T  zpool01  disk
c6t1d0                             1.4T  zpool01  disk
c6t2d0                             1.4T  zpool01  disk

# dskinfo list-long
disk                               size  lun  use      p  spd  type  lb
c1t0d0                             136G  -    rpool    -  -    disk  y
c1t1d0                             136G  -    rpool    -  -    disk  y
c6t6879120292610822533095343732d0  100G  0x1  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343734d0  100G  0x3  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343736d0  404G  0x5  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343745d0    5T  0xb  zpool03  4  4Gb  fc    y

# dskinfo list-full
disk    size  hex  dec  p  spd  type  lb  use      vendor   product          serial
c0t0d0   68G  -    -    -  -    disk  y   rpool    FUJITSU  MAP3735N SUN72G  -
c0t1d0   68G  -    -    -  -    disk  y   rpool    FUJITSU  MAP3735N SUN72G  -
c1t1d0   16G  -    -    -  -    disk  y   storage  SEAGATE  ST318404LSUN18G  -
c1t2d0   16G  -    -    -  -    disk  y   storage  FUJITSU  MAJ3182M SUN18G  -
c1t3d0   16G  -    -    -  -    disk  y   storage  FUJITSU  MAJ3182M SUN18G  -
c1t4d0   16G  -    -    -  -    disk  y   storage  FUJITSU  MAG3182L SUN18G  -
c1t5d0   16G  -    -    -  -    disk  y   storage  FUJITSU  MAJ3182M SUN18G  -
c1t6d0   16G  -    -    -  -    disk  y   storage  FUJITSU  MAJ3182M SUN18G  -

I've been using it for myself for a while now, and I thought it might fill a
need, so I am making the current version available for download. Download
link and some other information can be found here:

http://sparcv9.blogspot.com/2011/06/solaris-dskinfo-utility.html

Regards
Henrik
http://sparcv9.blogspot.com