Re: [zfs-discuss] Finding disks [was: # disks per vdev]

2011-06-21 Thread Lanky Doodle
Thanks for all the replies.

I have a pretty good idea how the disk enclosure assigns slot locations, so I
should be OK.

One last thing - I see that Supermicro has just released a newer version of the
card I mentioned in the first post that supports 6Gbps SATA. From what I can
see it uses the Marvell 9480 controller, which I don't think is supported in
Solaris 11 Express yet.

Does this mean it strictly won't work (i.e. no available driver), or just that
it wouldn't be supported if there are problems?


Re: [zfs-discuss] Zpool with data errors

2011-06-21 Thread Tomas Ögren
On 21 June, 2011 - Todd Urie sent me these 5,9K bytes:

 I have a zpool that shows the following from a zpool status -v <pool name>:
 
 brsnnfs0104 [/var/spool/cron/scripts]# zpool status -v ABC0101
   pool: ABC0101
  state: ONLINE
 status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
  scrub: none requested
 config:
 
         NAME                              STATE     READ WRITE CKSUM
         ABC0101                           ONLINE       0     0    10
           /dev/vx/dsk/ABC01dg/ABC0101_01  ONLINE       0     0     2
           /dev/vx/dsk/ABC01dg/ABC0101_02  ONLINE       0     0     8
           /dev/vx/dsk/ABC01dg/ABC0101_03  ONLINE       0     0    10
 
 errors: Permanent errors have been detected in the following files:
 
         /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/rscache/717b52282ea059452621587173561360
         /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/rscache/6e6a9f37c4d13fdb3dcb8649272a2a49
         /clients/ABC0101/rep/d0/prod1/reports/ReutersCMOLoad/ReutersCMOLoad.ABCntss001.20110620.141330.26496.ROLLBACK_FOR_UPDATE_COUPONS.html
         /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/G2_0.related_detail_loader.1308593666.54643.n5cpoli3355.data
         /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/F_OLPO82_A.gp.ABCIM_GA.nlaf.xml.gz
         /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/UNVLXCIAFI.gp.ABCIM_GA.nlaf.xml.gz
         /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/UNIVLEXCIA.gp.BARCRATING_ABC.nlaf.xml.gz
 
 I think that a scrub at least has the possibility to clear this up.  A quick
 search suggests that others have had some good experience with using scrub
 in similar circumstances.  I was wondering if anyone could share some of
 their experiences, good and bad, so that I can assess the risk and
 probability of success with this approach.  Also, any other ideas would
 certainly be appreciated.

As you have no ZFS-based redundancy, ZFS can only detect that some blocks
delivered from the devices (a SAN, I guess?) were broken according to their
checksums. If you had a raidz or mirror in ZFS, it would have corrected the
problem and written the correct data back to the malfunctioning device. As it
stands, it cannot. A scrub only reads the data and verifies that it matches
the checksums.
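
If you decide to run one anyway it will at least enumerate every affected
file. Roughly, using the pool name from your output:

  # zpool scrub ABC0101        # read and verify every block; with no redundancy it can only report
  # zpool status -v ABC0101    # watch progress and the list of damaged files
  # zpool clear ABC0101        # reset the error counters once the damaged files have been restored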

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Zpool with data errors

2011-06-21 Thread Remco Lengers

Todd,

Is that ZFS on top of VxVM? Are those volumes okay? I wonder if this
is really a sensible combination?


..Remco

On 6/21/11 7:36 AM, Todd Urie wrote:
 I have a zpool that shows the following from a zpool status -v <pool name>
...


Re: [zfs-discuss] Zpool with data errors

2011-06-21 Thread Todd Urie
The volumes sit on an HDS SAN.  The only reason for the volumes is to prevent
inadvertent import of the zpool on two nodes of a cluster simultaneously.
Since we're on a SAN with RAID internally, it didn't seem we would need ZFS
to provide that redundancy as well.

On Tue, Jun 21, 2011 at 4:17 AM, Remco Lengers re...@lengers.com wrote:

 Todd,

 Is that ZFS on top of VxVM ?  Are those volumes okay? I wonder if this is
 really a sensible combination?

 ..Remco


 On 6/21/11 7:36 AM, Todd Urie wrote:
 I have a zpool that shows the following from a zpool status -v <pool name>
...




-- 
-RTU


Re: [zfs-discuss] Zpool with data errors

2011-06-21 Thread Toby Thain
On 21/06/11 7:54 AM, Todd Urie wrote:
 The volumes sit on an HDS SAN.  The only reason for the volumes is to
 prevent inadvertent import of the zpool on two nodes of a cluster
 simultaneously.  Since we're on a SAN with RAID internally, it didn't seem
 we would need ZFS to provide that redundancy as well.

You do if you want self-healing, as Tomas points out. A non-redundant
pool, even on mirrored or RAID storage, has no ability to recover from
errors detected anywhere on the data path. To gain this benefit, ZFS
needs to manage the redundancy itself.
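
As a purely illustrative sketch (not a recommendation for this particular
VxVM stack), that could mean building the pool out of mirrored pairs of the
backing volumes:

  # zpool create ABC0101 mirror /dev/vx/dsk/ABC01dg/ABC0101_01 /dev/vx/dsk/ABC01dg/ABC0101_02

or, as a much weaker stop-gap on an existing non-redundant pool, keeping two
copies of each data block (this only protects data written after the property
is set, and not against losing a whole device):

  # zfs set copies=2 ABC0101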

On the upside, ZFS at least *detected* the errors, while other systems
would not.

--Toby

 
 On Tue, Jun 21, 2011 at 4:17 AM, Remco Lengers re...@lengers.com wrote:
 
 Todd,
 
 Is that ZFS on top of VxVM ?  Are those volumes okay? I wonder if
 this is really a sensible combination?
 
 ..Remco
 
 
 On 6/21/11 7:36 AM, Todd Urie wrote:
 I have a zpool that shows the following from a zpool status -v
 zpool name
...


Re: [zfs-discuss] Zpool with data errors

2011-06-21 Thread Marty Scholes
 it didn't seem we would need ZFS to provide that redundancy as well.

There was a time when I fell for this line of reasoning too.  The problem (if 
you want to call it that) with zfs is that it will show you, front and center, 
the corruption taking place in your stack.

 Since we're on a SAN with RAID internally

Your situation would suggest that your RAID silently corrupted data and didn't 
even know about it.

Until you can trust the volumes behind zfs (and I don't trust any of them 
anymore, regardless of the brand name on the cabinet), give zfs at least some 
redundancy so that it can pick up the slack.

By the way, I used to trust storage because I didn't believe it was corrupting 
data, but I had no proof one way or the other, so I gave it the benefit of the 
doubt.

Since I have been using zfs, my standards have gone up considerably.  Now I 
trust storage because I can *prove* it's correct.

If someone can't prove that a volume is returning correct data, don't trust it. 
 Let zfs manage it.
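
One simple way to keep that proof coming is a periodic scrub, e.g. a root
crontab entry along these lines (the pool name is just a placeholder):

  0 3 * * 0 /usr/sbin/zpool scrub tank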


Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-21 Thread Bob Friesenhahn

On Sun, 19 Jun 2011, Richard Elling wrote:

> Yes. I've been looking at what the value of zfs_vdev_max_pending should be.
> The old value was 35 (a guess, but a really bad guess) and the new value is
> 10 (another guess, but a better guess).  I observe that data from a fast, modern

I am still using 5 here. :-)

> I haven't formed an opinion yet, but I'm inclined towards wanting overall
> better latency.


Most properly implemented systems are not running at maximum capacity,
so decreased latency is definitely desirable: applications obtain the
best CPU usage and short-lived requests do not clog the system.
Typical benchmark scenarios (maximum sustained or peak throughput) do
not represent most real-world usage.  The 60 or 80% solution (with
assured reasonable response time) is definitely better than the 99%
solution when it comes to user satisfaction.
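
For anyone who wants to experiment with the knob itself, the usual way to pin
it on Solaris is an /etc/system entry followed by a reboot (the value here is
just the one I happen to use):

  set zfs:zfs_vdev_max_pending = 5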


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-21 Thread Dave U . Random
Hello Jim! I understood ZFS doesn't like slices, but from your reply maybe I
should reconsider. I have a few older servers with 4 bays x 73G. If I make a
root mirror pool and swap on the other 2 as you suggest, then I would have
about 63G x 4 left over. If so, then I am back to wondering what to do about
4 drives. Is raidz1 worthwhile in this scenario? That is less redundancy
than a mirror and much less than a 3-way mirror, isn't it? Is it even
possible to do raidz2 on 4 slices? Or would two 2-way mirrors be better? I
don't understand what RAID10 is; is it simply a stripe of two mirrors? Or
would it be best to do a 3-way mirror and a hot spare? I would like to be
able to tolerate losing one drive without loss of integrity.
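
To make the comparison concrete, the layouts I am asking about would be
created roughly like this (disk names are just placeholders):

  # two 2-way mirrors (the RAID10-style layout):
  zpool create tank mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0

  # raidz1 across all four (survives one failure):
  zpool create tank raidz c1t2d0 c1t3d0 c1t4d0 c1t5d0

  # raidz2 across all four (survives any two failures, two disks' worth of usable space):
  zpool create tank raidz2 c1t2d0 c1t3d0 c1t4d0 c1t5d0

  # 3-way mirror plus a hot spare:
  zpool create tank mirror c1t2d0 c1t3d0 c1t4d0 spare c1t5d0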

I will be doing new installs of Solaris 10. Is there an option in the
installer to issue ZFS commands and set up pools, or do I need to format the
disks before installing, and if so, how do I do that? Thank you.


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-21 Thread Nomen Nescio
Hello Marty! 

 With four drives you could also make a RAIDZ3 set, allowing you to have
 the lowest usable space, poorest performance and worst resilver times
 possible.

That's not funny. I was actually considering this :p

But you have to admit, it would probably be somewhat reliable!


Re: [zfs-discuss] Zpool with data errors

2011-06-21 Thread Sami Ketola

On Jun 21, 2011, at 2:54 PM, Todd Urie wrote:

 The volumes sit on an HDS SAN.  The only reason for the volumes is to prevent
 inadvertent import of the zpool on two nodes of a cluster simultaneously.
 Since we're on a SAN with RAID internally, it didn't seem we would need ZFS
 to provide that redundancy as well.


Not a wise way of building a pool. Your HDS SAN does not give you any
protection against data corruption, and without redundancy at the ZFS level
ZFS can only report corruption, not correct it. VxVM also does not give you
any more protection against importing the LUNs/volumes/pools on two nodes
than ZFS does: both warn an admin who is about to shoot themselves in the
foot, but let them do it if they use force.
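
Concretely, the warn-then-force behaviour on the ZFS side is just (sketch
only, using the pool name from this thread):

  # zpool import ABC0101       # refuses if the pool appears active on another host
  # zpool import -f ABC0101    # the force that overrides the warning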

Time to rebuild your pool without VxVM involved and restore data from backups.

Sami


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-21 Thread Tomas Ögren
On 21 June, 2011 - Nomen Nescio sent me these 0,4K bytes:

 Hello Marty! 
 
  With four drives you could also make a RAIDZ3 set, allowing you to have
  the lowest usable space, poorest performance and worst resilver times
  possible.
 
 That's not funny. I was actually considering this :p

4-way mirror would be way more useful.
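
Something along these lines, with made-up disk names:

  # zpool create tank mirror c0t0d0 c0t1d0 c0t2d0 c0t3d0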

 But you have to admit, it would probably be somewhat reliable!

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-21 Thread Richard Elling
On Jun 21, 2011, at 8:18 AM, Garrett D'Amore wrote:

>> Does that also go through disksort? Disksort doesn't seem to have any
>> concept of priorities (but I haven't looked in detail where it plugs in to
>> the whole framework).
>>
>> So it might make better sense for ZFS to keep the disk queue depth small
>> for HDDs.
>>  -- richard
>
> disksort is much further down than zio priorities... by the time disksort
> sees them they have already been sorted in priority order.

Yes, disksort is at sd. So ZFS schedules I/Os, disksort reorders them, and the
drive reorders them again. To get the best advantage out of the ZFS priority
ordering, I can make an argument to disable disksort and keep the
vdev_max_pending low to limit the reordering work done by the drive. I am not
convinced that traditional benchmarks show the effects of ZFS priority
ordering, though.
 -- richard
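
P.S. for anyone who wants to try this, the pending-I/O limit can be inspected
and lowered on a live system with mdb (the values here are examples only):

  # echo zfs_vdev_max_pending/D | mdb -k        # show the current value
  # echo zfs_vdev_max_pending/W0t4 | mdb -kw    # lower it until the next reboot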




[zfs-discuss] dskinfo utility

2011-06-21 Thread Henrik Johansson
Hello,

I got tired of gathering disk information from different places when working
with Solaris disks, so I wrote a small utility that summarizes the most
commonly used information.

It is especially tricky to work with a large set of SAN disks using MPxIO:
you do not even see the logical unit number in the name of the disk, so you
have to use other commands to acquire that information for every disk.
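
(The per-disk digging I mean is typically something like

  # mpathadm list lu                          # enumerate the multipathed logical units
  # mpathadm show lu /dev/rdsk/c0t...d0s2     # path count, vendor and product details
  # zpool status                              # work out which pool, if any, owns the disk

plus 'format' for size and label information, repeated for every disk.)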

The focus of the first version is ZFS, so it understands which disks are part
of pools; later versions might add other volume managers or filesystems.

Besides the name, size and usage of a disk, it can also show the number of FC
paths to the disk, whether it is labeled, the driver type, logical unit
number, vendor, serial and product names.

Examples (mind the formatting; the output is designed for 80 columns):
$ dskinfo list
disk                               size  use      type
c0t600144F8288C50B55BC58DB70001d0  499G  -        iscsi
c5t0d0                             149G  rpool    disk
c5t2d0                             37G   -        disk
c6t0d0                             1.4T  zpool01  disk
c6t1d0                             1.4T  zpool01  disk
c6t2d0                             1.4T  zpool01  disk

# dskinfo list-long
disk                               size  lun  use      p  spd  type  lb
c1t0d0                             136G  -    rpool    -  -    disk  y
c1t1d0                             136G  -    rpool    -  -    disk  y
c6t6879120292610822533095343732d0  100G  0x1  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343734d0  100G  0x3  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343736d0  404G  0x5  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343745d0  5T    0xb  zpool03  4  4Gb  fc    y

# dskinfo list-full
disk   size hex   dec p   spd type  lb
  use  vendor   product  serial  
c0t0d0 68G  - -   -   -   disk  y 
  rpoolFUJITSU  MAP3735N SUN72G  -   
c0t1d0 68G  - -   -   -   disk  y 
  rpoolFUJITSU  MAP3735N SUN72G  -   
c1t1d0 16G  - -   -   -   disk  y 
  storage  SEAGATE  ST318404LSUN18G  -   
c1t2d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   
c1t3d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   
c1t4d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAG3182L SUN18G  -   
c1t5d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   
c1t6d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   

I've been using it myself for a while now and thought it might fill a need,
so I am making the current version available for download. A download link
and some other information can be found here:
http://sparcv9.blogspot.com/2011/06/solaris-dskinfo-utility.html

Regards

Henrik
http://sparcv9.blogspot.com