Re: [zfs-discuss] Copying thousands of small files on an expanded ZFS pool crawls to poor performance - not on other pools.

2009-03-24 Thread Roch

Hi Noel.

zpool iostat -v

For a working pool and for the problem pool, that would help us see
the type of pool and its capacity.
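
For example, running something like the following against a healthy pool
and the slow pool (the pool names here are placeholders) while the copy is
going on would show per-vdev layout, capacity and I/O rates:

  # zpool iostat -v goodpool 5 5
  # zpool iostat -v slowpool 5 5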

I assume the problem is not the source of the data.
Reading a large number of small files typically requires lots
and lots of threads (say 100 per source disk).
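
As a crude sketch of getting more concurrency from a plain shell (the
source and destination paths are placeholders, and one copy per top-level
directory is only a rough approximation of "lots of threads"):

  # cd /source/data
  # for d in *; do cp -r "$d" /slowpool/dest/ & done; wait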

Is the data coming into the pool through NFS, CIFS, or direct access?

-r


Jim Mauro writes:
  
  Cross-posting to the public ZFS discussion alias.
  There's nothing here that requires confidentiality, and
  the public alias is a much broader audience with a larger
  number of experienced ZFS users...
  
  As to the issue - what is the free space disparity
  across the pools? Is the one particular pool significantly
  tighter on free space than the other pools (zpool list)?
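
  For example:

  # zpool list

  and compare the CAP (capacity used) column across the pools; a pool that
  is running much fuller than the rest is the usual suspect for this kind
  of slowdown.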
  
  Thanks,
  /jim
  
  Nobel Shelby wrote:
   The customer has many large ZFS pools. He does the same thing on all of
   them: copying large numbers of small files (1-5 KB) overnight.
   Only one particular pool (one that has been expanded) gives them this
   problem:
   --within a few minutes the copying slows to a crawl and the zpool looks
   unresponsive.
  
   Background:
   He had to grow this particular pool twice over a period of time (it
   was 6 TB and grew by 4 TB twice; now it is 14 TB).
   Solaris was U4 but is now U6.
  
   They have limited the arc:
   set zfs:zfs_arc_max=0x1
   and
   zfs:zfs_nocacheflush=1 (they have a 6540 array).
  
   Does expanding the pool affect performance, and if so, what is the best
   way to recover
   (other than rebuilding the pool)?
  
   Thanks,
   -Nobel
  
  
  
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpools on USB zpool.cache zpool import

2009-03-24 Thread Richard Elling

Damon Atkins wrote:

The zpool.cache file makes clustering complex. {Assume the man page is
still correct}


The man page is correct.  zpool.cache helps make clustering feasible
because it differentiates those file systems which are of interest from
those which are not.  This is particularly important for environments
where storage is shared: SAN, NAS, etc.
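
For example, a pool that a cluster framework (or an administrator) will
import explicitly can be kept out of the default cache so it is not
auto-imported at boot. A sketch, with placeholder pool and device names:

   # zpool create -o cachefile=none sharedpool c2t0d0

or, for an existing pool:

   # zpool set cachefile=none sharedpool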



From the zpool man page:

cachefile=path | none

Controls the location of where the pool configuration is cached.
Discovering all pools on system startup requires a cached copy of the
configuration data that is stored on the root file system. All
pools in this cache are automatically imported when the system
boots.

Some environments, such as install and clustering, need to
cache this information in a different location so that pools are not
automatically imported.

Setting this property caches the pool configuration in a different
location that can later be imported with zpool import -c.
... When the last pool using a cache file is exported or
destroyed, the file is removed.

zpool import [-d dir | -c cachefile] [-D]

Lists pools available to import. If the -d option is not
specified, this command searches for devices in /dev/dsk.
--
A truss of zpool import indicates that it is not multi-threaded when
scanning for disks, i.e. it scans one disk at a time instead of several at
a time, so it does take a while to run. It would be nice if this were
multi-threaded.


But you are complaining about removable media, so there is a timing
issue.  The fixed device locations are known early in the boot, as they
are part of the boot archive.  Removable devices are enumerated later.

What I think you are complaining about is that if you have a zpool on a
removable device, you want it to import when you plug it in. That
functionality exists elsewhere -- not needed in ZFS, per se.



If the cache file is to stay, it should do a scan of /dev to fix itself
at boot if something is wrong, and report to the console that it is doing
a scan, especially if it is not multi-threaded.

PS: it would be nice to have a 'zpool diskinfo <devicepath>' that reports
whether the device belongs to a zpool (imported or not), plus all the
details about any zpool it can find on the disk, e.g. its file systems
(the man page says zdb is only for ZFS engineers). 'zpool import' needs
an option to list the file systems of a not-yet-imported pool and its
properties, so you can have more information about it before importing it.


Good idea.  Please file an RFE at http://bugs.opensolaris.org under
category solaris/kernel/zfs.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Reliability at power failure?

2009-03-24 Thread Uwe Dippel
Since I moved to ZFS, sorry, I tend to have more problems after power
failures. We have around one outage per week, on average, and the
machine(s) don't boot up as one might expect (from ZFS).
Just today: reboot, and rebooting in circles, with no chance on my side
to see the 30-40 lines of hex stuff before the boot process recycles.
That's already bad.

So, let's try failsafe (all on nv_110). No better:

Configuring /dev
relocation error: R_AMD64_PC32: file /kernel/dev/amd64/zfs: symbol 
down_object_opo_relocate failed [not fully correctly noted on my side]

zfs error doing relocations
Searching for installed OS instances ...
/sbin/install-recovery[7]: 72 segmentation Fault
no installed OS instance found.
Starting shell.

An init 6 brought back the failsafe, where the boot archive was reported
as damaged and could be repaired, and the machine restarted after another
init 6.


At earlier boot failures after a power outage the behaviour was
different, but the boot archive was recognized as inconsistent a handful
of times. This bugs me. Otherwise the machines run through without
trouble, and with ZFS the chances of a damaged boot archive should be
zero. Here it approaches a two-digit percentage. No flame, but when the
machine(s) run that OS that usually uses ext3, the damage occurs less
often and the repair is more straightforward.


I'd really be curious to know where the problem could lie here, under
the assumption that it is not the file system that corrupts the boot
archive.


Uwe


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and the copies property

2009-03-24 Thread Robert Parkhurst
Hello,

I have a question about using the `copies` option in ZFS.

If I were to make a non-redundant zpool of, say, 3 hard drives, but set
the `copies` option to something like 2 or 3, would that protect me in
the event of a hard drive failure?  Or would raidz be the only way to
really protect against the loss of a hard drive?

Thank you,

Robert Parkhurst

Systems Administrator

Gresham Enterprise Storage

www.greshamstorage.com

O:  512.407.2694

M:  512.698.7419

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpools on USB zpool.cache zpool import

2009-03-24 Thread Damon Atkins
The zpool.cache file makes clustering complex. {Assume the man page is 
still correct}


From the zpool man page:

cachefile=path | none

Controls the location of where the pool configuration is cached.
Discovering all pools on system startup requires a cached copy of the
configuration data that is stored on the root file system. All
pools in this cache are automatically imported when the system boots.

Some environments, such as install and clustering, need to
cache this information in a different location so that pools are not
automatically imported.


Setting this property caches the pool configuration in a different
location that can later be imported with zpool import -c.
... When the last pool using a cache file is exported or
destroyed, the file is removed.


zpool import [-d dir | -c cachefile] [-D]

Lists pools available to import. If the -d option is not
specified, this command searches for devices in /dev/dsk.
--
A truss of zpool import indicates that it is not multi-threaded when
scanning for disks, i.e. it scans one disk at a time instead of several at
a time, so it does take a while to run. It would be nice if this were
multi-threaded.
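
For anyone who wants to reproduce the observation, something along these
lines (timings will obviously vary with the number of devices) shows the
scan touching one device at a time:

  # time zpool import
  # truss -t open zpool import 2>&1 | grep /dev/dsk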


If the cache file is to stay, it should do a scan of /dev to fix itself
at boot if something is wrong, and report to the console that it is doing
a scan, especially if it is not multi-threaded.


PS: it would be nice to have a 'zpool diskinfo <devicepath>' that reports
whether the device belongs to a zpool (imported or not), plus all the
details about any zpool it can find on the disk, e.g. its file systems
(the man page says zdb is only for ZFS engineers). 'zpool import' needs
an option to list the file systems of a not-yet-imported pool and its
properties, so you can have more information about it before importing it.


Cheers
 Original Message 



On Mon, Mar 23, 2009 at 4:45 PM, Mattias Pantzare pantz...@gmail.com wrote:




If I put my disks on a different controller, ZFS won't find them when I
boot. That is bad. It is also an extra level of complexity.


Correct me if I'm wrong, but wading through all of your comments, I
believe what you would like to see is ZFS automatically scanning if the
cache is invalid vs. requiring manual intervention, no?

It would seem to me this would be rather sane behavior and a 
legitimate request to add this as an option.


--Tim



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reliability at power failure?

2009-03-24 Thread Richard Elling

Uwe Dippel wrote:
Since I moved to ZFS, sorry, I tend to have more problems after power
failures. We have around one outage per week, on average, and the
machine(s) don't boot up as one might expect (from ZFS).
Just today: reboot, and rebooting in circles, with no chance on my
side to see the 30-40 lines of hex stuff before the boot process
recycles. That's already bad.

So, let's try failsafe (all on nv_110). No better:

Configuring /dev
relocation error: R_AMD64_PC32: file /kernel/dev/amd64/zfs: symbol 
down_object_opo_relocate failed [not fully correctly noted on my side]

zfs error doing relocations
Searching for installed OS instances ...
/sbin/install-recovery[7]: 72 segmentation Fault
no installed OS instance found.
Starting shell.

An init 6 brought back the failsafe, where the boot archive was reported
as damaged and could be repaired, and the machine restarted after
another init 6.


At earlier boot failures after a power outage the behaviour was
different, but the boot archive was recognized as inconsistent a
handful of times. This bugs me. Otherwise the machines run through
without trouble, and with ZFS the chances of a damaged boot archive
should be zero. Here it approaches a two-digit percentage. No flame,
but when the machine(s) run that OS that usually uses ext3, the damage
occurs less often and the repair is more straightforward.


I'd really be curious to know where the problem could lie here, under
the assumption that it is not the file system that corrupts the boot
archive.


I don't think this is a file system issue. It is a boot archive update 
issue.
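
If it happens again, the usual recovery sketch from failsafe mode
(assuming the root file system ends up mounted on /a, as a failsafe boot
normally does) is roughly:

   # bootadm update-archive -R /a
   # init 6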

Check the boot-interest archive for more discussions.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and the copies property

2009-03-24 Thread Darren J Moffat

Robert Parkhurst wrote:

I have a question about using the `copies` option in zfs.

If I were to make a non-redundant zpool of say 3 hard drives, but set 
the `copies` option to something like 2 or 3, would that protect me in 
the event of a hard drive failure? 


No, it won't, because if a drive completely fails then the pool won't
import.  The copies option helps with partial failures of a drive, but
it doesn't help with complete failure if there is no other redundancy
at the pool (top-level vdev) level.
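
For what it's worth, a minimal sketch of using the property (the dataset
name is a placeholder); note it only applies to blocks written after it
is set:

# zfs set copies=2 tank/data
# zfs get copies tank/data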


Or would raidz be the only way to 
really protect against the loss of a hard drive?


With only 3 disks raidz is your only choice, but note that you can't
boot from a raidz or a stripe; only mirrors (with 1 or more sides) are
supported for booting.
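
A minimal sketch with three placeholder disks:

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0

which gives single-parity protection, so any one of the three drives can
be lost without losing the pool.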


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reliability at power failure?

2009-03-24 Thread C.

Uwe Dippel wrote:


At earlier boot failures after a power outage the behaviour was
different, but the boot archive was recognized as inconsistent a
handful of times. This bugs me. Otherwise the machines run through
without trouble, and with ZFS the chances of a damaged boot archive
should be zero. Here it approaches a two-digit percentage. No flame,
but when the machine(s) run that OS that usually uses ext3, the damage
occurs less often and the repair is more straightforward.


I'd really be curious to know where the problem could lie here, under
the assumption that it is not the file system that corrupts the boot
archive.


I've worked hard to resolve this problem; googling "opensolaris rescue"
will show I've hit it a few times.  Anyway, the short version is that it's
not ZFS at all, but poor handling of the boot archive.  If you've installed
something like a 3rd-party driver (OSS/VirtualBox) you'll likely hit
this bug.


./C
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reliability at power failure?

2009-03-24 Thread Uwe Dippel

C. wrote:


I've worked hard to resolve this problem; googling "opensolaris rescue"
will show I've hit it a few times.  Anyway, the short version is that
it's not ZFS at all, but poor handling of the boot archive.  If you've
installed something like a 3rd-party driver (OSS/VirtualBox) you'll
likely hit this bug.


You might have hit the nail on the head. My two candidates could be
either Nvidia or VirtualBox.
Still, shouldn't the boot archive be maintained by an independent process
that creates a proper backup before any modification, protecting it from
any careless handling?
Shouldn't a recycling reboot be noted, if only by a flag (assuming we
have r/w access to a drive), including a redirection of the messages into
a file? (Okay, that's off-topic on this list.)
Shouldn't ZFS keep track of a proper rollback point to offer to boot
to in case of failing/recycling boots? Maybe something like 'last
successful boot'?


Uwe

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Copying thousands of small files on an expanded ZFS pool crawls to poor performance - not on other pools.

2009-03-24 Thread Nobel Shelby

Jim,
There are no space constraints or quotas...
Thanks,
-Nobel

Jim Mauro wrote:


Cross-posting to the public ZFS discussion alias.
There's nothing here that requires confidentiality, and
the public alias is a much broader audience with a larger
number of experienced ZFS users...

As to the issue - what is the free space disparity
across the pools? Is the one particular pool significantly
tighter on free space than the other pools (zpool list)?

Thanks,
/jim

Nobel Shelby wrote:

The customer has many large ZFS pools. He does the same thing on all of
them: copying large numbers of small files (1-5 KB) overnight.
Only one particular pool (one that has been expanded) gives them this
problem:
--within a few minutes the copying slows to a crawl and the zpool looks
unresponsive.


Background:
He had to grow this particular pool twice over a period of time (it
was 6 TB and grew by 4 TB twice; now it is 14 TB).

Solaris was U4 but is now U6.

They have limited the arc:
set zfs:zfs_arc_max=0x1
and
zfs:zfs_nocacheflush=1 (they have a 6540 array).

Does expanding the pool affect performance, and if so, what is the
best way to recover

(other than rebuilding the pool)?

Thanks,
-Nobel




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reliability at power failure?

2009-03-24 Thread Richard Elling

Uwe Dippel wrote:

C. wrote:


I've worked hard to resolve this problem; googling "opensolaris rescue"
will show I've hit it a few times.  Anyway, the short version is that
it's not ZFS at all, but poor handling of the boot archive.  If you've
installed something like a 3rd-party driver (OSS/VirtualBox) you'll
likely hit this bug.


You might have hit the nail on the head. My two candidates could be
either Nvidia or VirtualBox.
Still, shouldn't the boot archive be maintained by an independent process
that creates a proper backup before any modification, protecting it from
any careless handling?
Shouldn't a recycling reboot be noted, if only by a flag (assuming we
have r/w access to a drive), including a redirection of the messages into
a file? (Okay, that's off-topic on this list.)
Shouldn't ZFS keep track of a proper rollback point to offer to boot
to in case of failing/recycling boots? Maybe something like 'last
successful boot'?


All good points, but not appropriate for this list. Please
redirect to boot-interest.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpools on USB zpool.cache

2009-03-24 Thread Robert Milkowski
Hello Mattias,

Monday, March 23, 2009, 9:08:53 PM, you wrote:


MP It would be nice to be able to move disks around when a system is
MP powered off and not have to worry about a cache when I boot.

You don't have to, unless you are talking about shared disks and
importing a pool on another system while the original is powered off
and the pool was not exported...

For a configuration where disks are not shared among different systems,
you can move disks around without worrying about zpool.cache.
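
For the shared-disk case the usual sketch is, with a placeholder pool
name, to export on the original host and import on the other:

# zpool export tank
# zpool import tank

(or zpool import -f if the pool was never cleanly exported).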


-- 
Best regards,
 Robert Milkowski
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500 Thumper, config for boot disks?

2009-03-24 Thread Robert Milkowski
Hello Richard,

Friday, March 20, 2009, 12:23:40 AM, you wrote:


RE It depends on your BIOS.  AFAIK, there is no way for the BIOS to
RE tell the installer which disks are valid boot disks.  For OBP (SPARC)
RE systems, you can have the installer know which disks are available
RE for booting.

IIRC biosdev can actually extract such information.
Caiman marks such disks as bootable and presents them separately
(there's an RFE to better present them in cases like the x4500, where the
boot disks are hard to find among so many disk drives).


-- 
Best regards
 Robert Milkowski http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Dennis Clarke

I'm happy to see that someone else brought up this topic. I had a nasty
long power failure last night that drained the APC/UPS batteries dry.[1]
:-(

I changed the subject line somewhat because I feel that the issue is one
of honesty as opposed to reliability.

I *feel* that ZFS is reliable out past six nines (rho = 0.999999), flawless,
for two reasons: I have never seen it fail me and I have pounded it with
some fairly offensive abuse under terrible conditions[2], and secondly
because everyone in the computer industry is trying to
steal^H^H^H^H^Himplement it into their OS of choice. There must be a
reason for that.

However, I have repeatedly run into problems when I need to boot after a
power failure. I see vdevs being marked as FAULTED regardless of whether
there are actually any hard errors reported by the on-disk SMART firmware.
I am able to remove these FAULTed devices temporarily, re-insert the same
disk again, and then run fine for months. Until the next long power
failure.

This is where honesty becomes a question, because I have to question the
severity of the FAULT when I know from past experience that the disk(s) in
question can be removed and then re-inserted and life is fine for months.
Were hard disk manufacturers involved in this error message logic? :-P

A power failure, a really nice long one, happened last night, and again
when I booted up I saw nasty error messages.

Here is *precisely* what I saw last night :

{3} ok boot -s
Resetting ...


Sun Fire 480R, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.34, 16384 MB memory installed, Serial #53264354.
Ethernet address 0:3:ba:2c:bf:e2, Host ID: 832cbfe2.

Rebooting with command: boot -s
Boot device: /p...@9,60/SUNW,q...@2/f...@0,0/d...@w2104cfb6f0ff,0:a 
File and args: -s
SunOS Release 5.10 Version Generic_13-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Booting to milestone milestone/single-user:default.
Hostname: jupiter
Requesting System Maintenance Mode
SINGLE USER MODE

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode
Mar 24 01:28:04 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.10  Generic January 2005
#

 /***/
 /* the very first thing I check is zpool fibre0*/
 /***/

# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
fibre0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t16d0  ONLINE   0 0 0
c5t0d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c5t1d0   ONLINE   0 0 0
c2t17d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c5t2d0   ONLINE   0 0 0
c2t18d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t20d0  ONLINE   0 0 0
c5t4d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t21d0  ONLINE   0 0 0
c5t6d0   ONLINE   0 0 0
spares
  c2t22d0AVAIL

errors: No known data errors

 *
 * everything looks fine, okay, thank you to ZFS *
 * ... and then I try to boot to full init 3  
   *
 *

# exit
svc.startd: Returning to milestone all.
Reading ZFS config: done.
Mounting ZFS filesystems: (1/51)

jupiter console l(51/51)
root
Password:
Last login: Sat Mar  7 19:39:00 on console
Sun Microsystems Inc.   SunOS 5.10  Generic January 2005
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
fibre0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t16d0  ONLINE   0 0 0
c5t0d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c5t1d0   ONLINE   0 0 0
c2t17d0  ONLINE   0

Re: [zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Bob Friesenhahn

On Tue, 24 Mar 2009, Dennis Clarke wrote:


However, I have repeatedly run into problems when I need to boot after a
power failure. I see vdevs being marked as FAULTED regardless of whether
there are actually any hard errors reported by the on-disk SMART firmware.
I am able to remove these FAULTed devices temporarily, re-insert the same
disk again, and then run fine for months. Until the next long power
failure.


In spite of huge detail, you failed to describe to us the technology 
used to communicate with these disks.  The interface adaptors, 
switches, and wiring topology could make a difference.



Is there *really* a severe fault in that disk ?

# luxadm -v display 2118625d599d


This sounds like some sort of fiber channel.


Transport protocol: IEEE 1394 (SBP-2)


Interesting that it mentions the protocol used by FireWire.

If you are using fiber channel, the device names in the pool 
specification suggest that Solaris multipathing is not being used (I 
would expect something long like 
c4t600A0B800039C9B50A9C47B4522Dd0).  If multipathing is not used, 
then you either have simplex connectivity, or two competing simplex 
paths to each device.  Multipathing is recommended if you have 
redundant paths available.
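
If redundant paths are present, a sketch of enabling Solaris MPxIO
(assuming the HBAs are supported by the multipathing framework; this
rewrites device paths and needs a reboot, so treat it as a
maintenance-window change):

# stmsboot -e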


If the disk itself is not aware of its severe faults then that 
suggests that there is a transient problem with communicating with the 
disk.  The problem could be in a device driver, adaptor card, FC 
switch, or cable.  If the disk drive also lost power, perhaps the disk 
is unusually slow at spinning up.
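
A quick way to see whether the transport is complaining is to look at the
per-device error counters:

# iostat -En

which reports soft, hard, and transport errors for each disk and often
points at a flaky path rather than a bad drive.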


It is easy to blame ZFS for problems.  On my system I was experiencing 
system crashes overnight while running 'zfs scrub' via cron job.  The 
fiber channel card was locking up.  Eventually I learned that it was 
due to a bug in VirtualBox's device driver.  If VirtualBox was not 
left running overnight, then the system would not crash.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Dennis Clarke

 On Tue, 24 Mar 2009, Dennis Clarke wrote:

 However, I have repeatedly run into problems when I need to boot after a
 power failure. I see vdevs being marked as FAULTED regardless of whether
 there are actually any hard errors reported by the on-disk SMART firmware.
 I am able to remove these FAULTed devices temporarily, re-insert the same
 disk again, and then run fine for months. Until the next long power
 failure.

 In spite of huge detail, you failed to describe to us the technology
 used to communicate with these disks.  The interface adaptors,
 switches, and wiring topology could make a difference.

Nothing fancy. Dual QLogic ( Sun ) fibre cards directly connected to the
back of A5200's. Simple really.

 Is there *really* a severe fault in that disk ?

 # luxadm -v display 2118625d599d

 This sounds like some sort of fiber channel.

 Transport protocol: IEEE 1394 (SBP-2)

 Interesting that it mentions the protocol used by FireWire.

I have no idea where that is coming from.

 If you are using fiber channel, the device names in the pool
 specification suggest that Solaris multipathing is not being used (I
 would expect something long like
 c4t600A0B800039C9B50A9C47B4522Dd0).  If multipathing is not used,
 then you either have simplex connectivity, or two competing simplex
 paths to each device.  Multipathing is recommended if you have
 redundant paths available.

Yes, I have another machine that has mpxio in place. However a power
failure also trips phantom faults.

 If the disk itself is not aware of its severe faults then that
 suggests that there is a transient problem with communicating with the
 disk.

You would think so eh?
But a transient problem that only occurs after a power failure?

 The problem could be in a device driver, adaptor card, FC
 switch, or cable.  If the disk drive also lost power, perhaps the disk
 is unusually slow at spinning up.

All disks were up at boot, you can see that when I ask for a zpool status
at boot time in single user mode. No errors and no faults.

The issue seems to be when fmadm starts up, or perhaps some other service
that can throw a fault. I'm not sure.

 It is easy to blame ZFS for problems.

It is easy to blame a power failure for problems, as well as a nice shiny
new APC Smart-UPS XL 3000VA RM 3U unit with external extended-run-time
battery that doesn't signal a power failure.

I never blame ZFS for anything.

 On my system I was experiencing
 system crashes overnight while running 'zfs scrub' via cron job.  The
 fiber channel card was locking up.  Eventually I learned that it was
 due to a bug in VirtualBox's device driver.  If VirtualBox was not
 left running overnight, then the system would not crash.

VirtualBox ?

This is a Solaris 10 machine. Nothing fancy. Okay, sorry, nothing way out
in the field fancy like VirtualBox.

Dennis


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] boot-interest WAS: Reliability at power failure?

2009-03-24 Thread Richard Elling

Jerry K wrote:

Where is the boot-interest mailing list??

A review of mailing list here:

http://mail.opensolaris.org/mailman/listinfo/

does not show a boot-interest mailing list, or anything similar.  Is 
it on a different site? 


My apologies, boot-interest is/was a Sun-internal list.  Try on-discuss.
http://www.opensolaris.org/os/community/on/discussions/

-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Bob Friesenhahn

On Tue, 24 Mar 2009, Dennis Clarke wrote:


You would think so eh?
But a transient problem that only occurs after a power failure?


Transient problems are most common after a power failure or during 
initialization.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Dennis Clarke

 On Tue, 24 Mar 2009, Dennis Clarke wrote:

 You would think so eh?
 But a transient problem that only occurs after a power failure?

 Transient problems are most common after a power failure or during
 initialization.

Well, the issue here is that power was on for ten minutes before I tried
to do a boot from the ok prompt.

Regardless, the point is that the ZPool shows no faults at boot time and
then shows phantom faults *after* I go to init 3.

That does seem odd.

Dennis


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Richard Elling

Dennis Clarke wrote:

On Tue, 24 Mar 2009, Dennis Clarke wrote:


However, I have repeatedly run into problems when I need to boot after a
power failure. I see vdevs being marked as FAULTED regardless of whether
there are actually any hard errors reported by the on-disk SMART firmware.
I am able to remove these FAULTed devices temporarily, re-insert the same
disk again, and then run fine for months. Until the next long power
failure.
  

In spite of huge detail, you failed to describe to us the technology
used to communicate with these disks.  The interface adaptors,
switches, and wiring topology could make a difference.



Nothing fancy. Dual QLogic ( Sun ) fibre cards directly connected to the
back of A5200's. Simple really.
  


Run away!  Run away!
Save yourself a ton of grief and replace the A5200.


Is there *really* a severe fault in that disk ?

# luxadm -v display 2118625d599d
  

This sounds like some sort of fiber channel.



Transport protocol: IEEE 1394 (SBP-2)
  

Interesting that it mentions the protocol used by FireWire.



I have no idea where that is coming from.

  

If you are using fiber channel, the device names in the pool
specification suggest that Solaris multipathing is not being used (I
would expect something long like
c4t600A0B800039C9B50A9C47B4522Dd0).  If multipathing is not used,
then you either have simplex connectivity, or two competing simplex
paths to each device.  Multipathing is recommended if you have
redundant paths available.



Yes, I have another machine that has mpxio in place. However a power
failure also trips phantom faults.

  

If the disk itself is not aware of its severe faults then that
suggests that there is a transient problem with communicating with the
disk.



You would think so eh?
But a transient problem that only occurs after a power failure?

  

The problem could be in a device driver, adaptor card, FC
switch, or cable.  If the disk drive also lost power, perhaps the disk
is unusually slow at spinning up.



All disks were up at boot, you can see that when I ask for a zpool status
at boot time in single user mode. No errors and no faults.

The issue seems to be when fmadm starts up, or perhaps some other service
that can throw a fault. I'm not sure.
  


The following will help you diagnose where the error messages
are generated from.  I doubt it is a problem with the disk, per se, but
you will want to double-check your disk firmware to make sure it is
up to date (I've got scars):

   fmadm faulty
   fmdump -eV

-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpools on USB zpool.cache

2009-03-24 Thread Mattias Pantzare
 MP It would be nice to be able to move disks around when a system is
 MP powered off and not have to worry about a cache when I boot.

 You don't have to, unless you are talking about shared disks and
 importing a pool on another system while the original is powered off
 and the pool was not exported...

 For a configuration where disks are not shared among different systems,
 you can move disks around without worrying about zpool.cache.

So, what you are saying is that I can power off my computer, move my
zfs disks to a different controller, and then power on my computer and
the zfs file systems will show up?

zpool export is not always practical, especially on a root pool.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpools on USB zpool.cache zpool import

2009-03-24 Thread Blake
+1

On Mon, Mar 23, 2009 at 8:43 PM, Damon Atkins damon.atk...@yahoo.com.au wrote:

 PS: it would be nice to have a 'zpool diskinfo <devicepath>' that reports
 whether the device belongs to a zpool (imported or not), plus all the
 details about any zpool it can find on the disk, e.g. its file systems
 (the man page says zdb is only for ZFS engineers). 'zpool import' needs an
 option to list the file systems of a not-yet-imported pool and its
 properties, so you can have more information about it before importing it.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Nathan Kroenert

Hey, Dennis -

I can't help but wonder if the failure is a result of zfs itself finding 
some problems post restart...


Is there anything in your FMA logs?

  fmstat

for a summary and

  fmdump

for a summary of the related errors

eg:
drteeth:/tmp # fmdump
TIME UUID SUNW-MSG-ID
Nov 03 13:57:29.4190 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 ZFS-8000-D3
Nov 03 13:57:29.9921 916ce3e2-0c5c-e335-d317-ba1e8a93742e ZFS-8000-D3
Nov 03 14:04:58.8973 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d ZFS-8000-CS
Mar 05 18:04:40.7116 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-4M 
Repaired
Mar 05 18:04:40.7875 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-6U 
Resolved
Mar 05 18:04:41.0052 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-4M 
Repaired
Mar 05 18:04:41.0760 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-6U 
Resolved


then for example,

  fmdump -vu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7

and

  fmdump -Vvu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7

will show more and more information about the error. Note that some of 
it might seem like rubbish. The important bits should be obvious though 
- things like the SUNW message ID (like ZFS-8000-D3), which can be
pumped into


  sun.com/msg

to see what exactly it's going on about.

Note also that there should be something interesting in the
/var/adm/messages log to match any 'faulted' devices.


You might also find an

  fmdump -e

and

  fmdump -eV

to be interesting - This is the *error* log as opposed to the *fault* 
log. (Every 'thing that goes wrong' is an error, only those that are 
diagnosed are considered a fault.)


Note that in all of these fm[dump|stat] commands, you are really only 
looking at the two sets of data. The errors - that is the telemetry 
incoming to FMA - and the faults. If you include a -e, you view the 
errors, otherwise, you are looking at the faults.


By the way - sun.com/msg has a great PDF on it about the predictive self 
healing technologies in Solaris 10 and will offer more interesting 
information.


Would be interesting to see *why* ZFS / FMA is feeling the need to fault 
your devices.


I was interested to see on one of my boxes that I have actually had a 
*lot* of errors, which I'm now going to have to investigate... Looks 
like I have a dud rocket in my system... :)


Oh - And I saw this:

Nov 03 14:04:31.2783 ereport.fs.zfs.checksum

Score one more for ZFS! This box has a measly 300 GB mirrored, and I have
already seen dud data. (heh... It's also got non-ECC memory... ;)


Cheers!

Nathan.


Dennis Clarke wrote:

On Tue, 24 Mar 2009, Dennis Clarke wrote:

You would think so eh?
But a transient problem that only occurs after a power failure?

Transient problems are most common after a power failure or during
initialization.


Well, the issue here is that power was on for ten minutes before I tried
to do a boot from the ok prompt.

Regardless, the point is that the ZPool shows no faults at boot time and
then shows phantom faults *after* I go to init 3.

That does seem odd.

Dennis


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
//
// Nathan Kroenert  nathan.kroen...@sun.com //
// Systems Engineer Phone:  +61 3 9869-6255 //
// Sun Microsystems Fax:+61 3 9869-6288 //
// Level 7, 476 St. Kilda Road  Mobile: 0419 305 456//
// Melbourne 3004   VictoriaAustralia   //
//
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Honesty after a power failure

2009-03-24 Thread Dennis Clarke

 Hey, Dennis -

 I can't help but wonder if the failure is a result of zfs itself finding
 some problems post restart...

Yes, yes, this is what I am feeling also, but I need to find the data
and then I can sleep at night.  I am certain that ZFS does not just toss
out faults on a whim; there must be a deterministic, logical and
code-based reason for those faults that occur *after* I go to init 3.

 Is there anything in your FMA logs?

Oh God yes,  brace yourself :-)

http://www.blastwave.org/dclarke/zfs/fmstat.txt

[ I edit the whitespace here for clarity ]
# fmstat
module  ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis   0   0  0.0  2.7   0   0   3 0   4.2K   1.1K
cpumem-retire  0   0  0.0  0.2   0   0   0 0  0  0
disk-transport 0   0  0.0 45.7   0   0   0 040b  0
eft0   0  0.0  0.7   0   0   0 0   1.2M  0
fabric-xlate   0   0  0.0  0.7   0   0   0 0  0  0
fmd-self-diagnosis 3   0  0.0  0.2   0   0   0 0  0  0
io-retire  0   0  0.0  0.2   0   0   0 0  0  0
snmp-trapgen   2   0  0.0  1.7   0   0   0 032b  0
sysevent-transport 0   0  0.0 75.4   0   0   0 0  0  0
syslog-msgs2   0  0.0  1.4   0   0   0 0  0  0
zfs-diagnosis296 252  2.0 236719.7  98   0   1 2   176b   144b
zfs-retire 4   0  0.0 27.4   0   0   0 0  0  0

 zfs-diagnosis svc_t=236719.7 ?

 for a summary and

fmdump

 for a summary of the related errors

http://www.blastwave.org/dclarke/zfs/fmdump.txt

# fmdump
TIME UUID SUNW-MSG-ID
Dec 05 21:31:46.1069 aa3bfcfa-3261-cde4-d381-dae8abf296de ZFS-8000-D3
Mar 07 08:46:43.6238 4c8b199b-add1-c3fe-c8d6-9deeff91d9de ZFS-8000-FD
Mar 07 19:37:27.9819 b4824ce2-8f42-4392-c7bc-ab2e9d14b3b7 ZFS-8000-FD
Mar 07 19:37:29.8712 af726218-f1dc-6447-f581-cc6bb1411aa4 ZFS-8000-FD
Mar 07 19:37:30.2302 58c9e01f-8a80-61b0-ffea-ded63a9b076d ZFS-8000-FD
Mar 07 19:37:31.6410 3b0bfd9d-fc39-e7c2-c8bd-879cad9e5149 ZFS-8000-FD
Mar 10 19:37:08.8289 aa3bfcfa-3261-cde4-d381-dae8abf296de FMD-8000-4M
Repaired
Mar 23 23:47:36.9701 2b1aa4ae-60e4-c8ef-8eec-d92a18193e7a ZFS-8000-FD
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD

# fmdump -vu 3780a2dd-7381-c053-e186-8112b463c2b7
TIME UUID SUNW-MSG-ID
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io

Problem in: zfs://pool=fibre0/vdev=444604062b426970
   Affects: zfs://pool=fibre0/vdev=444604062b426970
   FRU: -
  Location: -

# fmdump -vu 146dad1d-f195-c2d6-c630-c1adcd58b288
TIME UUID SUNW-MSG-ID
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io

Problem in: zfs://pool=fibre0/vdev=23e4d7426f941f52
   Affects: zfs://pool=fibre0/vdev=23e4d7426f941f52
   FRU: -
  Location: -

 will show more and more information about the error. Note that some of
 it might seem like rubbish. The important bits should be obvious though
 - things like the SUNW message ID (like ZFS-8000-D3), which can be
 pumped into

sun.com/msg

like so :

http://www.sun.com/msg/ZFS-8000-FD

or see http://www.blastwave.org/dclarke/zfs/ZFS-8000-FD.txt

Article for Message ID:   ZFS-8000-FD

  Too many I/O errors on ZFS device

  Type

 Fault

  Severity

 Major

  Description

 The number of I/O errors associated with a ZFS device exceeded
 acceptable levels.

  Automated Response

 The device has been offlined and marked as faulted.
 An attempt will be made to activate a hot spare if available.

  Impact

 The fault tolerance of the pool may be affected.


Yep, I agree, that is what I saw.

 Note also that there should be something interesting in the
 /var/adm/messages log to match any 'faulted' devices.

 You might also find an

fmdump -e

spooky long list of events :

TIME CLASS
Mar 23 23:47:28.5586 ereport.fs.zfs.io
Mar 23 23:47:28.5594 ereport.fs.zfs.io
Mar 23 23:47:28.5588 ereport.fs.zfs.io
Mar 23 23:47:28.5592 ereport.fs.zfs.io
Mar 23 23:47:28.5593 ereport.fs.zfs.io
.
.
.
Mar 23 23:47:28.5622 ereport.fs.zfs.io
Mar 23 23:47:28.5560 ereport.fs.zfs.io
Mar 23 23:47:28.5658 ereport.fs.zfs.io
Mar 23 23:48:41.5957 ereport.fs.zfs.io


   http://www.blastwave.org/dclarke/zfs/fmdump_e.txt

ouch, that is a nasty long list all in a few seconds.

 and

fmdump -eV

a very detailed verbose long list with such entries as

Mar 23 2009 23:48:41.595757900 ereport.fs.zfs.io
nvlist version: 0
class = ereport.fs.zfs.io
ena =