Re: [zfs-discuss] Very sick iSCSI pool

2012-07-02 Thread Fajar A. Nugraha
On Tue, Jul 3, 2012 at 11:08 AM, Ian Collins  wrote:
 I'm assuming the pool is hosed?
>>>
>>> Before making that assumption, I'd try something simple first:
>>> - reading from the imported iSCSI disk (e.g. with dd) to make sure
>>> it's not an iSCSI-related problem
>>> - importing the disk on another host and trying to read it again, to
>>> make sure it's not a client-specific problem
>>> - possibly restarting the iSCSI server, just to be sure
>>
>> Booting the initiator host from a live DVD image and attempting to
>> import the pool gives the same error report.
>
>
> The pool's data appears to be recoverable when I import it read only.

That's good

>
> The storage appliance is so full they can't delete files from it!

Hahaha :D

>  Now that
> shouldn't have caused problems with a fixed-size volume, but who knows?

AFAIK you'll always need some free space, e.g. to replay or roll back
transactions during pool import.

The best approach is, of course, to fix the appliance. Sometimes something
as simple as deleting a snapshot or dataset will do the trick.
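
A rough sketch of that, with the pool, dataset and snapshot names as
placeholders rather than anything from this thread:

    # list snapshots by the space they hold, largest last
    zfs list -t snapshot -o name,used -s used -r <pool>
    # destroying an unneeded snapshot releases the space unique to it
    zfs destroy <pool>/<dataset>@<snapshot>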

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-07-02 Thread Ian Collins

On 07/ 1/12 08:57 PM, Ian Collins wrote:

On 07/ 1/12 10:20 AM, Fajar A. Nugraha wrote:

On Sun, Jul 1, 2012 at 4:18 AM, Ian Collins   wrote:

On 06/30/12 03:01 AM, Richard Elling wrote:

Hi Ian,
Chapter 7 of the DTrace book has some examples of how to look at iSCSI
target
and initiator behaviour.

Thanks Richard, I'll have a look.

I'm assuming the pool is hosed?

Before making that assumption, I'd try something simple first:
- reading from the imported iSCSI disk (e.g. with dd) to make sure
it's not an iSCSI-related problem
- importing the disk on another host and trying to read it again, to
make sure it's not a client-specific problem
- possibly restarting the iSCSI server, just to be sure

Booting the initiator host from a live DVD image and attempting to
import the pool gives the same error report.


The pool's data appears to be recoverable when I import it read only.
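
A minimal sketch of that read-only import (pool name from this thread; the
readonly property is the usual Solaris 11 spelling):

    # import the pool without writing anything to it
    zpool import -o readonly=on fileserver
    # confirm the datasets are visible so the data can be copied off
    zfs list -r fileserver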

The storage appliance is so full they can't delete files from it!  Now
that shouldn't have caused problems with a fixed-size volume, but who
knows?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-07-01 Thread Ian Collins

On 07/ 1/12 10:20 AM, Fajar A. Nugraha wrote:

On Sun, Jul 1, 2012 at 4:18 AM, Ian Collins  wrote:

On 06/30/12 03:01 AM, Richard Elling wrote:

Hi Ian,
Chapter 7 of the DTrace book has some examples of how to look at iSCSI
target
and initiator behaviour.


Thanks Richard, I'll have a look.

I'm assuming the pool is hosed?

Before making that assumption, I'd try something simple first:
- reading from the imported iSCSI disk (e.g. with dd) to make sure
it's not an iSCSI-related problem
- importing the disk on another host and trying to read it again, to
make sure it's not a client-specific problem
- possibly restarting the iSCSI server, just to be sure


Booting the initiator host from a live DVD image and attempting to 
import the pool gives the same error report.

I suspect the problem is with your Oracle storage appliance. But since
you say there are no errors there, the simple tests above should show
whether it's a client, disk, or ZFS problem.


So did I.

I'll get the admin for that system to dig a little deeper and export a 
new volume to see if I can create a new pool.
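
Something like this would be the minimal test once the new volume shows up
on the initiator (the device name is illustrative, not a real GUID):

    # create a throwaway pool on the freshly exported LUN
    zpool create testpool c0t<new-volume-GUID>d0
    # push some data through it and check the pool stays healthy
    dd if=/dev/zero of=/testpool/testfile bs=1M count=1024
    zpool status testpool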


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-06-30 Thread Fajar A. Nugraha
On Sun, Jul 1, 2012 at 4:18 AM, Ian Collins  wrote:
> On 06/30/12 03:01 AM, Richard Elling wrote:
>>
>> Hi Ian,
>> Chapter 7 of the DTrace book has some examples of how to look at iSCSI
>> target
>> and initiator behaviour.
>
>
> Thanks Richard, I'll have a look.
>
> I'm assuming the pool is hosed?

Before making that assumption, I'd try something simple first:
- reading from the imported iSCSI disk (e.g. with dd, as sketched below) to
make sure it's not an iSCSI-related problem
- importing the disk on another host and trying to read it again, to
make sure it's not a client-specific problem
- possibly restarting the iSCSI server, just to be sure
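
A rough sketch of that dd read test, plus a session check (the device name is
taken from the zpool status output in the original post; the s0 slice and the
iscsiadm check are assumptions about this particular setup):

    # read a chunk of the LUN directly, bypassing ZFS
    dd if=/dev/rdsk/c0t600144F096C94AC74ECD96F20001d0s0 of=/dev/null bs=1M count=1024
    # show the initiator's view of the target and its connection state
    iscsiadm list target -v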

I suspect the problem is with your Oracle storage appliance. But since
you say there are no errors there, the simple tests above should show
whether it's a client, disk, or ZFS problem.

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-06-30 Thread Ian Collins

On 06/30/12 03:01 AM, Richard Elling wrote:

Hi Ian,
Chapter 7 of the DTrace book has some examples of how to look at iSCSI 
target

and initiator behaviour.


Thanks Richard, I'll have a look.

I'm assuming the pool is hosed?


 -- richard

On Jun 28, 2012, at 10:47 PM, Ian Collins wrote:

I'm trying to work out the cause of, and a remedy for, a very sick iSCSI
pool on a Solaris 11 host.


The volume is exported from an Oracle storage appliance and there are 
no errors reported there.  The host has no entries in its logs 
relating to the network connections.


Any zfs or zpool commands that change the state of the pool (such as
zfs mount or zpool export) hang and can't be killed.


fmadm faulty reports:

Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d  ZFS-8000-FD    Major


Host: taitaklsc01
Platform: SUN-FIRE-X4170-M2-SERVER  Chassis_id  : 1142FMM02N
Product_sn  : 1142FMM02N

Fault class : fault.fs.zfs.vdev.io
Affects : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
 faulted but still in service
Problem in  : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
 faulted but still in service

Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

The zpool status paints a very gloomy picture:

 pool: fileserver
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
   continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Fri Jun 29 11:59:59 2012
   858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time)
   567K resilvered, 0.00% done
config:

   NAME                                   STATE     READ WRITE CKSUM
   fileserver                             ONLINE       0 1.16M     0
     c0t600144F096C94AC74ECD96F20001d0    ONLINE       0 1.16M     0  (resilvering)


errors: 1557164 data errors, use '-v' for a list

Any ideas how to determine the cause of the problem and remedy it?

--
Ian.



--
ZFS Performance and Training
richard.ell...@richardelling.com 
+1-760-896-4422

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-06-29 Thread Richard Elling
Hi Ian,
Chapter 7 of the DTrace book has some examples of how to look at iSCSI target
and initiator behaviour.
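
A couple of rough one-liners in that spirit (the iscsi* provider wildcard is
an assumption; which providers and probes exist depends on the Solaris
release and on whether you run them on the target or the initiator):

    # list the iSCSI-related DTrace probes available on this host
    dtrace -ln 'iscsi*:::'
    # count probe firings to get a feel for target/initiator activity
    dtrace -n 'iscsi*::: { @[probename] = count(); }'
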
 -- richard

On Jun 28, 2012, at 10:47 PM, Ian Collins wrote:

> I'm trying to work out the cause of, and a remedy for, a very sick iSCSI pool
> on a Solaris 11 host.
> 
> The volume is exported from an Oracle storage appliance and there are no 
> errors reported there.  The host has no entries in its logs relating to the 
> network connections.
> 
> Any zfs or zpool commands that change the state of the pool (such as zfs mount
> or zpool export) hang and can't be killed.
> 
> fmadm faulty reports:
> 
> Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d  ZFS-8000-FD    Major
> 
> Host: taitaklsc01
> Platform: SUN-FIRE-X4170-M2-SERVER  Chassis_id  : 1142FMM02N
> Product_sn  : 1142FMM02N
> 
> Fault class : fault.fs.zfs.vdev.io
> Affects : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
>  faulted but still in service
> Problem in  : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
>  faulted but still in service
> 
> Description : The number of I/O errors associated with a ZFS device exceeded
>               acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
>               for more information.
> 
> The zpool status paints a very gloomy picture:
> 
>  pool: fileserver
> state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scan: resilver in progress since Fri Jun 29 11:59:59 2012
>858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time)
>567K resilvered, 0.00% done
> config:
> 
>NAME STATE READ WRITE CKSUM
>fileserver   ONLINE   0 1.16M 0
>  c0t600144F096C94AC74ECD96F20001d0  ONLINE   0 1.16M 0  
> (resilvering)
> 
> errors: 1557164 data errors, use '-v' for a list
> 
> Any ideas how to determine the cause of the problem and remedy it?
> 
> -- 
> Ian.
> 

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Very sick iSCSI pool

2012-06-28 Thread Ian Collins
I'm trying to work out the cause of, and a remedy for, a very sick iSCSI pool
on a Solaris 11 host.


The volume is exported from an Oracle storage appliance and there are no 
errors reported there.  The host has no entries in its logs relating to 
the network connections.


Any zfs or zpool commands that change the state of the pool (such as zfs
mount or zpool export) hang and can't be killed.


fmadm faulty reports:

Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d  ZFS-8000-FD    Major

Host: taitaklsc01
Platform: SUN-FIRE-X4170-M2-SERVER  Chassis_id  : 1142FMM02N
Product_sn  : 1142FMM02N

Fault class : fault.fs.zfs.vdev.io
Affects : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
  faulted but still in service
Problem in  : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
  faulted but still in service

Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

The zpool status paints a very gloomy picture:

  pool: fileserver
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jun 29 11:59:59 2012
858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time)
567K resilvered, 0.00% done
config:

        NAME                                   STATE     READ WRITE CKSUM
        fileserver                             ONLINE       0 1.16M     0
          c0t600144F096C94AC74ECD96F20001d0    ONLINE       0 1.16M     0  (resilvering)


errors: 1557164 data errors, use '-v' for a list

Any ideas how to determine the cause of the problem and remedy it?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss