Re: [zfs-discuss] can anyone help me?

2008-08-26 Thread Chris Murray
Hi all,
I can confirm that this is fixed too. I ran into the exact same issue yesterday 
after destroying a clone:
http://www.opensolaris.org/jive/thread.jspa?threadID=70459&tstart=0

I used the b95-based 2008.11 development live CD this morning, and the pool is 
now back up and running after a quick import and export.

Chris
 
 


Re: [zfs-discuss] can anyone help me?

2008-07-24 Thread Ross
Great news, thanks for the update :)
 
 


Re: [zfs-discuss] can anyone help me?

2008-07-24 Thread Aaron Botsis
Never mind -- this problem seems to have been fixed in b94. I saw a bug whose 
description looked like it fit (slow clone removal; I didn't write down the bug 
number) and gave it a shot. The pool imported, and things seem to be back up and 
running.
 
 


Re: [zfs-discuss] can anyone help me?

2008-07-24 Thread Victor Latushkin
Aaron Botsis writes:
 Hello, I've hit this same problem. 
 
 Hernan/Victor, I sent you an email asking for the description of this 
 solution. I've also got important data on my array. I went to b93 hoping 
 there'd be a patch for this.
 
 I caused the problem in a manner identical to Hernan: by removing a zvol 
 clone. Exact same symptoms: userspace seems to go away, the network stack is 
 still up, there is no disk activity, and the system never recovers. 
 
 If anyone has the solution to this, PLEASE help me out. Thanks a million in 
 advance.

Though it is a bit late, I think it may still be useful to describe a 
way out of this (prior to the fix for 6573681).

When a dataset is destroyed, it is first marked inconsistent. If the 
destroy cannot complete for whatever reason, then upon dataset open ZFS 
discovers that it is marked inconsistent and tries to destroy it again 
by calling the appropriate ioctl(). If the destroy succeeds, ZFS pretends 
that the dataset never existed; if it fails, ZFS tries to roll the dataset 
back to its previous state - see lines 410-450 here:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libzfs/common/libzfs_dataset.c
 


But since the ioctl() was unable to complete, there was no easy way out. 
The idea was simple - avoid attempting the destroy again, and proceed 
straight to the rollback part. Since the dataset was a clone, it was 
definitely possible to roll it back. So I simply added a test for an 
environment variable to the 'if' statement on line 441, and that allowed 
the pool to be imported.
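
For anyone who finds this in the archives, using such a patched libzfs would 
look roughly like this. This is only a sketch - the environment variable name, 
library path, and pool name below are placeholders, not the actual names we 
used:

  # Run as root: point the runtime linker at the patched libzfs, set the
  # trigger variable, and import; libzfs then skips the destroy retry and
  # rolls the half-destroyed clone back instead.
  $ LD_LIBRARY_PATH=/var/tmp/patched-libzfs ZFS_SKIP_DESTROY=1 zpool import mypool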

my 2 cents,

Victor


 
 Aaron
 
 Well, finally managed to solve my issue, thanks to
 the invaluable help of Victor Latushkin, who I can't
 thank enough.

 I'll post a more detailed step-by-step record of what
 he and I did (well, all credit to him actually) to
 solve this. Actually, the problem is still there
 (destroying a huge zvol or clone is slow and takes a
 LOT of memory, and will die when it runs out of
 memory), but now I'm able to import my zpool and all
 is there.

 What Victor did was hack ZFS (libzfs) to force a
 rollback to abort the endless destroy, which was
 re-triggered every time the zpool was imported, as it
 was inconsistent. With this custom version of libzfs,
 setting an environment variable makes libzfs
 bypass the destroy and jump to rollback, undoing
 the last destroy command.

 I'll be posting the long version of the story soon.

 Hernán
  
  



Re: [zfs-discuss] can anyone help me?

2008-07-23 Thread Aaron Botsis
Hello, I've hit this same problem. 

Hernan/Victor, I sent you an email asking for the description of this solution. 
I've also got important data on my array. I went to b93 hoping there'd be a 
patch for this.

I caused the problem in a manner identical to Hernan: by removing a zvol clone. 
Exact same symptoms: userspace seems to go away, the network stack is still up, 
there is no disk activity, and the system never recovers. 

If anyone has the solution to this, PLEASE help me out. Thanks a million in 
advance.

Aaron

 Well, finally managed to solve my issue, thanks to
 the invaluable help of Victor Latushkin, who I can't
 thank enough.
 
 I'll post a more detailed step-by-step record of what
 he and I did (well, all credit to him actually) to
 solve this. Actually, the problem is still there
 (destroying a huge zvol or clone is slow and takes a
 LOT of memory, and will die when it runs out of
 memory), but now I'm able to import my zpool and all
 is there.
 
 What Victor did was hack ZFS (libzfs) to force a
 rollback to abort the endless destroy, which was
 re-triggered every time the zpool was imported, as it
 was inconsistent. With this custom version of libzfs,
 setting an environment variable makes libzfs
 bypass the destroy and jump to rollback, undoing
 the last destroy command.
 
 I'll be posting the long version of the story soon.
 
 Hernán
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-03 Thread Hernan Freschi
No, it's a weird situation. I unplugged the disks from the controller (I have 
them labeled) before upgrading to snv89. After the upgrade, the controller names 
changed.
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-02 Thread Henrik Hjort
Hi Hernan,

After looking at your posts, my suggestion would be to
try the OpenSolaris 2008.05 Live CD and to import your
pool using the CD. That CD is nv86 plus some extra fixes.
You will have to use 'pfexec' when you are trying to import
the pool. ( www.opensolaris.com )
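
For example, once booted from the Live CD (the pool name below is just a 
placeholder):

  $ pfexec zpool import           # lists pools available for import
  $ pfexec zpool import mypool    # imports the named pool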

I read this and I am not sure how you did it:

 S10U4 and was recently upgraded to snv85

But an upgrade from Sol10 to NV is untested and not something I
would recommend at all. A fresh install of snvXY is what I
know works.

Cheers,
  Henrik

Hernan Freschi wrote:
 fwiw, here are my previous posts:
 
 http://www.opensolaris.org/jive/thread.jspa?threadID=61301&tstart=30
 http://www.opensolaris.org/jive/thread.jspa?threadID=62120&tstart=0
  
  



Re: [zfs-discuss] can anyone help me?

2008-06-02 Thread Hernan Freschi
Thanks for your answer, 
 after looking at your posts my suggestion would be to
 try the OpenSolaris 2008.05 Live CD and to import
 your pool using the CD. That CD is nv86 + some extra
 fixes.
I upgraded from snv85 to snv89 to see if it helped, but it didn't. I'll try to 
download the 2008.05 CD again (the ISO for it is one of the things trapped in 
the pool I can't import).
 
 But an upgrade from Sol10 to NV is untested and
 nothing I would recommend at all. A fresh install of snvXY is
 what I know works.

Didn't know that. I was simply following the N+2 rule, upgrading 10 to 11.
 
 


Re: [zfs-discuss] can anyone help me? [SOLVED]

2008-06-02 Thread Hernan Freschi
Well, finally managed to solve my issue, thanks to the invaluable help of 
Victor Latushkin, who I can't thank enough.

I'll post a more detailed step-by-step record of what he and I did (well, all 
credit to him actually) to solve this. Actually, the problem is still there 
(destroying a huge zvol or clone is slow and takes a LOT of memory, and will 
die when it runs out of memory), but now I'm able to import my zpool and all is 
there.

What Victor did was hack ZFS (libzfs) to force a rollback to abort the 
endless destroy, which was re-triggered every time the zpool was imported, as 
it was inconsistent. With this custom version of libzfs, setting an environment 
variable makes libzfs bypass the destroy and jump to rollback, undoing the 
last destroy command.

I'll be posting the long version of the story soon.

Hernán
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-02 Thread Marc Bevand
Hernan Freschi hjf at hjf.com.ar writes:
 
 Here's the output. Numbers may be a little off because I'm doing a nightly  
 build and compressing a crashdump with bzip2 at the same time.

Thanks. Your disks look healthy. But one question: why is it
c5t0/c5t1/c6t0/c6t1 when in another post you referred to the 4 disks
as c[1234]d0?

Did you change the hardware?

AFAIK ZFS doesn't always like it when device names change... There
have been problems/bugs exposed by this in the past.

-marc



Re: [zfs-discuss] can anyone help me?

2008-06-01 Thread Eric Snellman
I have very little technical knowledge on what the problem is.

Some random things to try:

Make a separate zpool and filesystem for the swap (see the sketch below).

Add more ram to the system.
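
If it helps, the swap suggestion could be done along these lines - a sketch 
only, with a made-up disk name, pool name, and size:

  $ zpool create swappool c7d0              # a separate pool just for swap
  $ zfs create -V 2G swappool/swapvol       # a zvol to serve as the swap device
  $ swap -a /dev/zvol/dsk/swappool/swapvol  # add it as a swap device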
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-01 Thread Orvar Korvar
This sounds like a pain.

Would it be possible for you to buy support from Sun on this matter, if it is 
really important to you?
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-01 Thread Marc Bevand
So you are experiencing slow I/O, which is making the deletion of this clone 
and the replay of the ZIL take forever. It could be because of random I/O ops, 
or because one of your disks is dying (not reporting any errors, but very slow 
to execute every single ATA command). You provided the output of 'zpool 
iostat' while an import was hanging; what about 'iostat -Mnx 3 20' (not to be 
confused with zpool iostat)? Please let the command complete; it will run for 
3*20 = 60 secs.

Also, to validate the slowly-dying-disk theory, reboot the box, do NOT import 
the pool, and run 4 of these commands (in parallel, in the background), one for 
each of c[1234]d0p0:
  $ dd bs=1024k of=/dev/null if=/dev/rdsk/cXd0p0
Then run 'iostat -Mnx 2 5'.
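
Put together, that could look like the following sketch (adjust the device 
names to whatever your system actually shows):

  # Start four sequential whole-disk reads in the background, then watch
  # the per-device service times while they run.
  $ for d in c1d0p0 c2d0p0 c3d0p0 c4d0p0; do
      dd bs=1024k of=/dev/null if=/dev/rdsk/$d &
    done
  $ iostat -Mnx 2 5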

Also, are you using non-default settings in /etc/system (other than 
zfs_arc_max)? Are you passing any particular kernel parameters via GRUB or 
via 'eeprom'?

On a side note, what is the version of your pool and the version of your 
filesystems? If you don't know, run 'zpool upgrade' and 'zfs upgrade' with no 
arguments.
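
Both commands simply print version information when run without arguments:

  $ zpool upgrade    # shows the pool format version and any upgradable pools
  $ zfs upgrade      # shows the filesystem version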

What is your SATA controller? I didn't see you run dmesg.

-marc




Re: [zfs-discuss] can anyone help me?

2008-06-01 Thread Hernan Freschi
I'll provide you with the results of these commands soon. But for the record, 
Solaris does hang (it dies out of memory, I can't type anything on the console, 
etc.). What I can do is boot with -k and get to kmdb when it's hung (BREAK over 
the serial line). I have a crash dump I can upload.

I checked the disks with the drive manufacturers' tests and found no errors.
The controller is an on-board NForce4 SATA. The zpool version is the latest (10). 
The non-default settings were removed; they were only there for testing. No other 
non-default eeprom settings (other than the serial console options, but those 
were added after the problem started).
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-01 Thread Hernan Freschi
Here's the output. Numbers may be a little off because I'm doing a nightly 
build and compressing a crashdump with bzip2 at the same time.

                    extended device statistics
   r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   3.7   19.4    0.1    0.3  3.3  0.0  142.7    1.6   1   3 c0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.1   12.6   0   0 c5t0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.1   13.0   0   0 c5t1d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.1   12.6   0   0 c6t0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.1   13.4   0   0 c6t1d0
                    extended device statistics
   r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  25.9   12.0    1.3    0.3  0.0  0.2    0.0    4.4   0  14 c0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
  75.2    0.0   75.2    0.0  0.0  1.0    0.1   12.7   0  96 c5t0d0
  68.2    0.0   68.2    0.0  0.0  0.9    0.1   13.1   0  89 c5t1d0
  71.7    0.0   71.7    0.0  0.0  0.9    0.1   13.1   0  94 c6t0d0
  62.8    0.0   62.8    0.0  0.0  0.9    0.1   14.0   0  88 c6t1d0
                    extended device statistics
   r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  24.0   16.0    0.6    0.3  0.0  0.0    0.1    0.8   0   3 c0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
  65.5    0.0   65.5    0.0  0.0  0.9    0.1   14.2   0  93 c5t0d0
  59.0    0.0   59.0    0.0  0.0  0.9    0.1   14.9   0  88 c5t1d0
  67.5    0.0   67.5    0.0  0.0  0.9    0.1   13.2   0  89 c6t0d0
  66.5    0.0   66.5    0.0  0.0  0.9    0.1   14.0   0  93 c6t1d0
                    extended device statistics
   r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  47.0   15.5    0.8    0.2  0.1  0.1    1.9    1.6   3   5 c0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
  55.5    0.0   55.5    0.0  0.0  0.8    0.1   14.5   0  80 c5t0d0
  73.0    0.0   73.0    0.0  0.0  1.0    0.1   13.2   0  96 c5t1d0
  72.5    0.0   72.5    0.0  0.0  1.0    0.1   13.3   0  96 c6t0d0
  68.0    0.0   68.0    0.0  0.0  1.0    0.1   14.3   0  97 c6t1d0
                    extended device statistics
   r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    9.5    0.0    0.2  0.0  0.0    0.0    0.3   0   0 c0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
  65.0    0.0   65.0    0.0  0.0  0.9    0.1   14.5   0  94 c5t0d0
  73.5    0.0   73.5    0.0  0.0  0.9    0.1   12.8   0  94 c5t1d0
  75.0    0.0   75.0    0.0  0.0  0.9    0.1   11.8   0  89 c6t0d0
  68.5    0.0   68.5    0.0  0.0  0.9    0.1   13.9   0  95 c6t1d0
 
 


Re: [zfs-discuss] can anyone help me?

2008-05-31 Thread Hernan Freschi
fwiw, here are my previous posts:

http://www.opensolaris.org/jive/thread.jspa?threadID=61301&tstart=30
http://www.opensolaris.org/jive/thread.jspa?threadID=62120&tstart=0
 
 


Re: [zfs-discuss] can anyone help me?

2008-05-31 Thread Dave Koelmeyer
Bump! Yeah, can anyone help him? As a passive observer without much of a clue 
myself, I'm dying to know from the experts what this poor chap's problem might 
be. 

Cheers, 
Dave
 
 