Re: [zfs-discuss] How long should an empty destroy take? snv_134

2011-03-07 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Yaverot
> 
> rpool remains 1% in use. tank reports 100% full (with 1.44G free), 

I recommend:
When creating your new pool, use slices sized to roughly 99% of each new
disk, instead of using the whole disks.  This is a more reliable way of
avoiding the classic problem: "my replacement for the failed disk is
slightly smaller than the failed disk, and therefore I can't replace it."
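
Something along these lines (the device names are hypothetical, and the s0
slices would first be sized to about 99% of each disk in format):

   # label each disk with a slice 0 at ~99% of capacity using format(1M),
   # then build the pool out of the slices rather than the whole disks
   zpool create newtank mirror c20t0d0s0 c21t0d0s0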

I also recommend:
In every pool, create a small space reservation.  Then, if you ever hit
100% usage again and start sliding toward the system crash scenario, you can
do a zfs destroy (snapshot) and release the reservation to free some space,
and hopefully avoid the crash you just witnessed.
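
A minimal sketch, assuming a hypothetical dataset name:

   # set aside ~1GB that nothing else in the pool can consume
   zfs create -o refreservation=1G tank/reserve

   # later, if the pool hits 100%, release it to get working room back
   zfs set refreservation=none tank/reserve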



Re: [zfs-discuss] How long should an empty destroy take? snv_134

2011-03-06 Thread Yaverot
Follow up, and current status:

In the morning I cut power (before receiving the 4 replies). Turning it on 
again, I got impatient waiting for a text screen to come up for diagnostics, 
and typed enough to overfill the keyboard buffer.  I forced it off again (to 
stop the beeps), then waited longer before attempting to switch it from 
splash screen to console.

When it came up, "others" was still there, and one disk (c16) was faulted.  
Since I had only used that pool for light testing and for holding onto the 
names of devices, it was a plain stripe, so the whole pool was faulted.  My 
guess is that the disk switched to faulted between the zpool status and the 
"zpool destroy others", and the destroy then got stuck trying to write the 
"not-in-use" label to the unavailable disk.

I was able to "zpool destroy -f others" and add those disks to my newtank 
(using -f on the add).
So newtank is now large enough for a send/recv from tank.  It isn't done yet, 
but for scale, a scrub on tank takes about 36 hours (newtank is mirrors 
instead of tank's raidz3).
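
The replication itself is along these lines (the snapshot name is a 
hypothetical placeholder):

   # snapshot everything in tank, then replicate the whole tree to newtank
   zfs snapshot -r tank@migrate
   zfs send -R tank@migrate | zfs recv -Fd newtank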
Two drives show faulted in tank.  One I found: it renamed itself from either 
c12 or c14 to c21, but my attempt to reattach it to the pool gave an error 
that c10 is already part of tank.  Yes, c10 is part of tank, but the command 
line referred to c14 and c21, so why talk about c10?  Getting the data onto 
newtank seemed the best thing to push for, so I'm doing the send/recv with 
tank degraded; one more disk can disappear before any data is at risk.

My power-off/reboot, which happened before I could run an export/import cycle 
on newtank, means all those drives now have different names than the ones I 
wrote on them. :(

rpool remains 1% in use. tank reports 100% full (with 1.44G free). "others" 
is destroyed, but I know that c16 is still physically connected and hasn't 
been ZFS-delabeled, should it ever online itself. zfs list shows data being 
recv'd on newtank.

So:
1. The send/recv tank->newtank is progressing, and will hopefully finish with 
no problems.
2. Two disks apparently disappeared: they aren't part of any pool and don't 
show in format either.*
3. One disk renamed itself (now c21) and therefore can't be 
re-added/reattached to tank.
4. All the drives put into newtank before the destroy showed up, but with 
different names.  newtank imported cleanly (at the time it was still empty).

*Or I simply don't see them because I get lost in the ordering.  Comparing 
the output requires scrolling back and forth, and the lists aren't sorted 
the same.



Re: [zfs-discuss] How long should an empty destroy take? snv_134

2011-03-06 Thread Nathan Kroenert
Why wouldn't they try a reboot -d? That would at least get some data in 
the form of a crash dump if at all possible...
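
Something like this, assuming the dump device is already configured with 
dumpadm:

   # force a kernel crash dump, then reboot
   reboot -d

   # after boot, the dump is normally saved automatically; if not:
   savecore
   # then inspect it (crash directory location per dumpadm configuration)
   cd /var/crash/`hostname`
   mdb unix.0 vmcore.0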


A power cycle seems a little medieval to me... At least in the first 
instance.


The other thing I have noted is that sometimes things do get wedged, and if 
you can find where (mdb -k, and take a poke at the stacks of the hung 
zfs/zpool commands to see what they were operating on), a zpool clear on 
that zpool can help.  Not that I'm recommending that you should *need* to, 
but that has got me unwedged on occasion.  (Though usually when I have done 
something administratively silly... ;)
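
A minimal sketch of that kind of poking (the PID is a hypothetical 
placeholder for the hung zpool/zfs process):

   # find the hung command, then dump its thread stacks from the kernel
   pgrep -l zpool
   echo "0t12345::pid2proc | ::walk thread | ::findstack -v" | mdb -k

   # if the stacks implicate one pool, try clearing its errors
   zpool clear others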


Nathan.

On 7/03/2011 12:14 PM, Edward Ned Harvey wrote:
> [...]


Re: [zfs-discuss] How long should an empty destroy take? snv_134

2011-03-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Yaverot
> 
> I'm (still) running snv_134 on a home server.  My main pool "tank" filled up
> last night ( 1G free remaining ).

There is (or was) a bug that would sometimes cause the system to crash when
100% full.
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/41227 

In that thread, the crash was related to being 100% full, running a scrub,
and some write operations all at the same time.  By any chance were you
running a scrub?

I am curious whether or not the scrub is actually an ingredient in that
failure scenario, or if the scrub was just coincidence for me.



Re: [zfs-discuss] How long should an empty destroy take? snv_134

2011-03-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Yaverot
> 
> We're heading into the 3rd hour of the zpool destroy on "others".
> The system isn't locked up, as it responds to local keyboard input, and

I bet you're in a semi-crashed state right now, which will degrade into a 
full system crash.  You'll have no choice but to power cycle.  Prove me
wrong, please.   ;-)

I bet that as soon as you type any "zpool" or "zfs" command, even "list" or 
"status", it will also hang indefinitely.

Is your pool still 100% full?  That's probably the cause.  I suggest, if 
possible, immediately deleting something or destroying an old snapshot to 
free up a little bit of space.  Then you can move onward...
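
For example, something like this (the snapshot and file names are 
hypothetical):

   # list snapshots by space used, and destroy one you can spare
   zfs list -t snapshot -o name,used -s used
   zfs destroy tank/data@2011-01-01

   # truncating a big file can free space even when rm can't, since rm on a
   # full pool with snapshots may itself need space
   cp /dev/null /tank/data/bigfile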


> While this destroy is "running" all other zpool/zfs commands appear to be
> hung.

Oh, sorry, didn't see this before I wrote what I wrote above.  This just
further confirms what I said above.


> zpool destroy on an empty pool should be on the order of seconds, right?

zpool destroy is essentially instant, regardless of how much data is in the 
pool.  zfs destroy is instant for an empty dataset, but takes a long time 
when there is a lot of data to free.

But as mentioned above, that's irrelevant to your situation, because your 
system is crashed; even if you try init 0 or init 6, they will fail.
You have no choice but to power cycle.

For the heck of it, I suggest init 0 first.  Then wait half an hour, and
power cycle.  Just to try and make the crash as graceful as possible.

As soon as it comes back up, free up a little bit of space, so you can avoid
a repeat.


> Yes, I've triple checked, I'm not destroying tank.
> While writing this email, I attempted a new ssh connection; it got to the "Last
> login:" line, but hasn't made it to the prompt.

Oh, sorry, yet again this confirms what I said above: semi-crashed and 
degrading into a full crash.
Right now, you cannot open any new command prompts.
Soon it will stop responding to ping.  (Maybe 2-12 hours.)



Re: [zfs-discuss] How long should an empty destroy take? snv_134

2011-03-06 Thread Richard Elling

On Mar 5, 2011, at 9:14 PM, Yaverot wrote:

> I'm (still) running snv_134 on a home server.  My main pool "tank" filled up 
> last night ( 1G free remaining ).
> So today I bought new drives, adding them one at a time and running format 
> between each one to see what name they received.
> Each time I had a pair of names, I ran zpool create/add newtank mirror 
> cxxt0d0 cyyt0d0 with them.
> 
> Then I got to the point where I needed my unused drives.  They were too 
> small to act as spares for tank, but since I didn't want to lose track of 
> them, I had stuck them in another pool called "others".
> 
> We're heading into the 3rd hour of the zpool destroy on "others".

"zpool destroy" or "zfs destroy"?
 -- richard

> [...]



[zfs-discuss] How long should an empty destroy take? snv_134

2011-03-05 Thread Yaverot
I'm (still) running snv_134 on a home server.  My main pool "tank" filled up 
last night ( 1G free remaining ).
So today I bought new drives, adding them one at a time and running format 
between each one to see what name they received.
Each time I had a pair of names, I ran zpool create/add newtank mirror 
cxxt0d0 cyyt0d0 with them.
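
That is, something along these lines (these device names, like the cxx/cyy 
placeholders above, are hypothetical):

   # the first pair creates the pool as a mirror...
   zpool create newtank mirror c20t0d0 c21t0d0
   # ...and each later pair extends it with another mirror vdev
   zpool add newtank mirror c22t0d0 c23t0d0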

Then I got to the point where I needed my unused drives.  They were too 
small to act as spares for tank, but since I didn't want to lose track of 
them, I had stuck them in another pool called "others".

We're heading into the 3rd hour of the zpool destroy on "others".
The system isn't locked up: it responds to local keyboard input and to 
existing ssh & smb connections.
While this destroy is "running", all other zpool/zfs commands appear to be hung.

The others pool never had more than 100G in it at one time, never had any 
snapshots, and had been empty for at least two weeks prior to the destroy 
command.  I don't think dedup was ever used on it, but that should hardly 
matter when the pool was already empty.
"others" was never shared via smb or nfs.

zpool destroy on an empty pool should be on the order of seconds, right?

I really don't want to reboot/power down the server, as I'll lose my current 
connections, and if there are problems I don't know when the system will be 
working again so I can re-establish them.

Yes, I've triple checked: I'm not destroying tank.
While writing this email, I attempted a new ssh connection; it got to the 
"Last login:" line, but hasn't made it to the prompt.  So I really don't want 
to take the server down physically.

Doing a df, "others" doesn't show, but rpool, tank, and newtank do; another 
indication that I issued the destroy on the right pool.
The smb connection is slower than normal, but still usable.