Re: [zfs-discuss] ZFS hangs/freezes after disk failure,

2008-08-29 Thread Todd H. Poole
> Wrt. what I've experienced and read in ZFS-discussion etc. list I've the
> __feeling__, that we would have got really into trouble, using Solaris
> (even the most recent one) on that system ... 
> So if one asks me, whether to run Solaris+ZFS on a production system, I
> usually say: definitely, but only, if it is a Sun server ...
> 
> My 2¢ ;-)

I can't agree with you more. I'm beginning to understand what the phrase "Sun's 
software is great - as long as you're running it on Sun's hardware" means...

Whether it's deserved or not, I feel like this OS isn't mature yet. And maybe 
it's not the whole OS, maybe it's some specific subsection (like ZFS), but my 
general impression of OpenSolaris has been... not stellar.

I don't think it's ready yet for a prime time slot on commodity hardware.

And while I don't intend to fan any flames that might already exist (remember, 
I've only just joined within the past week, and thus haven't been around long 
enough to figure out whether any flames exist), I believe I'm justified in 
making the above statement. Just off the top of my head, here is a list of red 
flags I've run into in seven days' time:

 - If I don't wait for at least 2 minutes before logging into my system after 
I've powered everything up, my machine freezes.
 - If I yank a hard drive out of a (supposedly redundant) RAID5 array (or 
"RAID-Z zpool," as it's called) that has an NFS mount attached to it, not only 
does that mount point get severed, but _all_ NFS connections to all mount 
points are dropped, regardless of whether they were on the zpool or not. Oh, 
and then my machine freezes.
 - If I just yank a hard drive out of a (supposedly redundant) RAID5 array (or 
"RAID-Z zpool," as it's called), forgetting about NFS entirely, my machine 
freezes.
 - If I query a zpool for its status, but don't do so under the right 
circumstances, my machine freezes.

I've had to use the hard reset button on my case more times than I've been 
able to shut down the machine properly from a non-frozen console or GUI. 

That shouldn't happen.

I dunno. If this sounds like bitching, that's fine: I'll file bug reports and 
then move on. It's just that sometimes, software needs to grow a bit more 
before it's ready for production, and I feel like trying to run OpenSolaris + 
ZFS on commodity hardware just might be one of those times.

Just two more cents to add to yours.

As Richard said, the only way to fix things is to file bug reports. Hopefully, 
the most helpful things to come out of this thread will be those forms of 
constructive criticism.

As for now, it looks like a return to LVM2, XFS, and one of the Linux or BSD 
kernels might be a more stable decision, but don't worry - I haven't been 
completely dissuaded, and I definitely plan on checking back in a few releases 
to see how things are going in the ZFS world. ;)

Thanks everyone for your help, and keep improving! :)

-Todd
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pulling disks was: ZFS hangs/freezes after disk

2008-08-29 Thread Todd H. Poole
> Let's not be too quick to assign blame, or to think that perfecting
> the behaviour is straightforward or even possible.
>
> Start introducing random $20 components and you begin to dilute the
> quality and predictability of the composite system's behaviour.
>
> But this NEVER happens on linux *grin*.

Actually, it really doesn't! At least, it hasn't in many years...

I can't tell if you were being sarcastic or not, but honestly... you find a USB 
drive that can bring down your Linux machine, and I'll show you someone running 
a kernel from November of 2003. And all the other "cheaper" components out 
there? Those are the components we make serious bucks off of. Just because 
something costs $30 doesn't mean it won't last a _really_ long time under 
stress! And even if it doesn't - even when hardware fails - the software's 
always there to route around it. So no biggie.

> Perfection?

Is Linux perfect?
Not even close. But it's certainly a lot closer when it comes to what this 
thread is about: not crashing.

Linux may get a small number of things wrong, but it gets a ridiculously large 
number of them right, and stability/reliability on unstable/unreliable hardware 
is one of them. ;)

PS: I found this guy's experiment amusing. Talk about adding a bunch of cheap, 
$20 crappy components to a system, and still seeing it soar. 
http://linuxgazette.net/151/weiner.html
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing a Root Pool

2008-08-29 Thread Krister Joas
On Aug 30, 2008, at 8:45 AM, George Wilson wrote:
> Krister Joas wrote:
>> Hello.
>> I have a machine at home on which I have SXCE B96 installed on a  
>> root  zpool mirror.  It's been working great until yesterday.  The  
>> root pool  is a mirror with two identical 160GB disks.  The other  
>> day I added a  third disk to the mirror, a 250 GB disk.  Soon  
>> after, the third disk  developed some hardware problem and this is  
>> now preventing the system  from booting from the root pool.  It  
>> panics early on and reboots.
>> I'm trying to repair the system by dropping into single user mode   
>> after booting from a DVD-ROM.  I had to yank the third disk in  
>> order  for the machine to boot successfully at all.  However, in  
>> single user  mode I'm so far unable to do anything useful with the  
>> pool.  Using  "zpool import" the pool is listed as being DEGRADED,  
>> with one device  being UNAVAILABLE (cannot open).  The pool is also  
>> shown to be last  accessed by another system.  All this is as  
>> expected.  Any command  other than "zpool import" knows nothing  
>> about the pool "rpool", e.g.  "zpool status".  Assuming I have to  
>> import the pool before doing  anything like detaching any bad  
>> devices I try importing it using  "zpool import -f rpool".  This  
>> displays an error:
>> cannot import 'rpool': one or more devices is currently  
>> unavailable
>> At this point I'm stuck.  I can't boot from the pool and I can't   
>> access the pool after booting into single user mode from a DVD- 
>> ROM.   Does anyone have any advice on how to repair my system?
>> Krister
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> Krister,

Hi George,

> When you boot off of the DVD, there should be an option to go into  
> single-user shell. This will search for root instances. Does this  
> get displayed?

It says:

 Searching for installed OS instances...
 No installed OS instance found.

And then I'm at the single user prompt.

> Once you get to the DVD shell prompt, can you try to run 'zpool  
> import -F rpool'?

The result is the same as without a '-f'.  With both -f and -F it says  
"cannot open 'rpool': I/O error" and the pool disappears completely.   
It no longer shows up when doing "zpool import".  I have to reboot off  
the DVD again to get it back.

I'm using an ASUS P5K-E motherboard and the on board SATA ports.   
Could it be a motherboard problem?  I've tried moving the SATA cables  
to different ports but no change.  I have a second pool on this  
machine (called z2) on a separate disk (single disk - no redundancy).   
Running "zpool import -f z2" on that pool succeeds but then it fails  
with the following error: cannot mount '/z2': failed to create  
mountpoint.

> thanks,
> George

Thanks,
Krister

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-29 Thread Richard Elling
Miles Nordin wrote:
>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>> 
>
> re> if you use Ethernet switches in the interconnect, you need to
> re> disable STP on the ports used for interconnects or risk
> re> unnecessary cluster reconfigurations.
>
> RSTP/802.1w plus setting the ports connected to Solaris as ``edge'' is
> good enough, less risky for the WAN, and pretty ubiquitously supported
> with non-EOL switches.  The network guys will know this (assuming you
> have network guys) and do something like this:
>
> sw: can you disable STP for me?
>
> net: No?
>
> sw: 
>
> net: um,...i mean, Why?
>
> sw: []
>
> net: oh, that.  Ok, try it now.
>
> sw: thanks for disabling STP for me.
>
> net: i uh,.. whatever.  No problem!
>   

Precisely, this is not a problem that is usually solved unilaterally.

> re> Can we expect a similar attention to detail for ZFS
> re> implementers?  I'm afraid not :-(.
>
> well, you weren't really ``expecting'' it of the sun cluster
> implementers.  You just ran into it by surprise in the form of an
> Issue.  

Rather, cluster implementers tend to RTFM. I know few ZFSers who
have RTFM'd, and I do not expect many to do so... such is life.

> so, can you expect ZFS implementers to accept that running
> ZFS, iSCSI, FC-SW might teach them something about their LAN/SAN they
> didn't already know?  

No, I expect them to see a "problem" caused by network reconfiguration
and blame ZFS.  Indeed, this is what occasionally happens with Solaris
Cluster -- but only occasionally, and it gets solved via RTFM.

> So far they seem receptive to arcane advice like
> ``make this config change in your SAN controller to let it use the
> NVRAM cache more aggressively, and stop using EMC PowerPath unless
> .''  so, Yes?
>   

I have no idea what you are trying to say here.

> I think you can also expect them to wait longer than 40 seconds before
> declaring a system is frozen and rebooting it, though.
>   

Current [s]sd driver timeouts are 60 seconds with 3-5 retries by default.
We've had those timeouts for many, many years now and do provide highly
available services on such systems.  The B_FAILFAST change did improve
the availability of systems and similar tricks have improved service 
availability
for Solaris Clusters.  Refer to Eric's post for more details of this 
minefield.
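
For reference, the timeout in question is the per-command timeout in the [s]sd
drivers.  Purely as an illustration (not a recommendation - the defaults exist
for good reasons, and a change requires a reboot), the usual /etc/system
tunables look something like this:

* illustrative only: per-command timeout in seconds (0x3c = 60, the default)
set sd:sd_io_time = 0x3c
* the equivalent for fibre-channel targets driven by ssd:
set ssd:ssd_io_time = 0x3c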

NB some bugids one should research before filing new bugs here are:
CR 4713686: sd/ssd driver should have an additional target specific timeout
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4713686
CR 4500536 introduces B_FAILFAST
http://bugs.opensolaris.org/view_bug.do?bug_id=4500536

> ``Let's `patiently wait' forever because we think, based on our
> uncertainty, that FSPF might take several hours to converge'' is the
> alternative that strikes me as unreasonable.
>   

AFAICT, nobody is making such a proposal.  Did I miss a post?
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, Kernel Panic on import

2008-08-29 Thread Mike Aldred
Ok, I've managed to get around the kernel panic.

[EMAIL PROTECTED]:~/Download$ pfexec mdb -kw
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp 
scsi_vhci zfs sd ip hook neti sctp arp usba uhci s1394 fctl md lofs random sppp 
ipc ptm fcip fcp cpc crypto logindmux ii nsctl sdbc ufs rdc nsmb sv ]
> vdev_uberblock_compare+0x49/W 1
vdev_uberblock_compare+0x49:0x  =   0x1
> vdev_uberblock_compare+0x3b/W 1
vdev_uberblock_compare+0x3b:0x  =   0x1
> zfsvfs_setup+0x60/v 0xeb
zfsvfs_setup+0x60:  0x74=   0xeb
> 

This has let me import the pool without the kernel panicking, and I'm doing a 
scrub on the pool now.
The thing is, I don't know what those commands do; could anyone enlighten me?
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing a Root Pool

2008-08-29 Thread George Wilson
Krister Joas wrote:
> Hello.
> 
> I have a machine at home on which I have SXCE B96 installed on a root  
> zpool mirror.  It's been working great until yesterday.  The root pool  
> is a mirror with two identical 160GB disks.  The other day I added a  
> third disk to the mirror, a 250 GB disk.  Soon after, the third disk  
> developed some hardware problem and this is now preventing the system  
> from booting from the root pool.  It panics early on and reboots.
> 
> I'm trying to repair the system by dropping into single user mode  
> after booting from a DVD-ROM.  I had to yank the third disk in order  
> for the machine to boot successfully at all.  However, in single user  
> mode I'm so far unable to do anything useful with the pool.  Using  
> "zpool import" the pool is listed as being DEGRADED, with one device  
> being UNAVAILABLE (cannot open).  The pool is also shown to be last  
> accessed by another system.  All this is as expected.  Any command  
> other than "zpool import" knows nothing about the pool "rpool", e.g.  
> "zpool status".  Assuming I have to import the pool before doing  
> anything like detaching any bad devices I try importing it using  
> "zpool import -f rpool".  This displays an error:
> 
>  cannot import 'rpool': one or more devices is currently unavailable
> 
> At this point I'm stuck.  I can't boot from the pool and I can't  
> access the pool after booting into single user mode from a DVD-ROM.   
> Does anyone have any advice on how to repair my system?
> 
> Krister
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Krister,

When you boot off of the DVD, there should be an option to go into 
single-user shell. This will search for root instances. Does this get 
displayed?

Once you get to the DVD shell prompt, can you try to run 'zpool import 
-F rpool'?

thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-29 Thread Miles Nordin
> "re" == Richard Elling <[EMAIL PROTECTED]> writes:

re> if you use Ethernet switches in the interconnect, you need to
re> disable STP on the ports used for interconnects or risk
re> unnecessary cluster reconfigurations.

RSTP/802.1w plus setting the ports connected to Solaris as ``edge'' is
good enough, less risky for the WAN, and pretty ubiquitously supported
with non-EOL switches.  The network guys will know this (assuming you
have network guys) and do something like this:

sw: can you disable STP for me?

net: No?

sw: 

net: um,...i mean, Why?

sw: []

net: oh, that.  Ok, try it now.

sw: thanks for disabling STP for me.

net: i uh,.. whatever.  No problem!
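
For what it's worth, a hypothetical switch-side sketch of the ``edge'' setting
above, in Cisco IOS syntax (commands and names vary by vendor and release, so
treat this as illustrative only):

spanning-tree mode rapid-pvst
!
interface GigabitEthernet0/1
 description Solaris / storage host port
 spanning-tree portfast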

re> Can we expect a similar attention to detail for ZFS
re> implementers?  I'm afraid not :-(.

well, you weren't really ``expecting'' it of the sun cluster
implementers.  You just ran into it by surprise in the form of an
Issue.  so, can you expect ZFS implementers to accept that running
ZFS, iSCSI, FC-SW might teach them something about their LAN/SAN they
didn't already know?  So far they seem receptive to arcane advice like
``make this config change in your SAN controller to let it use the
NVRAM cache more aggressively, and stop using EMC PowerPath unless
.''  so, Yes?

I think you can also expect them to wait longer than 40 seconds before
declaring a system is frozen and rebooting it, though.

``Let's `patiently wait' forever because we think, based on our
uncertainty, that FSPF might take several hours to converge'' is the
alternative that strikes me as unreasonable.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS noob question

2008-08-29 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> I took a snapshot of a directory in which I hold PDF files related to math.
> I then added a 50MB pdf file from a CD (Oxford Math Reference; I strongly
> recommend this to any math enthusiast) and did "zfs list" to see the size of
> the snapshot (sheer curiosity). I don't have compression turned on for this
> filesystem. However, it seems that the 50MB PDF took up only 64K.  How is
> that possible?  Is ZFS such a good filesystem, that it shrinks files to a
> mere fraction of their size? 

If I understand correctly, you were expecting the snapshot to grow
in size because you made a change to the current filesystem, right?

Since the new file did not exist in the old snapshot, it could never
have known about the blocks of data in the new file.  The older snapshot
only needs to remember blocks that existed at the time of the snapshot
and which differ now, e.g. blocks in files which get modified or removed.

I would expect that when you add a new file, the only blocks that change
would be in the directory node (metadata blocks) which ends up containing
the new file.  That could indeed be 64k worth of changed blocks.
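
As a rough sketch of what you would see (hypothetical dataset names, and from
memory, so take it as illustrative only): the snapshot's own USED stays tiny,
because only the changed directory metadata is charged to it.

  zfs snapshot tank/math@before
  cp /cdrom/oxford-ref.pdf /tank/math/
  zfs list -o name,used,referenced tank/math tank/math@before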

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS noob question

2008-08-29 Thread Krenz von Leiberman
Hi. 
I took a snapshot of a directory in which I hold PDF files related to math. 
I then added a 50MB pdf file from a CD (Oxford Math Reference; I strongly 
recommend this to any math enthusiast) and did "zfs list" to see the size of 
the snapshot (sheer curiosity). I don't have compression turned on for this 
filesystem. However, it seems that the 50MB PDF took up only 64K. 
How is that possible? 
Is ZFS such a good filesystem that it shrinks files to a mere fraction of 
their size?
I viewed the file and it seems to be intact.
Is there anyone here who can explain to me how this happened?
I am awe-struck... in a good way, of course :D.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot delete file when fs 100% full

2008-08-29 Thread Paul Raines

On Fri, 29 Aug 2008, Shawn Ferry wrote:

> On Aug 29, 2008, at 7:09 AM, Tomas Ögren wrote:
>
>> On 15 August, 2008 - Tomas Ögren sent me these 0,4K bytes:
>>
>>> On 14 August, 2008 - Paul Raines sent me these 2,9K bytes:
>>>
>>>> This problem is becoming a real pain to us again and I was wondering
>>>> if there has been in the past few month any known fix or workaround.
>
> I had this problem in the past. Fortunately I was able to recover by removing
> an old snapshot which gave me enough room to deal with my problems.
>
> Now, I create a fs called reserved and set a small reservation to ensure that
> there is a small amount of space available.
>
> [sferry<@>noroute(0) 12:59 s001]
> [6] zfs get reservation,mountpoint,canmount,type  noroute/reserved
> NAME  PROPERTY VALUE SOURCE
> noroute/reserved  reservation  50M   local
> noroute/reserved  mountpoint   none  inherited from noroute
> noroute/reserved  canmount off   local
> noroute/reserved  type filesystem-
>
> If I fill the pool now, I reduce the reservation (reduce instead of remove in
> case I have something writing uncontrollably to the pool) and clean up.


When this problem happens to us, I have no problem deleting a file as root
to get things back on track.  It is just that normal users cannot delete
(who are accessing only over NFS).  As soon as I delete a file as root,
then normal users can start deleting things themselves.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-29 Thread Bob Friesenhahn
On Fri, 29 Aug 2008, Miles Nordin wrote:
>
> I guess I'm changing my story slightly.  I *would* want ZFS to collect
> drive performance statistics and report them to FMA, but I wouldn't

Your email *totally* blew my limited buffer size, but this little bit 
remained for me to look at.  It left me wondering how ZFS would know 
if the device is a drive.  How can ZFS maintain statistics for a 
"drive" if it is perhaps not a drive at all?  ZFS does not require 
that the device be a "drive".  Isn't ZFS the wrong level to be 
managing the details of a device?

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-29 Thread Miles Nordin
> "es" == Eric Schrock <[EMAIL PROTECTED]> writes:

es> The main problem with exposing tunables like this is that they
es> have a direct correlation to service actions, and
es> mis-diagnosing failures costs everybody (admin, companies,
es> Sun, etc) lots of time and money.  Once you expose such a
es> tunable, it will be impossible to trust any FMA diagnosis,

Yeah, I tend to agree that the constants shouldn't be tunable, because
I hoped Sun would become a disciplined collection-point for experience
to set the constants, discipline meaning the constants are only
adjusted in response to bad diagnosis not ``preference,'' and in a
direction that improves diagnosis for everyone, not for ``the site''.

I'm not yet won over to the idea that statistical FMA diagnosis
constants shouldn't exist.  I think drives can't diagnose themselves
for shit, and I think drivers these days are diagnosees, not
diagnosers.  But clearly a confusingly-bad diagnosis is much worse
than diagnosis that's bad in a simple way.

es> If I issue a write to both halves of a mirror, should
es> I return when the first one completes, or when both complete?

well, if it's not a synchronous write, you return before you've
written either half of the mirror, so it's only an issue for
O_SYNC/ZIL writes, true?

BTW what does ZFS do right now for synchronous writes to mirrors, wait
for all, wait for two, or wait for one?

es> any such "best effort RAS" is a little dicey because you have
es> very little visibility into the state of the pool in this
es> scenario - "is my data protected?" becomes a very difficult
es> question to answer.

I think it's already difficult.  For example, a pool will say ONLINE
while it's resilvering, won't it?  I might be wrong.  

Take a pool that can only tolerate one failure.  Is the difference
between replacing an ONLINE device (still redundant) and replacing an
OFFLINE device (not redundant until resilvered) captured?  Likewise,
should a pool with a spare in use really be marked DEGRADED both
before the spare resilvers and after?

The answers to the questions aren't important so much as that you have
to think about the answers---what should they be, what are they
now---which means ``is my data protected?'' is already a difficult
question to answer.  

Also there were recently fixed bugs with DTL.  The status of each
device's DTL, even the existence and purpose of the DTL, isn't
well-exposed to the admin, and is relevant to answering the ``is my
data protected?''  question---indirect means of inspecting it like
tracking the status of resilvering seem too wallpapered given that the
bug escaped notice for so long.

I agree with the problem 100% and don't wish to worsen it, just
disagree that it's a new one.

re> 3 orders of magnitude range for magnetic disk I/Os, 4 orders
re> of magnitude for power managed disks.

I would argue for power management a fixed timeout.  The time to spin
up doesn't have anything to do with the io/s you got before the disk
spun down.  There's no reason to disguise the constant for which we
secretly wish inside some fancy math for deriving it just because
writing down constants feels bad.

unless you _know_ the disk is spinning up through some in-band means,
and want to compare its spinup time to recorded measurements of past
spinups.


This is a good case for pointing out there are two sets of rules:

 * 'metaparam -r' rules

   + not invoked at all if there's no redundancy.

   + very complicated

 - involve sets of disks, not one disk.  comparison of statistic
   among disks within a vdev (definitely), and comparison of
   individual disks to themselves over time (possibly).

 - complicated output: rules return a set of disks per vdev, not a
   yay-or-nay diagnosis per disk.  And there are two kinds of
   output decision:

   o for n-way mirrors, select anywhere from 1 to n disks.  for
 example, a three-way mirror with two fast local mirrors, one
 slow remote iSCSI mirror, should split reads among the two
 local disks.

 for raidz and raidz2 they can eliminate 0, 1 (,or 2) disks
 from the read-us set.  It's possible to issue all the reads
 and take the first sufficient set to return as Anton
 suggested, but I imagine 4-device raidz2 vdevs will be common
 which could some day perform as well as a 2-device mirror.

   o also, decide when to stop waiting on an existing read and
 re-issue it.  so the decision is not only about future reads,
 but has to cancel already-issued reads, possibly replacing
 the B_FAILFAST mechanism so there will be a second
 uncancellable round of reads once the first round exhausts
 all redundancy.

   o that second decision needs to be made thousands of times per
 second without a lot of CPU overhead

   + small consequence if the rules deliver false-positives, just

Re: [zfs-discuss] Proposed 2540 and ZFS configuration

2008-08-29 Thread Bob Friesenhahn
On Fri, 29 Aug 2008, Kyle McDonald wrote:
>> 
> What would one look for to decide which vdev to place each LUN in?
>
> All mine have the same Current Load Balance value: round robin.

That is a good question and I will have to remind myself of the 
answer.  The "round robin" is good because that means that there are 
two working paths to the device.  There are two "Access State:" lines 
printed.  One is the status of the first path ('active' means used to 
transmit data), and the other is the status of the second path.  The 
controllers on the 2540 each "own" six of the drives by default (they 
operate active/standby at the drive level) so presumably (only an 
assumption) MPxIO directs traffic to the controller which has best 
access to the drive.

Assuming that you use a pool design which allows balancing, you would 
want to choose six disks which have 'active' in the first line, and 
six disks which have 'active' in the second line, and assure that your 
pool or vdev design takes advantage of this.

For example, my pool uses mirrored devices so I would split my mirrors 
so that one device is from the first set, and the other device is from 
the second set.  If you choose to build your pool with two raidz2s, 
then you could put all the devices active on the first fiber channel 
interface into the first raidz2, and the rest in the other.  This way 
you get balancing due to the vdev load sharing.  Another option with 
raidz2 is to make sure that half of the six disks are from each set so 
that writes to the vdev produce distributed load across the 
interfaces.  The reason why you might want to prefer load sharing at 
the vdev level is that if there is a performance problem with one 
vdev, the other vdev should still perform well and take more of the 
load.  The reason why you might want to load share within a vdev is 
that I/Os to the vdev might be more efficient.
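
Purely for illustration (made-up device names): if c3t*d0 are the six disks
whose first path is active and c4t*d0 are the six whose second path is active,
a mirrored pool that load-shares across both FC links might be created along
the lines of:

  zpool create tank \
    mirror c3t0d0 c4t0d0 \
    mirror c3t1d0 c4t1d0 \
    mirror c3t2d0 c4t2d0 \
    mirror c3t3d0 c4t3d0 \
    mirror c3t4d0 c4t4d0 \
    mirror c3t5d0 c4t5d0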

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-29 Thread Richard Elling
Nicolas Williams wrote:
> On Thu, Aug 28, 2008 at 11:29:21AM -0500, Bob Friesenhahn wrote:
>   
>> Which of these do you prefer?
>>
>>o System waits substantial time for devices to (possibly) recover in
>>  order to ensure that subsequently written data has the least
>>  chance of being lost.
>>
>>o System immediately ignores slow devices and switches to
>>  non-redundant non-fail-safe non-fault-tolerant may-lose-your-data
>>  mode.  When system is under intense load, it automatically
>>  switches to the may-lose-your-data mode.
>> 
>
> Given how long a resilver might take, waiting some time for a device to
> come back makes sense.  Also, if a cable was taken out, or drive tray
> powered off, then you'll see lots of drives timing out, and then the
> better thing to do is to wait (heuristic: not enough spares to recover).
>
>   

argv!  I didn't even consider switches.  Ethernet switches often use
spanning-tree algorithms to converge on the topology.  I'm not sure
what SAN switches use.  We have the following problem with highly
available clusters which use switches in the interconnect:
   + Solaris Cluster interconnect timeout defaults to 10 seconds
   + STP can take > 30 seconds to converge
So, if you use Ethernet switches in the interconnect, you need to
disable STP on the ports used for interconnects or risk unnecessary
cluster reconfigurations.  Normally, this isn't a problem as the people
who tend to build HA clusters also tend to read the docs which point
this out.  Still, a few slip through every few months. As usual, Solaris
Cluster gets blamed, though it really is a systems engineering problem.
Can we expect a similar attention to detail for ZFS implementers?
I'm afraid not :-(. 

I'm not confident we can be successful with sub-minute reconfiguration,
so the B_FAILFAST may be the best we could do for the general case.
That isn't so bad, in fact we use failfasts rather extensively for Solaris
Clusters, too.
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposed 2540 and ZFS configuration

2008-08-29 Thread Kyle McDonald
Bob Friesenhahn wrote:
> On Fri, 29 Aug 2008, Bob Friesenhahn wrote:
>   
>> If you do use the two raidz2 vdevs, then if you pay attention to how
>> MPxIO works, you can balance the load across your two fiber channel
>> links for best performance.  Each raidz2 vdev can be served (by
>> default) by a different FC link.
>> 
>
> As a follow-up, here is a small script which will show how MPxIO is 
> creating paths to your devices.  The output of this when all paths and 
> devices are healthy may be useful for deciding how to create your 
> storage pool since then you can load-balance the I/O:
>
> #!/bin/sh
> # Test path access to multipathed devices
> devs=`mpathadm list lu | grep /dev/rdsk/`
> for dev in $devs
> do
>echo "=== $dev ==="
>mpathadm show lu $dev | egrep '(Access State)|(Current Load Balance)'
> done
>   
What would one look for to decide which vdev to place each LUN in?

All mine have the same Current Load Balance value: round robin.

  -Kyle

> Bob
> ==
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot delete file when fs 100% full

2008-08-29 Thread Shawn Ferry

On Aug 29, 2008, at 7:09 AM, Tomas Ögren wrote:

> On 15 August, 2008 - Tomas Ögren sent me these 0,4K bytes:
>
>> On 14 August, 2008 - Paul Raines sent me these 2,9K bytes:
>>
>>> This problem is becoming a real pain to us again and I was wondering
>>> if there has been in the past few month any known fix or workaround.

I had this problem in the past. Fortunately I was able to recover by  
removing an old snapshot which gave me enough room to deal with my  
problems.

Now, I create a fs called reserved and set a small reservation to  
ensure that there is a small amount of space available.

[sferry<@>noroute(0) 12:59 s001]

[6] zfs get reservation,mountpoint,canmount,type  noroute/reserved
NAME  PROPERTY VALUE SOURCE
noroute/reserved  reservation  50M   local
noroute/reserved  mountpoint   none  inherited from noroute
noroute/reserved  canmount off   local
noroute/reserved  type filesystem-

If I fill the pool now, I reduce the reservation (reduce instead of  
remove in case I have something writing uncontrollably to the pool)  
and clean up.
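
For anyone who wants the same safety net, something along these lines should
do it (pool name as above, sizes to taste):

  zfs create noroute/reserved
  zfs set reservation=50M noroute/reserved
  zfs set canmount=off noroute/reserved

  # when the pool fills: shrink the reservation, clean up, then restore it
  zfs set reservation=10M noroute/reserved
  zfs set reservation=50M noroute/reserved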

Shawn

--
Shawn Ferry  shawn.ferry at sun.com
Senior Primary Systems Engineer
Sun Managed Operations






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposed 2540 and ZFS configuration

2008-08-29 Thread Bob Friesenhahn
On Fri, 29 Aug 2008, Bob Friesenhahn wrote:
>
> If you do use the two raidz2 vdevs, then if you pay attention to how
> MPxIO works, you can balance the load across your two fiber channel
> links for best performance.  Each raidz2 vdev can be served (by
> default) by a different FC link.

As a follow-up, here is a small script which will show how MPxIO is 
creating paths to your devices.  The output of this when all paths and 
devices are healthy may be useful for deciding how to create your 
storage pool since then you can load-balance the I/O:

#!/bin/sh
# Test path access to multipathed devices
devs=`mpathadm list lu | grep /dev/rdsk/`
for dev in $devs
do
   echo "=== $dev ==="
   mpathadm show lu $dev | egrep '(Access State)|(Current Load Balance)'
done

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposed 2540 and ZFS configuration

2008-08-29 Thread Bob Friesenhahn
On Fri, 29 Aug 2008, Kenny wrote:
>
> 1) I didn't do raidz2 because I didn't want to lose the space.  Is 
> this a bad idea??

Raidz2 is the most reliable vdev configuration other than 
triple-mirror.  The pool is only as strong as its weakest vdev. In 
private email I suggested using all 12 drives in two raidz2 vdevs. 
Other than due to natural disaster or other physical mishap, the 
probability that enough drives will independently fail to cause data 
loss in raidz2 is similar to winning the state lottery jackpot. Your 
Sun service contract should be able to get you a replacement drive by 
the next day.  A lot depends on if there are system administrators 
paying attention to the system who can take care of issues right away. 
If system administration is spotty or there is no one on site, then 
the ZFS spare is much more useful.
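
Purely as an illustration (placeholder LUN names), the two-vdev layout I 
suggested would be created along the lines of:

  zpool create tank \
    raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 \
    raidz2 c4t6d0 c4t7d0 c4t8d0 c4t9d0 c4t10d0 c4t11d0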

Using more vdevs provides more multi-user performance, which is 
important to your logging requirements.

If you do use the two raidz2 vdevs, then if you pay attention to how 
MPxIO works, you can balance the load across your two fiber channel 
links for best performance.  Each raidz2 vdev can be served (by 
default) by a different FC link.

If you do enable compression, then that will surely make up for the 
additional space overhead of two raidz2 vdevs.

> 3) I have intentionally skipped the hardware hotspare and RAID 
> methods.  Is this a good idea??  What would be the best method to 
> integrate both hardware and software

With the hardware JBOD approach, having the 2540 manage the hot spare 
would not make sense.

> 4) A fellow admin here voiced concern with having ZFS handle the 
> spare and raid functions.  Specifically, that the processing overhead 
> would affect performance.  Does anyone have experience with server 
> performance in this regard?

Having ZFS manage the spare costs nothing.  There will be additional 
overhead when building the replacement drive, but this overhead would 
be seen if the drive array handled it too.  Regardless, the drive 
array does not have the required information to build the drive.  ZFS 
does have that information so ZFS should be in charge of the spare 
drive.
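
In concrete terms, handing your 12th LUN to ZFS as a pool-level spare in your 
original layout is a one-liner (placeholder device name):

  zpool add tank spare c4t11d0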

> 5) If I wanted to add an additional disk tray in the near future (12 
> more 1TB disks), what would be the recommended method?  I was 
> thinking of simply creating additional vdevs and adding them to the 
> zpool.

That is a sensible approach.  If you know you will be running out of 
space, then it is best to install the additional hardware sooner than 
later since otherwise most of the data will be on the vdevs which were 
active first.  ZFS does not currently provide a way to re-write a pool 
so that it is better balanced across vdevs.
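
Concretely, adding the second tray would look something like this (again, 
placeholder device names):

  zpool add tank \
    raidz2 c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 \
    raidz2 c5t6d0 c5t7d0 c5t8d0 c5t9d0 c5t10d0 c5t11d0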

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-29 Thread Nicolas Williams
On Thu, Aug 28, 2008 at 01:05:54PM -0700, Eric Schrock wrote:
> As others have mentioned, things get more difficult with writes.  If I
> issue a write to both halves of a mirror, should I return when the first
> one completes, or when both complete?  One possibility is to expose this
> as a tunable, but any such "best effort RAS" is a little dicey because
> you have very little visibility into the state of the pool in this
> scenario - "is my data protected?" becomes a very difficult question to
> answer.

Depending on the amount of redundancy left one might want the writes to
continue.  E.g., a 3-way mirror with one vdev timing out or going extra
slow, or Richard's lopsided mirror example.

The value of "best effort RAS" might make a useful property for mirrors
and RAIDZ-2.  If because of some slow vdev you've got less redundancy
for recent writes, but still have enough (for some value of "enough"),
and still have full redundancy for older writes, well, that's not so
bad.

Something like:

% # require at least successful writes to two mirrors and wait no more
% # than 15 seconds for the 3rd.
% zpool create mypool mirror ... mirror ... mirror ...
% zpool set minimum_redundancy=1 mypool
% zpool set vdev_write_wait=15s mypool

and for known-to-be-lopsided mirrors:

% # require at least successful writes to two mirrors and don't wait for
% # the slow vdevs
% zpool create mypool mirror ... mirror ... mirror -slow ...
% zpool set minimum_redundancy=1 mypool
% zpool set vdev_write_wait=0s mypool

?

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-29 Thread Nicolas Williams
On Thu, Aug 28, 2008 at 11:29:21AM -0500, Bob Friesenhahn wrote:
> Which of these do you prefer?
> 
>o System waits substantial time for devices to (possibly) recover in
>  order to ensure that subsequently written data has the least
>  chance of being lost.
> 
>o System immediately ignores slow devices and switches to
>  non-redundant non-fail-safe non-fault-tolerant may-lose-your-data
>  mode.  When system is under intense load, it automatically
>  switches to the may-lose-your-data mode.

Given how long a resilver might take, waiting some time for a device to
come back makes sense.  Also, if a cable was taken out, or drive tray
powered off, then you'll see lots of drives timing out, and then the
better thing to do is to wait (heuristic: not enough spares to recover).

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Proposed 2540 and ZFS configuration

2008-08-29 Thread Kenny
Hello again...

Now that I've got my 2540 up and running.  I'm considering which configuration 
is best.  I have a proposed config and wanted your opinions and comments on it.



Background

I have a requirement to host syslog data from approx 30 servers.  Currently the 
data is about 3.5TB in size spread across several servers.  The end game is to 
have the data in one location with some redundancy built-in. 

I have a SUN 2540 Disk Array with 12 1TB (931GB metric) drives connected to a 
SUN T5220 server (32GB RAM).  My plan was to maximize usable space on the disk 
array by presenting each disk as a volume (LUN) and having ZFS on the server 
build a raidz vdev from 11 drives, then adding the 12th drive as a spare.  We 
have SUN service contracts in place to replace drives as soon as they go down.

With this config, I have a single zpool showing 9.06 T available.
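
In other words, the pool was created along these lines (the device names below 
are placeholders, not the real ones):

  zpool create tank \
    raidz c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 \
          c4t6d0 c4t7d0 c4t8d0 c4t9d0 c4t10d0 \
    spare c4t11d0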




Now questions.

1) I didn't do raidz2 because I didn't want to lose the space.  Is this a bad 
idea??

2) I did only one vdev (all 11 LUNs) and added the spare.  Would it be better 
to break this up into 2 vdevs (one w/ 6 LUNs and the other w/ 5)??  Why?

3) I have intentionally skipped the hardware hotspare and RAID methods.  Is 
this a good idea??  What would be the best method to integrate both hardware 
and software 

4) A fellow admin here voiced concern with having ZFS handle the spare and raid 
functions.  Specifically, that the processing overhead would affect 
performance.  Does anyone have experience with server performance in this 
regard?

5) If I wanted to add an additional disk tray in the near future (12 more 1TB 
disks), what would be the recommended method?  I was thinking of simply 
creating additional vdevs and adding them to the zpool.


Thanks in advance for the discussions!!

Regards,

--Kenny
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Upgrading my ZFS server

2008-08-29 Thread Joe S
Just an update to this thread with my results. To summarize, I have no
problems with the nVidia 750a chipset. It's simply a newer version of
the 5** series chipsets that have reportedly worked well. Also, at
IDLE, this system uses 133 Watts:

CPU - AMD Athlon X2 4850e

Motherboard - XFX MD-A72P-7509
  * nVidia nForce 750a SLI chipset
  * 6x SATA, 1x eSATA
  * 2x PCIe 2.0 x16 slots
  * nVidia GeForce 8 series integrated video
  * 1x Marvell gigabit ethernet (disabled in BIOS)

2x Kingston 2GB (2 x 1GB) 240-Pin DDR2 SDRAM ECC Unbuffered DDR2 800
(PC2 6400) Dual Channel Kit Server Memory
2x Intel PRO/1000 GT Desktop Adapter (82541PI)
6x Maxtor 6L300S0 drives (SATA)
1x 80GB IDE drive (OS)


# uname -a
SunOS  5.11 snv_96 i86pc i386 i86pc


# psrinfo -pv
The physical processor has 2 virtual processors (0 1)
  x86 (AuthenticAMD 60FB2 family 15 model 107 step 2 clock 2500 MHz)
AMD Athlon(tm) Dual Core Processor 4850e


# isainfo -bv
64-bit amd64 applications
tscp ahf cx16 sse3 sse2 sse fxsr amd_3dnowx amd_3dnow amd_mmx mmx cmov
amd_sysc cx8 tsc fpu


# prtconf -D
System Configuration:  Sun Microsystems  i86pc
Memory size: 3840 Megabytes
System Peripherals (Software Nodes):

i86pc (driver name: rootnex)
scsi_vhci, instance #0 (driver name: scsi_vhci)
isa, instance #0 (driver name: isa)
asy, instance #0 (driver name: asy)
motherboard
pit_beep, instance #0 (driver name: pit_beep)
pci, instance #0 (driver name: npe)
pci10de,cb84
pci10de,cb84
pci10de,cb84
pci10de,cb84
pci10de,cb84
pci10de,cb84
pci10de,cb84, instance #0 (driver name: ohci)
pci10de,cb84, instance #0 (driver name: ehci)
pci10de,cb84, instance #1 (driver name: ohci)
pci10de,cb84, instance #1 (driver name: ehci)
pci-ide, instance #0 (driver name: pci-ide)
ide, instance #0 (driver name: ata)
cmdk, instance #0 (driver name: cmdk)
ide (driver name: ata)
pci10de,75a, instance #0 (driver name: pci_pci)
pci8086,1376, instance #0 (driver name: e1000g)
pci8086,1376, instance #1 (driver name: e1000g)
pci10de,cb84, instance #0 (driver name: ahci)
disk, instance #1 (driver name: sd)
disk, instance #2 (driver name: sd)
disk, instance #3 (driver name: sd)
disk, instance #4 (driver name: sd)
disk, instance #5 (driver name: sd)
disk, instance #6 (driver name: sd)
pci10de,569, instance #1 (driver name: pci_pci)
display, instance #0 (driver name: vgatext)
pci10de,778 (driver name: pcie_pci)
pci10de,75b (driver name: pcie_pci)
pci10de,77a (driver name: pcie_pci)
pci1022,1100, instance #0 (driver name: mc-amd)
pci1022,1101, instance #1 (driver name: mc-amd)
pci1022,1102, instance #2 (driver name: mc-amd)
pci1022,1103, instance #0 (driver name: amd64_gart)
pci, instance #0 (driver name: pci)
iscsi, instance #0 (driver name: iscsi)
pseudo, instance #0 (driver name: pseudo)
options, instance #0 (driver name: options)
agpgart, instance #0 (driver name: agpgart)
xsvc, instance #0 (driver name: xsvc)
used-resources
cpus, instance #0 (driver name: cpunex)
cpu (driver name: cpudrv)
cpu (driver name: cpudrv)


# prtdiag
System Configuration: To Be Filled By O.E.M. To Be Filled By O.E.M.
BIOS Configuration: American Megatrends Inc. 080015  05/30/2008

 Processor Sockets 

Version  Location Tag
 --
AMD Athlon(tm) Dual Core Processor 4850e CPU 1

 Memory Device Sockets 

TypeStatus Set Device Locator  Bank Locator
--- -- --- --- 
DDR2in use 0   DIMM0   BANK0
DDR2in use 0   DIMM1   BANK1
DDR2in use 0   DIMM2   BANK2
DDR2in use 0   DIMM3   BANK3

 On-Board Devices =
  To Be Filled By O.E.M.

 Upgradeable Slots 

ID  StatusType Description
--- -  
0   in useAGP 4X   AGP
1   in usePCI  PCI1


# cfgadm
Ap_Id  Type Receptacle   Occupant Condition
sata0/0::dsk/c2t0d0disk connectedconfigured   ok
sata0/1::dsk/c2t1d0disk connectedconfigured   ok
sata0/2::dsk/c2t2d0disk connectedconfigured   ok
sata0/3::dsk/c2t3d0disk connectedconfigured   ok
sata0/4::dsk/c2t4d0disk connectedconfigured   ok
sata0/5::dsk/c2t5d0disk connectedconfigured   ok




On Sa

Re: [zfs-discuss] ZFS Pools 1+TB

2008-08-29 Thread Kenny
To All...

Problem solved.  Operator error on my part.  (but I did learn something!!)

Thank you all very much!


--Kenny
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing a Root Pool

2008-08-29 Thread Krister Joas
Here is the output from "zpool import" showing the configuration of  
the pool in case that can help diagnosing my problem.

   pool: rpool
 id: ...
  state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
see: http://www.sun.com/msg/ZFS-8000-EY
config:

 rpool DEGRADED
   mirror  DEGRADED
 c1t2d0s0  ONLINE
 c1t3d0s0  ONLINE
 c1t1d0s0  UNAVAIL  cannot open

Thanks,
Krister

On Aug 29, 2008, at 9:25 PM, Krister Joas wrote:

> Hello.
>
> I have a machine at home on which I have SXCE B96 installed on a  
> root zpool mirror.  It's been working great until yesterday.  The  
> root pool is a mirror with two identical 160GB disks.  The other day  
> I added a third disk to the mirror, a 250 GB disk.  Soon after, the  
> third disk developed some hardware problem and this is now  
> preventing the system from booting from the root pool.  It panics  
> early on and reboots.
>
> I'm trying to repair the system by dropping into single user mode  
> after booting from a DVD-ROM.  I had to yank the third disk in order  
> for the machine to boot successfully at all.  However, in single  
> user mode I'm so far unable to do anything useful with the pool.   
> Using "zpool import" the pool is listed as being DEGRADED, with one  
> device being UNAVAILABLE (cannot open).  The pool is also shown to  
> be last accessed by another system.  All this is as expected.  Any  
> command other than "zpool import" knows nothing about the pool  
> "rpool", e.g. "zpool status".  Assuming I have to import the pool  
> before doing anything like detaching any bad devices I try importing  
> it using "zpool import -f rpool".  This displays an error:
>
>cannot import 'rpool': one or more devices is currently unavailable
>
> At this point I'm stuck.  I can't boot from the pool and I can't  
> access the pool after booting into single user mode from a DVD-ROM.   
> Does anyone have any advice on how to repair my system?
>
> Krister
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot delete file when fs 100% full

2008-08-29 Thread Michael Schuster
On 08/29/08 04:09, Tomas Ögren wrote:
> On 15 August, 2008 - Tomas Ögren sent me these 0,4K bytes:
> 
>> On 14 August, 2008 - Paul Raines sent me these 2,9K bytes:
>>
>>> This problem is becoming a real pain to us again and I was wondering
>>> if there has been in the past few month any known fix or workaround.
>> Sun is sending me an IDR this/next week regarding this bug..
> 
> It seems to work, but I am unfortunately not allowed to pass this IDR

IDRs are "point patches", built against specific kernel builds (IIRC), and as 
such are not intended for wider distribution. Therefore they need to be 
tracked so they can be replaced with the proper patch once that is available.
If you believe you need the IDR, you need to get in touch with your local 
services organisation and ask them to get it to you - they know the proper 
procedures to make sure you get one that works on your machine(s) and that 
you also get the patch once it's available.

HTH
Michael
-- 
Michael Schuster    http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Repairing a Root Pool

2008-08-29 Thread Krister Joas
Hello.

I have a machine at home on which I have SXCE B96 installed on a root  
zpool mirror.  It's been working great until yesterday.  The root pool  
is a mirror with two identical 160GB disks.  The other day I added a  
third disk to the mirror, a 250 GB disk.  Soon after, the third disk  
developed some hardware problem and this is now preventing the system  
from booting from the root pool.  It panics early on and reboots.

I'm trying to repair the system by dropping into single user mode  
after booting from a DVD-ROM.  I had to yank the third disk in order  
for the machine to boot successfully at all.  However, in single user  
mode I'm so far unable to do anything useful with the pool.  Using  
"zpool import" the pool is listed as being DEGRADED, with one device  
being UNAVAILABLE (cannot open).  The pool is also shown to be last  
accessed by another system.  All this is as expected.  Any command  
other than "zpool import" knows nothing about the pool "rpool", e.g.  
"zpool status".  Assuming I have to import the pool before doing  
anything like detaching any bad devices I try importing it using  
"zpool import -f rpool".  This displays an error:

 cannot import 'rpool': one or more devices is currently unavailable

At this point I'm stuck.  I can't boot from the pool and I can't  
access the pool after booting into single user mode from a DVD-ROM.   
Does anyone have any advice on how to repair my system?

Krister

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot delete file when fs 100% full

2008-08-29 Thread Tomas Ögren
On 15 August, 2008 - Tomas Ögren sent me these 0,4K bytes:

> On 14 August, 2008 - Paul Raines sent me these 2,9K bytes:
> 
> > This problem is becoming a real pain to us again and I was wondering
> > if there has been in the past few month any known fix or workaround.
> 
> Sun is sending me an IDR this/next week regarding this bug..

It seems to work, but I am unfortunately not allowed to pass this IDR
on. Temporary patch (redistributable) will surface soon and a real patch
in 3-6 weeks (Sun Eng estimate).

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6664765

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS, Kernel Panic on import

2008-08-29 Thread Mike Aldred
G'day,

I've got an OpenSolaris server (n95) that I use for media serving.  It uses a 
DQ35JOE motherboard, dual core, and I have my rpool mirrored on two IDE 40GB 
drives, and my media mirrored on 2 x 500GB SATA drives.

I've got a few CIFS shares on the media drive, and I'm using MediaTomb to 
stream to my PS3. No problems at all, until today.  I was at work (obviously 
not working too hard :) ), when I thought that I really should scrub my pools, 
since I hadn't done it for a while.  So I SSHed into the box, and did a scrub on 
both pools.

A few minutes later, I lost my SSH connection... uh oh, but not too worried, I 
thought that the ADSL must've gone down or something.

Came home, and the server is in a reboot loop, kernel panic.  Nuts...

Booted into the LiveDVD of snv_95, no problem, set about scrubbing my rpool, 
everything is good, until I decide to import and start scrubbing my storage 
pool... kernel panic... Nuts...

Removed the storage pool drives from the machine, no problem, boots up fine and 
starts scrubbing the rpool again.  No problems.  Decided to move the storage 
drives over to my desktop machine, tried to import... kernel panic...

So, the trick is, how do I fix it?

I've read a few posts, and I've seen other people with similar problems, but I 
have to admit I'm simply not smart enough to solve the problem, so, anyone got 
any ideas?

Here's some info that I hope proves useful.

[EMAIL PROTECTED]:~/Desktop$ pfexec zpool import
  pool: storage
id: 6933883927787501942
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

storage ONLINE
  mirrorONLINE
c3t3d0  ONLINE
c3t2d0  ONLINE

[EMAIL PROTECTED]:~/Desktop$ zdb -uuu -e storage
Uberblock

magic = 00bab10c
version = 10
txg = 3818020
guid_sum = 6700303293925244073
timestamp = 1220003402 UTC = Fri Aug 29 17:50:02 2008
rootbp = [L0 DMU objset] 400L/200P DVA[0]=<0:6a00058e00:200> 
DVA[1]=<0:2a8600:200> DVA[2]=<0:3800050600:200> fletcher4 lzjb LE 
contiguous birth=3818020 fill=170 
cksum=8b56cdef9:38379d3cd95:b809c1c9bb15:197649b024bfd1

[EMAIL PROTECTED]:~/Desktop$ zdb -e -bb storage

Traversing all blocks to verify nothing leaked ...

No leaks (block sum matches space maps exactly)

bp count: 3736040
bp logical:    484538716672      avg: 129693
bp physical:   484064542720      avg: 129566     compression:   1.00
bp allocated:  484259193344      avg: 129618     compression:   1.00
SPA allocated: 484259193344 used: 97.20%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
   105  1.11M    339K   1017K    9.7K    3.35     0.00  deferred free
     2    32K      4K   12.0K   6.00K    8.00     0.00  object directory
     2     1K      1K   3.00K   1.50K    1.00     0.00  object array
     1    16K   1.50K   4.50K   4.50K   10.67     0.00  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
     1    16K   3.00K   9.00K   9.00K    5.33     0.00  bplist
     -      -       -       -       -       -        -  bplist header
     -      -       -       -       -       -        -  SPA space map header
   373  2.14M    801K   2.35M   6.44K    2.73     0.00  SPA space map
     3  40.0K   40.0K   40.0K   13.3K    1.00     0.00  ZIL intent log
   552  8.62M   2.40M   4.82M   8.94K    3.60     0.00  DMU dnode
     8     8K      4K   8.50K   1.06K    2.00     0.00  DMU objset
     -      -       -       -       -       -        -  DSL directory
     8     4K      4K   12.0K   1.50K    1.00     0.00  DSL directory child map
     7  3.50K   3.50K   10.5K   1.50K    1.00     0.00  DSL dataset snap map
    15   225K   25.0K   75.0K   5.00K    8.98     0.00  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS V0 ACL
 3.56M   451G    451G    451G    127K    1.00   100.00  ZFS plain file
 1.55K   9.9M   1.51M   3.03M   1.95K    6.55     0.00  ZFS directory
     7  3.50K   3.50K   7.00K      1K    1.00     0.00  ZFS master node
    40   550K   87.0K    174K   4.35K    6.32     0.00  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     1    512     512   1.50K   1.50K    1.00     0.00  other ZAP
     -      -       -       -       -       -        -  persistent error log
     1   128K   10.0K   30.0K   30.0K   12.80     0.00  SPA history
     -      -       -       -       -