Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Stephan Budach
Hi Edward,

well, that was exactly my point when I raised this question. If zfs send is 
able to identify corrupted files while it transfers a snapshot, why shouldn't 
scrub be able to do the same?

ZFS send quit with an I/O error and zpool status -v showed me the file that 
indeed had problems. Since I thought that zfs send also operates on the block 
level, I wondered whether scrub would basically do the same thing.

On the other hand scrub really doesn't care about what to read from the device 
- it simply reads all blocks, which is not the case when running zfs send.

Maybe zfs send could just go on and not halt on an I/O error, and instead 
just print out the errors…

Cheers,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZPool creation brings down the host

2010-10-06 Thread James C. McPherson

On  7/10/10 03:46 PM, Ramesh Babu wrote:

I am trying to create a ZPool using a single Veritas volume. The host is going
down as soon as I issue the zpool create command. It looks like the command is
crashing and bringing the host down. Please let me know what the issue might
be. Below is the command used; textvol is the Veritas volume and testpool
is the name of the pool which I am trying to create.

zpool create testpool /dev/vx/dsk/dom/textvol



That's not a configuration that I'd recommend - you're layering
one volume management system on top of another. It seems that
it's getting rather messy inside the kernel.


Do you have a panic stack trace we can look at, and/or a
crash dump?
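
In case it helps, a minimal sketch of how one might capture that on
Solaris/OpenSolaris - the file names here are examples and the exact steps
vary by release:

  # confirm a dump device is configured and savecore is enabled
  dumpadm

  # after the panic reboot, expand the saved (compressed) dump
  cd /var/crash/`hostname`
  savecore -vf vmdump.0

  # print the panic summary and stack trace from the saved dump
  echo '::status' | mdb unix.0 vmcore.0
  echo '$C' | mdb unix.0 vmcore.0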



James C. McPherson
--
Oracle
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bursty writes - why?

2010-10-06 Thread Eff Norwood
The NFS client that we're using always uses O_SYNC, which is why it was 
critical for us to use the DDRdrive X1 as the ZIL. I was unclear about the full 
system we're using; my apologies. It is:

OpenSolaris SNV_134
Motherboard: SuperMicro X8DAH
RAM: 72GB
CPU: Dual Intel 5503 @ 2.0GHz
ZIL: DDRdrive X1 (two of these, independent and not mirrored)
Drives: 24 x Seagate 1TB SAS, 7200 RPM
Network connected via 3 x gigabit links as LACP + 1 gigabit backup, IPMP on top 
of those.

The output I posted is from zpool iostat and I used that because it corresponds 
to what users are seeing. Whenever zpool iostat shows write activity, the file 
copies to the system are working as expected. As soon as zpool iostat shows no 
activity, the writes all pause. The simple test case is to copy a CD-ROM ISO 
image to the server while running zpool iostat.
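
For reference, a minimal sketch of that test case - pool name and paths are
placeholders:

  # watch pool throughput once per second; each output line is one interval
  zpool iostat xpool 1

  # ...and in another shell, from an NFS client, copy an ISO into the share
  cp install.iso /net/fileserver/xpool/share/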
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZPool creation brings down the host

2010-10-06 Thread Ramesh Babu
I am trying to create a ZPool using a single Veritas volume. The host is going
down as soon as I issue the zpool create command. It looks like the command is
crashing and bringing the host down. Please let me know what the issue might
be. Below is the command used; textvol is the Veritas volume and testpool
is the name of the pool which I am trying to create.

zpool create testpool /dev/vx/dsk/dom/textvol
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Eric D. Mudama

On Wed, Oct  6 at 22:04, Edward Ned Harvey wrote:

* Because ZFS automatically buffers writes in ram in order to
aggregate as previously mentioned, the hardware WB cache is not
beneficial.  There is one exception.  If you are doing sync writes
to spindle disks, and you don't have a dedicated log device, then
the WB cache will benefit you, approx half as much as you would
benefit by adding dedicated log device.  The sync write sort-of
by-passes the ram buffer, and that's the reason why the WB is able
to do some good in the case of sync writes.


All of your comments made sense except for this one.

Every N seconds when the system decides to burst writes to media from
RAM, those writes are only sequential in the case where the underlying
storage devices are significantly empty.

Once you're in a situation where your allocations are scattered across
the disk due to longer-term fragmentation, I don't see any way that a
write cache would hurt performance on the devices, since it'd allow
the drive to reorder writes to the media within that burst of data.

Even though ZFS is issuing writes of ~256 sectors if it can, that is
only a fraction of a revolution on a modern drive, so random writes of
128KB still have significant opportunity for reordering optimization.

Granted, with NCQ or TCQ you can get back much of the cache-disabled
performance loss, however, in any system that implements an internal
queue depth greater than the protocol-allowed queue depth, there is
opportunity for improvement, to an asymptotic limit driven by servo
settle speed.

Obviously this performance improvement comes with the standard WB
risks, and YMMV, IANAL, etc.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Stephan Budach
Hi Edward,

these are interesting points. I have considered a couple of them, when I 
started playing around with ZFS.

I am not sure whether I disagree with all of your points, but I conducted a 
couple of tests, where I configured my raids as JBODs and mapped each drive out 
as a separate LUN, and I couldn't notice a difference in performance in any way.

I'd love to discuss this in a separate thread, but first I will have to check 
the archives and Google. ;)

Thanks,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Stephan Budach
> 
> Now, scrub would reveal corrupted blocks on the devices, but is there a
> way to identify damaged files as well?

I saw a lot of people offering the same knee-jerk reaction that I had:
"Scrub."  And that is the only correct answer, to make a best effort at
salvaging data.  But I think there is a valid question here which was
neglected.

*Does* scrub produce a list of all the names of all the corrupted files?
And if so, how does it do that?

If scrub is operating at a block-level (and I think it is), then how can
checksum failures be mapped to file names?  For example, this is a
long-requested feature of "zfs send" which is fundamentally difficult or
impossible to implement.

Zfs send operates at a block level.  And there is a desire to produce a list
of all the incrementally changed files in a zfs incremental send, but no
capability of doing that.

It seems, if scrub is able to list the names of files that correspond to
corrupted blocks, then zfs send should be able to list the names of files
that correspond to changed blocks, right?

I am reaching the opposite conclusion of what's already been said.  I think
you should scrub, but don't expect file names as a result.  I think if you
want file names, then tar > /dev/null will be your best friend.

I didn't answer anything at first, cuz I was hoping somebody would have that
answer.  I only know that I don't know, and the above is my best guess.
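
For what it's worth, a minimal sketch of the tar approach - it reads every
file and names any it cannot read (the mountpoint is taken from the earlier
zpool status output; adjust to taste):

  # read every file and throw the data away; errors identify unreadable files
  cd /obelixData/JvMpreprint
  tar cf - . > /dev/null 2> /tmp/unreadable-files.txt
  cat /tmp/unreadable-files.txt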

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Stephan Budach
> 
> Ian,
> 
> yes, although these vdevs are FC raids themselves, so the risk is… uhm…
> calculated.

Whenever possible, you should always JBOD the storage and let ZFS manage the 
raid, for several reasons.  (See below).  Also, as counter-intuitive as this 
sounds (see below) you should disable hardware write-back cache (even with BBU) 
because it hurts performance in any of these situations:  (a) Disable WB if you 
have access to SSD or other nonvolatile dedicated log device.  (b) Disable WB 
if you know all of your writes to be async mode and not sync mode.  (c) Disable 
WB if you've opted to disable ZIL.

* Hardware raid blindly assumes the redundant data written to disk is written 
correctly.  So later, if you experience a checksum error (such as you have) 
then it's impossible for ZFS to correct it.  The hardware raid doesn't know a 
checksum error has occurred, and there is no way for the OS to read the "other 
side of the mirror" to attempt correcting the checksum via redundant data.

* ZFS has knowledge of both the filesystem, and the block level devices, while 
hardware raid has only knowledge of block level devices.  Which means ZFS is 
able to optimize performance in ways that hardware cannot possibly do.  For 
example, whenever there are many small writes taking place concurrently, ZFS is 
able to remap the physical disk blocks of those writes, to aggregate them into 
a single sequential write.  Depending on your metric, this yields 1-2 orders of 
magnitude higher IOPS.

* Because ZFS automatically buffers writes in ram in order to aggregate as 
previously mentioned, the hardware WB cache is not beneficial.  There is one 
exception.  If you are doing sync writes to spindle disks, and you don't have a 
dedicated log device, then the WB cache will benefit you, approx half as much 
as you would benefit by adding dedicated log device.  The sync write sort-of 
by-passes the ram buffer, and that's the reason why the WB is able to do some 
good in the case of sync writes.  

Ironically, if you have WB enabled, and you have a SSD log device, then the WB 
hurts you.  You get the best performance with SSD log, and no WB.  Because the 
WB "lies" to the OS, saying some tiny chunk of data has been written... then 
the OS will happily write another tiny chunk, and another, and another.  The WB 
is only buffering a lot of tiny random writes, and in aggregate, it will only 
go as fast as the random writes.  It undermines ZFS's ability to aggregate 
small writes into sequential writes.
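
For reference, a hedged sketch of adding a dedicated log device, which is the
configuration where the write-back cache stops paying off - pool and device
names are placeholders, and mirroring the log device is usually advisable:

  # add an SSD as a dedicated ZIL (log) device, then verify the layout
  zpool add tank log c4t9d0
  zpool status tank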

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of 2-way mirror

2010-10-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Tony MacDoodle
> 
> Is it possible to add 2 disks to increase the size of the pool below?
> 
> NAME        STATE     READ WRITE CKSUM
> testpool    ONLINE       0     0     0
>   mirror-0  ONLINE       0     0     0
>     c1t2d0  ONLINE       0     0     0
>     c1t3d0  ONLINE       0     0     0
>   mirror-1  ONLINE       0     0     0
>     c1t4d0  ONLINE       0     0     0
>     c1t5d0  ONLINE       0     0     0

It's important that you know the difference between "add" and "attach"
methods for increasing this size...

If you "add" another mirror, then you'll have mirror-0, mirror-1, and
mirror-2.  You cannot remove any of the existing devices. 

If you "attach" a larger disk to mirror-0, and possibly fiddle with the
autoexpand property and a little bit of additional futzing (pretty basic,
including resilver & detach the old devices) then you can effectively
replace the existing devices with larger devices.  No need to consume extra
disk bays.

It's all a matter of which is the more desirable outcome for you.
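
A minimal sketch of the "attach" route for the pool shown above - the new,
larger device names are hypothetical:

  # let the vdev grow automatically once both sides are larger
  zpool set autoexpand=on testpool

  # attach a bigger disk to each side of mirror-0, waiting for each resilver
  zpool attach testpool c1t2d0 c1t6d0
  zpool attach testpool c1t3d0 c1t7d0

  # once 'zpool status' shows the resilvers are done, drop the old disks
  zpool detach testpool c1t2d0
  zpool detach testpool c1t3d0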

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bursty writes - why?

2010-10-06 Thread Bob Friesenhahn

On Wed, 6 Oct 2010, Marty Scholes wrote:

If you think about it, this is far more sane than flushing to disk 
every time the write() system call is used.


Yes, it dramatically diminishes the number of copy-on-write writes and 
improves the pool layout efficiency.  It also saves energy.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bursty writes - why?

2010-10-06 Thread Marty Scholes
I think you are seeing ZFS store up the writes, coalesce them, then flush to 
disk every 30 seconds.

Unless the writes are synchronous, the ZIL won't be used, but the writes will 
be cached instead, then flushed.

If you think about it, this is far more sane than flushing to disk every time 
the write() system call is used.
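
A quick way to confirm the flush interval on a live system - a hedged sketch,
since the variable name and its default differ between builds:

  # print the transaction group timeout, in seconds, from the running kernel
  echo "zfs_txg_timeout/D" | mdb -k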
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side

2010-10-06 Thread Nicolas Williams
On Wed, Oct 06, 2010 at 05:19:25PM -0400, Miles Nordin wrote:
> > "nw" == Nicolas Williams  writes:
> 
> nw> *You* stated that your proposal wouldn't allow Windows users
> nw> full control over file permissions.
> 
> me: I have a proposal
> 
> you: op!  OP op, wait!  DOES YOUR PROPOSAL blah blah WINDOWS blah blah
>  COMPLETELY AND EXACTLY LIKE THE CURRENT ONE.
> 
> me: no, but what it does is...

The correct quote is:

"no, not under my proposal."

That's from a post from you on September 30, 2010, with Message-Id:
.  That was a direct answer to a
direct question.

Now, maybe you wish to change your view.  That'd be fine.  Do not,
however, imply that I'm a liar, not if you want to be taken
seriously.  Please re-write your proposal _clearly_ and refrain from
personal attacks.

Cheers,

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side

2010-10-06 Thread Miles Nordin
> "nw" == Nicolas Williams  writes:

nw> *You* stated that your proposal wouldn't allow Windows users
nw> full control over file permissions.

me: I have a proposal

you: op!  OP op, wait!  DOES YOUR PROPOSAL blah blah WINDOWS blah blah
 COMPLETELY AND EXACTLY LIKE THE CURRENT ONE.

me: no, but what it does is...

you: well then I don't even have to read it.  It's unacceptable
 because $BLEH.

me: untrue.  My proposal handles $BLEH just fine.

you: you just said it didn't!

me: well, it does.  Please read it.

you: I read it and I don't understand it.  Anyway it doesn't handle
 $BLEH so it's no good.


This is not really working, and concision is the problem.  So, I now,
today, state:

My proposal allows Windows users full control over file permissions.

nw> Yes, that may be.  I encourage you to find a clearer way to
nw> express your proposal.

So far, it's just us talking.  I think I'll wait and see if anyone
besides you reads it.  If so, maybe they can ask questions that help
me clarify it.  If no one does, it's probably not interesting here
anyway.


pgp4wuhrA1SzN.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Miles Nordin
> "dd" == David Dyer-Bennet  writes:

dd> Richard Elling said ZFS handles the 4k real 512byte fake
dd> drives okay now in default setups

There are two steps to handling it well.  one is to align the start of
partitions to 4kB, and apparently on Solaris (thanks to all the
cumbersome partitioning tools) that is done.  On Linux you often have
to really pay attention to make this happen, depending on the
partitioning tool that happens to be built into your ``distro'' or
whatever.

The second step is to never write anything smaller than 4kB.  ex., if
you want to write 0.5kB, pad it with 3.5kB of zeroes to avoid the
read-modify-write penalty.  AIUI that is not done yet, and zfs does
sometimes want to write 0.5kB.  When it's writing 128kB of course
there is no penalty.  For this, I think XFS and NTFS are actually
better and tend not to write the small blocks, but I could be wrong.
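
As a quick sanity check, assuming the drive reports 512-byte logical sectors,
a slice is 4 KB-aligned when its first sector is a multiple of 8 - the device
name is a placeholder:

  # list each slice's first sector and whether it sits on a 4 KB boundary
  prtvtoc /dev/rdsk/c1t0d0s2 | \
    awk '$1 ~ /^[0-9]+$/ { print $1, $4, ($4 % 8) ? "NOT aligned" : "aligned" }'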


pgpn3kSSlfThy.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Bursty writes - why?

2010-10-06 Thread Eff Norwood
I have a 24 x 1TB system being used as an NFS file server. Seagate SAS disks 
connected via an LSI 9211-8i SAS controller, disk layout 2 x 11 disk RAIDZ2 + 2 
spares. I am using 2 x DDRdrive X1s as the ZIL. When we write anything to it, 
the writes are always very bursty, like this:

xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0    232      0  29.0M
xpool        488K  20.0T      0    101      0  12.7M
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0     50      0  6.37M
xpool        488K  20.0T      0    477      0  59.7M
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool       74.7M  20.0T      0    702      0  76.2M
xpool       74.7M  20.0T      0    577      0  72.2M
xpool       74.7M  20.0T      0    110      0  13.9M
xpool       74.7M  20.0T      0      0      0      0
xpool       74.7M  20.0T      0      0      0      0
xpool       74.7M  20.0T      0      0      0      0
xpool       74.7M  20.0T      0      0      0      0

Whenever you see 0 the write is just hanging. What I would like to see is at 
least some writing happening every second. What can I look at for this issue?

Thanks
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Miles Nordin
> "ag" == Andrew Gabriel  writes:

ag> Having now read a number of forums about these, there's a
ag> strong feeling WD screwed up by not providing a switch to
ag> disable pseudo 512b access so you can use the 4k native.

this reporting lie is no different from SSD's which have 2 - 8 kB
sectors on the inside and benefit from alignment.  I think probably
everything will report 512 byte sectors forever.  If a device had a
4224-byte sector, it would make sense to report that, but I don't see
a big downside to reporting 512 when it's really 4096.

NAND flash often does have sectors with odd sizes like 4224, and (some
of) Linux's NAND-friendly filesystems (ubifs, yaffs, nilfs) use this
OOB area for filesystem structures, which are intermixed with the ECC.
but in that case it's not a SCSI interface to the odd-sized
sector---it's an ``mtd'' interface that supports operations like
``erase page'', ``suspend erasing'', ``erase some more''.

that said I am in the ``ignore WD for now'' camp.  but this isn't why.
Ignore them (among other, better reasons) because they have 4k sectors
at all which don't yet work well until we can teach ZFS to never write
smaller than 4kB.  but failure to report 4k as SCSI 4kB sector is not
a problem, to my view.  You can just align your partitions.


pgp6jwIDoUJ9i.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Simon Breden
> Hi all
> 
> I just discovered WD Black drives are rumored not to
> be set to allow TLER.

Yep: http://opensolaris.org/jive/message.jspa?messageID=501159#501159

> Enterprise drives will cost
> about 60% more, and on a large install, that means a
> lot of money...

True, sometimes more than twice the price.

If these are for a business, personally I would invest in TLER-capable drives 
like the WD REx models (RAID Edition). These allow for fast fails on read/write 
errors so that the data can be remapped. This prevents the possibility of the 
drive being kicked from the array.

If these are for home and you can't, or are not willing to, spend a lot more 
on TLER-capable drives, then go for something reliable. Forget WD Green drives 
(see links below). After WD removed TLER-setting on their non-enterprise 
drives, I have switched to Samsung HD203WI drives and so far these have been 
flawless. I believe it's a 4-platter model. Samsung have very recently (last 
month?) brought out the HD204UI, a 3-platter (667GB per platter) model, which 
should be even better -- check the newegg ratings for good/bad 
news etc.

http://opensolaris.org/jive/thread.jspa?threadID=121871&tstart=0
http://breden.org.uk/2009/05/01/home-fileserver-a-year-in-zfs/#drives
http://jmlittle.blogspot.com/2010/03/wd-caviar-green-drives-and-zfs.html 

Cheers,
Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side

2010-10-06 Thread Nicolas Williams
On Wed, Oct 06, 2010 at 04:38:02PM -0400, Miles Nordin wrote:
> > "nw" == Nicolas Williams  writes:
> 
> nw> The current system fails closed 
> 
> wrong.
> 
> $ touch t0
> $ chmod 444 t0
> $ chmod A0+user:$(id -nu):write_data:allow t0
> $ ls -l t0
> -r--r--r--+  1 carton   carton 0 Oct  6 20:22 t0
> 
> now go to an NFSv3 client:
> $ ls -l t0
> -r--r--r-- 1 carton 405 0 2010-10-06 16:26 t0
> $ echo lala > t0
> $ 
> 
> wide open.

The system does what the ACL says.  The mode fails to accurately
represent the actual access because... the mode can't.  Now, we could
have chosen (and still could choose) to represent the presence of ACEs
for subjects other than owner@/group@/everyone@ by using the group bits
of the mode to represent the maximal set of permissions granted.

But I don't consider the above "failing open".

> nw> You seem to be in denial.  You continue to ignore the
> nw> constraint that Windows clients must be able to fully control
> nw> permissions in spite of their inability to perceive and modify
> nw> file modes.
> 
> You remain unshakably certain that this is true of my proposal in
> spite of the fact that you've said clearly that you don't understand
> my proposal.  That's bad science.

*You* stated that your proposal wouldn't allow Windows users full
control over file permissions.

> It may be my fault that you don't understand it: maybe I need to write
> something shorter but just as expressive to fit within mailing list
> attention spans, or maybe my examples are unclear.  However that
> doesn't mean that I'm in denial nor make you right---that just makes
> me annoying.

Yes, that may be.  I encourage you to find a clearer way to express your
proposal.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side

2010-10-06 Thread Miles Nordin
> "nw" == Nicolas Williams  writes:

nw> The current system fails closed 

wrong.

$ touch t0
$ chmod 444 t0
$ chmod A0+user:$(id -nu):write_data:allow t0
$ ls -l t0
-r--r--r--+  1 carton   carton 0 Oct  6 20:22 t0

now go to an NFSv3 client:
$ ls -l t0
-r--r--r-- 1 carton 405 0 2010-10-06 16:26 t0
$ echo lala > t0
$ 

wide open.

NFSv3 and SMB sharing the same dataset is a use-case you claim to
accommodate.  This case fails open once Windows users start adding
'allow' ACLs.  It's not a corner case; it's a design that fails open.

 >> ever had 777 it would send a SIGWTF to any AFS-unaware
 >> graybeards

nw> A signal?!  How would that work when the entity doing a chmod
nw> is on a remote NFS client?

please find SIGWTF under 'kill -l' and you might understand what I
meant.

nw> You seem to be in denial.  You continue to ignore the
nw> constraint that Windows clients must be able to fully control
nw> permissions in spite of their inability to perceive and modify
nw> file modes.

You remain unshakably certain that this is true of my proposal in
spite of the fact that you've said clearly that you don't understand
my proposal.  That's bad science.

It may be my fault that you don't understand it: maybe I need to write
something shorter but just as expressive to fit within mailing list
attention spans, or maybe my examples are unclear.  However that
doesn't mean that I'm in denial nor make you right---that just makes
me annoying.


-- 
READ CAREFULLY. By reading this fortune, you agree, on behalf of your employer,
to release me from all obligations and waivers arising from any and all
NON-NEGOTIATED  agreements, licenses, terms-of-service, shrinkwrap, clickwrap,
browsewrap, confidentiality, non-disclosure, non-compete and acceptable use
policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its
partners, licensors, agents and assigns, in perpetuity, without prejudice to my
ongoing rights and privileges. You further represent that you have the
authority to release me from any BOGUS AGREEMENTS on behalf of your employer.


pgpvrZFYgaHat.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Stephan Budach
Ian,

yes, although these vdevs are FC raids themselves, so the risk is… uhm… 
calculated.

Unfortunately, one of the devices seems to have some issues, as stated in my 
previous post.
I will, nevertheless, add redundancy to my pool asap.

Thanks,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Stephan Budach
Hi Cindy,

thanks for bringing that to my attention. I checked fmdump and found a lot of 
these entries:


Okt 06 2010 17:52:12.862812483 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
class = ereport.io.scsi.cmd.disk.tran
ena = 0x514dc67d57e1
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = 
/p...@0,0/pci8086,3...@7/pci1077,1...@0,1/f...@0,0/d...@w21d02305ff42,0
(end detector)

driver-assessment = retry
op-code = 0x88
cdb = 0x88 0x0 0x0 0x0 0x0 0x2 0xac 0xd4 0x3d 0x80 0x0 0x0 0x0 0x80 0x0 
0x0
pkt-reason = 0x3
pkt-state = 0x0
pkt-stats = 0x20
__ttl = 0x1
__tod = 0x4cac9b2c 0x336d7943

Okt 06 2010 17:52:12.862813713 ereport.io.scsi.cmd.disk.recovered
nvlist version: 0
class = ereport.io.scsi.cmd.disk.recovered
ena = 0x514dc67d57e1
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = 
/p...@0,0/pci8086,3...@7/pci1077,1...@0,1/f...@0,0/d...@w21d02305ff42,0
devid = id1,s...@n600d02310005ff42712ab96c
(end detector)

driver-assessment = recovered
op-code = 0x88
cdb = 0x88 0x0 0x0 0x0 0x0 0x2 0xac 0xd4 0x3d 0x80 0x0 0x0 0x0 0x80 0x0 
0x0
pkt-reason = 0x0
pkt-state = 0x1f
pkt-stats = 0x0
__ttl = 0x1
__tod = 0x4cac9b2c 0x336d7e11

Googling about these errors brought me directly to this document:

http://dsc.sun.com/solaris/articles/scsi_disk_fma2.html

which talks about these scsi errors. Since we're talking FC here, it seems to 
point to some FC issue I have not been aware of. Furthermore, it's always the 
same FC device that shows these errors, so I will try to check the device and 
its connections to the fabric first.

Thanks,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub doesn't finally finish?

2010-10-06 Thread Stephan Budach
Seems like it's really the case that scrub doesn't take into account traffic 
that goes onto the zpool while it's scrubbing away.

After some more time, the scrub finished and everything looks good so far.

Thanks,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of 2-way mirror

2010-10-06 Thread David Dyer-Bennet

On Wed, October 6, 2010 14:14, Tony MacDoodle wrote:
> Is it possible to add 2 disks to increase the size of the pool below?
>
> NAME        STATE     READ WRITE CKSUM
> testpool    ONLINE       0     0     0
>   mirror-0  ONLINE       0     0     0
>     c1t2d0  ONLINE       0     0     0
>     c1t3d0  ONLINE       0     0     0
>   mirror-1  ONLINE       0     0     0
>     c1t4d0  ONLINE       0     0     0
>     c1t5d0  ONLINE       0     0     0

You have two ways to increase the size of this pool (sanely).

First, you can add a third mirror vdev.  I think that's what you're
specifically asking about.  You do this with the "zpool add ..." command,
see man page.

Second, you can add (zpool attach) two larger disks to one of the existing
mirror vdevs, wait until the resilvers have finished, and then detach the
two original (smaller) disks.  At that point (with recent versions; with
older versions you have to set a property) the vdev will expand to use the
full capacity of the new larger disks, and that space will become
available in the pool.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Ian Collins

On 10/ 6/10 09:52 PM, Stephan Budach wrote:

Hi,

I recently discovered some - or at least one corrupted file on one of my ZFS 
datasets, which caused an I/O error when trying to send a ZFS snapshot to 
another host:


zpool status -v obelixData
   pool: obelixData
  state: ONLINE
status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
  scrub: none requested
config:

 NAME STATE READ WRITE CKSUM
 obelixData   ONLINE   4 0 0
   c4t21D023038FA8d0  ONLINE   0 0 0
   c4t21D02305FF42d0  ONLINE   4 0 0

   

Are you aware that this is a very dangerous configuration?

Your pool lacks redundancy and you will lose it if one of the devices 
fails.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of 2-way mirror

2010-10-06 Thread Freddie Cash
On Wed, Oct 6, 2010 at 12:14 PM, Tony MacDoodle  wrote:
> Is it possible to add 2 disks to increase the size of the pool below?

Yes.  zpool add testpool mirror devname1 devname2

That will add a third mirror vdev to the pool.

> NAME        STATE     READ WRITE CKSUM
> testpool    ONLINE       0     0     0
>   mirror-0  ONLINE       0     0     0
>     c1t2d0  ONLINE       0     0     0
>     c1t3d0  ONLINE       0     0     0
>   mirror-1  ONLINE       0     0     0
>     c1t4d0  ONLINE       0     0     0
>     c1t5d0  ONLINE       0     0     0

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Cindy Swearingen

Budy,

Your previous zpool status output shows a non-redundant pool with data 
corruption.


You should use the fmdump -eV command to find out the underlying cause
of this corruption.

You can review the hardware-level monitoring tools, here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide
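
For example, a hedged sketch of summarizing what the error log holds before
digging into the verbose output:

  # count error classes in the FMA error log to see which ereport dominates
  fmdump -e | awk 'NR > 1 { print $NF }' | sort | uniq -c | sort -rn

  # then inspect the matching ereports in full detail
  fmdump -eV | less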

Thanks,

Cindy

On 10/06/10 13:09, Stephan Budach wrote:

Well I think that answers my question then: after a successful scrub, zpool 
status -v should list all damaged files on the entire zpool.

I only asked because I read a thread in this forum where one guy had a problem 
with different files, even after a successful scrub.

Thanks,
budy

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Increase size of 2-way mirror

2010-10-06 Thread Tony MacDoodle
Is it possible to add 2 disks to increase the size of the pool below?

NAME        STATE     READ WRITE CKSUM
testpool    ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    c1t2d0  ONLINE       0     0     0
    c1t3d0  ONLINE       0     0     0
  mirror-1  ONLINE       0     0     0
    c1t4d0  ONLINE       0     0     0
    c1t5d0  ONLINE       0     0     0
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Stephan Budach
Well I think that answers my question then: after a successful scrub, zpool 
status -v should list all damaged files on the entire zpool.

I only asked because I read a thread in this forum where one guy had a problem 
with different files, even after a successful scrub.

Thanks,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Roy Sigurd Karlsbakk
- Original Message -
> On Tue, October 5, 2010 17:20, Richard Elling wrote:
> > On Oct 5, 2010, at 2:06 PM, Michael DeMan wrote:
> >>
> >> On Oct 5, 2010, at 1:47 PM, Roy Sigurd Karlsbakk wrote:
> 
> >>> Well, here it's about 60% up and for 150 drives, that makes a wee
> >>> difference...
> 
> >> Understood on 1.6 times cost, especially for quantity 150 drives.
> 
> > One service outage will consume far more in person-hours and
> > downtime than
> > this little bit of money. Penny-wise == Pound-foolish?
> 
> That looks to be true, yes (going back to the actual prices, 150
> drives would cost $6000 extra for the enterprise versions).

I somehow doubt a service outage will consume that much. The drives will be 
carefully distributed in smallish RAIDz2 VDEVs on two separate large systems 
and one small one, and all of them are dedicated backup targets (Bacula uses 
these drives for storing backups). We already have a 50TB setup on mostly 
Green drives, and although I now know that's a terrible idea, it's been running 
stably for about a year with quite constant load.

So really, I believe the chance of non-TLER drives messing this up badly is a 
minor one (and perhaps more importantly, so does my boss).
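
For illustration only, one way such a pool of smallish RAIDz2 vdevs might be
laid out at creation time - vdev width and device names are hypothetical, not
taken from this thread:

  # two 6-disk raidz2 vdevs in a single backup pool
  zpool create backup \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0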

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er 
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
relevante synonymer på norsk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Roy Sigurd Karlsbakk
> > TLER (the ability of the drive to timeout a command)
> 
> I went and got what detailed documentation I could on a couple of the
> Seagate drives last night, and I couldn't find anything on how they
> behaved in that sort of error cases. (I believe TLER is a WD-specific
> term, but I didn't just search, I read them through.)
> 
> So that's inconvenient. How do we find out about that sort of thing?

From http://en.wikipedia.org/wiki/TLER

 Similar technologies are called Error Recovery Control (ERC), used by 
competitor Seagate, and Command Completion Time Limit (CCTL), used by Samsung 
and Hitachi.

I haven't checked which drives have those abilities, though...

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er 
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
relevante synonymer på norsk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs build/test article on anandtech

2010-10-06 Thread SR
http://www.anandtech.com/show/3963/zfs-building-testing-and-benchmarking


I'm curious why nexenta did not perform as well as opensolaris.  Both OS 
versions seem to be the same.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Write retry errors to SSD's on SAS backplane (mpt)

2010-10-06 Thread Ray
Hi,

I came across this exact same problem when using Intel X25-E Extreme 32GB SSD 
disks as ZIL and L2ARC devices in a T5220 server.  Since I didn't see a 
definitive solution here, I opened a support case with Oracle.  They told me to 
upgrade the firmware on my SSD disks and LSI Expander, and the problem went 
away.

Here's the solution they gave me after an analysis of my system:

1. customer is using systemboard 540-7970 (showfru).  According to SSH 
(http://sunsolve.central/handbook_internal/Devices/System_Board/SYSBD_SE_T5120_T5220.html#7970)
 this is 1068E B2 board.

Customer's LSI firmware is at latest already (1.27.02.00).  So, 140952-02 not 
needed.

2.  SSD is at 8850 (diskinfo), need to go to 8855->8862 (143211-01, obsolete by 
143211-02, rev 8862).  No Cougar Card I can see, hence aac drive patch not 
needed.  see README in 143211-02

3.  btw, Customer already at KUP 139555-08 & 140796-01.

So,  here is what I suggest,

Most importantly, the ssd disk fw patch is the most critical as readme from 
patch 143211-02 states the possible bug fixes (CR 6918513 & 6827668). Follow 
the Install Instructions there.

As far as the patch 141043-01 (LSI Expander firmware for 16 Disk backplane on 
Sun SPARC Enterprise T5220 and T5240 platforms), your disk backplane version 
may already be there.  But, there is no way to tell as far as I know until one 
goes through the update process.  Perhaps the customer can skip it for now.  
Otherwise, they can go through it but skip steps 1-3

Install.info from patch 141043-01

1.  # patchadd 126419-02
2.  # patchadd 13-05
3.  # reboot   **After reboot, need to install firmwareflash package for SPARC 
systems
4.  # pkgadd -d 
5.  # firmwareflash -l**To list all available ses devices in the system
6.  # firmwareflash -d  -f 
LSI_X28EXPDR_16DISK_BootRec_REV5-SPARC_Enterprise_T5220+T5240.rxp
7.  # reboot
8.  # You must now power cycle the system to run the new boot record firmware 
just loaded.


So the main thing you need to do is apply firmware upgrade 143211-02, which 
specifically addresses the issue of retryable writes on SSD disks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Jim Dunham
Budy,

> No - not a trick question, but maybe I didn't make myself clear.
> Is there a way to discover such bad files other than trying to actually read 
> from them one by one, say using cp or by sending a snapshot elsewhere?

As noted by your original email, ZFS reports on any corruption using the "zpool 
status" command.

ZFS detects corruption as part of its normal filesystem operations, which may 
be triggered by cp, send-recv, etc., or by a forced reading of the entire 
filesystem by scrub.

> I am well aware that the file shown in  zpool status -v is damaged and I have 
> already restored it, but I wanted to know, if there're more of them.

Assuming that the ZFS filesystem in question is not degrading further (as in a 
disk going bad), upon completion of a successful scrub, zpool status reports 
the complete set of errors for that filesystem. 
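
A minimal sketch of that check, using the pool name from this thread - the
scrub duration depends on pool size and load:

  # force a read and checksum verification of every allocated block
  zpool scrub obelixData

  # once 'zpool status' shows the scrub has finished, list files with
  # permanent errors
  zpool status -v obelixData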

- Jim

> 
> Regards,
> budy
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub doesn't finally finish?

2010-10-06 Thread Stephan Budach
Yes - that may well be. There was data going onto the device while scrub has 
been running. Especially large zfs receives had been going on.

It'd be odd if that was the case, though.

Cheers,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Scott Meilicke
Scrub?

On Oct 6, 2010, at 6:48 AM, Stephan Budach wrote:

> No - not a trick question, but maybe I didn't make myself clear.
> Is there a way to discover such bad files other than trying to actually read 
> from them one by one, say using cp or by sending a snapshot elsewhere?
> 
> I am well aware that the file shown in  zpool status -v is damaged and I have 
> already restored it, but I wanted to know, if there're more of them.
> 
> Regards,
> budy
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread David Dyer-Bennet

On Tue, October 5, 2010 16:47, casper@sun.com wrote:
>
>
>>My immediate reaction to this is "time to avoid WD drives for a while";
>>until things shake out and we know what's what reliably.
>>
>>But, um, what do we know about say the Seagate Barracuda 7200.12 ($70),
>>the SAMSUNG Spinpoint F3 1TB ($75), or the HITACHI Deskstar 1TB 3.5"
>>($70)?
>
>
> I've seen several important features when selecting a drive for
> a mirror:
>
>   TLER (the ability of the drive to timeout a command)

I went and got what detailed documentation I could on a couple of the
Seagate drives last night, and I couldn't find anything on how they
behaved in that sort of error cases.  (I believe TLER is a WD-specific
term, but I didn't just search, I read them through.)

So that's inconvenient.  How do we find out about that sort of thing?

>   sector size (native vs virtual)

Richard Elling said ZFS handles the 4k real 512byte fake drives okay now
in default setups; but somebody immediately asked for version info, so I'm
still watching this one.

>   power use (specifically at home)

Hadn't thought about that.  But when I'm upgrading drives, I figure I'm
always going to come out better on power than when I started.

>   performance (mostly for work)

I can't bring myself to buy below 7200RPM, but it's probably foolish
(except that other obnoxious features tend to come in the "green" drives).

>   price

Yeah, well.  I'm cheap.

> I've heard scary stories about a mismatch of the native sector size and
> unaligned Solaris partitions (4K sectors, unaligned cylinder).

So have I.  Sounds like you get read-modify-write actions for non-aligned
accesses.

I hope the next generation of drives admit to being 4k sectors, and that
ZFS will be prepared to use them sensibly.  But I'm not sure I'm willing
to wait for that; the oldest drives in my box are now 4 years old, and I'm
about ready for the next capacity upgrade.

> I was pretty happy with the WD drives (except for the one with a seriously
> broken cache) but I see the reasons not to pick WD drives over the 1TB
> range.

And the big ones are what pretty much everybody is using at home. 
Capacity and price are vastly more important than performance for most of
us.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there a way to limit ZFS File Data but maintain room for the ARC to cache metadata

2010-10-06 Thread David Blasingame Oracle
Good idea.  Provides options, but it would be nice to be able to set a 
low water mark on what can be taken away from the arc metadata cache 
without having to have something like an SSD.


Dave

On 10/01/10 14:02, Freddie Cash wrote:

On Fri, Oct 1, 2010 at 11:46 AM, David Blasingame Oracle wrote:

I'm working on this scenario in which file system activity appears to cause
the arc cache to evict metadata.  I would like to have a preference to keep
the metadata in cache over ZFS file data.

What I've noticed is that on import of a zpool the arc_meta_used goes up
significantly.  ZFS metadata operations usually run pretty good.  However,
over time with I/O operations the cache gets evicted and arc_no_grow gets
set.

So, I would like to limit the amount of ZFS file data that can be used and
keep the arc cache warm with metadata.  Any suggestions?

Would adding a cache device (L2ARC) and setting primarycache=metadata
and secondarycache=all on the root dataset do what you need?

That way ARC is used strictly for metadata, and L2ARC is used for metadata+data.
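
For reference, a minimal sketch of Freddie's suggestion - pool, dataset and
cache-device names are placeholders:

  # add an SSD as an L2ARC cache device
  zpool add tank cache c2t0d0

  # keep RAM (ARC) for metadata only; let the L2ARC hold metadata and data
  zfs set primarycache=metadata tank
  zfs set secondarycache=all tank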



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread David Dyer-Bennet

On Tue, October 5, 2010 17:20, Richard Elling wrote:
> On Oct 5, 2010, at 2:06 PM, Michael DeMan wrote:
>>
>> On Oct 5, 2010, at 1:47 PM, Roy Sigurd Karlsbakk wrote:

>>> Well, here it's about 60% up and for 150 drives, that makes a wee
>>> difference...

>> Understood on 1.6  times cost, especially for quantity 150 drives.

> One service outage will consume far more in person-hours and downtime than
> this little bit of money.  Penny-wise == Pound-foolish?

That looks to be true, yes (going back to the actual prices, 150 drives
would cost $6000 extra for the enterprise versions).

It's still quite annoying to be jerked around by people charging 60% extra
for changing a timeout in the firmware, and carefully making it NOT
user-alterable.

Also, the non-TLER versions are a constant threat to anybody running home
systems, who might quite reasonably think they could put those in a home
server.

(Yeah, I know the enterprise versions have other differences.  I'm not
nearly so sure I CARE about the other differences, in the size servers I'm
working with.)
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub doesn't finally finish?

2010-10-06 Thread Marty Scholes
Have you had a lot of activity since the scrub started?

I have noticed what appears to be extra I/O at the end of a scrub when activity 
took place during the scrub.  It's as if the scrub estimator does not take the 
extra activity into account.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel panic after upgrading from snv_138 to snv_140

2010-10-06 Thread Thorsten Heit
Hi,

my machine is a HP ProLiant ML350 G5 with 2 quad-core Xeons, 32GB RAM and a HP 
SmartArray E200i RAID controller with 3x160 and 3x500GB SATA discs connected to 
it. Two of the 160GB discs form the mirrored root pool (rpool), the third 
serves as a temporary data pool called "tank", and the three 500GB discs form a 
RAIDZ1 pool called "daten".

So far I successfully upgraded from OpenSolaris b134 to b138 by manually 
building ONNV. Recently I built b140, installed it, but unfortunately booting 
results in a kernel panic:

...
NOTICE: zfs_parse_bootfs: error 22
Cannot mount root on rpool/187 fstype zfs

panic[cpu0]/thread=fbc2f660: vfs_mountroot: cannot mount root

fbc71ba0 genunix:vfs_mountroot+32e ()
fbc71bd0 genunix:main+136 ()
fbc71be0 unix:_locore_start+92 ()

panic: entering debugger (no dump device, continue to reboot)

Welcome to kmdb
Loaded modules: [ scsi_vhci mac uppc sd unix zfs krtld genunix specfs pcplusmp 
cpu.generic ]
[0]>


Before the above attempt with b140, I tried to upgrade to OpenIndiana, but had 
quite the same problem; OI doesn't boot either. See 
http://openindiana.org/pipermail/openindiana-discuss/2010-September/000504.html

Any ideas what is causing this kernel panic?


Regards

Thorsten
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Stephan Budach
No - not a trick question, but maybe I didn't make myself clear.
Is there a way to discover such bad files other than trying to actually read 
from them one by one, say using cp or by sending a snapshot elsewhere?

I am well aware that the file shown in  zpool status -v is damaged and I have 
already restored it, but I wanted to know, if there're more of them.

Regards,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Tomas Ögren
On 06 October, 2010 - Stephan Budach sent me these 2,1K bytes:

> Hi,
> 
> I recently discovered some - or at least one corrupted file on one of my ZFS 
> datasets, which caused an I/O error when trying to send a ZFS snapshot to 
> another host:
> 
> 
> zpool status -v obelixData
>   pool: obelixData
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
> entire pool from backup.
>see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
> 
> NAME STATE READ WRITE CKSUM
> obelixData   ONLINE   4 0 0
>   c4t21D023038FA8d0  ONLINE   0 0 0
>   c4t21D02305FF42d0  ONLINE   4 0 0
> 
> errors: Permanent errors have been detected in the following files:
> 
> <0x949>:<0x12b9b9>
> 
> obelixData/jvmprepr...@2010-10-02_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in 
> CI vor ET 10.6.2010/13404_41_07008 Estate 
> HandelsMarketing/Dealer_Launch_Invitations 
> Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
> 
> obelixData/jvmprepr...@backupsnapshot_2010-10-05-08:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ
>  in CI vor ET 10.6.2010/13404_41_07008 Estate 
> HandelsMarketing/Dealer_Launch_Invitations 
> Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
> 
> obelixData/jvmprepr...@2010-09-24_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in 
> CI vor 6_210/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations 
> Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
> /obelixData/JvMpreprint/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor 
> ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations 
> Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
> 
> Now, scrub would reveal corrupted blocks on the devices, but is there a way 
> to identify damaged files as well?

Is this a trick question or something? The filenames are right over
your question..?

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] scrub doesn't finally finish?

2010-10-06 Thread Stephan Budach
Hi all,

I have issued a scrub on a pool that consists of two independent FC raids. The 
scrub has been running for approx. 25 hrs and then showed 100%, but there's 
still incredible traffic going on on one of the FC raids, plus zpool status -v 
reports that scrub is still running:


zpool status -v backupPool_01
  pool: backupPool_01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 26h45m, 100,00% done, 0h0m to go
config:

NAME                   STATE     READ WRITE CKSUM
backupPool_01          ONLINE       0     0     0
  c3t211378AC0271d0    ONLINE       0     0    26  2,11M repaired
  c3t211378AC026Ed0    ONLINE       0     0     0

errors: No known data errors

So, what is scrub still doing to the upper vdev? Is there anywhere where I can 
get more information about what scrub is still doing?

Thanks,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Phil Harman
www.solarisinternals.com has always been a community. It never was hosted by 
Sun, and it's not hosted by Oracle. True, many of the contributors were Sun 
employees, but not so many remain at Oracle. If it's out of date, I suspect 
that's because the original contributors are too busy doing other fun things. 
However, it is a wiki, so YOU can apply for a login and edit it if you have 
something useful to share :)

On 6 Oct 2010, at 02:36, Michael DeMan  wrote:

> Hi upfront, and thanks for the valuable information.
> 
> 
> On Oct 5, 2010, at 4:12 PM, Peter Jeremy wrote:
> 
>>> Another annoying thing with the whole 4K sector size, is what happens
>>> when you need to replace drives next year, or the year after?
>> 
>> About the only mitigation needed is to ensure that any partitioning is
>> based on multiples of 4KB.
> 
> I agree, but to be quite honest, I have no clue how to do this with ZFS.  It 
> seems that it should be something under the regular tuning documenation.  
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> 
> 
> Is it going to be the case that basic information about how to deal with 
> common scenarios like this is no longer going to be publicly available, and 
> Oracle will simply keep it 'close to the vest', with the relevant information 
> simply available for those who choose to research it themselves, or only 
> available to those with certain levels of support contracts from Oracle?
> 
> To put it another way - does the community that uses ZFS need to fork 'ZFS 
> Best Practices' and 'ZFZ Evil Tuning' to ensure that it is reasonably up to 
> date?
> 
> Sorry for the somewhat hostile tone in the above, but the changes w/ the merger 
> have demoralized a lot of folks, I think.
> 
> - Mike
> 
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Andrew Gabriel

casper@sun.com wrote:

On Tue, Oct 5, 2010 at 11:49 PM,   wrote:


I'm not sure that that is correct; the drive works on naive clients but I
believe it can reveal its true colors.
  

The drive reports 512 byte sectors to all hosts. AFAIK there's no way
to make it report 4k sectors.




Too bad because it makes it less useful (specifically because the label 
mentions sectors and if you can use bigger sectors, you can address a 
larger drive).
  


Having now read a number of forums about these, there's a strong feeling 
WD screwed up by not providing a switch to disable pseudo 512b access so 
you can use the 4k native sectors. The industry as a whole will transition to 
a 4k sector size over the next few years, but these first 4k-sector HDs are 
rather less useful with 4k-sector-aware OSes. Let's hope other 
manufacturers get this right in their first 4k products.


--
Andrew Gabriel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Finding corrupted files

2010-10-06 Thread Stephan Budach
Hi,

I recently discovered some - or at least one corrupted file on one of my ZFS 
datasets, which caused an I/O error when trying to send a ZFS snapshot to 
another host:


zpool status -v obelixData
  pool: obelixData
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
obelixData   ONLINE   4 0 0
  c4t21D023038FA8d0  ONLINE   0 0 0
  c4t21D02305FF42d0  ONLINE   4 0 0

errors: Permanent errors have been detected in the following files:

<0x949>:<0x12b9b9>

obelixData/jvmprepr...@2010-10-02_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in 
CI vor ET 10.6.2010/13404_41_07008 Estate 
HandelsMarketing/Dealer_Launch_Invitations 
Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps

obelixData/jvmprepr...@backupsnapshot_2010-10-05-08:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ
 in CI vor ET 10.6.2010/13404_41_07008 Estate 
HandelsMarketing/Dealer_Launch_Invitations 
Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps

obelixData/jvmprepr...@2010-09-24_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in 
CI vor 6_210/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations 
Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
/obelixData/JvMpreprint/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor ET 
10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations 
Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps

Now, scrub would reveal corrupted blocks on the devices, but is there a way to 
identify damaged files as well?

Thanks,
budy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS crypto bug status change

2010-10-06 Thread Darren J Moffat

On 05/10/2010 20:14, Miles Nordin wrote:

I'm glad it wasn't my project, though.  If I were in Darren's place
I'd have signed on to work for an open-source company, spent seven
years of my life working on something, delaying it and pushing hard to
make it a generation beyond other filesystem crypto, and then when I'm
finally done,.


Please don't speculate, nobody but me and a very few others inside 
Oracle have all the facts of why this integrated when it did; and I'm 
not going to give all the details here because it is neither relevant 
nor appropriate.


For the record, I didn't sign on to an open-source company; I joined Sun 
many many years before OpenSolaris (in 1996 in fact). I didn't even join 
initially as a developer: I was in SunService doing backline support and 
a little sustaining engineering for Trusted Solaris 1.x (the SunOS 4.1.3 
era version).  On the other hand, before I joined Sun I was one of the first 
people to have a working "clone" of the then Trusted Solaris privilege 
system in Linux - for what later became the capabilities system in Linux.


While I appreciate open source I'm not against closed source - if I was 
I wouldn't have joined Sun in 1996 and I wouldn't have had my jobs prior 
to that either (In fact I doubt I'd be in this industry at all).  Just 
because I have and continue to participate in the open where I find it 
appropriate and useful (to me and others) doesn't mean I'm an open 
source or nothing person.  Quite the opposite in fact, open source is a 
"tool" or "means to an end" and one always has to pick the right tool 
for the job at the right time.


I care deeply about software quality and I don't believe the "ra ra" 
that just by being open source makes software better quality or more 
secure.  Many eyes can help find bugs but only if there are actually 
people actively looking.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Casper . Dik

>On Tue, Oct 5, 2010 at 11:49 PM,   wrote:
>> I'm not sure that that is correct; the drive works on naive clients but I
>> believe it can reveal its true colors.
>
>The drive reports 512 byte sectors to all hosts. AFAIK there's no way
>to make it report 4k sectors.


Too bad because it makes it less useful (specifically because the label 
mentions sectors and if you can use bigger sectors, you can address a 
larger drive).

They still have all sizes w/o "Advanced Format" (non EARS/AARS models)

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Brandon High
On Tue, Oct 5, 2010 at 11:49 PM,   wrote:
> I'm not sure that that is correct; the drive works on naive clients but I
> believe it can reveal its true colors.

The drive reports 512 byte sectors to all hosts. AFAIK there's no way
to make it report 4k sectors.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Roy Sigurd Karlsbakk





If you're spending upwards of $30,000 on a storage system, you probably 
shouldn't skimp on the most important component. You might as well be 
complaining that ECC RAM costs more. Don't be ridiculous. For one, this is a 
disk backup system, not a fileserver, and TLER is far from as critical as ECC. 

Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er 
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
relevante synonymer på norsk. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Michael DeMan
Can you give us release numbers that confirm that this is 'automatic'?  It is 
my understanding that the last available public release of OpenSolaris does not 
do this.



On Oct 5, 2010, at 8:52 PM, Richard Elling wrote:

> ZFS already aligns the beginning of data areas to 4KB offsets from the label.
> For modern OpenSolaris and Solaris implementations, the default starting 
> block for partitions is also aligned to 4KB.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss