Re: [zfs-discuss] Wishlist items

2007-06-27 Thread Boyd Adamson

On 26/06/2007, at 12:08 PM, [EMAIL PROTECTED] wrote:

I've been saving up a few wishlist items for zfs. Time to share.

1. A verbose (-v) option to the zfs commandline.

In particular zfs sometimes takes a while to return from zfs  
snapshot -r tank/[EMAIL PROTECTED] in the case where there are a great  
many iscsi shared volumes underneath. A little progress feedback  
would go a long way. In general I feel the zfs tools lack  
sufficient feedback and/or logging of actions, and this'd be a  
great start.


Since IIRC snapshot -r is supposed to be atomic (one TXG) I'm not  
sure that progress reports would be meaningful.


Have you seen zpool history?
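
For reference, a minimal sketch of what that looks like — pool, dataset and
snapshot names here are made up, and the output is abbreviated:

# zpool history tank
History for 'tank':
2007-06-26.12:08:10 zpool create tank c0t0d0
2007-06-26.12:09:02 zfs snapshot -r tank/vol@backup

It records the zpool/zfs commands run against the pool, though it is a log
after the fact rather than live progress feedback.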


2. LUN management and/or general iscsi integration enhancement

Some of these iscsi volumes I'd like to be under the same target  
but with different LUNs. A means for mapping that would be  
excellent. As would a means to specify the IQN explicitly, and the  
set of permitted initiators.


3. zfs rollback on clones. It should be possible to rollback a  
clone to the origin snapshot, yes? Right now the tools won't allow  
it. I know I can hack in a race-sensitive snapshot of the new  
volume immediately after cloning, but I already have many hundreds  
of entities and I'm trying not to proliferate them.


Yes, since rollback only takes the snapshot as an argument, there
seems to be no way to roll back a clone to the fork snapshot.


You could, of course, just blow away the clone and make a new one from
the same snapshot.
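
Something along these lines, with made-up names — a sketch of the workaround,
which of course discards any changes made in the clone since it was created:

# zfs destroy tank/myclone
# zfs clone tank/vol@origin tank/myclone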


Similarly the ability to do zfs send -i [clone origin snapshot1]  
snapshot2 in order to efficiently transmit/backup clones would be  
terrific.


It seems that a way to use [EMAIL PROTECTED] as an alias for  
[EMAIL PROTECTED] would solve both of these problems, at least at the  
user interface level.
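
To make the wish concrete, the kind of invocation being asked for might look
like this (names invented; whether the tools accept the clone's origin as the
incremental source depends on the ZFS version, and at the time of this thread
they did not):

# zfs send -i tank/vol@origin tank/myclone@backup > /backup/myclone.incr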





[zfs-discuss] Re: Re: ZFS - SAN and Raid

2007-06-27 Thread Richard L. Hamilton
 Victor Engle wrote:
  Roshan,
  
  As far as I know, there is no problem at all with
 using SAN storage
  with ZFS and it does look like you were having an
 underlying problem
  with either powerpath or the array.
 
 Correct.  A write failed.
 
  The best practices guide on opensolaris does
 recommend replicated
  pools even if your backend storage is redundant.
 There are at least 2
  good reasons for that. ZFS needs a replica for the
 self healing
  feature to work. Also there is no fsck like tool
 for ZFS so it is a
  good idea to make sure self healing can work.
 
 Yes, currently ZFS on Solaris will panic if a
 non-redundant write fails.
 This is known and being worked on, but there really
 isn't a good solution
 if a write fails, unless you have some ZFS-level
 redundancy.

Why not?  If O_DSYNC applies, a write() can still fail with EIO, right?
And if O_DSYNC does not apply, an app could not assume that the
written data was on stable storage anyway.

Or the write() can just block until the problem is corrected (if correctable)
or the system is rebooted.

In any case, IMO there ought to be some sort of consistent behavior
possible short of a panic.  I've seen UFS-based systems stay up even
with their disks incommunicado for a while, although they were hardly
useful in that state except for activity that only touched already
cached pages.
 
 


[zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance

2007-06-27 Thread Gino
Same problem here (snv_60).
Robert, did you find any solutions?

gino
 
 


[zfs-discuss] Re: ZFS usb keys

2007-06-27 Thread Jürgen Keil
 Shouldn't S10u3 just see the newer on-disk format and
 report that fact, rather than complain it is corrupt?

Yep, I just tried it, and it refuses to zpool import the newer pool,
telling me about the incompatible version.  So I guess the pool
format isn't the correct explanation for Dick Davies' (number9)
problem.



On a S-x86 box running snv_68, ZFS version 7:

# mkfile 256m /home/leo.nobackup/tmp/zpool_test.vdev
# zpool create test_pool /home/leo.nobackup/tmp/zpool_test.vdev
# zpool export test_pool


On a S-sparc box running snv_61, ZFS version 3
(I get the same error on S-x86, running S10U2, ZFS version 2):

# zpool import -d /home/leo.nobackup/tmp/
  pool: test_pool
id: 6231880247307261822
 state: FAULTED
status: The pool is formatted using an incompatible version.
action: The pool cannot be imported.  Access the pool on a system running newer
software, or recreate the pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-A5
config:

test_pool  UNAVAIL   newer version
  /home/leo.nobackup/tmp//zpool_test.vdev  ONLINE
 
 


[zfs-discuss] Re: ZFS usb keys

2007-06-27 Thread William D. Hathaway
It would be really handy if whoever was responsible for the message at:
http://www.sun.com/msg/ZFS-8000-A5
could add data about which zpool versions  are supported at specific OS/patch 
releases.

The current message doesn't help the user figure out how to accomplish their 
implied task, which is to import the pool on a different system.


Adding the version number of the pool that couldn't be imported to the zpool 
import error message would be nice too.


  Shouldn't S10u3 just see the newer on-disk format and
  report that fact, rather than complain it is corrupt?
 
 Yep, I just tried it, and it refuses to zpool import the newer pool,
 telling me about the incompatible version.  So I guess the pool
 format isn't the correct explanation for Dick Davies' (number9) problem.
 
 On a S-x86 box running snv_68, ZFS version 7:
 
 # mkfile 256m /home/leo.nobackup/tmp/zpool_test.vdev
 # zpool create test_pool /home/leo.nobackup/tmp/zpool_test.vdev
 # zpool export test_pool
 
 On a S-sparc box running snv_61, ZFS version 3
 (I get the same error on S-x86, running S10U2, ZFS version 2):
 
 # zpool import -d /home/leo.nobackup/tmp/
   pool: test_pool
     id: 6231880247307261822
  state: FAULTED
 status: The pool is formatted using an incompatible version.
 action: The pool cannot be imported.  Access the pool on a system running newer
         software, or recreate the pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-A5
 config:
 
         test_pool                                  UNAVAIL   newer version
           /home/leo.nobackup/tmp//zpool_test.vdev  ONLINE
 
 


Re: [zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance

2007-06-27 Thread Victor Latushkin

Gino wrote:

Same problem here (snv_60).
Robert, did you find any solutions?


A couple of weeks ago I put together an implementation of space maps which
completely eliminates loops and recursion from the space map alloc
operation, and which makes it quite easy to implement different allocation
strategies (I put together three more). It looks like it works for me on a
Thumper and on my notebook with ZFS root, though I have almost no time to
test it further these days due to year end. I haven't done a SPARC build
yet, and I do not have a test case to test against.

Also, it comes at a price: the other space map operations now take some
extra time (logarithmic, though), and the code is not optimized yet.


victor


Re: [zfs-discuss] Re: ZFS usb keys

2007-06-27 Thread Mark J Musante
On Wed, 27 Jun 2007, Jürgen Keil wrote:

 Yep, I just tried it, and it refuses to zpool import the newer pool,
 telling me about the incompatible version.  So I guess the pool format
 isn't the correct explanation for the Dick Davies' (number9) problem.

Have you tried creating the pool on b61 and importing it into b68?
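
That direction ought to work, since newer software can read older pool
versions. A rough sketch with a file-backed vdev (paths invented):

On the snv_61 box:

# mkfile 256m /var/tmp/zpool_test.vdev
# zpool create test_pool /var/tmp/zpool_test.vdev
# zpool export test_pool

Then, after moving the backing file over to the snv_68 box:

# zpool import -d /var/tmp/ test_pool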


Regards,
markm


Re: [zfs-discuss] New german white paper on ZFS

2007-06-27 Thread Jens Elkner
On Tue, Jun 19, 2007 at 05:19:05PM +0200, Constantin Gonzalez wrote:
Hi,

   http://blogs.sun.com/constantin/entry/new_zfs_white_paper_in

Excellent!!!

I think it would be a pretty good idea to put the links for the
paper and slides on the ZFS Documentation page, aka
http://www.opensolaris.org/os/community/zfs/docs/

Regards,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768


Re: [zfs-discuss] Suggestions on 30 drive configuration?

2007-06-27 Thread Victor Latushkin

Richard Elling wrote:

Rob Logan wrote:

  an array of 30 drives in a RaidZ2 configuration with two hot spares
  I don't want to mirror 15 drives to 15 drives

ok, so space over speed... and are willing to toss somewhere between 4
and 15 drives for protection.

raidz splits the (up to 128k) write/read recordsize across each element of
the raidz set (i.e. all drives must be touched and all must finish
before the block request is complete), so with a 9 disk raidz1 set that's
(8 data + 1 parity (8+1)) or 16k per disk for a full 128k write, or, for
a smaller 4k block, a single 512b sector per disk. On a 26+2 raidz2
set that 4k block would still use only 10 disks (8 data + 2 parity), with
the other 18 disks unneeded but allocated.


It is not so easy to predict.  ZFS will coalesce writes.  A single
transaction group may have many different writes in it.  Also, raidz[12]
is dynamic, and will use what it needs, unlike separate volume managers,
which do not have any understanding of the context of the data.


There is a good slide which illustrates how stripe width is selected
dynamically in RAID-Z. Please see slide 13 in this slide deck:

http://www.snia.org/events/past/sdc2006/zfs_File_Systems-bonwick-moore.pdf

Yes, some space may be wasted (marked by X on the slide), but there are
guidelines for the number of devices in a RAID-Z(2) vdev which allow you
to avoid this waste.

Btw, I believe there's no link to this presentation on opensolaris.org,
unfortunately...


victor


so perhaps three sets of 8+2 would let three blocks be read/written
at once, with a total of 6 disks for protection.

but for twice the speed, six sets of 4+1 would be the same size (same
number of disks for protection), though not quite as safe for its 2x speed.
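
For illustration, a rough sketch of the three-set 8+2 layout (device names
invented; spreading each set across controllers is left to taste, and hot
spares could be added with an extra 'spare' vdev):

# zpool create tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c3t0d0 c3t1d0 \
    raidz2 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c3t2d0 c3t3d0 \
    raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c5t0d0 c5t1d0 c5t2d0 c5t3d0 c3t4d0 c3t5d0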


Yes, you need to follow your priorities; there are just too many options
otherwise.

 -- richard


Re: [zfs-discuss] Re: ZFS usb keys

2007-06-27 Thread Dick Davies

Thanks to everyone for the sanity check - I think
it's a platform issue, but not an endian one.

The stick was originally DOS-formatted, and the zpool was built on the first
fdisk partition. So Sparcs aren't seeing it, but the x86/x64 boxes are.


--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/


Re: [zfs-discuss] Re: ZFS usb keys

2007-06-27 Thread Mike Lee
I had a similar situation between x86 and SPARC involving the pool version
number. When I created the pool on the LOWER rev machine, it was seen by
the HIGHER rev machine. This was a USB HDD, not a stick. I can now move
the drive between boxes.


HTH,
Mike

Dick Davies wrote:

Thanks to everyone for the sanity check - I think
it's a platform issue, but not an endian one.

The stick was originally DOS-formatted, and the zpool was built on the
first fdisk partition. So Sparcs aren't seeing it, but the x86/x64 boxes are.




--
http://www.sun.com/solaris  * Michael Lee *
Area System Support Engineer

*Sun Microsystems, Inc.*
Phone x40782 / 866 877 8350
Email [EMAIL PROTECTED]
http://www.sun.com/solaris



Re: [zfs-discuss] New german white paper on ZFS

2007-06-27 Thread Cindy . Swearingen

Jens,

Someone already added it to the ZFS links page, here:

http://opensolaris.org/os/community/zfs/links/

I just added a link to the links page from the zfs docs page
so it is easier to find.

Thanks,

Cindy

Jens Elkner wrote:

On Tue, Jun 19, 2007 at 05:19:05PM +0200, Constantin Gonzalez wrote:
Hi,

 http://blogs.sun.com/constantin/entry/new_zfs_white_paper_in

Excellent!!!

I think it is a pretty good idea, to put the links for the
paper and slides on the ZFS Documentation page aka
http://www.opensolaris.org/os/community/zfs/docs/

Regards,
jel.



Re: [zfs-discuss] Re: Suggestions on 30 drive configuration?

2007-06-27 Thread Dan Saul

I have 8 SATA on the motherboard, 4 PCI cards with 4 SATA each, one
PCIe 4x sata card with two, and one PCIe 1x with two. The operating
system itself will be on a hard drive attached to one ATA 100
connector.

Kind of like a poor man's data centre, except not that cheap... It
is still estimated to come out at around 6 thousand dollars, which for
that amount of storage these days is actually relatively good.

I've weighed my options and I am thinking that 3 raidz2 sets is the
best balance of data safety to free space.

Also, in case any of you are wondering why I would need space, most of
it will be HDV footage and render files.

Thank you everyone who contributed here you have been of great assistance.

On 6/25/07, Bryan Wagoner [EMAIL PROTECTED] wrote:

What is the controller setup going to look like for the 30 drives? Is it going 
to be fibre channel, SAS, etc. and what will be the Controller-to-Disk ratio?

~Bryan




Re: [zfs-discuss] Re: zfs and 2530 jbod

2007-06-27 Thread Frank Cusack

On June 26, 2007 2:13:54 PM -0700 Joel Miller [EMAIL PROTECTED] wrote:

The 2500 series engineering team is talking with the ZFS folks to
understand the various aspects of delivering a complete solution. (There
is a lot more to it than it seems to work...).


Great news, you made my day!  Any ETA?
-frank


Re: [zfs-discuss] Re: ZFS usb keys

2007-06-27 Thread Matthew Ahrens

William D. Hathaway wrote:

It would be really handy if whoever was responsible for the message at:
http://www.sun.com/msg/ZFS-8000-A5
could add data about which zpool versions  are supported at specific OS/patch 
releases.


Did you look at http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number?

--matt


$ zpool upgrade -v
This system is currently running ZFS pool version 8.

The following versions are supported:

VER  DESCRIPTION
---  
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   pool properties
 7   Separate intent log devices
 8   Delegated administration
For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.


[zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Jef Pearlman
 Jef Pearlman wrote:
  Absent that, I was considering using zfs and just
  having a single pool. My main question is this: what
  is the failure mode of zfs if one of those drives
  either fails completely or has errors? Do I
  permanently lose access to the entire pool? Can I
  attempt to read other data? Can I zfs replace the
  bad drive and get some level of data recovery?
  Otherwise, by pooling drives am I simply increasing
  the probability of a catastrophic data loss? I
  apologize if this is addressed elsewhere -- I've read
  a bunch about zfs, but not come across this
  particular answer.
 
 We generally recommend a single pool, as long as the use case permits.
 But I think you are confused about what a zpool is.  I suggest you look
 at the examples or docs.  A good overview is the slide show:
   http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf

Perhaps I'm not asking my question clearly. I've already experimented a fair 
amount with zfs, including creating and destroying a number of pools with and 
without redundancy, replacing vdevs, etc. Maybe asking by example will clarify 
what I'm looking for or where I've missed the boat. The key is that I want a 
grow-as-you-go heterogenous set of disks in my pool:

Let's say I start with a 40g drive and a 60g drive. I create a non-redundant
pool (which will be 100g). At some later point, I run across an unused 30g
drive, which I add to the pool. Now my pool is 130g. At some point after that,
the 40g drive fails, either by producing read errors or by failing to spin up
at all. What happens to my pool? Can I mount and access it at all (for the data
not on or striped across the 40g drive)? Can I zfs replace the 40g drive with
another drive and have it attempt to copy as much data over as it can? Or am I
just out of luck? zfs seems like a great way to use old/unutilized drives to
expand capacity, but sooner or later one of those drives will fail, and if it
takes out the whole pool (which it might reasonably do), then it doesn't work
out in the end.
 
  As a side-question, does anyone have a suggestion
  for an intelligent way to approach this goal? This is
  not mission-critical data, but I'd prefer not to make
  data loss _more_ probable. Perhaps some volume
  manager (like LVM on linux) has appropriate features?
 
 ZFS, mirrored pool will be the most performant and
 easiest to manage
 with better RAS than a raidz pool.

The problem I've come across with using mirror or raidz for this setup is that 
(as far as I know) you can't add disks to mirror/raidz groups, and if you just 
add the disk to the pool, you end up in the same situation as above (with more 
space but no redundancy).

Thanks for your help.

-Jef
 
 


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Darren Dunham
 Perhaps I'm not asking my question clearly. I've already experimented
 a fair amount with zfs, including creating and destroying a number of
 pools with and without redundancy, replacing vdevs, etc. Maybe asking
 by example will clarify what I'm looking for or where I've missed the
 boat. The key is that I want a grow-as-you-go heterogenous set of
 disks in my pool:

  Let's say I start with a 40g drive and a 60g drive. I create a
 non-redundant pool (which will be 100g). At some later point, I run
 across an unused 30g drive, which I add to the pool. Now my pool is
 130g. At some point after that, the 40g drive fails, either by
 producing read errors or my failing to spin up at all. What happens to
 my pool?

Since you have created a non-redundant pool (or more specifically, a
pool with non-redundant members), the pool will fail.

 The problem I've come across with using mirror or raidz for this setup
 is that (as far as I know) you can't add disks to mirror/raidz groups,
 and if you just add the disk to the pool, you end up in the same
 situation as above (with more space but no redundancy).

You can't add to an existing mirror, but you can add new mirrors (or
raidz) items to the pool.  If so, there's no loss of redundancy.
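
A sketch with invented device names — the pool grows while staying redundant
because each added top-level vdev is itself a mirror:

# zpool create tank mirror c0t0d0 c0t1d0
  ... later, when two more disks turn up ...
# zpool add tank mirror c1t0d0 c1t1d0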

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


[zfs-discuss] Re: [zfs-code] Space allocation failure

2007-06-27 Thread Manoj Joseph

Hi,

In brief, what I am trying to do is to use libzpool to access a zpool - 
like ztest does.


Matthew Ahrens wrote:

Manoj Joseph wrote:

Hi,

Replying to myself again. :)

I see this problem only if I attempt to use a zpool that already 
exists. If I create one (using files instead of devices, don't know if 
it matters) like ztest does, it works like a charm.


You should probably be posting on zfs-discuss.


Switching from zfs-code to zfs-discuss.

The pool you're trying to access is damaged.  It would appear that one 
of the devices can not be written to.


No, AFAIK, the pool is not damaged. But yes, it looks like the device 
can't be written to by the userland zfs.


bash-3.00# zpool import test
bash-3.00# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test85K  1.95G  24.5K  /test
bash-3.00# ./udmu test
 pool: test
 state: ONLINE
 scrub: none requested
 config:

NAMESTATE READ WRITE CKSUM
testONLINE   0 0 0
  c2t0d0ONLINE   0 0 0

errors: No known data errors
Export the pool.
cannot open 'test': no such pool
Import the pool.
error: ZFS: I/O failure (write on unknown off 0: zio 8265d80 [L0 
unallocated] 4000L/400P DVA[0]=0:1000:400 DVA[1]=0:18001000:400 
fletcher4 lzjb LE contiguous birth=245 fill=0 
cksum=6bba8d3a44:2cfa96558ac7:c732e55bea858:2b86470f6a83373): error 28

Abort (core dumped)
bash-3.00# zpool import test
bash-3.00# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
testONLINE   0 0 0
  c2t0d0ONLINE   0 0 0

errors: No known data errors
bash-3.00# touch /test/z
bash-3.00# sync
bash-3.00# ls -l /test/z
-rw-r--r--   1 root root   0 Jun 28 04:18 /test/z
bash-3.00#

The userland zfs's export succeeds. But doing a system("zpool status
test") right after the spa_export() succeeds shows that the 'kernel
zfs' still thinks the pool is imported.


I guess that makes sense. Nothing has been told to the 'kernel zfs' 
about the export.


But I still do not understand why the 'userland zfs' can't write to the 
pool.


Regards,
Manoj

PS: The code I have been tinkering with is attached.



--matt



Any clue as to why this is so would be appreciated.

Cheers
Manoj

Manoj Joseph wrote:

Hi,

I tried adding an spa_export();spa_import() to the code snippet. I 
get a similar crash while importing.


I/O failure (write on unknown off 0: zio 822ed40 [L0 unallocated] 
4000L/400P DVA[0]=0:1000:400 DVA[1]=0:18001000:400 fletcher4 lzjb 
LE contiguous birth=4116 fill=0 
cksum=69c3a4acfc:2c42fdcaced5:c5231ffcb2285:2b8c1a5f2cb2bfd): error 
28 Abort (core dumped)


I thought ztest could use an existing pool. Is that assumption wrong?

These are the stacks of interest.

 d11d78b9 __lwp_park (81c3e0c, 81c3d70, 0) + 19
 d11d1ad2 cond_wait_queue (81c3e0c, 81c3d70, 0, 0) + 3e
 d11d1fbd _cond_wait (81c3e0c, 81c3d70) + 69
 d11d1ffb cond_wait (81c3e0c, 81c3d70) + 24
 d131e4d2 cv_wait  (81c3e0c, 81c3d6c) + 5e
 d12fe2dd txg_wait_synced (81c3cc0, 1014, 0) + 179
 d12f9080 spa_config_update (819dac0, 0) + c4
 d12f467a spa_import (8047657, 8181f88, 0) + 256
 080510c6 main (2, 804749c, 80474a8) + b2
 08050f22 _start   (2, 8047650, 8047657, 0, 804765c, 8047678) + 7a


 d131ed79 vpanic   (d1341dbc, ca5cd248) + 51
 d131ed9f panic(d1341dbc, d135a384, d135a724, d133a630, 0, 0) + 1f
 d131921d zio_done (822ed40) + 455
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
 d1318c88 zio_wait_children_done (822ed40) + 18
 d131c15d zio_next_stage (822ed40) + 161
 d131ba83 zio_vdev_io_assess (822ed40) + 183
 d131c15d zio_next_stage (822ed40) + 161
 d1307011 vdev_mirror_io_done (822ed40) + 421
 d131b8a2 zio_vdev_io_done (822ed40) + 36
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
 d1318c88 zio_wait_children_done (822ed40) + 18
 d1306be6 vdev_mirror_io_start (822ed40) + 1d2
 d131b862 zio_vdev_io_start (822ed40) + 34e
 d131c313 zio_next_stage_async (822ed40) + 1ab
 d131bb47 zio_vdev_io_assess (822ed40) + 247
 d131c15d zio_next_stage (822ed40) + 161
 d1307011 vdev_mirror_io_done (822ed40) + 421
 d131b8a2 zio_vdev_io_done (822ed40) + 36
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
 d1318c88 zio_wait_children_done (822ed40) + 18
 d1306be6 vdev_mirror_io_start (822ed40) + 1d2
 d131b862 zio_vdev_io_start (822ed40) + 34e
 d131c15d zio_next_stage (822ed40) + 161
 d1318dc1 zio_ready (822ed40) + 131
 d131c15d zio_next_stage (822ed40) + 161
 d131b41b zio_dva_allocate (822ed40) + 343
 d131c15d zio_next_stage (822ed40) + 161
 d131bdcb zio_checksum_generate (822ed40) + 123
 d131c15d zio_next_stage (822ed40) + 161
 d1319873 zio_write_compress (822ed40) + 4af
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 1, 822ef28) + 6a
 d1318c68 

Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Neil Perrin



Darren Dunham wrote:

The problem I've come across with using mirror or raidz for this setup
is that (as far as I know) you can't add disks to mirror/raidz groups,
and if you just add the disk to the pool, you end up in the same
situation as above (with more space but no redundancy).


You can't add to an existing mirror, but you can add new mirrors (or
raidz) items to the pool.  If so, there's no loss of redundancy.


Maybe I'm missing some context, but you can add to an existing mirror
- see zpool attach.
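
For example (device names invented), this converts a single-disk vdev into a
two-way mirror, or widens an existing mirror by one more side:

# zpool attach tank c0t0d0 c0t1d0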

Neil.


[zfs-discuss] ReiserFS4 like metadata/search

2007-06-27 Thread Oliver Schinagl
The only thing I haven't found in ZFS yet is metadata/search support.

The previous 'next best thing' in filesystems was of course ReiserFS (4).
Reiser3 was quite a nice thing, fast, journaled and all that, but Reiser4
promised to bring all those things that we see emerging now, like cross-FS
search: any document, audio recording, etc. could be instantly searched.
True, there is Google desktop search, trackerd and what not, but those are
'afterthoughts', not supported by the underlying FS.

So does ZFS support features like metadata and such? Or is that for zfs2? :)

oliver


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Darren Dunham
 Darren Dunham wrote:
  The problem I've come across with using mirror or raidz for this setup
  is that (as far as I know) you can't add disks to mirror/raidz groups,
  and if you just add the disk to the pool, you end up in the same
  situation as above (with more space but no redundancy).
  
  You can't add to an existing mirror, but you can add new mirrors (or
  raidz) items to the pool.  If so, there's no loss of redundancy.
 
 Maybe I'm missing some context, but you can add to an existing mirror
 - see zpool attach.

It depends on what you mean by add.  :-) 

The original message was about increasing storage allocation.  You can
add redundancy to an existing mirror with attach, but you cannot
increase the allocatable storage.


-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Erik Trimble
On Wed, 2007-06-27 at 14:50 -0700, Darren Dunham wrote:
  Darren Dunham wrote:
   The problem I've come across with using mirror or raidz for this setup
   is that (as far as I know) you can't add disks to mirror/raidz groups,
   and if you just add the disk to the pool, you end up in the same
   situation as above (with more space but no redundancy).
   
   You can't add to an existing mirror, but you can add new mirrors (or
   raidz) items to the pool.  If so, there's no loss of redundancy.
  
  Maybe I'm missing some context, but you can add to an existing mirror
  - see zpool attach.
 
 It depends on what you mean by add.  :-) 
 
 The original message was about increasing storage allocation.  You can
 add redundancy to an existing mirror with attach, but you cannot
 increase the allocatable storage.
 

With mirrors, there is currently more flexibility than with raid-Z[2].
You can increase the allocatable storage size by replacing each disk in
the mirror with a larger sized one (assuming you wait for a
re-sync ;-P )

Thus, the _safe_ way to increase a mirrored vdev's size is:

Disk A:  100GB
Disk B:  100GB
Disk C:  250GB
Disk D:  250GB


zpool create tank mirror A B
(yank out A, put in C)
(wait for resync)
(yank out B, put in D)
(wait for resync)

and voila!  tank goes from 100GB to 250GB of space.
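
If the new disks can be connected alongside the old ones, the same growth can
be done with zpool replace instead of physically swapping drives in place (a
sketch, using the same disk labels as above):

# zpool replace tank A C
(wait for resilver)
# zpool replace tank B D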

I believe this should also work if LUNs are used instead of actual disks
- but I don't believe that resizing a LUN currently in a mirror will
work (please, correct me on this), so, for a SAN-backed ZFS mirror, it
would be:

Assuming A = B < C, and after resizing A, A = C > B

zpool create tank mirror A B
zpool attach tank A C   (where C is a new LUN of the new size desired)
(wait for sync of C)
zpool detach tank A
(unmap LUN A from host, resize A to be the same as C, then map back)
zpool attach tank C A
(wait for sync of A)
zpool detach tank B

I believe that will now result in a mirror of the full size of C, not of
B.

I'd be interested to know if you could do this:

zpool create tank mirror A B
(resize LUN A and B to new size)


without requiring a system reboot after resizing A & B (that is, the
reboot would be needed to update the new LUN size on the host).


-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Richard Elling

Jef Pearlman wrote:
Perhaps I'm not asking my question clearly. I've already experimented a fair amount 
with zfs, including creating and destroying a number of pools with and without 
redundancy, replacing vdevs, etc. Maybe asking by example will clarify what I'm 
looking for or where I've missed the boat. The key is that I want a grow-as-you-go 
heterogenous set of disks in my pool:


The short answer:
zpool add -- add a top-level vdev as a dynamic stripe column
+ available space is increased

zpool attach -- add a mirror to an existing vdev
+ only works when the new mirror is the same size or larger than
  the existing vdev
+ available space is unchanged
+ redundancy (RAS) is increased

zpool detach -- remove a mirror from an existing vdev
+ available space increases if the removed mirror is smaller than the vdev
+ redundancy (RAS) is decreased

zpool replace -- functionally equivalent to attach followed by detach


Let's say I start with a 40g drive and a 60g drive. I create a non-redundant pool 
(which will be 100g). At some later point, I run across an unused 30g drive, which 
I add to the pool. Now my pool is 130g. At some point after that, the 40g drive 
fails, either by producing read errors or my failing to spin up at all. What happens 
to my pool? Can I mount and access it at all (for the data not on or striped across 
the 40g drive)? Can I zfs replace the 40g drive with another drive and have it 
attempt to copy as much data over as it can? Or am I just out of luck? zfs seems like 
a great way to use old/unutilized drives to expand capacity, but sooner or later one 
of those drives will fail, and if it takes out the whole pool (which it might 
reasonably do), then it doesn't work out in the end.


For non-redundant zpools, a device failure *may* cause the zpool to be 
unavailable.
The actual availability depends on the nature of the failure.

A more common scenario might be to add a 400 GByte drive, which you can use to
replace the older drives, or keep online for redundancy.

The zfs copies feature is a little bit harder to grok.  It is difficult to
predict how the system will be affected if you have copies=2 in your above
scenario, because it depends on how the space is allocated.  For more info,
see my notes at:
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
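
For reference, copies is just a per-dataset property — a minimal sketch,
dataset name assumed:

# zfs set copies=2 tank/home
# zfs get copies tank/home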

 -- richard


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Erik Trimble
On Wed, 2007-06-27 at 12:03 -0700, Jef Pearlman wrote:
  Jef Pearlman wrote:
   Absent that, I was considering using zfs and just
   having a single pool. My main question is this: what
   is the failure mode of zfs if one of those drives
   either fails completely or has errors? Do I
   permanently lose access to the entire pool? Can I
   attempt to read other data? Can I zfs replace the
   bad drive and get some level of data recovery?
   Otherwise, by pooling drives am I simply increasing
   the probability of a catastrophic data loss? I
   apologize if this is addressed elsewhere -- I've read
   a bunch about zfs, but not come across this
   particular answer.
  
Pooling devices in a non-redundant mode (ie without a raidz or mirror
vdev) increases your chance of losing data, just like every other RAID
system out there.

However, since ZFS doesn't do concatenation (it stripes), by losing one
drive in a non-redundant stripe, you effectively corrupt the entire
dataset, as virtually all files should have some portion of their data
on the dead drive. 


  We generally recommend a single pool, as long as the use case permits.
  But I think you are confused about what a zpool is.  I suggest you look
  at the examples or docs.  A good overview is the slide show:
  http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf
 
 Perhaps I'm not asking my question clearly. I've already experimented a fair 
 amount with zfs, including creating and destroying a number of pools with and 
 without redundancy, replacing vdevs, etc. Maybe asking by example will 
 clarify what I'm looking for or where I've missed the boat. The key is that I 
 want a grow-as-you-go heterogenous set of disks in my pool:
 
 Let's say I start with a 40g drive and a 60g drive. I create a non-redundant 
 pool (which will be 100g). At some later point, I run across an unused 30g 
 drive, which I add to the pool. Now my pool is 130g. At some point after 
 that, the 40g drive fails, either by producing read errors or my failing to 
 spin up at all. What happens to my pool? Can I mount and access it at all 
 (for the data not on or striped across the 40g drive)? Can I zfs replace 
 the 40g drive with another drive and have it attempt to copy as much data 
 over as it can? Or am I just out of luck? zfs seems like a great way to use 
 old/unutilized drives to expand capacity, but sooner or later one of those 
 drives will fail, and if it takes out the whole pool (which it might 
 reasonably do), then it doesn't work out in the end.
  

Nope. Your zpool is a stripe. As mentioned above, losing one disk in a
stripe effectively destroys all data, just as with any other RAID
system.


   As a side-question, does anyone have a suggestion
   for an intelligent way to approach this goal? This is
   not mission-critical data, but I'd prefer not to make
   data loss _more_ probable. Perhaps some volume
   manager (like LVM on linux) has appropriate features?
  
  ZFS, mirrored pool will be the most performant and
  easiest to manage
  with better RAS than a raidz pool.
 
 The problem I've come across with using mirror or raidz for this setup is 
 that (as far as I know) you can't add disks to mirror/raidz groups, and if 
 you just add the disk to the pool, you end up in the same situation as above 
 (with more space but no redundancy).
 
 Thanks for your help.
 
 -Jef
  
 

To answer the original question, you _have_ to create mirrors, which, if
you have odd-sized disks, will end up with unused space.

An example:

Disk A:   20GB
Disk B:   30GB
Disk C:   40GB
Disk D:   60GB


Start with disks A & B:

zpool create tank mirror A B

results in a 20GB pool.

Later, add disks C & D:

zpool add tank mirror C D

this results in a 2-wide stripe of 2 mirrors, which means there is a
total capacity of 60GB (20GB for A & B, 40GB for C & D) in the pool.
10GB of the 30GB drive, and 20GB of the 60GB drive are currently unused.
You can lose one drive from each pair (i.e. A and C, A and D, B and C,
or B and D) without any data loss.


If you had known about the drive sizes beforehand, the you could have
done something like this:

Partition the drives as follows:

A:  1 20GB partition
B:  1 20GB & 1 10GB partition
C:  1 40GB partition
D:  1 40GB partition & 2 10GB partitions

then you do:

zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1

and you get a total of 70GB of space. However, the performance on this
is going to be bad (as you frequently need to write to both partitions
on B & D, causing head seek), though you can still lose up to 2 drives
before experiencing data loss.


-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] Suggestions on 30 drive configuration?

2007-06-27 Thread Boyd Adamson

On 28/06/2007, at 12:29 AM, Victor Latushkin wrote:
It is not so easy to predict.  ZFS will coalesce writes.  A single
transaction group may have many different writes in it.  Also, raidz[12]
is dynamic, and will use what it needs, unlike separate volume managers
who do not have any understanding of the context of the data.

There is a good slide which illustrates how stripe width is selected
dynamically in RAID-Z. Please see slide 13 in this slide deck:
http://www.snia.org/events/past/sdc2006/zfs_File_Systems-bonwick-moore.pdf

[...]
Btw, I believe there's no link to this presentation on opensolaris.org,
unfortunately...


Indeed. Is there any reason that the presentation at
http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf
couldn't be updated to the one that Victor mentions?


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Richard Elling

Erik Trimble wrote:

If you had known about the drive sizes beforehand, the you could have
done something like this:

Partition the drives as follows:

A:  1 20GB partition
B:  1 20gb  1 10GB partition
C:  1 40GB partition
D:  1 40GB partition  2 10GB paritions

then you do:

zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1

and you get a total of 70GB of space. However, the performance on this
is going to be bad (as you frequently need to write to both partitions
on B  D, causing head seek), though you can still lose up to 2 drives
before experiencing data loss.


It is not clear to me that we can say performance will be bad
for stripes on single disks.  The reason is that ZFS dynamic
striping does not use a fixed interleave.  In other words, if
I write a block of N bytes to an M-way dynamic stripe, it is
not guaranteed that each device will get an I/O of N/M size.
I've only done a few measurements of this, and I have not completed
my analysis, but my data does not show the sort of thrashing one
might expect from a fixed stripe with small interleave.
 -- richard