[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?

2006-11-07 Thread Bill Sommerfeld
On a v40z running snv_51, I'm doing a "zpool replace z c1t4d0 c1t5d0".

(So, why am I doing the replace?  The outgoing disk has been reporting
read errors sporadically, but with increasing frequency over time.)

zpool iostat -v shows writes going to the old (outgoing) disk as well as
to the replacement disk.  Is this intentional?  

Seems counterintuitive as I'd think you'd want to touch a suspect disk
as little as possible and as nondestructively as possible...
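
For reference, here is the sequence involved (a sketch; "zpool status" shows
resilver progress and per-device error counters, and "iostat -En" shows the
outgoing drive's cumulative soft/hard/transport error counts):

# zpool replace z c1t4d0 c1t5d0
# zpool status -v z
# iostat -En c1t4d0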

A representative snapshot from "zpool iostat -v" :

                 capacity     operations     bandwidth
pool           used  avail   read  write    read  write
-----------    ----  -----  -----  -----   -----  -----
z              306G   714G  1.43K    658   23.5M  1.11M
  raidz1       109G   231G  1.08K    392   22.3M   497K
    replacing     -      -      0   1012       0  5.72M
      c1t4d0      -      -      0    753       0  5.73M
      c1t5d0      -      -      0    790       0  5.72M
    c2t12d0       -      -    339    177   9.46M   149K
    c2t13d0       -      -    317    177   9.08M   149K
    c3t12d0       -      -    330    181   9.27M   147K
    c3t13d0       -      -    352    180   9.45M   146K
  raidz1       100G   240G    117    101    373K   225K
    c1t3d0        -      -     65     33   3.99M  64.1K
    c2t10d0       -      -     60     44   3.77M  63.2K
    c2t11d0       -      -     62     42   3.87M  63.4K
    c3t10d0       -      -     63     42   3.88M  62.3K
    c3t11d0       -      -     65     35   4.06M  61.8K
  raidz1      96.2G   244G    234    164    768K   415K
    c1t2d0        -      -    129     49   7.85M   112K
    c2t8d0        -      -    133     54   8.05M   112K
    c2t9d0        -      -    132     56   8.08M   113K
    c3t8d0        -      -    132     52   8.01M   113K
    c3t9d0        -      -    132     49   8.16M   112K

- Bill




Re: [zfs-discuss] Best Practices recommendation on x4200

2006-11-07 Thread Mike Gerdts

On 11/7/06, Richard Elling - PAE <[EMAIL PROTECTED]> wrote:

> > d10 mirror of c0t2d0s0 and c0t3d0s0   swap (2+2GB, to match above)

> Also a waste, use a swap file.  Add a dumpdev if you care about
> kernel dumps, no need to mirror a dumpdev.


How do you figure that allocating space to a swap file is less of a
waste than adding space to a swap device?


> Simple /.  Make it big enough to be useful.  Keep its changes to a
> minimum.  Make more than one, so that you can use LiveUpgrade.
> For consistency, you could make each disk look the same.
> s0 / 10G
> s6 zpool free
> s7 metadb 100M


Since ZFS can get performance boosts from enabling the disk write
cache if it has the whole disk, you may want to consider something
more like the following for two of the disks (assumes mirroring rather
than raidz in the zpool):

s0 / 10G
s1 swap 
s3 alt / 10G
s6 zpool free
s7 metadb 100M

The other pair of disks are given entirely to the zpool.
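
The whole-disk pool creation would then be something like this (a sketch;
"datapool" and the c0t2d0/c0t3d0 targets are illustrative):

# zpool create datapool mirror c0t2d0 c0t3d0

Giving ZFS the whole disks (no slice suffix) is what lets it turn on the
write cache.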


> Use two disks for your BE, the other two for your ABE (assuming all are
> bootable).


In any case, be sure that your root slices do not start at cylinder 0
(hmmm... maybe this is SPARC-specific advice...).  One way to populate
an ABE is to mirror slices.  However, you cannot mirror between a
device that starts at cylinder 0 and one that does not.  Consider the
following mock-up (output may be a bit skewed):

Starting state...

# lustatus
slice0 - active mounted at d0
slice3 - may or may not exist, if it exists it is on d30

# metastat -p
d0 -m d1 d2 1
d1 1 1 c0t0d0s0
d2 1 1 c0t1d0s0
d30 -m d31 d32 1
d31 1 1 c0t0d0s3
d32 1 1 c0t1d0s3

Get rid of slice3 boot environment, make d31 available to recreate it.

# ludelete slice3
# metadetach d30 d31
# metaclear -r d30

Mirror d0 to d31.  Wait for it to complete.

# metattach d0 d31
# while metastat d0 | grep % ; do sleep 30 ; done

Detach d31 from d0, recreate d30 mirror

# metadetach d0 d31
# metainit d30 -m d31 1
# metainit d32 1 1 c0t1d0s3
# metattach d30 d32

Create boot environment named slice3:

# lucreate -n slice3 -m /:/dev/md/dsk/d30:ufs,preserve

Now you can manipulate the slice3 boot environment as needed.
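
To switch over to it later, the usual sequence is roughly (a sketch):

# luactivate slice3
# init 6

i.e. activate the ABE and reboot into it.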

Why go through all of this?  My reasons have typically been:

1) Normally lucreate uses cpio, which doesn't cope with sparse files
well.  /var/adm/lastlog is a sparse file that can be problematic if
you have users with large UID's
2) Lots of file systems mounted and little interest in creating very
complex command lines with many -x options.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Best Practices recommendation on x4200

2006-11-07 Thread Richard Elling - PAE

The best thing about best practices is that there are so many of them :-)

Robert Milkowski wrote:

> Hello John,
>
> Tuesday, November 7, 2006, 7:45:46 PM, you wrote:
>
> JT> Greetings all-
> JT>  I have a new X4200 that I'm getting ready to deploy. It has
> JT> four 146 GB SAS drives. I'd like to setup the box for maximum
> JT> redundancy on the data stored on these drives. Unfortunately, it
> JT> looks like ZFS boot/root aren't really options at this time. The
> JT> LSI Logic controller in this box only supports either a RAID0
> JT> array with all four disks, or a RAID 1 array with two
> JT> disks--neither of which are very appealing to me.
> JT>  Ideally I'd like to have at least 300 gigs of storage
> JT> available to the users, or more if I can do it with something like
> JT> a RAID 5 setup. My concern, however, is that the boot partition
> JT> and root partitions have data redundancy.
> JT>  How would you setup this box?
> JT>  It's primarily used as a development server, running a myriad of
> JT> applications.
>
> Use SVM to mirror the system, something like:
>
> d0  mirror of c0t0d0s0 and c0t1d0s0   /     2GB
> d5  mirror of c0t0d0s1 and c0t1d0s1   /var  2GB


IMNSHO, having a separate /var is a complete waste of effort.
Also, 2 GBytes is too small.


> d10 mirror of c0t2d0s0 and c0t3d0s0   swap (2+2GB, to match above)


Also a waste, use a swap file.  Add a dumpdev if you care about
kernel dumps, no need to mirror a dumpdev.
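
For example (a sketch; the sizes and the dump slice are placeholders):

# mkfile 4g /export/swapfile
# swap -a /export/swapfile
# dumpadm -d /dev/dsk/c0t2d0s1

Add a vfstab entry for the swap file if you want it back after a reboot; the
dump device needs to be a real slice (a swap file won't do), which is why it
gets one of its own.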


> On all 4 disks create an s4 slice with the rest of the disk; it should
> be equal on all disks. Then create a raidz pool out of those slices.
> You should get above 400GB of usable storage.
>
> That way you've got mirrored root disks, mirrored swap on another
> two disks matching exactly the space used by / and /var, and the
> rest of the disks for your data on ZFS.
>
> ps. and of course you've got to create small slices for metadb's.



Simple /.  Make it big enough to be useful.  Keep its changes to a
minimum.  Make more than one, so that you can use LiveUpgrade.
For consistency, you could make each disk look the same.
s0 / 10G
s6 zpool free
s7 metadb 100M
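
The metadb replicas then go onto those s7 slices, e.g. (a sketch; two
replicas per slice across all four disks):

# metadb -a -f -c 2 c0t0d0s7 c0t1d0s7 c0t2d0s7 c0t3d0s7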

Use two disks for your BE, the other two for your ABE (assuming all are
bootable).

The astute observer will note that you could also use the onboard RAID
controller for the same, simple configuration, less metadb of course.
 -- richard


Re: [zfs-discuss] Best Practices recommendation on x4200

2006-11-07 Thread Robert Milkowski
Hello John,

Tuesday, November 7, 2006, 7:45:46 PM, you wrote:

JT> Greetings all-
JT>  I have a new X4200 that I'm getting ready to deploy. It has
JT> four 146 GB SAS drives. I'd like to setup the box for maximum
JT> redundancy on the data stored on these drives. Unfortunately, it
JT> looks like ZFS boot/root aren't really options at this time. The
JT> LSI Logic controller in this box only supports either a RAID0
JT> array with all four disks, or a RAID 1 array with two
JT> disks--neither of which are very appealing to me.
JT>  Ideally I'd like to have at least 300 gigs of storage
JT> available to the users, or more if I can do it with something like
JT> a RAID 5 setup. My concern, however, is that the boot partition
JT> and root partitions have data redundancy.
JT>  How would you setup this box?
JT>  It's primarily used as a development server, running a myriad of
JT> applications.


Use SVM to mirror system, something like:

d0  mirror of c0t0d0s0 and c0t1d0s0   /     2GB
d5  mirror of c0t0d0s1 and c0t1d0s1   /var  2GB
d10 mirror of c0t2d0s0 and c0t3d0s0   swap  (2+2GB, to match above)

On all 4 disks create an s4 slice with the rest of the disk; it should
be equal on all disks. Then create a raidz pool out of those slices.
You should get above 400GB of usable storage.

That way you've got mirrored root disks, mirrored swap on another
two disks matching exactly the space used by / and /var, and the
rest of the disks for your data on ZFS.


ps. and of course you've got to create small slices for metadb's.
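
The data pool then boils down to something like (a sketch; the pool name and
slice numbering are illustrative):

# zpool create datapool raidz c0t0d0s4 c0t1d0s4 c0t2d0s4 c0t3d0s4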

-- 
Best regards,
 Robert      mailto:[EMAIL PROTECTED]
             http://milek.blogspot.com



Re: [zfs-discuss] Best Practices recommendation on x4200

2006-11-07 Thread Al Hopper
On Tue, 7 Nov 2006, John Tracy wrote:

> Greetings all-
>  I have a new X4200 that I'm getting ready to deploy. It has four 146 GB 
> SAS drives. I'd like to setup the box for maximum redundancy on the data 
> stored on these drives. Unfortunately, it looks like ZFS boot/root aren't 
> really options at this time. The LSI Logic controller in this box only 
> supports either a RAID0 array with all four disks, or a RAID 1 array with two 
> disks--neither of which are very appealing to me.
>  Ideally I'd like to have at least 300 gigs of storage available to the 
> users, or more if I can do it with something like a RAID 5 setup. My concern, 
> however, is that the boot partition and root partitions have data redundancy.
>  How would you setup this box?
> It's primary used as a development server, running a myriad of
> applications.

Since you've posted this to zfs-discuss, I'm assuming that your goal is to
find a way that you can take advantage of zfs on this box - if at all
possible.  So I'm going to propose a radical setup that I'm sure many will
have issues with and which falls outside conventional/normal best
practices, which in this case would be to form 2 mirrors of 2 disks each
using the built-in H/W RAID controller and you're done.  If this is too radical
for you, it will at least provide food for thought.

First, your assertion that you want redundancy for root and boot is
somewhat flawed.  Let me explain: in the "old days", losing root on a box
was one of the worst possible user experiences - but that is simply not
true today.  If you keep the root filesystem pristine (more later) and
just save off the config files you modify (/etc/hostname.*, /etc/passwd,
/etc/group, /etc/shadow, /etc/hosts blah, blah) periodically, the root
partition can be restored quickly and simply and then your config files
restored.  Consider the root partition disposable and replaceable.  If you
set up the system initially to net boot[1], then your root partition can be
restored very quickly from the same set of files you used to load it
initially!  Since it's being used as a development box, if the root disk
dies, you push in a replacement, net boot it and restore your saved config
files.  Downtime will probably be around 30 minutes, assuming you keep a
spare disk handy (in a locked, rack-mount drawer immediately adjacent to
the x4200 machine).
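
The periodic config save can be as simple as a cron'd tar to another host
(a sketch; the destination path is a placeholder):

# tar cf /net/backuphost/x4200-etc-`date +%Y%m%d`.tar \
    /etc/hostname.* /etc/passwd /etc/shadow /etc/group /etc/hosts /etc/vfstab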

Next I mentioned keeping root pristine.  I'll also assume you'll use the
Blastwave software repository, which installs software in /opt/csw by
default.  So first up, the disk layout config:

on the boot disk:
  - 16Gb / root partition
  - 4 to 16Gb swap partition
  - 16Gb live upgrade partition
  - small lightly used /export/home partition
  - the rest of this disk will be un-allocated at this time

with the other 3 disks, form a 3-way raidz pool, with the following broad
plan for the initial zfs filesystems you'll place in this pool (a rough
command sketch follows the list):
  - a filesystem for shared home directories that will be shared into zones
  - additional swap vdev
  - a filesystem for your master zone (see below)
  - a filesystem for each zone you'll define on this box
  - a filesystem for (one or more) junk zone(s)
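
A rough sketch of that pool and dataset layout (the pool name, disk targets
and sizes are purely illustrative; the extra swap is a zvol):

# zpool create tank raidz c0t1d0 c0t2d0 c0t3d0
# zfs create tank/home
# zfs create tank/zones
# zfs create tank/zones/master
# zfs create tank/zones/junk1
# zfs create -V 4g tank/swap
# swap -a /dev/zvol/dsk/tank/swap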

So now root is still pristine - no supplemental software has been loaded
or added.  First up, build a "master" zone.  It's the master in the sense that
it'll be used to clone real working zones from, in which you will do *all*
the "real" work.  So create a fat zone (create -b), run "netservices
limited" within it, add default user accounts, set up DNS etc.  The more
effort you put into building/configuring this master zone, the easier
it'll be to add work zones to the box.

Now that you have your master zone, use zfs clone to create "fat" zones
for use as work areas.  Within these work zones, you'll install all your
Blastwave packages, compilers, tools etc.  You can arrange for the
shared home directories to be automatically mounted when a user with a
shared home logs into the zone (using the automounter).  You'll probably
have some users who only have logins in certain zones etc.
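
The ZFS side of that cloning is roughly (a sketch; dataset names follow the
layout sketched earlier):

# zfs snapshot tank/zones/master@golden
# zfs clone tank/zones/master@golden tank/zones/work1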

Next repeat the above and build/clone more work zones on a per project or
per department or per whatever-makes-sense basis.  You'll apply zfs quotas
on zones where you have concerns about the users gobbling up too much disk
space.  You'll have one or more junk zones to allow experiments with the
system config to be safely isolated.  Use zfs send/recv to back up
individual zones or datasets from within zones to another zfs server.
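
In command terms, again only a sketch with illustrative names:

# zfs set quota=20g tank/zones/work1
# zfs snapshot tank/zones/work1@nightly
# zfs send tank/zones/work1@nightly | ssh backuphost zfs recv backup/work1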

If you elect to install Solaris, then (my recommendation) wait for Update
3.  And you'll be able to zfs clone zones by copying them - but not create
them from snapshots.  If you install Solaris Express or the
latest/greatest OpenSolaris you'll be able to create zones very quickly
from a zfs snapshot of your master zone - saving you a good deal of time
and disk space.

Downsides: There are many.  First off, you know that zones on zfs are not
supported (yet).  And that applying patches

Re: [zfs-discuss] linux versus sol10

2006-11-07 Thread Michael Schuster

listman wrote:
> hi, i found a comment comparing linux and solaris but wasn't sure which
> version of solaris was being referred to. can the list confirm that this
> issue isn't a problem with solaris10/zfs??
>
> "Linux also supports asynchronous directory updates which can make a
> significant performance improvement when branching. On Solaris machines,
> inode creation is very slow and can result in very long iowait states."


I think this cannot be commented on in a useful fashion without more
information about this supposed issue. AFAIK, neither ufs nor zfs "create
inodes" (at run time), so this is somewhat hard to put into context.

Get a complete description of what this is about, then maybe we can give you a
useful answer.


HTH
Michael
--
Michael Schuster
Sun Microsystems, Inc.


Re: [zfs-discuss] # devices in raidz.

2006-11-07 Thread Daniel Rock

Richard Elling - PAE schrieb:
> For modern machines, which *should* be the design point, the channel
> bandwidth is underutilized, so why not use it?


And what about encrypted disks? Simply create a zpool with checksum=sha256, 
fill it up, then scrub. I'd be happy if I could use my machine during 
scrubbing. A throttling of scrubbing would help. Maybe also running the 
scrubbing with a "high nice level" in kernel.
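
(For reference, the reproduction is simply, on a pool that gets filled after
the property is set -- names illustrative:

# zfs set checksum=sha256 tank
# zpool scrub tank

and then try to use the machine interactively while the scrub runs.)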





> NB. At 4 128kByte iops per second, it would take 11 days and 8 hours
> to resilver a single 500 GByte drive -- feeling lucky?


250ms is the Veritas default. It doesn't have to be the ZFS default also.


Daniel


Re: [zfs-discuss] linux versus sol10

2006-11-07 Thread Casper . Dik

>hi, i found a comment comparing linux and solaris but wasn't sure  
>which version of solaris was being referred. can the list confirm  
>that this issue isn't a problem with solaris10/zfs??
>
>"Linux also supports asynchronous directory updates which can make a  
>significant performance improvement when branching. On Solaris  
>machines, inode creation is very slow and can result in very long  
>iowait states."

Since it refers to iowait it must refer to Solaris 9 or earlier;
since it refers to "slow inode creation" it is nearly certain
it refers to pre-logging ufs, which seems to have been the default
only in S8 and earlier.

Casper


[zfs-discuss] linux versus sol10

2006-11-07 Thread listman
hi, i found a comment comparing linux and solaris but wasn't sure which
version of solaris was being referred to. can the list confirm that this
issue isn't a problem with solaris10/zfs??

"Linux also supports asynchronous directory updates which can make a
significant performance improvement when branching. On Solaris machines,
inode creation is very slow and can result in very long iowait states."

thX!


[zfs-discuss] Best Practices recommendation on x4200

2006-11-07 Thread John Tracy
Greetings all-
 I have a new X4200 that I'm getting ready to deploy. It has four 146 GB 
SAS drives. I'd like to setup the box for maximum redundancy on the data stored 
on these drives. Unfortunately, it looks like ZFS boot/root aren't really 
options at this time. The LSI Logic controller in this box only supports either 
a RAID0 array with all four disks, or a RAID 1 array with two disks--neither of 
which are very appealing to me.
 Ideally I'd like to have at least 300 gigs of storage available to the 
users, or more if I can do it with something like a RAID 5 setup. My concern, 
however, is that the boot partition and root partitions have data redundancy.
 How would you setup this box?
 It's primarily used as a development server, running a myriad of 
applications.

Thank you-
John
 
 


Re: [zfs-discuss] Re: Re: ZFS for Linux 2.6

2006-11-07 Thread Erast Benson
On Tue, 2006-11-07 at 10:30 -0800, Akhilesh Mritunjai wrote:
> > > Yuen L. Lee wrote:
> > opensolaris could be a nice NAS filer. I posted
> > my question on "How to build a NAS box" asking for
> > instructions on how to build a Solaris NAS box.
> > It looks like everyone is busy. I haven't got any
> > response yet. By any chance, do you have any
> 
> Hi Yuen
> 
> May I suggest that a better question would have been "How to build a minimal 
> Nevada distribution ?". I'm sure it would have gotten more responses as it is 
> both - a more general, and a more relevent question.
> 
> Apart from that unasked advice, If my memory serves right the Belenix folks 
> (Moinak and gang) were discussing a similar thing in a thread sometime 
> back... chasing them might be a good idea ;-)
> 
> I found some articles on net on how to build minimal image of solaris with 
> networking. Packages relating to storage (zfs, iSCSI etc) can be added to it 
> later. The minimal system with required components, sure, is heavy - about 
> 200MB... but shouldn't be an issue for a *NAS* box. I googled "Minimal 
> solaris configuration" and found several articles.

An alternative would be to simply use the NexentaOS InstallCD and select
"Minimal Profile" during installation.

-- 
Erast



[zfs-discuss] Re: Re: ZFS for Linux 2.6

2006-11-07 Thread Akhilesh Mritunjai
> > Yuen L. Lee wrote:
> opensolaris could be a nice NAS filer. I posted
> my question on "How to build a NAS box" asking for
> instructions on how to build a Solaris NAS box.
> It looks like everyone is busy. I haven't got any
> response yet. By any chance, do you have any

Hi Yuen

May I suggest that a better question would have been "How to build a minimal 
Nevada distribution?". I'm sure it would have gotten more responses, as it is 
both a more general and a more relevant question.

Apart from that unasked advice, If my memory serves right the Belenix folks 
(Moinak and gang) were discussing a similar thing in a thread sometime back... 
chasing them might be a good idea ;-)

I found some articles on the net on how to build a minimal image of Solaris with 
networking. Packages relating to storage (zfs, iSCSI etc) can be added to it 
later. The minimal system with required components, sure, is heavy - about 
200MB... but that shouldn't be an issue for a *NAS* box. I googled "Minimal 
Solaris configuration" and found several articles.

Hope that helps
- Akhilesh
 
 


Re: [zfs-discuss] Re: Re: ZFS for Linux 2.6

2006-11-07 Thread Darren J Moffat

Yuen L. Lee wrote:
>> Yuen L. Lee wrote:
>>> Thanks, Matt! I have the same understanding from my previous
>>> experience. The difference is my code may not be integrated into
>>> the official distribution. I'm interested in porting the ZFS to the Linux
>>> platform because I'm attempting to use ZFS in openfiler. I think it
>>> would be an interesting and useful project.
>>
>> What about porting openfiler to OpenSolaris ?
>
> Good question! In my understanding, openfiler is
> just a standalone version of the Linux based embedded
> system with file-level protocols support, such as
> NFS, CIFS, iSCSI (target/initiator), FTP and HTTP.
> OpenSolaris supports all of them.

I think the fact that openfiler is Linux isn't the
relevant and interesting bit.  The interesting bit is
the web based GUI to configure all those things.

The underlying technologies it is using are all
available in OpenSolaris distros as well.

A good place to start from might be Nexenta, since that would
give you a Linux like environment to work in and might make
it easier to get the openfiler GUI up and running.

> But it is a
> complete, general purpose distribution. If it is
> made to run as an appliance, I'm sure I don't need
> to port ZFS to the openfiler (Linux) platform.
> opensolaris could be a nice NAS filer. I posted
> my question on "How to build a NAS box" asking for
> instructions on how to build a Solaris NAS box.
> It looks like everyone is busy. I haven't got any
> response yet. By any chance, do you have any
> doc on this subject?  Thanks.

It might also be that where you posted it the correct people
aren't hanging out.  If you haven't already, try the Appliances
and NFS communities:

http://opensolaris.org/os/community/appliances/
http://opensolaris.org/os/community/nfs


--
Darren J Moffat


Re: [zfs-discuss] # devices in raidz.

2006-11-07 Thread Torrey McMahon

Richard Elling - PAE wrote:

The better approach is for the file system to do what it needs
to do as efficiently as possible, which is the current state of ZFS. 


This implies that the filesystem has exclusive use of the channel - SAN 
or otherwise - as well as the storage array front end controllers, 
cache, and the raid groups that may be behind it. What we really need in 
this case, and a few others, is the filesystem and backend storage 
working together...but I'll save that rant for another day. ;)



[zfs-discuss] Re: Re: ZFS for Linux 2.6

2006-11-07 Thread Yuen L. Lee
> Yuen L. Lee wrote:
> > Thanks, Matt! I have the same understanding from my previous
> > experience. The difference is my code may not be integrated into
> > the official distribution. I'm interested in porting the ZFS to the Linux
> > platform because I'm attempting to use ZFS in openfiler. I think it
> > would be an interesting and useful project.
> 
> What about porting openfiler to OpenSolaris ?

Good question! In my understanding, openfiler is
just a standalone version of the Linux based embedded
system with file-level protocols support, such as
NFS, CIFS, iSCSI (target/initiator), FTP and HTTP.
OpenSolaris supports all of them. But it is a 
complete, general purpose distribution. If it is 
made to run as an appliance, I'm sure I don't need
to port ZFS to the openfiler (Linux) platform. 
opensolaris could be a nice NAS filer. I posted
my question on "How to build a NAS box" asking for
instructions on how to build a Solaris NAS box.
It looks like everyone is busy. I haven't got any
response yet. By any chance, do you have any
doc on this subject?  Thanks.

Yuen
> 
> -- 
> Darren J Moffat


Re: [zfs-discuss] # devices in raidz.

2006-11-07 Thread Richard Elling - PAE

Daniel Rock wrote:
> Richard Elling - PAE schrieb:
>> The big question, though, is "10% of what?"  User CPU?  iops?
>
> Maybe something like the "slow" parameter of VxVM?
>
>     slow[=iodelay]
>         Reduces the system performance impact of copy
>         operations.  Such operations are usually per-
>         formed on small regions of the volume (nor-
>         mally from 16 kilobytes to 128 kilobytes).
>         This option inserts a delay between the
>         recovery of each such region.  A specific
>         delay can be specified with iodelay as a
>         number of milliseconds; otherwise, a default
>         is chosen (normally 250 milliseconds).


For modern machines, which *should* be the design point, the channel
bandwidth is underutilized, so why not use it?

NB. At 4 128kByte iops per second, it would take 11 days and 8 hours
to resilver a single 500 GByte drive -- feeling lucky?  In the bad old
days when disks were small, and the systems were slow, this made some
sense.  The better approach is for the file system to do what it needs
to do as efficiently as possible, which is the current state of ZFS.
 -- richard


Re: [zfs-discuss] Re: ZFS for Linux 2.6

2006-11-07 Thread Darren J Moffat

Yuen L. Lee wrote:

Thanks, Matt! I have the same understanding from my previous
experience. The difference is my code may not be integrated into
the official distribution. I'm interested in porting the ZFS to the Linux
platform because I'm attempting to use ZFS in openfiler. I think it
would be an interesting and useful project. 


What about porting openfiler to OpenSolaris ?

--
Darren J Moffat


Re: [zfs-discuss] # devices in raidz.

2006-11-07 Thread Daniel Rock

Richard Elling - PAE schrieb:

The big question, though, is "10% of what?"  User CPU?  iops?


Maybe something like the "slow" parameter of VxVM?

    slow[=iodelay]
        Reduces the system performance impact of copy
        operations.  Such operations are usually per-
        formed on small regions of the volume (nor-
        mally from 16 kilobytes to 128 kilobytes).
        This option inserts a delay between the
        recovery of each such region.  A specific
        delay can be specified with iodelay as a
        number of milliseconds; otherwise, a default
        is chosen (normally 250 milliseconds).



Daniel


[zfs-discuss] Re: ZFS for Linux 2.6

2006-11-07 Thread Yuen L. Lee
> James Dickens wrote:
>> On 11/6/06, Yuen L. Lee <[EMAIL PROTECTED]> wrote:
>>> I'm curious whether there is a version of Linux 2.6 ZFS available?
>>> Many thanks.
>>
>> Sorry there is no ZFS in Linux, and given current stands of Linus
>> Torvalds and the current Kernel team there never will be, because Linux
>> is GPLv2 and it is incompatible with ZFS that is released under the
>> CDDL license. The closest possibility to getting ZFS in Linux is
>> through the FUSE project that is porting ZFS to userland that runs
>> inside Linux but is not in the kernel so not limited by the license
>> argument.
>
> Just in case it isn't mentioned by someone else, many of the
> OpenSolaris folks would probably encourage you, Yuen, to bring this up
> with the Linux kernel folks.  Obviously, things like filesystems are
> very useful to have implementations of on many platforms (i.e. people
> should own their data, their operating systems shouldn't).
>
> I'm not an expert (nor am I offering legal advice), but my
> understanding of GPLv2 is the copyright holder can explicitly state
> exceptions on linking, so they could allow linking with ZFS even though
> it's under the CDDL.  Linux, when run on say something like a
> mainframe, already does link with non-GPL modules.

Thanks, Matt! I have the same understanding from my previous
experience. The difference is my code may not be integrated into
the official distribution. I'm interested in porting the ZFS to the Linux
platform because I'm attempting to use ZFS in openfiler. I think it
would be an interesting and useful project. 

> 
> So my understanding is it's not a legal issue or technical issue (other
> than that pesky porting), but more of a whether-or-not-people-want-it.
> So if you want it, you should ping the appropriate Linux folks.

I agree. Unfortunately, I don't have any connections with any appropriate
Linux folks. This is why I asked in the opensolaris  forum to see whether
there is any Linux  ZFS available. Nonetheless, I'm interested in porting
the ZFS to the Linux 2.6 platform. I'm hoping I can share some of the
porting workload with others who share this interest. My goal is to use
ZFS in my NAS openfiler.

>> Of course it's probably easier just to run Solaris Express; it should
>> have most of your favorite Linux applications already ported, and if not
>> you can use BrandZ, which allows you to run most Linux apps/executables in
>> a Zone inside Solaris.
>>
>> James Dickens
>> uadmin.blogspot.com
> --
> Matt Ingenthron - Web Infrastructure Solutions Architect
> Sun Microsystems, Inc. - Client Solutions, Systems Practice
> http://blogs.sun.com/mingenthron/
> email: [EMAIL PROTECTED]  Phone: 310-242-6439


[zfs-discuss] Re: ZFS for Linux 2.6

2006-11-07 Thread Yuen L. Lee
> Erik Trimble <[EMAIL PROTECTED]> wrote:
> 
> > There have been extensive discussions on loadable
> modules and licensing 
> > w/r/t the GPLv2 in the linux kernel. nVidia,
> amongst others, pushed hard 
> > to allow for non-GPL-compatible licensed code to be
> allowed as a Linux 
> > kernel module.  However, the kernel developers'
> consensus seems to have 
> > come down against modifying the current kernel GPL
> license to allow for 
> > non-GPL'd loadable modules.
> 
> If ever, you would not need to modify the GPL (you
> are not allowed to do so 
> anyway), but the Linux kernel code would need changes
> to have more clean
> interfaces.

It would be interesting to know whether the Linux kernel folks
are willing to accept this approach. I doubt it. It would be
easier if we could map all of the Solaris APIs that are used in
ZFS to Linux's, for instance the VOP_XXX functions and the
synchronization facilities (spinlock_t to Solaris kmutex_t etc.).
Then we wouldn't need to worry about whether ZFS is part of the
Linux kernel project.

> 
> Depending on the type of a loadable module and on the
> country where the Author
> is located (and the local copyright law), it looks
> like non-GPL modules are 
> usually allowed unless you try to incorporate these
> modules into the 
> Linux _project_ itself.
> 
> The GPL only requires that all files from a single
> project ("Work") are
> under GPL.
> 
> As I would call ZFS a separate project, it may be
> under a separate and 
> different license.
> 
> Note that if the people who like to disallow code
> under non-GPL licenses
> like CDDLd code to be used together with GPLd
> projects, these people must
> (if they would be consistent) also demand that GPLd
> projects may not use
> LGPLd libraries (as these libs usually cannot be
> relicensed under GPL).
> 
> Conclusion: it is a problem that lives in the mind of
> the Linux kernel people
> that cannot be fixed unless these people start having
> a more realistic view
> on the problem.

I understand the concern for the Linux kernel. In order to
make ZFS part of a Linux distribution, some of the
Linux kernel APIs may need to be changed. It is not an
easy task because there is so much code dependent on
the APIs.

> 
> Jörg
> 
> -- 
> EMail:[EMAIL PROTECTED] (home) Jörg
>  Schilling D-13353 Berlin
>   [EMAIL PROTECTED](uni)  
> [EMAIL PROTECTED] (work) Blog:
>  http://schily.blogspot.com/
> URL:  http://cdrecord.berlios.de/old/private/
> ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] # devices in raidz.

2006-11-07 Thread Richard Elling - PAE

Robert Milkowski wrote:

> Saturday, November 4, 2006, 12:46:05 AM, you wrote:
>
> REP> Incidentally, since ZFS schedules the resync iops itself, then it can
> REP> really move along on a mostly idle system.  You should be able to resync
> REP> at near the media speed for an idle system.  By contrast, a hardware
> REP> RAID array has no knowledge of the context of the data or the I/O scheduling,
> REP> so they will perform resyncs using a throttle.  Not only do they end up
> REP> resyncing unused space, but they also take a long time (4-18 GBytes/hr for
> REP> some arrays) and thus expose you to a higher probability of second disk
> REP> failure.
>
> However some mechanism to slow or freeze scrub/resilvering would be
> useful. Especially in cases where server does many other things and
> not only file serving - and scrub/resilver can take much CPU power on
> slower servers.
>
> Something like 'zpool scrub -r 10 pool' - which would mean 10% of
> speed.


I think this has some merit for scrubs, but I wouldn't suggest it for resilver.
If your data is at risk, there is nothing more important than protecting it.
While that sounds harsh, in reality there is a practical limit determined by
the ability of a single LUN to absorb a (large, sequential?) write workload.
For JBODs, that would be approximately the media speed.

The big question, though, is "10% of what?"  User CPU?  iops?
 -- richard


Re: [zfs-discuss] ZFS direct i/o

2006-11-07 Thread Roch - PAE

Here is my take on this 

http://blogs.sun.com/roch/entry/zfs_and_directio
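
One knob that usually comes up in the same breath (a sketch, not a substitute
for the entry above; the dataset name is illustrative) is matching the
recordsize to the database block size:

# zfs set recordsize=8k tank/db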


-r


Marlanne DeLaSource writes:
 > I had a look at various topics covering ZFS direct I/O, and this topic is 
 > sometimes mentioned, and it was not really clear to me.
 > 
 > Correct me if I'm wrong
 >   Direct I/O is not strictly POSIX
 >   It is not implemented in ZFS (?)
 > 
 > Then, how can we try to replace this feature that can speed databases for 
 > example?



[zfs-discuss] ZFS direct i/o

2006-11-07 Thread Marlanne DeLaSource
I had a look at various topics covering ZFS direct I/O; this topic is 
sometimes mentioned, and it was not really clear to me.

Correct me if I'm wrong
• Direct I/O is not strictly POSIX
• It is not implemented in ZFS (?)

Then, how can we try to replace this feature, which can speed up databases, for 
example?
 
 


Re[2]: [zfs-discuss] # devices in raidz.

2006-11-07 Thread Robert Milkowski
Hello Richard,

Saturday, November 4, 2006, 12:46:05 AM, you wrote:


REP> Incidentally, since ZFS schedules the resync iops itself, then it can
REP> really move along on a mostly idle system.  You should be able to resync
REP> at near the media speed for an idle system.  By contrast, a hardware
REP> RAID array has no knowledge of the context of the data or the I/O 
scheduling,
REP> so they will perform resyncs using a throttle.  Not only do they end up
REP> resyncing unused space, but they also take a long time (4-18 GBytes/hr for
REP> some arrays) and thus expose you to a higher probability of second disk
REP> failure.


However some mechanism to slow or freeze scrub/resilvering would be
useful. Especially in cases where server does many other things and
not only file serving - and scrub/resilver can take much CPU power on
slower servers.

Something like 'zpool scrub -r 10 pool' - which would mean 10% of
speed.
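
Today the closest thing is starting and stopping the scrub by hand (a
sketch):

# zpool scrub tank
# zpool scrub -s tank

the second form stopping an in-progress scrub if the box gets too busy.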


-- 
Best regards,
 Robert      mailto:[EMAIL PROTECTED]
             http://milek.blogspot.com



[zfs-discuss] CHOSUG Events in December with Bill Moore, co-lead of the ZFS Eng. team!!!

2006-11-07 Thread Karim Riad MAZOUNI
Dear all,

Just to announce that Bill Moore will visit Switzerland on the first 
week of December and he will present & demo ZFS at our CHOSUG events planned on 
Dec. 7th at EPFL (Lausanne) and on Dec. 8th at CERN (Geneva).

Those events are obviously free and open to anyone: you have only to register 
so we can plan logistics (esp. access badge at CERN).

All details are at: 
http://www.opensolaris.org/os/community/os_user_groups/chosug/next_meetings/

More about CHOSUG on our main page: 
http://www.opensolaris.org/os/community/os_user_groups/chosug/

Regards,

Karim
 
 


Re: [zfs-discuss] ZFS for Linux 2.6

2006-11-07 Thread Joerg Schilling
Matt Ingenthron <[EMAIL PROTECTED]> wrote:

> I'm not an expert (nor am I offering legal advice), but my understanding 
> of GPLv2 is the copyright holder can explicitly state exceptions on 
> linking, so they could allow linking with ZFS even though it's under the 
> CDDL.  Linux, when run on say something like a mainframe, already does 
> link with non-GPL modules.

The GPLv2 does not prevent linking with different projects under different
licenses, it just prevents non-GPLv2'd code from appearing inside a GPL'd project.

The latter would only be true if someone claims ZFS is a part of the Linux 
project (GPL speak: "work").

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] ZFS for Linux 2.6

2006-11-07 Thread Joerg Schilling
Erik Trimble <[EMAIL PROTECTED]> wrote:

> There have been extensive discussions on loadable modules and licensing 
> w/r/t the GPLv2 in the linux kernel. nVidia, amongst others, pushed hard 
> to allow for non-GPL-compatible licensed code to be allowed as a Linux 
> kernel module.  However, the kernel developers' consensus seems to have 
> come down against modifying the current kernel GPL license to allow for 
> non-GPL'd loadable modules.

If ever, you would not need to modify the GPL (you are not allowed to do so 
anyway), but the Linux kernel code would need changes to have cleaner
interfaces.

Depending on the type of a loadable module and on the country where the Author
is located (and the local copyright law), it looks like non-GPL modules are 
usually allowed unless you try to incorporate these modules into the 
Linux _project_ itself.

The GPL only requires that all files from a single project ("Work") are
under GPL.

As I would call ZFS a separate project, it may be under a separate and 
different license.

Note that if the people who would like to disallow code under non-GPL licenses
(like CDDLd code) to be used together with GPLd projects were consistent, they
would also have to demand that GPLd projects may not use
LGPLd libraries (as these libs usually cannot be relicensed under GPL).

Conclusion: it is a problem that lives in the mind of the Linux kernel people
that cannot be fixed unless these people start having a more realistic view
on the problem.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] ZFS for Linux 2.6

2006-11-07 Thread Erik Trimble
There have been extensive discussions on loadable modules and licensing 
w/r/t the GPLv2 in the linux kernel. nVidia, amongst others, pushed hard 
to allow for non-GPL-compatible licensed code to be allowed as a Linux 
kernel module.  However, the kernel developers' consensus seems to have 
come down against modifying the current kernel GPL license to allow for 
non-GPL'd loadable modules.


For an example of the type of exception required to explicitly allow 
this type of behavior, check out the GNU Classpath project's license:  
http://www.gnu.org/software/classpath/license.html


This is similar to the LGPL license.

The issue of non-GPL'd loadable modules is still a very active 
discussion, so I'm sure the last word hasn't been decided.  As pointed 
out, though, the ZFS code is CDDL, which is incompatible with the GPL. 
The FUSE project is using a similar approach to nVidia, using a piece of 
"shim" GPL'd code as a loadable module providing a stable kernel API to 
call from userland applications, which can carry any license desired.



-Erik
