Re: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Roch Bourbonnais


On 4 Apr 07, at 10:01, Paul Boven wrote:

> Hi everyone,
>
> Swap would probably have to go on a zvol - would that be best placed on
> the n-way mirror, or on the raidz?
>
> Regards, Paul Boven.

From the book of Richard Elling:

Shouldn't matter. The 'existence' of a swap device is sometimes required.
If the device ever becomes 'in use', you'll need to find an alternative.

-r


Re: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Roch Bourbonnais


Now, given proper I/O concurrency (like the recently improved NCQ in our
drivers) or SCSI CTQ, I do not expect the write caches to provide much
performance gain, if any, over the situation with write caches off.

Write caches can be extremely effective when dealing with drives
that do not handle concurrent requests properly.

I'd be interested to see data that shows otherwise.

-r

On 4 Apr 07, at 15:20, Constantin Gonzalez wrote:


> Hi,
>
> Manoj Joseph wrote:
>
>> Can write-cache not be turned on manually as the user is sure that it is
>> only ZFS that is using the entire disk?
>
> Yes, it can be turned on, but I don't know if ZFS would then know about it.
>
> I'd still feel more comfortable with it being turned off unless ZFS itself
> does it.
>
> But maybe someone from the ZFS team can clarify this.
>
> Cheers,
>    Constantin



Re: Re[2]: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Roch Bourbonnais


On 5 Apr 07, at 08:28, Robert Milkowski wrote:


> Hello Matthew,
>
> Thursday, April 5, 2007, 1:08:25 AM, you wrote:
>
> MA> Lori Alt wrote:
>
> >>> Can write-cache not be turned on manually as the user is sure that it is
> >>> only ZFS that is using the entire disk?
> >>>
> >>> Yes, it can be turned on, but I don't know if ZFS would then know about it.
> >>>
> >>> I'd still feel more comfortable with it being turned off unless ZFS itself
> >>> does it.
> >>>
> >>> But maybe someone from the ZFS team can clarify this.
> >> I think that it's true that ZFS would not know about the
> >> write cache and thus you wouldn't get the benefit of it.
>
> MA> Actually, all that matters is that the write cache is on -- doesn't
> MA> matter whether ZFS turned it on or you did it manually.  (However, make
> MA> sure that the write cache doesn't turn itself back off when you reboot /
> MA> lose power...)
>
> SCSI write cache flush commands will be issued regardless of whether zfs
> has a whole disk or only a slice, right?



That's correct. The code path that issues flushes to the write cache does
not check whether or not the caches are enabled.



Re: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Constantin Gonzalez
Hi,

>>> - RAID-Z is _very_ slow when one disk is broken.
>> Do you have data on this? The reconstruction should be relatively cheap
>> especially when compared with the initial disk access.
> 
> Also, what is your definition of "broken"?  Does this mean the device
> appears as FAULTED in the pool status, or that the drive is present and
> not responding?  If it's the latter, this will be fixed by my upcoming
> FMA work.

Sorry, the _very_ may be exaggerated and depends much on the load of
the system and the config.

I'm referring to a couple of posts and anecdotal experience from colleagues.
This means that indeed "slow" or "very slow" may be a mixture of
reconstruction overhead and device timeout issues.

So, it's nice to see that the upcoming FMA code will fix some of the slowness
issues.

Did anybody measure how much CPU overhead RAID-Z and RAID-Z2 parity
computation induces, both for writes and for reads (assuming a data disk
is broken)? This data would be useful when arguing for a "software RAID"
scheme in front of hardware-RAID addicted customers.
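
One rough way to gauge it (a sketch, not a rigorous benchmark; pool, file
and device names below are hypothetical) is to watch CPU utilization while
streaming data through the pool, first healthy, then with one disk offlined:

  # watch per-CPU utilization in the background
  mpstat 5 &

  # write path: stream data into the raidz pool
  dd if=/dev/zero of=/tank/parity-test bs=128k count=40000

  # read path: read it back
  dd if=/tank/parity-test of=/dev/null bs=128k

  # repeat both with one disk offlined to exercise reconstruction
  zpool offline tank c1t3d0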

Best regards,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering


Re[2]: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Robert Milkowski
Hello Adam,

Wednesday, April 4, 2007, 11:41:58 PM, you wrote:

AL> On Wed, Apr 04, 2007 at 11:04:06PM +0200, Robert Milkowski wrote:
>> If I stop all activity to the x4500 with a pool made of several raidz2
>> vdevs and then issue a spare attach, I get really poor performance
>> (1-2MB/s) on a pool with lots of relatively small files.

AL> Does that mean the spare is resilvering when you collect the performance
AL> data? I think a fair test would be to compare the performance of a fully
AL> functional RAID-Z stripe against one with a missing (absent) device.

Sorry, I wasn't clear.
I'm not talking about performance while the spare is resilvering.
I'm talking about resilver performance itself while all other I/Os are
absent. The resilver itself is slow (lots of files) with raidz2 here.


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re[2]: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Robert Milkowski
Hello Matthew,

Thursday, April 5, 2007, 1:08:25 AM, you wrote:

MA> Lori Alt wrote:
>> 
>>> Can write-cache not be turned on manually as the user is sure that it is
>>> only ZFS that is using the entire disk?
>>>
>>> Yes, it can be turned on, but I don't know if ZFS would then know about it.
>>>
>>> I'd still feel more comfortable with it being turned off unless ZFS itself
>>> does it.
>>>
>>> But maybe someone from the ZFS team can clarify this.
>> I think that it's true that ZFS would not know about the
>> write cache and thus you wouldn't get the benefit of it.

MA> Actually, all that matters is that the write cache is on -- doesn't 
MA> matter whether ZFS turned it on or you did it manually.  (However, make
MA> sure that the write cache doesn't turn itself back off when you reboot /
MA> lose power...)

SCSI write cache flush commands will be issued regardless of whether zfs
has a whole disk or only a slice, right?

-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Matthew Ahrens

Lori Alt wrote:

>> Can write-cache not be turned on manually as the user is sure that it is
>> only ZFS that is using the entire disk?
>>
>> Yes, it can be turned on, but I don't know if ZFS would then know about it.
>>
>> I'd still feel more comfortable with it being turned off unless ZFS itself
>> does it.
>>
>> But maybe someone from the ZFS team can clarify this.
>
> I think that it's true that ZFS would not know about the
> write cache and thus you wouldn't get the benefit of it.


Actually, all that matters is that the write cache is on -- doesn't 
matter whether ZFS turned it on or you did it manually.  (However, make 
sure that the write cache doesn't turn itself back off when you reboot / 
lose power...)


--matt


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Adam Leventhal
On Wed, Apr 04, 2007 at 11:04:06PM +0200, Robert Milkowski wrote:
> If I stop all activity to x4500 with a pool made of several raidz2 and
> then I issue spare attach I get really poor performance (1-2MB/s) on a
> pool with lot of relatively small files.

Does that mean the spare is resilvering when you collect the performance
data? I think a fair test would be to compare the performance of a fully
functional RAID-Z stripe against one with a missing (absent) device.

Adam

-- 
Adam Leventhal, Solaris Kernel Development   http://blogs.sun.com/ahl


Re[2]: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Robert Milkowski
Hello Adam,

Wednesday, April 4, 2007, 7:08:07 PM, you wrote:

AL> On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
>> - RAID-Z is _very_ slow when one disk is broken.

AL> Do you have data on this? The reconstruction should be relatively cheap
AL> especially when compared with the initial disk access.

If I stop all activity to the x4500 with a pool made of several raidz2
vdevs and then issue a spare attach, I get really poor performance
(1-2MB/s) on a pool with lots of relatively small files.
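
For reference, a sketch of such a test (pool and device names are
hypothetical):

  # add a hot spare, then trigger a replace so it starts resilvering
  zpool add tank spare c5t8d0
  zpool replace tank c5t0d0 c5t8d0

  # watch resilver progress and throughput
  zpool status tank
  zpool iostat tank 5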

-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Lori Alt



>> Can write-cache not be turned on manually as the user is sure that it is
>> only ZFS that is using the entire disk?
>
> Yes, it can be turned on, but I don't know if ZFS would then know about it.
>
> I'd still feel more comfortable with it being turned off unless ZFS itself
> does it.
>
> But maybe someone from the ZFS team can clarify this.


I think that it's true that ZFS would not know about the
write cache and thus you wouldn't get the benefit of it.

At some point, we'd like to implement code that recognizes that zfs "owns"
the entire disk even though the disk has multiple slices, and turns on
write caching anyway.  I haven't done much looking into this, though.

Some further comments on the proposed configuration (root
mirrored across all four disks, the rest of each disk
going into a RAIDZ pool):


1. I suggest you make your root pool big enough to hold several
   boot environments so that you can try out clone-and-upgrade
   tricks like this (a minimal sketch follows below):

   http://blogs.sun.com/timf/entry/an_easy_way_to_manage

2. If root is mirrored across all four disks, that means that swapping
   will take place to all four disks.  I'm wondering if that's a problem,
   or if not a problem, maybe not optimal.
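
A minimal sketch of the clone-and-upgrade idea from point 1, assuming a
root pool laid out as rpool/ROOT/<be-name> (the dataset names here are
hypothetical; see the blog entry above for the real procedure):

  # snapshot the current boot environment and clone it
  zfs snapshot rpool/ROOT/be1@pre-upgrade
  zfs clone rpool/ROOT/be1@pre-upgrade rpool/ROOT/be2

  # upgrade the clone, then point the boot loader at rpool/ROOT/be2;
  # if the upgrade goes wrong, the original be1 is untouched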

Lori


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Nicolas Williams
On Wed, Apr 04, 2007 at 10:08:07AM -0700, Adam Leventhal wrote:
> On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
> > - RAID-Z is _very_ slow when one disk is broken.
> 
> Do you have data on this? The reconstruction should be relatively cheap
> especially when compared with the initial disk access.

RAID-Z has to be slower when there is lots of bitrot, but it shouldn't
be slower when a disk has read errors or is gone.  Or are we talking
about write performance (does RAID-Z wait too long for a disk that won't
respond?)?


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Eric Schrock
On Wed, Apr 04, 2007 at 10:08:07AM -0700, Adam Leventhal wrote:
> On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
> > - RAID-Z is _very_ slow when one disk is broken.
> 
> Do you have data on this? The reconstruction should be relatively cheap
> especially when compared with the initial disk access.
> 

Also, what is your definition of "broken"?  Does this mean the device
appears as FAULTED in the pool status, or that the drive is present and
not responding?  If it's the latter, this will be fixed by my upcoming
FMA work.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Adam Leventhal
On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
> - RAID-Z is _very_ slow when one disk is broken.

Do you have data on this? The reconstruction should be relatively cheap
especially when compared with the initial disk access.

Adam

-- 
Adam Leventhal, Solaris Kernel Development   http://blogs.sun.com/ahl


Re[2]: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Robert Milkowski
Hello Constantin,

Wednesday, April 4, 2007, 3:34:13 PM, you wrote:


CG> - RAID-Z is slow when writing, you basically get only one disk's bandwidth.
CG>   (Yes, with variable block sizes this might be slightly better...)

No, it's not.
It's actually very fast for writing; in many cases it would be faster
than RAID-10 (both made of 4 disks).

Random reads, on the other hand, are slow...



-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Constantin Gonzalez
Hi,

Manoj Joseph wrote:

> Can write-cache not be turned on manually as the user is sure that it is
> only ZFS that is using the entire disk?

Yes, it can be turned on, but I don't know if ZFS would then know about it.

I'd still feel more comfortable with it being turned off unless ZFS itself
does it.

But maybe someone from the ZFS team can clarify this.
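
For reference, the write cache can be toggled by hand from format's expert
mode (a sketch of the menu path; whether ZFS then accounts for the cache
is exactly the open question above):

  # format -e
  format> disk          (select the disk, e.g. c0t0d0)
  format> cache
  cache> write_cache
  write_cache> display  (show the current state)
  write_cache> enable   (turn the write cache on)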

Cheers,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Manoj Joseph

Constantin Gonzalez wrote:

>> Do I still have the advantages of having the whole disk
>> 'owned' by zfs, even though it's split into two parts?
>
> I'm pretty sure that this is not the case:
>
> - ZFS has no guarantee that someone won't do something else with that other
>   partition, so it can't assume the right to turn on disk cache for the whole
>   disk.


Can write-cache not be turned on manually as the user is sure that it is 
only ZFS that is using the entire disk?


-Manoj


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Constantin Gonzalez
Hi,

> Now that zfsboot is becoming available, I'm wondering how to put it to
> use. Imagine a system with 4 identical disks. Of course I'd like to use

you lucky one :).

> raidz, but zfsboot doesn't do raidz. What if I were to partition the
> drives, such that I have 4 small partitions that make up a zfsboot
> pool (4-way mirror), and the remainder of each drive becomes part
> of a raidz?

Sounds good. Performance will suffer a bit, as ZFS thinks it has two pools
with 4 spindles each, but it should still perform better than the same
setup on a UFS basis.

You may also want to have two 2-way mirrors and keep the second for other
purposes such as a scratch space for zfs migration or as spare disks for
other stuff.

> Do I still have the advantages of having the whole disk
> 'owned' by zfs, even though it's split into two parts?

I'm pretty sure that this is not the case:

- ZFS has no guarantee that someone won't do something else with that other
  partition, so it can't assume the right to turn on disk cache for the whole
  disk.

- Yes, it could be smart and realize that it does have the whole disk, only
  split up across two pools, but then I assume that this is not your typical
  enterprise class configuration and so it probably didn't get implemented
  that way.

I'd say that not being able to benefit from the disk drive's cache is not
as bad in the face of ZFS' other advantages, so you can probably live with
that.

> Swap would probably have to go on a zvol - would that be best placed on
> the n-way mirror, or on the raidz?

I'd place it onto the mirror for performance reasons. Also, it feels cleaner
to have all your OS stuff on one pool and all your user/app/data stuff on
another. This is also recommended by the ZFS Best Practices Wiki on
www.solarisinternals.com.
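
A minimal sketch of swap on a zvol on the mirrored pool (pool and volume
names are hypothetical, and the size is just an example):

  # create a 2 GB zvol on the mirrored root pool and add it as swap
  zfs create -V 2g rootpool/swap
  swap -a /dev/zvol/dsk/rootpool/swap

  # verify
  swap -l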

Now back to the 4-disk RAID-Z: Does it have to be RAID-Z? Maybe you want
to reconsider and use two 2-way mirrors:

- RAID-Z is slow when writing, you basically get only one disk's bandwidth.
  (Yes, with variable block sizes this might be slightly better...)

- RAID-Z is _very_ slow when one disk is broken.

- Using mirrors is more convenient for growing the pool: You run out of space,
  you add two disks, and get better performance too (see the sketch after this
  list). No need to buy 4 extra disks for another RAID-Z set.

- When using disks, you need to consider availability, performance and space.
  Of the three, space is the cheapest. Therefore it's best to sacrifice
  space and you'll get better availability and better performance.
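
Growing a mirrored pool then is a one-liner (a sketch; pool and device
names are hypothetical):

  # add another 2-way mirror; ZFS stripes new writes across all vdevs
  zpool add datapool mirror c2t0d0 c2t1d0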

Hope this helps,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering


[zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Paul Boven
Hi everyone,

Now that zfsboot is becoming available, I'm wondering how to put it to
use. Imagine a system with 4 identical disks. Of course I'd like to use
raidz, but zfsboot doesn't do raidz. What if I were to partition the
drives, such that I have 4 small partitions that make up a zfsboot
pool (4-way mirror), and the remainder of each drive becomes part
of a raidz? Do I still have the advantages of having the whole disk
'owned' by zfs, even though it's split into two parts?
Swap would probably have to go on a zvol - would that be best placed on
the n-way mirror, or on the raidz?
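
In commands, the proposed layout would look roughly like this (hypothetical
device and pool names; slice 0 small, slice 1 the remainder of each drive):

  # 4-way mirrored boot pool on the small slices
  zpool create rootpool mirror c0t0d0s0 c0t1d0s0 c0t2d0s0 c0t3d0s0

  # raidz pool on the remaining slices
  zpool create datapool raidz c0t0d0s1 c0t1d0s1 c0t2d0s1 c0t3d0s1

  # note: since ZFS is given slices rather than whole disks, it will not
  # enable the disks' write cache on its own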

Regards, Paul Boven.