[zfs-discuss] question about modification of dd_parent_obj during COW

2007-09-12 Thread fisherman
From the online ZFS On-Disk Specification document, I found there is a 
field named "dd_parent_obj" in dsl_dir_phys_t. Will this field be modified or 
kept unchanged during a snapshot COW?

   For example, consider a ZFS filesystem mounted on /myzfs, which contains two 
subdirectories (A and B). If we do the following steps:
  1) create a snapshot named /[EMAIL PROTECTED]
  2) rename /myzfs/A to /myzfs/A1.

  I think the directory objects of /myzfs/A and /myzfs will be COW'd during the 
rename operation. 

  Now we can access directory B by specifying either "/myzfs/B" or 
"/[EMAIL PROTECTED]/B".
 
  The problems are:
   1)  What is the parent of B?  Will "dd_parent_obj" of B be changed during 
the COW of the directory object /myzfs?
   2)  If we remove /myzfs/B thereafter, will "dd_parent_obj" of 
/[EMAIL PROTECTED]/B be changed?
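
  For reference, the scenario above can be reproduced with commands along 
these lines (the pool, filesystem and snapshot names are made up for 
illustration):

    # assumes an existing pool named "tank"; all names are hypothetical
    zfs create tank/myzfs                  # mounted at /tank/myzfs by default
    mkdir /tank/myzfs/A /tank/myzfs/B
    zfs snapshot tank/myzfs@snap1          # step 1: take a snapshot
    mv /tank/myzfs/A /tank/myzfs/A1        # step 2: rename A to A1
    ls /tank/myzfs/.zfs/snapshot/snap1     # the snapshot still shows A and B
    ls /tank/myzfs                         # the live filesystem shows A1 and B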

  Thanks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Root and upgrades

2007-09-12 Thread Mark De Reeper

How long before we can upgrade a ZFS based root fs? Not looking for a Live 
Upgrade feature, just to be able to boot off a newer release DVD and upgrade in 
place.

I'm currently using a build 62-based system and would like to start taking a 
look at some of the features showing up in newer builds.
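
As a side note, once booted into a newer release the on-disk pool format can 
be brought forward separately; a minimal sketch (the pool name is hypothetical):

    zpool upgrade -v        # list the pool versions this release supports
    zpool upgrade mypool    # upgrade one pool (or "zpool upgrade -a" for all)
    # note: older releases can no longer import a pool once it is upgraded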


Thanks

Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Marc Bevand
Pawel Jakub Dawidek <...@FreeBSD.org> writes:
> 
> This is how RAIDZ fills the disks (follow the numbers):
> 
>   Disk0   Disk1   Disk2   Disk3
> 
>   D0  D1  D2  P3
>   D4  D5  D6  P7
>   D8  D9  D10 P11
>   D12 D13 D14 P15
>   D16 D17 D18 P19
>   D20 D21 D22 P23
> 
> D is data, P is parity.

This layout assumes of course that large stripes have been written to
the RAIDZ vdev. As you know, the stripe width is dynamic, so it is
possible for a single logical block to span only 2 disks (for those who
don't know what I am talking about, see the "red" block occupying LBAs
D3 and E3 on page 13 of these ZFS slides [1]).

To read this logical block (and validate its checksum), only D_0 needs 
to be read (LBA E3). So in this very specific case, a RAIDZ read
operation is as cheap as a RAID5 read operation. The existence of these
small stripes could explain why RAIDZ doesn't perform as badly as RAID5
in Pawel's benchmark...

[1] http://br.sun.com/sunnews/events/2007/techdaysbrazil/pdf/eric_zfs.pdf

-marc


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Pawel Jakub Dawidek
On Wed, Sep 12, 2007 at 07:39:56PM -0500, Al Hopper wrote:
> >This is how RAIDZ fills the disks (follow the numbers):
> >
> > Disk0   Disk1   Disk2   Disk3
> >
> > D0  D1  D2  P3
> > D4  D5  D6  P7
> > D8  D9  D10 P11
> > D12 D13 D14 P15
> > D16 D17 D18 P19
> > D20 D21 D22 P23
> >
> >D is data, P is parity.
> >
> >And RAID5 does this:
> >
> > Disk0   Disk1   Disk2   Disk3
> >
> > D0  D3  D6  P0,3,6
> > D1  D4  D7  P1,4,7
> > D2  D5  D8  P2,5,8
> > D9  D12 D15 P9,12,15
> > D10 D13 D16 P10,13,16
> > D11 D14 D17 P11,14,17
> 
> Surely the above is not accurate?  You're showing the parity data only 
> being written to disk3.  In RAID5 the parity is distributed across all 
> disks in the RAID5 set.  What is illustrated above is RAID3.

It's actually RAID4 (RAID3 would look the same as RAIDZ, but there are
differences in practice), but my point wasn't how the parity is
distributed:) Ok, RAID5 once again:

Disk0   Disk1   Disk2   Disk3

D0   D3   D6   P0,3,6
D1   D4   D7   P1,4,7
D2   D5   D8   P2,5,8
D9   D12  P9,12,15    D15
D10  D13  P10,13,16   D16
D11  D14  P11,14,17   D17

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Al Hopper
On Thu, 13 Sep 2007, Pawel Jakub Dawidek wrote:

> On Wed, Sep 12, 2007 at 11:20:52PM +0100, Peter Tribble wrote:
>> On 9/10/07, Pawel Jakub Dawidek <[EMAIL PROTECTED]> wrote:
>>> Hi.
>>>
>>> I've a prototype RAID5 implementation for ZFS. It only works in
>>> non-degraded state for now. The idea is to compare RAIDZ vs. RAID5
>>> performance, as I suspected that RAIDZ, because of full-stripe
>>> operations, doesn't work well for random reads issued by many processes
>>> in parallel.
>>>
>>> There is of course write-hole problem, which can be mitigated by running
>>> scrub after a power failure or system crash.
>>
>> If I read your suggestion correctly, your implementation is much
>> more like traditional raid-5, with a read-modify-write cycle?
>>
>> My understanding of the raid-z performance issue is that it requires
>> full-stripe reads in order to validate the checksum. [...]
>
> No, the checksum is an independent thing, and this is not the reason why RAIDZ
> needs to do full-stripe reads - in non-degraded mode RAIDZ doesn't read
> parity.
>
> This is how RAIDZ fills the disks (follow the numbers):
>
>   Disk0   Disk1   Disk2   Disk3
>
>   D0  D1  D2  P3
>   D4  D5  D6  P7
>   D8  D9  D10 P11
>   D12 D13 D14 P15
>   D16 D17 D18 P19
>   D20 D21 D22 P23
>
> D is data, P is parity.
>
> And RAID5 does this:
>
>   Disk0   Disk1   Disk2   Disk3
>
>   D0  D3  D6  P0,3,6
>   D1  D4  D7  P1,4,7
>   D2  D5  D8  P2,5,8
>   D9  D12 D15 P9,12,15
>   D10 D13 D16 P10,13,16
>   D11 D14 D17 P11,14,17

Surely the above is not accurate?  You're showing the parity data only 
being written to disk3.  In RAID5 the parity is distributed across all 
disks in the RAID5 set.  What is illustrated above is RAID3.

> As you can see, even a small block is stored on all disks in RAIDZ, whereas
> on RAID5 a small block can be stored on one disk only.
>
> --

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Nicolas Williams
On Thu, Sep 13, 2007 at 12:56:44AM +0200, Pawel Jakub Dawidek wrote:
> On Wed, Sep 12, 2007 at 11:20:52PM +0100, Peter Tribble wrote:
> > My understanding of the raid-z performance issue is that it requires
> > full-stripe reads in order to validate the checksum. [...]
> 
> No, the checksum is an independent thing, and this is not the reason why RAIDZ
> needs to do full-stripe reads - in non-degraded mode RAIDZ doesn't read
> parity.

I doubt reading the parity could cost all that much (particularly if
there's enough I/O capacity).  The real cost is having to read the full
128KB block, if a file's record size is 128KB, in order to satisfy a 2KB
read.

And ZFS has to read full blocks in order to verify the checksum.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Pawel Jakub Dawidek
On Wed, Sep 12, 2007 at 11:20:52PM +0100, Peter Tribble wrote:
> On 9/10/07, Pawel Jakub Dawidek <[EMAIL PROTECTED]> wrote:
> > Hi.
> >
> > I've a prototype RAID5 implementation for ZFS. It only works in
> > non-degraded state for now. The idea is to compare RAIDZ vs. RAID5
> > performance, as I suspected that RAIDZ, because of full-stripe
> > operations, doesn't work well for random reads issued by many processes
> > in parallel.
> >
> > There is of course write-hole problem, which can be mitigated by running
> > scrub after a power failure or system crash.
> 
> If I read your suggestion correctly, your implementation is much
> more like traditional raid-5, with a read-modify-write cycle?
> 
> My understanding of the raid-z performance issue is that it requires
> full-stripe reads in order to validate the checksum. [...]

No, the checksum is an independent thing, and this is not the reason why RAIDZ
needs to do full-stripe reads - in non-degraded mode RAIDZ doesn't read
parity.

This is how RAIDZ fills the disks (follow the numbers):

Disk0   Disk1   Disk2   Disk3

D0  D1  D2  P3
D4  D5  D6  P7
D8  D9  D10 P11
D12 D13 D14 P15
D16 D17 D18 P19
D20 D21 D22 P23

D is data, P is parity.

And RAID5 does this:

Disk0   Disk1   Disk2   Disk3

D0  D3  D6  P0,3,6
D1  D4  D7  P1,4,7
D2  D5  D8  P2,5,8
D9  D12 D15 P9,12,15
D10 D13 D16 P10,13,16
D11 D14 D17 P11,14,17

As you can see, even a small block is stored on all disks in RAIDZ, whereas
on RAID5 a small block can be stored on one disk only.
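
One way to see this difference on a live system is to watch per-device
activity while issuing small random reads; a rough sketch, with hypothetical
pool and device names:

    # 4-disk single-parity raidz vdev (device names are made up)
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
    # generate small random reads against files in the pool, then watch the
    # per-device read counts; with raidz every data disk is hit for each block
    zpool iostat -v tank 5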

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Nicolas Williams
On Wed, Sep 12, 2007 at 02:24:56PM -0700, Adam Leventhal wrote:
> I'm a bit surprised by these results. Assuming relatively large blocks
> written, RAID-Z and RAID-5 should be laid out on disk very similarly
> resulting in similar read performance.
> 
> Did you compare the I/O characteristic of both? Was the bottleneck in
> the software or the hardware?

Note that Pawel wrote:

Pawel> I was using 8 processes, I/O size was a random value between 2kB
Pawel> and 32kB (with 2kB step), offset was a random value between 0 and
Pawel> 10GB (also with 2kB step).

If the dataset's record size was the default (Pawel didn't say, right?)
then the reason for the lousy read performance is clear: RAID-Z has to
read full blocks to verify the checksum, whereas RAID-5 need only read
as much as is requested (assuming aligned reads, which Pawel did seem to
indicate: "2KB steps").

Peter Tribble pointed out much the same thing already.

The crucial requirement is to match the dataset record size to the I/O
size done by the application.  If the app writes in bigger chunks than
it reads and you want to optimize for write performance then set the
record size to match the write size, else set the record size to match
the read size.
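
Concretely, that is a per-dataset property; a minimal sketch (the dataset
name and sizes are hypothetical):

    zfs get recordsize tank/db        # the default is 128K
    zfs set recordsize=8k tank/db     # match an application doing 8K I/O
    # note: the new record size only applies to files written afterwards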

Where the dataset record size is not matched to the application's I/O
size I guess we could say that RAID-Z trades off the RAID-5 write hole
for a read-hole.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Pawel Jakub Dawidek
On Wed, Sep 12, 2007 at 02:24:56PM -0700, Adam Leventhal wrote:
> On Mon, Sep 10, 2007 at 12:41:24PM +0200, Pawel Jakub Dawidek wrote:
> > And here are the results:
> > 
> > RAIDZ:
> > 
> > Number of READ requests: 4.
> > Number of WRITE requests: 0.
> > Number of bytes to transmit: 695678976.
> > Number of processes: 8.
> > Bytes per second: 1305213
> > Requests per second: 75
> > 
> > RAID5:
> > 
> > Number of READ requests: 4.
> > Number of WRITE requests: 0.
> > Number of bytes to transmit: 695678976.
> > Number of processes: 8.
> > Bytes per second: 2749719
> > Requests per second: 158
> 
> I'm a bit surprised by these results. Assuming relatively large blocks
> written, RAID-Z and RAID-5 should be laid out on disk very similarly
> resulting in similar read performance.

Hmm, no. The data was organized very differently on the disks. The smallest
block size used was 2kB, to ensure each block is written to all disks in the
RAIDZ configuration. In the RAID5 configuration, however, a 128kB stripe size
was used, which means each block was stored on one disk only.

Now when you read the data, RAIDZ needs to read all disks for each block,
and RAID5 needs to read only one disk for each block.

> Did you compare the I/O characteristic of both? Was the bottleneck in
> the software or the hardware?

The bottleneck was definitely the disks. The CPU was about 96% idle.

To be honest I expected, just like Jeff, a much bigger win for the RAID5 case.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Peter Tribble
On 9/10/07, Pawel Jakub Dawidek <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I've a prototype RAID5 implementation for ZFS. It only works in
> non-degraded state for now. The idea is to compare RAIDZ vs. RAID5
> performance, as I suspected that RAIDZ, because of full-stripe
> operations, doesn't work well for random reads issued by many processes
> in parallel.
>
> There is of course write-hole problem, which can be mitigated by running
> scrub after a power failure or system crash.

If I read your suggestion correctly, your implementation is much
more like traditional raid-5, with a read-modify-write cycle?

My understanding of the raid-z performance issue is that it requires
full-stripe reads in order to validate the checksum. So to get better
random read performance, why not simply have a separate checksum
for each chunk in the stripe? You still eliminate the raid-5 write hole
(albeit at some loss in performance because you have to compute
and write extra checksums) but you allow multiple independent reads.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-12 Thread Adam Leventhal
On Mon, Sep 10, 2007 at 12:41:24PM +0200, Pawel Jakub Dawidek wrote:
> And here are the results:
> 
> RAIDZ:
> 
>   Number of READ requests: 4.
>   Number of WRITE requests: 0.
>   Number of bytes to transmit: 695678976.
>   Number of processes: 8.
>   Bytes per second: 1305213
>   Requests per second: 75
> 
> RAID5:
> 
>   Number of READ requests: 4.
>   Number of WRITE requests: 0.
>   Number of bytes to transmit: 695678976.
>   Number of processes: 8.
>   Bytes per second: 2749719
>   Requests per second: 158

I'm a bit surprised by these results. Assuming relatively large blocks
written, RAID-Z and RAID-5 should be laid out on disk very similarly
resulting in similar read performance.

Did you compare the I/O characteristic of both? Was the bottleneck in
the software or the hardware?

Very interesting experiment...

Adam

-- 
Adam Leventhal, FishWorks                      http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Again ZFS with expanding LUNs!

2007-09-12 Thread Victor Engle
I like option #1 because it is simple and quick. It seems unlikely
that this will lead to an excessive number of LUNs in the pool in most
cases unless you start with a large number of very small LUNs. If you
begin with 5 100GB LUNs and over time add 5 more, it still seems like a
reasonable and manageable pool with twice the original capacity.

And considering the array can likely support hundreds and perhaps
thousands of LUNs, it really isn't an issue on the array side
either.

Regards,
Vic

On 9/12/07, Bill Korb <[EMAIL PROTECTED]> wrote:
> I found this discussion just today as I recently set up my first S10 machine 
> with ZFS. We use a NetApp Filer via multipathed FC HBAs, and I wanted to know 
> what my options were in regards to growing a ZFS filesystem.
>
> After looking at this thread, it looks like there is currently no way to grow 
> an existing LUN on our NetApp and then tell ZFS to expand to fill the new 
> space. This may be coming down the road at some point, but I would like to be 
> able to do this now.
>
> At this point, I believe I have two options:
>
> 1. Add a second LUN and simply do a "zpool add" to add the new space to the 
> existing pool.
>
> 2. Create a new LUN that is the size I would like my pool to be, then use 
> "zpool replace oldLUNdev newLUNdev" to ask ZFS to resilver my data to the new 
> LUN then detach the old one.
>
> The advantage of the first option is that it happens very quickly, but it 
> could get kind of messy if you grow the ZFS pool on multiple occasions. I've 
> read that some SANs are also limited as to how many LUNs can be created (some 
> are limitations of the SAN itself whereas I believe that some others impose a 
> limit as part of the SAN license). That would also make the first approach 
> less attractive.
>
> The advantage of the second approach is that all of the space would be 
> contained in a single LUN. The disadvantages are that this would involve 
> copying all of the data from the old LUN to the new one and also this means 
> that you need to have enough free space on your SAN to create this new, 
> larger LUN.
>
> Is there a best practice regarding this? I'm leaning towards option #2 so as 
> to keep the number of LUNs I have to manage at a minimum, but #1 seems like a 
> reasonable alternative, too. Or perhaps there's an option #3 that I haven't 
> thought of?
>
> Thanks,
> Bill
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Again ZFS with expanding LUNs!

2007-09-12 Thread Bill Korb
I found this discussion just today as I recently set up my first S10 machine 
with ZFS. We use a NetApp Filer via multipathed FC HBAs, and I wanted to know 
what my options were in regards to growing a ZFS filesystem.

After looking at this thread, it looks like there is currently no way to grow 
an existing LUN on our NetApp and then tell ZFS to expand to fill the new 
space. This may be coming down the road at some point, but I would like to be 
able to do this now.

At this point, I believe I have two options:

1. Add a second LUN and simply do a "zpool add" to add the new space to the 
existing pool.

2. Create a new LUN that is the size I would like my pool to be, then use 
"zpool replace oldLUNdev newLUNdev" to ask ZFS to resilver my data to the new 
LUN then detach the old one.

The advantage of the first option is that it happens very quickly, but it could 
get kind of messy if you grow the ZFS pool on multiple occasions. I've read 
that some SANs are also limited as to how many LUNs can be created (some are 
limitations of the SAN itself whereas I believe that some others impose a limit 
as part of the SAN license). That would also make the first approach less 
attractive.

The advantage of the second approach is that all of the space would be 
contained in a single LUN. The disadvantages are that this would involve 
copying all of the data from the old LUN to the new one and also this means 
that you need to have enough free space on your SAN to create this new, larger 
LUN.

Is there a best practice regarding this? I'm leaning towards option #2 so as to 
keep the number of LUNs I have to manage at a minimum, but #1 seems like a 
reasonable alternative, too. Or perhaps there's an option #3 that I haven't 
thought of?
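
As a sketch, the two options map onto commands along these lines (the pool 
name and the oldLUNdev/newLUNdev placeholders are made up):

    # option 1: grow the pool by adding a second LUN as a new top-level vdev
    zpool add mypool newLUNdev
    # note: at present a top-level vdev cannot be removed again later

    # option 2: migrate to a single larger LUN and let ZFS resilver
    zpool replace mypool oldLUNdev newLUNdev
    zpool status mypool    # the old LUN is detached once the resilver completes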

Thanks,
Bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression=on and zpool attach

2007-09-12 Thread Mike DeMarco
> On 9/12/07, Mike DeMarco <[EMAIL PROTECTED]> wrote:
> 
> > Striping several disks together with a stripe width that is tuned for
> > your data model is how you could get your performance up. Striping has
> > been left out of the ZFS model for some reason. While it is true that
> > RAIDZ will stripe the data across a given drive set, it does not give
> > you the option to tune the stripe width. Due to the write performance
> > problems of RAIDZ you may not get a performance boost from its striping
> > if your write to read ratio is too high, since the driver has to
> > calculate parity for each write.
> 
> I am not sure why you think striping has been left out of the ZFS
> model. If you create a ZFS pool without the "raidz" or "mirror"
> keywords, the pool will be striped. Also, the "recordsize" tunable can
> be useful for matching up application I/O to physical I/O.
> 
> Thanks,
> - Ryan
> -- 
> UNIX Administrator
> http://prefetch.net
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Oh... How right you are. I dug into the PDFs and read up on Dynamic striping. 
My bad.
ZFS rocks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression=on and zpool attach

2007-09-12 Thread Richard Elling
Mike DeMarco wrote:
> I/O bottlenecks are usually caused by a slow disk or one that has heavy 
> workloads reading many small files. Two factors that need to be considered 
> are head seek latency and spin latency. Head seek latency is the amount 
> of time it takes for the head to move to the track that is to be written; 
> this is an eternity for the system (usually around 4 or 5 milliseconds).

For most modern disks, writes are cached in a buffer.  But reads still
take the latency hit.  The trick is to ensure that writes are committed
to media, which ZFS will do for you.

> Spin latency is the amount of time it takes for the spindle to spin the
> track to be read or written over the head. Ideally you only want to pay the 
> latency penalty once. If you have large reads and writes going to the disk 
> then compression may help a little, but if you have many small reads or 
> writes it will do nothing more than burden your CPU with a no-gain 
> amount of work, since you are going to be paying Mr. Latency for each 
> read or write.
> 
> Striping several disks together with a stripe width that is tuned for your 
> data model is how you could get your performance up. Striping has been 
> left out of the ZFS model for some reason. While it is true that RAIDZ will
> stripe the data across a given drive set, it does not give you the option to 
> tune the stripe width. 

It is called "dynamic striping" and a write is not compelled to be spread
across all vdevs.  This opens up an interesting rat hole conversation about
whether stochastic spreading is always better than an efficient, larger
block write.  Our grandchildren might still be arguing this when they enter
the retirement home.  In general, for ZFS, the top-level dynamic stripe
interlace is 1 MByte which seems to fit well with the 128kByte block size.
YMMV.

> Due to the write performance problems of RAIDZ you may not get a performance 
> boost from its striping if your write to read ratio is too high, since the 
> driver has to calculate parity for each write.

Write performance for raidz is generally quite good, better than most other
RAID-5 implementations which are bit by the read-modify-write cycle (added
latency).  raidz can pay for this optimization when doing small, random
reads, TANSTAAFL.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Marion Hakanson
> . . .
>> Use JBODs. Or tell the cache controllers to ignore
>> the flushing requests.
[EMAIL PROTECTED] said:
> Unfortunately the HP EVA can't do it. About the 9900V, it is really fast (64GB
> cache helps a lot) and reliable. 100% uptime in years. We'll never touch it
> to solve a ZFS problem. 

On our low-end HDS array (9520V), turning on "Synchronize Cache Invalid Mode"
did the trick for ZFS purposes (Solaris-10U3).  They've since added a Solaris
kernel tunable in /etc/system:
set zfs:zfs_nocacheflush = 1

This has the unfortunate side-effect of disabling cache flushes on all disks
for the whole system, though.
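
For reference, the two ways of applying this that have come up in this thread
are roughly as follows (treat the exact behaviour as release-dependent):

    # persistent, system-wide: add to /etc/system and reboot
    set zfs:zfs_nocacheflush = 1

    # on a live system, via mdb (as mentioned earlier in the thread)
    echo zfs_nocacheflush/W 1 | mdb -kw
    echo zfs_nocacheflush/D | mdb -k     # read back the current value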

ZFS is getting more mature all the time

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] can we monitor a ZFS pool with SunMC 3.6.1 ?

2007-09-12 Thread Juan Berlie

Hello Everyone,

Can we monitor a ZFS pool with SunMC 3.6.1?

Is this a base function?

If not, will SunMC 4.0 solve this?

Juan

-- 
Juan Berlie
Engagement Architect/Architecte de Systèmes
Sun Microsystems, Inc.
1800 McGill College, Suite 800
Montréal, Québec H3A 3J6 CA
Phone x25349/514-285-8349
Mobile 1 514 781 1443
Fax 1 514 285-1983
Email [EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
> It seems that maybe there is too large a code path
> leading to panics --
> maybe a side effect of ZFS being "new" (compared to
> other filesystems).  I
> would hope that as these panic issues are coming up
> that the code path
> leading to the panic is evaluated for a specific fix
> or behavior code path.
> Sometimes it does make sense to panic (if there
> _will_ be data damage if
> you continue).  Other times not.
 
I think the same about panics.  So, IMHO, ZFS should not be called "stable".
But you know ... marketing ...  ;)

> I can understand where you are coming from as
> far as the need for
> uptime and loss of money on that app server. Two years
> of testing for the
> app, Sunfire servers for N+1 because the app can't be
> clustered and you
> have chosen to run a filesystem that has just been
> made public? 

What? That server is running and will be running on UFS for many years!
Upgrading, patching, cleaning ... even touching it is strictly prohibited :)
We upgraded to S10 because of DTrace (it helped us a lot) and during the
test phase we also evaluated ZFS.
Now we only use ZFS for our central backup servers (for many applications, 
systems, customers, ...).
We also manage a lot of other systems and always try to migrate customers to 
Solaris because of stability, resource control, DTrace ..  but we have found ZFS 
disappointing today (probably tomorrow it will be THE filesystem).

Gino
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression=on and zpool attach

2007-09-12 Thread Matty
On 9/12/07, Mike DeMarco <[EMAIL PROTECTED]> wrote:

> Striping several disks together with a stripe width that is tuned for your
> data model is how you could get your performance up. Striping has been left out
> of the ZFS model for some reason. While it is true that RAIDZ will stripe
> the data across a given drive set, it does not give you the option to tune the
> stripe width. Due to the write performance problems of RAIDZ you may not
> get a performance boost from its striping if your write to read ratio is too
> high, since the driver has to calculate parity for each write.

I am not sure why you think striping has been left out of the ZFS
model. If you create a ZFS pool without the "raidz" or "mirror"
keywords, the pool will be striped. Also, the "recordsize" tunable can
be useful for matching up application I/O to physical I/O.
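
For illustration, a pool built from plain devices is striped across them
automatically; a minimal sketch with hypothetical pool, device and dataset
names:

    # dynamic stripe across three top-level vdevs (no raidz/mirror keyword)
    zpool create tank c1t0d0 c1t1d0 c1t2d0
    zfs create tank/data
    zfs set recordsize=8k tank/data    # match the record size to the app I/O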

Thanks,
- Ryan
-- 
UNIX Administrator
http://prefetch.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Wade . Stuart

[EMAIL PROTECTED] wrote on 09/12/2007 08:04:33 AM:

> > Gino wrote:
> > > The real problem is that ZFS should stop to force
> > kernel panics.
> > >
> > I found these panics very annoying, too. And even
> > more that the zpool
> > was faulted afterwards. But my problem is that when
> > someone asks me what
> > ZFS should do instead, I have no idea.
>
> well, what about just hanging processes waiting for I/O on that zpool?
> Would that be possible?

It seems that maybe there is too large a code path leading to panics --
maybe a side effect of ZFS being "new" (compared to other filesystems).  I
would hope that as these panic issues come up, the code path leading to the
panic is evaluated for a specific fix or an alternative behavior.
Sometimes it does make sense to panic (if there _will_ be data damage if
you continue).  Other times not.


>
> > Seagate FibreChannel drives, Cheetah 15k, ST3146855FC
> > for the databases.
>
> What kind of JBOD for those drives? Just to know ...
> We found Xyratex's to be good products.
>
> > That depends on the individual requirements of each
> > service. Basically,
> > we change the recordsize according to the transaction
> > size of the
> > databases and, on the filers, the performance results
> > were best when the
> > recordsize was a bit lower than the average file size
> > (average file size
> > is 12K, so I set a recordsize of 8K). I set a vdev
> > cache size of 8K and
> > our databases worked best with a vq_max_pending of
> > 32. ZFSv3 was used,
> > that's the version which is shipped with Solaris 10
> > 11/06.
>
> thanks for sharing.
>
> > Yes, but why doesn't your application fail over to a
> > standby?
>
> It is a little complex to explain. Basically those apps are doing a
> lot of "number crunching" on some very big data in RAM. Failover
> would mean starting again from the beginning, with all the customers
> waiting for hours (and losing money).
> We are working on a new app, capable of working with a couple of nodes,
> but it will take some months to be in beta, then 2 years of testing ...
>
> > a system reboot can be a single point of failure,
> > what about the network
> > infrastructure? Hardware errors? Or power outages?
>
> We use Sunfire for that reason. We had 2 cpu failures and no service
> interruption, the same for 1 dimm module (we have been lucky with
> cpu failures ;)).
> HDS raid arrays are excellent about availability. Lots of fc links,
> network links ..
> All this is in a fully redundant datacenter .. and, sure, we have a
> stand by system on a disaster recovery site (hope to never use it!).

  I can understand where you are coming from as far as the need for
uptime and loss of money on that app server. Two years of testing for the
app, Sunfire servers for N+1 because the app can't be clustered and you
have chosen to run a filesystem that has just been made public? ZFS may be
great and all, but this stinks of running a .0 version on the production
machine.  VXFS+snap has well known and documented behaviors tested for
years on production machines. Why did you even choose to run ZFS on that
specific box?

Do not get me wrong, I really like many things about ZFS -- it is
groundbreaking.  I still do not get why it would be chosen for a server in that
position until it has better real-world production testing and modeling.
You have taken all of the careful build-up you have done and introduced an
unknown into the mix.


>
> > I'm definitely NOT some kind of know-it-all, don't
> > misunderstand me.
> > Your statement just let my alarm bells ring and
> > that's why I'm asking.
>
> Don't worry Ralf. Any suggestion/opinion/critic is welcome.
> It's a pleasure to exchange our experience
>
> Gino
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
> Gino wrote:
> > The real problem is that ZFS should stop to force
> kernel panics.
> >   
> I found these panics very annoying, too. And even
> more that the zpool 
> was faulted afterwards. But my problem is that when
> someone asks me what 
> ZFS should do instead, I have no idea.

Well, what about just hanging processes waiting for I/O on that zpool?
Would that be possible?

> Seagate FibreChannel drives, Cheetah 15k, ST3146855FC
> for the databases.

What kind of JBOD for those drives? Just to know ...
We found Xyratex's to be good products.

> That depends on the individual requirements of each
> service. Basically, 
> we change the recordsize according to the transaction
> size of the 
> databases and, on the filers, the performance results
> were best when the 
> recordsize was a bit lower than the average file size
> (average file size 
> is 12K, so I set a recordsize of 8K). I set a vdev
> cache size of 8K and 
> our databases worked best with a vq_max_pending of
> 32. ZFSv3 was used, 
> that's the version which is shipped with Solaris 10
> 11/06.

thanks for sharing.

> Yes, but why doesn't your application fail over to a
> standby? 

It is a little complex to explain. Basically those apps are doing a lot of 
"number crunching" on some very big data in RAM. Failover would mean starting 
again from the beginning, with all the customers waiting for hours (and losing 
money).
We are working on a new app, capable of working with a couple of nodes, but it 
will take some months to be in beta, then 2 years of testing ...

> a system reboot can be a single point of failure,
> what about the network 
> infrastructure? Hardware errors? Or power outages?

We use Sunfire for that reason. We had 2 CPU failures and no service 
interruption, the same for 1 DIMM module (we have been lucky with CPU failures 
;)).
HDS RAID arrays are excellent about availability. Lots of FC links, network 
links ..
All this is in a fully redundant datacenter .. and, sure, we have a standby 
system on a disaster recovery site (hope to never use it!).

> I'm definitely NOT some kind of know-it-all, don't
> misunderstand me. 
> Your statement just let my alarm bells ring and
> that's why I'm asking.

Don't worry Ralf. Any suggestion/opinion/critic is welcome.
It's a pleasure to exchange our experience

Gino
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression=on and zpool attach

2007-09-12 Thread Mike DeMarco
> On 11/09/2007, Mike DeMarco <[EMAIL PROTECTED]>
> wrote:
> > > I've got 12Gb or so of db+web in a zone on a ZFS
> > > filesystem on a mirrored zpool.
> > > Noticed during some performance testing today
> that
> > > its i/o bound but
> > > using hardly
> > > any CPU, so I thought turning on compression
> would be
> > > a quick win.
> >
> > If it is io bound won't compression make it worse?
> 
> Well, the CPUs are sat twiddling their thumbs.
> I thought reducing the amount of data going to disk
> might help I/O -
> is that unlikely?

I/O bottlenecks are usually caused by a slow disk or one that has heavy 
workloads reading many small files. Two factors that need to be considered are 
head seek latency and spin latency. Head seek latency is the amount of time it 
takes for the head to move to the track that is to be written; this is an 
eternity for the system (usually around 4 or 5 milliseconds). Spin latency is 
the amount of time it takes for the spindle to spin the track to be read or 
written over the head. Ideally you only want to pay the latency penalty once. 
If you have large reads and writes going to the disk then compression may help 
a little, but if you have many small reads or writes it will do nothing more 
than burden your CPU with a no-gain amount of work, since you are going to be 
paying Mr. Latency for each read or write.
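
As a rough worked example of that latency budget (assuming a 10,000 RPM 
drive; the numbers are only illustrative):

    # average rotational (spin) latency = half a revolution
    echo "scale=1; 60000/10000/2" | bc    # ~3.0 ms
    # add ~4-5 ms of average seek time -> roughly 7-8 ms per small random I/O,
    # i.e. on the order of 125 random IOPS per spindle
    echo "1000/8" | bc                    # ~125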

Striping several disks together with a stripe width that is tuned for your data 
model is how you could get your performance up. Striping has been left out of 
the ZFS model for some reason. While it is true that RAIDZ will stripe the data 
across a given drive set, it does not give you the option to tune the stripe 
width. Due to the write performance problems of RAIDZ you may not get a 
performance boost from its striping if your write to read ratio is too high, 
since the driver has to calculate parity for each write.

> 
> > > benefit of compression
> > > on the blocks
> > > that are copied by the mirror being resilvered?
> >
> > No! Since you are doing a block-for-block mirror of
> > the data, this could not compress the data.
> 
> No problem, another job for rsync then :)
> 
> 
> -- 
> Rasputin :: Jack of All Trades - Master of Nuns
> http://number9.hellooperator.net/
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Ralf Ramge
Gino wrote:
> The real problem is that ZFS should stop forcing kernel panics.
>
>   
I found these panics very annoying, too. And even more that the zpool 
was faulted afterwards. But my problem is that when someone asks me what 
ZFS should do instead, I have no idea.

>> I have large Sybase database servers and file servers
>> with billions of 
>> inodes running using ZFSv3. They are attached to
>> X4600 boxes running 
>> Solaris 10 U3, 2x 4 GBit/s dual FibreChannel, using
>> dumb and cheap 
>> Infortrend FC JBODs (2 GBit/s) as storage shelves.
>> 
>
> Are you using FATA drives?
>
>   
Seagate FibreChannel drives, Cheetah 15k, ST3146855FC for the databases.

For the NFS filers we use Infortrend FC shelves with SATA inside.

>> All my benchmarks (both on the command line and within applications)
>> show that the FibreChannel is the bottleneck, even with random read.
>> ZFS doesn't do this out of the box, but a bit of tuning helped a lot.
>> 
>
> You found another good point.
> I think that with ZFS and JBOD, FC links will soon be the bottleneck.
> What tuning have you done?
>
>   
That depends on the individual requirements of each service. Basically, 
we change the recordsize according to the transaction size of the 
databases and, on the filers, the performance results were best when the 
recordsize was a bit lower than the average file size (the average file size 
is 12K, so I set a recordsize of 8K). I set a vdev cache size of 8K and 
our databases worked best with a vq_max_pending of 32. ZFSv3 was used, 
that's the version which is shipped with Solaris 10 11/06.
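
As a sketch of how such tuning is typically applied (the dataset name is 
hypothetical, and the kernel tunable name below is an assumption that varies 
by release -- check the tuning guide for your build):

    # per-dataset record size, matched to the transaction/file size
    zfs set recordsize=8k tank/filer
    # queue depth was usually tuned via /etc/system in this era, e.g.:
    set zfs:zfs_vdev_max_pending = 32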

> It is a problem if your apps hang waiting for you to power down/pull out the 
> drive!
> Especially in a time=money environment :)
>
>   
Yes, but why doesn't your application fail over to a standby? I'm also 
working in a "time is money and failure is no option" environment, and I 
doubt I would sleep better if I were responsible for an application 
under such a service level agreement without full high availability. If 
a system reboot can be a single point of failure, what about the network 
infrastructure? Hardware errors? Or power outages?
I'm definitely NOT some kind of know-it-all, don't misunderstand me. 
Your statement just made my alarm bells ring and that's why I'm asking.

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963 
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, 
Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss 
Aufsichtsratsvorsitzender: Michael Scheeren

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
> > -We had tons of kernel panics because of ZFS.
> > Here a "reboot" must be planned with a couple of
> weeks in advance
> > and done only at saturday night ..
> >   
> Well, I'm sorry, but if your datacenter runs into
> problems when a single 
> server isn't available, you probably have much worse
> problems. ZFS is a 
> file system. It's not a substitute for hardware
> trouble or a misplanned 
> infrastructure. What would you do if you had the fsck
> you mentioned 
> earlier? Or with another file system like UFS, ext3,
> whatever? Boot a 
> system into single user mode and fsck several
> terabytes, after planning 
> it a couple of weeks in advance?

For example we have a couple of apps using 80-290GB of RAM, and several 
thousand users.
We use Solaris + SPARC + high-end storage because we can't afford downtime.
We can deal with a failed file system. A reboot during the day would cost a lot 
of money.
The real problem is that ZFS should stop forcing kernel panics.

> > -Our 9900V and HP EVAs works really BAD with ZFS
> because of large cache.
> > (echo zfs_nocacheflush/W 1 | mdb -kw) did not solve
> the problem. Only helped a bit.
> >
> >   
> Use JBODs. Or tell the cache controllers to ignore
> the flushing 
> requests. Should be possible, even the $10k low-cost
> StorageTek arrays 
> support this.

Unfortunately the HP EVA can't do it.
About the 9900V, it is really fast (64GB cache helps a lot) and reliable. 100% 
uptime in years.
We'll never touch it to solve a ZFS problem.
We started using JBODs (12 x 16-drive shelves) with ZFS, but the speed and 
reliability (today) are not comparable to HDS+UFS.

> > -ZFS performs badly with a lot of small files.
> > (about 20 times slower than UFS with our million-file
> > rsync procedures)
> >
> >   
> I have large Sybase database servers and file servers
> with billions of 
> inodes running using ZFSv3. They are attached to
> X4600 boxes running 
> Solaris 10 U3, 2x 4 GBit/s dual FibreChannel, using
> dumb and cheap 
> Infortrend FC JBODs (2 GBit/s) as storage shelves.

Are you using FATA drives?

> All my benchmarks (both on the command line and within applications)
> show that the FibreChannel is the bottleneck, even with random read.
> ZFS doesn't do this out of the box, but a bit of tuning helped a lot.

You found another good point.
I think that with ZFS and JBOD, FC links will soon be the bottleneck.
What tuning have you done?

> > -ZFS+FC JBOD:  failed hard disk need a reboot
> :(
> > (frankly unbelievable in 2007!)
> >   
> No. Read the thread carefully. It was mentioned that
> you don't have to 
> reboot the server, all you need to do is pull the
> hard disk. Shouldn't 
> be a problem, except if you don't want to replace the
> faulty one anyway. 

It is a problem if your apps hang waiting for you to power down/pull out the 
drive!
Especially in a time=money environment :)

> No other manual operations will be necessary, except
> for the final "zfs 
> replace". You could also try cfgadm to get rid of ZFS
> pool problems, 
> perhaps it works - I'm not sure about this, because I
> had the idea 
> *after* I solved that problem, but I'll give it a try
> someday.
> > Anyway we happily use ZFS on our new backup systems
> (snapshotting with ZFS is amazing), but to tell you
> the true we are keeping 2 large zpool in sync on each
> system because we fear an other zpool corruption.
> >
> >   
> May I ask how you accomplish that?

During the day we sync pool1 with pool2, then we "umount pool2" during scheduled 
backup operations at night.
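
For reference, one common way to keep a second local pool in sync is 
incremental send/receive; a minimal sketch (the pool, dataset and snapshot 
names are hypothetical):

    # initial full copy
    zfs snapshot pool1/data@sync1
    zfs send pool1/data@sync1 | zfs receive pool2/data
    # later: send only the changes since the previous snapshot
    zfs snapshot pool1/data@sync2
    zfs send -i pool1/data@sync1 pool1/data@sync2 | zfs receive pool2/data
    # the target must be unmodified between receives (or use zfs receive -F)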

> And why are you doing this? You should replicate your
> zpool to another 
> host, instead of mirroring locally. Where's your
> redundancy in that?

We have 4 backup hosts. Soon we'll move to 10G network and we'll replicate on 
different hosts, as you pointed out.

Gino
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
> We have seen just the opposite... we have a
>  server with about
> 0 million files and only 4 TB of data. We have been
> benchmarking FSes
> for creation and manipulation of large populations of
> small files and
> ZFS is the only one we have found that continues to
> scale linearly
> above one million files in one FS. UFS, VXFS, HFS+
> (don't ask why),
> NSS (on NW not Linux) all show exponential growth in
> response time as
> you cross a certain knee (we are graphing time to
> create  zero
> length files, then do a series of basic manipulations
> on them) in
> number of files. For all the FSes we have tested that
> knee has been
> under one million files, except for ZFS. I know this
> is not 'real
> world' but it does reflect the response time issues
> we have been
> trying to solve. I will see if my client (I am a
> consultant) will
> allow me to post the results, as I am under NDA for
> most of the
> details of what we are doing.

It would be great!

> On the other hand, we have seen serious
> issues using rsync to
> migrate this data from the existing server to the
> Solaris 10 / ZFS
> system, so perhaps your performance issues were rsync
> related and not
> ZFS. In fact, so far the fastest and most reliable
> method for moving
> the data is proving to be Veritas NetBackup (back it
> up on the source
> server, restore to the new ZFS server).
> 
> Now having said all that, we are probably
>  never going to see
> 00 million files in one zpool, because the ZFS
> architecture lets us
> use a more distributed model (many zpools and
> datasets within them)
> and still present the end users with a single view of
> all the data.

Hi Paul,
may I ask what your average file size is? Have you done any optimization?
ZFS recordsize?
Did your test also include writing 1 million files?

Gino
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
> Yes, this is a case where the disk has not completely
> failed.
> ZFS seems to handle the completely failed disk case
> properly, and
> has for a long time.  Cutting the power (which you
> can also do with
> luxadm) makes the disk appear completely failed.

Richard, I think you're right.
The failed disk is still working, but it has no spare space left for bad sectors...

Gino
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
> On Tue, 2007-09-11 at 13:43 -0700, Gino wrote:
> > -ZFS+FC JBOD:  failed hard disk need a reboot
> :(
> > (frankly unbelievable in 2007!)
> 
> So, I've been using ZFS with some creaky old FC JBODs
> (A5200's) and old
> disks which have been failing regularly and haven't
> seen that; the worst
> I've seen running nevada was that processes touching
> the pool got stuck,

this is the problem

> but they all came unstuck when I powered off the
> at-fault FC disk via
> the A5200 front panel.

I'll try again with the EMC JBOD, but the fact remains that you need 
to manually recover from a hard disk failure.

gino
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Ralf Ramge
Gino wrote:
[...]

> Just a few examples:
> -We lost several zpool with S10U3 because of "spacemap" bug,
> and -nothing- was recoverable.  No fsck here :(
>
>   
Yes, I criticized the lack of zpool recovery mechanisms, too, during my 
AVS testing.  But I don't have the know-how to judge whether there are 
technical reasons for it.

> -We had tons of kernel panics because of ZFS.
> Here a "reboot" must be planned with a couple of weeks in advance
> and done only at saturday night ..
>   
Well, I'm sorry, but if your datacenter runs into problems when a single 
server isn't available, you probably have much worse problems. ZFS is a 
file system. It's not a substitute for hardware trouble or a misplanned 
infrastructure. What would you do if you had the fsck you mentioned 
earlier? Or with another file system like UFS, ext3, whatever? Boot a 
system into single user mode and fsck several terabytes, after planning 
it a couple of weeks in advance?

> -Our 9900V and HP EVAs works really BAD with ZFS because of large cache.
> (echo zfs_nocacheflush/W 1 | mdb -kw) did not solve the problem. Only helped 
> a bit.
>
>   
Use JBODs. Or tell the cache controllers to ignore the flushing 
requests. Should be possible, even the $10k low-cost StorageTek arrays 
support this.

> -ZFS performs badly with a lot of small files.
> (about 20 times slower than UFS with our million-file rsync procedures)
>
>   
I have large Sybase database servers and file servers with billions of 
inodes running using ZFSv3. They are attached to X4600 boxes running 
Solaris 10 U3, 2x 4 GBit/s dual FibreChannel, using dumb and cheap 
Infortrend FC JBODs (2 GBit/s) as storage shelves. All my 
benchmarks (both on the command line and within applications) show that 
the FibreChannel is the bottleneck, even with random read. ZFS doesn't 
do this out of the box, but a bit of tuning helped a lot.

> -ZFS+FC JBOD:  failed hard disk need a reboot :(
> (frankly unbelievable in 2007!)
>   
No. Read the thread carefully. It was mentioned that you don't have to 
reboot the server, all you need to do is pull the hard disk. Shouldn't 
be a problem, except if you don't want to replace the faulty one anyway. 
No other manual operations will be necessary, except for the final "zfs 
replace". You could also try cfgadm to get rid of ZFS pool problems, 
perhaps it works - I'm not sure about this, because I had the idea 
*after* I solved that problem, but I'll give it a try someday.

> Anyway we happily use ZFS on our new backup systems (snapshotting with ZFS is 
> amazing), but to tell you the true we are keeping 2 large zpool in sync on 
> each system because we fear an other zpool corruption.
>
>   
May I ask how you accomplish that?

And why are you doing this? You should replicate your zpool to another 
host, instead of mirroring locally. Where's your redundancy in that?

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963 
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, 
Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss 
Aufsichtsratsvorsitzender: Michael Scheeren

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MS Exchange storage on ZFS?

2007-09-12 Thread Nigel Smith
Microsoft have a document you should read:
"Optimizing Storage for Microsoft Exchange Server 2003"
http://download.microsoft.com/download/b/e/0/be072b12-9c30-4e00-952d-c7d0d7bcea5f/StoragePerformance.doc

Microsoft also have a utility "JetStress" which you can use to verify
the performance of the storage system.
http://www.microsoft.com/downloads/details.aspx?familyid=94b9810b-670e-433a-b5ef-b47054595e9c&displaylang=en
I think you can use JetStress on a non-Exchange server if you copy across
some of the Exchange DLLs. 

If you do any testing along these lines, please report success or failure
back to this forum, and on the 'Storage-discuss' forum where these sort
of questions are more usually discussed.
Thanks
Nigel Smith
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss