Re: [zfs-discuss] Raidz1 p

2009-01-19 Thread Brad Hill
Sure, and thanks for the quick reply.

Controller: Supermicro AOC-SAT2-MV8 plugged into a 64-bit PCI-X 133MHz bus
Drives: 5 x Seagate 7200.11 1.5TB disks for the raidz1.
Single 36GB Western Digital 10kRPM Raptor as the system disk. Its mate is 
installed but not yet mirrored.
Motherboard: Tyan Thunder K8W S2885 (Dual AMD CPU) with 1GB ECC Ram

Anything else I can provide?

(thanks again)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Blake
So the conclusion we're arriving at is to push the RFE for shrinkable pools?

Warning the user about the difference in actual drive size, then
offering to shrink the pool to allow a smaller device seems like a
nice solution to this problem.

The ability to shrink pools might be very useful in other situations.
Say I built a server that once handled a decent amount of IOPS using SATA
disks, and now that the workload's IOPS have greatly increased (a busy
database?), I need SAS disks.  If I'd originally bought 500GB SATA disks
(the current sweet spot), I might have a lot of empty space in my pool.
Shrinking the pool would allow me to migrate to smaller-capacity SAS disks
with much better seek times, without being forced to buy twice as many
disks due to the higher cost per GB of SAS.

I think I remember an RFE for shrinkable pools, but can't find it -
can someone post a link if they know where it is?

cheers,
Blake
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 p

2009-01-19 Thread Blake
Can you share your hardware configuration?

cheers,
Blake

On Mon, Jan 19, 2009 at 11:56 PM, Brad Hill  wrote:
> Greetings!
>
> I lost one out of five disks on a machine with a raidz1 and I'm not sure 
> exactly how to recover from it. The pool is marked as FAULTED which I 
> certainly wasn't expecting with only one bum disk.
>
> r...@blitz:/# zpool status -v tank
>  pool: tank
>  state: FAULTED
> status: One or more devices could not be opened.  There are insufficient
>replicas for the pool to continue functioning.
> action: Attach the missing device and online it using 'zpool online'.
>   see: http://www.sun.com/msg/ZFS-8000-3C
>  scrub: none requested
> config:
>
>NAMESTATE READ WRITE CKSUM
>tankFAULTED  0 0 1  corrupted data
>  raidz1DEGRADED 0 0 6
>c6t0d0  ONLINE   0 0 0
>c6t1d0  ONLINE   0 0 0
>c6t2d0  ONLINE   0 0 0
>c6t3d0  UNAVAIL  0 0 0  cannot open
>c6t4d0  ONLINE   0 0 0
>
>
> Any recovery guidance I may gain from the esteemed experts of this group 
> would be extremely appreciated. I recently migrated to OpenSolaris + ZFS on 
> the impassioned advice of a coworker and will lose some data that has been 
> modified since the move but not yet backed up.
>
> Many thanks in advance...
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs null pointer deref,

2009-01-19 Thread Anton B. Rang
If you've got enough space on /var, and you had a dump partition configured, 
you should find a bunch of "vmcore.[n]" files in /var/crash by now.  The system 
normally dumps the kernel core into the dump partition (which can be the swap 
partition) and then copies it into /var/crash on the next successful reboot.

There's likely also a stack printed at the time of the crash; that might be 
enough for the ZFS developers to determine if this is a known (or even fixed) 
bug. It's also retrievable from the core. If it's not a known bug, or if more 
data is needed, the developers might want a copy of the core.
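
If a dump did land there, pulling the panic stack back out of it is quick
work with mdb. A sketch, assuming the default savecore directory and dump
index 0 (the hostname directory and the index are placeholders):

dumpadm                                    # confirm dump device and savecore dir
cd /var/crash/`hostname`
echo "::status" | mdb unix.0 vmcore.0      # panic string and dump summary
echo "::stack"  | mdb unix.0 vmcore.0      # stack trace of the panicking thread
echo "::msgbuf" | mdb unix.0 vmcore.0      # console messages leading up to it
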
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs panic

2009-01-19 Thread Anton B. Rang
Looks like a corrupted pool -- you appear to have a mirror block pointer with 
no valid children. From the dump, you could probably determine which file is 
bad, but I doubt you could delete it; you might need to recreate your pool.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Raidz1 p

2009-01-19 Thread Brad Hill
Greetings!

I lost one out of five disks on a machine with a raidz1 and I'm not sure 
exactly how to recover from it. The pool is marked as FAULTED which I certainly 
wasn't expecting with only one bum disk. 

r...@blitz:/# zpool status -v tank
  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankFAULTED  0 0 1  corrupted data
  raidz1DEGRADED 0 0 6
c6t0d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c6t3d0  UNAVAIL  0 0 0  cannot open
c6t4d0  ONLINE   0 0 0
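
For reference, the action line above maps onto commands along these lines
once the missing device shows up again (the device name is taken from the
status output; the replace form assumes a new drive was physically swapped
in at the same target):

zpool online tank c6t3d0      # if the original disk comes back
zpool replace tank c6t3d0     # if a new disk was installed in its place
zpool status tank             # then watch the resilver complete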


Any recovery guidance I may gain from the esteemed experts of this group would 
be extremely appreciated. I recently migrated to OpenSolaris + ZFS on the 
impassioned advice of a coworker and will lose some data that has been modified 
since the move but not yet backed up.

Many thanks in advance...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Eric D. Mudama
On Mon, Jan 19 at 23:14, Greg Mason wrote:
>So, what we're looking for is a way to improve performance, without  
>disabling the ZIL, as it's my understanding that disabling the ZIL  
>isn't exactly a safe thing to do.
>
>We're looking for the best way to improve performance, without  
>sacrificing too much of the safety of the data.
>
>The current solution we are considering is disabling the cache  
>flushing (as per a previous response in this thread), and adding one  
>or two SSD log devices, as this is similar to the Sun storage  
>appliances based on the Thor. Thoughts?

In general principles, the evil tuning guide states that the ZIL
should be able to handle 10 seconds of expected synchronous write
workload.

To me, this implies that it's improving burst behavior, but potentially at
the expense of sustained throughput, such as would be measured in
benchmark-style runs.

If you have a big JBOD array with say 8+ mirror vdevs on multiple
controllers, then in theory each vdev can commit 60-80 MB/s to disk.
Unless you are attaching a separate ZIL device that can match the
aggregate throughput of that pool, wouldn't it just be better to have
the default behavior of the ZIL contents being inside the pool itself?

The best practices guide states that the max ZIL device size should be
roughly 50% of main system memory, because that's approximately the
most data that can be in-flight at any given instant.

"For a target throughput of X MB/sec and given that ZFS pushes
transaction groups every 5 seconds (and have 2 outstanding), we also
expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service
100MB/sec of synchronous writes, 1 GBytes of log device should be
sufficient."
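
Putting numbers on that rule using the JBOD example above, and assuming
(worst case) that all of it is synchronous write traffic:

  8 mirror vdevs x ~70 MB/s  ~=  560 MB/s of aggregate write bandwidth
  560 MB/s x 10 s            ~=  5.6 GB of log device capacity

Capacity is clearly the easy part; the open question is the one below,
namely whether the log device can also sustain write bandwidth in that
neighborhood rather than becoming the bottleneck.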

But, no comments are made on the performance requirements of the ZIL
device(s) relative to the main pool devices.  Clicking around finds
this entry:

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

...which appears to describe cases where a significant number of slog
devices were required to match the bandwidth of simply leaving the ZIL
inside the main pool.


--eric


-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
>
> Good idea.  Thor has a CF slot, too, if you can find a high speed
> CF card.
> -- richard

We're already using the CF slot for the OS. We haven't really found  
any CF cards that would be fast enough anyway :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Bob Friesenhahn
On Mon, 19 Jan 2009, Greg Mason wrote:

> The current solution we are considering is disabling the cache
> flushing (as per a previous response in this thread), and adding one
> or two SSD log devices, as this is similar to the Sun storage
> appliances based on the Thor. Thoughts?

You need to add some sort of fast non-volatile cache.

The Sun storage appliances are actually using battery backed DRAM for 
their write caches.  This sort of hardware is quite rare.

Fast SSD log devices are apparently pretty expensive.  Some of the 
ones for sale are actually pretty slow.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Richard Elling
Greg Mason wrote:
> So, what we're looking for is a way to improve performance, without 
> disabling the ZIL, as it's my understanding that disabling the ZIL isn't 
> exactly a safe thing to do.
> 
> We're looking for the best way to improve performance, without 
> sacrificing too much of the safety of the data.
> 
> The current solution we are considering is disabling the cache flushing 
> (as per a previous response in this thread), and adding one or two SSD 
> log devices, as this is similar to the Sun storage appliances based on 
> the Thor. Thoughts?

Good idea.  Thor has a CF slot, too, if you can find a high speed
CF card.
  -- richard


> -Greg
> 
> On Jan 19, 2009, at 6:24 PM, Richard Elling wrote:
>>>
>>> We took a rough stab in the dark, and started to examine whether or 
>>> not it was the ZIL.
>>
>> It is. I've recently added some clarification to this section in the
>> Evil Tuning Guide which might help you to arrive at a better solution.
>> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
>>  
>>
>> Feedback is welcome.
>> -- richard
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
So, what we're looking for is a way to improve performance, without  
disabling the ZIL, as it's my understanding that disabling the ZIL  
isn't exactly a safe thing to do.

We're looking for the best way to improve performance, without  
sacrificing too much of the safety of the data.

The current solution we are considering is disabling the cache  
flushing (as per a previous response in this thread), and adding one  
or two SSD log devices, as this is similar to the Sun storage  
appliances based on the Thor. Thoughts?

-Greg

On Jan 19, 2009, at 6:24 PM, Richard Elling wrote:
>>
>> We took a rough stab in the dark, and started to examine whether or  
>> not it was the ZIL.
>
> It is. I've recently added some clarification to this section in the
> Evil Tuning Guide which might help you to arrive at a better solution.
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
> Feedback is welcome.
> -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS and zfs

2009-01-19 Thread Louis Hoefler
I switched to the CIFS file-sharing service.

All I needed to do was disable the samba, wins, and swat services, then start 
the smb/server service.

I followed the CIFS administration guide. Almost everything worked without 
problems.
The only problem I hit was a WINS resolution error: if I tried to access 
\\hostname I got an error message that Windows was not able to log on. I tried 
the IP address instead of the hostname, suffixed with the share (the -r 
resource name) I had just created, which worked.
I restarted the smb/server service again. No change.

I restarted the machine, and now everything works without problems.
Somehow there were problems with Windows name resolution until the restart. 
Maybe that's because I used the WINS daemon before?
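
The switch-over amounts to roughly the following; the service FMRIs, the
workgroup, and the share name here are illustrative and may differ per build,
so treat this as a sketch rather than the guide's exact commands:

svcadm disable network/samba network/wins network/swat   # stop Samba-based sharing
svcadm enable -r smb/server                              # start the kernel CIFS server
smbadm join -w WORKGROUP                                 # join a workgroup
zfs set sharesmb=name=myshare tank/data                  # share a dataset over CIFS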

Greets Louis Hoefler.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
> And again, I say take a look at the market today, figure out a percentage,
> and call it done.  I don't think you'll find a lot of users crying foul over
> losing 1% of their drive space when they don't already cry foul over the
> false advertising that is drive sizes today.

Perhaps it's quaint, but 5GB still seems like a lot to me to throw away.

> In any case, you might as well can ZFS entirely because it's not really fair
> that users are losing disk space to raid and metadata... see where this
> argument is going?

Well, I see where this _specious_ argument is going.

> I have two disks in one of my systems... both maxtor 500GB drives, purchased
> at the same time shortly after the buyout.  One is a rebadged Seagate, one
> is a true, made in China Maxtor.  Different block numbers... same model
> drive, purchased at the same time.
> 
> Wasn't zfs supposed to be about using software to make up for deficiencies
> in hardware?  It would seem this request is exactly that...

That's a fair point, and I do encourage you to file an RFE, but a) Sun has
already solved this problem in a different way as a company with our products
and b) users already have the ability to right-size drives.

Perhaps a better solution would be to handle the procedure of replacing a disk
with a slightly smaller one by migrating data and then treating the extant
disks as slightly smaller as well. This would have the advantage of being far
more dynamic and of only applying the space tax in situations where it actually
applies.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Richard Elling
Greg Mason wrote:
> We're running into a performance problem with ZFS over NFS. When working 
> with many small files (i.e. unpacking a tar file with source code), a 
> Thor (over NFS) is about 4 times slower than our aging existing storage 
> solution, which isn't exactly speedy to begin with (17 minutes versus 3 
> minutes).
> 
> We took a rough stab in the dark, and started to examine whether or not 
> it was the ZIL.

It is. I've recently added some clarification to this section in the
Evil Tuning Guide which might help you to arrive at a better solution.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
Feedback is welcome.
  -- richard

> Performing IO tests locally on the Thor shows no real IO problems, but 
> running IO tests over NFS, specifically, with many smaller files we see 
> a significant performance hit.
> 
> Just to rule in or out the ZIL as a factor, we disabled it, and ran the 
> test again. It completed in just under a minute, around 3 times faster 
> than our existing storage. This was more like it!
> 
> Are there any tunables for the ZIL to try to speed things up? Or would 
> it be best to look into using a high-speed SSD for the log device?
> 
> And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
> We do, however, need to provide our users with a certain level of 
> performance, and what we've got with the ZIL on the pool is completely 
> unacceptable.
> 
> Thanks for any pointers you may have...
> 
> --
> 
> Greg Mason
> Systems Administrator
> Michigan State University
> High Performance Computing Center
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 2:55 PM, Adam Leventhal  wrote:

> Drive vendors, it would seem, have an incentive to make their "500GB"
> drives
> as small as possible. Should ZFS then choose some amount of padding at the
> end of each device and chop it off as insurance against a slightly smaller
> drive? How much of the device should it chop off? Conversely, should users
> have the option to use the full extent of the drives they've paid for, say,
> if they're using a vendor that already provides that guarantee?


Drive vendors, it would seem, have an incentive to make their 500GB drives as
cheap as possible.  The two are not necessarily one and the same.

And again, I say take a look at the market today, figure out a percentage,
and call it done.  I don't think you'll find a lot of users crying foul over
losing 1% of their drive space when they don't already cry foul over the
false advertising that is drive sizes today.

In any case, you might as well can ZFS entirely because it's not really fair
that users are losing disk space to raid and metadata... see where this
argument is going?

I really, REALLY doubt you're going to have users screaming at you for
losing 1% (or whatever the figure ends up being) to a right-sizing
algorithm.  In fact, I would bet the average user will NEVER notice if you
don't tell them ahead of time.  Sort of like the average user had absolutely
no clue that 500GB drives were of slightly differing block numbers, and he'd
end up screwed six months down the road if he couldn't source an identical
drive.

I have two disks in one of my systems... both maxtor 500GB drives, purchased
at the same time shortly after the buyout.  One is a rebadged Seagate, one
is a true, made in China Maxtor.  Different block numbers... same model
drive, purchased at the same time.

Wasn't zfs supposed to be about using software to make up for deficiencies
in hardware?  It would seem this request is exactly that...



>
>
> > You know, sort of like you not letting people choose their raid layout...
>
> Yes, I'm not saying it shouldn't be done. I'm asking what the right answer
> might be.


The *right answer* in simplifying storage is not "manually slice up every
disk you insert into the system to avoid this issue".

The right answer is "right-size by default, give admins the option to skip
it if they really want".  Sort of like I'd argue the right answer on the
7000 is to give users the raid options you do today by default, and allow
them to lay it out themselves from some sort of advanced *at your own risk*
mode, whether that be command line (the best place I'd argue) or something
else.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Recovery after SAN Corruption

2009-01-19 Thread Matthew Angelo
Hello,

We recently had SAN corruption (a hard power outage), and we lost a few
transactions that were waiting to be written to real disk. The end result as
we all know is CKSUM errors on the zpool from a scrub, and we also had a few
corrupted files reported by ZFS.

My question is, what is the proper way to recover from this?  Create a new
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Nicholas Lee
Another option to look at is:
set zfs:zfs_nocacheflush=1
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

The best option is to get a fast ZIL log device.


It depends on your pool as well. NFS+ZFS means ZFS will wait for writes to
complete before acknowledging synchronous NFS write ops.  If you have a RAIDZ
pool, writes will be slower than with a RAID10-style pool.
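
A rough sketch of both options (device names are placeholders; the tunable is
only appropriate if every device in the pool has a non-volatile or
battery-backed write cache, per the Evil Tuning Guide):

# add to /etc/system -- takes effect after a reboot
set zfs:zfs_nocacheflush = 1

# or add a dedicated log device (slog) to the existing pool
zpool add tank log c3t0d0
zpool add tank log mirror c3t0d0 c4t0d0    # mirrored slog variant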


On Tue, Jan 20, 2009 at 11:08 AM, Greg Mason  wrote:

> We're running into a performance problem with ZFS over NFS. When working
> with many small files (i.e. unpacking a tar file with source code), a
> Thor (over NFS) is about 4 times slower than our aging existing storage
> solution, which isn't exactly speedy to begin with (17 minutes versus 3
> minutes).
>
> We took a rough stab in the dark, and started to examine whether or not
> it was the ZIL.
>
> Performing IO tests locally on the Thor shows no real IO problems, but
> running IO tests over NFS, specifically, with many smaller files we see
> a significant performance hit.
>
> Just to rule in or out the ZIL as a factor, we disabled it, and ran the
> test again. It completed in just under a minute, around 3 times faster
> than our existing storage. This was more like it!
>
> Are there any tunables for the ZIL to try to speed things up? Or would
> it be best to look into using a high-speed SSD for the log device?
>
> And, yes, I already know that turning off the ZIL is a Really Bad Idea.
> We do, however, need to provide our users with a certain level of
> performance, and what we've got with the ZIL on the pool is completely
> unacceptable.
>
> Thanks for any pointers you may have...
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
We're running into a performance problem with ZFS over NFS. When working 
with many small files (i.e. unpacking a tar file with source code), a 
Thor (over NFS) is about 4 times slower than our aging existing storage 
solution, which isn't exactly speedy to begin with (17 minutes versus 3 
minutes).

We took a rough stab in the dark, and started to examine whether or not 
it was the ZIL.

Performing IO tests locally on the Thor shows no real IO problems, but 
running IO tests over NFS, specifically, with many smaller files we see 
a significant performance hit.

Just to rule in or out the ZIL as a factor, we disabled it, and ran the 
test again. It completed in just under a minute, around 3 times faster 
than our existing storage. This was more like it!
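
For anyone reproducing that A/B test, the usual (unsafe, test-only) ZIL
toggle of this era looks like the following; a sketch, not a recommendation,
and note that filesystems need to be remounted for the change to take effect:

# /etc/system form -- requires a reboot
set zfs:zil_disable = 1

# or flip it on the running kernel (reverts at next boot)
echo zil_disable/W0t1 | mdb -kw    # disable
echo zil_disable/W0t0 | mdb -kw    # re-enable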

Are there any tunables for the ZIL to try to speed things up? Or would 
it be best to look into using a high-speed SSD for the log device?

And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
We do, however, need to provide our users with a certain level of 
performance, and what we've got with the ZIL on the pool is completely 
unacceptable.

Thanks for any pointers you may have...

--

Greg Mason
Systems Administrator
Michigan State University
High Performance Computing Center
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Richard Elling
Tim wrote:
> On Mon, Jan 19, 2009 at 1:12 PM, Bob Friesenhahn
> <bfrie...@simple.dallas.tx.us> wrote:
> 
> On Mon, 19 Jan 2009, Adam Leventhal wrote:
> 
> 
> Are you telling me zfs is deficient to the point it can't handle basic
> right-sizing like a 15$ sata raid adapter?
> 
> How do these $15 sata raid adapters solve the problem? The more details you
> could provide the better obviously.

Note that for the LSI RAID controllers Sun uses on many products,
if you take a disk that was JBOD and tell the controller to make
it RAIDed, then the controller will relabel the disk for you and
will cause you to lose the data.  As best I can tell, ZFS is better
in that it will protect your data rather than just relabeling and
clobbering your data.  AFAIK, NVidia and others do likewise.

> It is really quite simple.  If the disk is resilvered but the new
> drive is a bit too small, then the RAID card might tell you that a
> bit of data might have lost in the last sectors, or it may just
> assume that you didn't need that data, or maybe a bit of cryptic
> message text scrolls off the screen a split second after it has been
> issued.  Or if you try to write at the end of the volume and one of
> the replacement drives is a bit too short, then the RAID card may
> return a hard read or write error.  Most filesystems won't try to
> use that last bit of space anyway since they run real slow when the
> disk is completely full, or their flimsy formatting algorithm always
> wastes a bit of the end of the disk.  Only ZFS is rash enough to use
> all of the space provided to it, and actually expect that the space
> continues to be usable.
> 
> 
> 
> It's a horribly *bad thing* to not use the entire disk and right-size it 
> for sanity's sake.  That's why Sun currently sells arrays that do JUST 
> THAT.  

??

> I'd wager fishworks does just that as well.  Why don't you open source 
> that code and prove me wrong ;)

I don't think so, because fishworks is an engineering team and I
don't think I can reserve space on a person... at least not legally
where I live :-)

But this is not a problem for the Sun Storage 7000 systems because
the supported disks are already "right-sized."

> I'm wondering why they don't come right out with it and say "we want to 
> intentionally make this painful to our end users so that they buy our 
> packaged products".  It'd be far more honest and productive than this 
> pissing match.

I think that if there is enough real desire for this feature,
then someone would file an RFE on http://bugs.opensolaris.org
It would help to attach diffs to the bug and it would help to
reach a consensus on the amount of space to be reserved prior
to filing.  This is not an intractable problem and easy workarounds
already exist, but if ease of use is more valuable than squeezing
every last block, then the RFE should fly.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
On Mon, Jan 19, 2009 at 01:35:22PM -0600, Tim wrote:
> > > Are you telling me zfs is deficient to the point it can't handle basic
> > > right-sizing like a 15$ sata raid adapter?
> >
> > How do these $15 sata raid adapters solve the problem? The more details you
> > could provide the better obviously.
> 
> They short stroke the disk so that when you buy a new 500GB drive that isn't
> the exact same number of blocks you aren't screwed.  It's a design choice to
> be both sane, and to make the end-users life easier.  You know, sort of like
> you not letting people choose their raid layout...

Drive vendors, it would seem, have an incentive to make their "500GB" drives
as small as possible. Should ZFS then choose some amount of padding at the
end of each device and chop it off as insurance against a slightly smaller
drive? How much of the device should it chop off? Conversely, should users
have the option to use the full extent of the drives they've paid for, say,
if they're using a vendor that already provides that guarantee?

> You know, sort of like you not letting people choose their raid layout...

Yes, I'm not saying it shouldn't be done. I'm asking what the right answer
might be.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Feature Request Discussion (Was: Understanding ZFS replication)

2009-01-19 Thread Timothy Renner
So personally I find ZFS to be fantastic, it's only missing three 
features from my ideal filesystem:
1) The ability to easily recover the portions of a filesystem that are 
still intact after a catastrophic failure (It looks like zfs scrub can 
do this as long as a damaged pool could be imported, so this is almost 
there, or it's hackable at the moment if a bit of drive information has 
been kept around)

2) The ability to push the data off a device and safely remove it from a 
non-mirrored pool (Marked as a future feature)

3) File system level mirroring across devices, rather than device level 
mirroring, so...  To raise this issue for discussion, pros/cons/not 
worth the effort, ideas:

It would be fantastic if ZFS could support another option for copies 
that **guarantees** that it writes copies to different devices, and if 
it cannot (due to free space constraints or failing/failed device), 
write to the same device, but raise an error/warning that could be 
checked in zpool status or similar fashion (similar to a RAID5 losing a 
disk...  It's workable, simply degraded)

zpool scrub strikes me as the perfect tool to attempt to enforce the 
copies=X attribute, as a way to bring the entire filesystem into line 
with the current settings and ensure that old data meets the 
requirement, rather than only affecting new data.
An issue I immediately see here would involve possibly needing to move 
data from one disk to another in order to free up space for replication 
across devices, which is likely non-trivial.

-Tim

Miles Nordin wrote:
>> "tr" == Timothy Renner  writes:
>> 
>
> tr> zfs set copies=2 zfspool/test2
>
> 'copies=2' says things will be written twice, but regardless of
> discussion about where the two copies are written, copies=2 says
> nothing at all about being able to *read back* your data if one of the
> copies disappears.  It only promises that the two copies will be
> written.  This does you no good at all if you can't import the pool,
> which is probably what will happen to anyone who has relied on
> copies=2 for redundancy.
>
> The discussion about *where* the copies tend to be written is really
> impractical and distracting, IMO.
>
> The chance that the copies won't be written to separate vdev's is not
> where the problem comes from.  You can't import a pool unless it has
> enough redundancy at vdev-level to get all your data, so copies=2
> doesn't add much.  The best copies=2 will do is give you a slightly
> better shot at evacuating the data from a slowly-failing drive.  If
> anyone at all should be using it, certainly I don't think someone with
> more than one drive should be using it.
>   
> 
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS tale of woe and fail

2009-01-19 Thread Miles Nordin
> "b" == Blake   writes:

 b> removing the zfs cache file located at /etc/zfs/zpool.cache
 b> might be an emergency workaround?

just the opposite.  There seem to be fewer checks blocking the
autoimport of pools listed in zpool.cache than on 'zpool import'
manual imports.  I'd expect the reverse, for some forcible 'zpool
import' to accept pools that don't autoimport, but at least Ross found
zpool.cache could auto-import a pool with a missing slog, while 'zpool
import' tells you to recreate from backup.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 1:12 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Mon, 19 Jan 2009, Adam Leventhal wrote:
>
>>
>>  Are you telling me zfs is deficient to the point it can't handle basic
>>> right-sizing like a 15$ sata raid adapter?
>>>
>>
>> How do these $15 sata raid adapters solve the problem? The more details
>> you could provide the better obviously.
>>
>
> It is really quite simple.  If the disk is resilvered but the new drive is
> a bit too small, then the RAID card might tell you that a bit of data might
> have lost in the last sectors, or it may just assume that you didn't need
> that data, or maybe a bit of cryptic message text scrolls off the screen a
> split second after it has been issued.  Or if you try to write at the end of
> the volume and one of the replacement drives is a bit too short, then the
> RAID card may return a hard read or write error.  Most filesystems won't try
> to use that last bit of space anyway since they run real slow when the disk
> is completely full, or their flimsy formatting algorithm always wastes a bit
> of the end of the disk.  Only ZFS is rash enough to use all of the space
> provided to it, and actually expect that the space continues to be usable.
>
>

It's a horribly *bad thing* to not use the entire disk and right-size it for
sanity's sake.  That's why Sun currently sells arrays that do JUST THAT.

I'd wager fishworks does just that as well.  Why don't you open source that
code and prove me wrong ;)

I'm wondering why they don't come right out with it and say "we want to
intentionally make this painful to our end users so that they buy our
packaged products".  It'd be far more honest and productive than this
pissing match.


--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 12:39 PM, Adam Leventhal  wrote:

>
> Sorry, I must have missed your point. I thought that you were saying that
> HDS, NetApp, and EMC had a different model. Were you merely saying that the
> software in those vendors' products operates differently than ZFS?
>

Gosh, was the point that hard to get?  Let me state it a fourth time:  They
all short stroke the disks to avoid the CF that results in all drives not
adhering to a strict sizing standard.



>
> > Are you telling me zfs is deficient to the point it can't handle basic
> > right-sizing like a 15$ sata raid adapter?
>
> How do these $15 sata raid adapters solve the problem? The more details you
> could provide the better obviously.
>

They short stroke the disk so that when you buy a new 500GB drive that isn't
the exact same number of blocks you aren't screwed.  It's a design choice to
be both sane, and to make the end-users life easier.  You know, sort of like
you not letting people choose their raid layout...
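
As a quick way to see that mismatch on Solaris before a replace, iostat -En
reports the exact byte capacity of every drive, so two "identical" 500GB
disks can be compared directly (a check, not a fix):

iostat -En | egrep 'Errors:|Size:'    # the Size: line shows exact bytes per device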

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Blake
I think this is probably true, and I suspect that Sun is also
targeting media warehousing shops like some of the big social
networking/video sites, where storage is coming online too fast to
make manual tuning a sensible thing to do.  Look at many enterprise
storage graphs showing bytes on the x and time on the y axis, and you
see a very scary picture for an admin - unless you can slap storage
into a rack with minimal setup time.

Plus, one can yell at the 7000-series when the stress gets to be too
much: www.youtube.com/watch?v=tDacjrSCeq4

From another point of view, making ZFS as friendly as possible to
'regular' users - those who *do* buy drives from Fry's - will
certainly help drive adoption.  Some of these people become the buyers
in IT departments later.  Lots of Linux tools, while horrendous to
administer (LVM), have worked nicely with junky hardware for a long
time.  So now, we have Linux even in places where it may not make the
most sense.  My last enterprise job had many terabytes of data sitting
on LVM.  I'm glad I wasn't the admin for that nightmare.

cheers,
Blake

On Mon, Jan 19, 2009 at 2:00 PM, Bob Friesenhahn
 wrote:

>
> It seems likely that Sun discovered that raw Solaris and system
> configuration is too difficult for many "Windows" shops to grasp so
> they introduced a simplified appliance product line which is
> engineered entirely by Sun, and with a simplified administration
> interface which does not require a lot of training to understand.
>
> Since the product is entirely engineered by Sun, they can ensure that
> the provided disk drives and configuration are carefully matched
> (based on testing and analysis) in order to offer the best
> price/performance ratio.
>
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Understanding ZFS replication

2009-01-19 Thread Miles Nordin
> "tr" == Timothy Renner  writes:

tr> zfs set copies=2 zfspool/test2

'copies=2' says things will be written twice, but regardless of
discussion about where the two copies are written, copies=2 says
nothing at all about being able to *read back* your data if one of the
copies disappears.  It only promises that the two copies will be
written.  This does you no good at all if you can't import the pool,
which is probably what will happen to anyone who has relied on
copies=2 for redundancy.

The discussion about *where* the copies tend to be written is really
impractical and distracting, IMO.

The chance that the copies won't be written to separate vdev's is not
where the problem comes from.  You can't import a pool unless it has
enough redundancy at vdev-level to get all your data, so copies=2
doesn't add much.  The best copies=2 will do is give you a slightly
better shot at evacuating the data from a slowly-failing drive.  If
anyone at all should be using it, certainly I don't think someone with
more than one drive should be using it.
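
A minimal contrast, with placeholder device names: redundancy at the vdev
level is what keeps a pool importable after losing a disk; the copies
property only adds ditto blocks within whatever vdevs already exist:

zpool create tank mirror c1t0d0 c1t1d0    # survives the loss of either disk

zpool create tank2 c2t0d0 c2t1d0          # striped pool: no vdev redundancy
zfs create tank2/data
zfs set copies=2 tank2/data               # extra copies, but lose one disk and
                                          # the pool still won't import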


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Bob Friesenhahn
On Mon, 19 Jan 2009, Adam Leventhal wrote:
>
>> Are you telling me zfs is deficient to the point it can't handle basic
>> right-sizing like a 15$ sata raid adapter?
>
> How do these $15 sata raid adapters solve the problem? The more details you
> could provide the better obviously.

It is really quite simple.  If the disk is resilvered but the new 
drive is a bit too small, then the RAID card might tell you that a bit 
of data might have lost in the last sectors, or it may just assume 
that you didn't need that data, or maybe a bit of cryptic message text 
scrolls off the screen a split second after it has been issued.  Or if 
you try to write at the end of the volume and one of the replacement 
drives is a bit too short, then the RAID card may return a hard read 
or write error.  Most filesystems won't try to use that last bit of 
space anyway since they run real slow when the disk is completely 
full, or their flimsy formatting algorithm always wastes a bit of the 
end of the disk.  Only ZFS is rash enough to use all of the space 
provided to it, and actually expect that the space continues to be 
usable.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Bob Friesenhahn
On Mon, 19 Jan 2009, Tim wrote:

> Remember that one time when I talked about limiting snapshots to protect a
> user from themselves, and you joined into the fray of people calling me a
> troll?  Can you feel the irony oozing out between your lips, or are you
> completely oblivious to it?

Tim,

I admire that you are able to keep your trolls on topic, quite unlike 
JZ.  This is a clear indication that you are not yet qualified for the 
rubber room.

If you don't like the simplified nature of the Sun Unified Storage 
7000 products, you always have the option of building your own system 
based on your preferred hardware, and OpenSolaris (or Linux), just as 
you did before.  The mere existence of these products should not annoy 
you.

It seems likely that Sun discovered that raw Solaris and system 
configuration is too difficult for many "Windows" shops to grasp so 
they introduced a simplified appliance product line which is 
engineered entirely by Sun, and with a simplified administration 
interface which does not require a lot of training to understand.

Since the product is entirely engineered by Sun, they can ensure that 
the provided disk drives and configuration are carefully matched 
(based on testing and analysis) in order to offer the best 
price/performance ratio.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS tale of woe and fail

2009-01-19 Thread Blake
Miles,
that's correct - I got muddled in the details of the thread.

I'm not necessarily suggesting this, but is this an occasion when
removing the zfs cache file located at /etc/zfs/zpool.cache might be
an emergency workaround?

Tom, please don't try this until someone more expert replies to my question.

cheers,
Blake


On Mon, Jan 19, 2009 at 1:43 PM, Miles Nordin  wrote:
>
> b> You can get a sort of redundancy by creating multiple
> b> filesystems with 'copies' enabled on the ones that need some
> b> sort of self-healing in case of bad blocks.
>
> Won't work here.  The pool won't import at all.  The type of bad block
> fixing you're talking about applies to cases where the pool imports,
> but 'zpool status' reports files with bad blocks in them.
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Adam Leventhal
> BWAHAHAHAHA.  That's a good one.  "You don't need to setup your raid, that's
> micro-managing, we'll do that."
> 
> Remember that one time when I talked about limiting snapshots to protect a
> user from themselves, and you joined into the fray of people calling me a
> troll?

I don't remember this, but I don't doubt it.

> Can you feel the irony oozing out between your lips, or are you
> completely oblivious to it?

The irony would be that on one hand I object to artificial limitations to
business-critical features while on the other hand I think that users don't
need to tweak settings that add complexity and little to no value? They seem
very different to me, so I suppose the answer to your question is: no I cannot
feel the irony oozing out between my lips, and yes I'm oblivious to the same.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
> > > Since it's done in software by HDS, NetApp, and EMC, that's complete
> > > bullshit.  Forcing people to spend 3x the money for a "Sun" drive that's
> > > identical to the seagate OEM version is also bullshit and a piss-poor
> > > answer.
> >
> > I didn't know that HDS, NetApp, and EMC all allow users to replace their
> > drives with stuff they've bought at Fry's. Is this still covered by their
> > service plan or would this only be in an unsupported config?
> 
> So because an enterprise vendor requires you to use their drives in their
> array, suddenly zfs can't right-size?  Vendor requirements have absolutely
> nothing to do with their right-sizing, and everything to do with them
> wanting your money.

Sorry, I must have missed your point. I thought that you were saying that
HDS, NetApp, and EMC had a different model. Were you merely saying that the
software in those vendors' products operates differently than ZFS?

> Are you telling me zfs is deficient to the point it can't handle basic
> right-sizing like a 15$ sata raid adapter?

How do these $15 sata raid adapters solve the problem? The more details you
could provide the better obviously.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Julien Gabel
>> Creating a slice, instead of using the whole disk, will cause ZFS to
>> not enable write-caching on the underlying device.

> Correct.  Engineering trade-off.  Since most folks don't read the manual,
> or the best practices guide, until after they've hit a problem, it is really
> just a CYA entry :-(

It seems this trade-off can now be mitigated, given Roch Bourbonnais's
comment on another thread on this list:
- http://mail.opensolaris.org/pipermail/zfs-discuss/2009-January/054587.html

In particular:
" If ZFS owns a disk it will enable the write cache on the drive but I'm
  not positive this has a great performance impact today.  It used to
  but that was before we had a proper NCQ implementation.  Today
  I don't know that it helps much.  That this is because we always
  flush the cache when consistency requires it."

-- 
julien.
http://blog.thilelli.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS tale of woe and fail

2009-01-19 Thread Miles Nordin
> "nk" == Nathan Kroenert  writes:
> "b" == Blake   writes:

nk> I'm not sure how you can class it a ZFS fail when the Disk
nk> subsystem has failed...

The disk subsystem did not fail and lose all its contents.  It just
rebooted a few times.

 b> You can get a sort of redundancy by creating multiple
 b> filesystems with 'copies' enabled on the ones that need some
 b> sort of self-healing in case of bad blocks.

Won't work here.  The pool won't import at all.  The type of bad block
fixing you're talking about applies to cases where the pool imports,
but 'zpool status' reports files with bad blocks in them.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 11:02 AM, Adam Leventhal  wrote:

> > "The recommended number of disks per group is between 3 and 9. If you
> have
> > more disks, use multiple groups."
> >
> > Odd that the Sun Unified Storage 7000 products do not allow you to
> control
> > this, it appears to put all the hdd's into one group.  At least on the
> 7110
> > we are evaluating there is no control to allow multiple groups/different
> > raid types.
>
> Our experience has shown that that initial guess of 3-9 per parity device
> was
> surprisingly narrow. We see similar performance out to much wider stripes
> which, of course, offer the user more usable capacity.
>
> We don't allow you to manually set the RAID stripe widths on the 7000
> series
> boxes because frankly the stripe width is an implementation detail. If you
> want the best performance, choose mirroring; capacity, double-parity RAID;
> for something in the middle, we offer 3+1 single-parity RAID. Other than
> that you're micro-optimizing for gains that would hardly be measurable
> given
> the architecture of the Hybrid Storage Pool. Recall that unlike other
> products in the same space, we get our IOPS from flash rather than from
> a bazillion spindles spinning at 15,000 RPM.
>
> Adam
>


BWAHAHAHAHA.  That's a good one.  "You don't need to setup your raid, that's
micro-managing, we'll do that."

Remember that one time when I talked about limiting snapshots to protect a
user from themselves, and you joined into the fray of people calling me a
troll?  Can you feel the irony oozing out between your lips, or are you
completely oblivious to it?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 11:05 AM, Adam Leventhal  wrote:

> > Since it's done in software by HDS, NetApp, and EMC, that's complete
> > bullshit.  Forcing people to spend 3x the money for a "Sun" drive that's
> > identical to the seagate OEM version is also bullshit and a piss-poor
> > answer.
>
> I didn't know that HDS, NetApp, and EMC all allow users to replace their
> drives with stuff they've bought at Fry's. Is this still covered by their
> service plan or would this only be in an unsupported config?
>
> Thanks.
>
> Adam
>


So because an enterprise vendor requires you to use their drives in their
array, suddenly zfs can't right-size?  Vendor requirements have absolutely
nothing to do with their right-sizing, and everything to do with them
wanting your money.

Are you telling me zfs is deficient to the point it can't handle basic
right-sizing like a 15$ sata raid adapter?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Miles Nordin
> "edm" == Eric D Mudama  writes:

   edm> If, instead of having ZFS manage these differences, a user
   edm> simply created slices that were, say, 98%

if you're willing to manually create slices, you should be able to
manually enable the write cache, too, while you're in there, so I
wouldn't worry about that.  I'd worry a little about the confusion
over this write cache bit in general---where the write cache setting
is stored and when it's enabled and when (if?) it's disabled, if the
rules differ on each type of disk attachment, and if you plug the disk
into Linux will Linux screw up the setting by auto-enabling at boot or
by auto-disabling at shutdown or does Linux use stateless versions
(analagous to sdparm without --save) when it prints that boot-time
message about enabling write caches?  For example weirdness, on iSCSI
I get this, on a disk to which I've let ZFS write a GPT/EFI label:

write_cache> display
Write Cache is disabled
write_cache> enable
Write cache setting is not changeable

so is that a bug of my iSCSI target, and is there another implicit
write cache inside the iSCSI initiator or not?  The Linux hdparm man
page says:

   -W Disable/enable  the  IDE  drive's write-caching feature (default
  state is undeterminable; manufacturer/model specific).

so is the write_cache 'display' feature in 'format -e' actually
reliable?  Or is it impossible to reliably read this setting on an ATA
drive, and 'format -e' is making stuff up?

With Linux I can get all kinds of crazy caching data from a SATA disk:

r...@node0 ~ # sdparm --page=ca --long /dev/sda
/dev/sda: ATA   WDC WD1000FYPS-0  02.0
Caching (SBC) [PS=0] mode page:
  IC  0  Initiator control
  ABPF0  Abort pre-fetch
  CAP 0  Caching analysis permitted
  DISC0  Discontinuity
  SIZE0  Size (1->CSS valid, 0->NCS valid)
  WCE 1  Write cache enable
  MF  0  Multiplication factor
  RCD 0  Read cache disable
  DRRP0  Demand read retension priority
  WRP 0  Write retension priority
  DPTL0  Disable pre-fetch transfer length
  MIPF0  Minimum pre-fetch
  MAPF0  Maximum pre-fetch
  MAPFC   0  Maximum pre-fetch ceiling
  FSW 0  Force sequential write
  LBCSS   0  Logical block cache segment size
  DRA 0  Disable read ahead
  NV_DIS  0  Non-volatile cache disable
  NCS 0  Number of cache segments
  CSS 0  Cache segment size

but what's actually coming from the drive, and what's fabricated by
the SCSI-to-SATA translator built into Garzik's libata?  Because I
think Solaris has such a translator, too, if it's attaching sd to SATA
disks.  I'm guessing it's all a fantasy because:

r...@node0 ~ # sdparm --clear=WCE /dev/sda
/dev/sda: ATA   WDC WD1000FYPS-0  02.0
change_mode_page: failed setting page: Caching (SBC)

but neverminding the write cache, I'd be happy saying ``just round
down disk sizes using the labeling tool instead of giving ZFS the
whole disk, if you care,'' IF the following things were true:

 * doing so were written up as a best-practice.  because, I think it's
   a best practice if the rest of the storage industry from EMC to $15
   promise cards is doing it, though maybe it's not important any more
   because of IDEMA.  And right now very few people are likely to have
   done it because of the way they've been guided into the setup process.

 * it were possible to do this label-sizing to bootable mirrors in the
   various traditional/IPS/flar/jumpstart installers

 * there weren't a proliferation of >= 4 labeling tools in Solaris,
   each riddled with assertion bailouts and slightly different
   capabilities.  Linux also has a mess of labeling tools, but they're
   less assertion-riddled, and usually you can pick one and use it for
   everything---you don't have to drag out a different tool for USB
   sticks because they're considered ``removeable.''  Also it's always
   possible to write to the unpartitioned block device with 'dd' on
   Linux (and FreeBSD and Mac OS X), no matter what label is on the
   disk, while Solaris doesn't seem to have an unpartitioned device.
   And finally the Linux formatting tools work by writing to this
   unpartitioned device, not by calling into a rat's nest of ioctl's,
   so they're much easier for me to get along with.

   Part of the attraction of ZFS should be avoiding this messy part of
   Solaris, but we still have to use format/fmthard/fdisk/rmformat, to
   swap label types because ZFS won't, to frob the write cache because
   ZFS's user interface is too simple and does that semi-automatically
   though I'm not sure all the rules it's using, to enumerate the
   installed disks, to determine in which of the several states
   working / connected-but-not-identified / disconnected /
   disconnected-but-refcounted the iSCSI initiator is in.

   And while ZFS will do special things to an UNlabeled disk, I'm not
   sure there i

Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Richard Elling
Jim Dunham wrote:
> Richard,
>   
>> Ross wrote:
>> 
>>> The problem is they might publish these numbers, but we really have  
>>> no way of controlling what number manufacturers will choose to use  
>>> in the future.
>>>
>>> If for some reason future 500GB drives all turn out to be slightly  
>>> smaller than the current ones you're going to be stuck.  Reserving  
>>> 1-2% of space in exchange for greater flexibility in replacing  
>>> drives sounds like a good idea to me.  As others have said, RAID  
>>> controllers have been doing this for long enough that even the very  
>>> basic models do it now, and I don't understand why such simple  
>>> features like this would be left out of ZFS.
>>>
>>>
>>>   
>> I have added the following text to the best practices guide:
>>
>> * When a vdev is replaced, the size of the replacement vdev, measured by
>>   usable sectors, must be the same or greater than the vdev being replaced.
>>   This can be confusing when whole disks are used because different models
>>   of disks may provide a different number of usable sectors. For example,
>>   if a pool was created with a "500 GByte" drive and you need to replace it
>>   with another "500 GByte" drive, then you may not be able to do so if the
>>   drives are not of the same make, model, and firmware revision. Consider
>>   planning ahead and reserving some space by creating a slice which is
>>   smaller than the whole disk instead of the whole disk.
>> 
>
> Creating a slice, instead of using the whole disk, will cause ZFS to  
> not enable write-caching on the underlying device.
>   

Correct.  Engineering trade-off.  Since most folks don't read the manual,
or the best practices guide, until after they've hit a problem, it is
really just a CYA entry :-(
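
For anyone who does go the slice route, the write cache can usually be
turned back on by hand.  A minimal sketch, assuming a hypothetical disk
c6t0d0 and a build where format(1M) expert mode exposes the cache menu:

  # format -e c6t0d0
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> enable
  write_cache> quit

display shows the current state and enable turns the cache back on; only
do this if nothing cache-unaware (a UFS slice, say) shares the disk.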

BTW, I also added a quick link to CR 4852783, reduce pool capacity, which
is the feature which has a good chance of making this point moot.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
> Since it's done in software by HDS, NetApp, and EMC, that's complete
> bullshit.  Forcing people to spend 3x the money for a "Sun" drive that's
> identical to the seagate OEM version is also bullshit and a piss-poor
> answer.

I didn't know that HDS, NetApp, and EMC all allow users to replace their
drives with stuff they've bought at Fry's. Is this still covered by their
service plan or would this only be in an unsupported config?

Thanks.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Adam Leventhal
> "The recommended number of disks per group is between 3 and 9. If you have
> more disks, use multiple groups."
> 
> Odd that the Sun Unified Storage 7000 products do not allow you to control
> this, it appears to put all the hdd's into one group.  At least on the 7110
> we are evaluating there is no control to allow multiple groups/different
> raid types.

Our experience has shown that that initial guess of 3-9 per parity device was
surprisingly narrow. We see similar performance out to much wider stripes
which, of course, offer the user more usable capacity.

We don't allow you to manually set the RAID stripe widths on the 7000 series
boxes because frankly the stripe width is an implementation detail. If you
want the best performance, choose mirroring; for the best capacity,
double-parity RAID; and for something in the middle, we offer 3+1
single-parity RAID. Other than
that you're micro-optimizing for gains that would hardly be measurable given
the architecture of the Hybrid Storage Pool. Recall that unlike other
products in the same space, we get our IOPS from flash rather than from
a bazillion spindles spinning at 15,000 RPM.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Andrew Gabriel
Asif Iqbal wrote:
> On Mon, Jan 19, 2009 at 10:47 AM, Andrew Gabriel  
> wrote:
>> I've seen a webpage (a blog, IIRC) which compares the performance of
>> RAIDZ with differing numbers of disks in each RAIDZ group. I can't now
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> 
> section:
>   RAID-Z Configuration Requirements and Recommendations

Thanks. I had found that, but there is another blog somewhere which 
compared the performance of RAIDZ's built with different values.

>> find this, and can't seem to find the right things to get google to
>> search on. Does anyone recall where this is? ISTR the optimum number of
>> disks was 5-6.

-- 
Cheers
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Thomas J. Kiblin
Andrew,

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations
 

"The recommended number of disks per group is between 3 and 9. If you have more 
disks, use multiple groups."

Odd that the Sun Unified Storage 7000 products do not allow you to control 
this; they appear to put all the HDDs into one group.  At least on the 7110 we 
are evaluating, there is no control to allow multiple groups/different RAID 
types.

Tom


- "Andrew Gabriel"  wrote:

> I've seen a webpage (a blog, IIRC) which compares the performance of 
> RAIDZ with differing numbers of disks in each RAIDZ group. I can't now
> 
> find this, and can't seem to find the right things to get google to 
> search on. Does anyone recall where this is? ISTR the optimum number
> of 
> disks was 5-6.
> 
> -- 
> Cheers
> Andrew
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Jim Dunham
Richard,

> Ross wrote:
>> The problem is they might publish these numbers, but we really have  
>> no way of controlling what number manufacturers will choose to use  
>> in the future.
>>
>> If for some reason future 500GB drives all turn out to be slightly  
>> smaller than the current ones you're going to be stuck.  Reserving  
>> 1-2% of space in exchange for greater flexibility in replacing  
>> drives sounds like a good idea to me.  As others have said, RAID  
>> controllers have been doing this for long enough that even the very  
>> basic models do it now, and I don't understand why such simple  
>> features like this would be left out of ZFS.
>>
>>
>
> I have added the following text to the best practices guide:
>
> * When a vdev is replaced, the size of the replacement vdev, measured
> by usable
> sectors, must be the same or greater than the vdev being replaced.  
> This
> can be
> confusing when whole disks are used because different models of  
> disks may
> provide a different number of usable sectors. For example, if a pool  
> was
> created
> with a "500 GByte" drive and you need to replace it with another "500
> GByte"
> drive, then you may not be able to do so if the drives are not of the
> same make,
> model, and firmware revision. Consider planning ahead and reserving  
> some
> space
> by creating a slice which is smaller than the whole disk instead of  
> the
> whole disk.

Creating a slice, instead of using the whole disk, will cause ZFS to  
not enable write-caching on the underlying device.

- Jim

>
>
>> Fair enough, for high end enterprise kit where you want to squeeze  
>> every byte out of the system (and know you'll be buying Sun  
>> drives), you might not want this, but it would have been trivial to  
>> turn this off for kit like that.  It's certainly a lot easier to  
>> expand a pool than shrink it!
>>
>
> Actually, enterprise customers do not ever want to squeeze every  
> byte, they
> would rather have enough margin to avoid such issues entirely.  This  
> is what
> I was referring to earlier in this thread wrt planning.
> -- richard
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Richard Elling
Ross wrote:
> The problem is they might publish these numbers, but we really have no way of 
> controlling what number manufacturers will choose to use in the future.
>
> If for some reason future 500GB drives all turn out to be slightly smaller 
> than the current ones you're going to be stuck.  Reserving 1-2% of space in 
> exchange for greater flexibility in replacing drives sounds like a good idea 
> to me.  As others have said, RAID controllers have been doing this for long 
> enough that even the very basic models do it now, and I don't understand why 
> such simple features like this would be left out of ZFS.
>
>   

I have added the following text to the best practices guide:

* When a vdev is replaced, the size of the replacement vdev, measured by
usable sectors, must be the same or greater than the vdev being replaced.
This can be confusing when whole disks are used, because different models
of disks may provide a different number of usable sectors. For example, if
a pool was created with a "500 GByte" drive and you need to replace it with
another "500 GByte" drive, then you may not be able to do so if the drives
are not of the same make, model, and firmware revision. Consider planning
ahead and reserving some space by creating a slice which is smaller than
the whole disk, and using the slice instead of the whole disk.
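
A rough sketch of that last suggestion, with hypothetical device names
(how much to hold back is a judgement call; 1-2% has been suggested in
this thread), assuming SMI/VTOC labels: carve an s0 slice slightly
smaller than the drive on the first disk with format's partition menu,
copy that label to the other disks, and build the pool from the slices:

  prtvtoc /dev/rdsk/c6t0d0s2 | fmthard -s - /dev/rdsk/c6t1d0s2
  prtvtoc /dev/rdsk/c6t0d0s2 | fmthard -s - /dev/rdsk/c6t2d0s2
  prtvtoc /dev/rdsk/c6t0d0s2 | fmthard -s - /dev/rdsk/c6t3d0s2
  prtvtoc /dev/rdsk/c6t0d0s2 | fmthard -s - /dev/rdsk/c6t4d0s2
  zpool create tank raidz c6t0d0s0 c6t1d0s0 c6t2d0s0 c6t3d0s0 c6t4d0s0

The trade-off, noted elsewhere in this thread, is that with slices ZFS
will not enable the disk write cache for you.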

> Fair enough, for high end enterprise kit where you want to squeeze every byte 
> out of the system (and know you'll be buying Sun drives), you might not want 
> this, but it would have been trivial to turn this off for kit like that.  
> It's certainly a lot easier to expand a pool than shrink it!
>   

Actually, enterprise customers do not ever want to squeeze every byte, they
would rather have enough margin to avoid such issues entirely.  This is what
I was referring to earlier in this thread wrt planning.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (no subject)

2009-01-19 Thread Fredrich Maney
It really is sad when you have to start filtering technical mailing
lists to weed out the junk.

On Sun, Jan 18, 2009 at 4:17 PM, JZ  wrote:
> Obama just made a good speech.
> I hope you were watching TV...
>
> Best,
> z
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Understanding ZFS replication

2009-01-19 Thread Blake
This makes sense. Given a set of devices, ZFS can only write to free
blocks.  If the only free blocks are close together or on the same
device, then the protection can't be as great.  This is quite likely to
happen on a fullish disk.  copies > 1, however, is still better than
none (a single dropped block in the right place can wreak havoc).

I personally like to use the 'copies' feature on machines where the
devices are allocated primarily to my storage pool rather than my
root pool.  I like the idea that I can have multiple copies of blocks
on my (single) boot device.  This also works nicely because on a
Solaris machine with lots of memory, I don't have to write to the disk
much after boot, so the performance penalty seems fairly small.  I
have this running right now in one case.  When I get the ability to
mirror my rpool, I can remove the copies property if I wish.

One other important caveat is that ZFS properties only apply to
newly-written data.  So setting copies > 1 after an install won't make
copies of the blocks laid down by the initial install, just the blocks
written going forward.
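
For reference, the knob itself is just a dataset property; a minimal
sketch with a hypothetical dataset name:

  zfs set copies=2 rpool/export/home    # new blocks get two copies
  zfs get copies rpool/export/home      # confirm the setting

Existing blocks keep a single copy until they are rewritten, e.g. by
restoring from backup or send/receiving into a fresh dataset.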

cheers,
Blake

On Mon, Jan 19, 2009 at 1:04 AM, Carson Gaspar  wrote:
> Bob Friesenhahn wrote:
>> On Sun, 18 Jan 2009, Tim wrote:
>>
>>> Honestly, I believe this list... when other people have asked if they can
>>> use the copies= to avoid mirroring everything.  I can't say I've saved any
>>> of the threads because they didn't seem of any particular importance to me
>>> at the time.
>>
>> The extra copies help avoid data loss, but if a disk is lost and there
>> is no disk-wise redundancy, then the pool will be lost.
>
> I'm reading a lot of posts where folks don't seem to be understanding
> each other, so let me try and re-phrase things.
>
> If you set copies=n, where n > 1, ZFS will _attempt_ to put the copies
> on different block devices. If it can't, it will _attempt_ to place the
> copies "far" away from each other on the same block device.
>
> The key word above is "attempt". Previous posters have shot this down
> for "poor man's mirroring" because of the lack of guarantees. I suspect
> these naysayers (and rightly so) are what Tim is recalling.
>
> --
> Carson
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Asif Iqbal
On Mon, Jan 19, 2009 at 10:47 AM, Andrew Gabriel  wrote:
> I've seen a webpage (a blog, IIRC) which compares the performance of
> RAIDZ with differing numbers of disks in each RAIDZ group. I can't now

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

section:
  RAID-Z Configuration Requirements and Recommendations

> find this, and can't seem to find the right things to get google to
> search on. Does anyone recall where this is? ISTR the optimum number of
> disks was 5-6.
>
> --
> Cheers
> Andrew
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Disks in each RAIDZ group

2009-01-19 Thread Andrew Gabriel
I've seen a webpage (a blog, IIRC) which compares the performance of 
RAIDZ with differing numbers of disks in each RAIDZ group. I can't now 
find this, and can't seem to find the right things to get google to 
search on. Does anyone recall where this is? ISTR the optimum number of 
disks was 5-6.

-- 
Cheers
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS tale of woe and fail

2009-01-19 Thread Blake
You can get a sort of redundancy by creating multiple filesystems with
'copies' enabled on the ones that need some sort of self-healing in
case of bad blocks.

Is it possible to at least present your disks as several LUNs?  If you
must have an abstraction layer between ZFS and the block device,
presenting ZFS with a plurality of abstracted devices would let you
get some sort of parity...or is this device live and in production?

I do think that, though ZFS doesn't need fsck in the traditional
sense, some sort of recovery tool would make storage admins even
happier about using ZFS.
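
In the meantime, the closest thing we have to a poke-around tool is zdb;
a minimal, read-only sketch (the device path is hypothetical):

  zdb -l /dev/rdsk/c6t3d0s0    # dump the ZFS labels on that vdev

It won't repair anything, but it can at least tell you whether ZFS can
still find its labels on the device.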

cheers,
Blake

On Mon, Jan 19, 2009 at 4:09 AM, Tom Bird  wrote:
> Toby Thain wrote:
>> On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:
>>
>>> Hey, Tom -
>>>
>>> Correct me if I'm wrong here, but it seems you are not allowing ZFS any
>>> sort of redundancy to manage.
>
> Every other file system out there runs fine on a single LUN, when things
> go wrong you have a fsck utility that patches it up and the world keeps
> on turning.
>
> I can't find anywhere that will sell me a 48 drive SATA JBOD with all
> the drives presented on a single SAS channel, so running on a single
> giant LUN is a real world scenario that ZFS should be able to cope with,
> as this is how the hardware I am stuck with is arranged.
>
>> Which is particularly catastrophic when one's 'content' is organized as
>> a monolithic file, as it is here - unless, of course, you have some way
>> of scavenging that file based on internal structure.
>
> No, it's not a monolithic file, the point I was making there is that no
> files are showing up.
>
>>>> r...@cs4:~# find /content
>>>> /content
>>>> r...@cs4:~# (yes that really is it)
>
> thanks
> --
> Tom
>
> // www.portfast.co.uk -- internet services and consultancy
// hosting from £1.65 per domain
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Blake
I'm going waaay out on a limb here, as a non-programmer...but...

Since the source is open, maybe community members should organize and
work on some sort of sizing algorithm?  I can certainly imagine Sun
deciding to do this in the future - I can also imagine that it's not
at the top of Sun's priority list (most of the devices they deal with
are their own, and perhaps not subject to the right-sizing issue).  If
it matters to the community, why not, as a community, try to
fix/improve zfs in this way?

Again, I've not even looked at the code for block allocation or
whatever it might be called in this case, so I could be *way* off here
:)

Lastly, Antonius, you can try the zpool trick to get this disk
relabeled, I think.  Try 'zpool create temp_pool [problem_disk]' then
'zpool destroy temp_pool' - this should relabel the disk in question
and set up the defaults that zfs uses.  Can you also run format >
partition > print on one of the existing disks and send the output so
that we can see what the existing disk looks like? (Off-list directly
to me if you prefer).
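
Spelled out, the relabel trick is just the following (disk name is
hypothetical; be sure you have the right device, since create will
happily overwrite whatever label is there):

  zpool create temp_pool c6t3d0    # ZFS writes its usual EFI label/layout
  zpool destroy temp_pool          # the pool goes away, the label stays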

cheers,
Blake
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Understanding ZFS replication

2009-01-19 Thread Sean Sprague
Z,

> Beloved Tim,
> You challenged me a while ago, as a friend.
> I did what you asked me to do, in the honor of my father.
>  
> Best,
> z

Please don't post personal stuff like this or links to wikipedia or 
other ephemera/apocrypha to this/any list unless they are relevant.

Thanks... Sean.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Ross
The problem is they might publish these numbers, but we really have no way of 
controlling what number manufacturers will choose to use in the future.

If for some reason future 500GB drives all turn out to be slightly smaller than 
the current ones you're going to be stuck.  Reserving 1-2% of space in exchange 
for greater flexibility in replacing drives sounds like a good idea to me.  As 
others have said, RAID controllers have been doing this for long enough that 
even the very basic models do it now, and I don't understand why such simple 
features like this would be left out of ZFS.

Fair enough, for high end enterprise kit where you want to squeeze every byte 
out of the system (and know you'll be buying Sun drives), you might not want 
this, but it would have been trivial to turn this off for kit like that.  It's 
certainly a lot easier to expand a pool than shrink it!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS size is different ?

2009-01-19 Thread Roch
Chookiex writes:
 > Hi all,
 > 
 > I have 2 questions about ZFS.
 > 
 > 1. I have created a snapshot in my pool1/data1 and zfs send/recv'd it to 
 > pool2/data2, but I found the USED in zfs list is different:
 > NAME   USED  AVAIL  REFER  MOUNTPOINT
 > pool2/data2 160G  1.44T   159G  /pool2/data2
 > pool1/data 176G   638G   175G /pool1/data1
 > 
 > It keeps about 30,000,000 files.
 > The content of p_pool/p1 and backup/p_backup is almost the same. But why is 
 > the size different?
 > 

160G for 30M files means your avg file size is 5333 Bytes.

Pick one such file just for illustration: 5333 Bytes to be
stored on a raid-z2 of 5 disks (3+2). So you have to store
5333 Bytes of data onto 3 data disks. You will need a stripe
of 4 x 512B sectors on each of the 3 data disks. So that's
6K of data.

Over a single volume, you'd need 11 sectors of 512B to store
5632 Bytes.

For this avg file size you thus have either 12 or 11 sectors
to store the data, a 9% difference.

You then need to tack on the extra parity blocks. Raid-z2 is
a double-parity scheme, whereas raid-5 is single parity (and
will only survive a single disk failure).

Depending on how these parity blocks are accounted for and
your exact file size distribution, the difference you note
does not appear unwarranted.
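
To make the arithmetic concrete, a quick back-of-the-envelope sketch in
plain shell (512-byte sectors assumed, parity not counted, and this only
approximates what the allocator really does):

  filesize=5333          # average file size in bytes
  data_disks=3           # raidz2 of 5 disks = 3 data + 2 parity
  sector=512

  # raidz: each data disk stores its share, rounded up to whole sectors
  share=$(( (filesize + data_disks - 1) / data_disks ))
  per_disk=$(( (share + sector - 1) / sector ))
  raidz_sectors=$(( per_disk * data_disks ))              # 12 sectors = 6K

  # plain volume: round the whole file up to whole sectors
  plain_sectors=$(( (filesize + sector - 1) / sector ))   # 11 sectors

  echo "raidz data sectors: $raidz_sectors  plain volume: $plain_sectors"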




 > 2.  /pool2/data2 is a RAID5 Disk Array with 8 disks, and , and /pool1/data1 
 > is a RAIDZ2 with 5 disks.
 > The configure like this:
 > 
 > NAMESTATE READ WRITE CKSUM
 > pool2  ONLINE   0 0 0
 >   c7t10d0   ONLINE   0 0 0
 > 
 > 
 > NAME  STATE READ WRITE CKSUM
 > pool1   ONLINE   0 0 0
 >   raidz2  ONLINE   0 0 0
 > c3t2d0ONLINE   0 0 0
 > c3t1d0ONLINE   0 0 0
 > c3t3d0  ONLINE   0 0 0
 > c3t4d0  ONLINE   0 0 0
 > c3t5d0  ONLINE   0 0 0
 > 
 > We found that pool1 is more slow than pool2, even with the same number of 
 > disks.
 > So, which is better between RAID5 + ZFS and RAIDZ + ZFS?
 > 

Uncached RAID-5 random read is expected to deliver more
total random read IOPS than uncached Raid-Z.

The downside of using a single raid-5 volume is that if a
checksum error is ever detected by ZFS, ZFS reports the error
but will not be able to correct data blocks (metadata blocks
are stored redundantly and will be corrected).


-r



 > 
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Antonius
Yes, it's the same make and model as most of the other disks in the zpool, and 
it reports the same number of sectors.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS tale of woe and fail

2009-01-19 Thread Tom Bird
Toby Thain wrote:
> On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:
> 
>> Hey, Tom -
>>
>> Correct me if I'm wrong here, but it seems you are not allowing ZFS any
>> sort of redundancy to manage.

Every other file system out there runs fine on a single LUN; when things
go wrong you have an fsck utility that patches it up and the world keeps
on turning.

I can't find anywhere that will sell me a 48 drive SATA JBOD with all
the drives presented on a single SAS channel, so running on a single
giant LUN is a real world scenario that ZFS should be able to cope with,
as this is how the hardware I am stuck with is arranged.

> Which is particularly catastrophic when one's 'content' is organized as
> a monolithic file, as it is here - unless, of course, you have some way
> of scavenging that file based on internal structure.

No, it's not a monolithic file; the point I was making there is that no
files are showing up.

>>> r...@cs4:~# find /content
>>> /content
>>> r...@cs4:~# (yes that really is it)

thanks
-- 
Tom

// www.portfast.co.uk -- internet services and consultancy
// hosting from £1.65 per domain
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss