Re: [zfs-discuss] Rethinking my zpool

2010-03-20 Thread Brandon High
On Sat, Mar 20, 2010 at 1:35 PM, Richard Elling wrote:

> For those disinclined to click, data retention when mirroring wins over
> raidz
> when looking at the problem from the perspective of number of drives
> available.  Why? Because 5+1 raidz survives the loss of any disk, but 3
> sets
> of 2-way mirrors can survive the loss of 3 disks, as long as 2 of those
> disks
> are not in the same set. The rest is just math.
>

The one dimension left out in your comparison is the portion of space that's
available for use vs. redundancy overhead. I'm sure you just never thought
of it. ;-)

For 12 disks using a 4-way mirror, you'd have 75% overhead but the best
MTTDL. raidz3 is only 25% overhead, but provides a better MTTDL than 3-way
mirrors (at 66% overhead). raidz2 (16% overhead) has better MTTDL than 2-way
mirrors (at 50%).
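As a quick sanity check on those percentages (a ksh/bash sketch; the
redundant-disk counts per layout are simply the ones implied above):

  # overhead = redundant disks / 12 total disks
  for layout in "4-way-mirror 9" "3-way-mirror 8" "2-way-mirror 6" \
                "raidz3 3" "raidz2 2"; do
      set -- $layout
      echo "$1: $(( $2 * 100 / 12 ))% overhead"
  done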

So clearly, if fault tolerance is the absolute most important factor, a
really big mirror is best. This will also give very good read performance. I
imagine a 12-way mirror would last a while (2.09E+57 years according to
Richard's formula) but it's also at high cost.

I think the only real route to follow is to determine how much space you
need, and then optimize MTTDL and performance around that constraint. If you
determine that you need 10 TB available, then (using 1.5T drives) you need
to use at least 7 disks for data. That means a 12-disk raidz3 (13.5 TB), or
2x 6-disk raidz2 (12 TB). The raidz3 will have higher fault tolerance, but
lower performance.
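For illustration, the two candidate layouts could be created along these
lines (pick one; the c0t*d0 device names are placeholders):

  # 12-disk raidz3: one vdev, 9 data + 3 parity (~13.5 TB with 1.5T drives)
  zpool create tank raidz3 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
                           c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0

  # 2x 6-disk raidz2: two vdevs, 4 data + 2 parity each (~12 TB), more IOPS
  zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
                    raidz2 c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0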

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel SASUC8I - worth every penny

2010-03-20 Thread Erik Trimble
Nah, the 8x2.5"-in-2 are $220, while the 5x3.5"-in-3 are $120.  You can 
get 4x3.5"-in-3 for $100, 3x3.5"-in-2 for $80, and even 4x2.5"-in-1 for 
$65.  ( http://www.addonics.com/products/raid_system/ae4rcs25nsa.asp )



The Cooler Master thing you linked to isn't a hot-swap module. It does 
4-in-3, but there's no backplane. You can't hot-swap drives put into 
that sucker.


-Erik



Ethan wrote:
Whoops, Erik's links show I was wrong about my first point. Though 
those 5-in-3s are five times as expensive as the 4-in-3. 

On Sat, Mar 20, 2010 at 22:46, Ethan wrote:


I don't think you can fit five 3.5" drives in 3 x 5.25", but I
have a number of coolermaster 4-in-3 modules, I recommend them:
http://www.amazon.com/-/dp/B00129CDGC/


On Sat, Mar 20, 2010 at 20:23, Geoff <geoffakerl...@gmail.com> wrote:

Thanks for your review!  My SiI3114 isn't recognizing drives
in Opensolaris so I've been looking for a replacement.  This
card seems perfect so I ordered one last night.  Can anyone
recommend a cheap 3 x 5.25 ---> 5 3.5 enclosure I could use
with this card?  The extra ports necessitate more drives,
obviously :)
--
This message posted from opensolaris.org 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org 
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel SASUC8I - worth every penny

2010-03-20 Thread Ethan
Whoops, Erik's links show I was wrong about my first point. Though those
5-in-3s are five times as expensive as the 4-in-3.

On Sat, Mar 20, 2010 at 22:46, Ethan  wrote:

> I don't think you can fit five 3.5" drives in 3 x 5.25", but I have a
> number of coolermaster 4-in-3 modules, I recommend them:
> http://www.amazon.com/-/dp/B00129CDGC/
>
>
> On Sat, Mar 20, 2010 at 20:23, Geoff  wrote:
>
>> Thanks for your review!  My SiI3114 isn't recognizing drives in
>> Opensolaris so I've been looking for a replacement.  This card seems perfect
>> so I ordered one last night.  Can anyone recommend a cheap 3 x 5.25 ---> 5
>> 3.5 enclosure I could use with this card?  The extra ports necessitate more
>> drives, obviously :)
>> --
>> This message posted from opensolaris.org
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel SASUC8I - worth every penny

2010-03-20 Thread Ethan
I don't think you can fit five 3.5" drives in 3 x 5.25", but I have a number
of coolermaster 4-in-3 modules, I recommend them:
http://www.amazon.com/-/dp/B00129CDGC/

On Sat, Mar 20, 2010 at 20:23, Geoff  wrote:

> Thanks for your review!  My SiI3114 isn't recognizing drives in Opensolaris
> so I've been looking for a replacement.  This card seems perfect so I
> ordered one last night.  Can anyone recommend a cheap 3 x 5.25 ---> 5 3.5
> enclosure I could use with this card?  The extra ports necessitate more
> drives, obviously :)
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel SASUC8I - worth every penny

2010-03-20 Thread Erik Trimble

Geoff wrote:

Thanks for your review!  My SiI3114 isn't recognizing drives in Opensolaris so 
I've been looking for a replacement.  This card seems perfect so I ordered one 
last night.  Can anyone recommend a cheap 3 x 5.25 ---> 5 3.5 enclosure I could 
use with this card?  The extra ports necessitate more drives, obviously :)
  
You may need to replace the "RAID" bios with the "IDE" bios, for the 
Sil3114.


http://www.siliconimage.com/support/searchresults.aspx?pid=28&cat=15

Get the flash tool, plus the "IDE BIOS" download, and flash that to your 
card. It should then work well, and provide OpenSolaris with what it 
really wants - a JBOD controller, rather than a sort-kinda-fake-Raid 
controller.



That said, the LSI-based HBA really is the thing you want. It's nice.  :-)

I've moved to 7200RPM 2.5" laptop drives over 3.5" drives, for a 
combination of reasons:  lower-power, better performance than a 
comparable sized 3.5" drives, and generally lower-capacities meaning 
resilver times are smaller. They're a bit more $/GB, but not a lot.
If you can stomach the extra cost (they run $220), I'd actually 
recommend getting a 8x2.5" in 2x5.25" enclosure from Supermicro.  It 
works nicely, plus it gives you a nice little place to put your SSD.   :-)


http://www.supermicro.com/products/accessories/mobilerack/CSE-M28E1.cfm


Other than that, I've had good luck with the Venus series for 3.5" 
Hotswap drives:


http://www.centralcomputers.com/commerce/catalog/product.jsp?product_id=59195
http://www.newegg.com/Product/Product.aspx?Item=N82E16817332011



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Bill Sommerfeld

On 03/19/10 19:07, zfs ml wrote:

What are peoples' experiences with multiple drive failures?


1985-1986.  DEC RA81 disks.  Bad glue that degraded at the disk's 
operating temperature.  Head crashes.  No more need be said.


- Bill



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Robert Milkowski


To add my 0.2 cents...

I think starting/stopping scrub belongs to cron, smf, etc. and not to 
zfs itself.


However, what would be nice to have is the ability to freeze/resume a 
scrub and also to limit its rate of scrubbing.
One of the reasons is that when working in SAN environments one has to 
take into account more than just the server where the scrub will be 
running; while it might not impact that server, it might cause an issue 
for others, etc.
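Until something like that exists, the closest approximation I know of is
cancelling and restarting the scrub from cron -- crude, since 'zpool scrub -s'
throws the progress away rather than pausing it:

  # crontab sketch: keep scrubs out of business hours (pool name is an example)
  0 22 * * 1-5  /usr/sbin/zpool scrub tank      # start after hours
  0 6  * * 1-5  /usr/sbin/zpool scrub -s tank   # cancel before the day starts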


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel SASUC8I - worth every penny

2010-03-20 Thread Geoff
Thanks for your review!  My SiI3114 isn't recognizing drives in Opensolaris so 
I've been looking for a replacement.  This card seems perfect so I ordered one 
last night.  Can anyone recommend a cheap 3 x 5.25 ---> 5 3.5 enclosure I could 
use with this card?  The extra ports necessitate more drives, obviously :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Tim Cook
On Sat, Mar 20, 2010 at 5:36 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Sat, 20 Mar 2010, Tim Cook wrote:
>
>>
>> Funny (ironic?) you'd quote the UNIX philosophy when the Linux folks have
>> been running around since day
>> one claiming the basic concept of ZFS flies in the face of that very
>> concept.  Rather than do one thing
>> well, it's unifying two things (file system and raid/disk management) into
>> one.  :)
>>
>
> Most software introduced in Linux clearly violates the "UNIX philosophy".
>  Instead of small and simple parts we have huge and complex parts, with many
> programs requiring 70 or 80 libraries in order to run.  Zfs's intermingling
> of layers is benign in comparison.
>
>
> Bob
>
>
You can take that up with them :)  I'm just pointing out the obvious irony
of  claiming separation as an excuse for not adding features when the
product is based on the very idea of unification of
layers/features/functionality.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Svein Skogen

On 21.03.2010 00:14, Erik Trimble wrote:

Richard Elling wrote:

I see this on occasion. However, the cause is rarely attributed to a bad
batch of drives. More common is power supplies, HBA firmware, cables,
Pepsi syndrome, or similar.
-- richard

Mmmm. Pepsi Syndrome. I take it this is similar to the Coke addiction
many of my keyboards have displayed, going to great lengths to make sure
that I pour at least a half can into them at the least convenient time?

Also, see the related disease, C>N>S (Coke through Nose onto Screen).


Not to mention "sysadmin having a bad day, tower frontdoor 
dented"-syndrome. ;)


//Svein

--

Sending mail from a temporary set up workstation, as my primary W500 is 
off for service. PGP not installed.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Erik Trimble

Richard Elling wrote:

I see this on occasion. However, the cause is rarely attributed to a bad
batch of drives. More common is power supplies, HBA firmware, cables,
Pepsi syndrome, or similar.
 -- richard
  
Mmmm. Pepsi Syndrome.  I take it this is similar to the Coke addiction 
many of my keyboards have displayed, going to great lengths to make sure 
that I pour at least a half can into them at the least convenient time?


Also, see the related disease, C>N>S (Coke through Nose onto Screen).

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Q : recommendations for zpool configuration

2010-03-20 Thread Bob Friesenhahn

On Sat, 20 Mar 2010, Eric Andersen wrote:


2.  Taking into account the above, it's a great deal easier on the 
pocket book to expand two drives at a time instead of four at a 
time.  As bigger drives are always getting cheaper, I feel that I 
have a lot more flexibility with mirrors when it comes to expanding. 
If you have limitless physical space for drives, you might feel 
differently.


I agree with your arguments.  Just make sure that you have a way to 
expand a mirror pair without losing redundancy.  For example, make 
sure that there is a way to add a new device to act as the replacement 
without taking existing devices off line.  Otherwise there is some 
possibility of data loss during the replacement.
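A sketch of that kind of safe expansion, attaching the new device before
anything is detached (device names are hypothetical):

  # grow a mirror from old 1.5T drives to new 2T drives, one side at a time
  zpool attach tank c1t0d0 c2t0d0   # add new drive as an extra mirror side
  # wait for 'zpool status tank' to show the resilver completed, then:
  zpool detach tank c1t0d0          # retire the first old drive
  zpool attach tank c1t1d0 c2t1d0   # repeat for the second side
  zpool detach tank c1t1d0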


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Bob Friesenhahn

On Sat, 20 Mar 2010, Tim Cook wrote:


Funny (ironic?) you'd quote the UNIX philosophy when the Linux folks have been 
running around since day
one claiming the basic concept of ZFS flies in the face of that very concept.  
Rather than do one thing
well, it's unifying two things (file system and raid/disk management) into one. 
 :)


Most software introduced in Linux clearly violates the "UNIX 
philosophy".  Instead of small and simple parts we have huge and 
complex parts, with many programs requiring 70 or 80 libraries in 
order to run.  Zfs's intermingling of layers is benign in comparison.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Svein Skogen

On 20.03.2010 23:00, Gary Gendel wrote:

I'm not sure I like this at all.  Some of my pools take hours to scrub.  I have 
a cron job run scrubs in sequence...  Start one pool's scrub and then poll 
until it's finished, start the next and wait, and so on so I don't create too 
much load and bring all I/O to a crawl.

The job is launched once a week, so the scrubs have plenty of time to finish. :)

Scrubs every hour?  Some of my pools would be in continuous scrub.


If I'm not mistaken, I suggested a default value of 168 hours, which is 
... a week. ;)


//Svein
--

Sending mail from a temporary set up workstation, as my primary W500 is 
off for service. PGP not installed.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Q : recommendations for zpool configuration

2010-03-20 Thread Eric Andersen
I went through this determination when setting up my pool.  I decided to go 
with mirrors instead of raidz2 after considering the following:

1.  Drive capacity in my box.  At most, I can realistically cram 10 drives in 
my box and I am not interested in expanding outside of the box.  I could go 
with 2.5 inch drives and fit a lot more, but I don't feel the necessity to do 
so.  That being said, given the historic trend for mass storage drives to 
become cheaper over time, I have a feeling that I will be replacing drives to 
expand storage space long before the drives themselves start failing.  The 
added redundancy of raidz2 is great, but I am betting that, barring a poorly 
manufactured drive, I will be replacing the drives with bigger drives before 
they have a chance to reach the end of their life.

2.  Taking into account the above, it's a great deal easier on the pocket book 
to expand two drives at a time instead of four at a time.  As bigger drives are 
always getting cheaper, I feel that I have a lot more flexibility with mirrors 
when it comes to expanding.  If you have limitless physical space for drives, 
you might feel differently. 

3.  Mirrors are going to perform better than raidz.  Again, redundancy is 
great, but so is performance.  My setup is for home use.  I want to keep my 
data safe but at the same time I am limited by cost and space.  I think that 
given the tradeoff between the two, mirrors win.  I feel that the chances of 
two drives in a mirror failing simultaneously are remote enough that I'll take 
the risk.

4.  Again, I'm running this at home.  It's not mission critical to me to have 
my data available 24/7.  Redundancy is a convenience and not a necessity.  
Regardless of what you choose, backups are what will save your ass in the event 
of catastrophe.  Having said that, I currently don't have a good backup 
solution and how to implement a good backup solution seems to be a hot topic on 
this list lately.  Figuring out how to easily, effectively and cheaply back up 
multiple terabytes of storage is my number one priority at the moment.

So anyways, all things considered, I prefer the better performance and easier 
expansion of storage space vs my physical space over a relatively small layer 
of extra redundancy.  If you aren't doing anything that necessitates the added 
redundancy of raidz2, go with mirrors.  Either way, if you care about your 
data, back it up.

eric
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Tim Cook
On Sat, Mar 20, 2010 at 5:00 PM, Gary Gendel  wrote:

> I'm not sure I like this at all.  Some of my pools take hours to scrub.  I
> have a cron job run scrubs in sequence...  Start one pool's scrub and then
> poll until it's finished, start the next and wait, and so on so I don't
> create too much load and bring all I/O to a crawl.
>
> The job is launched once a week, so the scrubs have plenty of time to
> finish. :)
>
> Scrubs every hour?  Some of my pools would be in continuous scrub.
>
>
Who said anything about scrubs every hour?  I see he mentioned hour being
the granularity of the frequency, but that hardly means you'd HAVE to run
scrubs every hour.  Nobody is stopping you from setting it to 3600 hours if
you so choose.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Tim Cook
On Sat, Mar 20, 2010 at 4:00 PM, Richard Elling wrote:

> On Mar 20, 2010, at 12:07 PM, Svein Skogen wrote:
> > We all know that data corruption may happen, even on the most reliable of
> hardware. That's why zfs has pool scrubbing.
> >
> > Could we introduce a zpool option (as in zpool set  )
> for "scrub period", in "number of hours" (with 0 being no automatic
> scrubbing).
>
> Currently you can do this with cron, of course (or at).  The ZFS-based
> appliances
> in the market offer simple ways to manage such jobs -- NexentaStor,
> Oracle's Sun
> OpenStorage, etc.
>


Right, but I rather agree with Svein.  It would be nice to have it
integrated.  I would argue at the very least, it should become an integrated
service much like auto-snapshot (which could/was also done from cron).
 Doing a basic cron means if you have lots of pools, you might start
triggering several scrubs at the same time, which may or may not crush the
system with I/O load.  So the answer is "well then query to see if the last
scrub is done", and suddenly we've gone from a simple cron job to custom
scripting based on what could be a myriad of variables.



>
> > I see several modern RAID controllers (such as the LSI Megaraid MFI line)
> have such features (called "patrol reads") already built into them. Why
> shouldn't zfs have the same? Having the zpool automagically handle this
> (probably a good thing to default it to 168 hours or one week) would also
> mean that the scrubbing feature is independent from cron, and since scrub
> already has lower priority than ... actual work, it really shouldn't annoy
> anybody (except those having their server under their bed).
> >
> > Of course I'm more than willing to stand corrected if someone can tell me
> where this is already implemented, or why it's not needed. Proper flames
> over this should start with a "warning, flame" header, so I can don my
> asbestos longjohns. ;)
>
> Prepare your longjohns!  Ha!
> Just kidding... the solution exists, just turn it on.  And remember the
> UNIX philosophy.
> http://en.wikipedia.org/wiki/Unix_philosophy
>  -- richard
>
>
Funny (ironic?) you'd quote the UNIX philosophy when the Linux folks have
been running around since day one claiming the basic concept of ZFS flies in
the face of that very concept.  Rather than do one thing well, it's unifying
two things (file system and raid/disk management) into one.  :)

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Gary Gendel
I'm not sure I like this at all.  Some of my pools take hours to scrub.  I have 
a cron job run scrubs in sequence...  Start one pool's scrub and then poll 
until it's finished, start the next and wait, and so on so I don't create too 
much load and bring all I/O to a crawl.
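Roughly, the wrapper looks like this (pool names and polling interval are
placeholders; the "scrub in progress" wording can vary between releases):

  #!/bin/sh
  # scrub pools one at a time so only one scrub loads the disks at any moment
  for pool in tank backup media; do
      zpool scrub "$pool"
      while zpool status "$pool" | grep "scrub in progress" > /dev/null; do
          sleep 600
      done
  done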

The job is launched once a week, so the scrubs have plenty of time to finish. :)

Scrubs every hour?  Some of my pools would be in continuous scrub.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Richard Elling
On Mar 20, 2010, at 12:07 PM, Svein Skogen wrote:
> We all know that data corruption may happen, even on the most reliable of 
> hardware. That's why zfs has pool scrubbing.
> 
> Could we introduce a zpool option (as in zpool set  ) for 
> "scrub period", in "number of hours" (with 0 being no automatic scrubbing).

Currently you can do this with cron, of course (or at).  The ZFS-based 
appliances
in the market offer simple ways to manage such jobs -- NexentaStor, Oracle's 
Sun 
OpenStorage, etc.
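For example, the cron version is a one-liner (pool name is an example):

  # scrub 'tank' every Sunday at 03:00
  0 3 * * 0 /usr/sbin/zpool scrub tank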

> I see several modern RAID controllers (such as the LSI Megaraid MFI line) have 
> such features (called "patrol reads") already built into them. Why shouldn't zfs 
> have the same? Having the zpool automagically handle this (probably a good 
> thing to default it to 168 hours or one week) would also mean that the 
> scrubbing feature is independent from cron, and since scrub already has lower 
> priority than ... actual work, it really shouldn't annoy anybody (except 
> those having their server under their bed).
> 
> Of course I'm more than willing to stand corrected if someone can tell me 
> where this is already implemented, or why it's not needed. Proper flames over 
> this should start with a "warning, flame" header, so I can don my asbestos 
> longjohns. ;)

Prepare your longjohns!  Ha!
Just kidding... the solution exists, just turn it on.  And remember the UNIX 
philosophy.
http://en.wikipedia.org/wiki/Unix_philosophy
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Bob Friesenhahn

On Fri, 19 Mar 2010, zfs ml wrote:

same enclosure, same rack, etc for a given raid 5/6/z1/z2/z3 system, should 
we be paying more attention to harmonics, vibration/isolation and 
non-intuitive system level statistics that might be inducing close proximity 
drive failures rather than just throwing more parity drives at the problem?


Yes.

Perfect symmetry is:

  a) wonderful
  b) evil

?

What is the meaning of "forklift impalement"?  What is the standard 
spacing between forklift tines?


What is the height of a Hoover vacuum cleaner handle?

Many things that logical engineers do are excessively "Michelangelo" 
when "Picasso" is what is needed.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-20 Thread Richard Elling
On Mar 18, 2010, at 6:28 AM, Darren J Moffat wrote:
> The only tool I'm aware of today that provides a copy of the data, and all of 
> the ZPL metadata and all the ZFS dataset properties is 'zfs send'.


AFAIK, this is correct.  
Further, the only type of tool that can backup a pool is a tool like dd.
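A rough sketch of that kind of raw, whole-device copy (device name is a
placeholder; every device in the pool needs the same treatment, and the pool
should be exported first so the on-disk state is quiescent):

  zpool export tank
  dd if=/dev/rdsk/c1t0d0s0 of=/backup/tank-c1t0d0s0.img bs=1024k
  zpool import tank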
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Rethinking my zpool

2010-03-20 Thread Richard Elling
On Mar 19, 2010, at 5:32 AM, Chris Dunbar - Earthside, LLC wrote:

> Hello,
> 
> After being immersed in this list and other ZFS sites for the past few weeks 
> I am having some doubts about the zpool layout on my new server. It's not too 
> late to make a change so I thought I would ask for comments. My current plan 
> is to have 12 x 1.5 TB disks in what I would normally call a RAID 10 
> configuration. That doesn't seem to be the right term here, but there are 6 
> sets of mirrored disks striped together. I know that "smaller" sets of disks 
> are preferred, but how small is small? I am wondering if I should break this 
> into two sets of 6 disks. I do have a 13th disk available as a hot spare. 
> Would it be available for either pool if I went with two? Finally, would I be 
> better off with raidz2 or something else instead of the striped mirrored 
> sets? Performance and fault tolerance are my highest priorities.

Do you believe in coincidence? :-)  I recently blogged about the reliability
analysis using 12 disks as a representative sample.  I didn't add a hot
spare for this analysis, but it would help in all cases.
http://blog.richardelling.com/2010/02/zfs-data-protection-comparison.html

For those disinclined to click, data retention when mirroring wins over raidz
when looking at the problem from the perspective of number of drives 
available.  Why? Because 5+1 raidz survives the loss of any disk, but 3 sets
of 2-way mirrors can survive the loss of 3 disks, as long as 2 of those disks 
are not in the same set. The rest is just math.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Svein Skogen

On 20.03.2010 20:53, Giovanni Tirloni wrote:

On Sat, Mar 20, 2010 at 4:07 PM, Svein Skogen <sv...@stillbilde.net> wrote:

We all know that data corruption may happen, even on the most
reliable of hardware. That's why zfs has pool scrubbing.

Could we introduce a zpool option (as in zpool set 
) for "scrub period", in "number of hours" (with 0 being no
automatic scrubbing).

I see several modern RAID controllers (such as the LSI Megaraid MFI
line) have such features (called "patrol reads") already built into
them. Why shouldn't zfs have the same? Having the zpool automagically
handle this (probably a good thing to default it to 168 hours or
one week) would also mean that the scrubbing feature is independent
from cron, and since scrub already has lower priority than ...
actual work, it really shouldn't annoy anybody (except those having
their server under their bed).

Of course I'm more than willing to stand corrected if someone can
tell me where this is already implemented, or why it's not needed.
Proper flames over this should start with a "warning, flame" header,
so I can don my asbestos longjohns. ;)


That would add unnecessary code to the ZFS layer for something that cron
can handle in one line.


It would add some code, but it could quite possibly reside in the same 
area that already handles automatic rebuilds (for hotspares). The reason 
I'm thinking it belongs there is that ZFS has a rather good counter of 
"how many hours of runtime", while cron only has knowledge of wall time.



Someone could hack zfs.c to automatically handle editing the crontab but
I don't know if it's worth the effort.


This would be a possible workaround, but there are ... several 
implementations of cron out there with more than one syntax...



Are you worried that cron will fail or is it just an aesthetic requirement ?


No, I'm thinking more along the lines of "zfs could be ported to pure 
storage boxes that don't really need a lot of other daemons running" 
(ZFS and cronstar with a decent management frontend would beat a _LOT_ 
of the cheap NAS/SAN boxes out there). Besides, I don't like "relying on 
external software" for filesystem-services. ;)  (And you can call that 
aesthetic if you like)


//Svein

--

Sending mail from a temporary set up workstation, as my primary W500 is 
off for service. PGP not installed.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Richard Elling
On Mar 19, 2010, at 7:07 PM, zfs ml wrote:
> Most discussions I have seen about RAID 5/6 and why it stops "working" seem 
> to base their conclusions solely on single drive characteristics and 
> statistics.
> It seems to me there is a missing component in the discussion of drive 
> failures in the real world context of a system that lives in an environment 
> shared by all the system components - for instance, the video of the disks 
> slowing down when they are yelled at is a good visual example of the negative 
> effect of vibration on drives.  http://www.youtube.com/watch?v=tDacjrSCeq4
> 
> I thought the google and CMU papers talked about a surprisingly high (higher 
> than expected) rate of multiple drive failures of drives "nearby" each other, 
> but I couldn't find it when I re-skimmed the papers now.
> 
> What are peoples' experiences with multiple drive failures? Given that we 
> often use same brand/model/batch drives (even though we are not supposed to), 
> same enclosure, same rack, etc for a given raid 5/6/z1/z2/z3 system, should 
> we be paying more attention to harmonics, vibration/isolation and 
> non-intuitive system level statistics that might be inducing close proximity 
> drive failures rather than just throwing more parity drives at the problem?

Yes :-)
Or to put this another way, when you have components in a system that are 
very reliable, the system failures become dominated by failures that are not 
directly attributed to the components. This is fallout from the notion of 
"synergy"
or the whole is greater than the sum of the parts.

synergy (noun) the interaction or cooperation of two or more organizations, 
substances,
or other agents to produce a combined effect greater than the sum of their 
separate 
effects.

> What if our enclosure and environmental factors increase the system level 
> statistics for multiple drive failures beyond the (used by everyone) single 
> drive failure statistics to the point where it is essentially negating the 
> positive effect of adding parity drives?

Statistical studies or reliability predictions for components do not take into
account causes such as factory contamination, environment, shipping/handling
events, etc.  The math is a lot easier if you can forget about such things.

> I realize this issue is not addressed because there is too much variability 
> in the enviroments, etc but I thought it would be interesting to see if 
> anyone has experienced much in terms of close time proximity, multiple drive 
> failures.

I see this on occasion. However, the cause is rarely attributed to a bad
batch of drives. More common is power supplies, HBA firmware, cables,
Pepsi syndrome, or similar.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Giovanni Tirloni
On Sat, Mar 20, 2010 at 4:07 PM, Svein Skogen  wrote:

> We all know that data corruption may happen, even on the most reliable of
> hardware. That's why zfs has pool scrubbing.
>
> Could we introduce a zpool option (as in zpool set  ) for
> "scrub period", in "number of hours" (with 0 being no automatic scrubbing).
>
> I see several modern RAID controllers (such as the LSI Megaraid MFI line)
> have such features (called "patrol reads") already built into them. Why
> shouldn't zfs have the same? Having the zpool automagically handle this
> (probably a good thing to default it to 168 hours or one week) would also
> mean that the scrubbing feature is independent from cron, and since scrub
> already has lower priority than ... actual work, it really shouldn't annoy
> anybody (except those having their server under their bed).
>
> Of course I'm more than willing to stand corrected if someone can tell me
> where this is already implemented, or why it's not needed. Proper flames
> over this should start with a "warning, flame" header, so I can don my
> asbestos longjohns. ;)
>

That would add unnecessary code to the ZFS layer for something that cron can
handle in one line.

Someone could hack zfs.c to automatically handle editing the crontab but I
don't know if it's worth the effort.

Are you worried that cron will fail or is it just an aesthetic requirement ?

-- 
Giovanni
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] is this pool recoverable?

2010-03-20 Thread Patrick Tiquet
Thanks for the info. 
I'll try the live CD method when I have access to the system next week.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ISCSI + RAID-Z + OpenSolaris HA

2010-03-20 Thread David Magda

On Mar 20, 2010, at 14:37, Remco Lengers wrote:

You seem to be concerned about the availability? Open HA seems to be  
a package last updated in 2005 (version 0.3.6). (?) It seems to me  
like a real fun toy project to build but I would be pretty reserved  
about the actual availability and using this kind of setup  
for production purposes and actually expecting some kind of proper  
uptime.


There are more recent versions available:

In conjunction with the release of OpenSolaris 2009.06, Sun delivers  
Open HA Cluster 2009.06 


http://www.opensolaris.com/learn/features/availability/
http://hub.opensolaris.org/bin/view/Community+Group+ha-clusters/

More info at:

http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss

There's also Solaris Cluster, which is free-as-beer:

http://www.sun.com/software/solaris/cluster/

Support costs.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Svein Skogen
We all know that data corruption may happen, even on the most reliable 
of hardware. That's why zfs has pool scrubbing.


Could we introduce a zpool option (as in zpool set  ) 
for "scrub period", in "number of hours" (with 0 being no automatic 
scrubbing).
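In other words, something along these lines (the property name here is made
up purely for illustration -- no such property exists today):

  # hypothetical: scrub automatically every 168 hours
  zpool set scrubperiod=168 tank
  # hypothetical: 0 disables automatic scrubbing
  zpool set scrubperiod=0 tank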


I see several modern RAID controllers (such as the LSI Megaraid MFI line) 
have such features (called "patrol reads") already built into them. Why 
shouldn't zfs have the same? Having the zpool automagically handle this 
(probably a good thing to default it to 168 hours or one week) would 
also mean that the scrubbing feature is independent from cron, and since 
scrub already has lower priority than ... actual work, it really 
shouldn't annoy anybody (except those having their server under their bed).


Of course I'm more than willing to stand corrected if someone can tell 
me where this is already implemented, or why it's not needed. Proper 
flames over this should start with a "warning, flame" header, so I can 
don my asbestos longjohns. ;)


//Svein

--

Sending mail from a temporary set up workstation, as my primary W500 is 
off for service. PGP not installed.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] is this pool recoverable?

2010-03-20 Thread Sriram Narayanan
On Sun, Mar 21, 2010 at 12:32 AM, Miles Nordin  wrote:
>> "sn" == Sriram Narayanan  writes:
>
>    sn> http://docs.sun.com/app/docs/doc/817-2271/ghbxs?a=view
>
> yeah, but he has no slog, and he says 'zpool clear' makes the system
> panic and reboot, so even from way over here that link looks useless.
>
> Patrick, maybe try a newer livecd from genunix.org like b130 or later
> and see if the panic is fixed so that you can import/clear/export the
> pool.  The new livecd's also have 'zpool import -F' for Fix Harder
> (see manpage first).  Let us know what happens.
>

Yes, I realized that after I posted to the list, and I replied again
asking him to use the opensolaris LiveCD. I just noticed that I
replied direct rather than to the list.

-- Sriram
-
Belenix: www.belenix.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] is this pool recoverable?

2010-03-20 Thread Miles Nordin
> "sn" == Sriram Narayanan  writes:

sn> http://docs.sun.com/app/docs/doc/817-2271/ghbxs?a=view

yeah, but he has no slog, and he says 'zpool clear' makes the system
panic and reboot, so even from way over here that link looks useless.

Patrick, maybe try a newer livecd from genunix.org like b130 or later
and see if the panic is fixed so that you can import/clear/export the
pool.  The new livecd's also have 'zpool import -F' for Fix Harder
(see manpage first).  Let us know what happens.
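From the newer livecd the recovery attempt would look roughly like this
(pool name taken from Patrick's zpool status output):

  # plain import first; fall back to the rewind-based recovery if it fails
  zpool import atomfs
  zpool import -F atomfs
  # if it comes back healthy: zpool clear atomfs && zpool export atomfs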


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3 disk RAID-Z2 pool

2010-03-20 Thread Svein Skogen

On 20.03.2010 17:39, Henk Langeveld wrote:

On 2010-03-15 16:50, Khyron:

Yeah, this threw me. A 3 disk RAID-Z2 doesn't make sense, because at a
redundancy level, RAID-Z2 looks like RAID 6. That is, there are 2
levels of
parity for the data. Out of 3 disks, the equivalent of 2 disks will be
used
to store redundancy (parity) data and only 1 disk equivalent will
store actual
data. This is what others might term a "degenerate case of 3-way
mirroring", except with a lot more computational overhead since we're
performing 2
parity calculations.

I'm curious what the purpose of creating a 3 disk RAID-Z2 pool is/was?
(For my own personal edification. Maybe there is something for me to
learn
from this example.)


One reason for a raid-z2 would be to protect yourself against a
structural firmware error corrupting certain data patterns in a similar
way. An alternate pattern could make it to and from disk correctly.

I once visited a customer who was experiencing data corruption caused by
their firmware (back in 1991 I guess). They had no reproducible case
and neither of us was in any way satisfied. Only half a year later did I
hear that this particular type of disk needed a firmware update, which
came to light in a DBMS - which would force a read of its data after
every write.

Apparently it took so long because the erroneous behaviour only happened
under certain conditions.


I think situations like this are why some RAID controllers have patrol 
reads (basically the same as a ZFS scrub) on an automatic schedule. I 
think my current generation Megaraid (8308elp, "obsoleted" by newer 
models) defaults to patrol read every 168 hours (one week) for SAS-disks...


And that background corruption is exactly why zpool scrubs should be 
scheduled.


//Svein
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ISCSI + RAID-Z + OpenSolaris HA

2010-03-20 Thread Remco Lengers

Vikkr,

You seem to be concerned about the availability? Open HA seems to be a 
package last updated in 2005 (version 0.3.6). (?)
It seems to me like a real fun toy project to build but I would be 
pretty reserved about the actual availability and using this kind of 
setup for production purposes and actually expecting some kind 
of proper uptime.


hth,

..Remco

vikkr wrote:

THX Ross, I plan on exporting each drive individually over iSCSI.
In this case, the writes, as well as reads, will go to all 6 discs at once, 
right?

The only question - how to calculate fault tolerance of such a system if the 
discs are all different in size?
Maybe there is such a tool? or check?
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Usage of hot spares and hardware allocation capabilities.

2010-03-20 Thread Bob Friesenhahn

On Sat, 20 Mar 2010, Robin Axelsson wrote:

My idea is rather that the "hot spares" (or perhaps we should say 
"cold spares" then) are off all the time until they are needed or 
when a user initiated/scheduled system integrity check is being 
conducted. They could go up for a "test spin" at each occasion a 
scrub is initiated which is not too frequently.


Solaris does include a power management function which is capable of 
spinning down idle disks.  The disks are then spun up if they are 
accessed.
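This is configured through /etc/power.conf; a sketch (the device path is a
placeholder, and the exact syntax should be checked against power.conf(4)):

  # /etc/power.conf excerpt: spin an idle disk down after 30 minutes
  device-thresholds       /dev/dsk/c1t5d0         30m
  autopm                  enable

  # then make the settings take effect
  pmconfig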


Perhaps I was a little too conclusive with my assumptions regarding 
ZFS and OpenSolaris. I figured that real enterprise applications 
rather use Solaris together with carefully selected hardware whereas 
OpenSolaris is more aimed at lower-budget/mainstream applications as 
a way of gaining a wider acceptance for OpenSolaris and ZFS (and of 
course to help the development of Solaris too, unless there are 
other plans ...). It has been discussed in many places that file 
systems do not change as frequently as the operating systems which 
is considered to be an issue when it comes to the implementation of 
newer and better technology.


It seems that most of your assumptions about Solaris/OpenSolaris are 
completely bogus and based on some other operating system.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ISCSI + RAID-Z + OpenSolaris HA

2010-03-20 Thread Ross Walker

On Mar 20, 2010, at 11:48 AM, vikkr  wrote:


THX Ross, I plan on exporting each drive individually over iSCSI.
In this case, the writes, as well as reads, will go to all 6 discs  
at once, right?


The only question - how to calculate fault tolerance of such a  
system if the discs are all different in size?

Maybe there is such a tool? or check?


They should all be the same size.

You can make them the same size on the iSCSI target.

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3 disk RAID-Z2 pool

2010-03-20 Thread Henk Langeveld

On 2010-03-15 16:50, Khyron:

Yeah, this threw me.  A 3 disk RAID-Z2 doesn't make sense, because at a
redundancy level, RAID-Z2 looks like RAID 6.  That is, there are 2 levels of
parity for the data.  Out of 3 disks, the equivalent of 2 disks will be used
to store redundancy (parity) data and only 1 disk equivalent will store actual
data.  This is what others might term a "degenerate case of 3-way
mirroring", except with a lot more computational overhead since we're 
performing 2
parity calculations.

I'm curious what the purpose of creating a 3 disk RAID-Z2 pool is/was?
(For my own personal edification.  Maybe there is something for me to learn
from this example.)


One reason for a raid-z2 would be to protect yourself against a 
structural firmware error corrupting certain data patterns in a similar 
way.  An alternate pattern could make it to and from disk correctly.


I once visited a customer who was experiencing data corruption caused by 
their firmware (back in 1991 I guess).  They had no reproducible case

and neither of us was in any way satisfied.  Only half a year later did I
hear that this particular type of disk needed a firmware update, which
came to light in a DBMS - which would force a read of its data after 
every write.


Apparently it took so long because the erroneous behaviour only happened 
under certain conditions.



Cheers,
Henk

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] is this pool recoverable?

2010-03-20 Thread Sriram Narayanan
On Sat, Mar 20, 2010 at 9:19 PM, Patrick Tiquet  wrote:
> Also, I tried to run zpool clear, but the system crashes and reboots.

Please see if this link helps
http://docs.sun.com/app/docs/doc/817-2271/ghbxs?a=view

-- Sriram
-
Belenix: www.belenix.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] is this pool recoverable?

2010-03-20 Thread Patrick Tiquet
Also, I tried to run zpool clear, but the system crashes and reboots.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-20 Thread David Magda

On Mar 20, 2010, at 00:57, Edward Ned Harvey wrote:

I used NDMP up till November, when we replaced our NetApp with a  
Solaris Sun
box.  In NDMP, to choose the source files, we had the ability to  
browse the
fileserver, select files, and specify file matching patterns.  My  
point is:
NDMP is file based.  It doesn't allow you to spawn a process and  
backup a

data stream.


Not quite.

It can reference files, but only by specifying where they are in an  
opaque "data stream" (see §2.3.5.2 of the NDMPv4 spec [1]):


The file locator data in the file history record is in a data  
service (OS) specific format. To the DMA this information is an  
opaque string. This means that the DMA will not attempt to interpret  
it. In order to determine the location of a file in the backup data  
stream, the DMA will send the complete file history record for the  
corresponding file history record to the data service, the data  
service will calculate the starting location and the length of the  
byte string to be read from the original backup data stream. The DMA  
will use this data to manipulate the tape service to retrieve the  
selected data.


So the backup software (DMA) simply knows the tape on which the file  
is on, and the starting byte of that tape, but if you want to restore  
a file from (say) a NetApp share or export, you have to send the bytes  
to another NetApp which can interpret the stream. It's not like the  
byte stream is in a known format (tar, cpio, or zip) that can be  
interpreted by anyone. (Unless you reverse engineer the format of  
course.)


After a filer ("NDMP Data Service") is told to start backing up, it  
can tell the backup software ("NDMP Data Management Application"--DMA)  
about files via the NDMP_FH_ADD_FILE command (see §4.3.1 [1]).


[1] http://www.ndmp.org/download/sdk_v4/draft-skardal-ndmp4-04.txt

So technically Oracle can implement an NDMP service on (Open)Solaris,  
and backup vendors could interface with that and send the raw ZFS data  
stream to tape. As the Solaris kernel traverses the file system, and  
comes across directories and files, it would tell the backup software  
about the file (path, owner, group, etc.) and where it is in the  
stream sent to "tape" (LTO, VTL, etc.). On file restoration, the  
backup software would then have to send the (opaque-to-it) data stream  
from tape to another Solaris box that could interpret it.


This is of course in the case of a CIFS share or NFS export, where the  
filer (NetApp, Sun 7000 series, Celerra) has some knowledge of the  
file names, and wouldn't work on a raw LUN--unless the filer starts  
parsing the LUN for disk formats like is done with VMware's VMDK  
format and NetBackup, where they can figure out the files.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ISCSI + RAID-Z + OpenSolaris HA

2010-03-20 Thread vikkr
THX Ross, I plan on exporting each drive individually over iSCSI.
In this case, the writes, as well as reads, will go to all 6 discs at once, 
right?

The only question - how to calculate fault tolerance of such a system if the 
discs are all different in size?
Maybe there is such a tool? or check?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] is this pool recoverable?

2010-03-20 Thread Patrick Tiquet
This system is running stock 111b on an Intel Atom D945GCLF2 
motherboard. The pool is of two mirrored 1TB sata disks. I noticed the system 
was locked up, rebooted and the pool status shows as follows:


  pool: atomfs
 state: FAULTED
status: An intent log record could not be read.
Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
atomfs  FAULTED  0 0 1  bad intent log
  mirrorDEGRADED 0 0 6
c8d0DEGRADED 0 0 6  too many errors
c9d0DEGRADED 0 0 6  too many errors
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ISCSI + RAID-Z + OpenSolaris HA

2010-03-20 Thread Ross Walker

On Mar 20, 2010, at 10:18 AM, vikkr  wrote:


Hi sorry for bad eng and picture :).

Can such a decision?

3 OpenFiler servers give their drives (2 x 1 TB each) over iSCSI to  
OpenSolaris.

On OpenSolaris, a RAID-Z with double parity is assembled from them.
The OpenSolaris server provides NFS access to this array, and is duplicated  
by means of Open HA Cluster


Yes, you can.

With three servers you want to provide resiliency against the loss  
of any one server.


I guess these are mirrors in each server?

If so, you will get better performance and more useable capacity by  
exporting each drive individually over iSCSI and setting the 6 drives  
as a raidz2 or even raidz3 which will give 3-4 drives of capacity,  
raidz3 will provide resiliency against a drive failure during a server  
failure.
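Once the six iSCSI LUNs are visible on the OpenSolaris head, the pool  
itself is a one-liner (the device names below are placeholders for the  
iSCSI disks):

  # double parity across 6 iSCSI-backed disks: survives the two disks
  # that live in a single failed server
  zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0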


-Ross
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Usage of hot spares and hardware allocation capabilities.

2010-03-20 Thread Tonmaus
> I know about those SoHo boxes and the whatnot, they
> keep spinning up and down all the time and the worst
> thing is that you cannot disable this sleep/powersave
> feature on most of these devices.

That judgment is in the eye of the beholder. We have a couple of Thecus NAS 
boxes and some LVM Raids on Ubuntu in the company which work like a charm with 
WD green drives spinning down on inactivity.  A hot spare is typically inactive 
most of the time and it does spin down unless required. That's because there 
are people in the Linux world who have a focus on implementing and maintaining  
power save options. I think that's great.

> I believe I have seen a "sleep mode" support when I
> skimmed through the feature lists of the LSI
> controllers but I'm not sure right now.

Neither am I. LSI's feature list is centred around SAS support on their own 
drivers. What exactly will work after you have added SAS Expanders (which I 
haven't but might do in the future) and attached SATA drives (which I have but 
might change some of them to SAS), running the native kernel driver (mpt) 
instead of LSI's proprietary one, changing the firmware from LSI default IR to 
IT mode (which you will typically do for MPxIO), is a challenge given the 
possible permutations.
Bottom line: you will have to try.

> My idea is rather that the "hot spares" (or perhaps
> we should say "cold spares" then) are off all the
> time until they are needed or when a user
> initiated/scheduled system integrity check is being
> conducted. They could go up for a "test spin" at each
> occasion a scrub is initiated which is not too
> frequently.

I wouldn't know anything that will work like you figure, including zfs. I.e., a 
hot spare is completely inactive during a scrub, but each 'zpool status' 
command will return the state of the spare - which will work for a spun-down 
drive as well btw. The idea of test-spinning a hot spare is quite far-fetched 
against that background. Scrub as well isn't a generic function to do a "system 
integrity" test, but has specific functionality in respect to a zpool. Putting 
things in the usual categories, your spare requirements are closer to having a 
cold spare, I would say. Maybe you can find a method starting from there.

> 
>  I figured
> that real enterprise applications rather use Solaris
> together with carefully selected hardware whereas
> OpenSolaris is more aimed at lower-budget/mainstream
> applications as a way of gaining a wider acceptance
> for OpenSolaris and ZFS 

That's a selective perspective I would say. I.e., Fishworks is derived from 
Opensolaris directly while the feature set of Solaris is again quite different. 
The salient point for the enterprise solutions is that you pay for the 
services, among other that expensive engineers have figured out which 
components will provide the functions you have requested, and there will be 
somebody you can bark up to if things don't work as advertised. For the white 
box, open approach this is up to you.

>It has been discussed in many places that
> file systems do not change as frequently as the
> operating systems which is considered to be an issue
> when it comes to the implementation of newer and
> better technology.

FWIW, ZFS is mutating quite dynamically.

> > ... you should be able to move a disk to another
> location (channel, slot)
> > and it would still be a part of the same pool and
> VDEV.
> >
> > This works very well, given your controller
> properly supports it. ...
> 
> I hope you are absolutely sure about this. The main
> reason I asked this question comes from the thread
> "Intel SASUC8I worth every penny" in this forum
> section where the thread starter warned that one
> should use "zpool export" "zpool import" when
> migrating a tank from one (set of) controller(s) to
> another.

It's not obvious that my assertion about mpt functions will apply when you 
want to move from a controller I figure is an AMD onboard (?) to an LSI SAS mpt. 
Instead, you will have to investigate what driver module your onboard 
controller will use, and what the properties of this driver are. Additionally, 
you will have to cross-check if the two drivers will have a behaviour that is 
compatible, i.e. the receiving controller doesn't do any things based on 
implicit assumptions the sending controller didn't provide.
To give you an example of what will happen when you have a drive on a controller 
that is driven by ahci (7M), the difference will be, among other things, that before 
you pull the drive you will have to unconfigure it in cfgadm as an additional 
step. If you don't observe that, you can blow things up.
Moreover, ahci (7M) according to the specs I know will not support power save. 
Bottom line: you will have to find out. 

What the "warning" is concerned: migrating a whole pool is not the same thing 
as swapping slots within a pool. I.e., if you pull more than the allowed number 
(failover resilience) from your pool at the same time while the pool is hot, 
you 
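
For a whole-pool migration, the safe sequence is therefore roughly this (pool 
name hypothetical):

  # zpool export tank
  (move the disks / swap the controller)
  # zpool import             # lists pools that are available for import
  # zpool import tank
  # zpool status tank        # verify all devices are ONLINE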

Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

2010-03-20 Thread Chris Murray
That's a good idea, thanks. I get the feeling the remainder won't be zero, 
which will back up the misalignment theory. After a bit more digging, it seems 
the problem is just an NTFS issue and can be addressed irrespective of the 
underlying storage system.
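
For reference, the arithmetic behind the "remainder" check is simple - assuming 
the 4 KiB boundary is the one that matters for your stack:

  old XP/2003-style start:   63 sectors x 512 B = 32256 B;   32256 mod 4096 = 3584  (misaligned)
  1 MiB-aligned start:     2048 sectors x 512 B = 1048576 B; 1048576 mod 4096 = 0   (aligned)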

I think I'm going to try the process in the following link:
http://www.tuxyturvy.com/blog/index.php?/archives/59-Aligning-Windows-Partitions-Without-Losing-Data.html

With any luck I'll then see a smaller dedup table, and better performance!

Thanks to everyone for the feedback,
Chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-20 Thread Edward Ned Harvey
> 5+ years ago the variety of NDMP that was available with the
> combination of NetApp's OnTap and Veritas NetBackup did backups at the
> volume level.  When I needed to go to tape to recover a file that was
> no longer in snapshots, we had to find space on a NetApp to restore
> the volume.  It could not restore the volume to a Sun box, presumably
> because the contents of the backup used a data stream format that was
> proprietary to NetApp.
> 
> An expired Internet Draft for NDMPv4 says:
> 
>   butype_name
>      Specifies the name of the backup method to be used for the
>      transfer (dump, tar, cpio, etc). Backup types are NDMP Server
>      implementation dependent and MUST match one of the Data
>      Server implementation specific butype_name strings accessible
>      via the NDMP_CONFIG_GET_BUTYPE_INFO request.
> 
> http://www.ndmp.org/download/sdk_v4/draft-skardal-ndmp4-04.txt
> 
> It seems pretty clear from this that an NDMP data stream can contain
> most anything and is dependent on the device being backed up.

So it's clear that at least the folks at ndmp.org were/are thinking about
doing backups using techniques not necessarily based on the filesystem.  But ...

Where's the implementation?  It doesn't do any good if there's just an RFC
written somewhere that all the backup tools ignore.  I was using BackupExec
11d with the NDMP Option to back up my NetApp.  That setup certainly couldn't do
anything other than file selection.  I can't generalize and say "nothing
does," but ...  Does anything?  Does anything out there support
non-file-based backup via NDMP?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-20 Thread Edward Ned Harvey
> > I'll say it again: neither 'zfs send' or (s)tar is an
> > enterprise (or
> > even home) backup system on their own one or both can
> > be components of
> > the full solution.
> >
> 
> Up to a point. zfs send | zfs receive does make a very good back up
> scheme for the home user with a moderate amount of storage. Especially
> when the entire back up will fit on a single drive which I think  would
> cover the majority of home users.

I'll repeat:  There is nothing preventing you from creating an external
zpool using more than one disk.  Sure it's convenient when your whole backup
fits onto a single external disk, but not necessary.
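
A minimal sketch, assuming two external disks and the usual hypothetical names:

  # zpool create backup c5t0d0 c6t0d0          # pool striped across both externals
  # zfs snapshot -r tank@2010-03-20
  # zfs send -R tank@2010-03-20 | zfs receive -Fd backup

The -R replication stream carries all descendant filesystems, snapshots and
properties, so the backup pool ends up as a usable copy of the source.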

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Usage of hot spares and hardware allocation capabilities.

2010-03-20 Thread Robin Axelsson
I know about those SoHo boxes and the like; they keep spinning up and down all 
the time, and the worst thing is that you cannot disable this sleep/power-save 
feature on most of these devices.
I believe I saw "sleep mode" support mentioned when I skimmed through the feature 
lists of the LSI controllers, but I'm not sure right now.
My idea is rather that the "hot spares" (or perhaps we should say "cold spares" 
then) are off all the time until they are needed or until a user-initiated/ 
scheduled system integrity check is being conducted. They could spin up for a 
"test spin" each time a scrub is initiated, which is not too frequent.

Perhaps I was a little too quick to draw conclusions about ZFS and OpenSolaris. I 
figured that real enterprise applications rather use Solaris together with 
carefully selected hardware, whereas OpenSolaris is more aimed at 
lower-budget/mainstream applications as a way of gaining wider acceptance for 
OpenSolaris and ZFS (and of course to help the development of Solaris too, 
unless there are other plans ...). It has been discussed in many places that 
file systems do not change as frequently as operating systems, which is 
considered an issue when it comes to adopting newer and better technology.

I intend to use a raidz2 setup on 8 disks attached to an LSI SAS1068 (LSI 
SAS3081E-R) based controller and if I decide to use a hot spare I will attach 
it to the SB750 controller of the system. If the hot spare kicks in I would 
probably want to swap it with the faulty hard drive on the LSI controller.
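
A sketch of what that could look like (device names purely hypothetical - eight 
disks on the LSI controller plus one spare on the SB750):

  # zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
        c1t4d0 c1t5d0 c1t6d0 c1t7d0 spare c2t0d0

If the spare has kicked in for, say, a failed c1t3d0, you can either put a new 
disk in the original slot and run 'zpool replace tank c1t3d0' (the spare returns 
to AVAIL once the resilver finishes), or run 'zpool detach tank c1t3d0' to 
promote the spare to a permanent member of the raidz2 vdev.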

> ... you should be able to move a disk to another location (channel, slot)
> and it would still be a part of the same pool and VDEV.
>
> This works very well, given your controller properly supports it. ...

I hope you are absolutely sure about this. The main reason I asked this question 
comes from the thread "Intel SASUC8I worth every penny" in this forum section, 
where the thread starter warned that one should use "zpool export"/"zpool import" 
when migrating a pool from one (set of) controller(s) to another.

I didn't mean to hurt anyone's feelings here. I'm new to OpenSolaris and ZFS. 
When I asked these questions I had just finished reading the "OpenSolaris Bible" 
and the ZFS Administration Guide (819-5461) together with some of the pages on 
the opensolaris.com website, so I was merely quoting my sources when I said 
"newer versions of x support y". Thank you for your replies; they have been 
insightful.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-20 Thread Mike Gerdts
On Fri, Mar 19, 2010 at 11:57 PM, Edward Ned Harvey
 wrote:
>> 1. NDMP for putting "zfs send" streams on tape over the network.  So
>
> Tell me if I missed something here.  I don't think I did.  I think this
> sounds like crazy talk.
>
> I used NDMP up till November, when we replaced our NetApp with a Solaris Sun
> box.  In NDMP, to choose the source files, we had the ability to browse the
> fileserver, select files, and specify file matching patterns.  My point is:
> NDMP is file based.  It doesn't allow you to spawn a process and backup a
> data stream.
>
> Unless I missed something.  Which I doubt.  ;-)

5+ years ago the variety of NDMP that was available with the
combination of NetApp's OnTap and Veritas NetBackup did backups at the
volume level.  When I needed to go to tape to recover a file that was
no longer in snapshots, we had to find space on a NetApp to restore
the volume.  It could not restore the volume to a Sun box, presumably
because the contents of the backup used a data stream format that was
proprietary to NetApp.

An expired Internet Draft for NDMPv4 says:

  butype_name
     Specifies the name of the backup method to be used for the
     transfer (dump, tar, cpio, etc). Backup types are NDMP Server
     implementation dependent and MUST match one of the Data
     Server implementation specific butype_name strings accessible
     via the NDMP_CONFIG_GET_BUTYPE_INFO request.

http://www.ndmp.org/download/sdk_v4/draft-skardal-ndmp4-04.txt

It seems pretty clear from this that an NDMP data stream can contain
most anything and is dependent on the device being backed up.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-20 Thread Chris Gerhard
> 
> I'll say it again: neither 'zfs send' or (s)tar is an
> enterprise (or 
> even home) backup system on their own one or both can
> be components of 
> the full solution.
> 

Up to a point. zfs send | zfs receive does make a very good backup scheme for the 
home user with a moderate amount of storage, especially when the entire backup 
fits on a single drive, which I think covers the majority of home users.

Using external drives and incremental zfs streams allows for extremely quick 
backups of large amounts of data.

It certainly does for me. 
http://chrisgerhard.wordpress.com/2007/06/01/rolling-incremental-backups/
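
In practice the incremental pattern amounts to something like this (dataset and 
snapshot names hypothetical; only the blocks that changed between the two 
snapshots get written to the external pool):

  # zfs snapshot tank/home@tue
  # zfs send -i tank/home@mon tank/home@tue | zfs receive backup/home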
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Usage of hot spares and hardware allocation capabilities.

2010-03-20 Thread Tonmaus
> So, is there a
> sleep/hibernation/standby mode that the hot spares
> operate in or are they on all the time regardless of
> whether they are in use or not?

This depends on the power-save options of your hardware, not on ZFS. Arguably, 
there is less wear on the heads of a hot spare. I guess that many modern disks 
will park the heads after a certain time, or even spin down, unless the 
controller prevents that. The question is whether the disk comes back fast enough 
when required - your bet is on the controller supporting that properly. As it 
seems, there is little focus on that matter at SUN and among community members. 
At least my own investigations into how to best make use of the power-save 
options that most SoHo NAS boxes offer returned only dire results. 
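
For what it's worth, the Solaris-side knob for this lives in power.conf(4) - a 
sketch only; the physical device path below is a placeholder you would take from 
'ls -l /dev/dsk/cXtYdZs0' on your own box, and whether the driver honours it is 
exactly the open question above:

  # ls -l /dev/dsk/c1t0d0s0          # note the ../../devices/... physical path
  # vi /etc/power.conf               # add, for example:
      autopm enable
      device-thresholds /pci@0,0/pci-ide@4/ide@0/cmdk@0,0 30m
  # pmconfig                         # activate the new settings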
 
> Usually the hot spare is on a not so well-performing
> SAS/SATA controller,

There is no room for "not so well-performing" controllers in my servers. I would 
not waste PCIe slots or backplanes on anything that doesn't live up to spec (my 
requirements). That said, JBOD HBAs are the controllers that perform best with 
ZFS, and they happen to be not very expensive. Additionally, I avoid a 
checkerboard of components, striving to keep things as simple as possible. 


> To be more general; are the hard drives in the pool
> "hard coded" to their SAS/SATA channels or can I swap
> their connections arbitrarily if I would want to do
> that? Will zfs automatically identify the association
> of each drive of a given pool or tank and
> automatically reallocate them to put the
> pool/tank/filesystem back in place?

This works very well, given that your controller properly supports it. I tried 
that on an Areca 1170 a couple of weeks ago, with interesting results that turned 
out to be an Areca firmware flaw. You may find the thread on this list. I would 
recommend that you do such tests when implementing your array, before going into 
production with it. Analogous aspects may apply to:

- Hotswapping
- S.M.A.R.T.
- replace failing components or change configuration
- transfer a whole array to another host
(list is not comprehensive)

I think at this moment you have two choices to be sure that all "advertised" 
ZFS features will be available in your system:
- learn it the hard way, by trial and error
- use SUN hardware, or another turnkey solution that offers ZFS, such as 
NexentaStor

A popular approach is to follow along the rails of what SUN itself uses, a 
prominent example being the LSI 106x SAS HBAs in "IT" mode.
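
If you go that route, a quick sanity check that the HBA really is bound to the 
mpt driver (rather than some vendor RAID stack) is something like:

  # prtconf -D | grep -i mpt         # lists device nodes and the driver bound to them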

Regards,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss