Re: [zfs-discuss] Large zpool design considerations

2008-07-04 Thread Henrik Johansen
Chris Cosby wrote:
I'm going down a bit of a different path with my reply here. I know that all
shops and their need for data are different, but hear me out.

1) You're backing up 40TB+ of data, increasing at 20-25% per year. That's
insane. Perhaps it's time to look at your backup strategy not from a hardware
perspective, but from a data retention perspective. Do you really need that
much data backed up? There has to be some way to get the volume down. If
not, you're at 100TB in just slightly over 4 years (assuming the 25% growth
factor). If your data is critical, my recommendation is to go find another
job and let someone else have that headache.
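
(For what it's worth, a quick back-of-the-envelope check of that projection - a sketch only, taking 40TB compounding at the 25% upper bound quoted above:)

  # Sketch: 40 TB of backups growing 25% per year
  awk 'BEGIN { tb = 40; for (y = 1; y <= 5; y++) { tb *= 1.25; printf "year %d: %.1f TB\n", y, tb } }'
  # year 4 comes out around 97.7 TB, i.e. roughly 100 TB in just over 4 years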

Well, we are talking about backup for ~900 servers that are in
production. Our retention period is 14 days for stuff like web servers,
and 3 weeks for SQL and such. 

We could deploy deduplication, but it makes me a wee bit uncomfortable to
blindly trust our backup software.

2) 40TB of backups is, at the best possible price, fifty 1TB drives (allowing
for spares and such) - $12,500 for raw drive hardware. Enclosures add some
money, as do cables and such. For mirroring, ninety 1TB drives is $22,500 for
the raw drives. In my world, I know yours is different, but the difference
between a $100,000 solution and a $75,000 solution is pretty negligible. The
short description here: you can afford to do mirrors. Really, you can. Any of
the parity solutions out there, whatever your strategy, is going to cause you
more trouble than you're ready to deal with.
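
(The drive arithmetic above, as a quick sanity check - the ~$250 per 1TB drive is simply the per-drive price implied by the quoted totals:)

  # Sketch: raw-drive cost at ~$250 per 1TB drive (price implied above)
  awk 'BEGIN { printf "parity setup, 50 drives: $%d\n", 50 * 250;
               printf "mirrored,     90 drives: $%d\n", 90 * 250 }'
  # -> $12,500 vs $22,500 in raw drives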

Good point. I'll take that into consideration.

I know these aren't solutions for you; it's just the stuff that was in my
head. The best possible solution, if you really need this kind of volume, is
to create something that never has to resilver. Use some nifty combination
of hardware and ZFS, like a couple of somethings that have 20TB per container
exported as a single volume, and mirror those with ZFS for its end-to-end
checksumming and ease of management.
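
(A minimal sketch of that layout - the pool and device names here are made up, with each cXtYd0 standing for a ~20TB LUN exported by one of the hardware containers:)

  # Hypothetical: two ~20TB LUNs, each from a separate enclosure,
  # mirrored by ZFS so it keeps the end-to-end checksumming.
  zpool create backup mirror c2t0d0 c3t0d0

  # Growing the pool later is just another mirrored pair of LUNs:
  zpool add backup mirror c2t1d0 c3t1d0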

That's my considerably more than $0.02

On Thu, Jul 3, 2008 at 11:56 AM, Bob Friesenhahn 
[EMAIL PROTECTED] wrote:

 On Thu, 3 Jul 2008, Don Enrique wrote:
 
  This means that I potentially could lose 40TB+ of data if three
  disks within the same RAIDZ-2 vdev should die before the resilvering
  of at least one disk is complete. Since most disks will be filled I
  do expect rather long resilvering times.

 Yes, this risk always exists.  The probability of three disks
 independently dying during the resilver is exceedingly low. The chance
 that your facility will be hit by an airplane during resilver is
 likely higher.  However, it is true that RAIDZ-2 does not offer the
 same ease of control over physical redundancy that mirroring does.
 If you were to use 10 independent chassis and split the RAIDZ-2
 uniformly across the chassis then the probability of a similar
 calamity impacting the same drives is driven by rack or facility-wide
 factors (e.g. building burning down) rather than shelf factors.
 However, if you had 10 RAID arrays mounted in the same rack and the
 rack falls over on its side during resilver then hope is still lost.
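
 (A sketch of what splitting RAIDZ-2 uniformly across chassis could look like - device names are hypothetical, with the controller number standing in for the enclosure:)

  # One disk from each of ten enclosures per raidz2 vdev, so losing an
  # entire enclosure costs each vdev only a single member - well within
  # raidz2's two-disk tolerance.
  zpool create tank \
    raidz2 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 c8t0d0 c9t0d0 c10t0d0 \
    raidz2 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 c8t1d0 c9t1d0 c10t1d0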

 I am not seeing any options for you here.  ZFS RAIDZ-2 is about as
 good as it gets and if you want everything in one huge pool, there
 will be more risk.  Perhaps there is a virtual filesystem layer which
 can be used on top of ZFS which emulates a larger filesystem but
 refuses to split files across pools.

 In the future it would be useful for ZFS to provide the option to not
 load-share across huge VDEVs and use VDEV-level space allocators.

 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer, http://www.GraphicsMagick.org/





-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes

-- 
Med venlig hilsen / Best Regards

Henrik Johansen
[EMAIL PROTECTED]


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large zpool design considerations

2008-07-04 Thread Marc Bevand
Chris Cosby ccosby+zfs at gmail.com writes:
 
 
 You're backing up 40TB+ of data, increasing at 20-25% per year.
 That's insane.

Over time, backing up his data will require _fewer_ and fewer disks:
disk capacities increase by about 40% every year, while his data set
grows by only 20-25%.
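
(Roughly, taking the 20-25% data growth quoted above against ~40% yearly capacity growth - a sketch only:)

  # Sketch: 40 TB of data on 1 TB drives today; data +25%/yr, capacity +40%/yr
  awk 'BEGIN { data = 40; disk = 1;
               for (y = 0; y <= 4; y++) {
                 printf "year %d: %.1f TB on %.2f TB drives -> ~%.0f drives\n",
                        y, data, disk, data / disk;
                 data *= 1.25; disk *= 1.40 } }'
  # the drive count drifts down from 40 toward ~25 over four years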

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] J4200/J4400 Array

2008-07-04 Thread Mertol Ozyoney
In fact, using NVRAM in a JBOD is less safe, since most JBODs that use
NVRAM have only a single NVRAM module that is not mirrored. If that NVRAM
goes bad, you are guaranteed inconsistency. ZFS, on the other hand, is
fine-tuned in every layer for all-or-nothing commits, so it has the
internal mechanisms to remain consistent at the time of a device failure.
If you put a device between the storage and ZFS that ZFS cannot control,
that device should be redundant and should be able to guarantee
consistency, and JBOD NVRAM modules are very problematic in that respect.
I have a customer with 80 TB on a Lustre system that locked up because of
a battery problem, and it took them a week to figure out what went wrong.

Mertol 





Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Albert Chin
Sent: Thursday, July 03, 2008 8:17 PM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] J4200/J4400 Array

On Thu, Jul 03, 2008 at 01:43:36PM +0300, Mertol Ozyoney wrote:
 You are right that the J series do not have NVRAM onboard. However, most
 JBODs, like HP's MSA series, have some NVRAM.
 The ideas behind not using NVRAM in the JBODs are:

 -) There is no point in adding a limited amount of RAM to a JBOD, as the
    disks already have a lot of cache.
 -) It's easy to design a redundant JBOD without NVRAM. If you have NVRAM
    and need redundancy, you need to design more complex hardware and more
    complex firmware.
 -) Batteries are the first thing to fail.
 -) Servers already have plenty of RAM.

Well, if the server attached to the J series is doing ZFS/NFS,
performance will increase with zfs:zfs_nocacheflush=1. But without
battery-backed NVRAM, this really isn't safe. So, for this use case,
unless the server has battery-backed NVRAM, I don't see how the J series
is a good fit for ZFS/NFS.
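
(For reference, that tunable is normally set in /etc/system, along these lines - and, as noted, only sensible when every write cache in the path is battery-backed:)

  # /etc/system - stop ZFS from issuing cache-flush commands (reboot required).
  # Only safe if all write caches in the storage path are non-volatile.
  set zfs:zfs_nocacheflush = 1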

-- 
albert chin ([EMAIL PROTECTED])

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS in S10U6 vs openSolaris 05/08

2008-07-04 Thread Ross
Yup, I'm watching that card closely.  No Solaris drivers yet, but hopefully 
somebody will realise just how good that could be for the ZIL and work on some.

Just the 80GB $2,400 card would make a huge difference to write performance.  
For use with VMware and NFS it would be a godsend.
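
(If drivers do appear, pointing the ZIL at such a card is just a separate log vdev - the pool and device names below are hypothetical:)

  # Dedicate the flash card (seen here as c6t0d0) as a separate ZIL device;
  # a mirrored pair would be safer still.
  zpool add tank log c6t0d0
  # or, mirrored:
  zpool add tank log mirror c6t0d0 c7t0d0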
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss