[zfs-discuss] zpool errors - ZFS-8000-8A

2008-11-17 Thread Chris Cosby
One of my zpools (sol-10-u4-ga-sparc) has experienced some permanent errors.
At this point, I don't care about the contents of the files in question; I
merely want to clean up the zpool. zpool clear doesn't seem to do anything at
all. Any suggestions?

# zpool status -v ccm01
  pool: ccm01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ccm01       ONLINE       0     0     0
          c2t40d0   ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

ccm01/users/ccm_sol:<0x1dabe88>
ccm01/users/ccm_sol:<0x1dabe8a>
ccm01/users/ccm_sol:<0x1dabe8d>
ccm01/users/ccm_sol:<0x1dabe8e>
ccm01/users/ccm_sol:<0x1dabe90>
ccm01/users/ccm_sol:<0x1dabe91>
ccm01/users/ccm_sol:<0x1dabe95>
ccm01/users/ccm_sol:<0x1dabe97>
ccm01/users/ccm_sol:<0x1dabe98>
ccm01/users/ccm_sol:<0x1dabe99>
ccm01/users/ccm_sol:<0x1dabe9b>
ccm01/users/ccm_sol:<0x1dabe9c>
ccm01/export/ccmdb/development_db:<0x12948d>
ccm01/export/ccmdb/development_db:<0x129617>
ccm01/export/ccmdb/development_db:<0x12961b>
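
For the record, the sequence I'm planning to try next, assuming the affected
files/objects are already gone. My understanding is that the permanent error
list is only refreshed after a completed scrub, which would explain why zpool
clear by itself appears to do nothing:

# zpool scrub ccm01
# zpool status -v ccm01     (repeat until the scrub finishes)
# zpool clear ccm01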


-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] Best layout for 15 disks?

2008-08-22 Thread Chris Cosby
On Fri, Aug 22, 2008 at 1:08 PM, mike <[EMAIL PROTECTED]> wrote:

> It looks like this will be the way I do it:
>
> initially:
> zpool create mypool raidz2 disk0 disk1 disk2 disk3 disk4 disk5 disk6 disk7
>
> when I need more space and buy 8 more disks:
> zpool add mypool raidz2 disk8 disk9 disk10 disk11 disk12 disk13 disk14
> disk15
>
> Correct?
>
>
> > Enable compression, and set up multiple raidz2 groups.  Depending on
> > what you're storing, you may get back more than you lose to parity.
>
> It's DVD backups and media files. Probably everything has already been
> compressed pretty well by the time it hits ZFS.
>
> > That's a lot of spindles for a home fileserver.   I'd be inclined to go
> > with a smaller number of larger disks in mirror pairs, allowing me to
> > buy larger disks in pairs as they come on the market to increase
> > capacity.
>
> Or do smaller groupings of raidz1's (like 3 disks) so I can remove
> them and put 1.5TB disks in when they come out for instance?

Somebody correct me if I'm wrong: early versions of ZFS did not support
removing vdevs from a pool; it was listed as a future feature. Is it done yet?


>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] Possible to do a stripe vdev?

2008-08-22 Thread Chris Cosby
About the best I can see:

zpool create dirtypool raidz 250a 250b 320a raidz 320b 400a 400b raidz 500a
500b 750a

And you have to do them in that order. Each raidz vdev is sized by its
smallest device. This gets you about 2140GB (500 + 640 + 1000) of space.
Your desired method only gets you 2880GB (720 * 4) and is WAY harder to set
up and maintain, especially if you get into the SDS configuration.

I, for one, welcome our convoluted configuration overlords. I'd also like to
see what the zpool looks like if it works. This is, obviously, untested.
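
Here's roughly what I'd expect zpool status to show if it does work - a
sketch only, using the shorthand device names from above, and the exact
labels may differ by release:

  pool: dirtypool
 state: ONLINE
config:

        NAME         STATE     READ WRITE CKSUM
        dirtypool    ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            250a     ONLINE       0     0     0
            250b     ONLINE       0     0     0
            320a     ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            320b     ONLINE       0     0     0
            400a     ONLINE       0     0     0
            400b     ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            500a     ONLINE       0     0     0
            500b     ONLINE       0     0     0
            750a     ONLINE       0     0     0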

chris


On Fri, Aug 22, 2008 at 11:03 AM, Nils Goroll <[EMAIL PROTECTED]>wrote:

> Hi,
>
> John wrote:
> > I'm setting up a ZFS fileserver using a bunch of spare drives. I'd like
> some redundancy and to maximize disk usage, so my plan was to use raid-z.
> The problem is that the drives are considerably mismatched and I haven't
> found documentation (though I don't see why it shouldn't be possible) to
> stripe smaller drives together to match bigger ones. The drives are: 1x750,
> 2x500, 2x400, 2x320, 2x250. Is it possible to accomplish the following with
> those drives:
> >
> > raid-z
> >750
> >500+250=750
> >500+250=750
> >400+320=720
> >400+720=720
>
>
> Though I've never used this in production, it seems possible to layer ZFS
> on good old SDS (aka SVM, disksuite).
>
> At least I managed to create a trivial pool on
> what-10-mins-ago-was-my-swap-slice:
>
> haggis:/var/tmp# metadb -f -a -c 3 /dev/dsk/c5t0d0s7
> haggis:/var/tmp# metainit d10 1 1 /dev/dsk/c5t0d0s1
> d10: Concat/Stripe is setup
> haggis:/var/tmp# zpool create test /dev/md/dsk/d10
> haggis:/var/tmp# zpool status test
>  pool: test
>  state: ONLINE
>  scrub: none requested
> config:
>
>NAME   STATE READ WRITE CKSUM
>test   ONLINE   0 0 0
>  /dev/md/dsk/d10  ONLINE   0 0 0
>
> So it looks like you could do the follwing:
>
> * Put a small slice (10-20m should suffice, by convention it's slice 7 on
> the first cylinders) on each of your disks and make them the metadb, if you
> are not using SDS already
>  metadb -f -a -c 3 
>
>  make slice 0 the remainder of each disk
>
> * for your 500/250G drives, create a concat (stripe not possible) for each
> pair. for clarity, I'd recommend to include the 750G disk as well (syntax
> from memory, apologies if I'm wrong with details):
>
> metainit d11 1 1 <700G disk>s0
> metainit d12 2 1 <500G disk>s0 1 <250G disk>s0
> metainit d13 2 1 <500G disk>s0 1 <250G disk>s0
> metainit d14 2 1 <400G disk>s0 1 <320G disk>s0
> metainit d15 2 1 <400G disk>s0 1 <320G disk>s0
>
> * create a raidz pool on your metadevices
>
> zpool create  raidz /dev/md/dsk/d11 /dev/md/dsk/d12 /dev/md/dsk/d13
> /dev/md/dsk/d14 /dev/md/dsk/d15
>
> Again: I have never tried this, so please don't blame me if this doesn't
> work.
>
> Nils
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-14 Thread Chris Cosby
To further clarify Will's point...

Your current setup provides excellent hardware protection, but absolutely no
data protection. ZFS provides excellent data protection when it has multiple
copies of the data blocks, i.e. more than one device to work with.

Combine the two - give ZFS more than one device - and you have a really nice
solution. If you can spare the space, set up your arrays to present exactly
two identical LUNs to your ZFS box and create your zpool with those in a
mirror. The best of all worlds.
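
In zpool terms that's a one-liner - something like the following, where the
device names are just placeholders for however your two LUNs show up:

# zpool create nicepool mirror c2t0d0 c3t0d0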


On Thu, Aug 14, 2008 at 9:41 AM, Will Murnane <[EMAIL PROTECTED]>wrote:

> On Thu, Aug 14, 2008 at 07:42, Borys Saulyak <[EMAIL PROTECTED]>
> wrote:
> > I've got, lets say, 10 disks in the storage. They are currently in RAID5
> configuration and given to my box as one LUN. You suggest to create 10 LUNs
> instead, and give them to ZFS, where they will be part of one raidz, right?
> > So what sort of protection will I gain by that? What kind of failure will
> be eliminated? Sorry, but I cannot catch it...
> Suppose that ZFS detects an error in the first case.  It can't tell
> the storage array "something's wrong, please fix it" (since the
> storage array doesn't provide for this with checksums and intelligent
> recovery), so all it can do is tell the user "this file is corrupt,
> recover it from backups".
>
> In the second case, ZFS can use the parity or mirrored data to
> reconstruct plausible blocks, and then see if they match the checksum.
>  Once it finds one that matches (which will happen as long as
> sufficient parity remains), it can write the corrected data back to
> the disk that had junk on it, and report to the user "there were
> problems over here, but I fixed them".
>
> Will
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] questions about ZFS Send/Receive

2008-07-29 Thread Chris Cosby
Obviously, I should stop answering, as all I deal with and all that I will
deal with is GA Solaris. OpenSolaris might as well not exist as far as I'm
concerned. With that in mind, I'll just keep reading and appreciating all of
the good zfs info that comes along.

Peace out.

On Tue, Jul 29, 2008 at 5:54 PM, eric kustarz <[EMAIL PROTECTED]> wrote:

>
> On Jul 29, 2008, at 2:24 PM, Chris Cosby wrote:
>
>
>>
>> On Tue, Jul 29, 2008 at 5:13 PM, Stefano Pini <[EMAIL PROTECTED]>
>> wrote:
>> Hi guys,
>> we are  proposing  a customer  a couple of X4500 (24 Tb) used as NAS (i.e.
>> NFS server).
>> Both server will contain the same files and should be accessed by
>> different clients at the same time (i.e. they should be both active)
>> So we need to guarantee that both x4500 contain the same files:
>> We could simply copy the contents on both x4500 , which is an option
>> because the "new files" are in a limited number and rate , but we would
>> really like to use ZFS send & receive commands:
>> If they are truly limited, something like an rsync or similar. There was a
>> script being thrown around a while back that was touted as the Best Backup
>> Script That Doesn't Do Backups, but I can't find it. In essence, it just
>> created a list of what changed since the last backup and allowed you to use
>> tar/cpio/cp - whatever to do the backup.
>>
>
> I think zfs send/recv would be a great way to go here - see below.
>
>
>>
>>
>>
>> AFAIK the commands works fine but  generally speaking are there any known
>> limitations ?
>> And, in detail , it is not clear  if the receiving ZFS file system could
>> be used regularly while it is in "receiving mode":
>> in poor words is it possible to read and export in nfs,   files from a
>>  ZFS file system while it is receiving update from  another  ZFS send ?
>> First, the zfs send works only on a snapshot. -i sends incremental
>> snapshots, so you would think that would work. From the zfs man page, you'll
>> see that during a receive, the destination file system is unmounted and
>> cannot be accessed during the receive.
>>
>>  If an incremental stream is received, then the  destina-
>> tion file system must already exist, and its most recent
>> snapshot must match the incremental stream's source. The
>> destination  file  system and all of its child file sys-
>> tems are unmounted and cannot  be  accessed  during  the
>> receive operation.
>>
>
> Actually we don't unmount the file systems anymore for incremental
> send/recv, see:
> 6425096 want online 'zfs recv' (read only and read/write)
>
> Available since November 2007 in OpenSolaris/Nevada.  Coming to a s10u6
> near you.
>
> eric
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] questions about ZFS Send/Receive

2008-07-29 Thread Chris Cosby
On Tue, Jul 29, 2008 at 5:13 PM, Stefano Pini <[EMAIL PROTECTED]> wrote:

> Hi guys,
> we are  proposing  a customer  a couple of X4500 (24 Tb) used as NAS (i.e.
> NFS server).
> Both server will contain the same files and should be accessed by different
> clients at the same time (i.e. they should be both active)
> So we need to guarantee that both x4500 contain the same files:
> We could simply copy the contents on both x4500 , which is an option
> because the "new files" are in a limited number and rate , but we would
> really like to use ZFS send & receive commands:

If the new files are truly limited in number, something like rsync would do
the job. There was a script being thrown around a while back that was touted
as the Best Backup Script That Doesn't Do Backups, but I can't find it. In
essence, it just created a list of what changed since the last backup and let
you use tar/cpio/cp - whatever - to do the actual copy.
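
If you go the rsync route, a nightly cron job along these lines would
probably be enough (paths and hostname invented, adjust to taste):

# rsync -a --delete /export/files/ thumper2:/export/files/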


>
>
> AFAIK the commands works fine but  generally speaking are there any known
> limitations ?
> And, in detail , it is not clear  if the receiving ZFS file system could be
> used regularly while it is in "receiving mode":
> in poor words is it possible to read and export in nfs,   files from a  ZFS
> file system while it is receiving update from  another  ZFS send ?

First, zfs send works only on snapshots; -i sends an incremental stream
between two snapshots, so you would think that would work. From the zfs man
page, though, you'll see that the destination file system is unmounted and
cannot be accessed during a receive.

  If an incremental stream is received, then the  destina-
 tion file system must already exist, and its most recent
 snapshot must match the incremental stream's source. The
 destination  file  system and all of its child file sys-
 tems are unmounted and cannot  be  accessed  during  the
 receive operation.
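
For completeness, the send/receive cycle you'd be running looks roughly like
this - pool, filesystem, host, and snapshot names are just examples:

# zfs snapshot tank/data@monday
# zfs send tank/data@monday | ssh x4500-b zfs recv tank/data
  ... later, after more files arrive ...
# zfs snapshot tank/data@tuesday
# zfs send -i monday tank/data@tuesday | ssh x4500-b zfs recv -F tank/data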


>
> Clearly  until the new updates are received and applied the old copy would
> be used
>
> TIA
> Stefano
>
>
>
> Sun Microsystems Spa
> Viale Fulvio testi 327
> 20162 Milano ITALY
> me *STEFANO PINI*
> Senior Technical Specialist at Sun Microsystems Italy <
> http://www.sun.com/italy>
> contact | [EMAIL PROTECTED]  | +39 02
> 64152150
>
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>


-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Chris Cosby
On Tue, Jul 22, 2008 at 11:19 AM, <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:
>
> > To do dedup properly, it seems like there would have to be some
> > overly complicated methodology for a sort of delayed dedup of the
> > data. For speed, you'd want your writes to go straight into the
> > cache and get flushed out as quickly as possibly, keep everything as
> > ACID as possible. Then, a dedup scrubber would take what was
> > written, do the voodoo magic of checksumming the new data, scanning
> > the tree to see if there are any matches, locking the duplicates,
> > run the usage counters up or down for that block of data, swapping
> > out inodes, and marking the duplicate data as free space.
> I agree,  but what you are describing is file based dedup,  ZFS already has
> the groundwork for dedup in the system (block level checksuming and
> pointers).
>
> > It's a
> > lofty goal, but one that is doable. I guess this is only necessary
> > if deduplication is done at the file level. If done at the block
> > level, it could possibly be done on the fly, what with the already
> > implemented checksumming at the block level,
>
> exactly -- that is why it is attractive for ZFS,  so much of the groundwork
> is done and needed for the fs/pool already.
>
> > but then your reads
> > will suffer because pieces of files can potentially be spread all
> > over hell and half of Georgia on the zdevs.
>
> I don't know that you can make this statement without some study of an
> actual implementation on real world data -- and then because it is block
> based,  you should see varying degrees of this dedup-flack-frag depending
> on data/usage.

It's just a NonScientificWAG. I agree that in most cases the duplicated
blocks will be part of identical files anyway, and thus lined up exactly as
you'd want them. I was just free thinking and typing.


>
>
> For instance,  I would imagine that in many scenarios much od the dedup
> data blocks would belong to the same or very similar files. In this case
> the blocks were written as best they could on the first write,  the deduped
> blocks would point to a pretty sequential line o blocks.  Now on some files
> there may be duplicate header or similar portions of data -- these may
> cause you to jump around the disk; but I do not know how much this would be
> hit or impact real world usage.
>
>
> > Deduplication is going
> > to require the judicious application of hallucinogens and man hours.
> > I expect that someone is up to the task.
>
> I would prefer the coder(s) not be seeing "pink elephants" while writing
> this,  but yes it can and will be done.  It (I believe) will be easier
> after the grow/shrink/evac code paths are in place though. Also,  the
> grow/shrink/evac path allows (if it is done right) for other cool things
> like a base to build a roaming defrag that takes into account snaps,
> clones, live and the like.  I know that some feel that the grow/shrink/evac
> code is more important for home users,  but I think that it is super
> important for most of these additional features.

The elephants are just there to keep the coders company. There are tons of
benefits for dedup, both for home and non-home users. I'm happy that it's
going to be done. I expect the first complaints will come from people who
don't understand it, when their df and du numbers look different from their
zpool status ones. Perhaps df/du will just have to be faked out for those
folks, or we can apply the same hallucinogens to them instead.


>
>
> -Wade
>
> > On Tue, Jul 22, 2008 at 10:39 AM, <[EMAIL PROTECTED]> wrote:
> > [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:
> >
> > > > Hi All
> > > >Is there any hope for deduplication on ZFS ?
> > > >Mertol Ozyoney
> > > >Storage Practice - Sales Manager
> > > >Sun Microsystems
> > > > Email [EMAIL PROTECTED]
> > >
> > > There is always hope.
> > >
> > > Seriously thought, looking at http://en.wikipedia.
> > > org/wiki/Comparison_of_revision_control_software there are a lot of
> > > choices of how we could implement this.
> > >
> > > SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge
> > > one of those with ZFS.
> > >
> > > It _could_ be as simple (with SVN as an example) of using directory
> > > listings to produce files which were then 'diffed'. You could then
> > > view the diffs as though they were changes made to lines of source
> code.
> > >
> > > Just add a "tree" subroutine to allow you to grab all the diffs that
> > > referenced changes to file 'xyz' and you would have easy access to
> > > all the changes of a particular file (or directory).
> > >
> > > With the speed optimized ability added to use ZFS snapshots with the
> > > "tree subroutine" to rollback a single file (or directory) you could
> > > undo / redo your way through the filesystem.
> > >
> >
>
> > dedup is not revision control,  you seem to completely misunderstand the
> > problem.
> >
> >
> >
> > > Using a LKCD
> (http://

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Chris Cosby
To do dedup properly, it seems like there would have to be some overly
complicated methodology for a sort of delayed dedup of the data. For speed,
you'd want your writes to go straight into the cache and get flushed out as
quickly as possible, keeping everything as ACID as possible. Then, a dedup
scrubber would take what was written, do the voodoo magic of checksumming
the new data, scanning the tree to see if there are any matches, locking the
duplicates, running the usage counters up or down for that block of data,
swapping out inodes, and marking the duplicate data as free space. It's a
lofty goal, but one that is doable. I guess this is only necessary if
deduplication is done at the file level. If done at the block level, it
could possibly be done on the fly, what with the already implemented
checksumming at the block level, but then your reads will suffer because
pieces of files can potentially be spread all over hell and half of Georgia
on the vdevs. Deduplication is going to require the judicious application of
hallucinogens and man hours. I expect that someone is up to the task.

On Tue, Jul 22, 2008 at 10:39 AM, <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:
>
> > > Hi All
> > >Is there any hope for deduplication on ZFS ?
> > >Mertol Ozyoney
> > >Storage Practice - Sales Manager
> > >Sun Microsystems
> > > Email [EMAIL PROTECTED]
> >
> > There is always hope.
> >
> > Seriously thought, looking at http://en.wikipedia.
> > org/wiki/Comparison_of_revision_control_software there are a lot of
> > choices of how we could implement this.
> >
> > SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge
> > one of those with ZFS.
> >
> > It _could_ be as simple (with SVN as an example) of using directory
> > listings to produce files which were then 'diffed'. You could then
> > view the diffs as though they were changes made to lines of source code.
> >
> > Just add a "tree" subroutine to allow you to grab all the diffs that
> > referenced changes to file 'xyz' and you would have easy access to
> > all the changes of a particular file (or directory).
> >
> > With the speed optimized ability added to use ZFS snapshots with the
> > "tree subroutine" to rollback a single file (or directory) you could
> > undo / redo your way through the filesystem.
> >
>
>
> dedup is not revision control,  you seem to completely misunderstand the
> problem.
>
>
>
> > Using a LKCD (
> http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html
> > ) you could "sit out" on the play and watch from the sidelines --
> > returning to the OS when you thought you were 'safe' (and if not,
> > jumping backout).
> >
>
> Now it seems you have veered even further off course.  What are you
> implying the LKCD has to do with zfs, solaris, dedup, let alone revision
> control software?
>
> -Wade
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] zfs patches in latest sol10 u2 patch bundle

2008-07-16 Thread Chris Cosby
S10U2 has zfs version=1. Any patches are just bug fixes (I'm not sure if
there are any). If your intention is to get to a newer, later, greater ZFS,
you'll need to upgrade. S10U5 has, for example, version=4. Differences in
the versions of zfs can be found at
http://opensolaris.org/os/community/zfs/version/4/ (change the 4 to any
number 1-11 for details).
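
Once you're on the newer update, checking where you stand and moving a pool
forward is quick - the pool name here is just an example:

# zpool upgrade -v       (lists the versions the running release supports)
# zpool upgrade mypool   (upgrades an existing pool to the current version)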

On Wed, Jul 16, 2008 at 9:55 AM, Manyam <[EMAIL PROTECTED]> wrote:

> Hi ZFS gurus  --  I have a v240 with solaris10 u2 release  and ZFs - could
> you please tell me if by applying the latest patch bundle of update 2 -- I
> will get the all the ZFS patches installed as well ?
>
> Thanks much for your support
>
> ~Balu
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


[zfs-discuss] Status of ZFS on Solaris 10

2008-07-07 Thread Chris Cosby
We're running Solaris 10 U5 on lots of Sun SPARC hardware. That's ZFS
version=4. Simple question: how far behind is this version of ZFS as
compared to what is in Nevada? Just point me to the web page, I know it's
out there somewhere.

-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] Large zpool design considerations

2008-07-03 Thread Chris Cosby
I'm going down a bit of a different path with my reply here. I know that all
shops and their need for data are different, but hear me out.

1) You're backing up 40TB+ of data, increasing at 20-25% per year. That's
insane. Perhaps it's time to look at your backup strategy not from a hardware
perspective, but from a data retention perspective. Do you really need that
much data backed up? There has to be some way to get the volume down. If
not, you're at 100TB in just slightly over 4 years (assuming the 25% growth
factor). If your data is critical, my recommendation is to go find another
job and let someone else have that headache.

2) 40TB of backups is, at the best possible price, fifty 1TB drives (allowing
for spares and such) - about $12,500 for raw drive hardware. Enclosures add
some money, as do cables and such. For mirroring, ninety 1TB drives is about
$22,500 for the raw drives. In my world, and I know yours is different, the
difference between a $100,000 solution and a $75,000 solution is pretty
negligible. The short description here: you can afford to do mirrors. Really,
you can. Any of the parity solutions out there, no matter what your strategy,
is going to cause you more trouble than you're ready to deal with.

I know these aren't solutions for you, it's just the stuff that was in my
head. The best possible solution, if you really need this kind of volume, is
to create something that never has to resilver. Use some nifty combination
of hardware and ZFS, like a couple of somethings that have 20TB per container
exported as a single volume, and mirror those with ZFS for its end-to-end
checksumming and ease of management.
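
In zpool terms that ends up being nothing more exotic than something like
this, with each 20TB container presented as a single LUN (device names are
placeholders):

# zpool create backuppool mirror c4t0d0 c5t0d0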

That's my considerably more than $0.02

On Thu, Jul 3, 2008 at 11:56 AM, Bob Friesenhahn <
[EMAIL PROTECTED]> wrote:

> On Thu, 3 Jul 2008, Don Enrique wrote:
> >
> > This means that i potentially could loose 40TB+ of data if three
> > disks within the same RAIDZ-2 vdev should die before the resilvering
> > of at least one disk is complete. Since most disks will be filled i
> > do expect rather long resilvering times.
>
> Yes, this risk always exists.  The probability of three disks
> independently dying during the resilver is exceedingly low. The chance
> that your facility will be hit by an airplane during resilver is
> likely higher.  However, it is true that RAIDZ-2 does not offer the
> same ease of control over physical redundancy that mirroring does.
> If you were to use 10 independent chassis and split the RAIDZ-2
> uniformly across the chassis then the probability of a similar
> calamity impacting the same drives is driven by rack or facility-wide
> factors (e.g. building burning down) rather than shelf factors.
> However, if you had 10 RAID arrays mounted in the same rack and the
> rack falls over on its side during resilver then hope is still lost.
>
> I am not seeing any options for you here.  ZFS RAIDZ-2 is about as
> good as it gets and if you want everything in one huge pool, there
> will be more risk.  Perhaps there is a virtual filesystem layer which
> can be used on top of ZFS which emulates a larger filesystem but
> refuses to split files across pools.
>
> In the future it would be useful for ZFS to provide the option to not
> load-share across huge VDEVs and use VDEV-level space allocators.
>
> Bob
> ==
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


[zfs-discuss] Fwd: [cifs-discuss] sharesmb on multiple fs

2008-06-25 Thread Chris Cosby
On Wed, Jun 25, 2008 at 11:42 PM, Matt Harrison <
[EMAIL PROTECTED]> wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Afshin Salek wrote:
> | Your terminology is a bit confusing for me, so:
>
> Sorry i should have worded this much better,
>
> | you have 1 pool (zpool create)
> | you have a FS called public (zfs create?)
> |
> | what do you mean by "keep on separate zfs's"? You
> | mean ZFS snapshot?
>
> Ok, I'll start again:
>
> I have a pool "zpool create tank [...]"
>
> Then I made a zfs "zfs create [...] tank/public"
>
> Now I want to keep the sections of public separate, i.e on individual zfs.
>
> So I do "zfs create [...] tank/public/audio"
>
> The problem is that if public is shared via smb, the user is unable to
> access audio. It seems that if a zfs is shared, the child zfs' are not
> accessible as it would if they were just subdirectories.
>
You're absolutely correct - and it's because the "child zfs'", we'll call
them "filesystems" since that's what they are, are not directories. The OS
treats mount points differently than it does simple directories. Think of
each of those ZFS entities just as you would if they weren't mounted under
the same parent; i.e. if you had /home/floyd and /usr/local/pink, you
wouldn't try to set up an SMB share under / and descend all the way down.
The way you describe it, you want a different share for each of the children
anyway, so just do it. Oh, and this isn't a ZFS limitation, it's an
architecture limitation. i.e. You're Doing It Wrong (TM).
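
In practice that just means sharing each child directly - something like the
following, syntax from memory, using the dataset names from your example:

# zfs set sharesmb=name=public tank/public
# zfs set sharesmb=name=audio tank/public/audio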

>
> So I can do "cd /tank/public; mkdir audio" which gives users access to
> public/audio via the public share, but it doesn't allow detailed
> management of audio as it would with individual zfs'.
>
> I hope this is a better explanation,
>
> Thanks
>
> - --
> Matt Harrison
> [EMAIL PROTECTED]
> http://mattharrison.org
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.9 (MingW32)
>
> iEYEARECAAYFAkhjEBAACgkQxNZfa+YAUWFSfwCfQxvONHtrqsf5F2FcUNYIRA8L
> SDYAoL2vFdRx0WNN5wn7jnBY1ddIYod+
> =zKm1
> -END PGP SIGNATURE-
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>


-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Chris Cosby
On Mon, Jun 23, 2008 at 8:45 PM, Richard Elling <[EMAIL PROTECTED]>
wrote:

> Brian Hechinger wrote:
> > On Mon, Jun 23, 2008 at 11:18:21AM -0600, Lori Alt wrote:
> >
> >> Sorry it's taken me so long to weigh in on this.
> >>
> >
> > You're busy with important things, we'll forgive you. ;)
> >
> >
> >> With zfs, we don't actually have to put /var in its own
> >> slice.  We can achieve the same goal by putting it
> >> in its own dataset and assigning a quota to that dataset.
> >>
> >> That's really the only reason we offered this option.
> >>
> >
> > And thank you for doing so.  I will always put /var in it's own "area"
> > even if the definition of that area has changed with the use of ZFS.
> >
> > Rampant writes to /var can *still* run / out of space even on ZFS, being
> > able to keep that from happening is never a bad idea as far as I'm
> > concerned. :)
> >
> >
>
> I think the ability to have different policies for file systems
> is pure goodness -- though you pay for it on the backup/
> restore side.
>
> A side question though, my friends who run Windows,
> Linux, or OSX don't seem to have this bias towards isolating
> /var.  Is this a purely Solaris phenomenon?  If so, how do we
> fix it?

I don't think it's a Solaris phenomenon, and it's not really a /var thing.
UNIX heads have always had to contend with the disaster that is a full /
filesystem. /var was always the most common culprit for causing it to run
out of space. If you talk to the really paranoid among us, we run a
read-only root filesystem. The real way to "fix" it, in zfs terms, is to
reserve a minimum amount of space in / - thereby guaranteeing that you don't
fill up your root filesystem.
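
With a ZFS root that's a couple of one-liners - dataset names and sizes
obviously depend on your layout, so treat this as a sketch:

# zfs set reservation=2G rpool/ROOT/snv_90
# zfs set quota=8G rpool/ROOT/snv_90/var    (optionally cap /var as well)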

>
>  -- richard
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Chris Cosby
From my experience, the first question you should ask your customer is how
much of a performance hit they can tolerate when switching to ZFS for Oracle.
I've done lots of tweaking (following threads I've read on the mailing list),
but I still can't seem to get enough performance out of any database on ZFS.
I've tried zvols, cooked files on top of ZFS filesystems, everything, but
either raw disk devices via the old-style DiskSuite tools or cooked files on
top of the same are far more performant than anything on ZFS. Your mileage
may vary, but so far, that's where I stand.
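
For what it's worth, the usual starting points I've seen suggested (no
guarantees, and the dataset name is just an example) are to match the
recordsize to the database block size before loading any data, and to turn
atime off:

# zfs create tank/oradata
# zfs set recordsize=8k tank/oradata    (match db_block_size)
# zfs set atime=off tank/oradata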

As for the corrupted filesystem, ZFS is much better, but there are still no
guarantees that your filesystem won't be corrupted during a hard shutdown.
The CoW and checksumming give you a much lower incidence of corruption, but
the customer still needs to be made aware that things like battery-backed
controllers, managed UPSes, redundant power supplies, and the like are the
first things they need to put into place - not the last.

On Mon, Jun 23, 2008 at 11:56 AM, Mertol Ozyoney <[EMAIL PROTECTED]>
wrote:

>  Hi All ;
>
>
>
> One of our customer is suffered from FS being corrupted after an unattanded
> shutdonw due to power problem.
>
> They want to switch to ZFS.
>
>
>
> From what I read on, ZFS will most probably not be corrupted from the same
> event. But I am not sure how will Oracle be affected from a sudden power
> outage when placed over ZFS ?
>
>
>
> Any comments ?
>
>
>
> PS: I am aware of UPS's and smilar technologies but customer is still
> asking those if ... questions ...
>
>
>
> Mertol
>
>
>
>
>
>
>
>
> *Mertol Ozyoney *
> Storage Practice - Sales Manager
>
> *Sun Microsystems, TR*
> Istanbul TR
> Phone +902123352200
> Mobile +905339310752
> Fax +90212335
> Email [EMAIL PROTECTED] <[EMAIL PROTECTED]>
>
>
>
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>


-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes