Re: [zfs-discuss] Planned ZFS-Features - Is there a List or something else

2009-12-09 Thread Dale Ghent

What you're talking about is a side-benefit of the BP rewrite section of the 
linked slides.

I believe that once BP rewrite is fully baked, we'll soon afterwards see a 
device removal feature arrive.

/dale

On Dec 9, 2009, at 3:46 PM, R.G. Keen wrote:

 I didn't see "remove a simple device" anywhere in there.
 
 Is it:
 too hard to even contemplate doing, 
 or
 too silly a thing to do to even consider letting that happen
 or 
 too stupid a question to even consider
 or
 too easy and straightforward to do the procedure I see recommended (export 
 the whole pool, destroy the pool, remove the device, remake the pool, then 
 reimport the pool) to even bother with?


Re: [zfs-discuss] zfs/io performance on Netra X1

2009-11-14 Thread Dale Ghent


There is also a long-standing bug in the ALi chipset used on these servers 
which ZFS tickles. I don't think a work-around for this bug was ever 
implemented, and it's still present in Solaris 10.

On Nov 13, 2009, at 11:29 AM, Richard Elling wrote:

 The Netra X1 has one ATA bus for both internal drives.
 No way to get high perf out of a snail.
 
  -- richard
 
 
 
 On Nov 13, 2009, at 8:08 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:
 
 On Fri, 13 Nov 2009, Tim Cook wrote:
 If it is using parallel SCSI, perhaps there is a problem with the SCSI bus 
 termination or a bad cable?
 SCSI?  Try PATA ;)
 
 Is that good?  I don't recall ever selecting that option when purchasing a 
 computer.  It seemed safer to stick with SCSI than to try exotic 
 technologies.
 
 Does PATA daisy-chain disks onto the same cable and controller?
 
 If this PATA bus and its drives are becoming overwhelmed, maybe it will help to tune 
 zfs:zfs_vdev_max_pending down to a very small value in the kernel.
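 (In /etc/system terms that tuning is a single line; the value itself
 needs experimentation, and 2 here is purely illustrative:
 
 set zfs:zfs_vdev_max_pending = 2
 )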
 
 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


[zfs-discuss] Sniping a bad inode in zfs?

2009-10-27 Thread Dale Ghent

I have a single-fs, mirrored pool on my hands which recently went
through a bout of corruption. I've managed to clean up a good bit of
it, but it appears that I'm left with some directories which have bad
refcounts.

For example, I have what should be an empty directory, foo, which,
when you cd into it and run ls -al, shows an incorrect refcount for an
empty directory:

total 444
drwxr-xr-x   2 dalegusers  3 Aug 17 13:20 ./
drwx--x--x  64 dalegusers117 Aug 17 13:20 ../

Thus, attempts to remove this directory via rmdir fail with
"directory not empty", and rm -rf gacks with "File exists".

I can touch a new file in this dir and such, with the refcount
incrementing to 4, and removing it poses no problem, either, with the
refcount decrementing back to 3. However, 3 is the wrong number; it
should of course be only 2 (. and ..).

Normally on UFS I would just take the 'nuke it from orbit' route and
use clri to wipe the directory's inode. However, clri doesn't appear
to be ZFS-aware (there's not even a zfs analog of clri in
/usr/lib/fs/zfs), and I don't immediately see an option in zdb which
would help cure this.
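For what it's worth, the closest thing I can find to an inspection tool
is zdb itself; assuming the inode number ls -i reports is also the ZFS
object number (paths below are illustrative), something like

ls -di /pool/fs/foo
zdb -dddd pool/fs <object number from ls -di>

should at least dump the directory's znode and its link count, but it
offers no way to clear it.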

Any suggestions would be appreciated.

/dale


Re: [zfs-discuss] zfs code and fishworks fork

2009-10-27 Thread Dale Ghent

On Oct 27, 2009, at 2:00 PM, Bryan Cantrill wrote:




  I can agree that the software is the one that really has the added
  value, but in my opinion allowing a stack like Fishworks to run outside
  the Sun Unified Storage would lead to a lower price per unit (Fishworks
  license) but maybe increase revenue.

I'm afraid I don't see that argument at all; I think that the economics
that you're advocating would be more than undermined by the necessarily
higher costs of validating and supporting a broader range of hardware and
firmware...


(Just playing Devil's Advocate here)

There could be no economics at all. A basic warranty would be provided,
but running a standalone product is a wholly "on your own" proposition
once one ventures outside a very small hardware support matrix.


Perhaps Fishworks/AK would have an OpenSolaris edition - leave the bulk
of the actual hardware support up to a support infrastructure that's
already geared towards making wide ranges of hardware supportable -
OpenSolaris/Solaris, after all, does allow that.


Perhaps this could be a version of Fishworks that's not as integrated
with what you get on a SUS platform; if some of the Fishworks
functionality that depends on a precise hardware combo could be
reduced or generalized, perhaps it's worth consideration. Knowing the
little I do about what's going on under the hood of a SUS system, I
wouldn't expect the version of Fishworks used on the SUS systems to
have 100% parity with an unbundled Fishworks edition - but the core
features, by and large, would convey.


/dale


Re: [zfs-discuss] zfs code and fishworks fork

2009-10-27 Thread Dale Ghent


On Oct 27, 2009, at 2:58 PM, Bryan Cantrill wrote:




I can agree that the software is the one that really has the added
value, but in my opinion allowing a stack like Fishworks to run outside
the Sun Unified Storage would lead to a lower price per unit (Fishworks
license) but maybe increase revenue.


I'm afraid I don't see that argument at all; I think that the
economics
that you're advocating would be more than undermined by the
necessarily
higher costs of validating and supporting a broader range of
hardware and
firmware...


(Just playing Devil's Advocate here)

There could be no economics at all. A basic warranty would be  
provided

but running a standalone product is a wholly on your own proposition
once one ventures outside a very small hardware support matrix.

Perhaps Fishworks/AK would have an OpenSolaris edition - leave the bulk

of the actual hardware support up to a support infrastructure that's
already geared towards making wide ranges of hardware supportable -
OpenSolaris/Solaris, after all, does allow that.

Perhaps this could be a version of Fishworks that's not as integrated
with what you get on a SUS platform; if some of the Fishworks
functionality that depends on a precise hardware combo could be
reduced or generalized, perhaps it's worth consideration. Knowing the
little I do about what's going on under the hood of a SUS system, I
wouldn't expect the version of Fishworks used on the SUS systems to
have 100% parity with an unbundled Fishworks edition - but the core
features, by and large, would convey.


Why would we do this?  I'm all for zero-cost endeavors, but this isn't
zero-cost -- and I'm having a hard time seeing the business case here,
especially when we have so many paying customers for whom the business
case for our time and energy is crystal clear...


Hey, I was just offering food for thought from the technical end :)

Of course the cost in man hours to attain a reasonable, unbundled  
version would have to be justifiable. If that aspect isn't currently  
justifiable, then that's as far as the conversation needs to go.  
However, times change and one day demand could very well justify the  
business costs.


/dale


[zfs-discuss] s10u8: lots of fixes, any commentary?

2009-10-15 Thread Dale Ghent


So looking at the README for patch 14144[45]-09, there are a ton of ZFS
fixes and feature adds.


The big features are already described in the update 8 release docs,  
but would anyone in-the-know care to comment or point out any  
interesting CR fixes that might be substantial in the areas of  
stability or performance?


Thanks for what looks like a well-loaded KJP.

/dale


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Dale Ghent

On Sep 15, 2009, at 5:21 PM, Richard Elling wrote:



On Sep 15, 2009, at 1:03 PM, Dale Ghent wrote:


On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched  
state at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also  
be in Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in  
this CR without slowing down other prefetch patterns.  Some kstats  
have also been added to help improve the observability of ZFS file  
prefetching.


Awesome that the fix exists. I've been having a hell of a time with  
device-level prefetch on my iscsi clients causing tons of  
ultimately useless IO and have resorted to setting  
zfs_vdev_cache_max=1.


This only affects metadata. Wouldn't it be better to disable
prefetching for data?


Well, that's a surprise to me, but the zfs_vdev_cache_max=1 did  
provide relief.
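(In /etc/system form, for the record, that setting is simply:

set zfs:zfs_vdev_cache_max = 1
)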


Just a general description of my environment:

My setup consists of several s10uX iscsi clients which get LUNs from
pairs of thumpers. Each thumper pair exports identical LUNs to each
iscsi client, and the client in turn mirrors each LUN pair inside a  
local zpool. As more space is needed on a client, a new LUN is created  
on the pair of thumpers, exported to the iscsi client, which then  
picks it up and we add a new mirrored vdev to the client's existing  
zpool.


This is so we have data redundancy across chassis, so if one thumper
were to fail or need patching, etc., the iscsi clients just see one
side of their mirrors drop out.


The problem that we observed on the iscsi clients was that, when  
viewing things through 'zpool iostat -v', far more IO was being  
requested from the LUs than was being registered for the vdev those  
LUs were a member of.


Being that this was an iscsi setup with stock thumpers (no SSD ZIL or
L2ARC) serving the LUs, this apparent overhead caused far more
unnecessary disk IO on the thumpers, thus starving out IO for data
that was actually needed.


The working set is lots of small-ish files, entirely random IO.

If zfs_vdev_cache_max only affects metadata prefetches, which  
parameter affects data prefetches ?
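(My guess, and it is only a guess, is that file-level data prefetch is
governed by zfs_prefetch_disable, i.e. something like

set zfs:zfs_prefetch_disable = 1

but I'd welcome confirmation before leaning on that.)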


I have to admit that disabling device-level prefetching was a shot in  
the dark, but it did result in drastically reduced contention on the  
thumpers.


/dale





Question though... why is a bug fix that can be a watershed for
performance held back for so long? s10u9 won't be available for
at least 6 months from now, and with a huge environment, I try hard
not to live off of IDRs.


Am I the only one that thinks this is way too conservative? It's  
just maddening to know that a highly beneficial fix is out there,  
but its release is based on time rather than need. Sustaining  
really needs to be more proactive when it comes to this stuff.


/dale







Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Dale Ghent

On Sep 15, 2009, at 6:28 PM, Bob Friesenhahn wrote:


On Tue, 15 Sep 2009, Dale Ghent wrote:


Question though... why is a bug fix that can be a watershed for
performance held back for so long? s10u9 won't be available for
at least 6 months from now, and with a huge environment, I try hard
not to live off of IDRs.


As someone who currently faces kernel panics with recent U7+ kernel  
patches (on AMD64 and SPARC) related to PCI bus upset, I expect that  
Sun will take the time to make sure that the implementation is as  
good as it can be and is thoroughly tested before release.


Are you referring to the same testing that gained you this PCI panic
feature in s10u7?


Testing is a no-brainer, and I would expect that there already exists  
some level of assurance that a CR fix is correct at the point of  
putback.


But I've dealt with many bugs both very recently and long in the past  
where a fix has existed in nevada for months, even a year, before I  
got bit by the same bug in s10 and then had to go through the support  
channels to A) convince whomever I'm talking to that, yes, I'm hitting  
this bug, B) yes, there is a fix, and then C) pretty please can I have  
an IDR


Just this week I'm wrapping up testing of an IDR which addresses an
e1000g hardware erratum that was fixed in onnv earlier this year, in
February. For something that addresses a hardware issue on an Intel
chipset used on shipping Sun servers, one would think that Sustaining
would be on the ball and get that integrated ASAP. But the current
mode of operation appears to be "no CR, no backport", which leaves us
customers needlessly running into bugs and then begging for their
fixes... or hearing the dreaded "oh, that fix will be available two
updates from now". Not cool.


/dale





Re: [zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)

2009-05-01 Thread Dale Ghent


On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:



   On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:


On Thu, 30 Apr 2009, Wilkinson, Alex wrote:


I currently have a single 17TB MetaLUN that I am about to present to an
OpenSolaris initiator and it will obviously be ZFS. However, I am
constantly reading that presenting a JBOD and using ZFS to manage the
RAID is best practice? I'm not really sure why? And isn't that a waste
of a high-performing RAID array (EMC)?


The JBOD advantage is that then ZFS can schedule I/O for the disks
and there is less chance of an unrecoverable pool since ZFS is  
assured

to lay out redundant data on redundant hardware and ZFS uses more
robust error detection than the firmware on any array.  When using
mirrors there is considerable advantage since writes and reads can be
concurrent.

That said, your EMC hardware likely offers much nicer interfaces for
indicating and replacing bad disk drives.  With the ZFS JBOD approach
you have to back-track from what ZFS tells you (a Solaris device ID)
and figure out which physical drive is not behaving correctly.  EMC
tech support may not be very helpful if ZFS says there is something
wrong but the raid array says there is not. Sometimes there is value
with taking advantage of what you paid for.


So, shall I forget ZFS and use UFS ?


Not at all. Just export lots of LUNs from your EMC to get the IO  
scheduling win, not one giant one, and configure the zpool as a stripe.
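Roughly speaking, something like the following, where the device names
are placeholders for however your EMC LUNs show up on the host:

zpool create tank c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0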


/dale


Re: [zfs-discuss] Looking for new SATA/SAS HBA; JBOD is not always JBOD

2009-01-09 Thread Dale Ghent
On Jan 9, 2009, at 9:28 AM, Erik Trimble wrote:

 I'm pretty darned sure that the LSI 1068-based HBAs will do true
 JBOD.

Indeed they do, and the mpt driver works fine with these cards.

/dale


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Dale Ghent

Are you putting your archive and redo logs on a separate zpool (not
just a different zfs fs within the same pool as your data files)?

Are you using direct io at all in any of the config scenarios you  
listed?
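(By a separate zpool I mean literally another pool on its own LUNs,
something along these lines, with hypothetical device names:

zpool create redo c6t0d0 c6t1d0
zfs create redo/logs
)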

/dale

On Nov 22, 2008, at 12:41 PM, Chris Greer wrote:

 So to give a little background on this, we have been benchmarking  
 Oracle RAC on Linux vs. Oracle on Solaris.  In the Solaris test, we  
 are using vxvm and vxfs.
 We noticed that the same Oracle TPC benchmark at roughly the same  
 transaction rate was causing twice as many disk I/O's to the backend  
 DMX4-1500.

 So we concluded this is pretty much either Oracle is very different  
 in RAC, or our filesystems may be the culprits.  This testing is  
 wrapping up (it all gets dismantled Monday), so we took the time to  
 run a simulated disk I/O test with an 8K IO size.


 vxvm with vxfs we achieved 2387 IOPS
 vxvm with ufs we achieved 4447 IOPS
 ufs on disk devices we achieved 4540 IOPS
 zfs we achieved 1232 IOPS

 The only zfs tunings we have done are setting set zfs:zfs_nocache=1
 in /etc/system and changing the recordsize to be 8K to match the test.

 I think the files we are using in the test were created before we  
 changed the recordsize, so I deleted them and recreated them and  
 have started the other test...but does anyone have any other ideas?

 This is my first experience with ZFS with a commercial RAID array and  
 so far it's not that great.

 For those interested, we are using the iorate command from EMC for  
 the benchmark.  For the different test, we have 13 luns presented.   
 Each one is its own volume and filesystem and a single file on those  
 filesystems.  We are running 13 iorate processes in parallel (there  
 is no cpu bottleneck in this either).

 For zfs, we put all those luns in a pool with no redundancy and  
 created 13 filesystems and still running 13 iorate processes.

 we are running Solaris 10U6


Re: [zfs-discuss] Do you grok it?

2008-09-12 Thread Dale Ghent

On Sep 12, 2008, at 1:35 PM, Richard Elling wrote:

 greenBytes has a very well produced teaser commercial on their site.
http://www.green-bytes.com

 Actually, I think it is one of the better commercials done by tech
 companies in a long time.  Do you grok it?

Did I detect a (well-done) metaphor for shared ZFS?

I must say that the videography itself is very nice.

/dale


Re: [zfs-discuss] X4540

2008-07-12 Thread Dale Ghent
On Jul 11, 2008, at 5:32 PM, Richard Elling wrote:


 Yes, of course.  But there is only one CF slot.

Cool coincidence that the following article on CF cards and DMA  
transfers was posted to /.

http://hardware.slashdot.org/article.pl?sid=08/07/12/1851251

I take it that Sun's going to ship/sell OEM'd CF cards of some sort for
Loki. Hopefully they're ones that don't crap out on DMA transfers.

/dale


Re: [zfs-discuss] 2 items on the wish list

2008-06-27 Thread Dale Ghent

Re-reading your question, it occurs to me that you might be referring
to the ability to mount a snapshot on *another server*?

There's no built-in feature in zfs for that, but a workaround would be  
to do what I just detailed, with the additional step of exporting that  
cloned snapshot to the other server via NFS.
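In rough command form, and assuming a plain NFS share is acceptable,
that extra step is just:

zfs set sharenfs=ro somepool/snap

with the other server then mounting /snapshot (or wherever the clone's
mountpoint was set) from this host.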

/dale

On Jun 27, 2008, at 7:02 PM, Dale Ghent wrote:

 On Jun 27, 2008, at 5:58 PM, Mertol Ozyoney wrote:

 Hi all ;

 There are two things that some customers are asking for constantly
 about ZFS.

 Ability to mount snapshots somewhere else. [This doesn't look easy,
 perhaps a proxy kind of setup?]

 This feature has been in ZFS since day 1. You would clone a
 snapshot and mount that clone wherever you wish:

 1) Create the snapshot:
 zfs snapshot somepool/somefs@snap

 2) Clone the snapshot into a new file system, which by default would
 be mounted at /somepool/snap:
 zfs clone somepool/somefs@snap somepool/snap

 3) Optionally mount that cloned snapshot somewhere else:
 zfs set mountpoint=/snapshot somepool/snap

 /dale



Re: [zfs-discuss] mirroring zfs slice

2008-06-17 Thread Dale Ghent
On Jun 17, 2008, at 12:23 PM, Srinivas Chadalavada wrote:

 :root # zpool create export mirror c2t0d0s5 c2t0d0s5
 invalid vdev specification
 use '-f' to override the following errors:
 /dev/dsk/c2t0d0s5 is part of active ZFS pool export. Please see  
 zpool(1M).

(I presume that you meant to use c2t2d0s5 as the second slice)

You've already created your pool, so all you want to do is attach the  
new slice to be a mirror of one that is already in the pool:

zpool attach export c2t0d0s5 c2t2d0s5

This will create a mirror between c2t0d0s5 and c2t2d0s5

First be sure that slice 5 on c2t2d0 is at least the same size as  
c2t0d0s5. If c2t2d0 is unused, you can copy the vtoc from the first  
disk to the second one with a simple command:

prtvtoc /dev/rdsk/c2t0d0s2 | fmthard -s - /dev/rdsk/c2t2d0s2

Since you're on x86, you may need to run fdisk against c2t2d0 if it is  
a virgin drive.

/dale


Re: [zfs-discuss] mirroring zfs slice

2008-06-17 Thread Dale Ghent
On Jun 17, 2008, at 1:13 PM, dick hoogendijk wrote:

 This is about slices. Can this be done for a whole disk too? And it
 yes, do these disks have to be exactly the same size?

Indeed, it can be used on an entire disk.

Examples:

zpool create mypool c1t0d0
zpool attach mypool c1t0d0 c2t0d0

zpool create mypool mirror c1t0d0 c2t0d0
...

Note the lack of a slice ID in the above commands' disk
specifications. ZFS will interpret this as "use the entire disk".

At that point it will apply an EFI label to the disk and bring it into
the pool as specified.

This method is preferred over specifying slice 2 (e.g., c1t0d0s2) when
wanting to use the entire disk for ZFS.

/dale


Re: [zfs-discuss] nfs and smb performance

2008-03-27 Thread Dale Ghent


Have you turned on the "Ignore cache flush commands" option on the
Xraids? You should ensure this is on when using ZFS on them.

/dale

On Mar 27, 2008, at 6:16 PM, abs wrote:
 hello all,
 I have two Xraids connected via fibre to a PowerEdge 2950. The 2
 Xraids are configured with 2 RAID5 volumes each, giving me a total
 of 4 RAID5 volumes. These are striped across in ZFS. The read and
 write speeds local to the machine are as expected, but I have noticed
 some performance hits in the read and write speed over NFS and Samba.

 Here is the observation:

 Each filesystem is shared via NFS as well as Samba.
 I am able to mount via NFS and Samba on a Mac OS 10.5.2 client.
 I am able to only mount via NFS on a Mac OS 10.4.11 client. (There
 seems to be an authentication/encryption issue between the 10.4.11
 client and the Solaris box in this scenario. I know this is a bug on
 the client side.)

 When writing a file via NFS from the 10.5.2 client the speeds are
 60 ~ 70 MB/sec.
 When writing a file via Samba from the 10.5.2 client the speeds are
 30 ~ 50 MB/sec.

 When writing a file via NFS from the 10.4.11 client the speeds are
 20 ~ 30 MB/sec.

 When writing a file via Samba from a Windows XP client the speeds
 are 30 ~ 40 MB/sec.

 I know that there is an implementational difference in NFS and Samba
 on both Mac OS 10.4.11 and 10.5.2 clients, but that still does not
 explain the Windows scenario.

 I was wondering if anyone else was experiencing similar issues and
 if there is some tuning I can do, or am I just missing something.
 Thanx in advance.

 cheers,
 abs








Re: [zfs-discuss] List of ZFS patches to be released with Solaris 10 U5

2008-03-04 Thread Dale Ghent
On Mar 4, 2008, at 5:13 PM, Ben Grele wrote:

 Experts,
 Do you know where I could find the list of all the ZFS patches that  
 will
 be released with Solaris 10 U5? My customer told me that they've seen
 such a list for prior update releases. I've not been able to find
 anything
 like it in the usual places.

Yes, something akin to George Wilson's post for s10u4 would be nice:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024516.html

/dale


Re: [zfs-discuss] ZFS replication strategies

2008-02-01 Thread Dale Ghent
On Feb 1, 2008, at 1:15 PM, Vincent Fox wrote:

 Ideally I'd love it if ZFS directly supported the idea of rolling  
 snapshots out into slower secondary storage disks on the SAN, but in  
 the meanwhile looks like we have to roll our own solutions.

If you're running some recent SXCE build, you could use ZFS with AVS  
for remote replication over IP.

http://blogs.sun.com/AVS/entry/avs_and_zfs_seamless

/dale


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Dale Ghent
On Jan 30, 2008, at 3:44 PM, Vincent Fox wrote:

 What we ended up doing, for political reasons, was putting the  
 squeeze on our Sun reps and getting a 10u4 kernel spin patch with...  
 what did they call it?  Oh yeah a big wad of ZFS fixes.  So this  
 ends up being a huge PITA because for the next 6 months to a year we  
 are tied to getting any kernel patches through this other channel  
 rather than the usual way.   But it does work for us, so there you  
 are.

Speaking of "big wad of ZFS fixes", is it me or is anyone else here
getting kind of displeased over the glacial speed of the backporting
of ZFS stability fixes to s10? It seems that we have to wait around
4-5 months for an oft-delayed s10 update for any fixes of substance to
come out.

Not only that, but one day the ZFS fix is its own patch, then it
is part of the current KU, and now it's part of the NFS patch where
ZFS isn't mentioned anywhere in the patch's synopsis.

/dale


Re: [zfs-discuss] raidz and compression, difficulties

2008-01-26 Thread Dale Ghent
On Jan 26, 2008, at 3:24 AM, Joachim Pihl wrote:

 So far so good, zfs get all reports compression to be active. Now  
 for
 the problem: After adding another 300GB of uncompressed .tif  
 and .bin/.cue
 (audio CD) files, compression ratio is still at 1.00, indicating  
 that no
 compression has taken place.

TIFF files can have their own compression (compressed TIFF) and many
image editors have this on by default, so you wouldn't know about it
unless you specifically looked for it. I think the TIFF spec specifies
LZW compression for this... but either way, if this is indeed the
case, zfs compression won't help with those files.

Now, the bin/cue file format specifies no optional compression, so  
those bin files should be nothing but a raw image of 16bit PCM  
audio... which you should see some (but not great) compression with.  
The default lzjb compression scheme in zfs might not be terribly  
effective on this type of file data being that it's optimized for  
speed rather than compression efficiency. Try turning on gzip  
compression in zfs instead and see if things improve.

To make that simple I'd just make a new fs (e.g., pool/data/audio),
then 'zfs set compression=gzip-4 pool/data/audio', and then mv your
bin/cue files there.
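In command form that works out to roughly:

zfs create pool/data/audio
zfs set compression=gzip-4 pool/data/audio
  (mv the bin/cue files in, then check the result)
zfs get compressratio pool/data/audio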

/dale


[zfs-discuss] ZFS for OSX - it'll be in there.

2007-10-04 Thread Dale Ghent
...and eventually in a read-write capacity:

http://www.macrumors.com/2007/10/04/apple-seeds-zfs-read-write-developer-preview-1-1-for-leopard/

Apple has seeded version 1.1 of ZFS (Zettabyte File System) for Mac  
OS X to Developers this week. The preview updates a previous build  
released on June 26, 2007.

/dale


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Dale Ghent
On Oct 3, 2007, at 10:31 AM, Roch - PAE wrote:

 If the DB cache is made large enough to consume most of memory,
 the ZFS copy will quickly be evicted to stage other I/Os on
 their way to the DB cache.

 What problem does that pose ?

Personally, I'm still not completely sold on the performance  
(performance as in ability, not speed) of ARC eviction. Often times,  
especially during a resilver, a server with ~2GB of RAM free under  
normal circumstances will dive down to the minfree floor, causing  
processes to be swapped out. We've had to take to manually  
constraining ARC max size so this situation is avoided. This is on  
s10u2/3. I haven't tried anything heavy duty with Nevada simply  
because I don't put Nevada in production situations.
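(The constraining itself is typically just a line in /etc/system, sized
per machine; a 2GB cap, for example, would be:

set zfs:zfs_arc_max = 0x80000000
)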

Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm  
surprised that this is being met with skepticism considering that  
Oracle highly recommends direct IO be used,  and, IIRC, Oracle  
performance was the main motivation to adding DIO to UFS back in  
Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,  
it's the buffer caching they all employ. So I'm a big fan of seeing  
6429855 come to fruition.

/dale


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Dale Ghent
On Oct 3, 2007, at 5:21 PM, Richard Elling wrote:

 Slightly off-topic, in looking at some field data this morning  
 (looking
 for something completely unrelated) I notice that the use of directio
 on UFS is declining over time.  I'm not sure what that means...  
 hopefully
 not more performance escalations...

Sounds like someone from the ZFS team needs to get with someone from
Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road
boundary should look, because it doesn't sound like there's a
definitive or at least a sure answer here.

Oracle trumpets the use of DIO, and there are benchmarks and first- 
hand accounts out there from DBAs on its virtues - at least when  
running on UFS (and EXT2/3 on Linux, etc)

As it relates to ZFS mechanics specifically, there doesn't appear to  
be any settled opinion.

/dale


Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS

2007-09-24 Thread Dale Ghent
On Sep 24, 2007, at 6:15 PM, Paul B. Henson wrote:

 Well, considering that some days we automatically create accounts for
 thousands of students, I wouldn't want to be the one stuck typing 'zfs
 create' a thousand times 8-/. And that still wouldn't resolve our
 requirement for our help desk staff to be able to manage quotas  
 through our
 existing identity management system.

Not to sway you away from ZFS/NFS considerations, but I'd like to add  
that people who in the past used DFS typically went on to replace it  
with AFS. Have you considered it?

/dale


[zfs-discuss] The Dangling DBuf Strikes Back

2007-09-03 Thread Dale Ghent

I saw a putback this past week from M. Maybee regarding this, but I  
thought I'd post here that I saw what is apparently an incarnation of  
6569719 on a production box running  s10u3 x86 w/ latest (on  
sunsolve) patches. I have 3 other servers configured the same way WRT
workload, zfs pools and hardware resources, so if this occurs again
I'll see about logging a case and getting a relief patch. Anyhow,
perhaps a backport to s10 may be in order.

This server is an x4100 hosting about 10k email accounts using Cyrus,
and Cyrus's "squatter" mailbox indexer was running at the time (lots
of small r/w IO), as well as Networker-based backups which suck data
off a clone (yet tons more small ro IO).

Unfortunately due to a recent RAM upgrade of the server in question,  
the dump device was too small to hold a complete vmcore, but at least  
the stack trace was logged. Here it is, at least for the posterity's  
sake:

Sep  3 03:27:43 xxx ^Mpanic[cpu0]/thread=fe80007b7c80:
Sep  3 03:27:43 xxx genunix: [ID 895785 kern.notice] dangling dbufs  
(dn=fe8432bad7d8, dbuf=fe81f93c5bd8)
Sep  3 03:27:43 xxx unix: [ID 10 kern.notice]
Sep  3 03:27:43 xxx genunix: [ID 655072 kern.notice] fe80007b7960  
zfs:zfsctl_ops_root+2f168a42 ()
Sep  3 03:27:43 xxx genunix: [ID 655072 kern.notice] fe80007b79a0  
zfs:zfsctl_ops_root+2f168af8 ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7a10  
zfs:dnode_sync+334 ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7a60  
zfs:dmu_objset_sync_dnodes+7b ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7af0  
zfs:dmu_objset_sync+5c ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7b10  
zfs:dsl_dataset_sync+23 ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7b60  
zfs:dsl_pool_sync+7b ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7bd0  
zfs:spa_sync+116 ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7c60  
zfs:txg_sync_thread+115 ()
Sep  3 03:27:44 xxx genunix: [ID 655072 kern.notice] fe80007b7c70  
unix:thread_start+8 ()

/dale


Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS

2007-05-30 Thread Dale Ghent

On May 31, 2007, at 12:15 AM, Nathan Huisman wrote:


= PROBLEM

To create a disk storage system that will act as an archive point for
user data (Non-recoverable data), and also act as a back end storage
unit for virtual machines at a block level.


snip

Here are some tips from me. I notice you mention iSCSI a lot so I'll  
stick to that...


Q1: The best way to mirror in real time is to do it from the  
consumers of the storage, ie, your iSCSI clients. Implement two  
storage servers (say, two x4100s with attached disk) and put their  
disk into zpools. The two servers do not have to know about each  
other. Configure ZFS file systems identically on both and export them  
to the client that'll use it. Use the software mirroring feature on  
the client to mirror these iSCSI shares (eg: dynamic disks on  
Windows, LVM on Linux, SVM on Solaris).
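On a Solaris client, for example, that client-side mirroring step would
look roughly like this, with the two device names standing in for the
LUN received from each storage server:

zpool create vmpool mirror c4t0d0 c5t0d0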


What this gives you are two storage servers (ZFS-backed, serving out  
iSCSI shares) and the client(s) take a share from each and mirror  
them... if one of the ZFS servers were to go kaput, the other is  
still there actively taking in and serving data. From the client's  
perspective, it'll just look like one side of the mirror went down  
and after you get the downed ZFS server back up, you would initiate  
normal mirror reattachment procedure on the client(s).


This will also allow you to patch your ZFS servers without downtime  
incurred on your clients.


The disk storage on your two ZFS+iSCSI servers could be anything.  
Given your budget and space needs, I would suggest looking at the  
Apple Xserve RAID with 750GB drives. You're a .edu, so the price of  
these things will likely please you (I just snapped up two of them at  
my .edu for a really insane price).


Q2: The client will just see the iSCSI share as a raw block device.
Put your ext3/xfs/jfs on it as you please... to ZFS it is just
data. That's the only way you can use iSCSI, really; it's block
level, remember. On ZFS, the iSCSI backing store is one large sparse
file.


Q3: See the zpool man page, specifically the 'zpool replace ...'  
command.
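For example, with hypothetical pool and device names:

zpool replace tank c2t3d0 c2t5d0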


Q4: Since (or if) you're doing iSCSI, ZFS snapshots will be of no  
value to you since ZFS can't see into those iSCSI backing store  
files. I'll assume that you have a backup system in place for your  
existing infrastructure (Networker, NetBackup or what have you) so  
back up the stuff from the *clients* and not the ZFS servers. Just  
space the backup schedule out if you have multiple clients so that
the ZFS+iSCSI servers aren't overloaded with all their clients reading
data suddenly when backup time rolls around.


Q5: Sure, nothing would stop you from doing that sort of config, but  
it's something that would make Rube Goldberg smile. Keep out any  
unneeded complexity and condense the solution.


Excuse my ASCII art skills, but consider this:

[JBOD/ARRAY]---(fc)---[ZFS/iSCSI server 1]---(iscsi share)---+
                                                             +---[Client, mirroring the two shares]
[JBOD/ARRAY]---(fc)---[ZFS/iSCSI server 2]---(iscsi share)---+


Kill one of the JBODs or arrays, OR the ZFS+iSCSI servers, and your  
clients are still in good shape as long as their software mirroring  
facility behaves.


/dale


[zfs-discuss] RAIDZn+1 (related to the h/w raid ponderings)

2007-05-29 Thread Dale Ghent


Dropping in on this convo a little late, but here's something that  
has been nagging me - gaining the ability to mirror two (or more)  
RAIDZ sets.


A little background on why I'd really like to see this

I have two data centers on my campus and my FC-based SAN stretches  
between them. When I buy RAID arrays, I do so in pairs so that one  
array ends up in each data center, and the LUN config on these  
matched arrays are also the same. The server which consume those  
LUNs use software mirroring (via ZFS, SVM, or whatever) to mirror  
data in real time between the two arrays... in effect gaining me  
chassis-level redundancy in two separate buildings set half a mile  
apart. If one building goes up in smoke, I know that my data is fine  
in the other.


Anyway, lets now step down to the array level. These array disks are  
configured in RAID5 sets. This is a further level of redundancy so  
that if one of the arrays in a pair is out of service for an extended  
time, the remaining array can still withstand a drive failure.


Well, I'd like to get rid of that hardware RAID5 and use RAIDz... but  
then that would preclude my mirroring setup from happening since I  
can't set up two distinct RAIDZn sets within a pool and mirror them.


For those familiar with ZFS internals, could a RAIDZ+1  
configuration be a distinct possibility?


/dale


Re: [zfs-discuss] gzip compression throttles system?

2007-05-03 Thread Dale Ghent

On May 2, 2007, at 10:36 PM, Ian Collins wrote:


The files are between 15 and 50MB.  It's worth pointing out that .wav
files only compress by a few percent.


Not entirely related to your maxed CPU problem, but

gzip on PCM audio isn't, as you point out, going to earn you much of  
a compression ratio. It's also very very slow on this kind of data.  
If you're looking to compress wav or other PCM-based audio formats  
for storage space reasons, use FLAC. It's guaranteed that you'll get  
a far better compression ratio and a quicker result for your troubles
than you will with gzip. There are many technical reasons for this,  
but generally, FLAC knows about audio down to the sample and is  
geared for the properties of PCM audio. gzip just sees it as a  
generic data blob like any other which contributes to its  
inefficiencies in this case.


The downside is that, well, it's not an option on the ZFS level, but  
you don't necessarily have to pre-decompress your FLAC-compressed WAV
files in order to listen to them :)


Hmm... a FLAC-based compression mech in ZFS for efficient (and  
lossless) PCM audio storage...


/dale


Re: [zfs-discuss] XServe Raid Complex Storage Considerations

2007-04-25 Thread Dale Ghent

On Apr 25, 2007, at 11:17 AM, cedric briner wrote:


hello the list,

After reading the _excellent_ ZFS Best Practices Guide, I've seen
in the section "ZFS and Complex Storage Considerations" that we
should configure the storage system to ignore the commands which
flush its cache to disk.


So, do any of you know how to tell the Xserve RAID to ignore
``fsync'' requests?


After the announcement that zfs will be included in Tiger, I'd be
surprised if the Xserve RAID did not include such a configuration.


You can tell the Xserve RAID to ignore cache flush commands, but the
option to make this so is controller-wide and not settable on a
per-LUN basis.


In the Xserve RAID management app, select a controller, click the
Settings button and enter the admin password for the array. Then
click the Performance tab, and make sure that the "Allow Host Cache
Flushing" option is unchecked for the controllers you don't want that
on.


/dale




[zfs-discuss] Exporting zvol properties to .zfs

2007-02-19 Thread Dale Ghent


Here at my university, I recently started selling disk space to users  
from a server with 4.5TB of space. They purchase space and I make  
them their own volume, typically with compression on and it's then  
exported via NFS to their servers/workstations. So far this has gone  
quite well (with zil_disable and a tuned up nfsd of course)


Anyhow, the frustration exhibited by a new customer of mine made me  
think of a new RFE possibility. This customer purchased some space  
and began moving his data (2TB's worth) over to it from his ailing  
RAID array. He became frantic at one point and said that the transfer  
was taking too long.


What he was doing was judging the speed at which the move was going  
by doing a 'df' on his NFS client and comparing that to the existing  
partition which holds his data. What he didn't realize was that the  
transfer seemed slower because his data on the ZFS-backed NFS server  
was being compressed by a 2:1 ratio... so, for example, although the  
df on his NFS client reported 250G used, in reality approximately  
500G had been transferred and then compressed on ZFS.


This was explained to him and that averted his fury for the time  
being... but it got me thinking about how things such as the current  
compression ratio for a volume could be indicated over an otherwise
ZFS-agnostic NFS export. The .zfs snapdir came to mind. Perhaps ZFS
could maintain a special file under there, called compressratio for  
example, and a remote client could cat it or whatever to be aware of  
how volume compression factors into their space usage.
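In other words, hypothetically, from the NFS client's point of view:

cat /mnt/userdata/.zfs/compressratio
2.00x

(path and output invented purely to illustrate the idea).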


Any thoughts? A quick b.o.o search did bring up an existing RFE
along these lines, so I thought I'd mention that here.


/dale



Re: [zfs-discuss] snapdir visible recursively throughout a dataset

2007-02-05 Thread Dale Ghent

On Feb 5, 2007, at 7:57 AM, Robert Milkowski wrote:


I haven't tried it but what if you mounted ro via loopback into a zone


/zones/myzone01/root/.zfs is loop mounted in RO to /zones/ 
myzone01/.zfs


I've tried something similar but found out that vfstab is evaluated  
prior to zpool import, so any lofs directives in vfstab will fail if  
the source of the lofs mount is in ZFS :/
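(In other words, a vfstab line along these lines, with purely
illustrative paths, never gets a chance to work because the ZFS source
isn't mounted yet when vfstab is processed:

/local/somefs  -  /zones/myzone01/root/somefs  lofs  -  yes  ro
)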


/dale


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-31 Thread Dale Ghent

On Jan 31, 2007, at 4:26 AM, Selim Daoud wrote:


you can still do some lun masking at the HBA level (Solaris 10)
 this feature is called blacklist


Oh, I'd do that but Solaris isn't the only OS that uses arrays on my
SAN, and some hosts are even cross-departmental. Thus masking from the
array is a must to keep the amount of host-based tomfoolery to a  
minimum.


/dale


[zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent


Hey all, I run a netra X1 as the mysql db server for my small  
personal web site. This X1 has two drives in it with SVM-mirrored UFS  
slices for / and /var, a swap slice, and slice 7 is zfs. There is one  
zfs mirror pool called local on which there are a few file systems,  
one of which is for mysql. slice 7 used to be ufs, and I had no  
performance problems when that was the case. There is 1152MB of RAM  
on this box, half of which is in use. Solaris 10 FCS + all the latest  
patches as of today.


So anyway, after moving mysql to live on zfs (with compression turned  
on for the volume in question), I noticed that web pages on my site  
took a bit of time, sometimes up to 20 seconds to load. I'd jump on  
to my X1, and notice that according to top, the kernel was hogging
80-100% of the 500MHz CPU, and mysqld was the top process in CPU use.
The load average would shoot from a normal 0.something up to 6 or  
even 8. Command-line response was stop and go.


Then I'd notice my page would finally load, and that corresponded  
with load and kernel CPU usage decreasing back to normal levels.


I am able to reliably replicate this, and I ran lockstat while this  
was going on, the output of which is here:


http://elektronkind.org/osol/lockstat-zfs-0.txt

Part of me is kind of sure that this is 6421427, as there appear to
be long and copious trips through ata_wait() as that bug illustrates,
but I just want to be sure of it (and when is that bug seeing a
Solaris 10 patch, btw?)


TIA,
/dale


Re: [zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent

On Dec 7, 2006, at 1:46 PM, Jason J. W. Williams wrote:


Hi Dale,

Are you using MyISAM or InnoDB?


InnoDB.


Also, what's your zpool configuration?


A basic mirror:

$ zpool status
  pool: local
state: ONLINE
scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
local ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0

errors: No known data errors


Re: [zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent

On Dec 7, 2006, at 5:22 PM, Nicholas Senedzuk wrote:

You said you are running Solaris 10 FCS but zfs was not released  
until Solaris 10 6/06 which is Solaris 10U2.


Look at a Solaris 10 6/06 CD/DVD. Check out the
Solaris_10/UpgradePatches directory.


ah! well whaddya know...

Yes, apply those (you have to do them in the right order to do it in  
one run with 'patchadd -M') and you can bring your older box up to  
date with the update release.
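Roughly, and assuming the patches are sitting in the media's
UpgradePatches directory, that one run looks something like:

patchadd -M /cdrom/cdrom0/Solaris_10/UpgradePatches <patch-id> <patch-id> ...

with the patch IDs listed in dependency order.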


/dale



Re: [zfs-discuss] Re: System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent

On Dec 7, 2006, at 6:14 PM, Anton B. Rang wrote:


This does look like the ATA driver bug rather than a ZFS issue per se.


Yes indeed. Well, that answers that. FWIW, I'm in hour 2 of a mysql
configure script run. Yow!


(For the curious, the reason ZFS triggers this when UFS doesn't is  
because ZFS sends a synchronize cache command to the disk, which is  
not handled in DMA mode by the controller; and for this particular  
controller, switching between DMA and PIO mode has some quirks  
which were worked around by adding delays. The fix involves a new  
quirk-work-around.)


Ah, so I suppose this would affect the V100, too. The same ALi IDE
controller is in that box.


Thanks for the insight. Since the fix for this made it into snv_52, I  
suppose it's too recent for a backport and patch release for s10 :(


/dale


Re: [zfs-discuss] ZFS related kernel panic

2006-12-04 Thread Dale Ghent

Matthew Ahrens wrote:

Jason J. W. Williams wrote:

Hi all,

Having experienced this, it would be nice if there was an option to
offline the filesystem instead of kernel panicking on a per-zpool
basis. If its a system-critical partition like a database I'd prefer
it to kernel-panick and thereby trigger a fail-over of the
application. However, if its a zpool hosting some fileshares I'd
prefer it to stay online. Putting that level of control in would
alleviate a lot of the complaints it seems to me...or at least give
less of a leg to stand on. ;-)


Agreed, and we are working on this.


Similar to UFS's onerror mount option, I take it?

/dale


Re: [zfs-discuss] ZFS related kernel panic

2006-12-04 Thread Dale Ghent

Richard Elling wrote:


Actually, it would be interesting to see how many customers change the
onerror setting.  We have some data, just need more days in the hour.


I'm pretty sure you'd find that info in over 6 years of submitted 
Explorer output :)


I imagine that stuff is sandboxed away in a far off department, though.

/dale


Re: [zfs-discuss] Mirrored Raidz

2006-10-24 Thread Dale Ghent

On Oct 24, 2006, at 4:56 AM, Michel Kintz wrote:


It is not always a matter of more redundancy.
In my customer's case, they have storage in 2 different rooms of  
their datacenter and want to mirror from one storage unit in one  
room to the other.
 So having in this case a combination of RAID-Z + mirror makes sense
 in my mind... or?


It /does/ make sense. Having a geographically diverse storage  
scenario like this is good, but changes the rules a bit, and in a way  
that you can't fully take advantage of by using only soft RAID such  
as ZFS or SVM. The missing link, as you point out, is the
ability to mirror (within ZFS) a RAIDZ vdev.


To get around this, I just use hardware RAID5 on my separate arrays  
and use either ZFS or SVM mirroring between the two on the hosts. I  
have thought about this over the past several months, and believe  
that it's probably better this way rather than doing it all in ZFS or SVM.


/dale


Re: [zfs-discuss] Re: Mirrored Raidz

2006-10-24 Thread Dale Ghent

On Oct 24, 2006, at 12:33 PM, Frank Cusack wrote:

On October 24, 2006 9:19:07 AM -0700, Anton B. Rang wrote:
Our thinking is that if you want more redundancy than RAID-Z, you  
should
use RAID-Z with double parity, which provides more reliability  
and more

usable storage than a mirror of RAID-Zs would.


This is only true if the drives have either independent or identical
failure modes, I think.  Consider two boxes, each containing ten  
drives.
Creating RAID-Z within each box protects against single-drive  
failures.

Mirroring the boxes together protects against single-box failures.


But mirroring also protects against single-drive failures.


Right, but mirrored raidz would in this case protect the admin from:

1) one entire jbod chassis/comm failure, and
2) individual drive failure in the remaining chassis during an  
occurrence of (1)


Since the person is dealing with JBODS and not hardware RAID arrays,  
my suggestion is to combine ZFS and SVM.


1) Use ZFS and make a raidz-based ZVOL of disks on each of the two JBODs
2) Use SVM to mirror the two ZVOLs. Newfs that with UFS.
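In rough command form, with hypothetical disk names, the ZFS half of
that would be:

zpool create jbod1 raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
zpool create jbod2 raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0
zfs create -V 500g jbod1/vol
zfs create -V 500g jbod2/vol

leaving /dev/zvol/dsk/jbod1/vol and /dev/zvol/dsk/jbod2/vol for SVM to
mirror and newfs.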

Not at all optimal, but it'll work. It would be nice if you could  
manage a mirror of existing vdevs within ZFS and this mirroring would  
be a special case where it would be dumb and just present the volume  
and pass through most of the stuff to the raidz (or whatever) vdev  
below. It would be silly to double-cksum and compress everything, not  
to mention the possibility of differing record sizes.


/dale


Re: [zfs-discuss] Re: Mirrored Raidz

2006-10-24 Thread Dale Ghent

On Oct 24, 2006, at 2:46 PM, Richard Elling - PAE wrote:


Pedantic question, what would this gain us other than better data
retention?
Space and (especially?) performance would be worse with RAID-Z+1
than 2-way mirrors.


You answered your own question, it would gain the user better data  
retention :)


The space tradeoff is an obvious side effect and unavoidable. For  
situations where this is not an overriding issue, it just isn't an  
issue. I don't believe performance would be adversely impacted to a  
practical degree, though.


A dumb ZFS mirror strategy in this case would just copy reads and  
writes to and from the vdevs below it, OR pre-package the writes  
itself with compression and checksums and send that data below to the  
raidz's to be stored (which would probably be more problematic to  
implement in the zfs code).


With the latter, checksums and compression would be done only once  
(at the mirror level) and not done by each of the n number of  
underlying vdevs.


So, a little ascii art to summarize:

1) The probably-easiest-to-implement approach:

        [app]
          |
     [zfs volume]
          |
     [vdev mirror]   -- passes thru read/write ops, regulates recordsize. It's mainly dumb
       |         |
 [raidz vdev]  [raidz vdev]...   -- each vdev generates cksums, compression per normal
  | | | |       | | | |
 [phys devs]   [phys devs]


2) The less-CPU-but-more-convoluted approach:

        [app]
          |
     [zfs volume]
          |
     [vdev mirror]   -- generates cksums, compression, regulates recordsize
       |         |
 [raidz vdev]  [raidz vdev]...   -- each vdev just stores data as it is passed in from above
  | | | |       | | | |
 [phys devs]   [phys devs]


Either of those two would be quite handy in an environment where you
want to mirror data between, say, JBODs and retention is the primary
goal.


/dale


Re: [zfs-discuss] Re: Mirrored Raidz

2006-10-24 Thread Dale Ghent

On Oct 24, 2006, at 3:23 PM, Frank Cusack wrote:


http://blogs.sun.com/roch/entry/when_to_and_not_to says a raid-z
vdev has the read throughput of 1 drive for random reads.  Compared
to #drives for a stripe.  That's pretty significant.


Okay, then if the person can stand to lose even more space, do zfs  
mirroring on each JBOD. Then we'd have a mirror of mirrors instead of  
a mirror of raidz's.


Remember, the OP wanted chassis-level redundancy as well as  
redundancy within the domain of each chassis. You can't do that now  
with ZFS unless you combine ZFS with SVM.


/dale



Re: [zfs-discuss] ZFS RAID-10

2006-10-22 Thread Dale Ghent

On Oct 22, 2006, at 9:57 PM, Al Hopper wrote:


On Sun, 22 Oct 2006, Stephen Le wrote:

Is it possible to construct a RAID-10 array with ZFS? I've read  
through

the ZFS documentation, and it appears that the only way to create a
RAID-10 array would be to create two mirrored (RAID-1) emulated  
volumes

in ZFS and combine those to create the outer RAID-0 volume.

Am I approaching this in the wrong way? Should I be using SVM to  
create
my RAID-1 volumes and then create a ZFS filesystem from those  
volumes?


No - don't do that.  Here is a ZFS version of a RAID 10 config with 4
disks:


snip

To further agree with/illustrate Al's point, here's an example of  
'zpool status' output which reflects this type of configuration:


(Note that there is one mirror set for each pair of drives. In this  
case, drive 1 on controller 3 is mirrored to drive 1 on controller  
4, and so on. This will ensure continuity should one controller/bus/ 
cable fail.)


[EMAIL PROTECTED] zpool status
  pool: data
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t0d0   ONLINE   0 0 0
c4t9d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t1d0   ONLINE   0 0 0
c4t10d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t2d0   ONLINE   0 0 0
c4t11d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t3d0   ONLINE   0 0 0
c4t12d0  ONLINE   0 0 0

errors: No known data errors
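
For completeness, the command that builds that layout is just:

zpool create data mirror c3t0d0 c4t9d0 \
                  mirror c3t1d0 c4t10d0 \
                  mirror c3t2d0 c4t11d0 \
                  mirror c3t3d0 c4t12d0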
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS Inexpensive SATA Whitebox

2006-10-17 Thread Dale Ghent

On Oct 17, 2006, at 1:59 PM, Richard Elling - PAE wrote:


The realities of the hardware world strike again.

Sun does use the Siig SATA chips in some products, Marvell in others,
and NVidia MCPs in others.  The difference is in who writes the  
drivers.

NVidia, for example, has a history of developing their own drivers and
keeping them closed-source.  This is their decision and, I speculate,
largely based on their desire to keep the hardware implementation  
details

from their competitors.


If you want to learn the source of mine, Frank's and undoubtedly  
others' ire, please refer to:


http://www.sun.com/products-n-solutions/hardware/docs/html/819-3722-15/index.html#21924


These are the release notes for the X2100. The fact that hot-swap  
works under Windows (but not Linux or Solaris) is a pretty clear  
indicator that this is not a hardware shortcoming but a driver one  
(which would make sense; ata does not expect a device to go away).


Further, if my memory isn't playing tricks on me, when I received my  
first X2100 (around a month or two after they were first released) I  
recall an additional small yellow paper tucked in the accessories box  
separately from the standard documentation saying that hot-swap under  
Solaris would be supported in a future Solaris version.


There's also a bug open on this matter, and it has been open for a long  
time. If this wasn't feasible, I imagine the bug would be closed  
already with a WONTFIX.



If you want NVidia drivers for Solaris, then please let NVidia know.


As an outsider, I don't want to trivialize the happenings in the  
Sun-nVidia relationship, but look at nge(7d) as an example. Surely if  
that exists (closed source, and I assume it's provided by nVidia in  
part or whole and under NDA) then a NV SATA driver shouldn't be hard  
to obtain, even if it too ended up being closed-source (a la the  
Marvell driver).


/dale

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS patches for S10 6/06

2006-10-12 Thread Dale Ghent

On Oct 5, 2006, at 2:28 AM, George Wilson wrote:


Andreas,

The first ZFS patch will be released in the upcoming weeks. For  
now, the latest available bits are the ones from s10 6/06.


George, will there at least be a T patch available?

I'm anxious for these because my ZFS-backed NFS server just isn't  
having it in terms of client i/o rates.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS Inexpensive SATA Whitebox

2006-10-11 Thread Dale Ghent

On Oct 11, 2006, at 10:10 AM, [EMAIL PROTECTED] wrote:

So are there any pci-e SATA cards that are supported ? I was hoping  
to go with a sempron64. Using old-pci seems like a waste.


Yes.

I wrote up a little review of the SIIG SC-SAE412-S1 card which is a  
two port PCIe card based on the Silicon Image 3132 chip:


http://elektronkind.org/2006/09/siig-esata-ii-pcie-card-and-opensolaris

The card is a two port eSATA2 card, but SIIG also sells a two port  
internal SATA card based on the same chip as well.


This card is running fine under SX:CR build 47 and would presumably  
also run fine under Solaris 10 Update 2 or later.
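
If anyone wants to sanity-check that their card landed on the SATA  
framework, something like the following should show the si3124 driver  
bound and the ports exposed as attachment points (output will  
obviously vary from system to system):

# the SiI 3132 on this card attaches to the si3124 driver
prtconf -D | grep si3124

# the ports show up in cfgadm once the framework has the device
cfgadm -al | grep sata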


/dale

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS Inexpensive SATA Whitebox

2006-10-11 Thread Dale Ghent

On Oct 11, 2006, at 7:36 PM, David Dyer-Bennet wrote:


I've been running Linux since kernel 0.99pl13, I think it was, and
have had amazingly little trouble.  Whereas I'm now sitting on $2k of
hardware that won't do what I wanted it to do under Solaris, so it's a
bit of a hot-button issue for me right now.


Yes, but remember back in the days of Linux 0.99, the range of PC  
hardware was nowhere near as varied as it is today. Integrated  
chipsets? A pipe dream! Aside from video card chips and proprietary  
pre-ATAPI CDROM interfaces, you didn't have to reach far to find a  
driver which covered a given piece of hardware because when you got  
down to it, most hardware was the same. NE2000, anyone?


Today, in 2006 - much different story. I even had Linux AND Solaris  
problems with my machine's MCP51 chipset when it first came out. Both  
forcedeth and nge croaked on it. Welcome to the bleeding edge. You're  
unfortunately on the bleeding edge of hardware AND software.


When in that situation, one can be patient, be helpful, or go back to  
where one came from.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS Inexpensive SATA Whitebox

2006-10-11 Thread Dale Ghent

On Oct 12, 2006, at 12:23 AM, Frank Cusack wrote:

On October 11, 2006 11:14:59 PM -0400 Dale Ghent  
[EMAIL PROTECTED] wrote:

Today, in 2006 - much different story. I even had Linux AND Solaris
problems with my machine's MCP51 chipset when it first came out. Both
forcedeth and nge croaked on it. Welcome to the bleeding edge. You're
unfortunately on the bleeding edge of hardware AND software.


Yeah, Solaris x86 is so bleeding edge that it doesn't even support
Sun's own hardware!  (x2100 SATA, which is now already in its second
generation)


You know, I'm really perplexed over that, especially given that the  
silicon image chips (AFAIK) aren't in any Sun product and yet they  
have a SATA framework driver.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...

2006-09-13 Thread Dale Ghent

James C. McPherson wrote:

As I understand things, SunCluster 3.2 is expected to have support for 
HA-ZFS

and until that version is released you will not be running in a supported
configuration and so any errors you encounter are *your fault alone*.


Still, after reading Mathias's description, it seems that the former 
node is doing an implicit forced import when it boots back up. This 
seems wrong to me.


zpools should be imported only if the zpool itself says it's not already 
taken, which of course would be overridden by a manual -f import.


zpool: "sorry, i already have a boyfriend, host b"
host a: "darn, ok, maybe next time"

rather than the current scenario:

zpool: "host a, I'm over you now. host b is now the man in my life!"
host a: "I don't care! you're coming with me anyways. you'll always be mine!"

* host a stuffs zpool into the car and drives off

...and we know those situations never turn out particularly well.
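
For reference, here's how the hand-off between hosts is supposed to go  
today (pool name assumed), with -f saved for when you're certain the  
other host really is gone:

# on host a: hand the pool over cleanly
zpool export foopool

# on host b: pick it up without force
zpool import foopool

# only if host a died without exporting, and you're sure it's dead:
zpool import -f foopool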

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...

2006-09-13 Thread Dale Ghent

On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote:


Storing the hostid as a last-ditch check for administrative error is a
reasonable RFE - just one that we haven't yet gotten around to.
Claiming that it will solve the clustering problem oversimplifies the
problem and will lead to people who think they have a 'safe' homegrown
failover when in reality the right sequence of actions will  
irrevocably

corrupt their data.


HostID is handy, but it'll only tell you who MIGHT or MIGHT NOT have  
control of the pool.


Such an RFE would be even more worthwhile if it included something such  
as a time stamp. This time stamp (or similar time-oriented signature)  
would be updated regularly (based on some internal ZFS event). If  
this stamp goes for an arbitrary length of time without being  
updated, another host in the cluster could force import it on the  
assumption that the original host is no longer able to communicate to  
the zpool.


This is a simple idea description, but perhaps worthwhile if you're  
already going to change the label structure for adding the hostid.


/dale

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...

2006-09-13 Thread Dale Ghent

On Sep 13, 2006, at 1:37 PM, Darren J Moffat wrote:

That might be acceptable in some environments but that is going to  
cause  disks to spin up.  That will be very unacceptable in a  
laptop and maybe even in some energy conscious data centres.


Introduce an option to 'zpool create'? Come to think of it, the  
ability to set attributes on a pool itself seems to be lacking (unlike  
zfs volumes).


What you are proposing sounds a lot like a cluster heartbeat, which  
IMO really should not be implemented by writing to disks.


That would be an extreme example of the use for this. While it  
/could/ be used as a heartbeat mechanism, it would also be useful  
administratively.


# zpool status foopool
Pool foopool is currently imported by host.blah.com
Import time: 4 April 2007 16:20:00
Last activity: 23 June 2007 18:42:53
...
...


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] For those looking for an SATA add-on card...

2006-09-10 Thread Dale Ghent


I have success with one made by SIIG:

http://elektronkind.org/2006/09/siig-esata-ii-pcie-card-and-opensolaris

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA hot plug correction

2006-09-09 Thread Dale Ghent

On Sep 9, 2006, at 12:04 PM, David Dyer-Bennet wrote:


Thanks, that seems fairly clear.  So another approach I could take is
to buy one of the supported controllers, if they're available on a
card I could plug in.


The Silicon Image chipset is pretty popular and can be found on many  
SATA and eSATA cards, such as this one:


http://cooldrives.com/seata1ex1inp.html


The porting of ZFS to Linux may also eventually solve my problem.

I'm going to try installing the nv44 I've got, just in case that might
have advanced the driver state of the art.


Even in nv44, the SATA framework still contains only two drivers, one  
for the Silicon Image sil3124 and the other for the Marvell 88SX6xxx.  
So as of today, you're still limited to using SATA controllers based  
on those two chipsets (at least they're popular chipsets for SATA  
add-on cards).
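
With one of those two supported chipsets, hot-plug at least works  
through cfgadm; roughly like this (the attachment point names will  
differ per system):

# see what the SATA framework knows about
cfgadm -al

# before pulling a drive
cfgadm -c unconfigure sata0/2

# after inserting the replacement
cfgadm -c configure sata0/2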


As Frank mentioned, the nVidia CK8-04 chipset which is present in the  
X2100 and (I think) the Ultra 20 is conspicuously absent, as are SATA  
framework drivers for the nVidia MCP55 chipset which is used in the  
new Sun AM2-based systems. I scour the weekly putback logs for any  
SATA-related changes and things seem kind of quiet on that front, and  
naturally the bugs logged for adding these features never reflect any  
updates. grumble groan.


I really do wish Sun was more vocal regarding their progress in this  
important area.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs. Apple XRaid

2006-07-31 Thread Dale Ghent

On Jul 31, 2006, at 7:30 PM, Rich Teer wrote:


On Mon, 31 Jul 2006, Dale Ghent wrote:

So what does this exercise leave me thinking? Is Linux 2.4.x  
really screwed up
in NFS-land? This Solaris NFS replaces a Linux-based NFS server  
that the


Linux has had, uhhmmm (struggling to be nice), iffy NFS for ages.


Right, but I never had this speed problem when the NFS server was  
running Linux on hardware that had the quarter of the CPU power and  
half the disk i/o capacity that the new Solaris-based one has.


So either Linux's NFS client was more compatible with the bugs in  
Linux's NFS server and ran peachy that way, or something's truly  
messed up with how Solaris's NFS server handles Linux NFS clients.


Mind you, all the tests I did in my previous posts were on shares  
served out of ZFS. I just lopped a fresh LUN off another Xserve RAID  
on my SAN, gave it to the NFS server and put UFS on it. Let's see if  
there's a difference when mounting that on the clients:


Linux NFS client mounting UFS-backed share:
=
[EMAIL PROTECTED]/$ mount -o nfsvers=3,rsize=32768,wsize=32768 ds2-private:/ufsfoo /mnt
[EMAIL PROTECTED]/$ cd /mnt
[EMAIL PROTECTED]/mnt$ time dd if=/dev/zero of=blah bs=1024k count=128
128+0 records in
128+0 records out

real    0m9.267s
user    0m0.000s
sys     0m2.480s
=

Hey! look at that! 9.2 seconds in this test. The same test with the  
ZFS-backed share (see previous email in this thread) took 1m 21s to  
complete. Remember this same test that I did with an NFSv2 mount,  
which took 36 minutes to complete on the ZFS-backed share? Let's try  
that here with the UFS-based share:


=
[EMAIL PROTECTED]/$ mount -o nfsvers=2,rsize=32768,wsize=32768 ds2-private:/ufsfoo /mnt
[EMAIL PROTECTED]/$ cd /mnt
[EMAIL PROTECTED]/mnt$ time dd if=/dev/zero of=blah2 bs=1024k count=128
128+0 records in
128+0 records out

real    0m3.103s
user    0m0.000s
sys     0m2.880s
=

Three seconds vs. 36 minutes.

Me thinks that there's something fishy here, regardless of Linux's  
reputation in the NFS world.


Don't get me wrong. I love Solaris like I love taffy (and BOY do I  
love taffy) but there seems to be some really wonky  
Linux-NFS-Solaris-ZFS interaction going on that's really killing  
performance, and my finger so far points at Solaris. :/
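
For what it's worth, running the same dd locally on the server against  
each backend takes the network and NFS client out of the picture  
entirely; something along these lines (mount points assumed):

# on the Solaris server itself
time dd if=/dev/zero of=/ufsfoo/blah bs=1024k count=128
time dd if=/dev/zero of=/zfspool/blah bs=1024k count=128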


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs. Apple XRaid

2006-07-31 Thread Dale Ghent

On Jul 31, 2006, at 8:07 PM, eric kustarz wrote:



The 2.6.x Linux client is much nicer... one thing fixed was the  
client doing too many commits (which translates to fsyncs on the  
server).  I would still recommend the Solaris client but i'm sure  
that's no surprise.  But if you're stuck on Linux, upgrade to the  
latest stable 2.6.x and i'd be curious if it was better.


I'd love to be on kernel 2.6 but due to the philosophical stance  
towards OpenAFS of some people on the lkml list[1], moving to 2.6 is  
a tough call for us to do. But that's another story for another list.  
The fact is that I'm stuck on 2.4 for the time being, and I'm having  
problems with a Solaris/ZFS NFS server that I (and Jan) are not  
having with Solaris/UFS and (in my case) Linux/XFS NFS servers.


[1] https://lists.openafs.org/pipermail/openafs-devel/2006-July/014041.html
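
If anyone wants to see how chatty the 2.4 client really is from the  
server's side, the v3 commit counter in nfsstat should tell the tale;  
roughly (run as root on the Solaris server):

# zero the server-side counters, run the dd from the Linux client,
# then look at the commit count under the Version 3 server section
nfsstat -z
# ... run the client test ...
nfsstat -s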


/dale

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS needs a viable backup mechanism

2006-07-11 Thread Dale Ghent

On Jul 9, 2006, at 12:42 PM, Richard Elling wrote:

Ok, so I only managed data centers for 10 years.  I can count on 2  
fingers

the times this was useful to me. It is becoming less useful over time
unless your recovery disk is exactly identical to the lost disk.  This
may sound easy, but it isn't.  In the old days, Sun put specific disk
geometry information for all FRU-compatible disks, no matter who  
the supplier

was.  Since we use multiple suppliers, that enabled us to support some
products which were very sensitive to the disk geometry.  Needless to
say, this is difficult to manage over time (and expensive).  A better
approach is to eliminate the need to worry about the geometry.  ZFS is
an example of this trend.  You can now forget about those things which
were painful, such as the borkeness created when you fmthard to a
different disk :-)  Methinks you are working too hard :-)


Right. I am working too hard. It's been a pain but has shaved a lot  
of time and uncertainty off of recovering from big problems in the  
past. But up until 1.5 weeks ago ZFS in a production environ wasn't a  
reality (No, as much as I like it I'm not going to use nevada in  
production). Now ZFS is here in Solaris 10.


But you hooked into my point too much. My point was that keeping  
backups of things that normal BR systems don't touch (such as VTOCs;  
such as ZFS volume structure and settings) is part of the Plan for  
the worst,  maintain for the best ethos that I've developed over / 
my/ 10 years in data centers. This includes getting everything from  
app software to the lowest, deepest, darkest configs of RAID arrays  
(and now, ZFS) and whatnot back in place in as little time and with  
most ease as possible. I see dicking around with 'zfs create blah;zfs  
set foo=bar blah' and so on as a huge time waster when trying to  
resurrect a system from the depths of brokeness, no matter how often  
or not I'll find myself in that situation. It's slow and prone to error.



I do agree that (evil) quotas and other attributes are useful to
carry with the backup, but that is no panacea either and we'll need to
be able to overrule them.  For example, suppose I'm consolidating two
servers onto one using backups.  If I apply a new quota to an existing
file system, then I may go over quota -- resulting in even more manual
labor.


I'm talking about nothing beyond restoring a system to the state it  
was prior to a catastrophic event. I'm just talking practicality  
here, not idiosyncratics of sysadmin'ing or what's evil and what's not.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS needs a viable backup mechanism

2006-07-08 Thread Dale Ghent

On Jul 9, 2006, at 12:32 AM, Richard Elling wrote:


I'll call your bluff.  Is a zpool create any different for backup
than the original creation?  Neither ufsdump nor tar-like programs
do a mkfs or tunefs.  In those cases, the sys admin still has to
create the file system using whatever volume manager they wish.
Creating a zpool is trivial by comparison.  If you don't like it,
then modifying a zpool on the fly afterwards is also, for most
operations, quite painless.


_Huh_?

I was taking the stance that ZFS is a completely different paradigm  
than UFS and whatever volume management might be present underneath  
that and should be treated as such. I don't accept the argument that  
it wasn't in UFS, so we don't see the need for it in ZFS.


What I was getting at was for a way to dump, in a human-readable but  
machine parsable form (eg: XML) the configuration of not only a zpool  
itself, but also the volumes within it as well as the settings for  
those volumes.


Hypothetical situation:

I have all my ZFS eggs in one basket (ie, a single JBOD or RAID  
array). Said array tanks in such a way that 100% data loss is  
suffered and it and its disks must be completely replaced. The files  
in the zpool(s) present on this array have been backed up using, say,  
Legato, so I can at least get my data back with a simple restore when  
the replacement array comes online. But Legato only saw things as  
files and directories. It never knew that a particular directory was  
actually the root of a volume nested amongst other volumes.


So what of the tens or hundreds of ZFS volumes that I had that data  
sorted in and the individual (and perhaps highly varied)  
configurations of those volumes? That stuff - the metadata - sure  
wasn't saved by Legato. If I didn't manually keep notes or hadn't  
rolled my own script to save the volume configs in my own  
idiosyncratic format, I would be up the proverbial creek.


So I postulated that it would be nice if one could save a zpool's  
entire volume configuration in one easy way and restore it just as  
easily if needed.


Instead of:
1) Bring new hardware online
2) Create zpool and try one's best to recreate the previous volume  
structure and its settings (quota, compression, sharenfs, etc)

3) Restore data from traditional BR system (legato, netbackup, etc)
4) Pray I got (2) right.
5) Play config cleanup whack-a-mole as time goes on as mistakes or  
omissions are uncovered. In all likelihood it would be the users  
letting me know what I missed.


...I could instead do:
1) Bring new hardware online
2) Create zpool and then 'zfs config -f zpool-volume-config-backup.xml'
3) Restore data from wherever as in (3) above
4) Be reasonably happy knowing that the volume config is pretty close  
to what it should be, depending on how old the config dump is, of  
course. Every volume has its quotas set correctly, compression is  
turned on in the right places, the right volumes are shared along  
with their particular NFS options, and so on.


Having this feature seems like a no-brainer to me. Who cares if  
SVM/UFS/whatever didn't have it. ZFS is different from those. This is  
another area where ZFS could thumb its nose at those relative  
dinosaurs, feature-wise, and I argue that this is an important  
feature to have.


See, you're talking with a person who saves prtvtoc output of all his  
disks so that if a disk dies, all I need to do to recreate the dead  
disk's exact slice layout on the replacement drive is to run that  
saved output through fmthard. One second on the command line rather  
than spending 10 minutes hmm'ing and haw'ing around in format. ZFS  
seems like it would be a prime candidate for this sort of thing.
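
(For the curious, that trick is just the following, with device names  
assumed:)

# save the slice layout while the disk is healthy
prtvtoc /dev/rdsk/c1t0d0s2 > c1t0d0.vtoc

# stamp it onto the replacement disk in one shot
fmthard -s c1t0d0.vtoc /dev/rdsk/c1t0d0s2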


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS needs a viable backup mechanism

2006-07-07 Thread Dale Ghent

On Jul 7, 2006, at 1:45 PM, Bill Moore wrote:


That said, we actually did talk to a lot of customers during the
development of ZFS.  The overwhelming majority of them had a backup
scheme that did not involve ufsdump.  I know there are folks that live
and die by ufsdump, but most customers have other solutions, which
generate backups just fine.


Perhaps these dev customers needed to spend a little more time with  
ZFS, and do it in a production environ where backups and restores are  
arguably of a more urgent matter than in a test environment.


Regarding making things ZFS aware, I just had a thought off the top  
of my head; the feasibility of which I have no idea and will leave up  
to those who are in the know to decide:


ZFS, we all know, is more than just a dumb fs like UFS. As  
mentioned, it has metadata in the form of volume options and whatnot.  
So, sure, I can still use my Legato/NetBackup/Amanda and friends to  
back that data up... but if the worst were to happen and I found  
myself having to restore not only data but the volume structure of a  
pool as well, then there's a huge time sink, and an important one to  
avoid in a production environment. Immediately, I see a quick way to  
relieve this (note I said relieve, not necessarily resolve):


Add an option to zpool(1M) to dump the pool config as well as the  
configuration of the volumes within it to an XML file. This file  
could then be sucked into zpool at a later date to recreate/replicate  
the pool and its volume structure in one fell swoop. After  
that, Just Add Data(tm).
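
Until something like that exists, the closest I can get with today's  
tools is capturing the text output and replaying it with a bit of  
shell; a rough sketch (the pool name and the property list are just  
examples):

# capture: pool layout plus the per-filesystem settings I care about
zpool status tank > tank.pool.txt
zfs list -rH -o name,quota,compression,sharenfs tank > tank.fs.txt

# replay, after recreating the pool by hand from tank.pool.txt
while read name quota compress share; do
        [ "$name" = "tank" ] && continue
        zfs create "$name"
        zfs set quota="$quota" "$name"
        zfs set compression="$compress" "$name"
        zfs set sharenfs="$share" "$name"
done < tank.fs.txt

Clunky, and it only covers the properties I remembered to list, which  
is exactly why a real dump/restore in zpool(1M) would be welcome.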


/dale

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Dale Ghent

Torrey McMahon wrote:

ZFS is great for the systems that can run it. However, any enterprise 
datacenter is going to be made up of many many hosts running many many 
OS. In that world you're going to consolidate on large arrays and use 
the features of those arrays where they cover the most ground. For 
example, if I've 100 hosts all running different OS and apps and I can 
perform my data replication and redundancy algorithms, in most cases 
Raid, in one spot then it will be much more cost efficient to do it there.


Exactly what I'm pondering.

In the near to mid term, Solaris with ZFS can be seen as sort of a 
storage virtualizer where it takes disks into ZFS pools and volumes and 
then presents them to other hosts and OSes via iSCSI, NFS, SMB and so 
on. At that point, those other OSes can enjoy the benefits of ZFS.
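
Concretely, the virtualizer idea is already mostly there (names made  
up; the iSCSI leg still needs a target implementation layered on top):

# one pool, carved up for different consumers
zpool create tank c2t0d0 c2t1d0

zfs create tank/export
zfs set sharenfs=rw tank/export      # NFS clients mount this

zfs create -V 100g tank/lun0         # emulated volume, appears as
                                     # /dev/zvol/dsk/tank/lun0 for an
                                     # iSCSI target (or whatever) to export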


In the long term, it would be nice to see ZFS (or its concepts) 
integrated as the LUN provisioning and backing store mechanism on 
hardware RAID arrays themselves, supplanting the traditional RAID 
paradigms that have been in use for years.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Priorities

2006-06-23 Thread Dale Ghent

On Jun 23, 2006, at 1:09 PM, eric kustarz wrote:



How about it folks - would it be a good idea for me to explore  
what it takes to get such a bug/RFE setup implemented for the ZFS  
community on OpenSolaris.org?



what's wrong with http://bugs.opensolaris.org/bugdatabase/index.jsp  
for finding bugs?


There's a LOT of things wrong with how b.s.o is presented.

For us non-Sun people, b.s.o is a one-way ticket, and only when we're  
lucky.


First, yes, we can search on bug keywords and categories. Great. Used  
to need a Sunsolve acct for this. But once we do that, we can only  
hope that the bugs we want to read about in detail aren't comprised  
solely of See Notes and that's it. It's like seeing To be  
continued... right before the climax of a movie. Useless and  
frustrating.


Second, while there is a way for Joe Random to submit a bug, there is  
zero way for Joe Random to interact with a bug. No voting to bump or  
drop a priority, no easy way to find hot topic bugs, no way to add  
one's own notes to the issue. I guess the desperate just have to clog  
the system with new bugs and have them marked as dups or badger  
someone with a sun.com email address to do it for us.


Third, much of end-to-end bug servicing from a non-Sun perspective is  
still an uphill battle, from acronyms and terms used to policies and  
coordination of work, e.g. "Is someone in Sun or elsewhere already  
working on this particular bug I'm interested in?" and questions  
which would stem from that basic one.


In summary, the bug/RFE process is still a mystery after 1 year, and  
who knows if it'll stay the ginormous tease that it currently is.  
Really, it's still no better than if one had a Sunsolve account in  
years' past.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS questions

2006-06-16 Thread Dale Ghent


On Jun 16, 2006, at 11:40 PM, Richard Elling wrote:


Kimberly Chang wrote:

A couple of ZFS questions:
1. ZFS dynamic striping will automatically use new added devices  
when there are write requests. Customer has a *mostly read-only*  
application with I/O bottleneck, they wonder if there is a ZFS  
command or mechanism to enable the manual rebalancing of ZFS data  
when adding new drives to an existing pool?


cp :-)
If you copy the file then the new writes will be spread across the  
newly

added drives.  It doesn't really matter how you do the copy, though.


She raises an interesting point, though.

The concept of shifting blocks in a zpool around in the background as  
part of a scrubbing process and/or on the order of an explicit command  
to populate newly added devices seems like it could be right up ZFS's  
alley. Perhaps it could also be done with volume-level granularity.


Off the top of my head, an area where this would be useful is  
performance management - e.g. relieving load on a particular FC  
interconnect or an overburdened RAID array controller/cache thus  
allowing total no-downtime-to-cp-data-around flexibility when one is  
horizontally scaling storage performance.
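
For the record, the cp route Richard mentions amounts to something  
like this (paths made up), since only freshly written blocks land on  
the newly added vdev:

# grow the pool
zpool add tank c3t4d0

# then rewrite the hot data so its blocks get spread back out
cp -p /tank/db/table.dat /tank/db/table.dat.new
mv /tank/db/table.dat.new /tank/db/table.dat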


/dale

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss