Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Khyron
I notice you use the word "volume" which really isn't accurate or
appropriate here.

If all of these VDEVs are part of the same pool, which as I recall you
said they are, then writes are striped across all of them (with a bias
toward the emptier, i.e. less full, VDEVs).

You probably want to "zfs send" the oldest dataset (ZFS terminology
for a file system) into a new dataset.  That oldest dataset was created
when there were only 2 top level VDEVs, most likely.  If you have
multiple datasets created when you had only 2 VDEVs, then send/receive
them both (in serial fashion, one after the other).  If you have room for
the snapshots too, then send all of it and then delete the source dataset
when done.  I think this will achieve what you want.

You may want to be a bit more deliberate: from the oldest datasets, find
the smallest and send/receive it first.  That way, the send/receive
completes in less time, and when you delete the source dataset you've
created more free space in the pool without the risk of a single
dataset exceeding your 10 TiB of workspace.
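As a rough sketch (the dataset and snapshot names here are made up;
substitute your own, and only destroy the source after verifying the copy):

   # copy the oldest/smallest dataset (snapshots and properties included)
   # so the rewritten copy gets striped across all four VDEVs
   zfs snapshot -r backup/oldhost@migrate
   zfs send -R backup/oldhost@migrate | zfs receive backup/oldhost-new
   # verify the copy, then reclaim the space on the two full VDEVs
   zfs destroy -r backup/oldhost
   zfs rename backup/oldhost-new backup/oldhost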

ZFS's copy-on-write design really wants at least 20% free space, because
data is never updated in place; a new copy is always written to disk.

You might want to consider turning on compression on your new datasets
too, especially if you have CPU cycles to spare.  I don't know how
compressible your data is, but if it's fairly compressible (say, lots of
text), then you might get some added benefit when you copy the old data
into the new datasets.  Saving more space, then deleting the source
dataset, leaves the pool with more free space, which in turn helps ZFS
balance writes better on the next (and the next) dataset copy.
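For example, on the destination (the dataset name is illustrative; gzip
costs more CPU than the default lzjb):

   zfs set compression=on backup/oldhost-new     # or compression=gzip
   zfs get compressratio backup/oldhost-new      # see what you actually gained

Compression only applies to blocks written after the property is set, so it
needs to be in effect on the new dataset before the bulk of the data lands.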

HTH.

On Tue, Aug 3, 2010 at 22:48, Eduardo Bragatto  wrote:

> On Aug 3, 2010, at 10:08 PM, Khyron wrote:
>
>  Long answer: Not without rewriting the previously written data.  Data
>> is being striped over all of the top level VDEVs, or at least it should
>> be.  But there is no way, at least not built into ZFS, to re-allocate the
>> storage to perform I/O balancing.  You would basically have to do
>> this manually.
>>
>> Either way, I'm guessing this isn't the answer you wanted but hey, you
>> get what you get.
>>
>
> Actually, that was the answer I was expecting, yes. The real question,
> then, is: what data should I rewrite? I want to rewrite data that's written
> on the nearly full volumes so they get spread to the volumes with more space
> available.
>
> Should I simply do a " zfs send | zfs receive" on all ZFSes I have? (we are
> talking about 400 ZFSes with about 7 snapshots each, here)... Or is there a
> way to rearrange specifically the data from the nearly full volumes?
>
>
> Thanks,
> Eduardo Bragatto
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Khyron
Short answer: No.

Long answer: Not without rewriting the previously written data.  Data
is being striped over all of the top level VDEVs, or at least it should
be.  But there is no way, at least not built into ZFS, to re-allocate the
storage to perform I/O balancing.  You would basically have to do
this manually.

Either way, I'm guessing this isn't the answer you wanted but hey, you
get what you get.

On Tue, Aug 3, 2010 at 13:52, Eduardo Bragatto  wrote:

> Hi,
>
> I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1
> volumes (of 7 x 2TB disks each):
>
> # zpool iostat -v | grep -v c4
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> backup      35.2T  15.3T    602    272  15.3M  11.1M
>   raidz1    11.6T  1.06T    138     49  2.99M  2.33M
>   raidz1    11.8T   845G    163     54  3.82M  2.57M
>   raidz1    6.00T  6.62T    161     84  4.50M  3.16M
>   raidz1    5.88T  6.75T    139     83  4.01M  3.09M
> ----------  -----  -----  -----  -----  -----  -----
>
> Originally there were only the first two raidz1 volumes, and the two from
> the bottom were added later.
>
> You can notice that by the amount of used / free space. The first two
> volumes have ~11TB used and ~1TB free, while the other two have around ~6TB
> used and ~6TB free.
>
> I have hundreds of zfs'es storing backups from several servers. Each ZFS
> has about 7 snapshots of older backups.
>
> I have the impression I'm getting degradation in performance due to the
> limited space in the first two volumes, specially the second, which has only
> 845GB free.
>
> Is there any way to re-stripe the pool, so I can take advantage of all
> spindles across the raidz1 volumes? Right now it looks like the newer
> volumes are doing the heavy while the other two just hold old data.
>
> Thanks,
> Eduardo Bragatto
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup tool

2010-06-09 Thread Khyron
My inclination, based on what I've read and heard from others, is to say
"no".  But again, the best way to find out is to write the code.  :\

On Wed, Jun 9, 2010 at 11:45, Edward Ned Harvey wrote:

> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Toyama Shunji
> >
> > Certainly I feel it is difficult, but is it logically impossible to
> > write a filter program to do that, with reasonable memory use?
>
> Good question.  I don't know the answer.
>
> If somebody wanted to, would it be impossible to write a program to extract
> a single file (or subset of files) from a zfs send datastream?
>
> I don't know anything about the internal data structuring format of the zfs
> send datastream.  So I couldn't begin to answer the question.
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks

2010-06-07 Thread Khyron
It would be helpful if you posted more information about your configuration.
Numbers *are* useful too, but minimally, describing your setup, use case,
the hardware and other such facts would provide people a place to start.

There are much brighter stars on this list than myself, but if you are
sharing your ZFS dataset(s) via NFS with a heavy traffic load (particularly
writes), a mirrored SLOG will probably be useful.  (The ZIL is a component
of every ZFS pool.  A SLOG is a device, usually an SSD or mirrored pair of
SSDs, on which you can locate your ZIL for enhanced *synchronous* write
performance.)  Since ZFS does sync writes, that might be a win for you, but
again it depends on a lot of factors.
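If it does turn out that sync writes dominate, adding the SLOG is a
one-liner; the pool and device names below are placeholders:

   # mirror the SLOG so a single SSD failure can't take the log with it
   zpool add tank log mirror c2t0d0 c2t1d0
   zpool status tank            # the new devices show up under "logs"

Running the zilstat script you mention first is a good way to confirm the
ZIL really is the bottleneck before spending money on SSDs.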

Help us (or rather, the community) help you by providing real information
and data.

On Mon, Jun 7, 2010 at 19:59, besson3c  wrote:

> Hello,
>
> I'm wondering if somebody can kindly direct me to a sort of newbie way of
> assessing whether my ZFS pool performance is a bottleneck that can be
> improved upon, and/or whether I ought to invest in a SSD ZIL mirrored pair?
> I'm a little confused by what the output of iostat, fsstat, the zilstat
> script, and other diagnostic tools illuminates, and I'm definitely not
> completely confident with what I think I do understand. I'd like to sort of
> start over from square one with my understanding of all of this.
>
> So, instead of my posting a bunch of numbers, could you please help me with
> some basic tactics and techniques for making these assessments? I have some
> reason to believe that there are some performance problems, as the loads on
> the machine writing to these ZFS NFS shares can get pretty high during heavy
> writing of small files. Throw in the ZFS queue parameters in addition to all
> of these others numbers and variables and I'm a little confused as to where
> best to start. It is also a possibility that the ZFS server is not the
> bottleneck here, but I would love it if I can feel a little more confident
> in my assessments.
>
> Thanks for your help! I expect that this conversation will get pretty
> technical and that's cool (that's what I want too), but hopefully this is
> enough to get the ball rolling!
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup tool

2010-06-07 Thread Khyron
To answer the question you asked here... the answer is "no".  There have
been MANY discussions of this in the past.  Here's the long thread I
started back in March about backup strategies for ZFS pools and file
systems:

http://mail.opensolaris.org/pipermail/zfs-discuss/2010-March/038678.html

But to do what you're talking about, no, you cannot.  There are other ways
to accomplish that outcome, and the above thread discusses many of them.
But ZFS send/recv cannot do it and isn't designed to.

On Mon, Jun 7, 2010 at 10:34, Toyama Shunji  wrote:

> Can I extract one or more specific files from zfs snapshot stream?
> Without restoring full file system.
> Like ufs based 'restore' tool.
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The next release

2010-04-28 Thread Khyron
Ian: Of course they expected answers to those questions here.  It seems
many people do not read the forums or mailing list archives to see their
questions previously asked (and answered) many, many times over, or the
flames that erupt from them.  It's scary how much people don't check
historical records before asking questions.  As my Trinidadian friends
would say, most of these posters are "asking answers".

Autumn (if that is your real name), the short answer to the first question
is "it will be released when it is released".  Read the archives.

The answer to the second question is "we'll have to see what Oracle does
related to the community, so continue watching their behavior", followed
by "read the archives AGAIN".

Finally, "Autumn", the answer to the question about the direction of ZFS
is... wait for it... read the bloody archives.

All of your questions have been asked and answered MULTIPLE times in
the past by far too many people who could have taken some time to just
read for the answer instead of exhibiting poor netiquette by asking
questions such as these.

On Wed, Apr 28, 2010 at 19:20, Ian Collins  wrote:

> On 04/29/10 11:02 AM, autumn Wang wrote:
>
>> One quick question: When will the next formal release be released?
>>
>>
>
> Of what?
>
>
>  Does oracle have plan to support OpenSolaris community as Sun did before?
>> What is the direction of ZFS in future?
>>
>>
>
> Do you really expect answers to those question here?
>
> --
> Ian.
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread Khyron
A few things come to mind...

1. A lot better than...what?  Setting the recordsize to 4K got you some
deduplication but maybe the pertinent question is what were you
expecting?

2. Dedup is fairly new.  I haven't seen any reports of experiments like
yours so...CONGRATULATIONS!!  You're probably the first.  Or at least
the first willing to discuss it with the world as a matter of public record?

Since dedup is new, you can't expect much in the way of previous
experience with it.  I also haven't seen coordinated experiments of various
configurations with dedup off then on, for comparison.

In the end, the question is going to be whether that level of dedup is
going to be enough for you.  Is dedup even important?  Is it just a "gravy"
feature or a key requirement?  You're in unexplored territory, it appears.
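For reference, the knobs involved and how to see what dedup actually found
are roughly the following (pool and dataset names are placeholders):

   zfs set recordsize=4k tank/vhd    # must be set before the data is written
   zfs set dedup=on tank/vhd
   # copy the VHD files in, then:
   zpool get dedupratio tank         # the pool-wide figure (your 1.29x)
   zdb -DD tank                      # DDT histogram: how many blocks matched

Both recordsize and dedup only affect newly written blocks, so re-copying
the files after changing them, as you did, is the right approach.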

On Fri, Apr 23, 2010 at 11:41, tim Kries  wrote:

> Hi,
>
> I am playing with opensolaris a while now. Today i tried to deduplicate the
> backup VHD files Windows Server 2008 generates. I made a backup before and
> after installing AD-role and copied the files to the share on opensolaris
> (build 134). First i got a straight 1.00x, then i set recordsize to 4k (to
> be like NTFS), it jumped up to 1.29x after that. But it should be a lot
> better right?
>
> Is there something i missed?
>
> Regards
> Tim
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle to no longer support ZFS on OpenSolaris?

2010-04-20 Thread Khyron
I have no idea who you're talking to, but presumably you mean this link:

http://lists.freebsd.org/pipermail/freebsd-questions/2010-April/215269.html

Worked fine for me.  I didn't post it.  I'm not the OP on this thread or on
the FreeBSD thread.  So what "broken link" are you talking about, and to
whom were you responding?

On Tue, Apr 20, 2010 at 06:58, Tonmaus  wrote:

> Why don't you just fix the apparently broken link to your source, then?
>
> Regards,
>
> Tonmaus
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle to no longer support ZFS on OpenSolaris?

2010-04-19 Thread Khyron
This is how rumors get started.

From reading that thread, the OP didn't seem to know much of anything
about... anything.  Even less so about Solaris and OpenSolaris.  I'd advise
not to get your news from mailing lists, especially not mailing lists for
people who don't use the product you're interested in.

Nothing like this has been said anywhere by anyone who even resembles or
approximates an Oracle representative.  So, yeah, ignore it; the guy was
just asking dumb questions in a very poor manner about things he has
absolutely no knowledge of, and adding assumptions on top of that, in his
best but not very good English.  At least, that's my impression and opinion.

Finally, Michael S. made the best recommendation... talk to your sales rep
if you're a paying customer.

Cheers!

On Tue, Apr 20, 2010 at 01:18, Ken Gunderson  wrote:

> Greetings All:
>
> Granted there has been much fear, uncertainty, and doubt following
> Oracle's take over of Sun, but I ran across this on a FreeBSD mailing
> list post dated 4/20/2010"
>
> "...Seems that Oracle won't offer support for ZFS on opensolaris"
>
> Link here to full post here:
>
> <
> http://lists.freebsd.org/pipermail/freebsd-questions/2010-April/215269.html
> >
>
> It seems like such would be pretty outrageous and the OP either confused
> or spreading FUD, but then on the other hand there's lot of rumors
> flying around about hidden agendas behind the 2010.03 delay, and Oracle
> being Oracle such could be within the realm of possibilities.
>
> Given Oracle's information policies we're not likely to know if such is
> indeed the case until it's a fait accompli but I nonetheless thought
> this would be the best place to inquire (or perhaps Indiana list, as I
> assume OP is referencing upcoming opensolaris.com release).
>
> Thank you and have a nice day.
>
> --
> Ken Gunderson 
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] newbie, WAS: Re: SSD best practices

2010-04-19 Thread Khyron
I would advise getting familiar with the basic terminology and vocabulary
of ZFS first.  Start with the Solaris 10 ZFS Administration Guide.  It's a
bit more complete for a newbie.

http://docs.sun.com/app/docs/doc/819-5461?l=en

You can then move on to the Best Practices Guide, Configuration Guide,
Troubleshooting Guide and Evil Tuning Guide on solarisinternals.com:

http://www.solarisinternals.com//wiki/index.php?title=Category:ZFS

All of the features in ZFS on Solaris 10 appear in OpenSolaris; the inverse
does not necessarily hold true, as active development occurs on the
OpenSolaris trunk and updates take about a year to filter back down into
Solaris due to integration concerns, testing, etc.

A Separate Log (SLOG) device can be used for a ZIL, but they are not
necessarily the same thing.  The ZIL always exists, and is part of the pool
if you have not defined a SLOG device.

The zpool.cache file does not reside in the pool.  It lives in /etc/zfs in
the root file system of your OpenSolaris system.  Thus, it does not reside
"on the ZIL device" either, since there may not necessarily be a SLOG (what
you would term a "ZIL device") anyway.  (There is always a ZIL, though.
See remarks above.)
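To make that concrete (the pool and device names below are just examples):

   # the cache file lives in the root file system, not in the pool
   ls -l /etc/zfs/zpool.cache
   # a SLOG only exists if you add one explicitly; until then the ZIL
   # lives on the pool's own disks
   zpool add tank log c3t0d0
   zpool status tank              # the SLOG appears under a "logs" entry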

Hopefully that clears up some of the misconceptions and misunderstandings
you have.

Cheers!

On Mon, Apr 19, 2010 at 06:52, Michael DeMan  wrote:

> Also, pardon my typos, and my lack of re-titling my subject to note that it
> is a fork from the original topic.  Corrections in text that I noticed after
> finally sorting out getting on the mailing list are below...
>
> On Apr 19, 2010, at 3:26 AM, Michael DeMan wrote:
>
> > By the way,
> >
> > I would like to chip in about how informative this thread has been, at
> least for me, despite (and actually because of) the strong opinions on some
> of the posts about the issues involved.
> >
> > From what I gather, there is still an interesting failure possibility
> with ZFS, although probably rare.  In the case where a zil (aka slog) device
> fails, AND the zpool.cache information is not available, basically folks are
> toast?
> >
> > In addition, the zpool.cache itself exhibits the following behaviors (and
> I could be totally wrong, this is why I ask):
> >
> > A.  It is not written to frequently, i.e., it is not a performance impact
> unless new zfs file systems (pardon me if I have the incorrect terminology)
> are not being fabricated and supplied to the underlying operating system.
> >
> The above 'are not being fabricated' should be 'are regularly being
> fabricated'
>
> > B.  The current implementation stores that cache file on the zil device,
> so if for some reason, that device is totally lost (along with said .cache
> file), it is nigh impossible to recover the entire pool it correlates with.
> The above, 'on the zil device', should say 'on the fundamental zfs file
> system itself, or a zil device if one is provisioned'
>
> >
> >
> > possible solutions:
> >
> > 1.  Why not have an option to mirror that darn cache file (like to the
> root file system of the boot device at least as an initial implementation)
> no matter what intent log devices are present?  Presuming that most folks at
> least want enough redundancy that their machine will boot, and if it boots -
> then they have a shot at recovery of the balance of the associated (zfs)
> directly attached storage, and with my other presumptions above, there is
> little reason do not to offer a feature like this?
> Missing final sentence: The vast amount of problems with computer and
> network reliability is typically related to human error.  The more '9s' that
> can be intrinsically provided by the systems themselves helps mitigate this.
>
> >
> >
> > Respectfully,
> > - mike
> >
> >
> > On Apr 18, 2010, at 10:10 PM, Richard Elling wrote:
> >
> >> On Apr 18, 2010, at 7:02 PM, Don wrote:
> >>
> >>> If you have a pair of heads talking to shared disks with ZFS- what can
> you do to ensure the second head always has a current copy of the
> zpool.cache file?
> >>
> >> By definition, the zpool.cache file is always up to date.
> >>
> >>> I'd prefer not to lose the ZIL, fail over, and then suddenly find out I
> can't import the pool on my second head.
> >>
> >> I'd rather not have multiple failures, either.  But the information
> needed in the
> >> zpool.cache file for reconstructing a missing (as in destroyed)
> top-level vdev is
> >> easily recovered from a backup or snapshot.
> >> -- richard
> >>
> >> ZFS storage and performance consulting at http://www.RichardElling.com
> >> ZFS training on deduplication, NexentaStor, and NAS performance
> >> Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
> >>
> >>
> >>
> >>
> >>
> >> ___
> >> zfs-discuss mailing list
> >> zfs-discuss@opensolaris.org
> >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> >
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] snapshots taking too much space

2010-04-13 Thread Khyron
Now is probably a good time to mention that dedupe likes LOTS of RAM, based
on experiences described here.  8 GiB minimum is a good start.  And to
avoid those obscenely long removal times due to updating the DDT, an
SSD-based L2ARC device seems to be highly recommended as well.
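Adding an L2ARC device is straightforward if the OP goes that way (pool and
device names are placeholders):

   # "cache" vdevs hold the L2ARC; DDT lookups that miss RAM can then be
   # served from flash instead of spinning disk
   zpool add tank cache c4t0d0
   zpool iostat -v tank 5         # the cache device shows up below the pool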

That is, of course, if the OP decides to go the dedupe route.  I get the
feeling there is an actual solution to, or at least an intelligent reason
for, the symptoms he's experiencing.  I'm just not sure what either of
those might be.

On Tue, Apr 13, 2010 at 03:09, Peter Tripp  wrote:

> Oops, I meant SHA256.  My mind just maps SHA->SHA1, totally forgetting that
> ZFS actually uses SHA256 (a SHA-2 variant).
>
> More on ZFS dedup, checksums and collisions:
> http://blogs.sun.com/bonwick/entry/zfs_dedup
> http://www.c0t0d0s0.org/archives/6349-Perceived-Risk.html
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>


-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Removing SSDs from pool

2010-04-05 Thread Khyron
Response below...

2010/4/5 Andreas Höschler 

> Hi Edward,
>
> thanks a lot for your detailed response!
>
>
>  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>>> boun...@opensolaris.org] On Behalf Of Andreas Höschler
>>>
>>> • I would like to remove the two SSDs as log devices from the pool and
>>> instead add them as a separate pool for sole use  by the database to
>>> see how this enhences performance. I could certainly do
>>>
>>>zpool detach tank c1t7d0
>>>
>>> to remove one disk from the log mirror. But how can I get back the
>>> second SSD?
>>>
>>
>> If you're running solaris, sorry, you can't remove the log device.  You
>> better keep your log mirrored until you can plan for destroying and
>> recreating the pool.  Actually, in your example, you don't have a mirror
>> of
>> logs.  You have two separate logs.  This is fine for opensolaris (zpool
>>
>>> =19), but not solaris (presently up to zpool 15).  If this is solaris,
>>> and
>>>
>> *either* one of those SSD's fails, then you lose your pool.
>>
>
> I run Solaris 10 (not Open Solaris)!
>
> You say the log mirror
>
>
>  pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>NAMESTATE READ WRITE CKSUM
>tankONLINE   0 0 0
>...
>
>logs
>  c1t6d0ONLINE   0 0 0
>  c1t7d0ONLINE   0 0 0
>
> does not do me anything good (redundancy-wise)!? Shouldn't I dettach the
> second drive then and try to use it for something else, may be another
> machine?
>
>
No, he did *not* say that a mirrored SLOG has no benefit, redundancy-wise.
He said that YOU do *not* have a mirrored SLOG.  You have 2 SLOG devices
which are striped.  And if this machine is running Solaris 10, then you
cannot remove a log device, because those updates have not made their way
into Solaris 10 yet.  You need pool version >= 19 to remove log devices,
and S10 does not currently have patches to ZFS to get to a pool version
>= 19.

If your SLOG above were mirrored, you'd have "mirror" under "logs".  And
you probably would have "log", not "logs" - notice the "s" at the end,
meaning plural: multiple independent log devices, not a mirrored pair of
logs, which would effectively look like one device.
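Roughly, the difference looks like this in "zpool status" output (the exact
headings vary a bit between releases):

   # what you have now: two independent (striped) log devices
   logs
     c1t6d0    ONLINE   0 0 0
     c1t7d0    ONLINE   0 0 0

   # what "zpool add tank log mirror c1t6d0 c1t7d0" would have created
   logs
     mirror    ONLINE   0 0 0
       c1t6d0  ONLINE   0 0 0
       c1t7d0  ONLINE   0 0 0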

-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS backup configuration

2010-03-24 Thread Khyron
Yes, I think Eric is correct.

Funny, this is an adjunct to the thread I started entitled "Thoughts on ZFS
Pool Backup Strategies".  I was going to include this point in that thread
but thought better of it.

It would be nice if there were an easy way to extract a pool configuration,
with all of the dataset properties, ACLs, etc., so that you could easily
reload it into a new pool.  I could see this being useful in a disaster
recovery sense, and I'm sure people smarter than I can think of other uses.
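Short of such a tool, a rough, partial stop-gap with today's commands
(plain-text properties only, no ACLs, and not directly reloadable) might be:

   zpool get all tank                          > tank.pool.props
   zfs get -rHp all tank                       > tank.dataset.props
   zfs list -r -t all -o name,used,refer tank  > tank.layout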

From my reading of the documentation and man pages, I don't see that any
such command currently exists.  Something that would allow you to dump the
config into a file and read it back from a file using typical Unix
semantics like STDIN/STDOUT.  I was thinking something like:


   zpool dump <pool> [-o <file>]

   zpool load <pool> [-f <file>]

On Wed, Mar 24, 2010, Eric D. Mudama wrote:

> On Wed, Mar 24 at 12:20, Wolfraider wrote:
>
>> Sorry if this has been dicussed before. I tried searching but I
>> couldn't find any info about it. We would like to export our ZFS
>> configurations in case we need to import the pool onto another
>> box. We do not want to backup the actual data in the zfs pool, that
>> is already handled through another program.
>>
>
> I'm pretty sure the configuration is embedded in the pool itself.
> Just import on the new machine.  You may need --force/-f the pool
> wasn't exported on the old system properly.
>
> --eric
>
> --
> Eric D. Mudama
> edmud...@mail.bounceswoosh.org
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel SASUC8I - worth every penny

2010-03-23 Thread Khyron
Heh.

The original definition of "I" was inexpensive.  Was never meant to be
"independent".
Guess that changed by vendors.  The idea all along was to take inexpensive
hardware
and use software to turn it into a reliable system.

http://portal.acm.org/citation.cfm?id=50214

http://www.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf



Regarding the 2.5" laptop drives, do the inherent error detection properties
>> of ZFS subdue any concerns over a laptop drive's higher bit error rate or
>> rated MTBF?  I've been reading about OpenSolaris and ZFS for several months
>> now and am incredibly intrigued, but have yet to implement the solution in
>> my lab.
>>
>
> Well ... the price difference means you can have mirrors of the laptop
> drives and still save money compared to the "enterprise" ones. With a modern
> patrol-reading (scrub or hardware raid) array-setup, and with some
> redundancy, you can re-implement "I" to mean "inexpensive" not "independent"
> in RAID. ;)
>
>
> //Svein
>
> --
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-19 Thread Khyron
Responses inline below...

On Sat, Mar 20, 2010 at 00:57, Edward Ned Harvey wrote:

> > 1. NDMP for putting "zfs send" streams on tape over the network.  So
>
> Tell me if I missed something here.  I don't think I did.  I think this
> sounds like crazy talk.
>
> I used NDMP up till November, when we replaced our NetApp with a Solaris
> Sun
> box.  In NDMP, to choose the source files, we had the ability to browse the
> fileserver, select files, and specify file matching patterns.  My point is:
> NDMP is file based.  It doesn't allow you to spawn a process and backup a
> data stream.
>
> Unless I missed something.  Which I doubt.  ;-)
>
>
You clearly know more about NDMP than I do.  I'm still learning.  I forgot
that you previously mentioned the file-based nature of NDMP.  I'm still
wondering about that in the longer term, but yeah, this is my mistake.
I'll end up doing some deeper diving on this topic, I can see.  But this
was just me seeking clarity.

Maybe Fishworks appliances would benefit from the presence of NDMP, but if
you're using a standard server running (Open)Solaris, it looks like a
non-starter.


>
> > To Ed Harvey:
> >
> > Some questions about your use of NetBackup on your secondary server:
> >
> > 1. Do you successfully backup ZVOLs?  We know NetBackup should be able
> > to capture datasets (ZFS file systems) using straight POSIX semantics.
>
> I wonder if I'm confused by that question.  "backup zvols" to me, would
> imply something at a lower level than the filesystem.  No, we're not doing
> that.  We just specify "backup the following directory and all of its
> subdirectories."  Just like any other typical backup tool.
>
> The reason we bought NetBackup is because it intelligently supports all the
> permissions, ACL's, weird (non-file) file types, and so on.  And it
> officially supports ZFS, and you can pay for an enterprise support
> contract.
>
> Basically, I consider the purchase cost of NetBackup to be insurance.
> Although I never plan to actually use it for anything, because all our
> bases
> are covered by "zfs send" to hard disks and tapes.  I actually trust the
> "zfs send" solution more, but I can't claim that I, or anything I've ever
> done, is 100% infallible.  So I need a commercial solution too, just so I
> can point my finger somewhere if needed.
>

Yeah, I get all the reasons you state for using NetBackup.  Makes total
sense.  And I asked this question to be clear about support for backing
up ZVOLs outside of ZFS-specific tools, e.g. zfs(1M).  I didn't actually
think NetBackup could capture ZVOLs, for the reasons you listed, but I
wanted to be absolutely clear.  Asking the wrong questions is the leading
cause of wrong answers, as a former boss of mine used to say.


>
>
> > 2. What version of NetBackup are you using?
>
> I could look it up, but I'd have to VPN in and open up a console, etc etc.
> We bought it in November, so it's whatever was current 4-5 months ago.
>
>
Ok.  Thanks.


>
> > 3. You simply run the NetBackup agent locally on the (Open)Solaris
> > server?
>
> Yup.  We're doing no rocket science with it.  Ours is the absolute most
> basic NetBackup setup you could possibly have.  We're not using 90% of the
> features of NetBackup.  It's installed on a Solaris 10 server, with locally
> attached tape library, and it does backups directly from local disk to
> local
> tape.
>
>
This is an advantage of Solaris being a 1st class citizen in the NetBackup
world.  For a Unified Storage appliance, however, NDMP for file level
backup may be a reasonable choice (as Darren postulated earlier).  But if
you just buy a server and install Solaris, then the NetBackup Solaris agent
is the easiest route, as you've shown.

Thanks again, Ed, for your time and generosity.  And thank you to all
contributors to this thread for indulging my curiosity.

-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-19 Thread Khyron
Erik,

I don't think there was any confusion about the block nature of "zfs send"
vs. the file nature of star.  I think what this discussion is coming down
to is the best ways to utilize "zfs send" as a backup, since (as Darren
Moffat has noted) it supports all the ZFS objects and metadata.

I see 2 things coming out of this:

1. NDMP for putting "zfs send" streams on tape over the network.  So the
question I have now is for anyone who has used or is using NDMP on OSol.
How well does it work?  Pros?  Cons?  If people aren't using it, why not?
I think this is one area where there are some gains to be made on the OSol
backup front.

I still need to go back and look at the best ways to use local tape drives
on OSol file servers running ZFS to capture ZFS objects and metadata (ZFS
ACLs, ZVOLs, etc.).
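For a directly attached drive, the naive version is just a pipe (the device
name and block size are illustrative, and the usual caveat applies: one bad
block on tape invalidates the whole stream):

   # full replication stream: snapshots, properties, ZVOLs and ACLs included
   zfs snapshot -r tank/data@offsite
   zfs send -R tank/data@offsite | dd of=/dev/rmt/0n bs=1048576
   # and the way back
   dd if=/dev/rmt/0n bs=1048576 | zfs receive -d tank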

2. A new tool is required to provide some of the functionality desired, at
least as a supported backup method from Sun.  While someone in the
community may be interested in developing such a tool, Darren also noted
that the requisite APIs are private currently and still in flux.  They
haven't
yet stabilized and been published.

To Ed Harvey:

Some questions about your use of NetBackup on your secondary server:

1. Do you successfully backup ZVOLs?  We know NetBackup should be able
to capture datasets (ZFS file systems) using straight POSIX semantics.
2. What version of NetBackup are you using?
3. You simply run the NetBackup agent locally on the (Open)Solaris server?

I thank everyone who has participated in this conversation for sharing
their thoughts, experiences and realities.  It has been most informational.

On Fri, Mar 19, 2010 at 13:11, erik.ableson  wrote:

> On 19 mars 2010, at 17:11, Joerg Schilling wrote:
>
> >> I'm curious, why isn't a 'zfs send' stream that is stored on a tape yet
> >> the implication is that a tar archive stored on a tape is considered a
> >> backup ?
> >
> > You cannot get a single file out of the zfs send datastream.
>
> zfs send is a block-level transaction with no filesystem dependencies - it
> could be transmitting a couple of blocks that represent a portion of a file,
> not necessarily an entire file.  And since it can also be used to host a
> zvol with any filesystem format imaginable it doesn't want to know.
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Usage of hot spares and hardware allocation capabilities.

2010-03-19 Thread Khyron
Responses inline...

On Tue, Mar 16, 2010 at 07:35, Robin Axelsson
wrote:

> I've been informed that newer versions of ZFS supports the usage of hot
> spares which is denoted for drives that are not in use but available for
> resynchronization/resilvering should one of the original drives fail in the
> assigned storage pool.
>

That is the definition of a hot spare, at least informally.  ZFS has
supported this for some time (if not from the beginning; I'm not in a
position to answer that).  It is *not* new.
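For anyone following along, attaching one is a single command (the device
name is a placeholder):

   zpool add tank spare c5t0d0
   zpool status tank          # spares get their own section in the output
   # a spare that has taken over for a failed disk is released with
   # "zpool detach" once the failed drive has been replaced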


>
> I'm a little sceptical about this because even the hot spare will be
> running for the same duration as the other disks in the pool and therefore
> will be exposed to the same levels of hardware degradation and failures
> unless it is put to sleep during the time it is not being used for storage.
> So, is there a sleep/hibernation/standby mode that the hot spares operate in
> or are they on all the time regardless of whether they are in use or not?
>

Not that I am aware of, or have heard others report.  No such "sleep mode"
exists.  Sounds like you want a Copan storage system.  AFAIK, hot spares
are always spinning; that's why they are "hot".


>
> Usually the hot spare is on a not so well-performing SAS/SATA controller,
> so given the scenario of a hard drive failure upon which a hot spare has
> been used for resilvering of say a raidz2 cluster, can I move the resilvered
> hot spare to the faster controller by letting it take the faulty hard
> drive's space using the "zpool offline", "zpool online" commands?
>

Usually?  That's not my experience with multiple vendors' hardware RAID
arrays.  Usually it's on a channel used by storage disks.  Maybe someone
else has seen otherwise.  I'd be personally curious to know what system
puts a spare on a lower-performance channel.  That risks slowing the entire
device (RAID set/group) when the hot spare kicks in.

As for your questions, that doesn't make a lot of sense to me.  I don't
even get how that would work, but I'm not "Wile E. Coyote, Super Genius"
either.


>
> To be more general; are the hard drives in the pool "hard coded" to their
> SAS/SATA channels or can I swap their connections arbitrarily if I would
> want to do that? Will zfs automatically identify the association of each
> drive of a given pool or tank and automatically reallocate them to put the
> pool/tank/filesystem back in place?
>

No.  Each disk in the pool has a unique ID, as I understand.  Thus, you
should be able to move a disk to another location (channel, slot) and it
would still be a part of the same pool and VDEV.

All of that said, I saw this post when it originally came in.  I notice no
one has responded to it until now.  I don't know about anyone else, but I
know that I was offended when I read this.  For myself, I wasn't sure how
to take this when I read it.

Maybe you should not assume that people on this list don't know what
hot sparing is, or that ZFS just learned.  Just a suggestion.

-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-19 Thread Khyron
The point I think Bob was making is that FireWire is an Apple technology,
so they have a vested interest in making sure it works well on their
systems and with their OS.  They could even have a specific chipset that
they exclusively use in their systems, although I don't see why others
couldn't source it (with the exception that others may be too cheap to do
so).  Given these factors, it makes sense that FireWire performs
brilliantly on Apple hardware/software, while everyone else makes the bare
minimum (or less) investment in it, if that much.  So those open drivers,
while they could be useful for learning or other purposes, may not be
directly usable for the systems people are running with OpenSolaris.

At least, that's what I think Bob meant.

On Fri, Mar 19, 2010 at 17:08, Alex Blewitt  wrote:

> On 19 Mar 2010, at 15:30, Bob Friesenhahn wrote:
>
> > On Fri, 19 Mar 2010, Khyron wrote:
> >> Getting better FireWire performance on OpenSolaris would be nice though.
> >> Darwin drivers are open...hmmm.
> >
> > OS-X is only (legally) used on Apple hardware.  Has anyone considered
> that since Firewire is important to Apple, they may have selected a
> particular Firewire chip which performs particularly well?
>
> Darwin is open-source.
>
> http://www.opensource.apple.com/source/xnu/xnu-1486.2.11/
>
> http://www.opensource.apple.com/source/IOFireWireFamily/IOFireWireFamily-417.4.0/
>
> Alex




-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-19 Thread Khyron
I'm also a Mac user.  I use Mozy instead of DropBox, but it sounds like
DropBox should get a place at the table.  I'm about to download it in a few
minutes.

I'm right now re-cloning my internal HD due to some HFS+ weirdness.  I
have to completely agree that ZFS would be a great addition to MacOS X,
and the best imaginable replacement for HFS+.  The file system and
associated problems are my only complaint with the entire OS.  I guess my
browser usage pattern is just too much for HFS+.

Of course, I'm the only person I know who said that Sun should have
bought Apple 10 years ago.  What do I know?

Getting better FireWire performance on OpenSolaris would be nice though.
Darwin drivers are open...hmmm.

On Thu, Mar 18, 2010 at 18:19, David Magda  wrote:

> On Mar 18, 2010, at 14:23, Bob Friesenhahn wrote:
>
>  On Thu, 18 Mar 2010, erik.ableson wrote:
>>
>>>
>>> Ditto on the Linux front.  I was hoping that Solaris would be the
>>> exception, but no luck.  I wonder if Apple wouldn't mind lending one of the
>>> driver engineers to OpenSolaris for a few months...
>>>
>>
>> Perhaps the issue is the filesystem rather than the drivers.  Apple users
>> have different expectations regarding data loss than Solaris and Linux users
>> do.
>>
>
> Apple users (of which I am one) expect things to Just Work. :)
>
> And there are Apple users and Apple users:
>
> http://daringfireball.net/2010/03/ode_to_diskwarrior_superduper_dropbox
>
> If anyone Apple is paying attention, perhaps you could re-open discussions
> with now-Oracle about getting ZFS into Mac OS. :)
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-19 Thread Khyron
Ahhh, this has been... interesting... some real "personalities" involved in
this discussion.  :p  The following is long-ish, but I thought a recap was
in order.  I'm sure we'll never finish this discussion, but I want to at
least have a new plateau or base from which to consider these questions.

I've just read through EVERY post to this thread, so I want to recap the
best points in the vein of the original thread, and set a new base for
continuing the conversation.  Personally, I'm less interested in the
archival case; rather, I'm looking for the best way to either recover from
a complete system failure or recover an individual file or file set from
some backup media, most likely tape.

Now let's put all of this together, along with some definitions.  First,
the difference between archival storage (to tape or other media) and
backup.  I think the best definition provided in this thread came from
Darren Moffat as well.

As Carsten Aulbert mentioned, this discussion is fairly useless until we
start using the same terminology to describe a set of actions.

For this discussion, I am defining archival as taking the data and placing
it on some media - likely tape, but not necessarily - in the simplest
format possible that could hopefully be read by another device in the
future.  This could exclude capturing NTFS/NFSv4/ZFS ACLs, Solaris extended
attributes, or zpool properties (aka metadata for purposes of this
discussion).  With an archive, we may not go back and touch the data for a
long time, if ever again.

Backup, OTOH, is the act of making a perfect copy of the data to some media
(in my case tape, but again, not necessarily) which includes all of the
metadata associated with that data.  Such a copy would allow perfect
re-creation of the data in a new environment, recovery from a complete
system failure, or single file (or file set) recovery.  With a backup, we
have the expectation that we may need to return to it shortly after it is
created, so we have to be able to trust it... now.  Data restored from this
backup needs to be an exact replica of the original source - ZFS pool and
dataset properties, extended attributes, and ZFS ACLs included.

Now that I hopefully have common definitions for this conversation (and
I hope I captured Darren's meaning accurately), I'll divide this into 2
sections,
starting with NDMP.

NDMP:

For those who are unaware (and to clarify my own understanding), I'll take
a moment to describe NDMP.  NDMP was invented by NetApp to allow direct
backup of their Filers to tape backup servers, and eventually onto tape.
It is designed to remove the need for indirect backup via the NFS or CIFS
shared file systems mounted on the clients.  Instead, we back up the shared
file systems directly from the Filer (or other file server, say a Fishworks
box or an OpenSolaris server) to the backup server via the network.  We
avoid multiple copies of the shared file systems.  NDMP is a network-based
delivery mechanism to get data from a storage server to a backup server,
which is why the backup software must also speak NDMP.  Hopefully my
description is mostly accurate, and it is clear why this might be useful
for people using (Open)Solaris + ZFS for tape backup or archival purposes.

Darren Moffat made the point that NDMP could be used to do the tape
splitting, but I'm not sure this is accurate.  If "zfs send" from a file
server running (Open)Solaris to a tape drive over NDMP is viable -- which
it appears to be to me -- then the tape splitting would be handled by the
tape backup application.  In my world, that's typically NetBackup or some
similar enterprise offering.  I see no reason why it couldn't be Amanda or
Bacula or Arkeia or something else.  THIS is why I am looking for faster
progress on NDMP.

Now, NDMP doesn't do you much good for a locally attached tape drive, as
Darren and Svein pointed out.  However, provided the software which is
installed on this fictional server can talk to the tape in an appropriate
way, then all you have to do is pipe "zfs send" into it.  Right?  What did
I miss?

ZVOLs and NTFS/NFSv4/ZFS ACLs:

The answer is "zfs send" to both of my questions about ZVOLs and ACLs.

At the center of all of this attention is "zfs send".  As Darren Moffat
pointed out, it has all the pieces to do a proper, complete and correct
backup.  The big remaining issue that I see is how to place a "zfs send"
stream on a tape in a reliable fashion.  CR 6936195 would seem to handle
one complaint from Svein, Miles Nordin and others about reliability of the
send stream on the tape.  Again, I think NDMP may help answer this question
for file servers without attached tape devices.  For those with attached
tape devices, what's the equivalent answer?  Who is doing this, and how?  I
believe we've seen Ed Harvey say "NetBackup" and Ian Collins say "NetVault".
Do these products capture all the metadata required to call this copy a
"backup"?  That's my next question.

Finally, Damon Atkins said:

"But 

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
Ian,

When you say you spool to tape for off-site archival, what software do you
use?

On Wed, Mar 17, 2010 at 18:53, Ian Collins  wrote:




>
> I have been using a two stage backup process with my main client,
> send/receive to a backup pool and spool to tape for off site archival.
>
> I use a pair (on connected, one off site) of removable drives as single
> volume pools for my own backups via send/receive.
>
> --
> Ian.
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-17 Thread Khyron
For those following along, this is the e-mail I meant to send to the list
but instead sent directly to Tonmaus.  My mistake, and I apologize for
having to re-send.

=== Start ===

My understanding, limited though it may be, is that a scrub touches ALL
data that has been written, including the parity data.  It confirms the
validity of every bit that has been written to the array.  Now, there may
be an implementation detail that is responsible for the pathology that you
observed.  More than likely, I'd imagine.  Filing a bug may be in order.
Since triple-parity RAIDZ exists now, you may want to test with that by
grabbing a LiveCD or LiveUSB image from genunix.org.  Maybe RAIDZ3 has the
same (or worse) problems?
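For what it's worth, watching a scrub's impact while it runs is simple
enough (the pool name is a placeholder):

   zpool scrub tank
   zpool status tank           # shows percent complete and an estimated time
   zpool iostat -v tank 10     # per-vdev read load while the scrub runs
   zpool scrub -s tank         # stop it if production I/O suffers too much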

As for "scrub management", I pointed out the specific responses from Richard
where
he noted that scrub I/O priority *can* be tuned.  How you do that, I'm not
sure.
Richard, how does one tune scrub I/O priority?  Other than that, as I said,
I don't
think there is a model (publicly available anyway) describing scrub behavior
and how it
scales with pool size (< 5 TB, 5 TB - 50 TB, > 50 TB, etc.) or data layout
(mirror vs.
RAIDZ vs. RAIDZ2).  ZFS is really that new, that all of this needs to be
reconsidered
and modeled.  Maybe this is something you can contribute to the community?
ZFS
is a new storage system, not the same old file systems whose behaviors and
quirks
are well known because of 20+ years of history.  We're all writing a new
chapter in
data storage here, so it is incumbent upon us to share knowledge in order to
answer
these types of questions.

I think the questions I raised in my longer response are also valid and
need to be reconsidered.  There are large pools in production today.  So
how are people scrubbing these pools?  Please post your experiences with
scrubbing 100+ TB pools.

Tonmaus, maybe you should repost my other questions in a new, separate
thread?

=== End ===

On Tue, Mar 16, 2010 at 19:41, Tonmaus  wrote:

> > Are you sure that you didn't also enable
> > something which
> > does consume lots of CPU such as enabling some sort
> > of compression,
> > sha256 checksums, or deduplication?
>
> None of them is active on that pool or in any existing file system. Maybe
> the issue is particular to RAIDZ2, which is comparably recent. On that
> occasion: does anybody know if ZFS reads all parities during a scrub?
> Wouldn't it be sufficient for stale corruption detection to read only one
> parity set unless an error occurs there?
>
> > The main concern that one should have is I/O
> > bandwidth rather than CPU
> > consumption since "software" based RAID must handle
> > the work using the
> > system's CPU rather than expecting it to be done by
> > some other CPU.
> > There are more I/Os and (in the case of mirroring)
> > more data
> > transferred.
>
> What I am trying to say is that CPU may become the bottleneck for I/O in
> case of parity-secured stripe sets. Mirrors and simple stripe sets have
> almost 0 impact on CPU. So far at least my observations. Moreover, x86
> processors not optimized for that kind of work as much as i.e. an Areca
> controller with a dedicated XOR chip is, in its targeted field.
>
> Regards,
>
> Tonmaus
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-17 Thread Khyron
Ugh!  I meant that to go to the list, so I'll probably re-send it for the
benefit of everyone involved in the discussion.  There were parts of it
that I wanted others to read.

From a re-read of Richard's e-mail, maybe he meant that the number of I/Os
queued to a device can be tuned lower, and not the priority of the scrub
(as I took him to mean).  Hopefully Richard can clear that up.  I
personally stand corrected for misreading Richard there.

Of course the performance of a given system cannot be described until it is
built.  Again, my interpretation of your e-mail was that you were looking
for a model of the performance of concurrent scrub and I/O load on a RAIDZ2
VDEV that you could scale up from your "test" environment of 11 disks to a
200+ TB behemoth.  As I mentioned several times, I doubt such a model
exists, and I have not seen anything published to that effect.  I don't
know how useful it would be if it did exist, because the performance of
your disks would be a critical factor.  (Although *any* model beats no
model any day.)  Let's just face it: you're using a new storage system that
has not been modeled.  To get the model you seek, you will probably have to
create it yourself.

(It's notable that most of the ZFS models that I have seen have been done
by Richard.  Of course, they were MTTDL models, not scrub vs. I/O
performance models for different VDEV types.)

As for your point about building large pools from lots of mirror VDEVs, my
response is "meh".  I've said several times, and maybe you've missed it
several times, that there may be pathologies for which YOU should open
bugs.  RAIDZ3 may exhibit the same kind of pathologies you observed with
RAIDZ2.  Apparently RAIDZ does not.  I've also noticed (and I'm sure I'll
be corrected if I'm mistaken) that there is not a limit on the number of
VDEVs in a pool but single digit RAID VDEVs are recommended.  So there
is nothing preventing you from building (for example) VDEVs from 1 TB
disks.  If you take 9 x 1 TB disks per VDEV, and use RAIDZ2, you get 7 TB
usable.  That means about 29 VDEVs to get 200 TB.  Double the disk
capacity and you can probably get to 15 top level VDEVs.  (And you'll want
that RAIDZ2 as well since I don't know if you could trust that many disks,
whether enterprise or consumer.)  However, that number of top level VDEVs
sounds reasonable based on what others have reported.  What's been
proven to be "A Bad Idea(TM)" is putting lots of disks in a single VDEV.

Remember that ZFS is a *new* software system.  It is complex.  It will have
bugs.  You have chosen ZFS; it didn't choose you.  So I'd say you can
contribute to the community by reporting back your experiences, opening
bugs on things which make sense to open bugs on, testing configurations,
modeling, documenting and sharing.  So far, you just seem to be interested
in taking w/o so much as an offer of helping the community or developers to
understand what works and what doesn't.  All take and no give is not cool.
And if you don't like ZFS, then choose something else.  I'm sure EMC or
NetApp will willingly sell you all the spindles you want.  However, I think
it is
still early to write off ZFS as a losing proposition, but that's my opinion.

So far, you seem to be spending a lot of time complaining about a *new*
software system that you're not paying for.  That's pretty tasteless, IMO.

And now I'll re-send that e-mail...

P.S.: Did you remember to re-read this e-mail?  Read it 2 or 3 times and be
clear about what I said and what I did _not_ say.

On Wed, Mar 17, 2010 at 16:12, Tonmaus  wrote:

> Hi,
>
> I got a message from you off-list that doesn't show up in the thread even
> after hours. As you mentioned the aspect here as well I'd like to respond
> to, I'll do it from here:
>
> > Third, as for ZFS scrub prioritization, Richard
> > answered your question about that.  He said it is
> > low priority and can be tuned lower.  However, he was
> > answering within the context of an 11 disk RAIDZ2
> > with slow disks  His exact words were:
> >
> >
> > This could be tuned lower, but your storage
> > is slow and *any* I/O activity will be
> > noticed.
>
> Richard told us two times that scrub already is as low in priority as can
> be. From another message:
>
> "Scrub is already the lowest priority. Would you like it to be lower?"
>
>
> =
>
> As much as the comparison goes between "slow" and "fast" storage. I have
> understood that Richard's message was that with storage providing better
> random I/O zfs priority scheduling will perform significantly better,
> providing less degradation of concurrent load. While I am even inclined to
> buy that, nobody will be able to tell me how a certain system will behave
> until it was tested, and to what degree concurrent scrubbing still will be
> possible.
> Another thing: people are talking a lot about narrow vdevs and mirrors.
> However, when you need to build a 2[...]

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
To be sure, Ed, I'm not asking:

Why bother trying to backup with "zfs send" when there are fully supportable
and
working options available right NOW?

Rather, I am asking:

Why do we want to adapt "zfs send" to do something it was never intended
to do, and probably won't be adapted to do (well, if at all) anytime soon
instead of
optimizing existing technologies for this use case?


But I got it.  "zfs send" is fast.  Let me ask you this, Ed...where do you
"zfs send"
your data to? Another pool?  Does it go to tape eventually?  If so, what is
the setup
such that it goes to tape?  I apologize for asking here, as I'm sure you
described it
in one of the other threads I mentioned, but I'm not able to go digging in
those
threads at the moment.

I ask this because I see an opportunity to kill 2 birds with one stone.
With proper
NDMP support and "zfs send" performance, why can't you get the advantages of
"zfs send" without trying to shoehorn "zfs send" into a use it's not
designed for?

Maybe NDMP support needs to be a higher focus of the ZFS team?  I noticed not
many people even seem to be asking for it, never mind screaming for it.
However,
I did say this in my original e-mail - that I see NDMP support as being a
way to handle
the calls for "zfs send" to tape.

Maybe we can broaden the conversation at this point.  For all of those who
use
NDMP today to backup Filers, be they NetApp, EMC, or other vendors'
devices...how
is your experience with NDMP?  *IS* anyone using NDMP?  If you have the
option of
using NDMP and you don't, why don't you?  Backing up file servers directly
to tape
seems to be an obvious WIN, so if people aren't doing it, I'm curious why
they aren't.
That's any kind of file server, because (Open)Solaris will increasingly be
applied in this
role.  That was pretty much the goal of the Fishworks team, IIRC.  So this
looks like
an opportunity by Sun (Oracle) to take a neglected backup technology and
make it a
must-have backup technology, by making it integrate smoothly with ZFS and
high
performance.

On Wed, Mar 17, 2010 at 09:37, Edward Ned Harvey wrote:

> > The one thing that I keep thinking, and which I have yet to see
> > discredited, is that
> > ZFS file systems use POSIX semantics.  So, unless you are using
> > specific features
> > (notably ACLs, as Paul Henson is), you should be able to backup those
> > file systems
> > using well known tools.
>
> This is correct.  Many people do backup using tar, star, rsync, etc.
>
>
> > The Best Practices Guide is also very clear about send and receive NOT
> > being
> > designed explicitly for backup purposes.  I find it odd that so many
> > people seem to
> > want to force this point.  ZFS appears to have been designed to allow
> > the use of
> > well known tools that are available today to perform backups and
> > restores.  I'm not
> > sure how many people are actually using NFS v4 style ACLs, but those
> > people have
> > the most to worry about when it comes to using tar or NetBackup or
> > Networker or
> > Amanda or Bacula or star to backup ZFS file systems.  Everyone else,
> > which appears
> > to be the majority of people, have many tools to choose from, tools
> > they've used
> > for a long time in various environments on various platforms.  The
> > learning curve
> > doesn't appear to be as steep as most people seem to make it out to
> > be.  I honestly
> > think many people may be making this issue more complex than it needs
> > to be.
>
> I think what you're saying is:  Why bother trying to backup with "zfs send"
> when the recommended practice, fully supportable, is to use other tools for
> backup, such as tar, star, Amanda, bacula, etc.   Right?
>
> The answer to this is very simple.
> #1  "zfs send" is much faster.  Particularly for incrementals on large
> numbers of files.
> #2  "zfs send" will support every feature of the filesystem, including
> things like filesystem properties, hard links, symlinks, and objects which
> are not files, such as character special objects, fifo pipes, and so on.
> Not to mention ACL's.  If you're considering some other tool (rsync, star,
> etc), you have to read the man pages very carefully to formulate the exact
> backup command, and there's no guarantee you'll find a perfect backup
> command.  There is a certain amount of comfort knowing that the people who
> wrote "zfs send" are the same people who wrote the filesystem.  It's
> simple,
> and with no arguments, and no messing around with man page research, it's
> guaranteed to make a perfect copy of the whole filesystem.
>
> Did I mention fast?  ;-)  Prior to zfs, I backed up my file server via
> rsync.  It's 1TB of mostly tiny files, and it ran for 10 hours every night,
> plus 30 hours every weekend.  Now, I use zfs send, and it runs for an
> average 7 minutes every night, depending on how much data changed that day,
> and I don't know - 20 hours I guess - every month.
>
>
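
For readers following along, the nightly incremental Ed describes boils down
to something like this (pool and host names are made up for the example):

  zfs snapshot tank/data@2010-03-17
  zfs send -i tank/data@2010-03-16 tank/data@2010-03-17 | \
      ssh backuphost zfs receive -dF backuppool

The receiving side can just as easily be another pool on the same box; only
the blocks that changed between the two snapshots cross the wire, which is
where the speed comes from.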


-- 
"You can choose your friends, you can choose the deals." - Equity Private

"I

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
Exactly!

This is what I meant, at least when it comes to backing up ZFS datasets.
There
are tools available NOW, such as Star, which will backup ZFS datasets due to
the
POSIX nature of those datasets.  As well, Amanda, Bacula, NetBackup,
Networker
and probably some others I missed.  Re-inventing the wheel is not required
in these
cases.

As I said in my original e-mail, Star is probably perfect once it gets ZFS
(i.e. NFS v4) ACL and NDMP support (i.e. accepting NDMP input streams and
outputting onto tape).

ZVOLs are the piece I'm still not sure about though.  So I repeat my
question: how
are people backing up ZVOLs today?  (If Star could do ZVOLs as well as NDMP
and
ZFS ACLs, then it literally *is* perfect.)
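
(To frame that question a bit: the only generic approach I know of today is
snapshot-plus-send, since a ZVOL has no POSIX view to point tar at.  A sketch,
with made-up names:

  zfs snapshot tank/vols/vm01@backup
  zfs send tank/vols/vm01@backup | gzip > /backup/vm01.zfs.gz

That still leaves you with a send stream in a file rather than a proper tape
format, which is exactly the gap NDMP support could fill.)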

On Wed, Mar 17, 2010 at 09:01, Joerg Schilling <
joerg.schill...@fokus.fraunhofer.de> wrote:

> Stephen Bunn  wrote:
>
> > between our machine's pools and our backup server pool.  It would be
> > nice, however, if some sort of enterprise level backup solution in the
> > style of ufsdump was introduced to ZFS.
>
> Star can do the same as ufsdump does but independent of OS and filesystem.
>
> Star is currently missing support for ZFS ACLs and  for extended attributes
> from Solaris. If you are interested, make a test. If you need support for
> ZFS
> ACLs or Solaris extendd attributes, send me a note.
>
> Jörg
>
> --
>  
> EMail:jo...@schily.isdn.cs.tu-berlin.de(home)
>  Jörg Schilling D-13353 Berlin
>   j...@cs.tu-berlin.de(uni)
>   joerg.schill...@fokus.fraunhofer.de (work) Blog:
> http://schily.blogspot.com/
>  URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
Note to readers: There are multiple topics discussed herein.  Please
identify which
idea(s) you are responding to, should you respond.  Also make sure to take
in all of
this before responding.  Something you want to discuss may already be
covered at
a later point in this e-mail, including NDMP and ZFS ACLs.  It's long.

It seems to me that something is being overlooked (either by myself or
others) in all
of these discussions about backing up ZFS pools...

The one thing that I keep thinking, and which I have yet to see discredited,
is that
ZFS file systems use POSIX semantics.  So, unless you are using specific
features
(notably ACLs, as Paul Henson is), you should be able to backup those file
systems
using well known tools.  The ZFS Best Practices Guide speaks to this in
section 4.4
(specifically 4.4.3[1]) and there have been various posters who have spoken
of using
other tools.  (Star comes to mind, most prominently.)

The Best Practices Guide is also very clear about send and receive NOT being
designed explicitly for backup purposes.  I find it odd that so many people
seem to
want to force this point.  ZFS appears to have been designed to allow the
use of
well known tools that are available today to perform backups and restores.
I'm not
sure how many people are actually using NFS v4 style ACLs, but those people
have
the most to worry about when it comes to using tar or NetBackup or Networker
or
Amanda or Bacula or star to backup ZFS file systems.  Everyone else, which
appears
to be the majority of people, have many tools to choose from, tools they've
used
for a long time in various environments on various platforms.  The learning
curve
doesn't appear to be as steep as most people seem to make it out to be.  I
honestly
think many people may be making this issue more complex than it needs to be.
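
To make that concrete, the "well known tools" approach is normally run against
a snapshot so the backup sees a frozen image, roughly:

  zfs snapshot tank/home@nightly
  cd /tank/home/.zfs/snapshot/nightly
  tar cf - . | your-backup-tool-of-choice

Substitute star, cpio, or an enterprise agent for tar as you like; the point
is that the snapshot directory is just a read-only POSIX tree.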

Maybe the people having the most problems are those who are new to Solaris,
but
if you have any real *nix experience, Solaris shouldn't be that difficult to
figure out,
especially for those with System V experience.  The Linux folks?  Well, I
sorta feel
sorry for you and I sorta don't.

So, am I missing something?  It wouldn't surprise me if I am.  What am I
missing?

The other things I have been thinking about are NDMP support and what tools
out
there support NFS v4 ACLs.

Has anyone successfully used NDMP support with ZFS?  If so, what did you
do?  How
did you configure your system, including any custom coding you did?  From
the looks
of the NDMP project on os.org, NDMP was integrated in build 102[3] but it
appears
to only be NDMP v4 not the latest, v5.  Maybe NDMP support would placate
some of
those screaming for the send stream to be a tape backup format?

As for ACLs[2], the list of tools supporting NFS v4 ACLs seems to be pretty
small.  I
plan to spend some quality time with RFC 3530 to get my head around NFS v4,
and
ACLs in particular.  star seems to be fairly adept, with the exception of
the NFS v4
ACL support.  Hopefully that is forthcoming?  Again, I think those people
who are
not using ZFS ACLs can probably perform actual tape backups (should they
choose
to) with existing tools.  If I'm mistaken or missing something, I invite
someone to
please point it out.
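
For anyone wondering whether they are even using ZFS ACLs before worrying
about tool support, the native commands show them directly (the file and user
names here are just examples):

  ls -V /tank/home/somefile
  chmod A+user:webservd:read_data:allow /tank/home/somefile

If "ls -V" only ever shows the trivial owner@/group@/everyone@ entries, the
plain tar-style tools will likely serve you fine.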

Finally, there's backup of ZVOLs.  I don't know what the commercial tool
support
for backing up ZVOLs looks like but I know this is the *perfect* place for
NDMP.
Backing up ZVOLs should be priority #1 for NDMP support in (Open)Solaris, I
think.
Looking through the symbols in libzfs.so, I don't see anything specifically
related to
backup of ZVOLs in the existing code.  How are people handling ZVOL backups
today?

Not to be too flip, but star looks like it might be the perfect tape backup
software
if it supported NDMP, NFS v4 ACLs and ZVOLs.  Just thinking out loud...

[1]
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Using_ZFS_With_Enterprise_Backup_Solutions

[2] http://docs.sun.com/app/docs/doc/819-5461/ftyxi?l=en&a=view

[3] http://hub.opensolaris.org/bin/view/Project+ndmp/

Aside: I see so many posts to this list about backup strategy for ZFS file
systems,
and I continue to be amazed by how few people check the archives for
previous
discussions before they start a new one.  So many of the conversations are
repeated over and over, with good information being spread over multiple
threads?
I personally find it interesting that so few people read first before
posting.  Few
even seem to bother to do so much (little?) as a Google search which would
yield
several previous discussions on the topic of ZFS pool backups to tape.

Oh well.

-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-16 Thread Khyron
The issue as presented by Tonmaus was that a scrub was negatively impacting
his RAIDZ2 CIFS performance, but he didn't see the same impact with RAIDZ.
I'm not going to say whether that is a "problem" one way or the other; it
may
be expected behavior under the circumstances.  That's for ZFS developers to
speak on.  (This was one of many issues Tonmaus mentioned.)

However, what was lost was the context.  Tonmaus reported this behavior on
a commodity server using slow disks in an 11 disk RAIDZ2 set.  However, he
*really* wants to know if this will be an issue on a 100+ TB pool.  So his
examples were given on a pool that was possibly 5% of the size of the pool
that he actually wants to deploy.  He never said any of this in the original
e-mail, so
Richard assumed the context to be the smaller system.  That's why I pointed
out all of the discrepancies and questions he could/should have asked which
might have yielded more useful answers.

There's quite a difference between the 11 disk RAIDZ2 set and a 100+ TB ZFS
pool, especially when the use case, VDEV layout and other design aspects of
the 100+ TB pool have not been described.

On Tue, Mar 16, 2010 at 13:41, David Dyer-Bennet  wrote:

>
> On Tue, March 16, 2010 11:53, thomas wrote:
> > Even if it might not be the best technical solution, I think what a lot
> of
> > people are looking for when this comes up is a knob they can use to say
> "I
> > only want X IOPS per vdev" (in addition to low prioritization) to be used
> > while scrubbing. Doing so probably helps them feel more at ease that they
> > have some excess capacity on cpu and vdev if production traffic should
> > come along.
> >
> > That's probably a false sense of moderating resource usage when the
> > current "full speed, but lowest prioritization" is just as good and would
> > finish quicker.. but, it gives them peace of mind?
>
> I may have been reading too quickly, but I have the impression that at
> least some of the people not happy with the current prioritization were
> reporting severe impacts to non-scrub performance when a scrub was in
> progress.  If that's the case, then they have a real problem, they're not
> just looking for more peace of mind in a hypothetical situation.
> --
> David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
> Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
> Photos: http://dd-b.net/photography/gallery/
> Dragaera: http://dragaera.info
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-16 Thread Khyron
In following this discussion, I get the feeling that you and Richard are
somewhat
talking past each other.  He asked you about the hardware you are currently
running
on, whereas you seem to be interested in a model for the impact of scrubbing
on
I/O throughput that you can apply to some not-yet-acquired hardware.

It should be clear by now that the model you are looking for does not exist
given
how new ZFS is, and Richard has been focusing his comments on your existing
(home)
configuration since that is what you provided specs for.

Since you haven't provided specs for this larger system you may be
purchasing in the
future, I don't think anyone can give you specific guidance on what the I/O
impact of
scrubs on your configuration will be.  Richard seems to be giving more
design guidelines
and hints, and just generally good to know information to keep in mind while
designing
your solution.  Of course, he's been giving it in the context of your 11
disk wide
RAIDZ2 and not the 200 TB monster you only described in the last e-mail.

Stepping back, it may be worthwhile to examine the advice Richard has given,
in the
context of the larger configuration.

First, you won't be using commodity hardware for your enterprise-class
storage system,
will you?

Second, I would imagine that as a matter of practice, most people schedule
their pools
to scrub as far away from prime hours as possible.  Maybe it's possible, and
maybe it's
not.  The question to the larger community should be "who is running a 100+
TB pool
and how have you configured your scrubs?"  Or even "for those running 100+
TB pools,
do your scrubs interfere with your production traffic/throughput?  If so,
how do you
compensate for this?"

Third, as for ZFS scrub prioritization, Richard answered your question about
that.  He
said it is low priority and can be tuned lower.  However, he was answering
within the
context of an 11 disk RAIDZ2 with slow disks.  His exact words were:

"This could be tuned lower, but your storage is slow and *any* I/O activity
will be
noticed."

If you had asked about a 200 TB enterprise-class pool, he may have had a
different
response.  I don't know if ZFS will make different decisions regarding I/O
priority on
commodity hardware as opposed to enterprise hardware, but I imagine it does
*not*.
If I'm mistaken, someone should correct me.  Richard also said "In b133, the
priority
scheduler will work better than on older releases."  That may not be an
issue since
you haven't acquired your hardware YET, but again, Richard didn't know that
you
were talking about a 200 TB behemoth because you never said that.

Fourth, Richard mentioned a wide RAIDZ2 set.  Hopefully, if nothing else,
we've
seen that designing larger ZFS storage systems with pools composed of
smaller top
level VDEVs works better, and preferably mirrored top level VDEVs in the
case of lots
of small, random reads.  You didn't indicate the profile of the data to be
stored on
your system, so no one can realistically speak to that.  I think the general
guidance
is sound.  Multiple top level VDEVs, preferably mirrors.  If you're creating
RAIDZ2
top level VDEVs, then they should be smaller (narrower) in terms of the
number of
disks in the set.  11 would be too many, based on what I have seen and heard
on
this list cross referenced with the (little) information you have provided.

RAIDZ2 would appear to require more CPU power than RAIDZ, based on the report
you gave, and thus may have more of a negative impact on the performance of
your storage system.  I'll cop to that.  However, you never mentioned how your 200 TB
behemoth
system will be used, besides an off-hand remark about CIFS.  Will it be
serving CIFS?
NFS?  Raw ZVOLs over iSCSI?  You never mentioned any of that.  Asking about
CIFS
if you're not going to serve CIFS doesn't make much sense.  That would
appear to
be another question for the ZFS gurus here -- WHY does RAIDZ2 cause so much
negative performance impact on your CIFS service while RAIDZ does not?  Your
experience is that a scrub of a RAIDZ2 maxed CPU while a RAIDZ scrub did
not, right?

Fifth, the pool scrub should probably be as far away from peak usage times
as possible.
That may or may not be feasible, but I don't think anyone would disagree
with that
advice.  Again, I know there are people running large pools who perform
scrubs.  It
might be worthwhile to directly ask what these people have experienced in
terms of
scrub performance on RAIDZ vs. RAIDZ2, or in general.
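
On that point, the scheduling side at least is trivial to do from cron; a
minimal sketch (pool name invented):

  # root's crontab: scrub the big pool early Sunday morning
  0 2 * * 0 /usr/sbin/zpool scrub bigpool

The open question for a 100+ TB pool is whether the scrub finishes before the
next business day starts, which is exactly the experience report worth asking
the list for.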

Finally, I would also note that Richard has been very responsive to your
questions (in
his own way) but you increasingly seem to be hostile and even disrespectful
toward
him.  (I've noticed this in more than one of your e-mails; they sound
progressively
more self-centered and selfish.  That's just my opinion.)  If this is a
community, that's
not a helpful way to treat a senior member of the community, even if he's
not
answering the question you want answered.

Keep in mind that asking the wrong questions [...]

Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Khyron
Yeah, this threw me.  A 3 disk RAID-Z2 doesn't make sense, because in terms of
redundancy, RAID-Z2 looks like RAID 6.  That is, there are 2 levels of parity
for the data.  Out of 3 disks, the equivalent of 2 disks will be used to store
redundancy (parity) data and only 1 disk equivalent will store actual data.
This is what others might term a "degenerate case of 3-way mirroring", except
with a lot more computational overhead since we're performing 2 parity
calculations.
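
That also goes a long way toward answering the space question below: for a
RAIDZ pool, "zpool list" reports raw capacity with parity included, while
"zfs list" reports usable space after parity.  Roughly:

  408 GB raw over 3 disks, only 1 disk's worth holds data -> ~136 GB usable total
  237 GB raw free / 3                                      ->  ~79 GB usable free

which lines up with the ~76 GB AVAIL the file systems report (the difference
being metadata and other overhead).  And if the goal really is to survive two
failures out of three disks, a plain 3-way mirror gives the same usable
capacity with far less parity math, e.g. (this re-creates the pool, so the
data would have to be evacuated first):

  zpool create xpool mirror c8t1d0 c8t2d0 c8t3d0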

I'm curious what the purpose of creating a 3 disk RAID-Z2 pool is/was?
(For my own personal edification.  Maybe there is something for me to learn
from this example.)

Aside: Does ZFS actually create the pool as a 3-way mirror, given that this
configuration is effectively the same?  This is a question for any of the
ZFS
team who may be reading but I'm curious now.

On Mon, Mar 15, 2010 at 10:38, Michael Hassey  wrote:

> Sorry if this is too basic -
>
> So I have a single zpool in addition to the rpool, called xpool.
>
> NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
> rpool   136G   109G  27.5G   79%  ONLINE  -
> xpool   408G   171G   237G   42%  ONLINE  -
>
> I have 408 in the pool, am using 171 leaving me 237 GB.
>
> The pool is built up as;
>
>  pool: xpool
>  state: ONLINE
>  scrub: none requested
> config:
>
>NAMESTATE READ WRITE CKSUM
>xpool   ONLINE   0 0 0
>  raidz2ONLINE   0 0 0
>c8t1d0  ONLINE   0 0 0
>c8t2d0  ONLINE   0 0 0
>c8t3d0  ONLINE   0 0 0
>
> errors: No known data errors
>
>
> But - and here is the question -
>
> Creating file systems on it, and the file systems in play report only 76GB
> of space free
>
> <<<>>
>
> xpool/zones/logserver/ROOT/zbe 975M  76.4G   975M  legacy
> xpool/zones/openxsrvr 2.22G  76.4G  21.9K
>  /export/zones/openxsrvr
> xpool/zones/openxsrvr/ROOT2.22G  76.4G  18.9K  legacy
> xpool/zones/openxsrvr/ROOT/zbe2.22G  76.4G  2.22G  legacy
> xpool/zones/puggles241M  76.4G  21.9K
>  /export/zones/puggles
> xpool/zones/puggles/ROOT   241M  76.4G  18.9K  legacy
> xpool/zones/puggles/ROOT/zbe   241M  76.4G   241M  legacy
> xpool/zones/reposerver 299M  76.4G  21.9K
>  /export/zones/reposerver
>
>
> So my question is, where is the space from xpool being used? or is it?
>
>
> Thanks for reading.
>
> Mike.
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Disk label reference info

2010-03-14 Thread Khyron
I thought pointing out some of this information might come in handy for some
of the
folks who are new to the (Open)Solaris world.

The following section discusses differences between SMI labels (aka VTOC)
and EFI
GPT labels.  It may not be everything one needs to know in order to
successfully
manage disks with ZFS, it contains a fair amount.  I definitely recommend
reading for
anyone new to ZFS and/or (Open)Solaris.  SMI labels are not evil.  Dated,
probably,
but they are not as difficult to understand or use as people seem to want to
make
them out to be.

See the "About Disk Labels" and "About Disk Slices" sections, in
particular.  The
"Comparison of the EFI Label and the VTOC Label" might also be helpful.

http://docs.sun.com/app/docs/doc/817-5093/disksconcepts-1?a=view

For tasks and concepts around administering disks, see:

http://docs.sun.com/app/docs/doc/817-5093/disksprep-31030?a=view

Good adjunct information to the first link.

Finally, the Wikipedia entry for the EFI GPT disk label:

http://en.wikipedia.org/wiki/GUID_Partition_Table

Hopefully this will be useful to someone.  I apologize if others feel this
is not the
place for this, but I so often see questions about disk labeling,
partitioning, and
other associated topics.  It seemed like it might be helpful to some
people.  I
promise that my next post will deal with a very ZFS specific topic.

Cheers!

-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] getting drive serial number

2010-03-08 Thread Khyron
I'm imagining that OpenSolaris isn't *too* different from Solaris 10 in this
regard.

I believe Richard Elling recommended "cfgadm -v".  I'd also suggest
"iostat -E", with and without "-n" for good measure.

So that's "iostat -E" and "iostat -En".  As long as you know the physical
drive
specification for the drive (ctd which appears to be c9t1d0 from
the other e-mail you sent), "iostat -E" has never failed me.  If you need to

know the drive identifier, then that's an additional issue.
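
A concrete invocation (with that hypothetical device name), since the serial
is buried a few lines down in the output:

  iostat -En c9t1d0

and look for the "Serial No:" field, which you can then match against the
sticker on the physical drive.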

On Sun, Mar 7, 2010 at 13:30, Ethan  wrote:

> I have a failing drive, and no way to correlate the device with errors in
> the zpool status with an actual physical drive.
> If I could get the device's serial number, I could use that as it's printed
> on the drive.
> I come from linux, so I tried dmesg, as that's what's familiar (I see that
> the man page for dmesg on opensolaris says that I should be using syslogd
> but I haven't been able to figure out how to get the same output from
> syslogd). But, while I see at the top the serial numbers for some other
> drives, I don't see the one I want because it seems to be scrolled off the
> top.
> Can anyone tell me how to get the serial number of my failing drive? Or
> some other way to correlate the device with the physical drive?
>
> -Ethan
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>


-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-17 Thread Khyron
Ugh!  If you received a direct response from me instead of via the list,
apologies for that.


Rob:

I'm just reporting the news.  The RFE is out there.  Just like SLOGs, I
happen to
think it a good idea, personally, but that's my personal opinion.  If it
makes dedup
more usable, I don't see the harm.


Taemun:

The issue, as I understand it, is not "use-lots-of-cpu" or "just dies from
paging".  I
believe it is more to do with all of the small, random reads/writes in
updating the
DDT.

Remember, the DDT is stored within the pool, just as the ZIL is if you don't
have
a SLOG.  (The S in SLOG standing for "separate".)  So all the DDT updates
are in
competition for I/O with the actual data deletion.  If the DDT could be
stored as
a separate VDEV already, I'm sure a way would have been hacked together by
someone (likely someone on this list).  Hence, the need for the RFE to
create this
functionality where it does not currently exist.  The DDT is separate from
the ARC
or L2ARC.
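
As an aside, you can get a feel for how big the DDT has grown on an existing
pool with zdb; a quick, read-only check (pool name is just an example):

  zdb -DD tank      # dedup table statistics plus a histogram
  zpool list tank   # the DEDUP column shows the current dedup ratio

Those numbers make it easier to reason about whether the table could ever have
fit in ARC/L2ARC in the first place.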

Here's the bug:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566

If I'm incorrect, someone please let me know.


Markus:

Yes, the issue would appear to be dataset size vs. RAM size.  Sounds like an
area
ripe for testing, much like RAID Z3 performance.


Cheers all!

On Tue, Feb 16, 2010 at 00:20, taemun  wrote:

> The system in question has 8GB of ram. It never paged during the
> import (unless I was asleep at that point, but anyway).
>
> It ran for 52 hours, then started doing 47% kernel cpu usage. At this
> stage, dtrace stopped responding, and so iopattern died, as did
> iostat. It was also increasing ram usage rapidly (15mb / minute).
> After an hour of that, the cpu went up to 76%. An hour later, CPU
> usage stopped. Hard drives were churning throughout all of this
> (albeit at a rate that looks like each vdev is being controlled by a
> single threaded operation).
>
> I'm guessing that if you don't have enough ram, it gets stuck on the
> use-lots-of-cpu phase, and just dies from too much paging. Of course,
> I have absolutely nothing to back that up.
>
> Personally, I think that if L2ARC devices were persistent, we already
> have the mechanism in place for storing the DDT as a "separate vdev".
> The problem is, there is nothing you can run at boot time to populate
> the L2ARC, so the dedup writes are ridiculously slow until the cache
> is warm. If the cache stayed warm, or there was an option to forcibly
> warm up the cache, this could be somewhat alleviated.
>
> Cheers
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-15 Thread Khyron
The DDT is stored within the pool, IIRC, but there is an RFE open to allow
you to
store it on a separate top level VDEV, like a SLOG.

The other thing I've noticed with all of the "destroyed a large dataset with
dedup enabled and it's taking forever to import/destroy" threads is [...]

[...] wrote:

> Just thought I'd chime in for anyone who had read this - the import
> operation completed this time, after 60 hours of disk grinding.
>
> :)
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Different Hash algorithm

2010-02-07 Thread Khyron
Well, it's an attack, right?  Neither Skein nor Threefish has been
compromised.
In fact, this is what you want to see - researchers attacking an algorithm -
because that scrutiny goes a long way toward proving (or disproving) the
security of said algorithm.  I
think I agree with Darren overall, but this still looks promising because
these
researchers, while attacking Threefish and clearly finding some way to
simplify
a further attack, have still not managed to compromise it.  Exposing the
algo
to the scrutiny of the community will either help strengthen it, or expose
its
weakness, and all will be better as a result (in theory).

I am now curious, though, along with David, as to the reason Skein in
particular
was pointed out?  Is there any particular reason, or is it just that Joerg
came
across it while working on his blog posts?  There may not be a reason, which
is
perfectly fine, but for the sake of curiosity, if there is one, please share
Joerg.
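
For context, the hash in question is just a per-dataset property today, with
fletcher4 and sha256 as the practical choices, e.g.:

  zfs set checksum=sha256 tank/data
  zfs set dedup=sha256,verify tank/data

so a vetted Skein (or whichever SHA-3 candidate wins) would presumably show up
as simply another accepted value for those properties.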

On Sun, Feb 7, 2010 at 15:53, David Magda  wrote:

>
> On Feb 7, 2010, at 15:10, Darren J Moffat wrote:
>
>  On 07/02/2010 20:07, Joerg Moellenkamp wrote:
>>
>>> Hello,
>>>
>>> while writing some articles about dedup, hashes and ZFS for my blog, i
>>> asked myself: When fletcher4 is fast, but collision prone and sha256 is
>>> slower, but relatively secure, wouldn't it be reasonable to integrate
>>> Skein (http://www.schneier.com/skein.pdf) into ZFS to yield faster
>>> checksumming as well as a reduced probability of false positive
>>> deduplications due to hash collisions?
>>>
>>
>> If Skein passes the cryptanlaysis for the SHA3 competition being run by
>> NIST and is the winner of that competition or is otherwise considered sounds
>> by the crypto community then yes until then I think it is premature to do so
>> as it is a very new algorithm.
>>
>
> A new attack on Threefish (which Skein is based on) was recently announced:
>
>http://www.schneier.com/blog/archives/2010/02/new_attack_on_t.html
>
> Any reason why the OP prefers Skein over any of the other SHA-3 candidates?
>
>http://en.wikipedia.org/wiki/NIST_hash_function_competition
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help: Advice for a NAS

2009-08-10 Thread Khyron
I think the point, Chester, which everyone seems to be dancing around
or missing, is that your planning may need to go back to the drawing board
on this one.  Absorb the resources out there for how to best configure
your pools and vdevs, *then* implement.  That's the most efficient way to go
about doing what you want to do.  You can't add a single disk to an existing
vdev, as near as I can tell.
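
To put commands to it (hypothetical device names): you can grow a pool by
adding a whole new top-level vdev, but you cannot widen an existing raidz
vdev:

  zpool add tank raidz c5t0d0 c5t1d0 c5t2d0   # fine: new top-level vdev, pool grows
  zpool attach tank c5t0d0 c5t3d0             # only works for plain disks and
                                              # mirrors, not raidz members

which is why the usual advice is to settle the vdev layout up front.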

There are other discussions on this list about expanding or transforming
vdevs
of a certain type to another type, but this functionality appears to be low on
the ZFS development team's list of priorities (probably with good reason) and
somewhat high on complexity.

Now, I'll include my list of things you might want to go back and read (and
possibly re-read) to do planning for your migration from the current RAIDZ
to
a new implementation of ZFS that gives you your desired outcome.  Please
excuse me if you have spent time with these, but I figure it's worth saying
just in case:

ZFS Admin Guide:
http://docs.sun.com/app/docs/doc/819-5461?l=en
ZFS @ Solaris Internals:
http://www.solarisinternals.com//wiki/index.php?title=Category:ZFS

Of course, the ZFS community on OpenSolaris.org:
http://opensolaris.org/os/community/zfs/

No need to mention zfs-discuss.

Anyway, the answer is that what you say you want to do, you cannot do.
There is probably another way to accomplish your ultimate goal, but you
want to get clear about that goal in DETAIL then see how, or if, you can
use ZFS to achieve that goal.  Since ZFS hails from a world of people who
are fairly accustomed to thinking in detailed, enterprise terms, you'll
either
want to start looking at things in this way or seek out solutions which fit
how you want to use your software and hardware.

HTH.

Cheers!

On Mon, Aug 10, 2009 at 00:17, Chester  wrote:

> Hi guys,
>
> Previously, I had three 1TB drives in my desktop using the Intel's
> southbridge RAID for storage.  The only problem with that is every time
> Windows Vista took a dump, I would be in jeopardy of corrupting the storage
> space; thus, I decided to have a dedicated machine just for serving up
> files.
>
> I now have a 3ware 9650SE 16 port host controller and currently four 2TB
> Western Digital WD2002FYPS drives (I also have a leftover drive from an old
> machine that's currently the boot drive), Supermicro server MB and a Celeron
> 440 2GHz chip (primarily because it's 35 watts and I wanted to minimize power
> usage when the machine sat idle).
>
> I was wary about using a Microsoft OS as I wanted it to be reliable and
> also didn't want to have to pay for another license, etc.  I tried FreeNAS
> and it was ok, but somewhat limited in what you can do with it.  I have a
> squeezebox and would want to install their SqueezeCenter server to serve up
> lossless compressed audio files.  I was also intrigued by ZFS after reading
> so much about it.
>
> I did play around with creating a raidz1 zpool, but then learned that the
> current implementation is somewhat limited in that you can't expand the
> raidset.  Using a raidz storage pool would also not take advantage of the
> 3ware's dedicated hardware for computing parity, etc.  Anyway, I
> successfully created a RAID5 set using three drives and created a zpool.  I
> then migrated the RAID5 set to add an additional 2TB drive (took 3 days!).
>  The server is currently down because I needed to use the RAM elsewhere, but
> after expanding the storage area, the filesystem still stayed the original
> size using the three 2TB drives.  I tried finding a way to get the full
> space allocation, but it seems that many use simple SATA ports and the raidz
> solution.  I'm also a n00b, so any advice would be greatly appreciated.  If
> you think I'm going down the wrong path, I would like to hear it.
>
> TIA,
> Chester
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

AlphaGuy - http://alphaguy.blogspot.com
On Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss