Re: [zfs-discuss] Scrub not completing?

2010-03-17 Thread Ian Collins

On 03/18/10 11:09 AM, Bill Sommerfeld wrote:

On 03/17/10 14:03, Ian Collins wrote:

I ran a scrub on a Solaris 10 update 8 system yesterday and it is 100%
done, but not complete:

   scrub: scrub in progress for 23h57m, 100.00% done, 0h0m to go
If blocks that have already been visited are freed and new blocks are 
allocated, the seen:allocated ratio is no longer an accurate estimate 
of how much more work is needed to complete the scrub.


Before the scrub prefetch code went in, I would routinely see scrubs 
last 75 hours which had claimed to be "100.00% done" for over a day.




Interesting comparison: yesterday's scrub counted down from about 25 
hours to go, while today's is reporting:


scrub: scrub in progress for 7h36m, 15.86% done, 40h22m to go

Not much has changed in the pool overnight.
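
If it sits at 100% again, one way to confirm it is still making progress is 
to watch per-device activity, as Bill suggests elsewhere in the thread, e.g. 
(pool name is a placeholder):

# zpool iostat -v tank 30

As long as the data disks keep showing steady reads, the scrub is still 
running even though the percentage is pegged at 100.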

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] lazy zfs destroy

2010-03-17 Thread Chris Paul
OK I have a very large zfs snapshot I want to destroy. When I do this, the 
system nearly freezes during the zfs destroy. This is a Sun Fire X4600 with 
128GB of memory. Now this may be more of a function of the IO device, but let's 
say I don't care that this zfs destroy finishes quickly. I actually don't care, 
as long as it finishes before I run out of disk space.

So a suggestion for room for growth for the zfs suite is the ability to lazily 
destroy snapshots, such that the destroy goes to sleep if the cpu idle time 
falls under a certain percentage.

How doable is that?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Damon Atkins
I vote for zfs needing a backup and restore command against a snapshot.

The backup command should output on stderr at least
Full_Filename SizeBytes Modification_Date_1970secSigned
so that backup software can build indexes, while stdout carries the data.

The advantage of zfs providing the command is that as ZFS is upgraded or new 
features are added, backup vendors do not need to re-test their code. It could 
also mean that when encryption comes along, a property on the pool could 
indicate whether it is OK to decrypt only the filenames as part of a backup.

Restore would work the same way, except that you would pass a filename or a 
directory to restore, and the backup software would send the stream back to 
the zfs restore command.
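
To illustrate the proposed interface (these commands do not exist today; the 
names, options and index format are hypothetical, following the description 
above):

# zfs backup tank/home@monday 2> /var/backup/monday.index | vendor-store monday
# vendor-retrieve monday | zfs restore tank/home /export/home/docs

The index on stderr would let the backup product build its catalogue without 
having to understand the stream format on stdout.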

The other alternative is for zfs to provide a standard API for backups like 
Oracle does for RMAN.

It would be very useful with snapshots across pools
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6916404
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-17 Thread David Dyer-Bennet

On 3/17/2010 21:07, Ian Collins wrote:


I have a couple of x4540s which use ZFS send/receive to replicate each 
other hourly.  Each box has about 4TB of data, with maybe 10G of 
changes per hour.  I have run the replication every 15 minutes, but 
hourly is good enough for us.




What software version are you running?  And can you show me the zfs send 
/ zfs receive commands for the incremental case that are working for 
you?  My incremental replication streams never complete, they hang 
part-way through (and then require system reboot to free up the IO 
system).   I'm running 2009.06, which is 111b.


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Daniel Carosone
On Wed, Mar 17, 2010 at 08:43:13PM -0500, David Dyer-Bennet wrote:
> My own stuff is intended to be backed up by a short-cut combination --  
> zfs send/receive to an external drive, which I then rotate off-site (I  
> have three of a suitable size).  However, the only way that actually  
> works so far is to destroy the pool (not just the filesystem) and  
> recreate it from scratch, and then do a full replication stream.  That  
> works most of the time, hangs about 1/5.  Anything else I've tried is  
> much worse, with hangs approaching 100%.
>
> I've posted here a few times about it; as I say, I'm in waiting for next  
> stable release mode, we'll see how that does, and I'll push much harder  
> for some kind of resolution if I still have trouble then.

My guess, asserted without substantiation, is that this could well be
related to the usb channel and drivers as much as to anything with
zfs.   In any case, if the new release as a package solves the
problem, it doesn't necessarily matter to you which specific
change(s) within made the difference.

If it still shows up in the current dev builds, or the release
shortly, then some process of elimination to narrow down the problem
area would be worthwhile.   In particular, some umass bridge chips are
frankly shite, and you may be unlucky enough to have those between
your host and your backup disks.

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-17 Thread Ian Collins

On 03/18/10 01:03 PM, Matt wrote:

Snipping the iSCSI and SAS questions...


Later on, I would like to add a second lower spec box to continuously (or 
near-continuously) mirror the data (using a gig crossover cable, maybe).  I have 
seen lots of ways of mirroring data to other boxes which has left me with more 
questions than answers.  Is there a simple, robust way of doing this without 
setting up a complex HA service and at the same time minimising load on the 
master?

   
The answer really depends on how current you wish to keep your backup 
and how much data you have to replicate.


I have a couple of x4540s which use ZFS send/receive to replicate each 
other hourly.  Each box has about 4TB of data, with maybe 10G of changes 
per hour.  I have run the replication every 15 minutes, but hourly is 
good enough for us.
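
For reference, an hourly incremental of that sort boils down to something 
like the following sketch (dataset, snapshot and host names are placeholders, 
not my exact script):

# zfs snapshot -r tank/data@2010-03-18-09
# zfs send -R -i tank/data@2010-03-18-08 tank/data@2010-03-18-09 | \
    ssh backuphost zfs receive -F -d tank

Each run takes a new recursive snapshot and sends only the delta from the 
previous one; -F on the receiving side rolls the target back to the last 
common snapshot if anything has changed there.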


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread David Dyer-Bennet

On 3/17/2010 17:53, Ian Collins wrote:

On 03/18/10 03:53 AM, David Dyer-Bennet wrote:


Also, snapshots.  For my purposes, I find snapshots at some level a very
important part of the backup process.  My old scheme was to rsync from
primary ZFS pool to backup ZFS pool, and snapshot both pools (with
somewhat different retention schedules).  My new scheme, forced by the ACL
issues, is to use ZFS send/receive (but I haven't been able to make it
work yet), including snapshots.


I'm sure the folks here can help with that.


At this point I'm waiting for the 2010.Spring; in the previous stable 
release I'm having  large filesystems on the backup drives fail to 
destroy, and replication streams failing to complete, and so forth.




I have been using a two stage backup process with my main client, 
send/receive to a backup pool and spool to tape for off site archival.


I use a pair (one connected, one off site) of removable drives as 
single volume pools for my own backups via send/receive.




My own stuff is intended to be backed up by a short-cut combination -- 
zfs send/receive to an external drive, which I then rotate off-site (I 
have three of a suitable size).  However, the only way that actually 
works so far is to destroy the pool (not just the filesystem) and 
recreate it from scratch, and then do a full replication stream.  That 
works most of the time, hangs about 1/5.  Anything else I've tried is 
much worse, with hangs approaching 100%.
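
Concretely, the short-cut that does work amounts to roughly this (pool, 
device and snapshot names are placeholders):

# zpool destroy backup
# zpool create backup c5t0d0
# zfs snapshot -r tank@offsite-20100317
# zfs send -R tank@offsite-20100317 | zfs receive -F -d backup

i.e. a fresh single-disk pool on the external drive and one full replication 
stream into it, snapshots included.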


I've posted here a few times about it; as I say, I'm in waiting for next 
stable release mode, we'll see how that does, and I'll push much harder 
for some kind of resolution if I still have trouble then.


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool reporting consistent read errors

2010-03-17 Thread no...@euphoriq.com
In the end, it was the drive.  I replaced the drive and all the errors went 
away.  Another testimony to ZFS - all my data was intact after the resilvering 
process, even with some other errors in the pool.  ZFS resilvered the entire 
new disk and fixed the other errors.  You have to love ZFS.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub not completing?

2010-03-17 Thread Giovanni Tirloni
On Wed, Mar 17, 2010 at 7:09 PM, Bill Sommerfeld  wrote:

> On 03/17/10 14:03, Ian Collins wrote:
>
>> I ran a scrub on a Solaris 10 update 8 system yesterday and it is 100%
>> done, but not complete:
>>
>>   scrub: scrub in progress for 23h57m, 100.00% done, 0h0m to go
>>
>
> Don't panic.  If "zpool iostat" still shows active reads from all disks in
> the pool, just step back and let it do its thing until it says the scrub is
> complete.
>
> There's a bug open on this:
>
> 6899970 scrub/resilver percent complete reporting in zpool status can be
> overly optimistic
>
> scrub/resilver progress reporting compares the number of blocks read so far
> to the number of blocks currently allocated in the pool.
>
> If blocks that have already been visited are freed and new blocks are
> allocated, the seen:allocated ratio is no longer an accurate estimate of how
> much more work is needed to complete the scrub.
>
> Before the scrub prefetch code went in, I would routinely see scrubs last
> 75 hours which had claimed to be "100.00% done" for over a day.


I've routinely seen that happen with resilvers on builds 126/127 on
raidz/raidz2. It reaches completion and stays in progress for as much as 50
hours at times. We just wait and let it do its work.

The bug database doesn't show whether developers have added comments about that.
Would you have access to check whether resilvers were mentioned?

BTW, since this bug only exists in the bug database, does it mean it was
filed by a Sun engineer or a customer? What's the relationship between
that and the defect database? I'm still trying to understand the flow of
information here, since both databases seem to be used exclusively for
OpenSolaris but one is less open.

-- 
Giovanni
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-17 Thread Matt
Dear list,

I am in the process of speccing an OpenSolaris box for iSCSI Storage of 
XenServer domUs.  I'm trying to get the best performance from a combination of 
decent SATA II disks and some SSDs and I would really appreciate some feedback 
on my plans.  I don't have much idea what the workload will be like because we 
simply haven't got any existing implementation to guide us.  All I can say is 
that the vast majority of domUs will be small linux web servers, so I guess it 
will be largely random IO...

I was planning on using either a current development build of OpenSolaris or 
perhaps the next release version if it comes out in time -I understand 2009.06 
has some issues which negatively affect iSCSI and/or ZFS performance?

Here is what I have in mind for the hardware:

1 x Supermicro 4U Rackmount Chassis 24 x 3.5in SAS Hot-Swap
1 x Supermicro X8ST3-F Server Board LGA1366 DDR3 SAS/SATA2 RAID IPMI GbE PCIe 
ATX MBD-X8ST3-F-O
2 x Intel dual port gigabit NICs (model to be decided)
1 x Supermicro AOC-USAS-L4i UIO RAID Adapter SAS 8-Port 16MB PCIe x8
1 x Intel Xeon E5520
6 x 2GB Registered ECC RAM = 12GB total
2 x 160GB Intel X25M MLC SSDs for ARC
2 x 32GB Intel X25-E SLC SSDs for ZIL
18 x WD 250GB RE3 7200RPM 16MB for storage (arranged as 4 x 6-disk raidz2)
2 x 250GB SATA II for rpool mirror
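
For what it's worth, a rough sketch of how those data disks and SSDs could be 
assembled into one pool (controller/target names are placeholders, and the 18 
data disks are shown here as three 6-disk raidz2 groups):

# zpool create tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
    raidz2 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 \
    raidz2 c1t12d0 c1t13d0 c1t14d0 c1t15d0 c1t16d0 c1t17d0 \
    log mirror c2t0d0 c2t1d0 \
    cache c2t2d0 c2t3d0

The X25-E pair ends up as a mirrored log (ZIL) device and the X25-Ms as 
unmirrored cache (L2ARC) devices.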

This will sit on a dedicated gigabit ethernet storage network and the above 
gives me 4 x gigabit NICs-worth of throughput (ignoring the two NICs on the 
motherboard, which I will need for management and maybe a crossover for copying 
data to another box).

We are hoping the hardware will scale to over 50 domUs across three dom0 boxes 
but wouldn't be surprised if the network is saturated well before then, at 
which point we may have to look to 10gig ethernet.  The decision to use iSCSI 
over NFS was made primarily because we thought the dom0s would cache some and 
thus reduce the amount of data travelling over the wire.

Does this configuration look OK?

Stupid question: should I have a battery-backed SAS adapter? It will allegedly 
be protected by a UPS, but...

Later on, I would like to add a second lower spec box to continuously (or 
near-continuously) mirror the data (using a gig crossover cable, maybe).  I have 
seen lots of ways of mirroring data to other boxes which has left me with more 
questions than answers.  Is there a simple, robust way of doing this without 
setting up a complex HA service and at the same time minimising load on the 
master?

Thanks in advance and sorry for the barrage of questions,

Matt.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
Ian,

When you say you spool to tape for off-site archival, what software do you use?

On Wed, Mar 17, 2010 at 18:53, Ian Collins  wrote:




>
> I have been using a two stage backup process with my main client,
> send/receive to a backup pool and spool to tape for off site archival.
>
> I use a pair (one connected, one off site) of removable drives as single
> volume pools for my own backups via send/receive.
>
> --
> Ian.
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Ian Collins

On 03/18/10 03:53 AM, David Dyer-Bennet wrote:

Anybody using the in-kernel CIFS is also concerned with the ACLs, and I
think that's the big issue.

   

Especially in a paranoid organisation with 100s of ACEs!


Also, snapshots.  For my purposes, I find snapshots at some level a very
important part of the backup process.  My old scheme was to rsync from
primary ZFS pool to backup ZFS pool, and snapshot both pools (with
somewhat different retention schedules).  My new scheme, forced by the ACL
issues, is to use ZFS send/receive (but I haven't been able to make it
work yet), including snapshots.

   

I'm sure the folks here can help with that.

I have been using a two stage backup process with my main client, 
send/receive to a backup pool and spool to tape for off site archival.


I use a pair (one connected, one off site) of removable drives as single 
volume pools for my own backups via send/receive.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub not completing?

2010-03-17 Thread Ian Collins

On 03/18/10 11:09 AM, Bill Sommerfeld wrote:

On 03/17/10 14:03, Ian Collins wrote:

I ran a scrub on a Solaris 10 update 8 system yesterday and it is 100%
done, but not complete:

   scrub: scrub in progress for 23h57m, 100.00% done, 0h0m to go


Don't panic.  If "zpool iostat" still shows active reads from all 
disks in the pool, just step back and let it do its thing until it 
says the scrub is complete.


There's a bug open on this:

6899970 scrub/resilver percent complete reporting in zpool status can 
be overly optimistic


scrub/resilver progress reporting compares the number of blocks read 
so far to the number of blocks currently allocated in the pool.


If blocks that have already been visited are freed and new blocks are 
allocated, the seen:allocated ratio is no longer an accurate estimate 
of how much more work is needed to complete the scrub.


Before the scrub prefetch code went in, I would routinely see scrubs 
last 75 hours which had claimed to be "100.00% done" for over a day.



Arse, thanks Bill.  I just stopped and restarted the scrub!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-17 Thread Khyron
For those following along, this is the e-mail I meant to send to the list but
instead sent directly to Tonmaus.  My mistake, and I apologize for having to
re-send.

=== Start ===

My understanding, limited though it may be, is that a scrub touches ALL data
that has been written, including the parity data.  It confirms the validity
of every bit that has been written to the array.  Now, there may be an
implementation detail that is responsible for the pathology that you observed.
More than likely, I'd imagine.  Filing a bug may be in order.  Since triple
parity RAIDZ exists now, you may want to test with that by grabbing a LiveCD
or LiveUSB image from genunix.org.  Maybe RAIDZ3 has the same (or worse)
problems?

As for "scrub management", I pointed out the specific responses from Richard
where
he noted that scrub I/O priority *can* be tuned.  How you do that, I'm not
sure.
Richard, how does one tune scrub I/O priority?  Other than that, as I said,
I don't
think there is a model (publicly available anyway) describing scrub behavior
and how it
scales with pool size (< 5 TB, 5 TB - 50 TB, > 50 TB, etc.) or data layout
(mirror vs.
RAIDZ vs. RAIDZ2).  ZFS is really that new, that all of this needs to be
reconsidered
and modeled.  Maybe this is something you can contribute to the community?
ZFS
is a new storage system, not the same old file systems whose behaviors and
quirks
are well known because of 20+ years of history.  We're all writing a new
chapter in
data storage here, so it is incumbent upon us to share knowledge in order to
answer
these types of questions.

I think the questions I raised in my longer response are also valid and need
to be re-considered.  There are large pools in production today.  So how are
people scrubbing these pools?  Please post your experiences with scrubbing
100+ TB pools.

Tonmaus, maybe you should repost my other questions in a new, separate
thread?

=== End ===

On Tue, Mar 16, 2010 at 19:41, Tonmaus  wrote:

> > Are you sure that you didn't also enable
> > something which
> > does consume lots of CPU such as enabling some sort
> > of compression,
> > sha256 checksums, or deduplication?
>
> None of them is active on that pool or in any existing file system. Maybe
> the issue is particular to RAIDZ2, which is comparably recent. On that
> occasion: does anybody know if ZFS reads all parities during a scrub?
> Wouldn't it be sufficient for stale corruption detection to read only one
> parity set unless an error occurs there?
>
> > The main concern that one should have is I/O
> > bandwidth rather than CPU
> > consumption since "software" based RAID must handle
> > the work using the
> > system's CPU rather than expecting it to be done by
> > some other CPU.
> > There are more I/Os and (in the case of mirroring)
> > more data
> > transferred.
>
> What I am trying to say is that CPU may become the bottleneck for I/O in
> case of parity-secured stripe sets. Mirrors and simple stripe sets have
> almost 0 impact on CPU. So far at least my observations. Moreover, x86
> processors not optimized for that kind of work as much as i.e. an Areca
> controller with a dedicated XOR chip is, in its targeted field.
>
> Regards,
>
> Tonmaus
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-17 Thread Khyron
Ugh!  I meant that to go to the list, so I'll probably re-send it for the
benefit of everyone involved in the discussion.  There were parts of that
that I wanted others to read.

From a re-read of Richard's e-mail, maybe he meant that the number of I/Os
queued to a device can be tuned lower and not the priority of the scrub (as
I took him to mean).  Hopefully Richard can clear that up.  I personally
stand corrected for mis-reading Richard there.
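
If it is the per-device queue that was meant, that sort of thing is normally 
poked at with mdb; a minimal sketch, assuming the build in question still 
exposes the zfs_vdev_max_pending and zfs_scrub_limit tunables (check before 
relying on either):

# echo "zfs_vdev_max_pending/D" | mdb -k     # current per-vdev queue depth
# echo "zfs_scrub_limit/D" | mdb -k          # scrub I/Os allowed per leaf vdev
# echo "zfs_scrub_limit/W0t1" | mdb -kw      # lower it; immediate, not persistent

A persistent change would go into /etc/system instead.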

Of course the performance of a given system cannot be described until it is
built.  Again, my interpretation of your e-mail was that you were looking for
a model for the performance of concurrent scrub and I/O load of a RAIDZ2
VDEV that you could scale up from your "test" environment of 11 disks to a
200+ TB behemoth.  As I mentioned several times, I doubt such a model
exists, and I have not seen anything published to that effect.  I don't know
how useful it would be if it did exist because the performance of your disks
would be a critical factor.  (Although *any* model beats no model any day.)
Let's just face it.  You're using a new storage system that has not been
modeled.  To get the model you seek, you will probably have to create it
yourself.

(It's notable that most of the ZFS models that I have seen have been done
by Richard.  Of course, they were MTTDL models, not scrub vs. I/O
performance models for different VDEV types.)

As for your point about building large pools from lots of mirror VDEVs, my
response is "meh".  I've said several times, and maybe you've missed it
several times, that there may be pathologies for which YOU should open
bugs.  RAIDZ3 may exhibit the same kind of pathologies you observed with
RAIDZ2.  Apparently RAIDZ does not.  I've also noticed (and I'm sure I'll
be corrected if I'm mistaken) that there is not a limit on the number of
VDEVs in a pool but single digit RAID VDEVs are recommended.  So there
is nothing preventing you from building (for example) VDEVs from 1 TB
disks.  If you take 9 x 1 TB disks per VDEV, and use RAIDZ2, you get 7 TB
usable.  That means about 29 VDEVs to get 200 TB.  Double the disk
capacity and you can probably get to 15 top level VDEVs.  (And you'll want
that RAIDZ2 as well since I don't know if you could trust that many disks,
whether enterprise or consumer.)  However, that number of top level VDEVs
sounds reasonable based on what others have reported.  What's been
proven to be "A Bad Idea(TM)" is putting lots of disks in a single VDEV.

Remember that ZFS is a *new* software system.  It is complex.  It will have
bugs.  You have chosen ZFS; it didn't choose you.  So I'd say you can
contribute to the community by reporting back your experiences, opening
bugs on things which make sense to open bugs on, testing configurations,
modeling, documenting and sharing.  So far, you just seem to be interested
in taking w/o so much as an offer of helping the community or developers to
understand what works and what doesn't.  All take and no give is not cool.
And if you don't like ZFS, then choose something else.  I'm sure EMC or
NetApp will willingly sell you all the spindles you want.  However, I think
it is still early to write off ZFS as a losing proposition, but that's my
opinion.

So far, you seem to be spending a lot of time complaining about a *new*
software system that you're not paying for.  That's pretty tasteless, IMO.

And now I'll re-send that e-mail...

P.S.: Did you remember to re-read this e-mail?  Read it 2 or 3 times and be
clear about what I said and what I did _not_ say.

On Wed, Mar 17, 2010 at 16:12, Tonmaus  wrote:

> Hi,
>
> I got a message from you off-list that doesn't show up in the thread even
> after hours. As you mentioned the aspect here as well I'd like to respond
> to, I'll do it from here:
>
> > Third, as for ZFS scrub prioritization, Richard
> > answered your question about that.  He said it is
> > low priority and can be tuned lower.  However, he was
> > answering within the context of an 11 disk RAIDZ2
> > with slow disks  His exact words were:
> >
> >
> > This could be tuned lower, but your storage
> > is slow and *any* I/O activity will be
> > noticed.
>
> Richard told us two times that scrub already is as low in priority as can
> be. From another message:
>
> "Scrub is already the lowest priority. Would you like it to be lower?"
>
>
> =
>
> As much as the comparison goes between "slow" and "fast" storage. I have
> understood that Richard's message was that with storage providing better
> random I/O zfs priority scheduling will perform significantly better,
> providing less degradation of concurrent load. While I am even inclined to
> buy that, nobody will be able to tell me how a certain system will behave
> until it was tested, and to what degree concurrent scrubbing still will be
> possible.
> Another thing: people are talking a lot about narrow vdevs and mirrors.
> However, when you need to build a 2

Re: [zfs-discuss] Scrub not completing?

2010-03-17 Thread Bill Sommerfeld

On 03/17/10 14:03, Ian Collins wrote:

I ran a scrub on a Solaris 10 update 8 system yesterday and it is 100%
done, but not complete:

   scrub: scrub in progress for 23h57m, 100.00% done, 0h0m to go


Don't panic.  If "zpool iostat" still shows active reads from all disks 
in the pool, just step back and let it do its thing until it says the 
scrub is complete.


There's a bug open on this:

6899970 scrub/resilver percent complete reporting in zpool status can be 
overly optimistic


scrub/resilver progress reporting compares the number of blocks read so 
far to the number of blocks currently allocated in the pool.


If blocks that have already been visited are freed and new blocks are 
allocated, the seen:allocated ratio is no longer an accurate estimate of 
how much more work is needed to complete the scrub.
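
A small worked example (numbers invented for illustration): if the scrub has 
examined 900G worth of blocks while churn has left only 850G currently 
allocated, the reported figure is roughly examined/allocated = 900/850, i.e. 
over 100%, so zpool status clamps it to "100.00% done, 0h0m to go" even 
though unvisited blocks remain.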


Before the scrub prefetch code went in, I would routinely see scrubs 
last 75 hours which had claimed to be "100.00% done" for over a day.


- Bill




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub not completing?

2010-03-17 Thread Freddie Cash
On Wed, Mar 17, 2010 at 2:03 PM, Ian Collins  wrote:

> I ran a scrub on a Solaris 10 update 8 system yesterday and it is 100%
> done, but not complete:
>
>  scrub: scrub in progress for 23h57m, 100.00% done, 0h0m to go
>
> Any ideas?


I've had that happen on FreeBSD 7-STABLE (post 7.2 release) using ZFSv13.
 scrub showed 100% complete, but "in progress" and timer kept increasing.
 After waiting an hour, I did a "zpool scrub -s" and then a "zpool scrub".
 This second scrub finished quicker, and finished completely.

No idea why it happened, or why that fixed it.
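
For reference, the stop-and-restart sequence is just (pool name is a 
placeholder):

# zpool scrub -s tank
# zpool scrub tank

The -s form cancels the scrub in progress; the second command starts a 
fresh one.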

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Device

2010-03-17 Thread Daniel Carosone
On Wed, Mar 17, 2010 at 10:15:53AM -0500, Bob Friesenhahn wrote:
> Clearly there are many more reads per second occurring on the zfs  
> filesystem than the ufs filesystem.

yes

> Assuming that the application-level requests are really the same

From the OP, the workload is a "find /".

So, ZFS makes the disks busier.. but is it find'ing faster as a
result, or doing more reads per found file?   The ZFS io pipeline will
be able to use the cpu concurrency of the T1000 better than UFS, even
for a single-threaded find, and may just be issuing IO faster.

Count the number of lines printed and divide by the time taken to
compare whether the extra work being done is producing extra output or
not.
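
Something along these lines gives a crude files-per-second figure to compare 
(output path is a placeholder):

$ time find / > /tmp/find.out
$ wc -l < /tmp/find.out

Run it on both the UFS and the ZFS box and compare lines per elapsed second 
rather than raw iostat numbers.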

However, it might also be worthwhile to look for a better / more
representative benchmark and compare further using that.

Also, to be clear, could you clarify whether the "problem" you see is
that the numbers in iostat are larger, that find runs slower, or
that other processes are more impacted by find?

> this suggests that the system does not have 
> enough RAM installed in order to cache the "working set".  

Possibly, yes. 

> Another issue 
> could be fileystem block size since zfs defaults the block size to 128K 
> but some applications (e.g. database) work better with 4K, 8K, or 16K 
> block size.

Unlikely to be relevant to fs metadata for find.

> Regardless, I suggest measuring the statistics with a 30 second interval 
> rather than 5 seconds since zfs is assured to do whatever it does within 
> 30 seconds.

Relevant for write benchmarks more so than read.

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Scrub not completing?

2010-03-17 Thread Ian Collins
I ran a scrub on a Solaris 10 update 8 system yesterday and it is 100% 
done, but not complete:


 scrub: scrub in progress for 23h57m, 100.00% done, 0h0m to go

Any ideas?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] checksum errors increasing on "spare" vdev?

2010-03-17 Thread Eric Sproul
Hi,
One of my colleagues was confused by the output of 'zpool status' on a pool
where a hot spare is being resilvered in after a drive failure:

$ zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h56m, 23.78% done, 3h1m to go
config:

NAME  STATE READ WRITE CKSUM
data  DEGRADED 0 0 0
  raidz1  ONLINE   0 0 0
c0t2d0ONLINE   0 0 0
c1t2d0ONLINE   0 0 0
c0t4d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 0
c0t7d0ONLINE   0 0 0
  raidz1  DEGRADED 0 0 0
spare DEGRADED 0 0 2.89M
  c0t1d0  REMOVED  0 0 0
  c0t6d0  ONLINE   0 0 0  59.3G resilvered
c1t5d0ONLINE   0 0 0
c0t3d0ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c1t3d0ONLINE   0 0 0
c1t6d0ONLINE   0 0 0
spares
  c0t6d0  INUSE currently in use

The CKSUM error count is increasing so he thought that the spare was also
failing.  I disagreed because the errors were being recorded on the "fake" vdev
"spare", but I want to make sure my hunch is correct.

My hunch is that since reads from userland continue to come to the pool, and
since it's raidz, some of those reads will be for zobject addresses on the
failed drive, now represented by the spare.  Because the data at those addresses
is uninitialized, we get checksum errors.

I guess I really have two questions:
1. Am I correct about the source of the checksum errors attributed to the
"spare" vdev?
2. During raidz resilver, if a read happens for an address that is among what's
already been resilvered, will that read succeed, or will ALL reads to that
top-level vdev require reconstruction from the other leaf vdevs?

If the answer to #2 is that reads will succeed if they ask for data that's been
resilvered, then I might expect my read performance to increase as resilver
progresses, as less and less data requires reconstruction.  I haven't measured
this in a controlled environment though, so I'm mostly just curious about the
theory.

Eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-17 Thread Tonmaus
Hi,

I got a message from you off-list that doesn't show up in the thread even after 
hours. As you mentioned the aspect here as well I'd like to respond to, I'll do 
it from here:

> Third, as for ZFS scrub prioritization, Richard
> answered your question about that.  He said it is
> low priority and can be tuned lower.  However, he was
> answering within the context of an 11 disk RAIDZ2
> with slow disks  His exact words were:
> 
> 
> This could be tuned lower, but your storage
> is slow and *any* I/O activity will be
> noticed.

Richard told us two times that scrub already is as low in priority as can be. 
From another message:

"Scrub is already the lowest priority. Would you like it to be lower?"

=

As much as the comparison goes between "slow" and "fast" storage. I have 
understood that Richard's message was that with storage providing better random 
I/O zfs priority scheduling will perform significantly better, providing less 
degradation of concurrent load. While I am even inclined to buy that, nobody 
will be able to tell me how a certain system will behave until it was tested, 
and to what degree concurrent scrubbing still will be possible.
Another thing: people are talking a lot about narrow vdevs and mirrors. 
However, when you need to build a 200 TB pool you end up with a lot of disks in 
the first place. You will need at least double failover resilience for such a 
pool. If one would do that with mirrors, ending up with app. 600 TB gross to 
provide 200 TB net capacity is definitely NOT an option.

Regards,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: clarification on meaning of the autoreplace property

2010-03-17 Thread Dave Johnson
> Hi Dave,
> 
> I'm unclear about the autoreplace behavior with one
> spare that is
> connected to two pools. I don't see how it could work
> if the autoreplace 
> property is enabled on both pools, which formats and
> replaces a spare

Because I already partitioned the disk into slices. Then
I indicated the proper slice as the spare.

> disk that might be in-use in another pool (?) Maybe I
> misunderstand.
> 
> 1. I think autoreplace behavior might be inconsistent
> when a device is
> removed. CR 6935332 was filed recently but is not
> available yet through
> our public bug database.
> 
> 2. The current issue with adding a spare disk to a
> ZFS root pool is that 
> if a root pool mirror disk fails and the spare kicks
> in, the bootblock
> is not applied automatically. We're working on
> improving this
> experience.

While the bootblock may not have been applied automatically,
the root pool did show resilvering, but the storage pool
did not (at least per the status report)

> 
> My advice would be to create a 3-way mirrored root
> pool until we have a
> better solution for root pool spares.

That would be sort of a different topic. I'm just interested
in understanding the functionality of the hot spare at this
point.

> 
> 3. For simplicity and ease of recovery, consider
> using your disks as
> whole disks, even though you must use slices for the
> root pool.

I can't do this with a RAID 10 configuration on the
storage pool and a mirrored root pool. I only have room
for 5 disks in a 2RU / 3.5" drive server.

> If one disk is part of two pools and it fails, two
> pool are impacted. 

Yes. This is why I used slices instead of a whole disk
for the hot spare.

> The beauty of ZFS is no longer having to deal with
> slice administration, 
> except for the root pool.
> 
> I like your mirror pool configurations but I would
> simplify it by
> converting store1 to using whole disks, and keep
> separate spare disks.

I would have done that from the beginning with more
chassis space.

> One for the store1 pool, and either create a 3-way
> mirrored root pool
> or keep a spare disk connected to the system but
> unconfigured.

I still need confirmation on whether the hot spare function
will work with slices. I saw no errors when executing the commands
for the hot spare slices, but I got this funny response when I ran the 
test
> 
Dave
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Svein Skogen

On 17.03.2010 18:18, David Dyer-Bennet wrote:
> 
> On Wed, March 17, 2010 10:19, Edward Ned Harvey wrote:
> 
>> However, removable disks are not very
>> reliable compared to tapes, and the disks are higher cost per GB, and
>> require more volume in the safe deposit box, so the external disk usage is
>> limited...  Only going back for 2-4 weeks of archive...
> 
> Last 5 times I checked, tapes were vastly more expensive than disks, even
> if you ignored the drive cost.  Let's see...currently buy.com lists an
> LTO-3 cartridge at about $25, for 400GB; a 2TB drive is about $170.  So
> yeah, they're a little cheaper than disk now (IF you ignore the drive
> cost; the cheap ones seem to be around $1800).
> 
> I had a very satisfactory QIC-60 tape streamer in the 1980s, and a couple
> of rather unsatisfactory DAT drives later.  I've been backing up to disk
> since then, seems to be a lot easier, quicker, cheaper, and more stable,
> on my scale.  When you go to much bigger setups, the trade-offs change;
> the value of the space in the safe starts to show up as significant, the
> drive cost becomes less of an issue, and so forth.

Please, don't compare proper backup drives to that rotating head
non-standard catastrophe... DDS was (in)famous for being a delayed-fuse
tape-shredder.

//Svein

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: clarification on meaning of the autoreplace property

2010-03-17 Thread Cindy Swearingen

Hi Dave,

I'm unclear about the autoreplace behavior with one spare that is
connected to two pools. I don't see how it could work if the autoreplace
property is enabled on both pools, which formats and replaces a spare
disk that might be in-use in another pool (?) Maybe I misunderstand.

1. I think autoreplace behavior might be inconsistent when a device is
removed. CR 6935332 was filed recently but is not available yet through
our public bug database.

2. The current issue with adding a spare disk to a ZFS root pool is that
if a root pool mirror disk fails and the spare kicks in, the bootblock
is not applied automatically. We're working on improving this
experience.

My advice would be to create a 3-way mirrored root pool until we have a
better solution for root pool spares.
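
For reference, turning the existing two-way root mirror into a three-way 
mirror is roughly the following (device names are placeholders; x86 shown, 
SPARC would use installboot instead of installgrub):

# zpool attach rpool c0t1d0s0 c0t4d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t4d0s0

The attach triggers a resilver onto the new slice, and the bootblock still 
has to be applied by hand for now, per item 2 above.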

3. For simplicity and ease of recovery, consider using your disks as
whole disks, even though you must use slices for the root pool.
If one disk is part of two pools and it fails, two pools are impacted. 
The beauty of ZFS is no longer having to deal with slice administration, 
except for the root pool.


I like your mirror pool configurations but I would simplify it by
converting store1 to using whole disks, and keep separate spare disks.
One for the store1 pool, and either create a 3-way mirrored root pool
or keep a spare disk connected to the system but unconfigured.

Thanks,

Cindy



On 03/17/10 10:25, Dave Johnson wrote:

From pages 29,83,86,90 and 284 of the 10/09 Solaris ZFS Administration
guide, it sounds like a disk designated as a hot spare will:
1. Automatically take the place of a bad drive when needed
2. The spare will automatically be detached back to the spare
   pool when a new device is inserted and brought up to replace the
   original compromised one.

Should this work the same way for slices?

I have four active disks in a RAID 10 configuration,
for a storage pool, and the same disks are used
for mirrored root configurations, but only
only one of the possible mirrored root slice
pairs is currently active.

I wanted to designate slices on a 5th disk as
hot spares for the two existing pools, so
after partitioning the 5th disk (#4) identical
to the four existing disks, I ran:

# zpool add rpool spare c0t4d0s0
# zpool add store1 spare c0t4d0s7
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s0  ONLINE   0 0 0
c0t1d0s0  ONLINE   0 0 0
spares
  c0t4d0s0    AVAIL

errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
store1ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t1d0s7  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0
c0t3d0s7  ONLINE   0 0 0
spares
  c0t4d0s7    AVAIL

errors: No known data errors
--
So It looked like everything was set up how I was
hoping until I emulated a disk failure by pulling
one of the online disks. The root pool responded
how I expected, but the storage pool, on slice 7,
did not appear to perform the autoreplace:

Not too long after pulling one of the online disks:


# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 10.02% done, 0h5m to go
config:

NAMESTATE READ WRITE CKSUM
rpool   DEGRADED 0 0 0
  mirrorDEGRADED 0 0 0
c0t0d0s0ONLINE   0 0 0
spare   DEGRADED 84 0 0
  c0t1d0s0  REMOVED  0 0 0
  c0t4d0s0  ONLINE   0 0 84  329M resilvered
spares
  c0t4d0s0  INUSE currently in use

errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
store1ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t1d0s7  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0
c0t3d0s7  ONLINE   0 0 0
spares
  c0t4d0s7    AVAIL

errors: No known data errors

I w

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread David Dyer-Bennet

On Wed, March 17, 2010 10:19, Edward Ned Harvey wrote:

> However, removable disks are not very
> reliable compared to tapes, and the disks are higher cost per GB, and
> require more volume in the safe deposit box, so the external disk usage is
> limited...  Only going back for 2-4 weeks of archive...

Last 5 times I checked, tapes were vastly more expensive than disks, even
if you ignored the drive cost.  Let's see...currently buy.com lists an
LTO-3 cartridge at about $25, for 400GB; a 2TB drive is about $170.  So
yeah, they're a little cheaper than disk now (IF you ignore the drive
cost; the cheap ones seem to be around $1800).

I had a very satisfactory QIC-60 tape streamer in the 1980s, and a couple
of rather unsatisfactory DAT drives later.  I've been backing up to disk
since then, seems to be a lot easier, quicker, cheaper, and more stable,
on my scale.  When you go to much bigger setups, the trade-offs change;
the value of the space in the safe starts to show up as significant, the
drive cost becomes less of an issue, and so forth.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Miles Nordin
> "la" == Lori Alt  writes:

la> This is no longer the case.  The send stream format is now
la> versioned in such a way that future versions of Solaris will
la> be able to read send streams generated by earlier versions of
la> Solaris.

Your memory of the thread is selective.  This is only one of the
several problems with it.

If you are not concerned with bitflip gremlins on tape, then all the
baloney about checksums and copies=2 metadata and insisting on
zpool-level redundancy is just a bunch of opportunistic FUD.

la> The comment in the zfs(1M) manpage discouraging the
la> use of send streams for later restoration has been removed.

The man page never warned of all the problems, nor did the si wiki.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Miles Nordin
> "k" == Khyron   writes:

 k> Star is probably perfect once it gets ZFS (e.g. NFS v4) ACL

nope, because snapshots are lost and clones are expanded wrt their
parents, and the original tree of snapshots/clones can never be
restored.

we are repeating, though.  This is all in the archives.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS: clarification on meaning of the autoreplace property

2010-03-17 Thread Dave Johnson
From pages 29,83,86,90 and 284 of the 10/09 Solaris ZFS Administration
guide, it sounds like a disk designated as a hot spare will:
1. Automatically take the place of a bad drive when needed
2. The spare will automatically be detached back to the spare
   pool when a new device is inserted and brought up to replace the
   original compromised one.

Should this work the same way for slices?

I have four active disks in a RAID 10 configuration,
for a storage pool, and the same disks are used
for mirrored root configurations, but only
only one of the possible mirrored root slice
pairs is currently active.

I wanted to designate slices on a 5th disk as
hot spares for the two existing pools, so
after partitioning the 5th disk (#4) identical
to the four existing disks, I ran:

# zpool add rpool spare c0t4d0s0
# zpool add store1 spare c0t4d0s7
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s0  ONLINE   0 0 0
c0t1d0s0  ONLINE   0 0 0
spares
  c0t4d0s0    AVAIL

errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
store1ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t1d0s7  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0
c0t3d0s7  ONLINE   0 0 0
spares
  c0t4d0s7    AVAIL

errors: No known data errors
--
So It looked like everything was set up how I was
hoping until I emulated a disk failure by pulling
one of the online disks. The root pool responded
how I expected, but the storage pool, on slice 7,
did not appear to perform the autoreplace:

Not too long after pulling one of the online disks:


# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 10.02% done, 0h5m to go
config:

NAMESTATE READ WRITE CKSUM
rpool   DEGRADED 0 0 0
  mirrorDEGRADED 0 0 0
c0t0d0s0ONLINE   0 0 0
spare   DEGRADED 84 0 0
  c0t1d0s0  REMOVED  0 0 0
  c0t4d0s0  ONLINE   0 0 84  329M resilvered
spares
  c0t4d0s0  INUSE currently in use

errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
store1ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t1d0s7  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0
c0t3d0s7  ONLINE   0 0 0
spares
  c0t4d0s7    AVAIL

errors: No known data errors

I was able to convert the state of store1 to DEGRADED by
writing to a file in that storage pool, but it always listed
the spare as available, even while showing c0t1d0s7 as
REMOVED in the same pool.

Based on the manual, I expected the system to bring a
reinserted disk back on line automatically, but zpool status
still showed it as "REMOVED". To get it back on line:

# zpool detach rpool c0t4d0s0
# zpool clear rpool
# zpool clear store1

Then status showed *both* pools resilvering. So the questions are:

1. Does autoreplace work on slices, or just complete disks?
2. Is there a problem replacing a "bad" disk with the same disk
   to get the autoreplace function to work?
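
(For context, autoreplace is a per-pool property and is off by default; it 
can be checked and set with, for example:

# zpool get autoreplace rpool store1
# zpool set autoreplace=on store1

which is worth confirming before drawing conclusions from the test above.)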
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Lori Alt



I think what you're saying is:  Why bother trying to backup with "zfs send"
when the recommended practice, fully supportable, is to use other tools for
backup, such as tar, star, Amanda, bacula, etc.   Right?

The answer to this is very simple.
#1  ...
#2  ...
 

Oh, one more thing.  "zfs send" is only discouraged if you plan to store the
data stream and do "zfs receive" at a later date.
   


This is no longer the case.  The send stream format is now versioned in 
such a way that future versions of Solaris will be able to read send 
streams generated by earlier versions of Solaris.  The comment in the 
zfs(1M) manpage discouraging the use of send streams for later 
restoration has been removed.  This versioning leverages the ZFS object 
versioning that already exists (the versioning that allows earlier 
version pools to be read by later versions of zfs), plus versioning and 
feature flags in the stream header.


Lori



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Svein Skogen

On 17.03.2010 16:19, Edward Ned Harvey wrote:
*snip

> Still ... If you're in situation (b) then you want as many options available
> to you as possible.  I've helped many people and/or companies before, who
> ...  Had backup media, but didn't have the application that wrote the backup
> media and therefore couldn't figure out how to restore.   ...  Had a backup
> system that was live synchronizing the master file server to a slave file
> server, and then when something blew up the master, it propagated and
> deleted the slave too.  In this case, the only thing that saved them was an
> engineer who had copied the whole directory a week ago onto his iPod, if you
> can believe that.  ...  Had backup tapes but no tape drive  ...  Had
> archives on DVD, and the DVD's were nearly all bad  ...  Looked through the
> backups only to discover something critical had been accidentally excluded

I can add a few items to your list that you've probably seen but
forgot to list:

- Using a tape format that really isn't a standard (anybody remember the
first generation of DDS with compression, that ... probably could only
be restored on the exact identical drive that wrote them?)
- Enabling the fancy "security feature" and not keeping a safe copy of
the encryption keys they're using on the tapes?
- Forgetting that a disaster-recovery setup SHOULD start out with
pen+paper planning "what to do _WHEN_ disaster strikes", including
making a step by step list. This way you can plan what prerequisites
your disaster recovery has. How do you bootstrap your recovery? What do
you need up and running to read your tapes? What routines do you have
for bringing offline media off site for safekeeping? (And this "plan for
disaster" is the reason for my many questions on these lists lately. I
simply *will not* switch to a new solution that I cannot properly plan
for a disaster recovery of!)

What I'm getting at, is that most of the disaster recovery procedure
should actually be done _BEFORE_ disaster strikes. And use that work to
do your best to make disasters unlikely.

If your backup solution depends on a running system, bring a livecd or
similar in the same suitcase as the tapes. If you don't you'll have a
major problem when you NEED the backup.

Proper backups for disaster-recovery should be offline media. Disaster
isn't only "place burning to the ground, switch to secondary location".
It can also be a rogue employee (or outsider) data-bombing your
infrastructure. If those backups are online-and-connected, there is
little reason to trust that they weren't destroyed as well.

If you back up to a mounted file system, using a cronjob, mount the fs
for the backup and dismount it after verify. Don't keep it eternally
mounted, since a situation corrupting the mounted data then could
corrupt the backup as well.
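
A minimal sketch of that idea in ZFS terms, with made-up pool and dataset 
names throughout (import/export standing in for mount/dismount):

#!/bin/sh
# nightly-backup.sh -- run from cron; keeps the backup pool offline except during the job
zpool import backup || exit 1
zfs send -i tank/data@prev tank/data@today | zfs receive -F backup/data || { zpool export backup; exit 1; }
zpool export backup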

This falls back to "plan for disaster". Include "human failures" and
"sysadmin accidents" in the disaster plan. And document every step. This
because there's no guarantee the sysadmin manages to get out of his
dungeon when the building burns to the ground. A proper disaster
recovery plan would mean that anyone competent can read the hardcopy of
the plan, and restore the system.

And so on.

Maybe someone should add some "plan for disaster" to the "best current
practices" zfs-page? ;)

//Svein


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Edward Ned Harvey
> Why do we want to adapt "zfs send" to do something it was never
> intended
> to do, and probably won't be adapted to do (well, if at all) anytime
> soon instead of
> optimizing existing technologies for this use case?

The only time I see or hear of anyone using "zfs send" in a way it wasn't
intended is when people store the datastream on tape or a filesystem,
instead of feeding it directly into "zfs receive."

Although it's officially discouraged for this purpose, there is value in
doing so, and I can understand why some people sometimes (including myself)
would have interest in doing this.

So let's explore the reasons it's discouraged to store a "zfs send"
datastream:
#1  If a single bit goes bad, the whole dataset is bad.
#2  You can only receive the whole filesystem.  You cannot granularly
restore a single file or directory.

Now, if you acknowledge these two points, let's explore why somebody might
want to do it anyway:

To counter #1:
Let's acknowledge that storage media is pretty reliable.  We've all seen
tapes and disks go bad, but usually they don't.  If you've got a new tape
archive every week or every month...  The probability of *all* of those
tapes having one or more bad bits is astronomically low.  Nonzero risk, but
a calculated risk.

To counter #2:
There are two basic goals for backups.  (a) to restore some stuff upon
request, or (b) for the purposes of DR, to guarantee your manager that
you're able to get the company back into production quickly after a
disaster.  Such as the building burning down.

ZFS send to tape does not help you in situation (a).  So we can conclude
that "zfs send" to tape is not sufficient as an *only* backup technique.
You need something else, and at most, you might consider "zfs send" to tape
as an augmentation to your other backup technique.

Still ... If you're in situation (b) then you want as many options available
to you as possible.  I've helped many people and/or companies before, who
...  Had backup media, but didn't have the application that wrote the backup
media and therefore couldn't figure out how to restore.   ...  Had a backup
system that was live synchronizing the master file server to a slave file
server, and then when something blew up the master, it propagated and
deleted the slave too.  In this case, the only thing that saved them was an
engineer who had copied the whole directory a week ago onto his iPod, if you
can believe that.  ...  Had backup tapes but no tape drive  ...  Had
archives on DVD, and the DVD's were nearly all bad  ...  Looked through the
backups only to discover something critical had been accidentally excluded
...

Point is, having as many options available as possible is worthwhile in the
disaster situation.

Please see below for some more info, as it ties into some more of what
you've said ...


> But I got it.  "zfs send" is fast.  Let me ask you this, Ed...where do
> you "zfs send"
> your data to? Another pool?  Does it go to tape eventually?  If so,
> what is the setup
> such that it goes to tape?  I apologize for asking here, as I'm sure
> you described it
> in one of the other threads I mentioned, but I'm not able to go digging
> in those
> threads at the moment.

Here is my backup strategy:

I use "zfs send | ssh somehost 'zfs receive'" to send nightly incrementals
to a secondary backup server.  
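
(For the curious, the nightly incremental amounts to something along
these lines. The pool, filesystem, snapshot and host names are
placeholders, and -i assumes last night's snapshot still exists on both
sides.)

    zfs snapshot tank/data@2010-03-18
    zfs send -i tank/data@2010-03-17 tank/data@2010-03-18 | \
        ssh backuphost zfs receive -F backuppool/data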

This way, if something goes wrong with the primary fileserver, I can simply
change the IP address of the secondary, and let it assume the role of the
primary.  With the unfortunate loss of all of today's data ... going back to
last night.  I have had to do this once before, in the face of primary
fileserver disaster and service contract SLA failure by Netapp...  All the
users were very pleased that I was able to get them back into production
using last night's data in less than a few minutes.

From the secondary server, I "zfs send | zfs receive" onto removable hard
disks.  This is ideal to restore either individual files, or the whole
filesystem.  No special tools would be necessary to restore on any random
ZFS server in the future, and nothing could be faster.  In fact, you
wouldn't even need to restore if you wanted to in a pinch, you could work
directly on the external disks.  However, removable disks are not very
reliable compared to tapes, and the disks are higher cost per GB, and
require more volume in the safe deposit box, so the external disk usage is
limited...  Only going back for 2-4 weeks of archive...

So there is also a need for tapes.  Once every so often, from the secondary
server, I "zfs send" the whole filesystem onto tape for archival purposes.
This would only be needed after a disaster, and also the failure or
overwriting of the removable disks.  We have so many levels of backups, this
is really unnecessary, but it makes me feel good.
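
(If you want to picture it, the tape step is essentially the following,
with tank@archive and /dev/rmt/0n standing in for your own snapshot and
no-rewind tape device.)

    zfs snapshot -r tank@archive
    zfs send -R tank@archive | dd of=/dev/rmt/0n obs=1024k
    # and the reverse direction, only ever needed after a disaster:
    dd if=/dev/rmt/0n ibs=1024k | zfs receive -dF tank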

And finally just because the data is worth millions of dollars, I also use
NetBackup to write tapes from the secondary server.  This way, nobody could
ever blame me if 

Re: [zfs-discuss] ZFS Performance on SATA Device

2010-03-17 Thread Bob Friesenhahn

On Wed, 17 Mar 2010, Kashif Mumtaz wrote:


but on the UFS file system the average busy is 50%,

any idea why ZFS makes the disk busier?


Clearly there are many more reads per second occurring on the zfs 
filesystem than the ufs filesystem.  Assuming that the 
application-level requests are really the same, this suggests that the 
system does not have enough RAM installed in order to cache the 
"working set".  Another issue could be filesystem block size, since zfs 
defaults the block size to 128K but some applications (e.g. databases) 
work better with a 4K, 8K, or 16K block size.
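
(The block size Bob refers to is the per-dataset recordsize property;
it only affects files written after the change. The dataset name below
is made up.)

    zfs get recordsize tank/oradata
    zfs set recordsize=8k tank/oradata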


Regardless, I suggest measuring the statistics with a 30 second 
interval rather than 5 seconds since zfs is assured to do whatever it 
does within 30 seconds.
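
(In practice that just means something like the following; the pool
name is assumed.)

    iostat -xnz 30
    zpool iostat -v mypool 30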


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread David Dyer-Bennet

On Wed, March 17, 2010 06:28, Khyron wrote:

> The Best Practices Guide is also very clear about send and receive
> NOT being designed explicitly for backup purposes.  I find it odd
> that so many people seem to want to force this point.  ZFS appears
> to have been designed to allow the use of well known tools that are
> available today to perform backups and restores.  I'm not sure how
> many people are actually using NFS v4 style ACLs, but those people
> have the most to worry about when it comes to using tar or NetBackup
> or Networker or Amanda or Bacula or star to backup ZFS file systems.
> Everyone else, which appears to be the majority of people, have many
> tools to choose from, tools they've used for a long time in various
> environments on various platforms.  The learning curve doesn't
> appear to be as steep as most people seem to make it out to be.  I
> honestly think many people may be making this issue more complex
> than it needs to be.

Anybody using the in-kernel CIFS is also concerned with the ACLs, and I
think that's the big issue.

Also, snapshots.  For my purposes, I find snapshots at some level a very
important part of the backup process.  My old scheme was to rsync from
primary ZFS pool to backup ZFS pool, and snapshot both pools (with
somewhat different retention schedules).  My new scheme, forced by the ACL
issues, is to use ZFS send/receive (but I haven't been able to make it
work yet), including snapshots.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems

2010-03-17 Thread Giovanni Tirloni
On Wed, Mar 17, 2010 at 11:23 AM,  wrote:

>
>
> >IMHO, what matters is that pretty much everything from the disk controller
> >to the CPU and network interface is advertised in power-of-2 terms and
> disks
> >sit alone using power-of-10. And students are taught that computers work
> >with bits and so everything is a power of 2.
>
> That is simply not true:
>
>Memory: power of 2(bytes)
>Network: power of 10  (bits/s))
>Disk: power of 10 (bytes)
>CPU Frequency: power of 10 (cycles/s)
>SD/Flash/..: power of 10 (bytes)
>Bus speed: power of 10
>
> Main memory is the odd one out.
>

My bad on generalizing that information.

Perhaps the software stack dealing with disks should be changed to use
power-of-10. Unlikely too.

-- 
Giovanni
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to reserve space for a file on a zfs filesystem

2010-03-17 Thread Giovanni Tirloni
On Wed, Mar 17, 2010 at 6:43 AM, wensheng liu wrote:

> Hi all,
>
> How do I reserve space on a zfs filesystem? Using mkfile or dd to write
> data to the blocks is time consuming, while "mkfile -n" will not really hold
> the space.
> And zfs's "set reservation" only works on a filesystem, not on a file?
>
> Could anyone provide a solution for this?
>

Do you mean you want files created with "mkfile -n" to count against the
total filesystem usage?

Since they've not allocated any blocks yet, ZFS would need to know about
each sparse file and read its metadata before enforcing the filesystem
reservation. I'm not sure it's doable.
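
A common workaround (only a sketch, with made-up names, and it reserves
space for a dataset rather than literally for the file) is to give the
big file its own dataset or zvol and hang the reservation on that:

    # reserve 10G for a single large file by giving it its own filesystem
    zfs create -o reservation=10G tank/bigfile
    mkfile -n 10g /tank/bigfile/data.img

    # or reserve the space as a block device; zvols get a matching
    # reservation by default
    zfs create -V 10g tank/bigvol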

-- 
Giovanni
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-17 Thread Bob Friesenhahn

On Tue, 16 Mar 2010, Tonmaus wrote:


None of them is active on that pool or in any existing file system. 
Maybe the issue is particular to RAIDZ2, which is comparably recent. 
On that occasion: does anybody know if ZFS reads all parities during 
a scrub? Wouldn't it be sufficient for stale corruption detection to 
read only one parity set unless an error occurs there?


Zfs scrub reads and verifies everything.  That is its purpose.

What I am trying to say is that CPU may become the bottleneck for 
I/O in the case of parity-secured stripe sets. Mirrors and simple stripe 
sets have almost zero impact on CPU. So far, at least, that is my observation. 
Moreover, x86 processors are not optimized for that kind of work as much 
as, for example, an Areca controller with a dedicated XOR chip is in its 
targeted field.


It would be astonishing if the XOR algorithm consumed very much CPU 
with modern CPUs.  Zfs's own checksum is more brutal than XOR.  The 
scrub re-assembles full (usually 128K) data blocks and verifies the 
zfs checksum.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems

2010-03-17 Thread Edho P Arief
On Wed, Mar 17, 2010 at 9:09 PM, Giovanni Tirloni  wrote:
> IMHO, what matters is that pretty much everything from the disk controller
> to the CPU and network interface is advertised in power-of-2 terms and disks
> sit alone using power-of-10. And students are taught that computers work
> with bits and so everything is a power of 2.
>

Apparently someone wrote false information on Wikipedia [1].

[1] http://en.wikipedia.org/wiki/Data_rate_units#Examples

-- 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted

2010-03-17 Thread Andrew
Hi all,

Great news - by attaching an identically sized RDM to the server and then grabbing 
the first 128K using the command you specified, Ross:

dd if=/dev/rdsk/c8t4d0p0 of=~/disk.out bs=512 count=256

we then proceeded to inject this into the faulted RDM and lo and behold the 
volume recovered!

dd if=~/disk.out of=/dev/rdsk/c8t5d0p0 bs=512 count=256
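
(For anyone repeating this kind of label surgery: before importing, it
may be worth checking that ZFS can actually read the copied labels on
the target device, e.g.)

    zdb -l /dev/rdsk/c8t5d0p0
    zpool import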

Thanks for your help!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Performance on SATA Device

2010-03-17 Thread Kashif Mumtaz
Hi,

I'm using Sun T1000 machines: one machine is installed with Solaris 10 on UFS and 
the other with ZFS. The ZFS machine is performing slowly. Running the 
following commands on both systems shows the disk getting busy to 100% immediately:

 ZFS MACHINE
find / > /dev/null 2>&1 &
iostat -xnmpz 5
[r...@zfs-serv ktahir]# iostat -xnmpz 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4    0.2   12.3    2.2  0.0  0.0    6.5    3.9   0   0 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.9   0   0 192.168.150.131:/export/home2
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   86.4    0.0 5527.4    0.0  0.0  1.0    0.0   11.2   0  97 c0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   87.4    0.0 5593.7    0.0  0.0  1.0    0.0   11.1   0  96 c0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   85.2    0.0 5452.8    0.0  0.0  1.0    0.0   11.3   0  96 c0d0


but on the UFS file system the average busy is 50%.

Any idea why ZFS makes the disk busier?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems

2010-03-17 Thread Casper . Dik


>IMHO, what matters is that pretty much everything from the disk controller
>to the CPU and network interface is advertised in power-of-2 terms and disks
>sit alone using power-of-10. And students are taught that computers work
>with bits and so everything is a power of 2.

That is simply not true:

Memory: power of 2(bytes)
Network: power of 10  (bits/s))
Disk: power of 10 (bytes)
CPU Frequency: power of 10 (cycles/s)
SD/Flash/..: power of 10 (bytes)
Bus speed: power of 10

Main memory is the odd one out.

>Just last week I had to remind people that a 24-disk JBOD with 1TB disks
>wouldn't provide 24TB of storage since disks show up as 931GB.

Well some will say it's 24T :-)

>It *is* an anomaly and I don't expect it to be fixed.
>
>Perhaps some disk vendor could add more bits to its drives and advertise a
>"real 1TB disk" using power-of-2 and show how people are being misled by
>other vendors that use power-of-10. Highly unlikely but would sure get some
>respect from the storage community.

You've not been misled unless you have had your head in the sand for the last
five to ten years.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Svein Skogen

On 17.03.2010 15:15, Edward Ned Harvey wrote:
>> I think what you're saying is:  Why bother trying to backup with "zfs
>> send"
>> when the recommended practice, fully supportable, is to use other tools
>> for
>> backup, such as tar, star, Amanda, bacula, etc.   Right?
>>
>> The answer to this is very simple.
>> #1  ...
>> #2  ...
> 
> Oh, one more thing.  "zfs send" is only discouraged if you plan to store the
> data stream and do "zfs receive" at a later date.  
> 
> If instead, you are doing "zfs send | zfs receive" onto removable media, or
> another server, where the data is immediately fed through "zfs receive" then
> it's an entirely viable backup technique.

... Fine. Tell me how to make a zpool on LTO-3 tapes. ;)

Hence my earlier questions about the possibility of simply adding a
parameter to the send/receive commands to use the ALREADY BUILT IN code
for providing some sort of FEC to the stream. It _REALLY_ would solve
the "store" problem.

//Svein

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Edward Ned Harvey
> I think what you're saying is:  Why bother trying to backup with "zfs
> send"
> when the recommended practice, fully supportable, is to use other tools
> for
> backup, such as tar, star, Amanda, bacula, etc.   Right?
> 
> The answer to this is very simple.
> #1  ...
> #2  ...

Oh, one more thing.  "zfs send" is only discouraged if you plan to store the
data stream and do "zfs receive" at a later date.  

If instead, you are doing "zfs send | zfs receive" onto removable media, or
another server, where the data is immediately fed through "zfs receive" then
it's an entirely viable backup technique.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems

2010-03-17 Thread Giovanni Tirloni
On Wed, Mar 17, 2010 at 9:34 AM, David Dyer-Bennet  wrote:

> On 3/16/2010 23:21, Erik Trimble wrote:
>
>> On 3/16/2010 8:29 PM, David Dyer-Bennet wrote:
>>
>>> On 3/16/2010 17:45, Erik Trimble wrote:
>>>
 David Dyer-Bennet wrote:

> On Tue, March 16, 2010 14:59, Erik Trimble wrote:
>
>  Has there been a consideration by anyone to do a class-action lawsuit
>> for false advertising on this?  I know they now have to include the
>> "1GB
>> = 1,000,000,000 bytes" thing in their specs and somewhere on the box,
>> but just because I say "1 L = 0.9 metric liters" somewhere on the box,
>> it shouldn't mean that I should be able to advertise in huge letters "2
>> L
>> bottle of Coke" on the outside of the package...
>>
>
> I think "giga" is formally defined as a prefix meaning 10^9; that is,
> the
> definition the disk manufacturers are using is the standard metric one
> and
> very probably the one most people expect.  There are international
> standards for these things.
>
> I'm well aware of the history of power-of-two block and disk sizes in
> computers (the first computers I worked with pre-dated that period);
> but I
> think we need to recognize that this is our own weird local usage of
> terminology, and that we can't expect the rest of the world to change
> to
> our way of doing things.
>

 That's RetConn-ing.  The only reason the stupid GiB / GB thing came
 around in the past couple of years is that the disk drive manufacturers
 pushed SI to do it.
 Up until 5 years ago (or so), GigaByte meant a power of 2 to EVERYONE,
 not just us techies.   I would hardly call 40+ years of using the various
 giga/mega/kilo  prefixes as a power of 2 in computer science as
 non-authoritative.  In fact, I would argue that the HD manufacturers don't
 have a leg to stand on - it's not like they were "outside" the field and
 used to the "standard" SI notation of powers of 10.  Nope. They're inside
 the industry, used the powers-of-2 for decades, then suddenly decided to
 "modify" that meaning, as it served their marketing purposes.

>>>
>>> The SI meaning was first proposed in the 1920s, so far as I can tell.
>>>  Our entire history of special usage took place while the SI definition was
>>> in place.  We simply mis-used it.  There was at the time no prefix for what
>>> we actually wanted (not giga then, but mega), so we borrowed and repurposed
>>> mega.
>>>
>>>  Doesn't matter whether the "original" meaning of K/M/G was a
>> power-of-10.  What matters is internal usage in the industry.  And that has
>> been consistent with powers-of-2 for 40+ years.  There has been NO outside
>> understanding that GB = 1 billion bytes until the Storage Industry decided
>> it wanted it that way.  That's pretty much the definition of distorted
>> advertising.
>>
>
> That's simply not true.  The first computer I programmed, an IBM 1620, was
> routinely referred to as having "20K" of core.  That meant 20,000 decimal
> digits; not 20,480.  The other two memory configurations were similarly
> "40K" for 40,000 and "60K" for 60,000.  The first computer I was *paid* for
> programming, the 1401, had "8K" of core, and that was 8,000 locations, not
> 8,192.  This was right on 40 years ago (fall of 1969 when I started working
> on the 1401).  Yes, neither was brand new, but IBM was still leasing them to
> customers (it came in configurations of 4k, 8k, 12k, and I think 16k; been a
> while!).


At this point in history it doesn't matter much who's right or wrong
anymore.

IMHO, what matters is that pretty much everything from the disk controller
to the CPU and network interface is advertised in power-of-2 terms and disks
sit alone using power-of-10. And students are taught that computers work
with bits and so everything is a power of 2.

Just last week I had to remind people that a 24-disk JBOD with 1TB disks
wouldn't provide 24TB of storage since disks show up as 931GB.
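
The arithmetic, for anyone who wants to re-derive it:

    1 TB (vendor)     = 10^12 bytes
    1 GiB             = 2^30 bytes
    10^12 / 2^30      =~ 931.3  -> a "1TB" disk shows up as ~931 "GB"
    24 x 10^12 / 2^40 =~ 21.8   -> the 24-disk JBOD is ~21.8 TiB, not 24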

It *is* an anomaly and I don't expect it to be fixed.

Perhaps some disk vendor could add more bits to its drives and advertise a
"real 1TB disk" using power-of-2 and show how people are being misled by
other vendors that use power-of-10. Highly unlikely but would sure get some
respect from the storage community.

-- 
Giovanni
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
To be sure, Ed, I'm not asking:

Why bother trying to backup with "zfs send" when there are fully supportable
and
working options available right NOW?

Rather, I am asking:

Why do we want to adapt "zfs send" to do something it was never intended
to do, and probably won't be adapted to do (well, if at all) anytime soon
instead of
optimizing existing technologies for this use case?


But I got it.  "zfs send" is fast.  Let me ask you this, Ed...where do you
"zfs send"
your data to? Another pool?  Does it go to tape eventually?  If so, what is
the setup
such that it goes to tape?  I apologize for asking here, as I'm sure you
described it
in one of the other threads I mentioned, but I'm not able to go digging in
those
threads at the moment.

I ask this because I see an opportunity to kill 2 birds with one stone.
With proper
NDMP support and "zfs send" performance, why can't you get the advantages of

"zfs send" without trying to shoehorn "zfs send" into a use it's not
designed for?

Maybe NDMP support needs to be a higher focus of the ZFS team?  I noticed
not
many people even seem to be asking for it, never mind screaming for it.
However,
I did say this in my original e-mail - that I see NDMP support as being a
way to handle
the calls for "zfs send" to tape.

Maybe we can broaden the conversation at this point.  For all of those who
use
NDMP today to backup Filers, be they NetApp, EMC, or other vendors'
devices...how
is your experience with NDMP?  *IS* anyone using NDMP?  If you have the
option of
using NDMP and you don't, why don't you?  Backing up file servers directly
to tape
seems to be an obvious WIN, so if people aren't doing it, I'm curious why
they aren't.
That's any kind of file server, because (Open)Solaris will increasingly be
applied in this
role.  That was pretty much the goal of the Fishworks team, IIRC.  So this
looks like
an opportunity by Sun (Oracle) to take a neglected backup technology and
make it a
must-have backup technology, by making it integrate smoothly with ZFS and
high
performance.

On Wed, Mar 17, 2010 at 09:37, Edward Ned Harvey wrote:

> > The one thing that I keep thinking, and which I have yet to see
> > discredited, is that
> > ZFS file systems use POSIX semantics.  So, unless you are using
> > specific features
> > (notably ACLs, as Paul Henson is), you should be able to backup those
> > file systems
> > using well known tools.
>
> This is correct.  Many people do backup using tar, star, rsync, etc.
>
>
> > The Best Practices Guide is also very clear about send and receive NOT
> > being
> > designed explicitly for backup purposes.  I find it odd that so many
> > people seem to
> > want to force this point.  ZFS appears to have been designed to allow
> > the use of
> > well known tools that are available today to perform backups and
> > restores.  I'm not
> > sure how many people are actually using NFS v4 style ACLs, but those
> > people have
> > the most to worry about when it comes to using tar or NetBackup or
> > Networker or
> > Amanda or Bacula or star to backup ZFS file systems.  Everyone else,
> > which appears
> > to be the majority of people, have many tools to choose from, tools
> > they've used
> > for a long time in various environments on various platforms.  The
> > learning curve
> > doesn't appear to be as steep as most people seem to make it out to
> > be.  I honestly
> > think many people may be making this issue more complex than it needs
> > to be.
>
> I think what you're saying is:  Why bother trying to backup with "zfs send"
> when the recommended practice, fully supportable, is to use other tools for
> backup, such as tar, star, Amanda, bacula, etc.   Right?
>
> The answer to this is very simple.
> #1  "zfs send" is much faster.  Particularly for incrementals on large
> numbers of files.
> #2  "zfs send" will support every feature of the filesystem, including
> things like filesystem properties, hard links, symlinks, and objects which
> are not files, such as character special objects, fifo pipes, and so on.
> Not to mention ACL's.  If you're considering some other tool (rsync, star,
> etc), you have to read the man pages very carefully to formulate the exact
> backup command, and there's no guarantee you'll find a perfect backup
> command.  There is a certain amount of comfort knowing that the people who
> wrote "zfs send" are the same people who wrote the filesystem.  It's
> simple,
> and with no arguments, and no messing around with man page research, it's
> guaranteed to make a perfect copy of the whole filesystem.
>
> Did I mention fast?  ;-)  Prior to zfs, I backed up my file server via
> rsync.  It's 1TB of mostly tiny files, and it ran for 10 hours every night,
> plus 30 hours every weekend.  Now, I use zfs send, and it runs for an
> average 7 minutes every night, depending on how much data changed that day,
> and I don't know - 20 hours I guess - every month.
>
>


-- 
"You can choose your friends, you can choose the deals." - Equity Private

"I

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Edward Ned Harvey
> The one thing that I keep thinking, and which I have yet to see
> discredited, is that
> ZFS file systems use POSIX semantics.  So, unless you are using
> specific features
> (notably ACLs, as Paul Henson is), you should be able to backup those
> file systems
> using well known tools.  

This is correct.  Many people do backup using tar, star, rsync, etc.


> The Best Practices Guide is also very clear about send and receive NOT
> being
> designed explicitly for backup purposes.  I find it odd that so many
> people seem to
> want to force this point.  ZFS appears to have been designed to allow
> the use of
> well known tools that are available today to perform backups and
> restores.  I'm not
> sure how many people are actually using NFS v4 style ACLs, but those
> people have
> the most to worry about when it comes to using tar or NetBackup or
> Networker or
> Amanda or Bacula or star to backup ZFS file systems.  Everyone else,
> which appears
> to be the majority of people, have many tools to choose from, tools
> they've used
> for a long time in various environments on various platforms.  The
> learning curve
> doesn't appear to be as steep as most people seem to make it out to
> be.  I honestly
> think many people may be making this issue more complex than it needs
> to be.

I think what you're saying is:  Why bother trying to backup with "zfs send"
when the recommended practice, fully supportable, is to use other tools for
backup, such as tar, star, Amanda, bacula, etc.   Right?

The answer to this is very simple.
#1  "zfs send" is much faster.  Particularly for incrementals on large
numbers of files.
#2  "zfs send" will support every feature of the filesystem, including
things like filesystem properties, hard links, symlinks, and objects which
are not files, such as character special objects, fifo pipes, and so on.
Not to mention ACL's.  If you're considering some other tool (rsync, star,
etc), you have to read the man pages very carefully to formulate the exact
backup command, and there's no guarantee you'll find a perfect backup
command.  There is a certain amount of comfort knowing that the people who
wrote "zfs send" are the same people who wrote the filesystem.  It's simple,
and with no arguments, and no messing around with man page research, it's
guaranteed to make a perfect copy of the whole filesystem.

Did I mention fast?  ;-)  Prior to zfs, I backed up my file server via
rsync.  It's 1TB of mostly tiny files, and it ran for 10 hours every night,
plus 30 hours every weekend.  Now, I use zfs send, and it runs for an
average 7 minutes every night, depending on how much data changed that day,
and I don't know - 20 hours I guess - every month.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
Exactly!

This is what I meant, at least when it comes to backing up ZFS datasets.
There
are tools available NOW, such as Star, which will backup ZFS datasets due to
the
POSIX nature of those datasets.  As well, Amanda, Bacula, NetBackup,
Networker
and probably some others I missed.  Re-inventing the wheel is not required
in these
cases.

As I said in my original e-mail, Star is probably perfect once it gets ZFS
(e.g. NFS v4)
ACL and NDMP support (e.g. accepting NDMP input streams and ouputting onto
tape).

ZVOLs are the piece I'm still not sure about though.  So I repeat my
question: how
are people backing up ZVOLs today?  (If Star could do ZVOLs as well as NDMP
and
ZFS ACLs, then it literally *is* perfect.)

On Wed, Mar 17, 2010 at 09:01, Joerg Schilling <
joerg.schill...@fokus.fraunhofer.de> wrote:

> Stephen Bunn  wrote:
>
> > between our machine's pools and our backup server pool.  It would be
> > nice, however, if some sort of enterprise level backup solution in the
> > style of ufsdump was introduced to ZFS.
>
> Star can do the same as ufsdump does but independent of OS and filesystem.
>
> Star is currently missing support for ZFS ACLs and  for extended attributes
> from Solaris. If you are interested, make a test. If you need support for
> ZFS
> ACLs or Solaris extended attributes, send me a note.
>
> Jörg
>
> --
>
> EMail: jo...@schily.isdn.cs.tu-berlin.de (home)
>        Jörg Schilling D-13353 Berlin
>        j...@cs.tu-berlin.de (uni)
>   joerg.schill...@fokus.fraunhofer.de (work) Blog:
> http://schily.blogspot.com/
>  URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can we get some documentation on iSCSI sharing after comstar took over?

2010-03-17 Thread Ross Walker





On Mar 17, 2010, at 2:30 AM, Erik Ableson  wrote:



On 17 mars 2010, at 00:25, Svein Skogen  wrote:



On 16.03.2010 22:31, erik.ableson wrote:


On 16 mars 2010, at 21:00, Marc Nicholas wrote:

On Tue, Mar 16, 2010 at 3:16 PM, Svein Skogen <sv...@stillbilde.net> wrote:



I'll write you a Perl script :)


  I think there are ... several people that'd like a script that  
gave us
  back some of the ease of the old shareiscsi one-off, instead of  
having

  to spend time on copy-and-pasting GUIDs they have ... no real use
  for. ;)


I'll try and knock something up in the next few days, then!


Try this :

http://www.infrageeks.com/groups/infrageeks/wiki/56503/zvol2iscsi.html



Thank you! :)

Mind if I (after some sleep) look at extending your script a little? Of
course with feedback of the changes I make?

//Svein

Certainly! I just whipped that up since I was testing out a pile of  
clients with different volumes and got tired of going through all  
the steps so anything to make it more complete would be useful.


How about a perl script that emulates the functionality of iscsitadm  
so share=iscsi works as expected?
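
(For reference, the manual COMSTAR steps such a script would have to
wrap are roughly the following; the zvol name and the GUID shown are
made up.)

    zfs create -V 20g tank/vol01
    sbdadm create-lu /dev/zvol/rdsk/tank/vol01    # prints the GUID of the new LU
    stmfadm add-view 600144f0c0a8000049c31a7a0001 # paste that GUID back in
    itadm create-target                           # needs the iscsi/target service online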


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Joerg Schilling
Stephen Bunn  wrote:

> between our machine's pools and our backup server pool.  It would be 
> nice, however, if some sort of enterprise level backup solution in the 
> style of ufsdump was introduced to ZFS.

Star can do the same as ufsdump does but independent of OS and filesystem.

Star is currently missing support for ZFS ACLs and  for extended attributes 
from Solaris. If you are interested, make a test. If you need support for ZFS 
ACLs or Solaris extended attributes, send me a note.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Svein Skogen

On 17.03.2010 13:31, Svein Skogen wrote:
> On 17.03.2010 12:28, Khyron wrote:
>> Note to readers: There are multiple topics discussed herein.  Please
>> identify which

*SNIP*

> 
> How does backing up the NFSv4 acls help you backup up a zvol (shared for
> iSCSI)? Please enlighten me.
> 
> //Svein
> 

Again I replied before fully reading the mail I replied to (had I read
it properly I'd have spotted you already mentioned this).

My apologies to the list for the redundant posting.

//Svein
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems

2010-03-17 Thread David Dyer-Bennet

On 3/16/2010 23:21, Erik Trimble wrote:

On 3/16/2010 8:29 PM, David Dyer-Bennet wrote:

On 3/16/2010 17:45, Erik Trimble wrote:

David Dyer-Bennet wrote:

On Tue, March 16, 2010 14:59, Erik Trimble wrote:


Has there been a consideration by anyone to do a class-action lawsuit
for false advertising on this?  I know they now have to include 
the "1GB

= 1,000,000,000 bytes" thing in their specs and somewhere on the box,
but just because I say "1 L = 0.9 metric liters" somewhere on the 
box,
it shouldn't mean that I should be able to advertise in huge 
letters "2 L

bottle of Coke" on the outside of the package...


I think "giga" is formally defined as a prefix meaning 10^9; that 
is, the
definition the disk manufacturers are using is the standard metric 
one and

very probably the one most people expect.  There are international
standards for these things.

I'm well aware of the history of power-of-two block and disk sizes in
computers (the first computers I worked with pre-dated that 
period); but I

think we need to recognize that this is our own weird local usage of
terminology, and that we can't expect the rest of the world to 
change to

our way of doing things.


That's RetConn-ing.  The only reason the stupid GiB / GB thing came 
around in the past couple of years is that the disk drive 
manufacturers pushed SI to do it.
Up until 5 years ago (or so), GigaByte meant a power of 2 to 
EVERYONE, not just us techies.   I would hardly call 40+ years of 
using the various giga/mega/kilo  prefixes as a power of 2 in 
computer science as non-authoritative.  In fact, I would argue that 
the HD manufacturers don't have a leg to stand on - it's not like 
they were "outside" the field and used to the "standard" SI notation 
of powers of 10.  Nope. They're inside the industry, used the 
powers-of-2 for decades, then suddenly decided to "modify" that 
meaning, as it served their marketing purposes.


The SI meaning was first proposed in the 1920s, so far as I can 
tell.  Our entire history of special usage took place while the SI 
definition was in place.  We simply mis-used it.  There was at the 
time no prefix for what we actually wanted (not giga then, but mega), 
so we borrowed and repurposed mega.


Doesn't matter whether the "original" meaning of K/M/G was a 
power-of-10.  What matters is internal usage in the industry.  And 
that has been consistent with powers-of-2 for 40+ years.  There has 
been NO outside understanding that GB = 1 billion bytes until the 
Storage Industry decided it wanted it that way.  That's pretty much 
the definition of distorted advertising.


That's simply not true.  The first computer I programmed, an IBM 1620, 
was routinely referred to as having "20K" of core.  That meant 20,000 
decimal digits; not 20,480.  The other two memory configurations were 
similarly "40K" for 40,000 and "60K" for 60,000.  The first computer I 
was *paid* for programming, the 1401, had "8K" of core, and that was 
8,000 locations, not 8,192.  This was right on 40 years ago (fall of 
1969 when I started working on the 1401).  Yes, neither was brand new, 
but IBM was still leasing them to customers (it came in configurations 
of 4k, 8k, 12k, and I think 16k; been a while!).

--

David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Svein Skogen

On 17.03.2010 12:28, Khyron wrote:
> Note to readers: There are multiple topics discussed herein.  Please
> identify which
> idea(s) you are responding to, should you respond.  Also make sure to
> take in all of
> this before responding.  Something you want to discuss may already be
> covered at
> a later point in this e-mail, including NDMP and ZFS ACLs.  It's long.
> 
> It seems to me that something is being overlooked (either by myself or
> others) in all
> of these discussions about backing up ZFS pools...
> 
> The one thing that I keep thinking, and which I have yet to see
> discredited, is that
> ZFS file systems use POSIX semantics.  So, unless you are using specific
> features
> (notably ACLs, as Paul Henson is), you should be able to backup those
> file systems
> using well known tools.  The ZFS Best Practices Guide speaks to this in
> section 4.4
> (specifically 4.4.3[1]) and there have been various posters who have
> spoken of using
> other tools.  (Star comes to mind, most prominently.)
> 
> The Best Practices Guide is also very clear about send and receive NOT
> being
> designed explicitly for backup purposes.  I find it odd that so many
> people seem to
> want to force this point.  ZFS appears to have been designed to allow
> the use of
> well known tools that are available today to perform backups and
> restores.  I'm not
> sure how many people are actually using NFS v4 style ACLs, but those
> people have
> the most to worry about when it comes to using tar or NetBackup or
> Networker or
> Amanda or Bacula or star to backup ZFS file systems.  Everyone else,
> which appears
> to be the majority of people, have many tools to choose from, tools
> they've used
> for a long time in various environments on various platforms.  The
> learning curve
> doesn't appear to be as steep as most people seem to make it out to be. 
> I honestly
> think many people may be making this issue more complex than it needs to be.
> 
> Maybe the people having the most problems are those who are new to
> Solaris, but
> if you have any real *nix experience, Solaris shouldn't be that
> difficult to figure out,
> especially for those with System V experience.  The Linux folks?  Well,
> I sorta feel
> sorry for you and I sorta don't.
> 
> So, am I missing something?  It wouldn't surprise me if I am.  What am I
> missing?
> 
> The other things I have been thinking about are NDMP support and what
> tools out
> there support NFS v4 ACLs. 
> 
> Has anyone successfully used NDMP support with ZFS?  If so, what did you
> do?  How
> did you configure your system, including any custom coding you did? 
> From the looks
> of the NDMP project on os.org , NDMP was integrated in
> build 102[3] but it appears
> to only be NDMP v4 not the latest, v5.  Maybe NDMP support would placate
> some of
> those screaming for the send stream to be a tape backup format?
> 
> As for ACLs[2], the list of tools supporting NFS v4 ACLs seems to be
> pretty small.  I
> plan to spend some quality time with RFC 3530 to get my head around NFS
> v4, and
> ACLs in particular.  star seems to be fairly adept, with the exception
> of the NFS v4
> ACL support.  Hopefully that is forthcoming?  Again, I think those
> people who are
> not using ZFS ACLs can probably perform actual tape backups (should they
> choose
> to) with existing tools.  If I'm mistaken or missing something, I invite
> someone to
> please point it out.
> 
> Finally, there's backup of ZVOLs.  I don't know what the commercial tool
> support
> for backing up ZVOLs looks like but I know this is the *perfect* place
> for NDMP. 
> Backing up ZVOLs should be priority #1 for NDMP support in
> (Open)Solaris, I think. 
> Looking through the symbols in libzfs.so, I don't see anything
> specifically related to
> backup of ZVOLs in the existing code.  How are people handling ZVOL backups
> today?
> 
> Not to be too flip, but star looks like it might be the perfect tape
> backup software
> if it supported NDMP, NFS v4 ACLs and ZVOLs.  Just thinking out loud...
> 
> [1]
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Using_ZFS_With_Enterprise_Backup_Solutions
> 
> [2] http://docs.sun.com/app/docs/doc/819-5461/ftyxi?l=en&a=view
> 
> 
> [3] http://hub.opensolaris.org/bin/view/Project+ndmp/
> 
> Aside: I see so many posts to this list about backup strategy for ZFS
> file systems,
> and I continue to be amazed by how few people check the archives for
> previous
> discussions before they start a new one.  So many of the conversations are
> repeated over and over, with good information being spread over multiple
> threads? 
> I personally find it interesting that so few people read first before
> posting.  Few
> even seem to bother to do so much (little?) as a Google search which
> would yield
> several previous discussions on the topic of ZFS pool backups to tape.
> 

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Stephen Bunn

On 03/17/2010 08:28 PM, Khyron wrote:

The Best Practices Guide is also very clear about send and receive NOT
being
designed explicitly for backup purposes.  I find it odd that so many
people seem to
want to force this point.  ZFS appears to have been designed to allow
the use of
well known tools that are available today to perform backups and
restores.  I'm not
sure how many people are actually using NFS v4 style ACLs, but those
people have
the most to worry about when it comes to using tar or NetBackup or
Networker or
Amanda or Bacula or star to backup ZFS file systems.  Everyone else,
which appears
to be the majority of people, have many tools to choose from, tools
they've used
for a long time in various environments on various platforms.  The
learning curve
doesn't appear to be as steep as most people seem to make it out to
be.  I honestly
think many people may be making this issue more complex than it needs
to be.

Maybe the people having the most problems are those who are new to
Solaris, but
if you have any real *nix experience, Solaris shouldn't be that
difficult to figure out,
especially for those with System V experience.  The Linux folks?
Well, I sorta feel
sorry for you and I sorta don't.


--
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
My only comment would be that your assumption is that most people are 
backing up their stuff via gzipped archives or using 3rd party 
solutions. I would argue (at least in my environment) that the majority 
of backups pre-ZFS was done using ufsdump.  When ZFS came along it 
obsoleted ufsdump and there was no direct replacement.  For my situation 
we have created some custom python code to wrap send/receive stuff 
between our machine's pools and our backup server pool.  It would be 
nice, however, if some sort of enterprise level backup solution in the 
style of ufsdump was introduced to ZFS.


--
Steve
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to reserve space for a file on a zfs filesystem

2010-03-17 Thread wensheng liu
Hi all,

How do I reserve space on a zfs filesystem? Using mkfile or dd to write
data to the blocks is time consuming, while "mkfile -n" will not really hold
the space.
And zfs's "set reservation" only works on a filesystem, not on a file?

Could anyone provide a solution for this?

Thanks very much
Vincent
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-17 Thread Khyron
Note to readers: There are multiple topics discussed herein.  Please
identify which
idea(s) you are responding to, should you respond.  Also make sure to take
in all of
this before responding.  Something you want to discuss may already be
covered at
a later point in this e-mail, including NDMP and ZFS ACLs.  It's long.

It seems to me that something is being overlooked (either by myself or
others) in all
of these discussions about backing up ZFS pools...

The one thing that I keep thinking, and which I have yet to see discredited,
is that
ZFS file systems use POSIX semantics.  So, unless you are using specific
features
(notably ACLs, as Paul Henson is), you should be able to backup those file
systems
using well known tools.  The ZFS Best Practices Guide speaks to this in
section 4.4
(specifically 4.4.3[1]) and there have been various posters who have spoken
of using
other tools.  (Star comes to mind, most prominently.)

The Best Practices Guide is also very clear about send and receive NOT being

designed explicitly for backup purposes.  I find it odd that so many people
seem to
want to force this point.  ZFS appears to have been designed to allow the
use of
well known tools that are available today to perform backups and restores.
I'm not
sure how many people are actually using NFS v4 style ACLs, but those people
have
the most to worry about when it comes to using tar or NetBackup or Networker
or
Amanda or Bacula or star to backup ZFS file systems.  Everyone else, which
appears
to be the majority of people, have many tools to choose from, tools they've
used
for a long time in various environments on various platforms.  The learning
curve
doesn't appear to be as steep as most people seem to make it out to be.  I
honestly
think many people may be making this issue more complex than it needs to be.

Maybe the people having the most problems are those who are new to Solaris,
but
if you have any real *nix experience, Solaris shouldn't be that difficult to
figure out,
especially for those with System V experience.  The Linux folks?  Well, I
sorta feel
sorry for you and I sorta don't.

So, am I missing something?  It wouldn't surprise me if I am.  What am I
missing?

The other things I have been thinking about are NDMP support and what tools
out
there support NFS v4 ACLs.

Has anyone successfully used NDMP support with ZFS?  If so, what did you
do?  How
did you configure your system, including any custom coding you did?  From
the looks
of the NDMP project on os.org, NDMP was integrated in build 102[3] but it
appears
to only be NDMP v4 not the latest, v5.  Maybe NDMP support would placate
some of
those screaming for the send stream to be a tape backup format?

As for ACLs[2], the list of tools supporting NFS v4 ACLs seems to be pretty
small.  I
plan to spend some quality time with RFC 3530 to get my head around NFS v4,
and
ACLs in particular.  star seems to be fairly adept, with the exception of
the NFS v4
ACL support.  Hopefully that is forthcoming?  Again, I think those people
who are
not using ZFS ACLs can probably perform actual tape backups (should they
choose
to) with existing tools.  If I'm mistaken or missing something, I invite
someone to
please point it out.

Finally, there's backup of ZVOLs.  I don't know what the commercial tool
support
for backing up ZVOLs looks like but I know this is the *perfect* place for
NDMP.
Backing up ZVOLs should be priority #1 for NDMP support in (Open)Solaris, I
think.
Looking through the symbols in libzfs.so, I don't see anything specifically
related to
backup of ZVOLs in the existing code.  How are people handling ZVOL backups
today?

Not to be too flip, but star looks like it might be the perfect tape backup
software
if it supported NDMP, NFS v4 ACLs and ZVOLs.  Just thinking out loud...

[1]
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Using_ZFS_With_Enterprise_Backup_Solutions

[2] http://docs.sun.com/app/docs/doc/819-5461/ftyxi?l=en&a=view

[3] http://hub.opensolaris.org/bin/view/Project+ndmp/

Aside: I see so many posts to this list about backup strategy for ZFS file
systems,
and I continue to be amazed by how few people check the archives for
previous
discussions before they start a new one.  So many of the conversations are
repeated over and over, with good information being spread over multiple
threads?
I personally find it interesting that so few people read first before
posting.  Few
even seem to bother to do so much (little?) as a Google search which would
yield
several previous discussions on the topic of ZFS pool backups to tape.

Oh well.

-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedupratio riddle

2010-03-17 Thread Paul van der Zwan

On 17 mrt 2010, at 10:56, zfs ml wrote:

> On 3/17/10 1:21 AM, Paul van der Zwan wrote:
>> 
>> On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote:
>> 
>>> Someone correct me if I'm wrong, but it could just be a coincidence. That 
>>> is, perhaps the data that you copied happens to lead to a dedup ratio 
>>> relative to the data that's already on there. You could test this out by 
>>> copying a few gigabytes of data you know is unique (like maybe a DVD video 
>>> file or something), and that should change the dedup ratio.
>> 
>> The first copy of that data was unique, and dedup is even switched off for 
>> the entire pool, so it seems to be a bug in the calculation of the
>> dedupratio, or it uses a method that gives unexpected results.
>> 
>>  Paul
> 
> beadm list -a
> and/or other snapshots that were taken before turning off dedup?

Possibly but that should not matter. If I triple the amount of data in the 
pool, with dedup switched off, the dedupratio
should IMHO change because the amount of non-deduped data has changed.
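
(One way to see the inputs to that ratio is zdb's dedup summary; the
pool name here is assumed, and the numbers are only an example of the
output format.)

    zdb -D tank
    # ...
    # dedup = 1.77, compress = 1.40, copies = 1.00, dedup * compress / copies = 2.48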

Paul
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedupratio riddle

2010-03-17 Thread zfs ml

On 3/17/10 1:21 AM, Paul van der Zwan wrote:


On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote:


Someone correct me if I'm wrong, but it could just be a coincidence. That is, 
perhaps the data that you copied happens to lead to a dedup ratio relative to 
the data that's already on there. You could test this out by copying a few 
gigabytes of data you know is unique (like maybe a DVD video file or 
something), and that should change the dedup ratio.


The first copy of that data was unique, and dedup is even switched off for the 
entire pool, so it seems to be a bug in the calculation of the
dedupratio, or it uses a method that gives unexpected results.

Paul


beadm list -a
and/or other snapshots that were taken before turning off dedup?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems

2010-03-17 Thread Casper . Dik

>Carson Gaspar wrote:
>>> Not quite. 
>>> 11 x 10^12 =~ 10.004 x (1024^4).
>>>
>>> So, the 'zpool list' is right on, at "10T" available.
>>
>> Duh, I was doing GiB math (y = x * 10^9 / 2^20), not TiB math (y = x * 
>> 10^12 / 2^40).
>>
>> Thanks for the correction.
>>
>You're welcome. :-)
>
>
>On a not-completely-on-topic note:
>
>Has there been a consideration by anyone to do a class-action lawsuit 
>for false advertising on this?  I know they now have to include the "1GB 
>= 1,000,000,000 bytes" thing in their specs and somewhere on the box, 
>but just because I say "1 L = 0.9 metric liters" somewhere on the box, 
>it shouldn't mean that I should be able to advertise in huge letters "2 L 
>bottle of Coke" on the outside of the package...

I think such attempts have been done and I think one was settled by 
Western Digital.

https://www.wdc.com/settlement/docs/document20.htm

This was in 2006.

I was apparently part of the 'class' as I had a disk registered; I think 
they gave some software.

See also:

http://en.wikipedia.org/wiki/Binary_prefix

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems

2010-03-17 Thread Roland Rambau

Eric,

in my understanding ( which I learned from more qualified people
but I may be mistaken anyway ), whenever we discuss a transfer rate
like  x Mb/s, y GB/s or z PB/d, the M, G, T or P refers to the
frequency and not to the data.

1 MB/s means  "transfer bytes at 1 MHz", NOT "transfer megabytes at 1Hz"

therefore it's 1'000'000 B/s (strictly speaking)


Of course usually some protocol overhead is much larger and so the small
1000:1024 difference is irrelevant anyway and can+will be neglected.

  -- Roland





Am 17.03.2010 04:45, schrieb Erik Trimble:

On 3/16/2010 4:23 PM, Roland Rambau wrote:

Eric,

careful:

Am 16.03.2010 23:45, schrieb Erik Trimble:


Up until 5 years ago (or so), GigaByte meant a power of 2 to EVERYONE,
not just us techies. I would hardly call 40+ years of using the various
giga/mega/kilo prefixes as a power of 2 in computer science as
non-authoritative.


How long does it take to transmit 1 TiB over a 1 GB/sec transmission
link, assuming no overhead?

See ?

hth

-- Roland



I guess folks have gotten lazy all over.

Actually, for networking, it's all "GigaBIT", but I get your meaning,
which is why it's all properly labeled "1Gb" Ethernet, not "1GB" Ethernet.

That said, I'm still under the impression that Giga = 1024^3 for
networking, just like Mega = 1024^2. After all, it's 100Mbit Ethernet,
which doesn't mean it runs at 100 MHz.

That is, on Fast Ethernet, I should be sending at most 100 x 1024^2 BITS
per second.


Data amounts are (so far as I know, universally) expressed in powers-of-2,
while frequencies are done in powers-of-10. Thus, baud (for modems) is
in powers-of-10, as are CPU/memory speeds. Memory (*RAM of all sorts),
bus THROUGHPUT (e.g. PCI-E is in powers-of-2), networking throughput,
and even graphics throughput are in powers-of-2.

If they want to use powers-of-10, then they should use the actual "normal"
names, as graphics performance ratings have done (i.e. 10 billion texels,
not "10 Gigatexels"). Take a look at Nvidia's product literature:

http://www.nvidia.com/object/IO_11761.html


It's just the storage vendors using the broken measurements. Bastards!





--


Roland Rambau Server and Solution Architects
Principal Field Technologist  Global Systems Engineering
Phone: +49-89-46008-2520  Mobile:+49-172-84 58 129
Fax:   +49-89-46008-  mailto:roland.ram...@sun.com

Sitz der Gesellschaft: Sun Microsystems GmbH,
Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028;
Geschäftsführer: Thomas Schröder
*** UNIX ** /bin/sh * FORTRAN **
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedupratio riddle

2010-03-17 Thread Paul van der Zwan

On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote:

> Someone correct me if I'm wrong, but it could just be a coincidence. That is, 
> perhaps the data that you copied happens to dedup at the same ratio as the 
> data that's already on there. You could test this out by copying a few 
> gigabytes of data you know is unique (like maybe a DVD video file or 
> something), and that should change the dedup ratio.

The first copy of that data was unique, and dedup is even switched off for the 
entire pool, so it seems there is a bug in the calculation of the dedupratio, 
or it uses a method that gives unexpected results.

Paul

> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

