Re: [zfs-discuss] zfs rewrite?

2007-02-15 Thread Matthew Ahrens

Pawel Jakub Dawidek wrote:


What do you guys think about implementing 'zfs/zpool rewrite' command?
It'll read every block older than the date when the command was executed
and write it again (using standard ZFS COW mechanism, similar to how
resilvering works, but the data is read from the same disk it is written to).


Yeah, that would be great, and in fact we are implementing such a thing 
right now (to support pool shrinkage, among other features).  The tricky 
part is dealing with block pointers that appear in multiple places (eg, 
snapshots and clones).  Having a full rewrite result in more space
being used would not be acceptable.
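
A toy model of the accounting problem described above, as a minimal sketch in Python. Nothing here reflects real ZFS data structures: `Pool`, `naive_rewrite` and `remap_rewrite` are hypothetical names used only for illustration. It shows why a rewrite that updates only the live file system's pointers doubles the space for every block a snapshot still references, while a rewrite that updates every referrer keeps usage flat.

```python
# Toy model only, not ZFS code. A dataset maps file names to block ids,
# and a block stays allocated while any dataset still points at it.

class Pool:
    def __init__(self):
        self.next_id = 0
        self.datasets = {}  # dataset name -> {file name: block id}

    def alloc(self):
        self.next_id += 1
        return self.next_id

    def used_blocks(self):
        # A block is in use if at least one dataset references it.
        return len({b for ds in self.datasets.values() for b in ds.values()})


def naive_rewrite(pool, live):
    # COW rewrite that updates only the live dataset's pointers.
    for name in pool.datasets[live]:
        pool.datasets[live][name] = pool.alloc()


def remap_rewrite(pool):
    # Rewrite that updates every dataset referencing the old block.
    remap = {}
    for ds in pool.datasets.values():
        for name, old in ds.items():
            if old not in remap:
                remap[old] = pool.alloc()
            ds[name] = remap[old]


def demo(rewrite):
    pool = Pool()
    pool.datasets["fs"] = {"a": pool.alloc(), "b": pool.alloc()}
    pool.datasets["fs@snap"] = dict(pool.datasets["fs"])  # snapshot shares blocks
    before = pool.used_blocks()
    rewrite(pool)
    return before, pool.used_blocks()


naive = demo(lambda p: naive_rewrite(p, "fs"))
remapped = demo(remap_rewrite)
print(naive, remapped)
```

In the naive case the snapshot keeps the two old blocks alive next to the two new ones; in the remapped case the old blocks lose their last reference. The sketch says nothing about how hard it is to update pointers held by snapshots and clones, which is the tricky part.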


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs rewrite?

2007-01-28 Thread Pawel Jakub Dawidek
On Fri, Jan 26, 2007 at 06:08:50PM -0800, Darren Dunham wrote:
  What do you guys think about implementing 'zfs/zpool rewrite' command?
  It'll read every block older than the date when the command was executed
  and write it again (using standard ZFS COW mechanism, similar to how
  resilvering works, but the data is read from the same disk it is written to).
 
 #1 How do you control I/O overhead?

The same way it is handled for scrub and resilver.

 #2 Snapshot blocks are never rewritten at the moment.  Most of your
suggestions seem to imply working on the live data, but doing that
for snapshots as well might be tricky. 

Good point, see below.

  3. I created file system with huge amount of data, where most of the
  data is read-only. I change my server from intel to sparc64 machine.
  Adaptive endianness only changes byte order to native on write and because
  file system is mostly read-only, it'll need to byteswap all the time.
  And here comes 'zfs rewrite'!
 
 It's only the metadata that is modified anyway, not the file data.  I
 would hope that this could be done more easily than a full tree rewrite
 (and again the issue with snapshots).  Also, the overhead there probably
 isn't going to be very high (since the metadata will be cached in most
 cases).  

Agreed. Probably in this case there should be rewrite-only-metadata
mode. I agree the overhead is probably not high, but on the other hand,
I'm quite sure there are workloads that will see the difference, e.g.
'find / -name something'.

 Other than that, I'm guessing something like this will be necessary to
 implement disk evacuation/removal.  If you have to rewrite data from one
 disk to elsewhere in the pool, then rewriting the entire tree shouldn't
 be much harder.

How did I forget about this one? :) That's right. I believe ZFS will gain
such an ability at some point, and rewrite functionality fits very nicely
here: mark the disk/mirror/raid-z as no-more-writes and start rewrite
process (probably only limited to this entity). To implement such
functionality there also has to be a way to migrate snapshot data, so
sooner or later there will be a need for moving snapshot blocks.
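
A minimal sketch of the evacuation idea above, in Python. It is an assumption-laden toy: `evacuate` is a hypothetical helper, blocks are just ids mapped to a device name, and the round-robin placement stands in for whatever the real allocator would do. The device being removed is treated as no-more-writes, and every block on it is rewritten to one of the remaining devices.

```python
def evacuate(blocks, vdevs, target):
    """Return a new placement with no block left on `target`.

    blocks: dict of block id -> name of the device holding it
    vdevs:  list of all device names in the pool
    target: device marked no-more-writes and being emptied
    """
    writable = [v for v in vdevs if v != target]
    if not writable:
        raise ValueError("no device left to receive the data")

    placement = dict(blocks)
    moved = 0
    for block_id in sorted(blocks):
        if blocks[block_id] == target:
            # Rewrite the block elsewhere; round-robin over the other devices.
            placement[block_id] = writable[moved % len(writable)]
            moved += 1
    return placement


blocks = {1: "d0", 2: "d1", 3: "d1", 4: "d2"}
result = evacuate(blocks, ["d0", "d1", "d2"], "d1")
print(result)
```

Blocks that were never on the target device keep their placement, which is the sense in which the rewrite is limited to one entity.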

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: [zfs-discuss] zfs rewrite?

2007-01-28 Thread Frank Cusack
On January 28, 2007 4:59:48 PM +0100 Pawel Jakub Dawidek [EMAIL PROTECTED] 
wrote:

On Fri, Jan 26, 2007 at 06:08:50PM -0800, Darren Dunham wrote:

 3. I created file system with huge amount of data, where most of the
 data is read-only. I change my server from intel to sparc64 machine.
 Adaptive endianness only changes byte order to native on write and
 because file system is mostly read-only, it'll need to byteswap all
 the time. And here comes 'zfs rewrite'!

It's only the metadata that is modified anyway, not the file data.  I
would hope that this could be done more easily than a full tree rewrite
(and again the issue with snapshots).  Also, the overhead there probably
isn't going to be very high (since the metadata will be cached in most
cases).


Agreed. Probably in this case there should be rewrite-only-metadata
mode. I agree the overhead is probably not high, but on the other hand,
I'm quite sure there are workloads that will see the difference, e.g.
'find / -name something'.


I'd imagine even for that it wouldn't matter.  The I/O time will dwarf
any time spent byte-swapping.  Easily tested though.  Make sure you
set atime=off so that your find isn't causing write I/O.

-frank


Re: [zfs-discuss] zfs rewrite?

2007-01-27 Thread Toby Thain


On 27-Jan-07, at 4:57 AM, Frank Cusack wrote:

On January 27, 2007 12:27:17 AM -0200 Toby Thain  
[EMAIL PROTECTED] wrote:

On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:

3. I created file system with huge amount of data, where most of the
data is read-only. I change my server from intel to sparc64 machine.
Adaptive endianness only changes byte order to native on write and because
file system is mostly read-only, it'll need to byteswap all the time.
And here comes 'zfs rewrite'!


Why would this help? (Obviously file data is never 'swapped').


Metadata (incl checksums?) still has to be byte-swapped.


I'm aware, but is this really ever going to be an issue?

--T


Or would
atime updates also force a metadata update?  Or am I totally mistaken.

-frank




Re: [zfs-discuss] zfs rewrite?

2007-01-27 Thread Frank Cusack

On January 27, 2007 6:15:29 AM -0200 Toby Thain [EMAIL PROTECTED] wrote:


On 27-Jan-07, at 4:57 AM, Frank Cusack wrote:


On January 27, 2007 12:27:17 AM -0200 Toby Thain
[EMAIL PROTECTED] wrote:

On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:

3. I created file system with huge amount of data, where most of the
data is read-only. I change my server from intel to sparc64 machine.
Adaptive endianness only changes byte order to native on write and because
file system is mostly read-only, it'll need to byteswap all the time.
And here comes 'zfs rewrite'!


Why would this help? (Obviously file data is never 'swapped').


Metadata (incl checksums?) still has to be byte-swapped.


I'm aware, but is this really ever going to be an issue?


Well, it IS extra work.  But yeah, it seems pretty insignificant to me.
-frank


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Darren Dunham
 What do you guys think about implementing 'zfs/zpool rewrite' command?
 It'll read every block older than the date when the command was executed
 and write it again (using standard ZFS COW mechanism, similar to how
 resilvering works, but the data is read from the same disk it is written to).

#1 How do you control I/O overhead?

#2 Snapshot blocks are never rewritten at the moment.  Most of your
   suggestions seem to imply working on the live data, but doing that
   for snapshots as well might be tricky. 

 3. I created file system with huge amount of data, where most of the
 data is read-only. I change my server from intel to sparc64 machine.
 Adaptive endianness only changes byte order to native on write and because
 file system is mostly read-only, it'll need to byteswap all the time.
 And here comes 'zfs rewrite'!

It's only the metadata that is modified anyway, not the file data.  I
would hope that this could be done more easily than a full tree rewrite
(and again the issue with snapshots).  Also, the overhead there probably
isn't going to be very high (since the metadata will be cached in most
cases).  

Other than that, I'm guessing something like this will be necessary to
implement disk evacuation/removal.  If you have to rewrite data from one
disk to elsewhere in the pool, then rewriting the entire tree shouldn't
be much harder.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant   TAOS   http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Toby Thain


On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:


Hi.

What do you guys think about implementing 'zfs/zpool rewrite' command?
It'll read every block older than the date when the command was executed
and write it again (using standard ZFS COW mechanism, similar to how
resilvering works, but the data is read from the same disk it is written to).


I see few situations where it might be useful:

1. My file system is almost full (or not) and I'd like to enable
compression on it. Unfortunately compression will only apply to data
written from now on, and I'd also like to compress already stored data.
Here comes 'zfs rewrite'!


2. I was a bad boy and turned off checksumming. Now I suspect something
corrupts my data and I'd really like to checksum everything. Ok, here
comes 'zfs rewrite'!


In this case you deserve what you get.



3. I created file system with huge amount of data, where most of the
data is read-only. I change my server from intel to sparc64 machine.
Adaptive endianness only changes byte order to native on write and because
file system is mostly read-only, it'll need to byteswap all the time.
And here comes 'zfs rewrite'!


Why would this help? (Obviously file data is never 'swapped').

--T



4. Not sure how ZFS traverses the block tree; if it is done per file,
it may be used to move the data of one file closer together, which
will reduce seek times. Because of the way ZFS works, the data may
become fragmented and 'zfs rewrite' could be used for defragmentation.

5. Once file system encryption is implemented, this mechanism can be
used to encrypt an existing file system, and it can also be used to
change the encryption key.
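
Use case 1 in the list above can be illustrated with a small Python sketch. This is a toy, not ZFS: `Dataset` is a hypothetical class, and `zlib` stands in for whichever compression algorithm the file system would use. The point is only that a property changed after the fact affects new writes, so old blocks stay as they were until something reads them and writes them again.

```python
import zlib


class Dataset:
    def __init__(self):
        self.compression = False
        self.blocks = {}  # name -> (is_compressed, stored bytes)

    def write(self, name, data):
        # The current property value decides how this write is stored.
        if self.compression:
            self.blocks[name] = (True, zlib.compress(data))
        else:
            self.blocks[name] = (False, data)

    def read(self, name):
        compressed, stored = self.blocks[name]
        return zlib.decompress(stored) if compressed else stored

    def used(self):
        return sum(len(stored) for _, stored in self.blocks.values())

    def rewrite(self):
        # Read every block and write it again under the current properties.
        for name in list(self.blocks):
            self.write(name, self.read(name))


ds = Dataset()
ds.write("old", b"A" * 4096)   # written before compression was enabled
ds.compression = True
ds.write("new", b"B" * 4096)   # written after
before = ds.used()
ds.rewrite()
after = ds.used()
print(before, after)
```

The same shape would apply to the checksum and encryption cases: the rewrite is just read followed by write, and the write path picks up whatever settings are current.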

What do you think?

--
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Frank Cusack

On January 27, 2007 12:27:17 AM -0200 Toby Thain [EMAIL PROTECTED] wrote:

On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:

3. I created file system with huge amount of data, where most of the
data is read-only. I change my server from intel to sparc64 machine.
Adaptive endianness only changes byte order to native on write and because
file system is mostly read-only, it'll need to byteswap all the time.
And here comes 'zfs rewrite'!


Why would this help? (Obviously file data is never 'swapped').


Metadata (incl checksums?) still has to be byte-swapped.  Or would
atime updates also force a metadata update?  Or am I totally mistaken.

-frank


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Jeff Bonwick
On Fri, Jan 26, 2007 at 10:57:19PM -0800, Frank Cusack wrote:
 On January 27, 2007 12:27:17 AM -0200 Toby Thain [EMAIL PROTECTED] wrote:
 On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
 3. I created file system with huge amount of data, where most of the
 data is read-only. I change my server from intel to sparc64 machine.
 Adaptive endianness only changes byte order to native on write and because
 file system is mostly read-only, it'll need to byteswap all the time.
 And here comes 'zfs rewrite'!
 
 Why would this help? (Obviously file data is never 'swapped').
 
 Metadata (incl checksums?) still has to be byte-swapped.  Or would
 atime updates also force a metadata update?  Or am I totally mistaken.

You're all correct.  File data is never byte-swapped.  Most metadata
needs to be byte-swapped, but it's generally only 1-2% of your space.
So the overhead shouldn't be significant, even if you never rewrite.
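
To make the byte-swapping discussion concrete, here is a minimal Python sketch of the general "write native, swap on read if foreign" idea. It is not the ZFS on-disk format: the `MAGIC` value, the two-field record layout and the function names are all assumptions made for illustration. A reader detects a foreign byte order from the marker, swaps on every read, and stops needing to swap once the record has been rewritten in host order.

```python
import struct

MAGIC = 0x0123456789ABCDEF  # hypothetical marker, not a real ZFS constant
FMT = {"little": "<QQ", "big": ">QQ"}  # two unsigned 64-bit fields


def write_record(value, order):
    # Writers always use their own (native) byte order.
    return struct.pack(FMT[order], MAGIC, value)


def read_record(buf, host_order):
    # Returns (value, swapped): swapped is True if the record was foreign.
    magic, value = struct.unpack(FMT[host_order], buf)
    if magic == MAGIC:
        return value, False
    other = "big" if host_order == "little" else "little"
    magic, value = struct.unpack(FMT[other], buf)
    if magic != MAGIC:
        raise ValueError("unrecognised record")
    return value, True


def rewrite_native(buf, host_order):
    # Read (swapping if needed) and write back in the host's byte order.
    value, _ = read_record(buf, host_order)
    return write_record(value, host_order)


on_disk = write_record(35, "big")             # written on a big-endian host
first = read_record(on_disk, "little")        # read on a little-endian host
on_disk = rewrite_native(on_disk, "little")
second = read_record(on_disk, "little")
print(first, second)
```

The swap itself is a handful of instructions per field, which fits the view in this thread that the cost is small next to the I/O needed to fetch the metadata.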

An atime update will indeed cause a znode rewrite (unless you run
with zfs set atime=off), so znodes will get rewritten by reads.

The only other non-trivial metadata is the indirect blocks.
All files up to 128k are stored in a single block: ZFS has
variable blocksize from 512 bytes to 128k, so a 35k file consumes
exactly 35k (not, say, 40k as it would with a fixed 8k blocksize).
Single-block files have no indirect blocks, and hence no metadata
other than the znode.  So all that remains is the indirect blocks
for files larger than 128k -- which is to say, not very much.
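
The 35k example above works out as follows. This is a simplified Python sketch under stated assumptions: `allocated_fixed` and `allocated_variable` are hypothetical helpers, sizes round up to 512-byte units, files above 128k are modelled as whole 128k blocks, and compression and metadata are ignored.

```python
K = 1024


def allocated_fixed(size, blocksize):
    # Round the size up to a whole number of fixed-size blocks.
    blocks = -(-size // blocksize)  # ceiling division
    return blocks * blocksize


def allocated_variable(size, unit=512, max_block=128 * K):
    # Files up to max_block fit in one block sized to the data,
    # rounded up to the smallest unit; larger files use max_block blocks.
    if size <= max_block:
        return allocated_fixed(size, unit)
    return allocated_fixed(size, max_block)


def needs_indirect_blocks(size, max_block=128 * K):
    # Only multi-block files need indirect blocks.
    return size > max_block


fixed_8k = allocated_fixed(35 * K, 8 * K) // K
variable = allocated_variable(35 * K) // K
print(fixed_8k, variable, needs_indirect_blocks(35 * K))
```

With these assumptions the 35k file takes 40k on a fixed 8k block size and 35k with variable block sizes, and it needs no indirect blocks.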

Jeff