On Fri, Oct 19, 2012 at 11:23 AM, Arne Jansen <sensi...@gmx.net> wrote:
> On 19.10.2012 11:16, Irek Szczesniak wrote:
>> On Wed, Oct 17, 2012 at 2:29 PM, Arne Jansen <sensi...@gmx.net> wrote:
>>> We have finished a beta version of the feature. A webrev for it
>>> can be found here:
>>>
>>> http://cr.illumos.org/~webrev/sensille/fits-send/
>>>
>>> It adds a command 'zfs fits-send'. The resulting streams can
>>> currently only be received on btrfs, but more receivers will
>>> follow.
>>> It would be great if anyone interested could give it some testing
>>> and/or review. If there are no objections, I'll send a formal
>>> webrev soon.
>>
>> Why are you trying to reinvent the wheel? AFAIK some tar versions and
>> AT&T AST pax support deltas based on a standard (I'll have to dig out
>> the exact specification, but from looking at it you did double work).
>>
>
> I haven't done the research myself, but the conclusion was that pax would
> have needed significant extension; I don't have the details, though. If
> you dig out a format already in use that supports everything we need
> (like sharing data between files, needed for btrfs reflinks), it should
> be easy to change the format. Stuffing the data into a specific format
> is not an essential part of the work and can be changed with a limited
> amount of work.
>
> -Arne
>
>> Irek
>

tar/pax was the format initially chosen for btrfs send/receive, as it
looked like the best and most compatible option. In the middle of
development, however, I realized that we needed more than storing whole
and incremental files/dirs in the format. We needed to store
information about moved, renamed, deleted and reflinked files, and even
partial clones where only some parts of a file are shared with another.
All of this can certainly be implemented in pax, but the next problem
is that in some situations renamed/moved files need multiple entries to
reach the desired result. For example, file a may be renamed to b
while at the same time file b is renamed to a. In such cases we need
3 entries that use a temporary name so that we don't lose one of the
files while receiving. There are much more complex cases that get
quite complicated.
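
To make the a/b swap concrete: whoever replays the renames has to order
them so that no name is overwritten before its file has moved away, and
has to break cycles with a temporary name. Below is a minimal Python
sketch, assuming the moves form simple chains and cycles with unique
destinations. plan_renames, apply_renames and the "o_tmp" prefix are
hypothetical names for illustration, not code from btrfs or the webrev.

```python
from typing import Dict, List, Tuple

Rename = Tuple[str, str]  # (source name, destination name)


def plan_renames(moves: Dict[str, str], tmp_prefix: str = "o_tmp") -> List[Rename]:
    """Order simultaneous renames so that no file is overwritten.

    Assumes every destination is unique and is either free or itself the
    source of another move (chains and cycles only), and that the
    temporary names (hypothetical prefix) never clash with real names.
    """
    pending = {src: dst for src, dst in moves.items() if src != dst}
    ops: List[Rename] = []
    counter = 0
    while pending:
        # A move is safe once no other pending file still occupies its destination.
        safe = [src for src, dst in pending.items() if dst not in pending]
        if safe:
            for src in safe:
                ops.append((src, pending.pop(src)))
            continue
        # Every remaining destination is still occupied, so what is left
        # are cycles: park one file under a temporary name to break one.
        src = next(iter(pending))
        tmp = f"{tmp_prefix}{counter}"
        counter += 1
        ops.append((src, tmp))
        pending[tmp] = pending.pop(src)
    return ops


def apply_renames(fs: Dict[str, object], ops: List[Rename]) -> Dict[str, object]:
    """Replay rename ops on a dict-based 'filesystem', refusing to overwrite."""
    fs = dict(fs)
    for src, dst in ops:
        if dst in fs:
            raise FileExistsError(dst)
        fs[dst] = fs.pop(src)
    return fs


if __name__ == "__main__":
    ops = plan_renames({"a": "b", "b": "a"})
    print(ops)  # [('a', 'o_tmp0'), ('b', 'a'), ('o_tmp0', 'b')]
    print(apply_renames({"a": "data of a", "b": "data of b"}, ops))
    # {'a': 'data of b', 'b': 'data of a'}
```

The swap comes out as exactly the 3 entries described above; longer
cycles each cost one extra temporary rename.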

Also, it needed support for metadata changes (mode, size, uid/gid, ...)
on already existing files/dirs. Reusing the existing tar/pax entry
types was not possible for this, as standard tar would overwrite the
original files with empty files.
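
A toy in-memory model shows the difference between the two approaches.
Inode, extract_tar_style and apply_chmod are hypothetical illustrations,
not real tar or btrfs code; "tar-style" here only means that an entry
describes the whole file, so extracting it replaces what was there.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Inode:
    data: bytes
    mode: int
    uid: int
    gid: int


def extract_tar_style(fs: Dict[str, Inode], name: str, mode: int, uid: int,
                      gid: int, data: bytes = b"") -> None:
    # A tar entry describes a complete file: extracting it replaces whatever
    # was at `name` with exactly the entry's contents (empty if it has none).
    fs[name] = Inode(data=data, mode=mode, uid=uid, gid=gid)


def apply_chmod(fs: Dict[str, Inode], name: str, mode: int) -> None:
    # A dedicated instruction changes only the attribute it names.
    fs[name] = replace(fs[name], mode=mode)


if __name__ == "__main__":
    via_tar = {"f": Inode(b"payload", 0o644, 1000, 1000)}
    extract_tar_style(via_tar, "f", 0o600, 1000, 1000)
    print(via_tar["f"].data)  # b'' (contents lost)

    via_cmd = {"f": Inode(b"payload", 0o644, 1000, 1000)}
    apply_chmod(via_cmd, "f", 0o600)
    print(via_cmd["f"].data, oct(via_cmd["f"].mode))  # b'payload' 0o600
```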

I had all that implemented with pax, using a lot of custom pax
entries. A lot...so many that it didn't look like tar/pax anymore. It
actually mutated from a list of file/dir/link entries (which tar/pax
is meant to be) to a list of filesystem instructions (rename, link,
unlink, rmdir, write parts of a file, clone parts of a file, chmod,
...).
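
The shape of such an instruction list can be sketched as a stream of
commands, each carrying typed type-length-value attributes. The command
ids, attribute ids and byte layout below are hypothetical and chosen for
illustration only; this is not the actual btrfs send stream format, and
it has no stream header or checksum.

```python
import struct
from enum import IntEnum
from typing import Dict, List, Tuple


class Cmd(IntEnum):  # hypothetical command ids
    RENAME = 1
    LINK = 2
    UNLINK = 3
    RMDIR = 4
    WRITE = 5
    CLONE = 6
    CHMOD = 7


class Attr(IntEnum):  # hypothetical attribute ids
    PATH = 1
    PATH_TO = 2
    OFFSET = 3
    DATA = 4
    MODE = 5


def encode(cmd: Cmd, attrs: Dict[Attr, bytes]) -> bytes:
    # Each attribute: little-endian u16 type, u16 length, then the payload
    # (so a single attribute is limited to 65535 bytes in this sketch).
    body = b"".join(struct.pack("<HH", a, len(v)) + v for a, v in attrs.items())
    # Each command: u32 body length and u16 command id, then the attributes.
    return struct.pack("<IH", len(body), cmd) + body


def decode(stream: bytes) -> List[Tuple[Cmd, Dict[Attr, bytes]]]:
    out = []
    pos = 0
    while pos < len(stream):
        length, cmd = struct.unpack_from("<IH", stream, pos)
        pos += 6
        end = pos + length
        attrs = {}
        while pos < end:
            a, n = struct.unpack_from("<HH", stream, pos)
            pos += 4
            attrs[Attr(a)] = stream[pos:pos + n]
            pos += n
        out.append((Cmd(cmd), attrs))
    return out


if __name__ == "__main__":
    stream = encode(Cmd.RENAME, {Attr.PATH: b"a", Attr.PATH_TO: b"o_tmp0"})
    stream += encode(Cmd.CHMOD, {Attr.PATH: b"b", Attr.MODE: struct.pack("<I", 0o600)})
    for cmd, attrs in decode(stream):
        print(cmd.name, attrs)
```

Unlike a tar member, nothing here needs a file name unless the command
itself is about a path, and a reader that does not know a command id
fails instead of guessing.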

My thought was that this was already a big misuse of tar/pax, so I
decided to implement a simple format for this purpose only; using pax
no longer gave any advantage. In tar/pax every entry must have a file
name, and even the pax header entries need one. The problem is that
plain tar will treat every unknown entry type as a regular file and
blindly overwrite existing ones, which may result in data loss. To
prevent this, I always added something to the file name so that
unpacking with tar would not hurt the user. The unavoidable side
effect, however, is that the result of a plain untar is unusable
without further interpretation, which would be hard because tar by
default does not dump pax headers but instead ignores unknown entries.
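
The naming problem can be reproduced with Python's standard tarfile
module: a custom instruction stored as pax keywords still has to be
attached to a member with a name, and a generic reader just sees an
empty regular file, with the instruction sitting in pax keywords it does
not act on. The EXAMPLE.* keywords and the mangled member name are
hypothetical, made up for this sketch.

```python
import io
import tarfile


def build_archive() -> bytes:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # Hypothetical "rename a -> b" instruction smuggled into pax keywords.
        # The member name is mangled so a plain untar cannot clobber "a".
        info = tarfile.TarInfo(name="a.rename-instruction")
        info.size = 0
        info.pax_headers = {"EXAMPLE.op": "rename", "EXAMPLE.to": "b"}
        tar.addfile(info)  # size 0, so no file object is needed
    return buf.getvalue()


if __name__ == "__main__":
    with tarfile.open(fileobj=io.BytesIO(build_archive())) as tar:
        for member in tar.getmembers():
            # To a generic reader this is just an empty regular file.
            print(member.name, member.isreg(), member.size)
            print(member.pax_headers)
```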

Also, using tar/pax as the format for send/receive may give users the
wrong impression that they can later use good old standard tar to
restore their backups... and that could be fatal for their data.

Alex.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
