Erik Trimble <erik.trim...@oracle.com> wrote:

> rsync is indeed slower than star; so far as I can tell, this is due 
> almost exclusively to the fact that rsync needs to build an in-memory 
> table of all work being done *before* it starts to copy. After that, it 
> copies at about the same rate as star (my observations). I'd have to 
> look at the code, but rsync appears to internally buffer a significant 
> amount (due to its expected network use pattern), which helps for ZFS 
> copying. The one thing I'm not sure of is whether rsync uses a socket, 
> pipe, or semaphore method when doing same-host copying. I presume socket 
> (which would slightly slow it down vs star).

The reason why star is faster than any other copy method is that star is not 
implemented like the historical tar or cpio programs.

Since around 1990, star has forked into two processes unless you forbid this 
with an option. In the normal modes, one of them is the "archive process" that 
just reads or writes from/to the archive file or tape; the other is the "tar 
process" that understands the archive content and deals with the filesystem 
(the direction of the filesystem operation depends on whether star is in 
extract or create mode).

Between the two processes there is a large FIFO in shared memory that is used 
to pass the data. If the FIFO has enough free space, star reads each file into 
the FIFO in one single chunk; this is another reason for its speed.
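
For illustration, here is a minimal C sketch of that layout (this is not 
star's actual code; the FIFO bookkeeping and synchronization are omitted):

/* A shared anonymous mapping, created before fork(), serves as the
 * FIFO between the "tar process" and the "archive process". */
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define FIFO_SIZE (64L * 1024 * 1024)  /* star scales this to available RAM */

int
main(void)
{
	/* Shared by parent and child because it is mapped before fork(). */
	char *fifo = mmap(NULL, FIFO_SIZE, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (fifo == MAP_FAILED) {
		perror("mmap");
		return (1);
	}
	if (fork() == 0) {
		/* "Archive process": drains the FIFO to the archive/tape. */
		/* write(tape_fd, fifo + tail, chunk); ... */
		_exit(0);
	}
	/* "Tar process": walks the filesystem and fills the FIFO, reading
	   each file in one large chunk when there is enough free space.
	   Real code needs head/tail pointers and synchronization. */
	wait(NULL);
	munmap(fifo, FIFO_SIZE);
	return (0);
}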

Another advantage of star is that it reads every directory in one large chunk 
and thus allows the OS to optimize at that point. BTW: An OS that floods (and 
probably overflows) the stat/vnode cache in such a case may cause an unneeded 
slowdown.
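
A portable approximation of that directory strategy in C: collect all names 
first, then process them, instead of interleaving directory reads with file 
I/O (again just a sketch, not star's code):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
	DIR *d = opendir(argc > 1 ? argv[1] : ".");
	if (d == NULL) {
		perror("opendir");
		return (1);
	}
	/* First pass: slurp every name into memory in one go. */
	char **names = NULL;
	size_t n = 0;
	struct dirent *e;
	while ((e = readdir(d)) != NULL) {
		names = realloc(names, (n + 1) * sizeof (*names));
		names[n++] = strdup(e->d_name);
	}
	closedir(d);

	/* Second pass: now stat/open the entries without going back
	   to the directory between files. */
	for (size_t i = 0; i < n; i++) {
		puts(names[i]);
		free(names[i]);
	}
	free(names);
	return (0);
}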

In copy mode, star starts two archive processes with a FIFO between them.

The create process tries to keep the FIFO as full as possible, and since it 
makes sense to use a FIFO size of up to approximately half of the real system 
memory, this FIFO may be really huge; it can even keep modern tapes streaming 
for at least 30-60 seconds. Ufsdump only allows a small number of 126 kB 
buffers (I believe it is 6 buffers), so ufsdump | ufsrestore is tightly 
coupled, while star lets creation and extraction of the internal virtual 
archive run nearly independently of each other. This way, star does not need 
to wait every time extraction slows down; it just fills the FIFO instead.
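
For example, a copy through an enlarged FIFO might be invoked like this 
(fs= sets the FIFO size; check the star man page for the exact option 
spelling in your version):

	star -copy fs=256m -C /from-dir . /to-dir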

Before SEEK_HOLE/SEEK_DATA existed, the only place where ufsdump was faster 
than star was sparse files. This is why I talked with Jeff Bonwick in 
September 2004 to find a useful interface for userspace programs (in 
particular star) that do not read the filesystem at block level (like ufsdump) 
but cleanly in the documented POSIX way.
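
As an illustration of the interface, a small C sketch that enumerates the 
data regions of a sparse file so only those need to be archived (on Linux 
this needs _GNU_SOURCE; Solaris has the constants by default):

#define _GNU_SOURCE	/* for SEEK_HOLE/SEEK_DATA on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		return (1);
	}
	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return (1);
	}
	off_t end = lseek(fd, 0, SEEK_END);
	off_t data = 0;
	while (data < end && (data = lseek(fd, data, SEEK_DATA)) >= 0) {
		off_t hole = lseek(fd, data, SEEK_HOLE); /* end of this run */
		printf("data: %lld..%lld\n", (long long)data, (long long)hole);
		/* an archiver would read and store only [data, hole) here */
		data = hole;
	}
	close(fd);
	return (0);
}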

Since SEEK_HOLE/SEEK_DATA were introduced, there is not a single known case 
where star is not at least 30% faster than ufsdump. BTW: ufsdump is another 
implementation that first sits and collects all filenames before it starts to 
read file content.


> That said, rsync is really the only solution if you have a partial or 
> interrupted copy.  It's also really the best method to do verification.

Star offers another method to continue interrupted extracts or copies:

Star sets the time stamp of an incomplete file to 0 (1.1.1970 GMT). As star 
does not overwrite files that are not newer in the archive, star can skip the 
already-extracted files in extract mode and continue with the missing files or 
with the file(s) that have time stamp 0.
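
To sketch the idea in C (the ordering around writes and signals is 
simplified; this is not star's actual code, and the header times used at 
the end are hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
	const char *path = "extracted.file";	/* illustrative name */
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return (1);
	}

	/* Mark the file as incomplete: time stamp 0 (1.1.1970 GMT).
	   A later run sees this older-than-archive stamp and re-extracts. */
	struct timespec epoch[2] = { { 0, 0 }, { 0, 0 } };
	futimens(fd, epoch);

	/* ... write the file content here; note that writing updates
	   mtime again, so a real implementation must re-apply the 0
	   stamp if extraction is interrupted ... */

	/* Only on success: set the real times from the archive header,
	   e.g. (hdr_atime/hdr_mtime are hypothetical header values):
	   struct timespec done[2] = { hdr_atime, hdr_mtime };
	   futimens(fd, done); */

	close(fd);
	return (0);
}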

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de                (uni)  
       joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily