Erik Trimble <erik.trim...@oracle.com> wrote:

> rsync is indeed slower than star; as far as I can tell, this is due
> almost exclusively to the fact that rsync needs to build an in-memory
> table of all work being done *before* it starts to copy. After that, it
> copies at about the same rate as star (my observations). I'd have to
> look at the code, but rsync appears to internally buffer a significant
> amount (due to its expected network use pattern), which helps for ZFS
> copying. The one thing I'm not sure of is whether rsync uses a socket,
> pipe, or semaphore method when doing same-host copying. I presume socket
> (which would slightly slow it down vs. star).
The reason why star is faster than any other copy method is that star is not implemented like historical tar or cpio implementations. Since around 1990, star has forked into two processes unless you forbid this with an option. In the normal modes, one of them is the "archive process" that just reads or writes from/to the archive file or tape; the other is the tar process that understands the archive content and deals with the filesystem (the direction of the filesystem operation depends on whether it is in extract or create mode). Between the two processes there is a large FIFO of shared memory that is used to share the data. If the FIFO has much free space, star reads each file into the FIFO in one single chunk; this is another reason for its speed.

Another advantage of star is that it reads every directory in one large chunk and thus allows the OS to optimize at this point. BTW: an OS that floods (and probably overflows) the stat/vnode cache in such a case may cause an unneeded slowdown.

In copy mode, star starts two archive processes with a FIFO between them. The create process tries to keep the FIFO as full as possible, and as it makes sense to use a FIFO size of up to approx. half of the real system memory, this FIFO may be really huge, so it is even able to keep modern tapes streaming for at least 30-60 seconds. Ufsdump only allows a small number of 126 kB buffers (I believe it is 6 buffers), and thus ufsdump | ufsrestore is tightly coupled, while star allows creation and extraction of the internal virtual archive to run nearly independently of each other. This way, star does not need to wait every time extraction slows down; it just fills the FIFO instead. Before SEEK_HOLE/SEEK_DATA existed, the only place where ufsdump was faster than star was sparse files.
This is why I talked with Jeff Bonwick in September 2004 to find a useful interface for user-space programs (in particular star) that do not read the filesystem at block level (like ufsdump) but cleanly in the documented POSIX way. Since SEEK_HOLE/SEEK_DATA were introduced, there is no known case where star is not at least 30% faster than ufsdump. BTW: ufsdump is another implementation that first sits and collects all filenames before it starts to read file content.

> That said, rsync is really the only solution if you have a partial or
> interrupted copy. It's also really the best method to do verification.

Star offers another method to continue interrupted extracts or copies: star sets the time stamp of an incomplete file to 0 (1.1.1970 GMT). As star does not overwrite files unless the copy in the archive is newer, star can skip the already-extracted files in extract mode and continue with the missing files or with the file(s) that have the time stamp 0.

Jörg

-- 
EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss