Shachar-
I think Wayne is mostly
pointing you to the correct location here.
If you look at the code where the checksum
is computed, you'll find that the rolling
checksum actually consists of two parts -
one is the sum of all of the byte values
in the window, the other is the offset
Kevin Day wrote:
Shachar-
I think Wayne is mostly pointing you to the correct location here. If
you look at the code where the checksum is computed, you'll find that
the rolling checksum actually consists of two parts - one is the sum
of all of the byte values in the window, the other is the
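The two-part rolling checksum described above can be sketched as follows. This is a model of the thread's description, not the actual code from checksum.c: s1 is the plain sum of the bytes in the window, s2 weights each byte by its distance from the end of the window, and both can be updated in O(1) as the window slides.

```python
def weak_sums(block):
    """Two running sums in the style of rsync's rolling checksum
    (a sketch modeled on the thread's description, not checksum.c)."""
    s1 = s2 = 0
    n = len(block)
    for i, b in enumerate(block):
        s1 += b
        s2 += (n - i) * b  # offset-weighted part
    return s1, s2

def roll(s1, s2, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1):
    drop out_byte on the left, take in_byte on the right."""
    s1 = s1 - out_byte + in_byte
    s2 = s2 - n * out_byte + s1
    return s1, s2
```

The O(1) update is the whole point: it lets the receiver's checksum be evaluated at every byte offset of the sender's file without rescanning the window.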
Lars Karlslund wrote:
And I'm suggesting making it static, by adjusting the hash table's size
according to the number of blocks. Just do
hashtablesize=(numblocks/8+1)*10;, and you should be set.
Or maybe it should really be dynamic.
I'm talking about the hash table load. I.e. - the ratio
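Lars's one-line sizing rule, restated as a sketch (integer division assumed, as in the C original), works out to roughly 1.25 slots per block, i.e. a load factor near 0.8:

```python
def hash_table_size(num_blocks):
    # Lars's proposed sizing rule, quoted from the thread:
    # (numblocks/8+1)*10 gives about 1.25 slots per block,
    # i.e. a hash table load factor of roughly 0.8.
    return (num_blocks // 8 + 1) * 10
```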
Kevin Day wrote:
As a quick FYI, the block size absolutely has an impact on the
effectiveness of the checksum table - larger blocks means fewer
blocks, which means fewer hash collisions.
Since you wouldn't expect that many blocks in the file, a 32bit weak
checksum would only produce about 4 or
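Kevin's collision estimate can be checked with a birthday-bound back-of-envelope calculation. This assumes the checksum behaves like a uniform random function, which is a modeling assumption, not a property rsync guarantees:

```python
def expected_collision_pairs(num_blocks, bits):
    # Birthday approximation: expected number of pairs of blocks
    # whose checksums collide in a 2**bits space, assuming the
    # checksum is uniformly distributed (a rough model only).
    return num_blocks * (num_blocks - 1) / (2.0 * 2 ** bits)
```

For 65536 blocks, a full 32-bit weak checksum yields well under one expected colliding pair, while squashing to 16 bits yields tens of thousands, which is why the table lookup must be followed by a full checksum comparison.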
On Sat, Mar 05, 2005 at 02:18:01PM +0200, Shachar Shemesh wrote:
That's not what I read into it. It seems to me that the checksum
function gives a 32bit result, and we are squashing that into a 16bit
hash table. Can you point me to the code? Wayne?
Look in match.c. The hash table is
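The squashing from 32 bits to a 16-bit table index can be sketched like this. The exact fold (adding the two 16-bit halves) is modeled on the gettag macro in older rsync sources; treat it as an assumption and check match.c for the real form:

```python
def gettag(weak):
    # Fold a 32-bit weak checksum into a 16-bit table index by
    # adding its two 16-bit halves (modeled on older rsync
    # sources; an assumption, not a quote from match.c).
    s1 = weak & 0xFFFF
    s2 = (weak >> 16) & 0xFFFF
    return (s1 + s2) & 0xFFFF
```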
On Thu, Mar 03, 2005 at 10:18:01AM +0200, Shachar Shemesh wrote:
And I'm suggesting making it static, by adjusting the hash table's
size according to the number of blocks.
The block-size? No thanks. The smaller the block-size for a particular
file size, the more checksum data needs to be
[I accidentally sent this reply only to Lars instead of the list.]
On Fri, Mar 04, 2005 at 06:44:48PM +0100, Lars Karlslund wrote:
I read somewhere about the default blocksize of 700 bytes. Now I'm
told the blocksize is calculated automatically. Which is it?
700 is the minimum block-size. The
Wayne Davison wrote:
On Thu, Mar 03, 2005 at 10:18:01AM +0200, Shachar Shemesh wrote:
And I'm suggesting making it static, by adjusting the hash table's
size according to the number of blocks.
The block-size?
Definitely not! I was talking about the hash table load. I.e. - the
ratio
Wayne Davison wrote:
He's talking about potentially losing an even distribution of values
if the lowest order bits aren't random enough.
On Sat, Mar 05, 2005 at 08:10:37PM +0200, Shachar Shemesh wrote:
Even for N which is not a power of 2?
In the case of the weak checksum, it is computed
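Shachar's question about non-power-of-two N can be made concrete. Reducing a uniformly distributed value modulo N is almost perfectly even, and unlike masking with a power of two it mixes in the high bits; the only imbalance is the off-by-one counted here:

```python
def bucket_counts(value_bits, n_buckets):
    # Exact histogram of `value % n_buckets` over all 2**value_bits
    # values: when n_buckets does not divide the range evenly, the
    # low buckets each receive exactly one extra value.
    total = 2 ** value_bits
    base, extra = divmod(total, n_buckets)
    return [base + (1 if i < extra else 0) for i in range(n_buckets)]
```

The residual bias matters only if the input values themselves are skewed, which is Wayne's point about the randomness of the low-order bits.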
Shachar-
True enough - with one
additional thought - if the block size is
set to be the square root of the file size,
then the load factor on the hash table becomes
dynamic in and of itself (bigger block size
= less master table entries = fewer hash
collisions).
In the case of
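Kevin's observation turns into simple arithmetic. With the block size near the square root of the file size, the block count is also near the square root, so the load on a fixed-size tag table grows only with the square root of the file size. The 700-byte floor and the 65536-slot table are assumptions taken from elsewhere in the thread:

```python
import math

def table_load(file_size, table_slots=65536):
    # block_size ~ sqrt(file_size) implies block count ~ sqrt(file_size),
    # so a fixed-size tag table stays lightly loaded until files get
    # very large (sketch; 700-byte floor and 65536 slots assumed).
    block_size = max(700, round(math.sqrt(file_size)))
    return math.ceil(file_size / block_size) / table_slots
```

A 1 GiB file, for example, gets 32 KiB blocks and about 32768 of them, for a load factor of only 0.5 on a 65536-slot table.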
Hi everyone,
Thank you for your replies.
On Thu, 2005-02-24 at 17:59 +0100, Paul Slootman wrote:
It would certainly be possible to change the algorithm to not cache the
data (and thus only allow the current block to be compared), but I don't
think that idea has general enough interest
Lars Karlslund wrote:
Also as far as I could read, the default block size is 700 bytes? What
kind of application would default to moving data around 700 bytes at a
time internally in a file? I'm not criticizing rsync, merely
questioning the functionality of this feature.
I believe you may have
On Mon, 2005-02-28 at 12:23 +0200, Shachar Shemesh wrote:
Also as far as I could read, the default block size is 700 bytes? What
kind of application would default to moving data around 700 bytes at a
time internally in a file? I'm not criticizing rsync, merely
questioning the
Lars Karlslund wrote:
Maybe I didn't express myself thoroughly enough :-)
Or me.
Yes, a block is a minimum storage unit, which is considered for transfer.
In size, yes. Not in position.
But it's a fact that the rsync algorithm as it is now checks to see if
a block should have moved. And in that
Shachar Shemesh wrote:
No, because the rsync algorithm can detect single byte moves of this
700 bytes block.
I will just mention that I opened the ultimate documentation for rsync
(the source), and it says that the default block size is the rounded
square root of the file's size. This means
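The rule as described in the thread, with the 700-byte minimum mentioned earlier, can be sketched like this. It is a sketch of the rule, not the exact generate.c code, which also rounds and caps the result:

```python
import math

def default_block_size(file_size, minimum=700):
    # Default block size per the thread: the rounded square root
    # of the file's size, floored at the 700-byte minimum (sketch;
    # the real code rounds and caps differently).
    return max(minimum, round(math.sqrt(file_size)))
```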
Wayne Davison wrote:
However, you should be sure to have measured what is causing the
slowdown first to know how much that will help. If it is not memory
that is swapping on the sender, it may be that the computing of the
checksums is maxing out your CPU, and removing the caching of the
remote
Shachar,
It does use a hash table.
rsync adds the two components of the
rolling checksum together to come up with
a 16 bit hash, and performs a table lookup.
The table contains an offset into the
sorted rsync checksum table (which contains
both weak and strong checksums), and does
a
On Mon, Feb 28, 2005 at 08:33:52PM +0200, Shachar Shemesh wrote:
If so, we can probably make it much much (much much much) more
efficient by using a hash table instead.
That's what tag_table is -- it's an array of 65536 pointers into
the checksum data sorted by weak checksum. The code then
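Wayne's description of tag_table can be sketched in Python rather than C. The 16-bit fold (adding the halves of the weak checksum) is an assumption carried over from earlier in the thread; the structure itself (a 65536-entry array of offsets into a tag-sorted list of blocks) follows his description:

```python
def tag(weak):
    # 16-bit tag of a 32-bit weak checksum (sum of its halves;
    # the exact fold is an assumption, see match.c for the real one).
    return ((weak & 0xFFFF) + ((weak >> 16) & 0xFFFF)) & 0xFFFF

def build_tag_table(weaks, slots=65536):
    # Sort block indexes by tag; table[t] holds the position in
    # that sorted order of the first block with tag t, or -1.
    order = sorted(range(len(weaks)), key=lambda i: tag(weaks[i]))
    table = [-1] * slots
    for pos in range(len(order) - 1, -1, -1):
        table[tag(weaks[order[pos]])] = pos
    return order, table

def candidate_blocks(weaks, order, table, weak):
    # Blocks sharing the incoming checksum's tag; each still needs
    # the full weak and then strong comparison before a match.
    t = tag(weak)
    pos = table[t]
    hits = []
    while pos != -1 and pos < len(order) and tag(weaks[order[pos]]) == t:
        hits.append(order[pos])
        pos += 1
    return hits
```

The lookup is O(1) to find the run of candidates, which is what makes testing the rolling checksum at every byte offset affordable.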
Lars Karlslund writes:
Also the numbers speak for themselves, as the --whole-file option is
*way* faster than the block-copy method on our setup.
At the risk of jumping into the middle of this thread without
remembering everything that was discussed...
Remember that by default rsync writes a
Kevin Day wrote:
I would *strongly* recommend that you dig into the thesis a bit (just
the section that describes the rsync algorithm itself).
I tried a few weeks ago. I started to print it, and my printer ran out
of ink :-). I will read it electronically eventually (I hope).
Now, if you have
On Thu 24 Feb 2005, Wayne Davison wrote:
It would certainly be possible to change the algorithm to not cache the
data (and thus only allow the current block to be compared), but I don't
think that idea has general enough interest for me to work on for
inclusion in rsync. You might want to
21 matches