OT: A few weeks ago I read a really interesting paper[1] on CPISync, a novel synchronization scheme for PDAs which doesn't use timestamps (there's a bunch of papers and a thesis, see http://www.google.com/search?q=CPISync ). The presentation in the papers is unnecessarily complicated; the gist can be appreciated with high school maths:

Every piece of synchronizable data can be represented by an integer - the data bitstring itself in some cases, or a hash.

If Computer A holds data items (12, 35, 2001, 137)
and Computer B holds data items (12, 137, 148)

then we say Computer A has the /characteristic polynomial/ A(x)=(x-12)(x-35)(x-137)(x-2001).
Computer B has the characteristic polynomial B(x)=(x-12)(x-137)(x-148).

Delta(x) = A(x)/B(x) = (x-35)(x-2001)/(x-148)
Note that this expression contains just the items of data that would need to be exchanged for a synchronization (ignore the problem of whether they should be added or deleted on each side): the factors for items held on both sides, (x-12) and (x-137), cancel out.
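
To make the cancellation concrete, here is a small Python sketch that evaluates both characteristic polynomials at one point and compares the ratio with the reduced form of Delta(x). The helper name char_poly_at and the sample point x=5 are my own choices for illustration, not anything from the papers.

```python
from fractions import Fraction

def char_poly_at(items, x):
    """Evaluate the characteristic polynomial prod(x - item) at the point x."""
    result = 1
    for item in items:
        result *= x - item
    return result

a_items = {12, 35, 2001, 137}
b_items = {12, 137, 148}

# Any sample point works as long as it is not itself an item of B,
# otherwise B(x) would be zero and the ratio undefined.
x = 5
delta = Fraction(char_poly_at(a_items, x), char_poly_at(b_items, x))

# The same value computed only from the items that differ.
reduced = Fraction((x - 35) * (x - 2001), x - 148)
assert delta == reduced
```

Exact fractions are used here so the comparison is not disturbed by floating-point rounding.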

Now suppose A(x) is published for x=1,2,3...n for some small n (at least the number of items that differ, and avoiding any x that is itself a data item).
Then calculating B(x) for 1,2,3...n allows us to get the value of Delta(x) = A(x)/B(x) at each of these points. Methods exist for recovering the equation of Delta(x) from known values (glossing over the details here!). But knowing the equation of Delta(x) would tell you all the differences between A and B! So it becomes possible to synchronize multiple sets efficiently, without storing timestamps, as long as calculating a few values of the characteristic polynomial is cheap enough[2].
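
For the curious, the "recovering the equation" step can be sketched in Python as a small linear solve. This is only a toy under loud assumptions, not the method from the papers: it uses exact rational arithmetic instead of a finite field, it assumes we already know that A has exactly 2 extra items and B exactly 1 (the real scheme only needs a bound), it assumes the items are nonzero integers, and all function names are made up for the example.

```python
from fractions import Fraction

def char_poly_at(items, x):
    """Evaluate prod(x - item) at the point x."""
    result = 1
    for item in items:
        result *= x - item
    return result

def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_val = rows[col][col]
        rows[col] = [v / pivot_val for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

def recover_delta(points, values, deg_num, deg_den):
    """Find monic P, Q of the given degrees with P(x) = value * Q(x) at each point.

    Returns coefficient lists, lowest degree first.
    """
    matrix, rhs = [], []
    for x, f in zip(points, values):
        x = Fraction(x)
        row = [x ** j for j in range(deg_num)]          # unknown coefficients of P
        row += [-f * x ** k for k in range(deg_den)]    # unknown coefficients of Q
        matrix.append(row)
        rhs.append(f * x ** deg_den - x ** deg_num)     # known monic leading terms
    solution = solve_linear(matrix, rhs)
    return solution[:deg_num] + [Fraction(1)], solution[deg_num:] + [Fraction(1)]

def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial with nonzero constant term.

    Brute force over divisors of the constant term; fine for a toy only.
    """
    ints = [int(c) for c in coeffs]
    constant = abs(ints[0])
    roots = set()
    for d in range(1, constant + 1):
        if constant % d == 0:
            for candidate in (d, -d):
                if sum(c * candidate ** i for i, c in enumerate(ints)) == 0:
                    roots.add(candidate)
    return roots

a_items = {12, 35, 2001, 137}
b_items = {12, 137, 148}
points = [1, 2, 3]

published = [char_poly_at(a_items, x) for x in points]   # all that A reveals
values = [Fraction(a, char_poly_at(b_items, x))          # Delta at each point
          for a, x in zip(published, points)]

num, den = recover_delta(points, values, deg_num=2, deg_den=1)
only_in_a = integer_roots(num)   # {35, 2001}
only_in_b = integer_roots(den)   # {148}
```

Three published values are enough here because Delta(x) has three unknown coefficients (two in the numerator, one in the denominator), which is the sense in which n can stay small when the sets are nearly in sync.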

There may be uses for this in things like source code control systems, build tools, etc; but I just think it's a really neat trick :)

- Baz

[1] That I found this paper interesting shows just how sad and geeky I am...
[2] Their way of optimising this appears to be to use arithmetic over small finite fields. Heck, I didn't say it was /all/ high school maths.

Attila Szegedi wrote:
----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: January 3, 2003 12:41
Subject: Re: Reducing Maven build times



Do we think that timestamp files are an ugly thing?
My experience with people using them to speed up "make" builds in the past
was not favourable - generally unreliable...
What do you think?


Given a few preconditions that ensure time as measured by your computer
flows uniformly and monotonically enough for all practical purposes, they
are reliable:
- you don't copy files over a network between machines with unsynchronized
clocks
- you're keeping your machine's clock accurate (preferably syncing with an
atomic clock on a daily basis over the network), so it never needs to be
adjusted by more than a second or two.

If in doubt, you can always do a full rebuild. Doing it once a day before
starting work on a project is generally a healthy practice (you're
waiting for your morning coffee to brew anyway...), then proceed with
incremental builds that rely on timestamps. Always do a clean rebuild
when generating a distribution package.
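
For reference, the timestamp test that such incremental builds rely on boils down to something like the Python sketch below. It is a generic illustration of the "make"-style rule, not how Maven or any particular tool implements it, and the function name needs_rebuild is made up for the example.

```python
import os

def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A source stamped by a clock that runs behind can look older than the
    # target even though it was edited later; that is the failure mode the
    # preconditions above are meant to rule out.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A call such as needs_rebuild("out/app.jar", ["src/Main.java"]) (hypothetical paths) would then decide whether that one target is stale.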

Attila.


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>


