Synchronization puzzle.

Alex Tweedly Fri, 21 Jan 2005 15:52:31 -0800

[ slightly OT - this is a "design / algorithm" question rather than a Rev/xtalk one, really ] [and it's a long, rambling email - so feel free to skip it]

I am in the process of writing my own Address Book application. That may sound like an odd thing to do : aren't there already hundreds of programs to do that already ? isn't there an address book built into the OS and/or major apps ? can't you get everything from freeware to professional apps to do that ? what's wrong with the software that came with your Palm Pilot (or PDA, or Pocket PC, or ....) why re-invent the wheel ?

The reason that none of them quite satisfies my needs is simple - I want a shared address book, between myself, my wife and various other family members. There need to be up to 8 different computers running this application - all of them on a network, but never all on the same network. There are two separate home networks, plus maybe a server on a web-site somewhere so I can use it from anywhere with an Internet connection; the 3 laptops will intermittently appear on one or other network.

Any person can make updates to the address book, and I want a simple way to synchronize those changes on to the other instances of the app. What I've been doing until recently was using the Palm Desktop software that came with my Palm Pilot on each machine; periodically I would sync the Palm Desktop to my Palm Pilot, then re-connect the Pilot to another machine and sync there, then to another, then ..... which all worked OK, though it's a bit of a drag connecting the Pilot to the serial port of each machine in turn. And it just seems crazy when they're connected together by 100M Ethernet :-)

But now the two newest machines don't have a traditional serial port, only USBs. So I have the choice of spending Ł29.95 for a cable or tens (hundreds) of hours writing software - no competition :-)

I looked at using the Palm database, and simply (??) writing an app to sync databases - but the combination of incomplete documentation on the format (not to mention its complexity) and concern that I might conflict with Palm's synchronization technique has put me off that. I should mention that I never (or very, very rarely) actually use the Palm Pilot for anything other than this synch'ing effort. The Palm Pilot is almost never carried around; it has, for me, been squeezed out of its niche by my laptop one one side and my cell phone with its 100+ phone numbers and storage for many SMS messages on the other. (I sometime text myself an address or directions if I think I'm likely to need it and won't be taking the laptop - or occasionally even use pen+paper !!)

What I'm looking for, then, is a method to do "peer-to-peer" synchronization between an undefined number of instances of the address database. I've got a design, I've thought it through as much as I can, I've done "dry-run"s on paper to check it works, I've implemented it and tested it - but I'm still concerned that I've missed something, and that it will fail in some circumstances.

Here's the relevant section of my "design document" (i.e. transcription of the scribbling on the back of an envelope); I'd be very grateful for any comments, corrections, etc. On the other hand, if there is some standard technique or algorithm for doing this, and I just didn't know about it (and couldn't find it in a Google search), then a pointer to a good description would be also very useful.

---------------------------
Synchronization.

All synchs are user-triggered - i.e. a user on one machine will trigger a sync to another machine; the user is available to resolve anyh conflicts immediately) or she can choose to postpone that part of the synchronization until later. The two machines must be networked together at the time of a sync. That sync can use any of: - shared file systems (e.g. laptop to file server) - http request / post (e.g. laptop to web site) - TCP based connection (e.g. laptop to laptop while on same network).

(I don't export the filesystems on the laptops, nor do I run http server on them, so the plan is that each machine can run a address-book-sync server that will accept TCP connections on some odd port, and transfer data that way).

The fundamental sync algorithm is the same in each case; the description below is written as though the synching machine had complete access to the 'other' address files. The optimizations possible to avoid complete transfer of the data should be apparent - but in fact for my address book, the whole file is probably never going to be above 100K, so it's not clear if those optimizations are worth the implementaion effort.

Time synchronization.

It cannot be assumed that the clocks of the different computers are in sync, even approximately. It can (and will) be assumed that each computer's clock will return times that are continuously increasing.

(Yes, I do know about NTGP, and SNTP, and TSP, and various other time sync protocols - but I just know that I can't rely on people running them.)


Database format.

The Address Book consists of a number of records; each record has 3 "control" fields, and a number of data fields. The sync should use ONLY the 3 control fields, except for conflict resolution, and hence should be readily extended to cover Calendar, Tasks, and other similar PIM database synchronization.

The three control fields for each record are: ID : this is a 'globally' unique ID. It is allocated when the record is first created, and never changed thereafter. It comprises the machine name (these must be unique between all machines which are to share a database), and a numeric ID allocated by the machine which creates the record. The numeric portion of the ID will be increasing - but there should, if possible, be no dependency on the fact that a later-created record has a higher ID number than another record created earlier; they just need to be all different.

last-event: a machine/time stamp identifying when this record was last modified, and the machine on which this modification was done.

status: not currently defined, but put in here for future growth. Set to "OK" for now.

Record creation: when a new record is created, the new entry receives an ID consisting of the machine name of the machine in question plus the "next" numeric ID, a last-event of the (same) machine name plus current millisec timestamp, and status of OK.

Record modification: the ID and status are unchanged, the last-event is set to the machine in question plus current timestamp. The complete record (control and data fields) is added to the History database PRIOR to any changes.

Record deletion: the record is removed from the address database, and placed in the History database.

Note that the allocation of next record ID cannot simply check for the highest current record (because deleted records are removed): the highest numeric ID used must be tracked separately.

Synchronization set-up

When asked to do a sync, the program will fetch the the address and history databases from the other machine.

(Note : optimization possible - fetch only control fields !)

Synch main loop.

Sort each address database ('mine' and 'his') in ID order. Reset 'current' counters to start.

Compare current record IDs:

mine is before his: either mine has been added, or his has been deleted. check his history database: if it contains 'my' current entry, then it has been deleteed from his database: ACTION: delete from my database else: it must have been added to mine, so ACTION: add this recodrd to his database advance my current counter

 his is before mine:  (exact mirror of above)

same ID. compare 'last-events' : same : everything is good, advance both current counters. different - but the last-event machine is the same take whichever entry has the later edit time (note - from same machine, so they can be compared), and add to the other database advance both counters different - and different machines: there is no way to tell which is later - user needs to resolve differences, or postpone. then advance both counters.

User resolution of conflicts.

User should be presented with the set of common, and different fields, and for each field chose which of the alternates is to be chosen. BOTH prior versions are added to BOTH history databases (this may be overkill - but seems a good option) newly resolved entry gets last-event of the source machine plus its current timestamp, and is added to both address databases.

User postpones resolution. Both database retains its own version of the entry, and neither has its history updated. (i.e. all other entries are synched - just this one is left unchanged).

When both current counters have reached their end, all actions have been collected.

[optimization: if the entire database was not fetched ahead of time, then any required entries can be collected from the remote system (this would be those needing resolution plus those where 'his' entry needs to be added to 'my' database.]

Send to the remote system : either the complete updated databse and history for him, or a series of add/delete/change records arising from the actions collected above.

Done.

Note that even if the write to remote system fails, the updates can still be done to the local system without any ill effects (and vice versa, though that seems unlikely).

-------------------------------------------- I can't see any failing scenarios. The one part that I haven't worked out fully is archiving / cleaning-out the history database. Clearly, it can be pruned back such that only the latest event needs to be kept for each entry - so the max size of the history database is limited to the total size of all entries that have ever existed.

Since this is a 'tool' and not a 'product', that's probably good enough for me. I can wait until I *know* that every existing copy is in sync, and then manually clear out all the history files. Clearly that wouldn't be adequate for a real product - but it's good enough for me. ('tool' - something the author can use; 'product' - something that anyone can use, is easy to use, hard to misuse, fully documented, etc.) This is a tool for my own use (though it will naturally be available on revonline (or maybe a web site, if I ever organize one for my Rev stacks) for anyone who wants to play with it, look at it, modify it for their own purposes, etc.)

All comments welcome,
-- Alex.


--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.300 / Virus Database: 265.7.1 - Release Date: 19/01/2005

_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
http://lists.runrev.com/mailman/listinfo/use-revolution

Synchronization puzzle.

Reply via email to