On Wed, May 13, 2009 at 1:27 AM, Frederik Ramm <frede...@remote.org> wrote:
> Matt,
>
>> the least invasive way is to use the minutely diffs, as it doesn't
>> touch the API or DB servers at all.
>
> Sure, but they are (a) delayed by 5 minutes and (b) broken ;-)

we're working on both (a) and (b) at the moment... we'll fix it real
soon now, i promise :-)

> I was initially opposed to the concept of diffs. I remember a developer
> meeting in Essen in 2007 where I rather violently requested more frequent
> updates and NickB said something like "we could do daily or hourly diffs"
> and I said "I want the f*ing real thing, not canned diffs".

the trouble with "the f*ing real thing" is that, because it needs the
very latest information, it has to hit the database. imagine that
TF*RT is like WMS - every request has a slightly different
lat/lon/scale, so it's basically uncacheable unless you do some
clever things. granular diffs are like tiles - you only get discrete
chunks, but that makes caching *so* much easier. in fact, you can
treat the files on planet.osm.org as direct access to the cache - no
need to hit the DB, and no extra DB load on capacity that would be
better spent serving editors**. :-)
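
to make that concrete, here's a rough sketch (in python) of consuming
one of those "cache files" directly. the URL is a made-up example -
check the replication directory on planet.osm.org for the real layout
- but parsing the osmChange file really is about this simple:

    # rough sketch: pull one minute diff and count the edits in it.
    # the URL below is illustrative only; the real replication
    # directory layout on planet.osm.org may differ.
    import gzip
    import urllib.request
    import xml.etree.ElementTree as ET

    DIFF_URL = "https://planet.osm.org/replication/minute/000/000/001.osc.gz"

    with urllib.request.urlopen(DIFF_URL) as resp:
        osc = gzip.decompress(resp.read())

    root = ET.fromstring(osc)  # an <osmChange> document
    for action in ("create", "modify", "delete"):
        touched = sum(len(block) for block in root.findall(action))
        print(action, touched, "elements")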

> I must say that, especially with the convenience Osmosis brings in dealing
> with them, I have meanwhile changed my mind. The diffs are a very crude
> solution but they work remarkably well, and they are quite robust compared
> to some kind of replication feed that may go out of sync at any time.

exactly. because they're just files on disk, they're robust against
API downtime or bugs, quick to download, etc...
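
and the Osmosis route you mention is pleasantly short. a minimal
sketch, driven from python for the sake of the example - the task
names are from memory and may differ between Osmosis versions, and it
assumes a working directory already initialised with
--read-replication-interval-init:

    # sketch: ask osmosis to replay whatever diffs have appeared since
    # the last run into a single change file. assumes "osmosis" is on
    # PATH and "workdir" holds the configuration/state files created
    # by --read-replication-interval-init.
    import subprocess

    subprocess.check_call([
        "osmosis",
        "--read-replication-interval", "workingDirectory=workdir",
        "--simplify-change",
        "--write-xml-change", "file=pending.osc.gz",
    ])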

> I still think that there are use cases for almost-realtime feeds but the
> diffs work for most people. - I didn't know the original poster was unaware
> of the diffs; I assumed he must know the diffs and was looking for something
> better!

i think we can find a compromise. if we could get the diff generation
time down from about 5 minutes (and fix (b)!) to 1-2 minutes, would
that be good enough for almost-realtime?
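
just to sketch what the consumer side of "almost-realtime" could look
like - the state.txt name and the sequence-number path layout below
are assumptions about how the replication directory is organised, not
a spec:

    # sketch of an almost-realtime consumer: poll the replication
    # state and note any diffs that have appeared since the last look.
    # BASE, state.txt and the path layout are assumptions.
    import time
    import urllib.request

    BASE = "https://planet.osm.org/replication/minute"

    def current_sequence():
        state = urllib.request.urlopen(BASE + "/state.txt").read().decode()
        for line in state.splitlines():
            if line.startswith("sequenceNumber="):
                return int(line.split("=", 1)[1])
        raise RuntimeError("no sequenceNumber in state file")

    seen = current_sequence()
    while True:
        latest = current_sequence()
        while seen < latest:
            seen += 1
            path = "%03d/%03d/%03d.osc.gz" % (
                seen // 1000000, (seen // 1000) % 1000, seen % 1000)
            print("new diff:", BASE + "/" + path)
        time.sleep(60)  # at 1-2 minute generation, once a minute is plenty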

>> given that there are more efficient ways of doing the database
>> replication than aggregating these feeds from all the different API
>> servers into a coherent whole,
>
> As I said in another post, I was under the impression that while you can
> easily have any number of servers running API daemons on them, you'd rather
> not stuff too much into the database because at least for write requests
> we'll be stuck with it for a long while to come. But hey, maybe I
> underestimate the Postgres factor ;-)

but then a single something has to communicate with all the API
daemons, collate all the API activity, and ensure edits' atomicity,
consistency, isolation and durability... what kind of software might
have these ACID properties, i wonder? ;-)

>> unless, of course, you're talking about twittering the updates. that
>> would be teh moar ;-)
>
> For once, it would not be TomH who bans an IP range then ;-)

hey, the postgres guys were happy with OSM using postgres - why
wouldn't twitter be happy? they just rewrote their backend for better
scalability, so we'd be doing them a favour by testing it!

cheers,

matt

**: yeah, there's going to be some overhead in pulling the minute
diffs out, but that's done once and amortised over all the consumers
of the data.
