One of the decisions that Weave made was to use record expiry (and pruning scripts) on the server.
Records uploaded by a client can have a TTL set. History and forms expire after 60 days, tabs after 7 days, and clients after 21 days. There is a large default that applies to other records, but it's so large that we can ignore it. Once a record expires, it won't be returned from queries, and eventually a pruning script will delete it from the database. Uploading a new version of a record will reset the TTL and refresh the object.

The purpose of a TTL, as I understand it, was threefold:

* Clients can disappear without warning; because there is no strong concept of a connected device, Sync relies on periodic refreshes to simulate disconnection and cleanup.
* Sync clients don't all have the same concept of expiration, and furthermore they don't propagate bulk-clear events. Without a TTL, records -- even ones that all clients have forgotten about -- would live on the server forever, even when a client wiped its history or encountered automatic expiration.
* Clients produce a lot of history. A TTL helps to reduce overall space usage and query response sizes. And it's safe to do so, given the assumption that clients were canonical, and thus wiping really old stuff off the "whiteboard" was fine.

It has downsides:

* The pruning scripts are expensive, and we have to disable them during periods of high load.
* It results in extra writes: each client rewrites its own record once a week to ensure that it doesn't expire.
* It doesn't recover enough space to be massively important. To quote telliott, "is it a gigantic win? no".

And to summarize:

  09:47:56 < telliott> I'd assert that pruning never really lived up to its promise
  09:48:04 < telliott> (or ttls, for that matter)

I don't think the TTL approach works for Sync.next, for several reasons:

* We plan to have a strong concept of attached clients, and device management outside of storage. The storage server shouldn't be making the time-based decision that a client has disappeared, and it's questionable whether there's value in doing so.
* We're aiming for consistent storage, which is mostly incompatible with some old records just disappearing without client action!
* We are moving in the direction of durable, if not entirely canonical, server storage. This somewhat implies shared state, rather than Sync's non-propagating model -- "profile in the cloud", not "whiteboard". The decision was already made for Sync 2.0 to propagate wipes: Bug 578694. That means that a *client* should decide when data should go away, and existing clients should have the same view of the world as a new client learning all it knows from the server.
* TTLs are broadly incompatible with extended offline usage or simple recovery scenarios. (There are tradeoffs here: what if you just stop using an old phone? Do we keep data around, and stick to old data formats, to make it possible for that phone to sync more easily? I think my point is that we shouldn't be routinely deciding that a client has gone AWOL.)
* Much of the win for TTLs is to improve query speed when iterating over whole collections. I think all of our proposed storage mechanisms will provide query mechanisms that avoid whole-collection iterations.

Thoughts, folks?

-R

_______________________________________________
Sync-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/sync-dev

