Re: Syncing Bookmark Tombsones

Mark Hammond Mon, 29 Jan 2018 00:55:02 -0800

Thanks for writing this up Thom (and thanks for replying already Richard- I briefly respond to you near the end of this mail)


First, for my own benefit, let me try and restate the issue we have here:

* Currently no devices persist tombstones - they exist only on theserver. If the server data is lost, any clients which have not syncedbetween an item being deleted and the server wipe will resurrect deleteditems - because no tombstone exists on any client nor on the server.

* While we talk more about bookmarks, this is true for all collectionswith tombstones. This is also true no matter how we might lose theserver data (of which there are many)

* For bookmarks in particular, bookmark restoring may introduce otherissues that will require thought.


And that's pretty-much the full extent of our resurrection issue, right?

The solution to this is, roughly, "have all clients remember alltombstones". The for sake of this discussion, we assume the cost oflocally persisting tombstones is reasonable.


The edge-case here is that:

* client-1 syncs.

* client-2 deletes a bookmark, creates a persistent local tombstone anduploads it to the server.


* server data is lost

* client-1 does the next initial sync and uploads the item (as it has notombstone)

* client-2 does the next sync - sees the item it knows for sure waspreviously deleted *somewhere* and must decide what to do.

The only reason to not honor the deletion (ie, resurrect it) is thepossibility that the bookmark might have been edited on one device butdeleted on the other, so deleting it would be data-loss.


The possible solutions to *this* are:

* Always resurrect the item. This is what our current reconcillationlogic will do as the last-modified timestamp on the server will be laterthan the time we recorded the deletion. IIUC, this is (1) below.


* Always delete the item. This is (2) below.

* Track more state so we can try and be confident we made the correctchoice between resurrection and deletion. This is (3)

* Give up in disgust, deciding that this is "edge-cases all the waydown", which is (4).

* Make each client aware of the fact that the server was wiped and thatsome other device has uploaded *its* view of the world and attempt tomake decisions in a more holistic manner. I think this is what Richardwas implying in part of his response.

Am I missing something here? Are there other edge-cases not related tobookmark restore that I've omitted?

If not, then I believe Thom is correct in that (2) below is the rightthing to do. Closely followed by (4). (1) is clearly wrong, (3) is notworth the cost, and (holistic) is for some distant/alternate future.


More comments inline:

On Thu, Jan 25, 2018 at 2:45 PM, Thom Chiovoloni1. Treat tombstones more-or-less like we treat bookmarks now, that
        is, keep them around locally, store remote ones locally when
        they come in, and resolve conflicts based on the modified
        timestamp on the item.

To make sure I'm not missing some subtlety, is this "more-or-less likewe treat bookmarks" or "exactly like we treat bookmarks"?

     2. Store tombstones from the server locally in places (as in #1),
        but always reconcile in favor of deleted items over new items.
        Once an item is deleted it's gone. (There are some nuances, e.g.
        If an incoming item has a local tombstone, we reupload that
        tombstone to the server (to delete it), etc.)
        This has issues around bookmark restoring, which aren't handled
        by my patch (probably it's biggest TODO).

Restoring is a problem, particularly in edge-cases such as the restoreis from a completely different set of bookmarks (where the normal caseis typically a restore from an older version of the same bookmarks).Further, it's not clear to me that we should honor tombstones that existon the server on restore - the user may be restoring precisely to getback bookmarks they accidentally deleted - so it may be that storingtombstones doesn't make much sense.

We certainly should do a better job on restore (and particularlyautomatic restore that happens before we are loaded, and better handletransient failures wiping the server), but otherwise I'm not sure weshould conflate the 2 issues too much (which isn't to say we shouldn'tconsider how they interact)

        It also seems dangerous to never allow items to be undeleted --
        once a guid is discarded, it's effectively gone forever (except
        for cases where all clients with the guid wipeClient).

In what way is it dangerous? Is there any risk other than the "userdeletes on device-1, edits on device-2, and gets upset we took the delete"?

     3. The "most correct" approach is probably to store the (best guess
        for) the server timestamp when the bookmark was deleted in the
        places database, which would then be used for reconciliation
        (Possibly using a scheme similar to dateAdded for ratcheting
        time backwards).


IIUC we already do that.

        We'd also need to add the 'real' modification date to each
        bookmark record, e.g. the server timestamp upon editing the
        record. This would allow us to continue working as we do now
        (sometimes undeleting bookmarks).

Server timestamps have their own set of problems too - all we know isthe time of the last or next sync - we clearly can't always know theserver timestamp for a local deletion or modification - which itselfleads to edge-cases.

A local timestamp would work (poorly) too - but yeah, this seems trickyand more complex than is worthwhile given it's an edge-case^2 (or are weup to ^3? :)

     4. I guess we could also do nothing. It's an edge case, there
        doesn't seem to be a perfect solution (after writing this up I
        still think #2 is probably the best). I also suspect that most
        people who have seen resurrected items, it was really bug
        1177516 <https://bugzil.la/1177516>, which this won't fix at all.

I'm skeptical that users who reported resurrected items were talkingonly about default bookmarks - these defaults will not be resurrected bya node reassignment and users reporting issues explicitly mentioned theydid indeed sync an "old" device.

IOW, I believe bug 1177516 is "a very limited set of bookmarks may get'resurrected' when a new device is added", whereas we are talking about"potentially unlimited set of bookmarks may get resurrected on nodereassignment" - so 2 quite different things.

        I don't think #4 is the right choice, but I'm including it since
        Mark mentioned it in our one-on-one. I think we should still do
        this, since the code complexity isn't *that* bad (well, for #3
        it's a good amount worse), and it does solve a known issue -- we
        also knew it wouldn't be trivial when we decided to do it
        initially, and I don't think much has changed around this since
        then.

Another thing I mentioned was the possibility of "solving" this via UX -eg, when we notice an old device is about to Sync, we could check whatthey want to do (there's a bug on file for this, but can't find it) -there will be devil in the detail there too though (and even then, ifthe user selects "merge them", we are stuck in the same position we arenow.)

(Admittedly, I'm biased though, I'd rather not have another OKR I'm assigned to 
where we end up doing nothing; as with the sync-repair system addon)

Yeah, that would suck, but I'm sure you agree that alone isn't a goodenough reason to do something here :)

But all in all, and as mentioned above, I'm leaning towards (2) too,simply because I feel that a user deleting on one device and editing onanother is (a) unlikely, (b) may do the wrong thing today even when wedon't lose server data, and (c) having these items be deleted when weguess wrong seems less surprising to the user than having thenresurrected when they didn't actually edit the item.


On 27/01/2018 7:46 am, Richard Newman wrote:

> A quick reply to this:
>
> - First, to get it out of the way, I don't think we need to be
> concerned about cost of keeping tombstones. However, the more
> we store — timestamps, states, etc. — the more we have to
> worry about this. Imagine
> that poor user with multi-duplicated bookmarks thanks to old bugs,
> creating thousands of tombstones as they clean up.

I agree in general, although I think the cost of a tombstone without anextra timestamp vs one with an extra timestamp is noise. We could alsouse telemetry to help us understand how many tombstones we have andexpire them if necessary - I'm really not too bothered by a devicecoming back after 2 years having somewhat strange things happen.

IOW, local tombstones *do* have a cost that we will need to manageforever, both in places and sync. This is another argument for (4) :)


> - Secondly, I think it's worth taking one further analytical step beyond
> timestamps. Your node reassignment concern in #1 is an ordering concern,
> and your approach of keeping and ratcheting server timestamps is a
> heuristic for specifying happened-before relationships that aren't
> currently modeled in Sync. Reconciling conflicts and handling empty
> servers is, in part, a question of deciding which changes to discard
> because they happened in a previous 'epoch'.

I don't understand what specifically you are getting at here? Is my"holistic" note above roughly what you mean?

TBH though, IMO the further analytical step should really be aroundbookmark restore - there will certainly be additional trade-offs here.


> - When considering whether it's worth doing this work: how often do we
> have users who: (a) have a device that's out of sync, (b) have
> tombstones on the server, and (c) are node reassigned? That is, how
> often do users' devices drift out of sync and undelete items because
> of  a failure to propagate deletions? And what else would this work
> solve? I'm in favor of systems that behave predictably, but I'm not
> usually in  favor of trying to achieve predictability by piling on
> layers of additional behavior onto a system that simply isn't
> expressive enough.

I agree with this in general, but it's tricky because node reassignmentsaren't a regular occurrence (let alone reassigning *every* user like wedid last year) - but they are expected.

IIU Thom's (2) correctly, there's not a whole lot of extra work or stateto carry around here - it really is just "reconcile always prefersdeletions" - that doesn't seem like it's adding much extra complexity,the cases in which it fails don't seem worth worrying about, so isarguably justifiable. Without a more compelling reason, I'm against (3)for exactly the reason you imply above - the bang isn't worth the buck.

I do agree that we should be careful to understand what tradeoffs we areproposing here and make sure we understand and accept when (2) would doworse than the status quo in the face of these strange events,particularly with bookmark restore.

But all of this has a cost - there's still space for someone to advocatefor accepting (4) - the status-quo and doing nothing :)


Thanks!

Mark
_______________________________________________
Sync-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/sync-dev

Re: Syncing Bookmark Tombsones

Reply via email to