Re: Triple Win?

Richard Newman Wed, 24 Jul 2013 13:54:49 -0700

I think we're very much in the realm of intuition and open questions, so here's 
mine :D


To open, I appreciate your desire to hit the sweet spot of building little code 
and getting big wins. As you know, I have a little more cynicism here; I'm 
hoping the difference in our perspectives is just due to me having a different 
context and some different experience with problems in this domain. I'm doing 
my best to communicate those risks.

I might summarize as "I think you're following exactly the same path as Weave, 
which turned out to be less rigorous than reality requires". That's not 
pejorative; it's a natural way to approach the set of goals that both Weave and 
PiCL share.


> In preparation of our design review, I spent a good part of yesterday trying 
> to digest the pros and cons of Dropbox API, CouchDB API, and TreeSync (which 
> I'm frankly still a bit fuzzy about). 
> 
> In short, can we get a Triple Win, where we can benefit from the pros of all 
> three of these options?
> 
> Observation 1: The CouchDB API or the Dropbox File API is likely good enough 
> for syncing independent (or even mostly independent) records (e.g., history, 
> passwords, tabs). Future stuff like calendar events, contacts, reading list, 
> etc. might fall in this category too.

Mostly concur. Areas in which I can imagine issues:

• Detection and reconciliation of conflicts (not only between records, but also 
in the order of operations like "clear history"). In particular, I haven't 
heard a good story for field-level reconciliation.
• Control over which records are downloaded and uploaded, and when.
• Implicit structure and consistency.
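
To make "field-level reconciling" concrete, here is a minimal Python sketch of 
a three-way merge over flat records. It is an illustration only, not something 
Couch, Dropbox, or Sync 1.1 provides: the function name, the record fields, and 
the "remote wins" fallback are all assumptions, and it presumes the client has 
kept the last-synced copy of the record as a common ancestor.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def merge_fields(base: Record, local: Record, remote: Record) -> Tuple[Record, List[str]]:
    """Three-way merge of flat records, one field at a time.

    A missing field and a field set to None are treated the same (assumption).
    Returns the merged record plus the names of fields changed on both sides.
    """
    merged: Record = {}
    conflicts: List[str] = []
    for field in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(field), local.get(field), remote.get(field)
        if l == r:
            value = l            # both sides agree, or neither changed
        elif l == b:
            value = r            # only the remote side changed this field
        elif r == b:
            value = l            # only the local side changed this field
        else:
            conflicts.append(field)
            value = r            # placeholder policy (assumption): remote wins
        if value is not None:
            merged[field] = value
    return merged, conflicts


# Hypothetical bookmark record edited on two clients since the last sync.
base = {"title": "Mozilla", "uri": "http://mozilla.org/", "tags": "work"}
local = {"title": "Mozilla Home", "uri": "http://mozilla.org/", "tags": "work"}
remote = {"title": "Mozilla", "uri": "https://mozilla.org/", "tags": "work"}

merged, conflicts = merge_fields(base, local, remote)
print(merged)
print(conflicts)
```

Here the title change and the URI change touch different fields, so both 
survive and nothing is reported as a conflict. A whole-record "last write wins" 
policy would silently drop one of them.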

Consider that data types like form history, tabs, and passwords almost 
certainly need to be consistent — your user experience will be damaged, along 
with Firefox's reputation, if half of your tabs, half of your form field 
changes, or half of a password pair makes it to another client. "Where's the 
other half of my desktop tabs? Firefox is so unreliable."

We can probably manage without immediately-consistent history, so long as 
"eventually consistent" doesn't take too long.

If we can get to a point where an off-the-shelf Couch client lets us control 
which records to retrieve and when, allows us to detect conflicts before 
committing a write, offers consistent bulk writes with boundaries, and is 
compatible with our crypto story, then great; it could be a useful tool. But 
see below for another viewpoint.
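
As a sketch of what "detect conflicts before committing a write" and 
"consistent bulk writes with boundaries" could mean in practice, the following 
in-memory Python store checks every expected revision before applying any 
write in a batch. The class and method names are hypothetical, and this is not 
the API of any real Couch client; it only illustrates the all-or-nothing 
behaviour being asked for.

```python
from typing import Dict, Optional, Tuple


class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class RevisionedStore:
    """Toy store: each document carries an integer revision (0 means absent)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}

    def get(self, doc_id: str) -> Optional[Tuple[int, dict]]:
        return self._docs.get(doc_id)

    def bulk_write(self, writes: Dict[str, Tuple[int, dict]]) -> None:
        """Apply a batch of doc_id -> (expected_rev, body), all or nothing."""
        # Phase 1: validate every expected revision before touching anything.
        for doc_id, (expected_rev, _body) in writes.items():
            current_rev = self._docs.get(doc_id, (0, {}))[0]
            if current_rev != expected_rev:
                raise ConflictError(doc_id)
        # Phase 2: the whole batch is known to be valid, so apply it.
        for doc_id, (expected_rev, body) in writes.items():
            self._docs[doc_id] = (expected_rev + 1, dict(body))


store = RevisionedStore()
# One client uploads two tabs as a single unit.
store.bulk_write({"tab-1": (0, {"url": "a"}), "tab-2": (0, {"url": "b"})})

# A second client, unaware of tab-1, tries a batch based on stale state.
try:
    store.bulk_write({"tab-1": (0, {"url": "x"}), "tab-3": (0, {"url": "c"})})
    rejected = False
except ConflictError:
    rejected = True

print(rejected, store.get("tab-3"))
```

The stale batch is rejected as a whole, so tab-3 never appears on its own. That 
is the "no half of my tabs" property described above.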

My understanding is that you're suggesting using Couch basically like Sync 1.1 
— dump objects into it and pull them out, but don't use its built-in 
replication. We'll get Sync 1.1 quality out of that solution, with the win of 
writing less code (in theory), but the cost of maintaining that client. Which 
is to say, that's exactly the same as just using Sync 1.1, and I'm not all that 
interested in having Mozilla spend another six years repeating history 
verbatim, but with working bookmark sync!


> Observation 2: Bookmarks are hairy because there is structure, and changes 
> should probably be more transactional than what the CouchDB and Dropbox APIs 
> provide. We might be able to alter the Couch API slightly or work within it 
> to address these issues. It probably wouldn't be as good as a more tailored 
> solution, but it might be good enough! Weird things might still happen, but 
> we could accept it, particularly if users are moving away from maintaining 
> structured bookmarks. The upside is that we could potentially move quickly.

I don't believe we can discount structured bookmarks. The numbers are 
surprising, even to me:

ITEM COUNTS FOR bookmarks
 0 - 1:                 4 users
 1 - 10:              389 users
 10 - 100:          25122 users
 100 - 1000:        28241 users
 1000 - 10000:       5882 users
 10000 - 100000:      187 users
 100000 - 1000000:      1 users
 total user count: 59826

Yeah, we have a Sync user in this sample with more than 100,000 bookmarks, and 
plenty with thousands.
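
For a sense of proportion, the shares can be worked out directly from the 
histogram above. This is just arithmetic on the figures as posted; the only 
assumption is that each bucket includes its lower bound, since the printed 
ranges overlap at the edges.

```python
# (lower bound of bucket, number of users), copied from the histogram above.
buckets = [
    (0, 4),
    (1, 389),
    (10, 25122),
    (100, 28241),
    (1000, 5882),
    (10000, 187),
    (100000, 1),
]

total = sum(users for _, users in buckets)


def share_at_least(lower_bound: int) -> float:
    """Fraction of sampled users whose bucket starts at or above lower_bound."""
    matching = sum(users for lb, users in buckets if lb >= lower_bound)
    return matching / total


print(total)                            # 59826, matching the posted total
print(f"{share_at_least(100):.1%}")     # 57.4%
print(f"{share_at_least(1000):.1%}")    # 10.1%
```

So a little over half of this sample has at least 100 bookmarks, and roughly 
one user in ten has at least 1,000.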

And at 20M or 100M users, we can't really design with the acceptance of 
"might"; any hole we leave for failure will be a chasm in the wild. This was 
the approach that Weave took, and it was wrong.

But in principle, yes, we could perhaps hammer a simple object store protocol 
into shape.

My thought process is roughly as follows, though:

• We'd need to layer some kind of structural control on top, along with support 
for fetching based on structure, etc. etc.
• So we're talking about two protocols: a blob exchange protocol which can't be 
too naive, driven by a structure exchange protocol.
• At that point, two things are probably true:
  1. We don't get much or any value from using an off-the-shelf protocol for 
client-server communications, versus using the same off-the-shelf software 
purely on the server side, because we're already building a layer alongside or 
on top, and fighting the impedance mismatches.
  2. And if we've already built a storage layer that is just as performant but 
more reliable than the raw blob store, why not use it for other data types, 
too, and get the same benefits there, alongside simpler client code and the 
ability to do things like cross-referencing?
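
The "two protocols" shape described above can be sketched in a few lines of 
Python: a blob store that knows nothing about structure, and a structure layer 
that decides which blobs to fetch and in what order. Every name here is 
hypothetical, and this is not TreeSync's actual design; it is only meant to 
show where the layering ends up.

```python
from typing import Dict, List, Tuple


class BlobStore:
    """Opaque payloads keyed by ID; no knowledge of folders or ordering."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, blob_id: str, payload: bytes) -> None:
        self._blobs[blob_id] = payload

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


class TreeLayer:
    """Hypothetical structure layer: folder -> ordered child IDs."""

    def __init__(self, blobs: BlobStore, root_id: str = "root") -> None:
        self._blobs = blobs
        self._children: Dict[str, List[str]] = {root_id: []}

    def add(self, parent_id: str, node_id: str, payload: bytes,
            is_folder: bool = False) -> None:
        if parent_id not in self._children:
            raise KeyError(f"unknown folder: {parent_id}")
        self._blobs.put(node_id, payload)           # blob exchange
        self._children[parent_id].append(node_id)   # structure exchange
        if is_folder:
            self._children[node_id] = []

    def fetch_subtree(self, folder_id: str) -> List[Tuple[str, bytes]]:
        """Depth-first fetch of every blob under folder_id, in sibling order."""
        out: List[Tuple[str, bytes]] = []
        for child in self._children[folder_id]:
            out.append((child, self._blobs.get(child)))
            if child in self._children:
                out.extend(self.fetch_subtree(child))
        return out


tree = TreeLayer(BlobStore())
tree.add("root", "toolbar", b"Toolbar", is_folder=True)
tree.add("toolbar", "bm-1", b"http://mozilla.org/")
tree.add("root", "bm-2", b"http://example.com/")

print(tree.fetch_subtree("toolbar"))
print([node_id for node_id, _ in tree.fetch_subtree("root")])
```

Even in this toy version the blob store is reduced to put and get, while 
everything interesting lives in the layer on top. That is the point made in 
item 1: the off-the-shelf protocol contributes little once that layer exists.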


> Observation 3: It would be nice to have clean API boundaries and composable 
> subsystems. For example, it would be nice if we could have a simple API 
> between the content/data provider (e.g., the history component) and the sync 
> mediator (the thing responsible for talking to the storage server). This 
> could allow us to change (or allow the user to choose!) a storage provider 
> without having to make substantial changes to the content providers.

That's pretty much what we've got for Android Sync, btw.

I'm not worried about hitting an architectural sweet spot on clients, because 
we'll naturally be mocking out various layers, not to mention designing for 
functional extensibility.


> Likewise, content providers should handle their own merge conflicts.

I advise caution when using the term "content provider"; Android is already 
camping that term.

I would rephrase this as "local data repositories should handle record 
application", and I agree with that.
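
A minimal Python sketch of that split, assuming a two-method repository 
interface: the mediator only moves records, and each repository decides how an 
incoming record is applied. The interface, the class names, and the "keep the 
larger visit count" policy are all made up for illustration; this is not the 
Android Sync API.

```python
from typing import Dict, Iterable, List, Protocol, Set


class Repository(Protocol):
    """Hypothetical boundary between a data store and the sync mediator."""

    def changed_records(self) -> Iterable[dict]: ...

    def apply(self, record: dict) -> None: ...


class InMemoryHistory:
    """Toy history repository that owns its own record-application policy."""

    def __init__(self) -> None:
        self.visits: Dict[str, int] = {}
        self._dirty: Set[str] = set()

    def record_visit(self, url: str) -> None:
        self.visits[url] = self.visits.get(url, 0) + 1
        self._dirty.add(url)

    def changed_records(self) -> List[dict]:
        # Hand out one record per locally changed URL, then clear the marks.
        out = [{"url": u, "visits": self.visits[u]} for u in sorted(self._dirty)]
        self._dirty.clear()
        return out

    def apply(self, record: dict) -> None:
        # Local policy (assumption): never lower an existing visit count.
        url = record["url"]
        self.visits[url] = max(self.visits.get(url, 0), record["visits"])


def sync_once(source: Repository, sink: Repository) -> int:
    """Mediator: moves records and knows nothing about their contents."""
    moved = 0
    for record in source.changed_records():
        sink.apply(record)
        moved += 1
    return moved


desktop, phone = InMemoryHistory(), InMemoryHistory()
desktop.record_visit("http://mozilla.org/")
desktop.record_visit("http://mozilla.org/")
phone.apply({"url": "http://mozilla.org/", "visits": 5})

print(sync_once(desktop, phone), phone.visits)
```

Because the mediator depends only on the two methods, the sink could just as 
well be a server-backed repository or another local one.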


> I would also like new content providers to be able to make substantial 
> progress on integrating with sync without a lot of hand-holding from experts. 
> Many of you are aware that I like the idea of the Syncable Service API in 
> Chromium, which has similar goals. Here's a nice talk on it: 
> https://docs.google.com/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDo2MzU1NDEwZTA1NTUwNzlk

We're at that point for Android Sync, I think. There's a trivial API to 
implement, and everything else (e.g., peer-to-peer syncing) comes for free.

For example, here's the repo layer for Fennec's tabs, which I didn't write.

https://github.com/mozilla-services/android-sync/blob/develop/src/main/java/org/mozilla/gecko/sync/repositories/android/FennecTabsRepository.java

I'd be aiming for even more simplicity in the future. (Android Sync is, if 
anything, over-engineered for parallelism, which is a nice problem to have!)

(Of course, this wouldn't be the API; it's time-oriented, for one thing.)

> Triple Win?: Most datatypes would be fine with Couch or Dropbox. I propose we 
> consider a structure where the sync/wire protocol could be either the CouchDB 
> or Dropbox API and *data types that need more (e.g., bookmarks)* can layer 
> additional mechanisms on top of that. For example, TreeSync could run on top 
> of an abstraction backed by either the CouchDB API or Dropbox API (I think 
> they are close enough that we can pull that off). Brian and I brainstormed 
> how one might do this yesterday, and the idea has legs.
> 
> I call this a potential Triple Win, because if it works, then we could get 
> the "already debugged, already there" wins from both Couch and Dropbox, we 
> get the integrity win of TreeSync, and we get the Dropbox 
> offload-some-server-storage win if that somehow makes sense.

To be fair, if we have to change the protocol or the server — and my intuition 
is that we would — we wouldn't win much in terms of "already debugged". And I'm 
concerned that there's a fundamental mismatch between document replication and 
server-canonical single-timeline state that will bite us in the ass. But that's 
mostly intuition.

Beyond that, there's the concern that integration is as costly as building, and 
even more so when the integration is "this covers 80% of our needs". If we were 
building something for which the Dropbox or Couch APIs were a 100% fit -- 
storing unencrypted objects and using server-side conflict resolution, with 
each client having a full copy of the data -- then I'd wholeheartedly recommend 
Couch, because I don't want to have to write or maintain code if I don't have 
to!

_______________________________________________
Sync-dev mailing list
https://mail.mozilla.org/listinfo/sync-dev
