I think we're very much in the realm of intuition and open questions, so here's
mine :D
To open, I appreciate your desire to hit the sweet spot of building little code
and getting big wins. As you know, I have a little more cynicism here; I'm
hoping the difference in our perspectives is just due to me having a different
context and some different experience with problems in this domain. I'm doing
my best to communicate those risks.
I might summarize my position as "I think you're following exactly the same path as Weave,
which turned out to be less rigorous than reality requires". That's not
pejorative; it's a natural way to approach the set of goals that both Weave and
PiCL share.
> In preparation of our design review, I spent a good part of yesterday trying
> to digest the pros and cons of Dropbox API, CouchDB API, and TreeSync (which
> I'm frankly still a bit fuzzy about).
>
> In short, can we get a Triple Win, where we can benefit from the pros of all
> three of these options?
>
> Observation 1: The CouchDB API or the Dropbox File API is likely good enough
> for syncing independent (or even mostly independent) records (e.g., history,
> passwords, tabs). Future stuff like calendar events, contacts, reading list,
> etc. might fall in this category too.
Mostly concur. Areas in which I can imagine issues:
• Detecting and reconciling conflicts (not only between records, but also in
the ordering of operations like "clear history"). In particular, I haven't
heard a good story for field-level reconciling.
• Control over which records are downloaded and uploaded, and when.
• Implicit structure and consistency.
Consider that data types like form history, tabs, and passwords almost
certainly need to be consistent — your user experience will be damaged, along
with Firefox's reputation, if half of your tabs, half of your form field
changes, or half of a password pair makes it to another client. "Where's the
other half of my desktop tabs? Firefox is so unreliable."
We can probably manage without immediately-consistent history, so long as
"eventually consistent" doesn't take too long.
If we can get to a point where an off-the-shelf Couch client lets us control
which records to retrieve and when, allows us to detect conflicts before
committing a write, offers consistent bulk writes with boundaries, and is
compatible with our crypto story, then great; it could be a useful tool. But
see below for another viewpoint.
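To make two items on that wish list concrete ("detect conflicts before
committing a write" and "consistent bulk writes with boundaries"), here's a
minimal sketch. Everything in it (VersionedStore, ConflictError, bulk_write)
is a name I made up for illustration; it is not Couch's API or any existing
client, just one way the behaviour could look.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale view of the store."""


class VersionedStore:
    """Hypothetical in-memory store with a single version counter."""

    def __init__(self):
        self.version = 0
        self.records = {}

    def bulk_write(self, base_version, changes):
        """Apply every change in the batch, or none of them.

        base_version: the store version the client last saw.
        changes: dict of record id -> new value (None deletes the record).
        """
        if base_version != self.version:
            # Someone else wrote first; reject before touching anything.
            raise ConflictError(
                "store is at %d, client saw %d" % (self.version, base_version))
        staged = dict(self.records)
        for record_id, value in changes.items():
            if value is None:
                staged.pop(record_id, None)
            else:
                staged[record_id] = value
        # Swap in the staged copy so readers never see half a batch.
        self.records = staged
        self.version += 1
        return self.version
```

A client that last saw version 0 can write a whole set of tabs in one call; a
second client still holding version 0 gets a ConflictError and has to fetch
and reconcile before retrying, so nobody ends up with half the tabs.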
My understanding is that you're suggesting using Couch basically like Sync 1.1
— dump objects into it and pull them out, but don't use its built-in
replication. We'll get Sync 1.1 quality out of that solution, with the win of
writing less code (in theory), but the cost of maintaining that client. Which
is to say, that's exactly the same as just using Sync 1.1, and I'm not all that
interested in having Mozilla spend another six years repeating history
verbatim, but with working bookmark sync!
> Observation 2: Bookmarks are hairy because there is structure, and changes
> should probably be more transactional than what the CouchDB and Dropbox APIs
> provide. We might be able to alter the Couch API slightly or work within it
> to address these issues. It probably wouldn't be as good as a more tailored
> solution, but it might be good enough! Weird things might still happen, but
> we could accept it, particularly if users are moving away from maintaining
> structured bookmarks. The upside is that we could potentially move quickly.
I don't believe we can discount structured bookmarks. The numbers are
surprising, even to me:
ITEM COUNTS FOR bookmarks
       0 -       1:      4 users
       1 -      10:    389 users
      10 -     100:  25122 users
     100 -    1000:  28241 users
    1000 -   10000:   5882 users
   10000 -  100000:    187 users
  100000 - 1000000:      1 user
total user count:    59826
Yeah, we have a Sync user in this sample with more than 100,000 bookmarks, and
plenty with thousands.
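To put rough proportions on that, here's a quick calculation over the bucket
counts above. The bucket edges in the table overlap (1 - 10, 10 - 100), so
I'm assuming each bucket is identified by its lower bound; treat the
thresholds as approximate.

```python
# Bucket counts copied from the table above: (low, high, users).
buckets = [
    (0, 1, 4),
    (1, 10, 389),
    (10, 100, 25122),
    (100, 1000, 28241),
    (1000, 10000, 5882),
    (10000, 100000, 187),
    (100000, 1000000, 1),
]

total = sum(users for _, _, users in buckets)


def share_at_least(threshold):
    """Fraction of users in buckets whose lower bound is >= threshold."""
    matching = sum(users for low, _, users in buckets if low >= threshold)
    return matching / total


print(total)                                 # 59826
print(round(100 * share_at_least(100), 1))   # 57.4
print(round(100 * share_at_least(1000), 1))  # 10.1
```

So in this sample a bit over half of users sit in the 100-and-up buckets, and
about one in ten in the 1000-and-up buckets.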
And at 20M or 100M users, we can't really design with the acceptance of
"might"; any hole we leave for failure will be a chasm in the wild. This was
the approach that Weave took, and it was wrong.
But in principle, yes, we could perhaps hammer a simple object store protocol
into shape.
My thought process is roughly as follows, though:
• We'd need to layer some kind of structural control on top, along with support
for fetching based on structure, etc. etc.
• So we're talking about two protocols: a blob exchange protocol which can't be
too naive, driven by a structure exchange protocol.
• At that point, two things are probably true:
  1. We don't get much or any value from using an off-the-shelf protocol for
     client-server communications, versus using the same off-the-shelf
     software purely on the server side, because we're already building a
     layer alongside or on top, and fighting the impedance mismatches.
  2. And if we've already built a storage layer that is just as performant
     but more reliable than the raw blob store, why not use it for other data
     types, too, and get the same benefits there, alongside simpler client
     code and the ability to do things like cross-referencing?
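Here's roughly what I mean by a blob exchange protocol "driven by" a structure
exchange protocol, as a toy sketch. BlobStore, StructureIndex, and
fetch_subtree are hypothetical names for illustration only; this isn't
TreeSync or any real design, just the shape of the layering.

```python
class BlobStore:
    """Hypothetical blob exchange layer: opaque payloads keyed by id."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob_id, payload):
        self._blobs[blob_id] = payload

    def get_many(self, blob_ids):
        """Return whichever of the requested blobs exist."""
        return {i: self._blobs[i] for i in blob_ids if i in self._blobs}


class StructureIndex:
    """Hypothetical structure layer: parent id -> ordered child ids."""

    def __init__(self):
        self._children = {}

    def set_children(self, parent_id, child_ids):
        self._children[parent_id] = list(child_ids)

    def subtree_ids(self, root_id):
        """Return root_id and its descendants, parents before children."""
        ordered, pending = [], [root_id]
        while pending:
            node = pending.pop()
            ordered.append(node)
            # Reverse so the first child is popped (and visited) first.
            pending.extend(reversed(self._children.get(node, [])))
        return ordered


def fetch_subtree(index, blobs, root_id):
    """Use structure to decide which blobs to fetch, then check consistency."""
    ids = index.subtree_ids(root_id)
    found = blobs.get_many(ids)
    missing = [i for i in ids if i not in found]
    if missing:
        raise LookupError("structure references missing blobs: %r" % missing)
    return [(i, found[i]) for i in ids]
```

Note that the structure layer decides what gets fetched and notices when the
blob layer is incomplete; the blob layer on its own can't do either, which is
why I don't think it can stay naive.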
> Observation 3: It would be nice to have clean API boundaries and composable
> subsystems. For example, it would be nice if we could have a simple API
> between the content/data provider (e.g., the history component) and the sync
> mediator (the thing responsible for talking to the storage server). This
> could allow us to change (or allow the user to choose!) a storage provider
> without having to make substantial changes to the content providers.
That's pretty much what we've got for Android Sync, btw.
I'm not worried about hitting an architectural sweet spot on clients, because
we'll naturally be mocking out various layers, not to mention designing for
functional extensibility.
> Likewise, content providers should handle their own merge conflicts.
I advise caution when using the term "content provider"; Android is already
camping on that term.
I would rephrase this as "local data repositories should handle record
application", and I agree with that.
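As a sketch of that split: the mediator only moves records, and each
repository decides how an incoming record lands. The names here (Repository,
TabsRepository, deliver) are hypothetical and this is not the Android Sync
API; the newest-timestamp-wins rule is a deliberately naive placeholder
policy, not a recommendation.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Hypothetical local data repository (history, tabs, and so on)."""

    @abstractmethod
    def apply(self, incoming):
        """Merge one incoming record; return True if local state changed."""


class TabsRepository(Repository):
    """Toy example: the newest 'modified' value wins for each tab id."""

    def __init__(self):
        self.tabs = {}

    def apply(self, incoming):
        current = self.tabs.get(incoming["id"])
        if current is None or incoming["modified"] > current["modified"]:
            self.tabs[incoming["id"]] = incoming
            return True
        return False


def deliver(repository, records):
    """Stand-in for the sync mediator: hands records over, nothing more.

    Returns how many records the repository actually applied.
    """
    return sum(1 for record in records if repository.apply(record))
```

Swapping the storage provider would only touch whatever feeds deliver(); the
repository's merge logic stays where the data lives.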
> I would also like new content providers to be able to make substantial
> progress on integrating with sync without a lot of hand-holding from experts.
> Many of you are aware that I like the idea of the Syncable Service API in
> Chromium, which has similar goals. Here's a nice talk on it:
> https://docs.google.com/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDo2MzU1NDEwZTA1NTUwNzlk
We're at that point for Android Sync, I think. There's a trivial API to
implement, and everything else (e.g., peer-to-peer syncing) comes for free.
For example, here's the repo layer for Fennec's tabs, which I didn't write.
https://github.com/mozilla-services/android-sync/blob/develop/src/main/java/org/mozilla/gecko/sync/repositories/android/FennecTabsRepository.java
I'd be aiming for even more simplicity in the future. (Android Sync is, if
anything, over-engineered for parallelism, which is a nice problem to have!)
(Of course, this wouldn't be the API; it's time-oriented, for one thing.)
> Triple Win?: Most datatypes would be fine with Couch or Dropbox. I propose we
> consider a structure where the sync/wire protocol could be either the CouchDB
> or Dropbox API and *data types that need more (e.g., bookmarks)* can layer
> additional mechanisms on top of that. For example, TreeSync could run on top
> of an abstraction backed by either the CouchDB API or Dropbox API (I think
> they are close enough that we can pull that off). Brian and I brainstormed
> how one might do this yesterday, and the idea has legs.
>
> I call this a potential Triple Win, because if it works, then we could get
> the "already debugged, already there" wins from both Couch and Dropbox, we
> get the integrity win of TreeSync, and we get the Dropbox
> offload-some-server-storage win if that somehow makes sense.
To be fair, if we have to change the protocol or the server — and my intuition
is that we would — we wouldn't win much in terms of "already debugged". And I'm
concerned that there's a fundamental mismatch between document replication and
server-canonical single-timeline state that will bite us in the ass. But that's
mostly intuition.
Beyond that, there's the concern that integration is as costly as building, and
even more so when the integration is "this covers 80% of our needs". If we were
building something for which the Dropbox or Couch APIs were a 100% fit --
storing unencrypted objects and using server-side conflict resolution, with
each client having a full copy of the data -- then I'd wholeheartedly recommend
Couch, because I don't want to have to write or maintain code if I don't have
to!
_______________________________________________
Sync-dev mailing list
https://mail.mozilla.org/listinfo/sync-dev