Thanks for sharing this, Willem! I'll need to digest the idea a bit more, but your approach looks appealing to me, as one of our concerns is how to handle the obvious trust issues with offline database modifications. Simply replicating everything and accepting every modification to the shared databases without verification is dangerous in a multi-user setting, since anyone could modify things that should be read-only for them (for instance) or do other malicious things, potentially leading to data leaks for other users.
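CouchDB's validate_doc_update functions might cover part of this, since they also run on writes arriving through replication. A minimal sketch of the kind of check I mean (the "owner" field is just a placeholder for however documents would record their owner):

    // _design/auth in the shared database; runs for every write,
    // including writes arriving through replication.
    function (newDoc, oldDoc, userCtx, secObj) {
      if (userCtx.roles.indexOf('_admin') !== -1) {
        return; // admins may do anything
      }
      // Only the owner of an existing document may change or delete it.
      if (oldDoc && oldDoc.owner !== userCtx.name) {
        throw({ forbidden: 'only the owner may modify this document' });
      }
      // New documents must be owned by the user writing them.
      if (!oldDoc && newDoc.owner !== userCtx.name) {
        throw({ forbidden: 'owner must match the authenticated user' });
      }
    }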
My initial idea there (though this is actually a different topic) was to introduce a middleman between the databases and the clients to validate all incoming changes before letting them through to CouchDB. The problem is that it doesn't seem very straightforward to create such a "proxy" in Node, for instance, one that hooks into the data changes while exposing the normal CouchDB API to the outside.

I was also thinking about using an event-driven architecture: pushing normalized events (i.e., one collection of event documents, each with a certain type, version, and fields) and thus separating reads from writes on the client side as well. With that, the clients would maintain their own private read model and log events whenever an action is taken. Synchronization would then "simply" mean pushing the event log to the server through a dedicated API, which would check everything before inserting the changes where needed; everyone would then receive the changes through the (by then read-only) shared database. This is for a tad later in our project, so it's still brainstorming at this point.
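To make that slightly more concrete, here is roughly what I picture an event document looking like; all field names are placeholders at this stage:

    // One document per user action, appended to the client's private log
    // and pushed to the server-side API at sync time.
    const event = {
      _id: 'event:2019-10-17T08:31:02.412Z:a1b2c3', // time-ordered id
      type: 'task.renamed',                         // event type
      version: 1,                                   // schema version of this type
      actor: 'alice',                               // authenticated user
      fields: {                                     // type-specific payload
        taskId: 'task:42',
        newName: 'Final draft'
      }
    };

The dedicated API would then check each event against its type and version before folding the result into the shared database.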
kr,
Sébastien

On Thu, Oct 17, 2019 at 10:19 AM Willem van der Westhuizen
<[email protected]> wrote:

> Hi Sebastien
>
> I thought I would give you some quick input on our experience of
> supporting online and offline use. We work in conditions where networks
> are poor and unreliable, and we have, after quite some pain and trial
> and error, reverted to using the replication mechanism to save data
> even for users working online. In our case, which is a business process
> and workflow tool, it is absolutely essential that all documents arrive
> in the correct version. So we built an ACID-style transaction engine,
> and when the user saves, it triggers a limited replication based on
> document ids. That has given us orders of magnitude greater stability
> in poor networks. Each user saves into a per-user database and
> replicates to the server; the transaction engine then processes it into
> the actual correct database, completing the save.
>
> Regards
>
> Willem
>
> On 2019/10/17 10:06, Sebastien wrote:
> > After all, we've decided not to rely on filtered replication for our
> > use case.
> >
> > The issue is that we will not only support an offline-first mode,
> > where a filtered copy of the data will be retrieved, but also an
> > online-only mode (e.g., when accessing the app from an untrusted
> > device, where the users might prefer not to store anything locally).
> > In the online-only mode, the users will need to access the database
> > directly, but it will also need to be filtered, and I'm not sure
> > there's a safe way to do that.
> >
> > What we've chosen to do now is to keep the information colocated in
> > _users and to go through an API to retrieve the subset of information
> > that is required (e.g., n properties of all members of database X).
> > This way it works fine in the online-only scenario, but also for the
> > offline-first one, since we can persist the information after having
> > retrieved it once. We also keep better control over what happens with
> > the data (up to some extent) and can wipe it if/when necessary.
> >
> > This issue is rather hairy from a privacy protection point of view,
> > but such use cases are critical for multi-user offline-first systems.
> >
> > Thanks again for the useful feedback!
> >
> > kr,
> > Sébastien
> >
> > On Sun, Oct 13, 2019 at 10:34 AM Stefan Klein <[email protected]>
> > wrote:
> >
> >> Hi Sebastien,
> >>
> >> On Sat, Oct 12, 2019 at 15:55, Sebastien <[email protected]>
> >> wrote:
> >>> Taking that as a starting point, one option could indeed be, as you
> >>> propose, to copy a subset of that "persons" database into each other
> >>> database (of course again only a subset of the info, ideally
> >>> controllable by the end users). One problem that I imagine with that
> >>> is mainly the amount of incurred data duplication.
> >> With the duplication it needs to be absolutely clear which of the
> >> copies is the authoritative version of the document and which are
> >> just copies; then it's manageable.
> >>
> >>> For instance, imagine that persons contains [A, B, C, D, E, F], then:
> >>> - If [A, B, C] have access to database X, then those users should
> >>>   have a copy of [A, B, C] locally
> >>> - If [A, D, E] have access to database Y, then those users should
> >>>   have a copy of [A, D, E] locally
> >>> Consequently, A should have A, B, C, D, E in his local "persons"
> >>> database copy.
> >>> If at some point E is removed from database Y, then user A should no
> >>> longer have E in his local database.
> >>>
> >>> Does that sound like something that can be handled through filtered
> >>> replication?
> >> I am not aware of any way to delete documents in the target that
> >> still exist in the source.
> >> But if you have a copy of E in Y and delete E from Y at a later
> >> point, this delete will be replicated to the local DB too (if you
> >> don't filter out deleted documents).
> >> Since you probably have some kind of management system to remove E
> >> from Y's _security, you could either delete E's profile from Y in the
> >> same step or have a cron job or similar remove the redundant profiles
> >> from the databases.
> >>
> >> One possible issue here though:
> >> If E gains access to Y again while E's profile wasn't changed, the
> >> former _deleted revision is still the "current" revision, and E's
> >> profile stays _deleted in database Y.
> >> You would have to modify E's person document in the persons database
> >> so that it gets a new revision.
> >>
> >>> I hope that my system will be able to handle hundreds or thousands
> >>> of databases with 1-100 users in each database, each user having
> >>> access to ~1-10 databases and thus potentially having access to ~1K
> >>> user documents locally (this is really just an early guesstimate).
> >> Can't comment on PouchDB.
> >> From my experience, CouchDB doesn't care about how many databases
> >> exist; as long as there is no current access to a database, it is
> >> just a file in the file system.
> >>
> >>> The system currently doesn't allow users to manage their own
> >>> profile, but it's indeed a requirement. I'll probably only allow
> >>> users to modify their own information while online, through a
> >>> dedicated API endpoint checking the user's identity, instead of
> >>> letting them write directly to the "persons" database.
> >> With this you do have a clear dataflow:
> >> users modify their profile via the API, which changes the persons
> >> database, and documents from the persons database are distributed
> >> to the destination databases.
> >> So there should be no issue with data duplication.
> >>
> >> regards,
> >> Stefan
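For what it's worth, the workaround Stefan describes (modifying E's person document so it gets a new revision) can be as small as re-saving the document. A sketch using the nano client, where the URL, database name, and "touchedAt" field are all assumptions:

    // Re-save E's profile in the authoritative persons database so it
    // gets a fresh revision; replication can then propagate the live
    // revision over the earlier _deleted one in database Y.
    const nano = require('nano')('http://admin:secret@localhost:5984');
    const persons = nano.db.use('persons');

    async function touchProfile(personId) {
      const doc = await persons.get(personId);
      doc.touchedAt = new Date().toISOString(); // any change creates a new revision
      await persons.insert(doc);
    }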
