Re: The state of filtered replication

Sinan Gabel Wed, 25 May 2016 01:56:02 -0700

Hi Stefan,

I recognise the problem you describe: I also gave up on server-side
filtering performance. With CouchDB 1.6.1 I only saw two immediate options:


(1) Use more databases on the server side, to reduce the number of docs per
database.
(2) Do the filtering on the client side in PouchDB. This is actually quite
fast and robust; experiment to find the best settings for the *batch_size*
and *timeout* options.

For (2), possibly combine it with https://github.com/nolanlawson/worker-pouch
if there are a lot of documents.
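
A minimal sketch of what (2) could look like, assuming relevance is decided
by an `author` field on each document and that the list of subordinates is
already known on the client. The `Doc` shape, `makeRelevanceFilter`, and the
user names are hypothetical and only for illustration; the PouchDB call is
shown in a comment, and the option values there are placeholders to tune
rather than recommendations.

```typescript
// Hypothetical document shape: only the fields the check needs.
interface Doc {
  _id: string;
  _deleted?: boolean;
  author?: string;
}

// Build a predicate for one user. It keeps documents authored by the user
// or by anyone in `subordinates` (assumed to be precomputed elsewhere).
function makeRelevanceFilter(
  user: string,
  subordinates: string[]
): (doc: Doc) => boolean {
  const relevant = new Set<string>([user, ...subordinates]);
  return (doc: Doc): boolean => {
    // Skip design documents; the client does not need them in this sketch.
    if (doc._id.startsWith("_design/")) return false;
    // Assumption: let deletions through so local copies are removed.
    // Tombstones typically carry no `author` field to check against.
    if (doc._deleted === true) return true;
    return doc.author !== undefined && relevant.has(doc.author);
  };
}

const filter = makeRelevanceFilter("supervisor1", ["worker1", "worker2"]);

// With PouchDB, a function passed as `filter` is evaluated on the client:
//   localDb.replicate.from(remoteUrl, { filter, batch_size: 500, timeout: 30000 });
```

The trade-off, as far as I understand it, is that every changed document
still travels over the wire before being discarded locally, so this moves
CPU load off CouchDB at the cost of bandwidth.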


... however, the best outcome would be a much faster, production-ready
server-side filtering option in CouchDB 2.x.


Br,
Sinan

On 25 May 2016 at 10:34, Stefan du Fresne <[email protected]> wrote:

> Hello all,
>
> I work on an app that involves a large amount of CouchDB filtered
> replication (every user has a filtered subset of the DB locally via
> PouchDB). Currently filtered replication is our number 1 performance
> bottleneck for rolling out to more users, and I'm trying to work out where
> we can go from here.
>
> Our current setup is one CouchDB database and N PouchDB installations,
> which all two-way replicate, with the CouchDB->PouchDB replication being
> filtered based on user permissions / relevance [1].
>
> Our issue is that as we add users a) total document creation velocity
> increases, and b) the proportion of documents that are relevant to any
> particular user decreases. These two points cause replication (both
> initial onboarding and continuous) to take longer and longer.
>
> At this stage we are being forced to manually limit the number of users we
> onboard at any particular time to half a dozen or so, or risk CouchDB being
> unresponsive [2]. As we'd want to be onboarding 50-100 at any particular
> time due to how we're rolling out, you can imagine that this is pretty
> painful.
>
> I have already re-written the filter in Erlang, which halved its execution
> time, which is awesome!
>
> I also attempted to simplify the filter to increase performance. However,
> filter speed seems more dependent on the physical size of your filter as
> opposed to what code executes, which makes writing a simple filter that can
> fall back to a complicated filter not terribly useful (see:
> https://issues.apache.org/jira/browse/COUCHDB-3021)
>
> If the above linked ticket is fixed (if it can be) this would make our
> filter 3-4x faster again. However, this still wouldn't address the
> fundamental issue that filtered replication is very CPU-intensive, and so
> as noted above doesn't seem to scale terribly well.
>
> Ideally then, I would like to remove filtered replication completely, but
> there does not seem to be a good alternative right now.
>
> Looking through the archives there was talk of adding view replication,
> see:
> https://mail-archives.apache.org/mod_mbox/couchdb-user/201307.mbox/%3CCAJNb-9pK4CVRHNwr83_DXCn%2B2_CZXgwDzbK3m_G2pdfWjSsFMA%40mail.gmail.com%3E
> , but it doesn't look like this ever got resolved.
>
> There is also often talk of databases per user being a good scaling
> strategy, but we're basically doing that already (with PouchDB), and for
> us documents aren't owned / viewed by just one person so this does not get
> us away from filtered replication (e.g. a supervisor replicates her documents
> as well as N sub-users' documents). There are potentially wild and crazy
> schemes that involve many different databases where the equivalent of
> filtering is expressed in replication relationships, but this would add a
> massive amount of complexity to our app, and I’m not even convinced it
> would work as there are lots of edge cases to consider.
>
> Does anyone know of anything else I can try to increase replication
> performance? Or to safeguard against many replicators unacceptably
> degrading CouchDB performance? Does CouchDB 2.0 address any of these concerns?
>
> Thanks in advance,
> - Stefan du Fresne
>
> [1] Security is handled by not exposing CouchDB and going through a wrapper
> service that validates CouchDB requests. Relevance is hierarchy-based (i.e.
> documents you or your subordinates are authors of are replicated to you).
> [2] There are also administrators / configurers that access CouchDB
> directly.
