Hi Stefan,

I recognise your description of the problem: I also gave up on server-side filtering performance. With CouchDB 1.6.1 I only saw two immediate options:
(1) More databases on the server side, to reduce the number of docs per database.
(2) Simply do the filtering on the client side in PouchDB, which is actually quite fast and robust. Here, experiment to find the best settings for the options *batch_size* and *timeout*.

For (2), possibly combine with https://github.com/nolanlawson/worker-pouch if there are a lot of documents.

However, it would be best to have a much faster, production-grade server-side filtering option in CouchDB 2.x.

Br,
Sinan

On 25 May 2016 at 10:34, Stefan du Fresne <[email protected]> wrote:

> Hello all,
>
> I work on an app that involves a large amount of CouchDB filtered
> replication (every user has a filtered subset of the DB locally via
> PouchDB). Currently filtered replication is our number 1 performance
> bottleneck for rolling out to more users, and I'm trying to work out where
> we can go from here.
>
> Our current setup is one CouchDB database and N PouchDB installations,
> which all two-way replicate, with the CouchDB->PouchDB replication being
> filtered based on user permissions / relevance [1].
>
> Our issue is that as we add users a) total document creation velocity
> increases, and b) the proportion of documents that are relevant to any
> particular user decreases. These two points cause replication, both
> initial onboarding and continual, to take longer and longer.
>
> At this stage we are being forced to manually limit the number of users we
> onboard at any particular time to half a dozen or so, or risk CouchDB being
> unresponsive [2]. As we'd want to be onboarding 50-100 at any particular
> time due to how we're rolling out, you can imagine that this is pretty
> painful.
>
> I have already re-written the filter in Erlang, which halved its execution
> time, which is awesome!
>
> I also attempted to simplify the filter to increase performance. However,
> filter speed seems more dependent on the physical size of your filter as
> opposed to what code executes, which makes writing a simple filter that can
> fall back to a complicated filter not terribly useful (see:
> https://issues.apache.org/jira/browse/COUCHDB-3021).
>
> If the above linked ticket is fixed (if it can be) this would make our
> filter 3-4x faster again. However, this still wouldn't address the
> fundamental issue that filtered replication is very CPU-intensive, and so
> as noted above doesn't seem to scale terribly well.
>
> Ideally then, I would like to remove filtered replication completely, but
> there does not seem to be a good alternative right now.
>
> Looking through the archives there was talk of adding view replication,
> see:
> https://mail-archives.apache.org/mod_mbox/couchdb-user/201307.mbox/%3CCAJNb-9pK4CVRHNwr83_DXCn%2B2_CZXgwDzbK3m_G2pdfWjSsFMA%40mail.gmail.com%3E
> but it doesn't look like this ever got resolved.
>
> There is also often talk of databases per user being a good scaling
> strategy, but we're basically doing that already (with PouchDB), and for
> us documents aren't owned / viewed by just one person so this does not get
> us away from filtered replication (eg a supervisor replicates her documents
> as well as N sub-users' documents). There are potentially wild and crazy
> schemes that involve many different databases where the equivalent of
> filtering is expressed in replication relationships, but this would add a
> massive amount of complexity to our app, and I’m not even convinced it
> would work as there are lots of edge cases to consider.
>
> Does anyone know of anything else I can try to increase replication
> performance? Or to safeguard against many replicators unacceptably
> degrading CouchDB performance? Does Couch 2.0 address any of these concerns?
>
> Thanks in advance,
> - Stefan du Fresne
>
> [1] security is handled by not exposing couch and going through a wrapper
> service that validates couch requests, relevance is hierarchy based (i.e.
> documents you or your subordinates are authors of are replicated to you)
> [2] there are also administrators / configurers that access couchdb
> directly
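
To make option (2) a little more concrete, here is a rough TypeScript sketch of a client-side filter for the hierarchy-based relevance rule described in footnote [1]. It is only an illustration under assumptions: the `author` field, the `Hierarchy` map, and the helper names are hypothetical and not taken from Stefan's app, and the batch size and timeout values are placeholders to experiment with rather than recommendations. The block itself does not import PouchDB. The trailing comment shows where the options object would be handed to PouchDB's `replicate.from`.

```typescript
// Hypothetical document shape: each doc records the user id of its author.
interface AuthoredDoc {
  _id: string;
  author: string;
}

// Hypothetical hierarchy: maps a user id to the ids of their direct subordinates.
type Hierarchy = Record<string, string[]>;

// Collect the user plus all of their direct and indirect subordinates.
function visibleAuthors(user: string, hierarchy: Hierarchy): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [user];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (seen.has(current)) continue; // guards against cycles in the hierarchy
    seen.add(current);
    for (const sub of hierarchy[current] || []) stack.push(sub);
  }
  return seen;
}

// Build a predicate that keeps only docs authored by the user or a subordinate.
function makeRelevanceFilter(user: string, hierarchy: Hierarchy) {
  const authors = visibleAuthors(user, hierarchy); // computed once, reused per doc
  return (doc: AuthoredDoc): boolean => authors.has(doc.author);
}

// Assemble replication options. The filter is a plain function, so it is
// evaluated in the client rather than by CouchDB; batch_size and timeout
// are the two settings to tune.
function replicationOptions(
  user: string,
  hierarchy: Hierarchy,
  batchSize: number = 100,   // placeholder value, tune by experiment
  timeoutMs: number = 30000  // placeholder value, tune by experiment
) {
  return {
    live: true,
    retry: true,
    filter: makeRelevanceFilter(user, hierarchy),
    batch_size: batchSize,
    timeout: timeoutMs,
  };
}

// With PouchDB loaded, usage would look roughly like this (not executed here):
//   localDb.replicate.from(remoteUrl, replicationOptions("supervisor", hierarchy));
```

The trade-off is that every change still travels over the wire before it is discarded, so this moves the CPU cost off CouchDB at the price of bandwidth, and the filter no longer doubles as any kind of access control.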
