Re: Multiple database backup strategy

2016-03-21 Thread Javier Candeira
On 16/03/16 04:55, Adam Kocoloski wrote: I’m missing something here Javier. A pull replication mediated on the target server should still be able to remotely hit _db_updates on the source, shouldn’t it? I do realize that there’s a potential permissions issue here, when the replication is e

Re: Multiple database backup strategy

2016-03-20 Thread Robert Newson
Groovy, that's consensus then. Where we can use _db_updates to know which pending jobs are worth running, we will. If it's a 404, we'll do something less optimal, start the job itself. Using the connection pool as discussed and accounting for and penalising jobs that complete very quickly or p
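
A minimal sketch of the decision described in this message, not the replicator's actual code. The function name `jobs_worth_running` and the convention that `None` stands for a 404 from the source's `/_db_updates` are assumptions made for illustration only.

```python
from typing import Iterable, List, Optional, Set


def jobs_worth_running(
    pending_sources: Iterable[str],
    updated_dbs: Optional[Set[str]],
) -> List[str]:
    """Pick which pending replication jobs to start.

    pending_sources: source database names of jobs waiting to run.
    updated_dbs: names reported as updated by the source's /_db_updates,
        or None if that endpoint answered 404 (hypothetical convention).
    """
    pending = list(pending_sources)
    if updated_dbs is None:
        # No /_db_updates on the source: fall back to the less optimal
        # path and start every job to find out whether it has work.
        return pending
    # Otherwise only run jobs whose source database reported activity.
    return [db for db in pending if db in updated_dbs]
```

With this sketch, a source without `/_db_updates` costs one job start per pending database, while a source that has it only costs starts for databases that actually changed.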

Re: Multiple database backup strategy

2016-03-20 Thread Adam Kocoloski
I’ll never berate anyone for top-posting (or bottom-posting for that matter). I just follow suit with whatever the current thread is doing — in this, very very clearly top-posting ;) Thank you for making this distinction clear. Personally I was only ever interested in the first case. Scoping th

Re: Multiple database backup strategy

2016-03-20 Thread Robert Samuel Newson
Final note, we've conflated two uses of /_db_updates that I want to be very clear on; 1) using /_db_updates to detect active source databases of a replication job. 2) using /_db_updates to hear about new/updated/deleted _replicator documents. It was the 2nd case where the unreliability was a con

Re: Multiple database backup strategy

2016-03-20 Thread Robert Samuel Newson
(I swear I'll stop soon...) Using /_db_updates as a cheap mechanism to detect activity at the source for any database we're interested in is an important optimization. We didn't discuss it this past week as we felt that /_db_updates wasn't sufficiently reliable. We can save a lot of churn in th

Re: Multiple database backup strategy

2016-03-20 Thread Robert Samuel Newson
I missed a point in Adam's earlier post. The current scheme uses couch_event for runtime changes to _replicator docs but has to read all updates of all _replicator databases at startup. In the steady state it is just receiving couch_event notifications. The /_db_updates option would change that

Re: Multiple database backup strategy

2016-03-20 Thread Robert Samuel Newson
Since I'm typing anyway, and haven't yet been dinged for top-posting, I wanted to mention one other optimization we had in mind. Currently each replicator job has its own connection pool. When we introduce the notion that we can stop and restart jobs, those become approximately useless. So we w

Re: Multiple database backup strategy

2016-03-20 Thread Robert Samuel Newson
My point is that we can (and currently do) trigger the replication manager on receipt of the database updated event, so it avoids all of the other parts of the sequence you describe which could fail. The obvious difference, and I suspect this is what motivates Adam's position, is that _db_updat

Re: Multiple database backup strategy

2016-03-20 Thread Robert Samuel Newson
Hi, If there's a chance that a user can add a single _replicator doc without it being picked up by _db_updates, I think that's a deal breaker. If a user is regularly adding/updating/deleting _replicator docs then, yes, I believe we can say we'll eventually notice. I did mean 'use couch_event'

Re: Multiple database backup strategy

2016-03-19 Thread Benjamin Bastian
When a shard is updated, it'll trigger a "database updated" event. CouchDB will hold those updates in memory for a configurable amount of time in order to dedupe updates. It'll then cast lists of updated databases to nodes which host the relevant _db_updates shards for further deduplication. It's o
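
The in-memory deduplication step described above can be pictured roughly as follows. This is an illustrative Python sketch, not CouchDB's Erlang implementation; the class name `UpdateDeduper`, its methods, and the explicit `now` argument are all assumptions.

```python
from typing import List, Optional, Set


class UpdateDeduper:
    """Collapse repeated "database updated" events inside a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds          # the configurable hold time
        self._pending: Set[str] = set()       # a set dedupes repeated names
        self._window_start: Optional[float] = None

    def record(self, db_name: str, now: float) -> None:
        """Remember that db_name was updated; duplicates are absorbed."""
        if self._window_start is None:
            self._window_start = now
        self._pending.add(db_name)

    def flush_if_due(self, now: float) -> List[str]:
        """Return the deduplicated batch once the window has elapsed."""
        if self._window_start is None:
            return []
        if now - self._window_start < self.window:
            return []
        batch = sorted(self._pending)
        self._pending.clear()
        self._window_start = None
        return batch
```

The batch returned by `flush_if_due` corresponds to the list of updated databases that would then be passed on for further deduplication.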

Re: Multiple database backup strategy

2016-03-19 Thread Adam Kocoloski
Hi Bob, comments inline:
> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson wrote:
>
> Hi,
>
> The problem is that _db_updates is not guaranteed to see every update, so I
> think it falls at the first hurdle.
Do you mean to say that a listener of _db_updates is not guaranteed to see every u

Re: Multiple database backup strategy

2016-03-19 Thread Robert Samuel Newson
Hi, The problem is that _db_updates is not guaranteed to see every update, so I think it falls at the first hurdle. What couch_replicator_manager does in CouchDB 2.0 (though not in the version that Cloudant originally contributed) is to use couch_event, notice which are to _replicator shards,

Re: Multiple database backup strategy

2016-03-15 Thread Adam Kocoloski
> On Mar 14, 2016, at 2:11 AM, jav...@candeira.com wrote: > > On 2016-03-14 13:40, Adam Kocoloski wrote: > My current solution watches the global _changes feed and fires up a continuous replication to an off-site server whenever it sees a change. If it doesn't see a change from a

Re: Multiple database backup strategy

2016-03-13 Thread javier
On 2016-03-14 13:40, Adam Kocoloski wrote: My current solution watches the global _changes feed and fires up a continuous replication to an off-site server whenever it sees a change. If it doesn't see a change from a database for 10 minutes, it kills that replication. This means I only have ~1
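
The bookkeeping behind the quoted approach (start a replication on the first change, stop it after 10 idle minutes) might look like the sketch below. It covers only the timing logic; actually starting or cancelling replications is left to the caller, and the names `IdleReplicationTracker`, `on_change`, and `expire` are hypothetical.

```python
from typing import Dict, List


class IdleReplicationTracker:
    """Track which databases currently warrant a running replication."""

    def __init__(self, idle_timeout: float = 600.0) -> None:
        self.idle_timeout = idle_timeout      # 10 minutes, as in the quote
        self._last_change: Dict[str, float] = {}

    def on_change(self, db: str, now: float) -> bool:
        """Record a change; return True if a replication must be started."""
        is_new = db not in self._last_change
        self._last_change[db] = now
        return is_new

    def expire(self, now: float) -> List[str]:
        """Return databases idle for the full timeout and stop tracking them."""
        idle = sorted(
            db for db, seen in self._last_change.items()
            if now - seen >= self.idle_timeout
        )
        for db in idle:
            del self._last_change[db]
        return idle
```

A caller would invoke `on_change` for every row seen on the global changes feed and call `expire` periodically, cancelling the replication for each name it returns.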

Re: Multiple database backup strategy

2016-03-13 Thread Adam Kocoloski
> On Mar 10, 2016, at 3:18 AM, Jan Lehnardt wrote: > >> >> On 09 Mar 2016, at 21:29, Nick Wood wrote: >> >> Hello, >> >> I'm looking to back up a CouchDB server with multiple databases. Currently >> 1,400, but it fluctuates up and down throughout the day as new databases >> are added and old