Thanks for this. Sorry it’s a bit light on details; it’s a very specific use case on iOS where CPU/disk speed is constrained and we have a large data set. The full replication protocol just requires too many reads/writes to be performant, so we’ve optimised it in various ways. We’re idempotent so long as per-document changes arrive in order, which is what I was checking.
I appreciate the more technical analysis and it certainly clears up what I was asking.

Cheers,
Robert

> On 4/09/2016, at 4:41 AM, Robert Samuel Newson <[email protected]> wrote:
>
> Hi,
>
> It is important to understand that the order of rows in the _changes response
> is not important. In couchdb before 2.0 the response was totally ordered, but
> this was never necessary for correctness. The essential contract for _changes
> is that you are guaranteed to see all changes made since the 'since'
> parameter you pass. The order of those changes is not guaranteed and it is
> also not guaranteed that changes from _before_ that 'since' value are _not_
> also returned. The consequence of this contract is that all consumers of the
> _changes response must apply each row idempotently. This is true for the
> replicator, of course.
>
> The changes response in 2.0 is partially ordered. The changes from any given
> shard will be in a consistent order, but we merge the changes from each shard
> range of your database as they are collected from the various contributing
> nodes, we don't apply a total ordering over that. The reason is simple; it's
> expensive and unnecessary. It's important to also remember that replication,
> even before 2.0, would not replicate in strict source update order either,
> due to (valuable) parallelism when reading changes and applying them.
>
> Your question: "Is it possible for the changes feed to send older changes
> before newer changes for the same document ID across multiple calls?"
> requires a little further background knowledge before answering.
>
> While we call it a changes "feed" it's important to remember what it really
> is, internally, first. Every database in couchdb, prior to 2.0, is a single
> file with multiple b+trees recorded inside it that are kept in absolute sync
> with each other. One b+tree allows you to look up a document by the _id
> parameter. The other b+tree allows you to look up a document by its update
> order.
> It is essential to note that these two b+trees have the same number of
> key/value pairs in them at all times.
>
> To illustrate this more clearly, consider an empty database. We add one
> document to it. It is retrievable by its _id and is also visible in the
> _changes response as change number 1. Now, we update that document. It is now
> change number 2. Change number 1 will never again appear in the _changes
> response. That is, every document appears in the _changes response at its
> most recent update number.
>
> When you call _changes without the continuous parameter, couchdb is simply
> traversing that second b+tree and returning each row it finds. It may do this
> from the beginning (which was 1 before our update and 2 after) or it may do
> so from some update seq you supply with the 'since' parameter.
>
> With that now understood, we can look at what changes when we do
> continuous=true which is what makes it a "feed" (that is, a potentially
> unending response of changes as they are made). This is sent in two phases.
> The first is exactly as the previous paragraph. Once all those changes have
> been sent, couchdb enters a loop where it returns updates as they happen (or
> shortly after).
>
> It is only in a continuous=true response in couchdb before 2.0 that you would
> ever see more than one change for any given document.
>
> So, to cut a long story short (too late), the answer to your question is
> "no". The changes feed is not a permanent history of all changes made to all
> documents. Once a document is updated, it is _moved_ to a newer position and
> no longer appears in its old one (and no record of that position is even
> preserved). Do note, though, that couchdb might return 'Doc A change (seq:
> 2-XXXX)' even if your 'since' parameter is _after_ the last change to doc A.
> We won't return 'Doc A change (seq: 1-XXXX)' at all after it's updated to
> 2-XXXX.
>
> The algorithm for correctly processing the changes response is as follows,
> and any variation on this is likely broken:
>
> 1) call /_changes?since=0
> 2) for each returned row, ensure the target has the change in question
>    (either use _id + _rev to prevent duplicate application of the change or
>    apply the change in a way that is idempotent)
> 3) periodically store the update seq of the last processed row to stable
>    storage (a _local document is a good choice)
>
> If you wish to resume applying changes after a shutdown, reboot, or crash,
> repeat the above process but substitute your stored update sequence in the
> ?since= parameter.
>
> There are many things that use the changes feed in this way. Within couchdb,
> there's database replication (obviously) but also couchdb views. Outside of
> the core, software like pouchdb and couchdb-lucene use the changes feed to
> replicate data or update search indexes.
>
> I hope this was useful, and I think it might expose some problems in your
> couchdb-to-sqlite synchronisation protocol. Your email is obviously silent on
> many details there, but if you've predicated its design on the total ordering
> properties of couchdb < 2.0, you likely have some work to do.
>
> B.
>
>
>> On 3 Sep 2016, at 00:04, Robert Payne <[email protected]> wrote:
>>
>> Hey Everyone,
>>
>> Reading up on the CouchDB 2.0 migration guides and getting a bit antsy
>> around the mentions of out of order changes feed and sorts. Is it possible
>> for the changes feed to send older changes before newer changes for the same
>> document ID across multiple calls?
>>
>> Assuming start at ?since=“” and always pass in the “last_seq” on every
>> additional call could a situation like this occur in a single or multiple
>> HTTP calls:
>>
>> - Changes feed emits Doc A change (seq: 2-XXXX)
>> - Changes feed emits Doc B change (seq: 3-XXXX)
>> - Changes feed emits Doc A change (seq: 1-XXXX)
>>
>> I’m really hoping the case is just that across different doc ids changes can
>> be out of order. Our use case on mobile is a bit particular as we duplicate
>> edits into a separate SQLite table and use the changes feed to keep the
>> local database up to date with winning revs from the server, it just
>> increases the performance of sync by a ton since there is only 1 check and
>> set in SQLite per change that comes in.
>>
>> Cheers,
>> Robert
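
For anyone reading this thread later, the three-step algorithm in the quoted reply (fetch from a stored 'since', apply each row idempotently, checkpoint the update seq) could be sketched roughly as below. This is only an illustration under assumptions, not anyone's actual implementation: `Target`, `process_changes` and `fetch` are hypothetical names, the in-memory `Target` stands in for the SQLite side, and `fetch` stands in for a GET on /_changes?since=... that returns the parsed JSON body with its `results` and `last_seq` fields. Sequence values are treated as opaque and never compared.

```python
from typing import Callable, Dict, Set, Tuple


class Target:
    """Hypothetical in-memory stand-in for the local store (e.g. a SQLite table)."""

    def __init__(self) -> None:
        self.applied: Set[Tuple[str, str]] = set()  # (_id, _rev) pairs already applied
        self.current_rev: Dict[str, str] = {}       # latest rev recorded per document
        self.checkpoint: str = "0"                  # stored update seq, opaque

    def apply(self, row: dict) -> bool:
        """Apply one _changes row; applying the same row again is a no-op."""
        changed = False
        for change in row["changes"]:
            key = (row["id"], change["rev"])
            if key in self.applied:
                continue  # duplicate delivery: already have this _id + _rev
            self.applied.add(key)
            self.current_rev[row["id"]] = change["rev"]
            changed = True
        return changed


def process_changes(fetch: Callable[[str], dict], target: Target,
                    checkpoint_every: int = 100) -> int:
    """Fetch changes since the stored checkpoint and apply each row idempotently."""
    response = fetch(target.checkpoint)        # step 1 (or resume from stored seq)
    processed = 0
    for row in response["results"]:
        target.apply(row)                      # step 2: idempotent apply
        processed += 1
        if processed % checkpoint_every == 0:
            target.checkpoint = row["seq"]     # step 3: periodic checkpoint
    target.checkpoint = response["last_seq"]   # all rows handled, safe to advance
    return processed
```

Because rows are keyed on _id + _rev and no ordering between rows is assumed, replaying a response (for example after a crash before the checkpoint was saved) leaves the target unchanged. A real store would need a bounded way to remember applied revisions; the growing set here is just the simplest thing that shows the idea.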
