Thanks for this,

Sorry it’s a bit light on details; it’s a very specific use case on iOS where 
CPU/disk speed is constrained and we have a large data set. The full 
replication protocol just requires too many reads/writes to be performant, and 
we’ve optimised it in various ways. We’re idempotent so long as per-document 
changes arrive in order, which is what I was checking.

I appreciate the more technical analysis and it certainly clears up what I was 
asking.

Cheers,

Robert

> On 4/09/2016, at 4:41 AM, Robert Samuel Newson <[email protected]> wrote:
> 
> Hi,
> 
> It is important to understand that the order of rows in the _changes response 
> is not important. In couchdb before 2.0 the response was totally ordered, but 
> this was never necessary for correctness. The essential contract for _changes 
> is that you are guaranteed to see all changes made since the 'since' 
> parameter you pass. The order of those changes is not guaranteed and it is 
> also not guaranteed that changes from _before_ that 'since' value are _not_ 
> also returned. The consequence of this contract is that all consumers of the 
> _changes response must apply each row idempotently. This is true for the 
> replicator, of course.
> 
> The changes response in 2.0 is partially ordered. The changes from any given 
> shard will be in a consistent order, but we merge the changes from each shard 
> range of your database as they are collected from the various contributing 
> nodes; we don't apply a total ordering over that. The reason is simple: it's 
> expensive and unnecessary. It's important to also remember that replication, 
> even before 2.0, would not replicate in strict source update order either, 
> due to (valuable) parallelism when reading changes and applying them.
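
The partial ordering described above can be illustrated with a small sketch. It assumes (the message does not state this) that every document id belongs to exactly one shard stream, and it uses made-up rows and hypothetical helper names (`interleavings`, `apply_all`). Under that assumption, every merge that keeps each shard's own order leaves a consumer that applies rows per document in the same final state.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def apply_all(rows):
    """Apply (doc_id, rev) rows in arrival order; the last row per id wins."""
    state = {}
    for doc_id, rev in rows:
        state[doc_id] = rev
    return state


# Made-up rows: each shard is internally ordered, the merge between them is not.
shard_1 = [("A", "1-x"), ("A", "2-y")]
shard_2 = [("B", "1-z")]

final_states = {
    tuple(sorted(apply_all(merged).items()))
    for merged in interleavings(shard_1, shard_2)
}
```

Here `final_states` holds a single entry, because the only thing that varies between merges is where the row for "B" lands relative to the rows for "A".
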
> 
> Your question: "Is it possible for the changes feed to send older changes 
> before newer changes for the same document ID across multiple calls?" 
> requires a little further background knowledge before answering.
> 
> While we call it a changes "feed" it's important to remember what it really 
> is, internally, first. Every database in couchdb, prior to 2.0, is a single 
> file with multiple b+trees recorded inside it that are kept in absolute sync 
> with each other. One b+tree allows you to look up a document by the _id 
> parameter. The other b+tree allows you to look up a document by its update 
> order. It is essential to note that these two b+trees have the same number of 
> key/value pairs in them at all times.
> 
> To illustrate this more clearly, consider an empty database. We add one 
> document to it. It is retrievable by its _id and is also visible in the 
> _changes response as change number 1. Now, we update that document. It is now 
> change number 2. Change number 1 will never again appear in the _changes 
> response. That is, every document appears in the _changes response at its 
> most recent update number.
> 
> When you call _changes without the continuous parameter, couchdb is simply 
> traversing that second b+tree and returning each row it finds. It may do this 
> from the beginning (which was 1 before our update and 2 after) or it may do 
> so from some update seq you supply with the 'since' parameter.
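
A toy model of the two indexes may make this easier to see. It is only a sketch of the pre-2.0 behaviour described above: plain dictionaries stand in for the b+trees, integers stand in for update sequences, and the class and method names (`TinyDb`, `save`, `changes`) are hypothetical.

```python
class TinyDb:
    """Toy stand-in for a single-file database with two synchronised indexes."""

    def __init__(self):
        self.by_id = {}    # _id -> current update seq
        self.by_seq = {}   # update seq -> _id
        self.update_seq = 0

    def save(self, doc_id):
        """Create or update a document, moving it to a new seq position."""
        self.update_seq += 1
        old_seq = self.by_id.get(doc_id)
        if old_seq is not None:
            # The document moves; no record of its old position is kept.
            del self.by_seq[old_seq]
        self.by_id[doc_id] = self.update_seq
        self.by_seq[self.update_seq] = doc_id

    def changes(self, since=0):
        """Traverse the by-seq index, returning rows after 'since'."""
        return [(seq, doc_id)
                for seq, doc_id in sorted(self.by_seq.items())
                if seq > since]


db = TinyDb()
db.save("A")   # changes() now lists A at seq 1
db.save("A")   # A moves to seq 2; seq 1 no longer appears
db.save("B")   # B appears at seq 3
```

Both dictionaries always hold the same number of entries, which mirrors the point that each document appears exactly once, at its most recent update number.
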
> 
> With that now understood, we can look at what changes when we do 
> continuous=true which is what makes it a "feed" (that is, a potentially 
> unending response of changes as they are made). This is sent in two phases. 
> The first is exactly as the previous paragraph. Once all those changes have 
> been sent, couchdb enters a loop where it returns updates as they happen (or 
> shortly after).
> 
> It is only in a continuous=true response in couchdb before 2.0 that you would 
> ever see more than one change for any given document.
> 
> So, to cut a long story short (too late), the answer to your question is 
> "no". The changes feed is not a permanent history of all changes made to all 
> documents. Once a document is updated, it is _moved_ to a newer position and 
> no longer appears in its old one (and no record of that position is even 
> preserved). Do note, though, that couchdb might return 'Doc A change (seq: 
> 2-XXXX)' even if your 'since' parameter is _after_ the last change to doc A. 
> We won't return 'Doc A change (seq: 1-XXXX)' at all after it's updated to 
> 2-XXXX.
> 
> The algorithm for correctly processing the changes response is as follows, 
> and any variation on this is likely broken:
> 
> 1) call /_changes?since=0
> 2) for each returned row, ensure the target has the change in question 
> (either use _id + _rev to prevent duplicate application of the change or 
> apply the change in a way that is idempotent)
> 3) periodically store the update seq of the last processed row to stable 
> storage (a _local document is a good choice)
> 
> If you wish to resume applying changes after a shutdown, reboot, or crash, 
> repeat the above process but substitute your stored update sequence in the 
> ?since= parameter.
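
The steps above can be sketched as follows. This is a minimal sketch, not the replicator's implementation. It assumes the rows have already been fetched from /_changes?since=<stored seq> and have the usual row shape (`seq`, `id`, `changes[0].rev`). It also assumes a SQLite target rather than a _local document for the checkpoint. The table names and the functions `open_target`, `load_checkpoint` and `apply_rows` are hypothetical.

```python
import sqlite3


def open_target(path=":memory:"):
    """Open the target store: one table for documents, one for the checkpoint."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS docs "
               "(id TEXT PRIMARY KEY, rev TEXT NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS checkpoint "
               "(k INTEGER PRIMARY KEY CHECK (k = 1), seq TEXT NOT NULL)")
    return db


def load_checkpoint(db):
    """Return the stored update seq, or "0" when starting from scratch."""
    row = db.execute("SELECT seq FROM checkpoint WHERE k = 1").fetchone()
    return row[0] if row else "0"


def apply_rows(db, rows, checkpoint_every=2):
    """Apply change rows idempotently and checkpoint periodically.

    Returns the number of rows that actually changed the target.
    """
    applied = 0
    for n, row in enumerate(rows, start=1):
        doc_id, rev = row["id"], row["changes"][0]["rev"]
        have = db.execute("SELECT rev FROM docs WHERE id = ?",
                          (doc_id,)).fetchone()
        # Step 2: _id + _rev prevents duplicate application of a change.
        if have is None or have[0] != rev:
            db.execute("INSERT OR REPLACE INTO docs (id, rev) VALUES (?, ?)",
                       (doc_id, rev))
            applied += 1
        # Step 3: store the seq of the last processed row in stable storage.
        if n % checkpoint_every == 0:
            db.execute("INSERT OR REPLACE INTO checkpoint (k, seq) "
                       "VALUES (1, ?)", (str(row["seq"]),))
            db.commit()  # document writes and checkpoint commit together
    return applied
```

Committing the checkpoint in the same transaction as the document writes means a crash can only replay rows that were already applied, which the _id + _rev check turns into a no-op.
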
> 
> There are many things that use the changes feed in this way. Within couchdb, 
> there's database replication (obviously) but also couchdb views. Outside of 
> the core, software like pouchdb and couchdb-lucene use the changes feed to 
> replicate data or update search indexes.
> 
> I hope this was useful, and I think it might expose some problems in your 
> couchdb-to-sqlite synchronisation protocol. Your email is obviously silent on 
> many details there, but if you've predicated its design on the total ordering 
> properties of couchdb < 2.0, you likely have some work to do.
> 
> B.
> 
> 
>> On 3 Sep 2016, at 00:04, Robert Payne <[email protected]> wrote:
>> 
>> Hey Everyone,
>> 
>> Reading up on the CouchDB 2.0 migration guides and getting a bit antsy 
>> around the mentions of out of order changes feed and sorts. Is it possible 
>> for the changes feed to send older changes before newer changes for the same 
>> document ID across multiple calls?
>> 
>> Assuming we start at ?since=“” and always pass in the “last_seq” on every 
>> additional call, could a situation like this occur in a single or multiple 
>> HTTP calls:
>> 
>> — Changes feed emits Doc A change (seq: 2-XXXX)
>> — Changes feed emits Doc B change (seq: 3-XXXX)
>> — Changes feed emits Doc A change (seq: 1-XXXX)
>> 
>> I’m really hoping the case is just that across different doc ids changes can 
>> be out of order. Our use case on mobile is a bit particular as we duplicate 
>> edits into a separate SQLite table and use the changes feed to keep the 
>> local database up to date with winning revs from the server; it just 
>> increases the performance of sync by a ton since there is only 1 check and 
>> set in SQLite per change that comes in.
>> 
>> Cheers,
>> Robert
> 
