Bah, our “cue”, not our “queue” ;) Adam
> On Mar 7, 2019, at 7:35 AM, Adam Kocoloski <[email protected]> wrote:
>
> Hi Garren,
>
> In general we wouldn’t know ahead of time whether we can complete in five
> seconds. I believe the way it works is that we start a transaction, issue a
> bunch of reads, and after 5 seconds any additional reads will start to fail
> with something like “read version too old”. That’s our queue to start a new
> transaction. All the reads that completed successfully are fine, and the
> CouchDB API layer can certainly choose to start streaming as soon as the
> first read completes (~2ms after the beginning of the transaction).
>
> Agree with Bob that steering towards a larger number of short-lived
> operations is the way to go in general. But I also want to balance that with
> backwards-compatibility where it makes sense.
>
> Adam
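A minimal sketch of the retry pattern Adam describes, assuming the FoundationDB
Python bindings: read until the transaction ages out, then resume from the last
sequence seen under a fresh transaction. The "changes" subspace and its
seq -> doc-id layout are hypothetical stand-ins here, not CouchDB's actual schema.

    # Read until the transaction ages out (error 1007, transaction_too_old),
    # then resume from the last sequence seen with a new read version.
    import fdb

    fdb.api_version(600)
    db = fdb.open()
    changes = fdb.Subspace(('changes',))  # hypothetical: maps seq -> doc id

    def stream_changes(db, since=0):
        """Yield (seq, doc_id) pairs across as many transactions as needed."""
        last_seq = since
        while True:
            tr = db.create_transaction()
            try:
                begin = changes.pack((last_seq + 1,))
                for kv in tr.get_range(begin, changes.range().stop):
                    (seq,) = changes.unpack(kv.key)
                    yield seq, kv.value   # start streaming immediately
                    last_seq = seq        # remember the resume point
                return                    # reached the end of the index
            except fdb.FDBError as e:
                if e.code != 1007:        # anything but transaction_too_old
                    raise
                # fall through: the loop opens a new transaction with a
                # newer read version and resumes just past last_seq

Everything streamed before the error stands; a document updated between the two
read versions simply shows up again at its new sequence, which is the Option A
behaviour discussed further down the thread.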
>> On Mar 7, 2019, at 7:22 AM, Garren Smith <[email protected]> wrote:
>>
>> I agree that option A seems the most sensible. I just want to understand
>> this comment:
>>
>>>> A _changes request that cannot be satisfied within the 5 second limit
>> will be implemented as multiple FoundationDB transactions under the covers
>>
>> How will we know if a change request cannot be completed in 5 seconds? Can
>> we tell that beforehand? Or would we try to complete a change request, have
>> the transaction fail after 5 seconds, and then do multiple transactions to
>> get the full changes? If that is the case the response from CouchDB to the
>> user will be really slow, as they have already waited 5 seconds and have
>> still not received anything. Or, if we start streaming a result back to the
>> user in the first transaction (is this even possible?), then we would
>> somehow need to know how to continue the changes feed after the transaction
>> has failed.
>>
>> Then Bob, from your comment:
>>
>>>> Forcing clients to do short (<5s) requests feels like a general good, as
>> long as meaningful things can be done in that time-frame, which I strongly
>> believe from what we've said elsewhere that they can.
>>
>> That makes sense, but how would we do that? How do you help a user to make
>> sure their request is under 5 seconds?
>>
>> Cheers
>> Garren
>>
>> On Thu, Mar 7, 2019 at 11:15 AM Robert Newson <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Given that option A is the behaviour of feed=continuous today (barring the
>>> initial whole-snapshot phase to catch up to "now") I think that's the
>>> right move. I confess to not reading your option B too deeply but I was
>>> there on IRC when the first spark was lit. We can build some sort of
>>> temporary multi-index on FDB today, that's clear, but it's equally clear
>>> that we should avoid doing so if at all possible.
>>>
>>> Perhaps the future Redwood storage engine for FDB will, as you say,
>>> significantly improve on this, but, even if it does, I'm not 100%
>>> convinced we should expose it. Forcing clients to do short (<5s) requests
>>> feels like a general good, as long as meaningful things can be done in
>>> that time-frame, which I strongly believe from what we've said elsewhere
>>> that they can.
>>>
>>> CouchDB's API, as we both know from rich (heh, and sometimes poor)
>>> experience in production, has a lot of endpoints of wildly varying
>>> performance characteristics. It's right that we evolve away from that
>>> where possible, and this seems a great candidate given that the replicator
>>> in ~all versions of CouchDB will handle the change without blinking.
>>>
>>> We have the same issue for _all_docs and _view and _find, in that the user
>>> might ask for more data back than can be sent within a single FDB
>>> transaction. I suggest that's a new thread, though.
>>>
>>> --
>>> Robert Samuel Newson
>>> [email protected]
>>>
>>> On Thu, 7 Mar 2019, at 01:24, Adam Kocoloski wrote:
>>>> Hi all, as the project devs are working through the design for the
>>>> _changes feed in FoundationDB we’ve come across a limitation that is
>>>> worth discussing with the broader user community. FoundationDB
>>>> currently imposes a 5 second limit on all transactions, and read
>>>> versions from old transactions are inaccessible after that window. This
>>>> means that, unlike a single CouchDB storage shard, it is not possible
>>>> to grab a long-lived snapshot of the entire database.
>>>>
>>>> In extant versions of CouchDB we rely on this long-lived snapshot
>>>> behavior for a number of operations, some of which are user-facing. For
>>>> example, it is possible to make a request to the _changes feed for a
>>>> database of arbitrary size and, if you’ve got the storage space and
>>>> time to spare, you can pull down a snapshot of the entire database in a
>>>> single request. That snapshot will contain exactly one entry for each
>>>> document in the database. In CouchDB 1.x the documents appear in the
>>>> order in which they were most recently updated. In CouchDB 2.x there is
>>>> no guaranteed ordering, although in practice the documents are roughly
>>>> ordered by most recent edit. Note that you really do have to complete
>>>> the operation in a single HTTP request; if you chunk up the requests or
>>>> have to retry because the connection was severed then the exactly-once
>>>> guarantees disappear.
>>>>
>>>> We have a couple of different options for how we can implement _changes
>>>> with FoundationDB as a backing store. I’ll describe them below and
>>>> discuss the tradeoffs.
>>>>
>>>> ## Option A: Single Version Index, long-running operations as multiple
>>>> transactions
>>>>
>>>> In this option the internal index has exactly one entry for each
>>>> document at all times. A _changes request that cannot be satisfied
>>>> within the 5 second limit will be implemented as multiple FoundationDB
>>>> transactions under the covers. These transactions will have different
>>>> read versions, and a document that gets updated in between those read
>>>> versions will show up *multiple times* in the response body. The entire
>>>> feed will be totally ordered, and later occurrences of a particular
>>>> document are guaranteed to represent more recent edits than the
>>>> earlier occurrences. In effect, it’s rather like the semantics of a
>>>> feed=continuous request today, but with much better ordering and zero
>>>> possibility of “rewinds” where large portions of the ID space get
>>>> replayed because of issues in the cluster.
>>>>
>>>> This option is very efficient internally and does not require any
>>>> background maintenance. A future enhancement in FoundationDB’s storage
>>>> engine is designed to enable longer-running read-only transactions, so
>>>> we will likely be able to improve the semantics with this option
>>>> over time.
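To see what Option A implies for consumers, here is a small self-contained
simulation in plain Python (nothing FoundationDB-specific); the dict standing
in for the internal index is an illustrative assumption:

    # Option A: the index holds exactly one entry per document, keyed by
    # seq. An update moves the document's entry to a new, higher seq.
    index = {1: "foo", 2: "bar", 3: "baz"}   # seq -> doc id

    def read_batch(index, since):
        """One 'transaction': scan everything currently above `since`."""
        return sorted((seq, doc) for seq, doc in index.items() if seq > since)

    batch1 = read_batch(index, since=0)   # [(1, 'foo'), (2, 'bar'), (3, 'baz')]

    # "bar" is updated mid-feed: its single entry moves to a new sequence.
    del index[2]
    index[4] = "bar"

    batch2 = read_batch(index, since=3)   # [(4, 'bar')]

    # "bar" appears twice in the combined feed, but the later occurrence
    # is guaranteed to reflect the more recent edit, and nothing rewinds.
    print(batch1 + batch2)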
>>>> ## Option B: Multi-Version Index
>>>>
>>>> In this design the internal index can contain multiple entries for a
>>>> given document. Each entry includes the sequence at which the document
>>>> edit was made, and may also include a sequence at which it was
>>>> overwritten by a more recent edit.
>>>>
>>>> The implementation of a _changes request would start by getting the
>>>> current version of the datastore (call this the read version), and then
>>>> as it examines entries in the index it would skip over any entries
>>>> where there’s a “tombstone” sequence less than the read version.
>>>> Crucially, if the request needs to be implemented across multiple
>>>> transactions, each transaction would use the same read version when
>>>> deciding whether to include entries in the index in the _changes
>>>> response. The readers would know to stop when and if they encounter an
>>>> entry where the created version is greater than the read version.
>>>> Perhaps a diagram helps to clarify; a simplified version of the
>>>> internal index might look like
>>>>
>>>> {"seq": 1, "id": "foo"}
>>>> {"seq": 2, "id": "bar", "tombstone": 5}
>>>> {"seq": 3, "id": "baz"}
>>>> {"seq": 4, "id": "bif", "tombstone": 6}
>>>> {"seq": 5, "id": "bar"}
>>>> {"seq": 6, "id": "bif"}
>>>>
>>>> A _changes request which happens to commence when the database is at
>>>> sequence 5 would return (ignoring the format of “seq” for simplicity)
>>>>
>>>> {"seq": 1, "id": "foo"}
>>>> {"seq": 3, "id": "baz"}
>>>> {"seq": 4, "id": "bif"}
>>>> {"seq": 5, "id": "bar"}
>>>>
>>>> i.e., the first instance of “bar” would be skipped over because a more
>>>> recent version exists within the time horizon, but the first instance
>>>> of “bif” would be included because “seq”: 6 is outside our horizon.
>>>>
>>>> The downside of this approach is that someone has to go in and clean up
>>>> tombstoned index entries eventually (or else provision lots and lots of
>>>> storage space). One way we could do this (inside CouchDB) would be to
>>>> have each _changes session record its read version somewhere, and then
>>>> have a background process go in and remove tombstoned entries where the
>>>> tombstone is less than the earliest read version of any active request.
>>>> It’s doable, but definitely more load on the server.
>>>>
>>>> Also, note that this approach does not guarantee that the older versions
>>>> of the documents referenced in those tombstoned entries are actually
>>>> accessible. Much like today, the changes feed would include a revision
>>>> identifier which, upon closer inspection, has been superseded by a more
>>>> recent version of the document. Unlike today, that older version would
>>>> be expunged from the database immediately if a descendant revision
>>>> exists.
>>>>
>>>> —
>>>>
>>>> OK, so those are the two basic options. I’d particularly like to hear
>>>> whether the behavior described in Option A would prove problematic for
>>>> certain use cases, as it’s the simpler and more efficient of the two
>>>> options. Thanks!
>>>>
>>>> Adam
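To make the tombstone filtering in Option B concrete, a minimal self-contained
sketch in plain Python that reproduces the four-row example result above. Note
that the worked example treats a tombstone equal to the read version as
superseded, hence the <= test below:

    # Option B read path: pin a read version, skip entries whose tombstone
    # falls at or before it, and stop at entries created after it.
    index = [
        {"seq": 1, "id": "foo"},
        {"seq": 2, "id": "bar", "tombstone": 5},
        {"seq": 3, "id": "baz"},
        {"seq": 4, "id": "bif", "tombstone": 6},
        {"seq": 5, "id": "bar"},
        {"seq": 6, "id": "bif"},
    ]

    def changes(index, read_version):
        for entry in index:                  # index is ordered by seq
            if entry["seq"] > read_version:
                break                        # created after our snapshot
            if entry.get("tombstone", float("inf")) <= read_version:
                continue                     # superseded within the horizon
            yield {"seq": entry["seq"], "id": entry["id"]}

    print(list(changes(index, read_version=5)))
    # [{'seq': 1, 'id': 'foo'}, {'seq': 3, 'id': 'baz'},
    #  {'seq': 4, 'id': 'bif'}, {'seq': 5, 'id': 'bar'}]

Because every transaction pins the same read_version, a request resumed across
transactions keeps filtering against the same horizon, which is what buys the
snapshot-like semantics at the cost of the cleanup work described above.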
