Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-05-01 Thread Ilya Khlopotov
> `update_seq` - is same as earlier. Not entirely sure on the intent there. I just recall another reason why I did this. It helps with etag generation. I am not going to add it to the spec for now. On 2020/04/23 21:15:05, Paul Davis wrote: > I'd agree that my initial reaction to cursor was that

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-05-01 Thread Ilya Khlopotov
> Maybe we should set a hard limit on the maximum doc ids size, 2-4KB? > We have a config setting to do it already. I am +100 for a stricter limit. We need to limit db names and doc ids. However, I do agree that it is a bigger change and requires more discussion. > Also was curious, in the latest p

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-30 Thread Nick Vatamaniuc
Hi Ilya, Maybe we should set a hard limit on the maximum doc ids size, 2-4KB? We have a config setting to do it already. And we also have a hard limit of 10KB for FDB keys. Due to a limitation in the Erlang HTTP header parser, used through mochiweb, 8KB is (was?) the limit based on the default socket r

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-30 Thread Adam Kocoloski
I think this is a good reason to fall back to just including the value of the bookmark in “first”, “next” and “last”, and then leaving it up to the client to decide whether to supply the bookmark in the URL or in the request body. Adam > On Apr 30, 2020, at 10:47 AM, Joan Touzet wrote: > > Ca

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-30 Thread Joan Touzet
Can we keep a distributed hash of doc keys server side with a smaller handle we hand to clients? If the cache can't be found or a restart happens, oh well. -Joan On 2020-04-30 10:23, Ilya Khlopotov wrote: There is a problem with representing `next`/`previous`/`first` as path. With 5kB sized

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-30 Thread Ilya Khlopotov
There is a problem with representing `next`/`previous`/`first` as a path. With 5kB-sized doc keys we could exceed the max URL length (8192 bytes). This means we would have to support POST. The question is how to handle the case when the URL is longer than 8192 bytes. The problem is CouchDB doesn't know
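A minimal client-side sketch of the URL-length problem described here, assuming the server hands back only the bookmark value and accepts it either as a query parameter or in a JSON request body. The 8192-byte limit comes from the message above; the function name `build_request`, the `bookmark` body field, and the example URL are hypothetical and not part of any agreed API.

```python
from urllib.parse import urlencode

# Max URL length quoted in the thread.
MAX_URL_LENGTH = 8192


def build_request(base_url, bookmark):
    """Return (method, url, json_body) for fetching the next page.

    Assumption: the endpoint accepts the bookmark either as a query
    parameter (GET) or as a JSON body field (POST). This is a sketch of
    the idea, not a confirmed CouchDB API.
    """
    url = f"{base_url}?{urlencode({'bookmark': bookmark})}"
    if len(url) <= MAX_URL_LENGTH:
        return ("GET", url, None)
    # Too long for a URL: fall back to sending the bookmark in the body.
    return ("POST", base_url, {"bookmark": bookmark})
```

A short bookmark stays in the query string, while one built from very large doc keys moves to the body, so the server never has to guess how long the external URL is.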

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-29 Thread Ilya Khlopotov
I think I addressed all comments and created an RFC: https://github.com/apache/couchdb-documentation/pull/530 On 2020/04/28 11:56:15, Ilya Khlopotov wrote: > Hello, > > I would like to introduce a second proposal. > > 1) Add a new optional query field called `bookmark` (or `token`) to the following

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-28 Thread Adam Kocoloski
Hi Ilya, Initial reaction — there’s a lot to like here. It seems like a pragmatic step forward for the current API that handles the corner case of large responses while maintaining compatibility for the large majority of API requests that don’t exceed this limit. I think the `limit` parameter

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-28 Thread Nick Vatamaniuc
The `{"href": "https://myserver.com/myddb/_all_docs?limit=50&descending=true"}` might be tricky if requests have to go through a few reverse proxies before reaching CouchDB. CouchDB might not know its own "external" domain so to speak. I have used X-Forwarded-For before for this exact pattern befor

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-28 Thread Paul Davis
Seems reasonable to me. I'd agree that setting query string parameters with a bookmark should be rejected. I was also going to suggest eliding the href member. In the examples I've seen those are usually structured as something like: "links": { "previous": "/path/and/qs=foo", "next": "/pat
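A small sketch of how a client could consume relative links of the kind suggested here, assuming the response carries a `links` object whose values are paths with query strings. Resolving them against the URL the client actually requested avoids the server needing to know its external host name. The helper name `resolve_links` and the example paths are hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(request_url, links):
    """Turn relative pagination links into absolute URLs.

    `request_url` is the URL the client used for the current page, so the
    scheme and host come from the client's own view of the server.
    """
    return {name: urljoin(request_url, path) for name, path in links.items()}
```

Because the host is taken from the request URL, the same response works unchanged behind any number of reverse proxies.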

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-28 Thread Ilya Khlopotov
Hello, I would like to introduce a second proposal. 1) Add a new optional query field called `bookmark` (or `token`) to the following endpoints - {db}/_all_docs - {db}/_all_docs/queries - _dbs_info - {db}/_design/{ddoc}/_view/{view} - {db}/_design/{ddoc}/_view/{view}/queries 2) Add the following

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-27 Thread Nick Vatamaniuc
Hi Ilya, Good idea. Let's continue the discussion about versioning there. On Mon, Apr 27, 2020 at 12:51 PM Ilya Khlopotov wrote: > > Hi Nick, > > Thank you for the extensive answer. > > > API versioning idea in principle sounds good, but can't think of a > > clean way to do it. /_v2/_all_dbs pattern

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-27 Thread Ilya Khlopotov
Hi Nick, Thank you for the extensive answer. > API versioning idea in principle sounds good, but can't think of a > clean way to do it. /_v2/_all_dbs pattern might work, See separate discussion here https://lists.apache.org/thread.html/rcc742c0fdca0363bb338b54526045720868597ea35ee6842aef174e0%40%3

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-27 Thread Nick Vatamaniuc
It's good to see more activity in the thread. I thought everyone had lost interest :-) Nice work, Ilya, on the prototype. I think you picked what I had initially called option D and E. With the exception that we don't force clients to specify a limit when a max limit is configured in the settings.

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-24 Thread Joan Touzet
Hi Ilya, one question: On 2020-04-23 18:27, Ilya Khlopotov wrote: Hello, I did an experiment and would like to share the results. So far I implemented only _all_dbs/cursor Here is how it works (I have only 2 databases) curl -u adm:pass "http://127.0.0.1:15984/_all_dbs/cursor?limit=1" | jq '.

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-24 Thread Ilya Khlopotov
> https://medium.com/@ignaciochiazzo/paginating-requests-in-apis-d4883d4c1c4c Very good article. My PoC experiment is in fact an implementation of cursor-based pagination. Even though the bookmark encodes all non-default values of mrargs, the algorithm only uses: - limit - doesn't change - start_
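To illustrate the idea of a bookmark that carries only the non-default query arguments, here is a rough sketch. It is not the PoC's actual encoding (that is Erlang, and its format is not shown in this thread); the `DEFAULTS` table, the field names, and the base64-over-JSON format are all assumptions made for illustration.

```python
import base64
import json

# Hypothetical defaults; the real mrargs record has more fields and
# different default values.
DEFAULTS = {"limit": 25, "descending": False, "start_key": None, "end_key": None}


def encode_bookmark(args):
    """Serialize only the arguments that differ from DEFAULTS."""
    non_default = {
        k: v for k, v in args.items() if k not in DEFAULTS or DEFAULTS[k] != v
    }
    raw = json.dumps(non_default, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_bookmark(token):
    """Rebuild the full argument set by layering the token over DEFAULTS."""
    stored = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return {**DEFAULTS, **stored}
```

Dropping default values keeps the token short, which matters once it has to fit into a URL.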

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-24 Thread Ilya Khlopotov
> On versioning, I've not seen a better article than this one: > https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/ I wouldn't propose a new endpoint if we had a strong story for API versioning. Currently we don't. BTW we could put these new endpoints into a new namespace for e

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-24 Thread Ilya Khlopotov
This is exactly what I did in my experiment. > I don't think a whole new API is required here Some existing endpoints return lists and not objects, which means we have no place to return a bookmark. On 2020/04/23 21:33:49, Glynn Bird wrote: > I don't think a whole new API is required here, but

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Ilya Khlopotov
Hello, I did an experiment and would like to share the results. So far I implemented only _all_dbs/cursor Here is how it works (I have only 2 databases) curl -u adm:pass "http://127.0.0.1:15984/_all_dbs/cursor?limit=1" | jq '.' { "items": [ "_users" ], "completed": false, "bookmark"
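A sketch of how a client might walk such a response page by page, assuming each page is an object with `items`, `completed`, and `bookmark` fields as in the output above. The fetch function is injected so the loop can be exercised without a server; `fake_fetch`, the bookmark value `"b1"`, and the second database name are made up for illustration.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[object]:
    """Yield every item, following bookmarks until `completed` is true."""
    bookmark = None
    while True:
        page = fetch_page(bookmark)
        yield from page["items"]
        # Assumption: a missing `completed` field means there is nothing more.
        if page.get("completed", True):
            return
        bookmark = page["bookmark"]


def fake_fetch(bookmark):
    # Stand-in for GET /_all_dbs/cursor?limit=1 (plus the bookmark, if any).
    pages = {
        None: {"items": ["_users"], "completed": False, "bookmark": "b1"},
        "b1": {"items": ["_replicator"], "completed": True},
    }
    return pages[bookmark]
```

In a real client, `fetch_page` would issue the HTTP request and return the parsed JSON body.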

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Glynn Bird
I don't think a whole new API is required here, but I would like to see some sort of "bookmark" facility for _all_docs and views, as pagination with the current API is awkward. I would imagine it working as follows: // first request curl $URL/mydb/_all_docs?startkey="aardvark"&endkey="moose"&limi

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Robert Samuel Newson
On versioning, I've not seen a better article than this one: https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/ For _changes, definitely agree we should be including it in this discussion, it is the only endpoint with, in theory, an eternal response, and I think that's a bug not a

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Paul Davis
I'd agree that my initial reaction to cursor was that it's not a great fit, but that does seem to be the existing name used in the greater REST world for this sort of pagination so I'm not concerned about using that terminology. I'm generally on board with allowing and setting some default sane li

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Robert Samuel Newson
I think it's a key difference from "cursor" as I've seen them elsewhere that ours will point at an ever-changing database; you couldn't seamlessly cursor through a large data set, one "page" at a time. Bookmarks began in search (raises guilty hand) in order to address a Lucene-specific issue

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Joan Touzet
I realise this is bikeshedding, but I guess that's kind of the point... Everything below is my opinion, not "fact." It's unfortunate we need a new endpoint for all of this. In a vacuum I might have just suggested we use the semantics we already have, perhaps with ?from= instead of ?since= .

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Robert Newson
cursor has an established meaning in other databases and ours would not be very close to them. I don’t think it’s a good idea. B. > On 23 Apr 2020, at 11:50, Ilya Khlopotov wrote: > > >> >> The best I could come up with is replacing page with >> cursor - {db}/_all_docs/cursor or possibly {db

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Ilya Khlopotov
> The best I could come up with is replacing page with > cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs Good idea, I like {db}/_all_docs/cursor (or {db}/_all_docs/_cursor). On 2020/04/23 08:54:36, Garren Smith wrote: > I agree with Bob that page doesn't make sense as an endpoi

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Ilya Khlopotov
> All the fields in the bookmark make sense except timestamp. Why would it > matter if the timestamp is old? What happens if a node's time is an hour > behind another node? Bookmarks are not permalinks. The exact behavior would depend on the type of pagination. # offset based pagination There w

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-23 Thread Garren Smith
I agree with Bob that page doesn't make sense as an endpoint. I'm also rubbish with naming. The best I could come up with is replacing page with cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs All the fields in the bookmark make sense except timestamp. Why would it matter if the t

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-22 Thread Ilya Khlopotov
- page is to provide some notion of progress for the user - timestamp - I was thinking that we should drop requests if the user tries to pass a bookmark created an hour ago. On 2020/04/22 21:58:40, Robert Samuel Newson wrote: > "page" and "page number" are odd to me as these don't exist as concepts,
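The timestamp check mentioned here could look roughly like the following, assuming the bookmark carries its creation time as Unix seconds and that one hour is the cutoff. The function name and the `max_age` parameter are hypothetical. Note that the comparison uses the clock of whichever node handles the request, so clock skew between nodes shifts the effective cutoff.

```python
import time

# One hour, as floated in the message above.
MAX_BOOKMARK_AGE_SECONDS = 3600


def is_expired(bookmark_ts, now=None, max_age=MAX_BOOKMARK_AGE_SECONDS):
    """Return True if a bookmark created at `bookmark_ts` is too old to honor."""
    current = time.time() if now is None else now
    return current - bookmark_ts > max_age
```

A server using this check would reject an expired bookmark with an error rather than silently restarting from the first page.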

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-22 Thread Robert Samuel Newson
"page" and "page number" are odd to me as these don't exist as concepts, I'd rather not invent them. I note there's no mention of page size, which makes "page number" very vague. What is "timestamp" in the bookmark and what effect does it have when the bookmark is passed back in? I guess, why

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-22 Thread Ilya Khlopotov
Hello everyone, Based on the discussions on the thread I would like to propose a number of first steps: 1) introduce new endpoints - {db}/_all_docs/page - {db}/_all_docs/queries/page - _all_dbs/page - _dbs_info/page - {db}/_design/{ddoc}/_view/{view}/page - {db}/_design/{ddoc}/_view/{

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-10 Thread Nick Vatamaniuc
ing to matching headers > https://tools.ietf.org/html/rfc2616#section-10.4.13 which I don't think > apply in this case. > > Rich > > > > From: Nick Vatamaniuc > To: dev@couchdb.apache.org > Date: 09/04/2020 00:25 > Subject:[EXTERNAL] Re: [DISCUSS

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-09 Thread Gordon Baird
I would like to chime in on this (apologies if it is out of sequence or if not right context, but would like to raise the concept). One of the issues we face for our enterprise deployments in considering CouchDB is our ability to log or serialize the edits and changes to the documents (a k

RE: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-09 Thread Richard Ellis
13 which I don't think apply in this case. Rich From: Nick Vatamaniuc To: dev@couchdb.apache.org Date: 09/04/2020 00:25 Subject:[EXTERNAL] Re: [DISCUSS] Streaming API in CouchDB 4.0 Thanks for replying, Adam! Thinking about it some more, it seems there are two benefits

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-08 Thread Nick Vatamaniuc
Thanks for replying, Adam! Thinking about it some more, it seems there are two benefits to changing the streaming APIs: 1) To provide users with a serializable snapshot. We don't currently have that, as Mike pointed out, unless we use n=1&q=1 or CouchDB version 1.x. It would be nice to get tha

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-04-01 Thread Adam Kocoloski
This is a really important topic; thanks Nick for bringing it up. Sorry I didn’t comment earlier. I think Mike neatly captures my perspective with this bit: >> Our current behaviour seems extremely subtle and, I'd argue, unexpected. It >> is hard to reason about if you really need a particular

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-02-25 Thread Nick Vatamaniuc
Hi Mike, Good point about CouchDB not actually providing point-in-time snapshots. I missed those cases when thinking about it. I wonder if that points to defaulting to option A since it maintains the API compatibility and doesn't loosen the current constraints anyway. At least it will un-break t

Re: [DISCUSS] Streaming API in CouchDB 4.0

2020-02-24 Thread Mike Rhodes
Nick, Thanks for thinking this through, it's certainly subtle and very unclear what is a "good" solution :( I have a couple of thoughts, firstly about the guarantees we currently offer and then wondering whether there is an opportunity to improve our API by offering a single guarantee across a

[DISCUSS] Streaming API in CouchDB 4.0

2020-02-19 Thread Nick Vatamaniuc
Hello everyone, I'd like to discuss the shape and behavior of streaming APIs for CouchDB 4.x. By "streaming APIs" I mean APIs which stream data in rows as it gets read from the database. These are the endpoints I was thinking of: _all_docs, _all_dbs, _dbs_info and query results. I want to focus