Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-05-10 Thread Jan Lehnardt
Hey all, I like the discussion here and the suggestion too; being a lot more predictable about timeouts will be great, if only from an error-message perspective, where currently we need a séance to find out what really caused a fabric_timeout, and/or spurious rpc errors that are tough to figure

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-26 Thread Adam Kocoloski
Hi Joan, great topic. We don’t have enough realistic benchmarking data to be really specific yet, but my expectation is that the aggregate size of the underlying KV pairs is at least as important as the number of documents in the batch. I have no doubt we’ll be able to ingest 1,000 1KB documents
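A minimal sketch of what measuring that aggregate size could look like for a _bulk_docs batch, assuming jiffy for encoding (batch_size/1 is an illustrative name, not a CouchDB function):

    %% Sum the encoded byte size of every document in the batch; the
    %% expectation above is that this number predicts ingest cost at
    %% least as well as length(Docs).
    batch_size(Docs) ->
        lists:sum([iolist_size(jiffy:encode(Doc)) || Doc <- Docs]).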

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-26 Thread Joan Touzet
Hi Adam, I'll bring up a concern from a recent client with whom I engaged. They're on 1.x. On 1.x they have been doing 50k bulk update operations in a single request. 1.x doesn't time out. The updates are such that they guarantee that none will result in a conflict or be rejected, so all 50k

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-26 Thread Adam Kocoloski
Hi all, The point I’m driving at is that we should take advantage of this extra bit of information that we acquire out-of-band (e.g., we just decide as a project that all operations take less than 5 seconds) and come up with smarter / cheaper / faster ways of doing load shedding based on that
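A minimal sketch of the cheap shedding such a project-wide bound enables (the names and the millisecond plumbing here are assumptions, not CouchDB code):

    %% If every operation is known to finish within MaxMs, any request
    %% that has already waited longer than MaxMs can be dropped outright:
    %% its client has necessarily given up by now.
    maybe_shed(EnqueuedAt, MaxMs) ->
        Age = erlang:monotonic_time(millisecond) - EnqueuedAt,
        case Age > MaxMs of
            true  -> {error, shed};
            false -> proceed
        end.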

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-23 Thread Nick Vatamaniuc
We don't spawn (/link) or monitor remote processes, just monitor the local coordinator process. That should be cheaper performance-wise. It's also for relatively long-running streaming fabric requests (changes, all_docs). But you're right, perhaps doing these for shorter requests (doc updates, doc
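A minimal sketch of that local-monitor approach (illustrative names, not the actual CouchDB implementation):

    %% A streaming worker monitors the node-local coordinator and stops
    %% producing results as soon as the coordinator goes away. Monitoring
    %% a local pid avoids any remote spawn/link/monitor traffic.
    start(Coord, ProduceFun) ->
        Ref = erlang:monitor(process, Coord),
        stream_loop(Coord, Ref, ProduceFun).

    stream_loop(Coord, Ref, ProduceFun) ->
        receive
            {'DOWN', Ref, process, Coord, _Reason} ->
                exit(normal)
        after 0 ->
            ProduceFun(),
            stream_loop(Coord, Ref, ProduceFun)
        end.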

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-22 Thread Robert Newson
My memory is fuzzy, but those items sound a lot like what happens with rex, which is what motivated us (i.e., Adam) to build rexi, which deliberately does less than the stock approach. -- Robert Samuel Newson rnew...@apache.org On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote: > Hi everyone, >

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-22 Thread Nick Vatamaniuc
Hi everyone, We partially implemented the first part (cleaning up rexi workers) for all the fabric streaming requests, which should be all_docs, changes, view map, and view reduce: https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d The pattern there is the following: -
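Heavily simplified, the shape of that pattern might be as follows (the linked commit has the real code; representing workers as {Node, Ref} pairs is an assumption here):

    %% An auxiliary process monitors the coordinator; if the coordinator
    %% dies mid-stream, the leftover rexi workers are killed rather than
    %% left running to completion on every node.
    spawn_cleanup(Coordinator, Workers) ->
        spawn(fun() ->
            Ref = erlang:monitor(process, Coordinator),
            receive
                {'DOWN', Ref, process, Coordinator, _Reason} ->
                    [rexi:kill(Node, WRef) || {Node, WRef} <- Workers]
            end
        end).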

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-18 Thread Robert Samuel Newson
My view is: a) the server was unavailable for this request due to all the other requests it’s currently dealing with; b) the connection was not idle, and the client is not at fault. B. > On 18 Apr 2019, at 22:03, Done Collectively wrote: > > Any reason 408 would be undesirable? > >

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-18 Thread Done Collectively
Any reason 408 would be undesirable? https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408 On Thu, Apr 18, 2019 at 10:37 AM Robert Newson wrote: > 503 imo. > > -- > Robert Samuel Newson > rnew...@apache.org > > On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote: > > Yes, we

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-18 Thread Robert Newson
503 imo. -- Robert Samuel Newson rnew...@apache.org On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote: > Yes, we should. Currently it’s a 500, maybe there’s something more > appropriate: > >

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-18 Thread Adam Kocoloski
Yes, we should. Currently it’s a 500, maybe there’s something more appropriate: https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949 Adam > On Apr 18, 2019, at 12:50 PM, Joan Touzet wrote: > > What happens when it turns out the
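A sketch of what a more specific mapping could look like in chttpd:error_info/1 (a hypothetical clause, not the actual change; the thread leans toward 503 semantics):

    %% Map an internal time-budget overrun to 503 Service Unavailable
    %% instead of a generic 500, so clients can tell overload apart from
    %% server bugs. One clause among the existing error_info/1 clauses.
    error_info(request_timeout) ->
        {503, <<"service_unavailable">>,
         <<"The server could not process this request in time">>};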

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-18 Thread Joan Touzet
What happens when it turns out the client *hasn't* timed out and we just...hang up on them? Should we consider at least trying to send back some sort of HTTP status code? -Joan On 2019-04-18 10:58, Garren Smith wrote: > I'm +1 on this. With partition queries, we added a few more timeouts that >

Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-18 Thread Garren Smith
I'm +1 on this. With partition queries, we added a few more timeouts that can be enabled, which Cloudant enables. So having the ability to shed old requests when these timeouts get hit would be great. Cheers Garren On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski wrote: > Hi all, > > For once, I’m

[DISCUSS] Improve load shedding by enforcing timeouts throughout stack

2019-04-15 Thread Adam Kocoloski
Hi all, For once, I’m coming to you with a topic that is not strictly about FoundationDB :) CouchDB offers a few config settings (some of them undocumented) to put a limit on how long the server is allowed to take to generate a response. The trouble with many of these timeouts is that, when
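A minimal sketch of the kind of budget those settings express, assuming CouchDB's config app ("fabric" / "request_timeout" are illustrative keys, not necessarily the documented ones):

    %% Run the whole response generation under a single deadline read
    %% from config; kill the work and report a timeout if it's exceeded.
    with_deadline(Fun) ->
        TimeoutMs = config:get_integer("fabric", "request_timeout", 60000),
        {Pid, Ref} = spawn_monitor(fun() -> exit({done, Fun()}) end),
        receive
            {'DOWN', Ref, process, Pid, {done, Result}} -> {ok, Result};
            {'DOWN', Ref, process, Pid, Reason} -> {error, Reason}
        after TimeoutMs ->
            exit(Pid, kill),
            erlang:demonitor(Ref, [flush]),
            {error, timeout}
        end.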