I don't actually know, but I've certainly never seen anyone mention it
before, so it wouldn't be very surprising if it isn't possible. I also
doubt many will have been in a position where they needed to change it,
regardless.
Essentially, I'd test whether it is actually needed, and if so you could
raise a JIRA to request making it configurable...maybe even provide a PR.

On Mon, 13 Jun 2022 at 16:40, Fredrik Hallenberg <[email protected]> wrote:
>
> Thanks for the detailed response. I have made some changes in my message
> queue handling and it does perform better.
> I am still not sure that increasing the listen backlog is necessary, but
> I think it would be good to have the option to do it without patching the
> C++ container implementation.
> Have I missed something that makes it possible to do this? Could I
> provide my own container that overrides some method to change only the
> listen setup?
>
> On Tue, Jun 7, 2022 at 9:59 AM Robbie Gemmell <[email protected]> wrote:
> >
> > There's a fair bit of detail there which wasn't present originally.
> >
> > So it sounds like you are indeed only using the single thread, by
> > scheduling a regular (how often is regular?) periodic task on it; since
> > that wasn't mentioned, it was unclear that was the case (usually, not
> > mentioning it means it isn't the case). That periodic task then
> > effectively schedules return tasks for (individual message?) responses
> > in the batch of queued requests, by passing tasks using the connection
> > work queue.
> >
> > That actually seems unnecessary if you are already running everything
> > on the connection/container thread anyway; the connection work queue is
> > typically used to pass work for a connection's [container] thread to do
> > from a different thread, rather than for passing work to itself. It's
> > not clear you really gain anything in this case from the work queue if
> > the same lone thread is doing all the work: mainly it is just doing the
> > same work at a different time (i.e. later) than it might otherwise have
> > done it, plus there is additional scheduling overhead and work needed
> > to service the extra tasks. Also, if there is a batched backlog of
> > incoming requests to respond to periodically, that could increase the
> > time the thread spends working through them in a single sequence and
> > then processing the response tasks, perhaps making it less available to
> > accept new connections while it handles that now-grouped work, at which
> > point the acceptor backlog might come more into play. I would have a
> > play with either changing/removing the use of the work queue for
> > responses if you are really using a single thread, or changing the way
> > the handling is done so another thread actually handles things and only
> > the send is passed back (maybe even a batch of sends in one task). Or
> > perhaps use multiple container threads so connections aren't all on the
> > same one, but again drop the work queue usage and just process the
> > response inline.
> >
> > All that said, if your 'processing X is quick' is true (how quick is
> > quick?) then I still wouldn't really expect to see delays anything like
> > the length you are talking about, unless something else odd is going
> > on, such as the prior talk of reconnection. Although if that was
> > previously occurring and you have significantly raised the backlog, it
> > should not really be in play anymore, which should be very simple to
> > identify from the timings of what is happening.
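As an illustration of that last suggestion, a minimal sketch of handling
the request on another thread and passing only the send back via the
connection's work queue might look like the following; the handler shape,
the process() function, and the single reply sender are assumptions made
for the example:

    #include <thread>

    #include <proton/connection.hpp>
    #include <proton/delivery.hpp>
    #include <proton/message.hpp>
    #include <proton/messaging_handler.hpp>
    #include <proton/sender.hpp>
    #include <proton/work_queue.hpp>

    // Hypothetical application logic: build a response for a request.
    static proton::message process(const proton::message& request) {
        proton::message response;
        response.correlation_id(request.correlation_id());
        response.body("ok");
        return response;
    }

    class server_handler : public proton::messaging_handler {
        proton::sender sender_;  // assumed lone reply sender

        void on_sender_open(proton::sender& s) override { sender_ = s; }

        void on_message(proton::delivery& d, proton::message& m) override {
            proton::work_queue& wq = d.connection().work_queue();
            proton::sender out = sender_;
            proton::message request = m;

            // Handle the request on another thread, then pass ONLY the
            // send back to the connection's own thread via its work
            // queue; the work queue must outlive the task, i.e. the
            // connection must still be open when the task runs.
            std::thread([&wq, out, request]() mutable {
                proton::message response = process(request);
                wq.add([out, response]() mutable { out.send(response); });
            }).detach();
        }
    };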
> > I'd suggest instrumenting your code to get a more precise handle on
> > where the time is going...you should be able to tell exactly how long
> > it takes for every connection to open from the client's perspective,
> > for a message to arrive at the server, for it to be acked, for the
> > request to be processed, for the request to arrive at the consumer,
> > etc. Related to that, also note you can run the client and/or server
> > ends with protocol tracing enabled (PN_TRACE_FRM=1) to visualise what
> > traffic is/isn't happening. If you have clients seeing 1 minute
> > delays, that might be something fairly visible just from watching as
> > it runs. E.g. run a bunch of clients without it as normal, and also
> > manually run some with tracing on and observe.
> >
> > Perhaps you can narrow down an issue in your code's handling, or
> > perhaps you can establish a case where you think there is actually an
> > issue in Proton, providing a minimal reproducer that can show it.
> >
> > On Mon, 6 Jun 2022 at 20:07, Fredrik Hallenberg <[email protected]> wrote:
> > >
> > > Maybe my wording was not correct; responses to clients are handled
> > > fine once a connection is established. The issue is only about the
> > > time before the connection is made and the initial client message
> > > shows up in the server handler. When this happens I will push the
> > > message to a queue and return immediately. The queue is handled by a
> > > fiber running at regular intervals; this is done using the qpid
> > > scheduler. Each message will get its own fiber, which will use the
> > > qpid work queue to send a reply when processing is done. This
> > > processing should happen quickly. I am pretty sure this system is
> > > safe; I have done a lot of testing on it. If you think it will cause
> > > delays in qpid I will try to improve it using threads etc. I have
> > > tried running the message queue consumer on a separate thread but,
> > > as I mentioned, I did not see any obvious improvement, so I opted to
> > > go for a single thread solution.
> > >
> > > On Mon, Jun 6, 2022 at 11:29 AM Robbie Gemmell <[email protected]> wrote:
> > > >
> > > > Personally, from the original mail, I think it's as likely the
> > > > issue lies in just how the messages are being handled and the
> > > > responses generated. If adding threads is not helping any, that
> > > > would only reinforce the view for me.
> > > >
> > > > Note is made that a single thread is being used, and that messages
> > > > are only queued by the thread and "handled elsewhere" "quickly",
> > > > but "responses" take a long time. What type of queuing is being
> > > > done? Can the queue block (which would stop the container doing
> > > > _anything_)? How exactly are the messages actually then being
> > > > handled and the responses generated? Unless that process is using
> > > > the appropriate mechanisms for passing work back to the connection
> > > > (/container, if single) thread, it would be both unsafe and may
> > > > very well result in delays, because no IO would actually happen
> > > > until something else entirely caused that connection to process
> > > > again later (e.g. heartbeat checking).
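A rough sketch of that kind of client-side instrumentation (the address
and handler are made up for the example) could time connection setup;
running it with PN_TRACE_FRM=1 set in the environment additionally dumps
the AMQP frames, making it visible where traffic stalls:

    #include <chrono>
    #include <iostream>

    #include <proton/connection.hpp>
    #include <proton/container.hpp>
    #include <proton/messaging_handler.hpp>

    // Log how long connection setup takes, from the client's side.
    class timed_client : public proton::messaging_handler {
        std::chrono::steady_clock::time_point start_;

        void on_container_start(proton::container& c) override {
            start_ = std::chrono::steady_clock::now();
            c.connect("amqp://localhost:5672");  // assumed address
        }

        void on_connection_open(proton::connection& conn) override {
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start_).count();
            std::cout << "connection open took " << ms << " ms" << std::endl;
            conn.close();
        }
    };

    int main() {
        timed_client handler;
        proton::container(handler).run();
    }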
> > > >
> > > > On Fri, 3 Jun 2022 at 18:04, Cliff Jansen <[email protected]> wrote:
> > > > >
> > > > > Adding threads should allow connection setup (socket creation,
> > > > > accept, and initial malloc of data structures) to run in
> > > > > parallel with connection processing (socket read/write, TLS
> > > > > overhead, AMQP encode/decode, your application's on_message
> > > > > callback).
> > > > >
> > > > > The epoll proactor scales better with additional threads than
> > > > > the libuv implementation. If you are seeing no benefit with
> > > > > extra threads, trying the libuv proactor is a worthwhile idea.
> > > > >
> > > > > On Fri, Jun 3, 2022 at 2:38 AM Fredrik Hallenberg <[email protected]> wrote:
> > > > > >
> > > > > > Yes, the fd limit is already raised a lot. Increasing the
> > > > > > backlog has improved performance and more file descriptors
> > > > > > are in use, but I still feel connection times are too long.
> > > > > > Is there anything else to tune in the proactor? Should I try
> > > > > > the libuv proactor instead of epoll?
> > > > > > I have tried multiple threads in the past but did not notice
> > > > > > any difference; perhaps it is worth trying again with the
> > > > > > current backlog setting?
> > > > > >
> > > > > > On Thu, Jun 2, 2022 at 5:11 PM Cliff Jansen <[email protected]> wrote:
> > > > > > >
> > > > > > > Please try raising your fd limit too. Perhaps doubling it
> > > > > > > or more.
> > > > > > >
> > > > > > > I would also try running your proton::container with more
> > > > > > > threads, say 4 and then 16, and see if that makes a
> > > > > > > difference. It shouldn't if your processing within Proton
> > > > > > > is as minimal as you describe. However, if there is lengthy
> > > > > > > lock contention as you pass work out of and then back into
> > > > > > > Proton, that may introduce delays.
> > > > > > >
> > > > > > > On Thu, Jun 2, 2022 at 7:43 AM Fredrik Hallenberg <[email protected]> wrote:
> > > > > > > >
> > > > > > > > I have done some experiments raising the backlog value,
> > > > > > > > and it is possibly a bit better; I have to test it more.
> > > > > > > > Even if it works, I would of course like to avoid having
> > > > > > > > to rely on a patched qpid. Also, maybe some internal
> > > > > > > > queues or similar should be modified to handle this?
> > > > > > > >
> > > > > > > > I have not seen transport errors in the clients, but this
> > > > > > > > may be because reconnection is enabled. I am unsure what
> > > > > > > > the reconnection feature actually does; I have never seen
> > > > > > > > an on_connection_open where connection.reconnection()
> > > > > > > > returns true.
> > > > > > > > Perhaps it is only useful when a connection is
> > > > > > > > established and then lost?
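Regarding running the container with more threads, a minimal sketch
(the handler here is a stand-in for the existing server handler):

    #include <proton/container.hpp>
    #include <proton/messaging_handler.hpp>

    class server_handler : public proton::messaging_handler {
        // on_message() etc. as in the existing server ...
    };

    int main() {
        server_handler handler;
        proton::container container(handler);
        container.run(4);  // e.g. 4 proactor threads; try 4, then 16
    }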
> > > > > > > >
> > > > > > > > On Thu, Jun 2, 2022 at 1:44 PM Ted Ross <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jun 2, 2022 at 9:06 AM Fredrik Hallenberg <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, my application tends to get a lot of short-lived
> > > > > > > > > > incoming connections.
> > > > > > > > > > Messages are very short sync messages that can
> > > > > > > > > > usually be responded to with very little processing
> > > > > > > > > > on the server side. It works fine, but I feel that
> > > > > > > > > > the performance is a bit lacking when many
> > > > > > > > > > connections happen at the same time, and I would like
> > > > > > > > > > advice on how to improve it. I am using qpid proton
> > > > > > > > > > c++ 0.37 with the epoll proactor.
> > > > > > > > > > My current design uses a single thread for the
> > > > > > > > > > listener, but it will immediately push incoming
> > > > > > > > > > messages in on_message to a queue that is handled
> > > > > > > > > > elsewhere. I can see that clients have to wait a long
> > > > > > > > > > time (up to a minute) until they get a response, but
> > > > > > > > > > I don't believe there is an issue on my end, as I
> > > > > > > > > > will quickly deal with any client messages as soon as
> > > > > > > > > > they show up. Rather, the issue seems to be that
> > > > > > > > > > messages are not pushed into the queue quickly
> > > > > > > > > > enough.
> > > > > > > > > > I have noticed that pn_proactor_listen is hardcoded
> > > > > > > > > > to use a backlog of 16 in the default container
> > > > > > > > > > implementation. This seems low, but I am not sure if
> > > > > > > > > > it is correct to change it.
> > > > > > > > > > Any advice appreciated. My goal is that a client
> > > > > > > > > > should never need to wait more than a few seconds for
> > > > > > > > > > a response, even under reasonably high load, maybe a
> > > > > > > > > > few hundred connections per second.
> > > > > > > > >
> > > > > > > > > I would try increasing the backlog. 16 seems low to me
> > > > > > > > > as well. Do you know if any of your clients are
> > > > > > > > > re-trying the connection setup because they overran the
> > > > > > > > > server's backlog?
> > > > > > > > >
> > > > > > > > > -Ted
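Until a configurable backlog exists in the C++ container, one way to
experiment is the underlying C proactor API (callable from C++), where
the backlog is an explicit argument rather than the hardcoded 16; a
minimal sketch, with the event loop elided and host/port as examples:

    #include <proton/listener.h>
    #include <proton/proactor.h>

    int main() {
        pn_proactor_t* proactor = pn_proactor();
        pn_listener_t* listener = pn_listener();
        char addr[PN_MAX_ADDR];
        pn_proactor_addr(addr, sizeof(addr), "0.0.0.0", "5672");
        pn_proactor_listen(proactor, listener, addr, 1024);  // backlog 1024
        // ... drive the usual pn_proactor_wait()/pn_event_t loop ...
        pn_proactor_free(proactor);
        return 0;
    }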
