libpq async command processing methods are difficult to use with edge-triggered epoll

Alex Robinson Wed, 02 Dec 2020 01:05:41 -0800

Hi,

I've been using libpq to access postgres from within a system that uses an
edge-triggered epoll in its event-loop. The instructions on
https://www.postgresql.org/docs/current/libpq-async.html are pretty good,
but I've run into a bit of an edge case that would be much easier to handle
if the interfaces exposed by libpq were a little more ergonomic. If it
makes a difference, I'm running with single-row mode enabled on the client.


Specifically, PQconsumeInput returns no information that allows the caller
to distinguish whether there might be more data available to be read on the
network socket if PQconsumeInput were to be called again (it just returns 0
on error and 1 otherwise), and PQisBusy doesn't return false until a full
row's worth of data has been read by PQconsumeInput. This is a bad
combination if a result set contains rows larger than PGconn's default
buffer size, since calling PQconsumeInput followed by PQisBusy can suggest
that we need to wait on the socket's readability even if there's already
more data available to be read on the socket.

If more detail helps, here's a slightly more detailed summary based on my
trawling through the code:

* The recommended pattern for processing responses to async commands is to
wait until the socket is readable, then call PQconsumeInput, then check
PQisBusy. If PQisBusy returns true, the docs suggest waiting on the socket
again.
* When PQconsumeInput is called, it doubles the PGconn's buffer size if
less than 8k of space is unused in it.
* PQconsumeInput will then read either until its remaining buffer space
fills up or until the socket has no more data in it ready to be read.
* If the buffer fills up, PQconsumeInput will return to the caller even if
there's still more data to be read
* PQconsumeInput's return value only indicates whether or not there was an
error, not whether any data was read.
* PQisBusy will return true unless the buffer contains an entire row; it
does not actually check the status of the socket.
* If the PGconn's buffer wasn't large enough to fit an entire row in it
when you called PQconsumeInput, PQisBusy will return true, suggesting that
you ought to wait on socket readability, when really the right thing to do
would be to call PQconsumeInput again (potentially multiple times) until
the buffer finally grows to be large enough to contain the whole row before
PQisBusy can return false.

This can be worked around by making a poll() syscall on the socket without
timeout 0 before handing the socket off to epoll, but libpq could make this
case easier to deal with with a slightly different interface.

The function that PQconsumeInput uses internally, pqReadData, has a much
more helpful interface -- it returns 1 if it successfully read at least 1
byte, 0 if no data was available, or -1 if there was an error. If
PQconsumeInput had a similar range of return values, this ability to
distinguish between whether we read data or not would be enough information
to know whether we ought to call PQconsumeInput again when PQisBusy returns
true.

As for what we can realistically do about it, I imagine it may be
disruptive to add an additional possible return value to PQconsumeInput
(e.g. return 1 if at least one byte was read or 2 if no data was
available). And I'm not sure edge-triggered polling is a use case the
community cares enough about to justify adding a separate method that does
basically the same thing as PQconsumeInput but with more information
conveyed by its return value.

So maybe there isn't any change worth making here, but I wanted to at least
mention the problem, since it was pretty disappointing to me that calling
PQconsumeInput followed by PQisBusy and then waiting on readability
occasionally caused a hang on large rows. I'd expect other developers using
edge-triggered epoll to run into problems like this too. If there's a
backwards-compatible way of making it less error prone, it'd be nice to do
so.

Alex

libpq async command processing methods are difficult to use with edge-triggered epoll

Reply via email to