On 2016-01-03 13:57:18 -0500, Tom Lane wrote:
> Done, we'll soon see what the buildfarm thinks.
Thanks.
I wonder if we ought to backport this further: e.g. walsender
continuously uses nonblocking sockets via pq_getbyte_if_available(). On
the other hand I can't immediately see a problem with that,
On 2016-01-04 10:20:43 -0500, Tom Lane wrote:
> I'm slightly worried about breaking 3rd-party code that might be using
> recv() and somehow expecting the current behavior. However, it's equally
> arguable that such code would have Windows-specific problems that would be
> fixed by the patch.
I
On Mon, Jan 4, 2016 at 4:20 PM, Tom Lane wrote:
> Andres Freund writes:
> > I wonder if we ought to backport this further: e.g. walsender
> > continuously uses nonblocking sockets via pq_getbyte_if_available(). On
> > the other hand I can't immediately see
Magnus Hagander writes:
> On Mon, Jan 4, 2016 at 4:20 PM, Tom Lane wrote:
>> I'm slightly worried about breaking 3rd-party code that might be using
>> recv() and somehow expecting the current behavior. However, it's equally
>> arguable that such code
Robert Haas writes:
> OK, well, if the consensus is in favor of a back-patch, so be it. It
> seems a little strange to me to back-patch a commit that doesn't fix
> anything, but I just work here.
Well, it's true that we can't point to specific field reports and say
that
Andres Freund writes:
> On 2016-01-04 10:35:12 -0500, Robert Haas wrote:
>> If we don't know of a specific problem that would be fixed by
>> back-patching this commit to pre-9.5 branches, and it seems like we
>> don't, then I don't really see much upside to back-patching it.
On Mon, Jan 4, 2016 at 10:22 AM, Magnus Hagander wrote:
> On Mon, Jan 4, 2016 at 4:20 PM, Tom Lane wrote:
>> Andres Freund writes:
>> > I wonder if we ought to backport this further: e.g. walsender
> > continuously uses nonblocking
On 2016-01-04 10:35:12 -0500, Robert Haas wrote:
> If we don't know of a specific problem that would be fixed by
> back-patching this commit to pre-9.5 branches, and it seems like we
> don't, then I don't really see much upside to back-patching it. I
> mean, yeah, we think that this is wrong
On Mon, Jan 4, 2016 at 10:49 AM, Tom Lane wrote:
> Andres Freund writes:
>> On 2016-01-04 10:35:12 -0500, Robert Haas wrote:
>>> If we don't know of a specific problem that would be fixed by
>>> back-patching this commit to pre-9.5 branches, and it seems
Andres Freund writes:
> I wonder if we ought to backport this further: e.g. walsender
> continuously uses nonblocking sockets via pq_getbyte_if_available(). On
> the other hand I can't immediately see a problem with that, besides
> differing messages on windows/the rest of the
On Mon, Jan 4, 2016 at 10:59 AM, Tom Lane wrote:
> Robert Haas writes:
>> OK, well, if the consensus is in favor of a back-patch, so be it. It
>> seems a little strange to me to back-patch a commit that doesn't fix
>> anything, but I just work here.
>
On 2016-01-03 10:03:41 +0530, Amit Kapila wrote:
> On Sun, Jan 3, 2016 at 3:01 AM, Andres Freund wrote:
> > Indeed it does use shutdown(). If I read the npgsql code that'll even be
> > done in the exception handling path. So fixing the 0 byte case might
> > already do the
Andres Freund writes:
> On January 3, 2016 6:23:20 PM GMT+01:00, Tom Lane wrote:
>> Agreed. Let's do it and ship this puppy.
> Unless somebody beats me to it, I'll push in the European morning.
Um. For something that at least potentially has
On January 3, 2016 7:04:29 PM GMT+01:00, Tom Lane wrote:
>Andres Freund writes:
>> On January 3, 2016 6:23:20 PM GMT+01:00, Tom Lane wrote:
>>> Agreed. Let's do it and ship this puppy.
>
>> Unless somebody beats me to it, I'll push
Andres Freund writes:
> On January 3, 2016 7:04:29 PM GMT+01:00, Tom Lane wrote:
>> Um. For something that at least potentially has portability issues
>> (we think not, but we could be wrong), it's pretty scary to push only
>> a couple of hours before the
On January 3, 2016 6:23:20 PM GMT+01:00, Tom Lane wrote:
>> I really think we have a host of buggy code around the event handling -
>> but most of it has been used for a long while. So I think fixing the 0
>> byte case for 9.5 is good enough.
>
>Agreed. Let's do it and
Andres Freund writes:
> On 2016-01-03 10:03:41 +0530, Amit Kapila wrote:
>> I think this true for a TCP socket, but this code-path is used for UDP
>> (SOCK_DGRAM) sockets as well and there is a comment below in
>> that function which seems to be indicating why originally 0
Hi Petr,
On 2016-01-02 09:17:02 +0100, Petr Jelinek wrote:
> so the commit which triggers this issue is
> 387da18874afa17156ee3af63766f17efb53c4b9 , not sure why yet (wanted to give
> heads up early since multiple people are looking at this). Note that the
> compilation around this commit is made
On 2016-01-02 14:26:47 +0100, Andres Freund wrote:
> On 2016-01-02 18:40:38 +0530, Amit Kapila wrote:
> > If we
> > remember the closed socket event and then take appropriate action,
> > then this problem won't happen. Attached patch which by no-means
> > a complete fix shows what I wanted to say
On 2016-01-02 12:05, Amit Kapila wrote:
On Sat, Jan 2, 2016 at 3:16 PM, Andres Freund wrote:
Hi Petr,
On 2016-01-02 09:17:02 +0100, Petr Jelinek wrote:
> so the commit which triggers this issue is
>
On 2016-01-02 18:40:38 +0530, Amit Kapila wrote:
> What I wanted to say is that the handling of socket closure is not
> same in WaitLatchOrSocket() and pgwin32_waitforsinglesocket()
> due to which this problem can arise and it seems that is the
> right line of direction to pursue. I have found
Hi,
so the commit which triggers this issue is
387da18874afa17156ee3af63766f17efb53c4b9 , not sure why yet (wanted to
give heads up early since multiple people are looking at this). Note
that the compilation around this commit is made harder by the fact that
commit
On Sat, Jan 2, 2016 at 3:16 PM, Andres Freund wrote:
> Hi Petr,
>
> On 2016-01-02 09:17:02 +0100, Petr Jelinek wrote:
> > so the commit which triggers this issue is
> > 387da18874afa17156ee3af63766f17efb53c4b9 , not sure why yet (wanted to
> give
> > heads up early since
On 2016-01-02 10:46, Andres Freund wrote:
Hi Petr,
On 2016-01-02 09:17:02 +0100, Petr Jelinek wrote:
so the commit which triggers this issue is
387da18874afa17156ee3af63766f17efb53c4b9 , not sure why yet (wanted to give
heads up early since multiple people are looking at this). Note that the
On Sat, Jan 2, 2016 at 5:02 PM, Petr Jelinek wrote:
> On 2016-01-02 12:05, Amit Kapila wrote:
>>
>> I am also able to reproduce now. The reason was that I didn't have
>> latest .Net framework and Visual Studio, which is a must for the recent
>> version of Npgsql.
>>
>> One
On 2016-01-02 16:20:58 +0100, Andres Freund wrote:
> I really right now can see only two somewhat surgical fixes:
>
> 1) We do a nonblocking or select() *after* registering our events. Both
> in WaitLatchOrSocket() and waitforsinglesocket. Since select/poll are
> explicitly level triggered,
Andres Freund writes:
> A bit of searching around brought up that we saw issues around this
> before:
> http://www.postgresql.org/message-id/4351.1336927...@sss.pgh.pa.us
Indeed. It doesn't look like any of the cleanup I suggested in that
thread has ever gotten done. I
On 2016-01-02 13:00:09 -0500, Tom Lane wrote:
> : More generally, it seems clear to me that Microsoft's code is designed
> : around the assumption that an event object remains attached to a socket
> : for the lifetime of the socket. This business of transiently associating
> : event objects with
On 2016-01-02 15:40:03 +0100, Andres Freund wrote:
> I wonder if the following is the problem: The docs for WSAEventSelect()
> says:
> "Having successfully recorded the occurrence of the network event (by
> setting the corresponding bit in the internal network event record) and
> signaled the
On January 2, 2016 6:28:10 PM GMT+01:00, Tom Lane wrote:
>Andres Freund writes:
>> A bit of searching around brought up that we saw issues around this
>> before:
>> http://www.postgresql.org/message-id/4351.1336927...@sss.pgh.pa.us
>
>Indeed. It doesn't
On 2016-01-02 15:40:03 +0100, Andres Freund wrote:
> If FD_CLOSE is indeed edge and not level triggered - which imo would be
> supremely insane - we'd be in trouble. It'd explain why some failures
> are noticed and others not.
I wonder if the FD_CLOSE and FD_WRITE being edge-triggered is the
Andres Freund writes:
> On January 2, 2016 6:28:10 PM GMT+01:00, Tom Lane wrote:
>> Indeed. It doesn't look like any of the cleanup I suggested in that
>> thread has ever gotten done. I suspect that we'll continue to see
>> problems until we get rid of
Andres Freund writes:
> I found a few more resources confirming that FD_CLOSE is edge
> triggered. Which probably doesn't just make our code buggy when waiting
> twice on the same socket, but probably also makes it very timing
> dependent: As the event is only triggered when
On 2016-01-02 22:31, Andres Freund wrote:
On 2016-01-02 22:25:31 +0100, Brar Piening wrote:
Andres Freund wrote:
That seems like a pretty straight forward bug. But it hinges on the
client side calling shutdown() on the socket. I don't know enough about
.net's internals to judge whether it does
On 2016-01-02 15:11:42 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I found a few more resources confirming that FD_CLOSE is edge
> > triggered. Which probably doesn't just make our code buggy when waiting
> > twice on the same socket, but probably also makes it very
Andres Freund wrote:
That seems like a pretty straight forward bug. But it hinges on the
client side calling shutdown() on the socket. I don't know enough about
.net's internals to judge whether it does so. I've traced things far
enough to find
"Disposing a Stream object flushes any buffered
On 2016-01-02 22:25:31 +0100, Brar Piening wrote:
> Andres Freund wrote:
> >That seems like a pretty straight forward bug. But it hinges on the
> >client side calling shutdown() on the socket. I don't know enough about
> >.net's internals to judge whether it does so. I've traced things far
>
On Sun, Jan 3, 2016 at 3:01 AM, Andres Freund wrote:
> On 2016-01-02 22:25:31 +0100, Brar Piening wrote:
> > Andres Freund wrote:
> > >That seems like a pretty straight forward bug. But it hinges on the
> > >client side calling shutdown() on the socket. I don't know enough
>
> On googling, it seems this is related to .Net framework compatibility. I am
> using .Net Framework 4 to build the program.cs and that is what I have
> on my m/c. Are you using the same for Npgsql or some different version?
>
That is probably the problem. Npgsql 3.0 is only available for .NET
On Wed, Dec 30, 2015 at 10:31 PM, Shay Rojansky wrote:
> OK, I finally found some time to dive into this.
>
> The backends seem to hang when the client closes a socket without first
> sending a Terminate message - some of the tests make this happen. I've
> confirmed this happens
On Fri, Jan 1, 2016 at 4:40 PM, Amit Kapila wrote:
> On Wed, Dec 30, 2015 at 10:31 PM, Shay Rojansky wrote:
>
>> OK, I finally found some time to dive into this.
>>
>> The backends seem to hang when the client closes a socket without first
>> sending a
Andres Freund writes:
> FWIW, the
> if (sock == PGINVALID_SOCKET)
> wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
> block in both latch implementations looks like a problem waiting to happen.
You think it should throw an error instead? Seems
On 2015-12-30 19:38:23 +0200, Shay Rojansky wrote:
> > Hm. So that seems to indicate that, on windows, we're not properly
> > recognizing dead sockets in the latch code. Could you check, IIRC with
> > netstat or something like it, in what state the connections are?
> netstat shows the socket is
Hi,
On 2015-12-30 19:01:10 +0200, Shay Rojansky wrote:
> OK, I finally found some time to dive into this.
>
> The backends seem to hang when the client closes a socket without first
> sending a Terminate message - some of the tests make this happen. I've
> confirmed this happens with 9.5rc1
Shay Rojansky writes:
> The backends seem to hang when the client closes a socket without first
> sending a Terminate message - some of the tests make this happen. I've
> confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
> but this does not occur on Ubuntu
On 2015-12-30 12:41:56 -0500, Tom Lane wrote:
> Andres Freund writes:
> > FWIW, the
> > if (sock == PGINVALID_SOCKET)
> > wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
> > block in both latch implementations looks like a problem waiting to happen.
Andres Freund writes:
> On 2015-12-30 12:30:43 -0500, Tom Lane wrote:
>> Nor OS X. Ugh. My first thought was that ac1d7945f broke this, but
>> that's only in HEAD not 9.5, so some earlier change must be responsible.
> The backtrace in
>
On 2015-12-30 12:50:58 -0500, Tom Lane wrote:
> Right, and what I was wondering was whether adding the additional wait-for
> condition had exposed some pre-existing flaw in the Windows latch code.
> But that's not it, so we're left with the conclusion that we broke
> something that used to work.
On 2015-12-30 20:12:07 +0200, Shay Rojansky wrote:
> >
> > Is this in a backend with ssl?
> >
>
> No.
There goes that theory. Amongst others. The aforementioned problem with
waitfor doesn't seem to be actually armed because waitfor is only used
if errno == EWOULDBLOCK || errno == EAGAIN.
> If
On 2015-12-30 20:18:52 +0200, Shay Rojansky wrote:
> Tom's probably right about the optimized code. I could try compiling a
> debug version..
Seems to be the next step, unfortunately. Sorry.
>
> > The backends seem to hang when the client closes a socket without first
> > sending a Terminate message - some of the tests make this happen. I've
> > confirmed this happens with 9.5rc1 running on Windows (versions 10 and
> 7),
> > but this does not occur on Ubuntu 15.10. The client runs on
>
> Hm. Is this with a self compiled postgres? If so, is it with assertions
> enabled?
>
No, it's just the EnterpriseDB 9.5rc1 installer...
Tom's probably right about the optimized code. I could try compiling a
debug version..
Andres Freund writes:
> On 2015-12-30 19:54:19 +0200, Shay Rojansky wrote:
>> wakeEvents is 8387808 and so is sock.
> Hm. That seems like an extremely weird value.
Probably just means the debugger is confused by optimized code.
> I think it's indicative of
> a bug in
Hi,
On 2015-12-30 13:17:47 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2015-12-30 19:54:19 +0200, Shay Rojansky wrote:
> >> wakeEvents is 8387808 and so is sock.
>
> > Hm. That seems like an extremely weird value.
>
> Probably just means the debugger is confused
OK, I finally found some time to dive into this.
The backends seem to hang when the client closes a socket without first
sending a Terminate message - some of the tests make this happen. I've
confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
but this does not occur on
On 2015-12-30 19:54:19 +0200, Shay Rojansky wrote:
> >
> > Things that'd be interesting:
> > 1) what are the arguments passed to WaitLatchOrSocket(), most
> > importantly wakeEvents and sock
> >
>
> wakeEvents is 8387808 and so is sock.
Hm. That seems like an extremely weird value. I think
>
> > > Any chance you could single-step through WaitLatchOrSocket() with a
> > > debugger? Without additional information this is rather hard to
> > > diagnose.
> > >
> >
> > Uh I sure can, but I have no idea what to look for :) Anything
> > specific?
>
> Things that'd be interesting:
> 1) what
>
> Things that'd be interesting:
> 1) what are the arguments passed to WaitLatchOrSocket(), most
> importantly wakeEvents and sock
>
wakeEvents is 8387808 and so is sock.
Tom, this bug doesn't occur with 9.4.4 (will try to download 9.4.5 and
test).
>
> Are we sure this is a 9.5-only bug? Shay, can you try 9.4 branch tip
> and see if it misbehaves? Can anyone else reproduce the problem?
>
>
Doesn't occur with 9.4.5 either. The first version I tested which exhibited
this was 9.5beta2.
>
> Is this in a backend with ssl?
>
No.
If you go up one frame, what value does port->sock have?
>
For some reason VS is telling me "Unable to read memory" on port->sock... I
have no idea why that is...
Andres Freund writes:
> There goes that theory. Amongst others. The aforementioned problem with
> waitfor doesn't seem to be actually armed because waitfor is only used
> if errno == EWOULDBLOCK || errno == EAGAIN.
Mumble. It is clearly possible that we'd reach the Assert
Shay Rojansky wrote:
> >
> > Are we sure this is a 9.5-only bug? Shay, can you try 9.4 branch tip
> > and see if it misbehaves? Can anyone else reproduce the problem?
> >
> >
> Doesn't occur with 9.4.5 either. The first version I tested which exhibited
> this was 9.5beta2.
Maybe it's time for
On 2015-12-30 13:26:34 -0500, Tom Lane wrote:
> I doubt that is what is happening here, because those errnos don't
> seem sensible for an EOF condition, but I'd still feel more comfortable
> if be_tls_read/be_tls_write handled SSL_ERROR_SYSCALL like this:
>
> if (n != -1)
>
Andres Freund writes:
> On 2015-12-30 13:26:34 -0500, Tom Lane wrote:
>> I doubt that is what is happening here, because those errnos don't
>> seem sensible for an EOF condition, but I'd still feel more comfortable
>> if be_tls_read/be_tls_write handled SSL_ERROR_SYSCALL like
On 2015-12-30 12:30:43 -0500, Tom Lane wrote:
> Nor OS X. Ugh. My first thought was that ac1d7945f broke this, but
> that's only in HEAD not 9.5, so some earlier change must be responsible.
The backtrace in
Andres Freund writes:
> On 2015-12-30 19:01:10 +0200, Shay Rojansky wrote:
>> The backends seem to hang when the client closes a socket without first
>> sending a Terminate message - some of the tests make this happen. I've
>> confirmed this happens with 9.5rc1 running on
On Tue, Dec 29, 2015 at 7:04 PM, Shay Rojansky wrote:
> Could you describe the workload a bit more? Is this rather concurrent? Do
>> you use optimized or debug builds? How long did you wait for the
>> backends to die? Is this all over localhost, external ip but local,
>> remotely?
On 2015-12-29 12:41:40 +0200, Shay Rojansky wrote:
> >
> > > The tests run for a couple minutes, open and close some connections. With
> > my
> > > pre-9.5 backends, the moment the test runner exits I can see that all
> > > backend processes exit immediately, and pg_activity_stat has no rows
> > >
>
> > The tests run for a couple minutes, open and close some connections. With
> my
> > pre-9.5 backends, the moment the test runner exits I can see that all
> > backend processes exit immediately, and pg_activity_stat has no rows
> > (except the querying one). With 9.5beta2, however, some backend
>
> Could you describe the workload a bit more? Is this rather concurrent? Do
> you use optimized or debug builds? How long did you wait for the
> backends to die? Is this all over localhost, external ip but local,
> remotely?
>
The workload is a rather diverse set of integration tests executed
Shay Rojansky writes:
> After setting up 9.5beta2 on the Npgsql build server and running the Npgsql
> test suite against I've noticed some weird behavior.
> The tests run for a couple minutes, open and close some connections. With my
> pre-9.5 backends, the moment the test runner