Re: [HACKERS] windows shared memory error

2009-05-05 Thread Magnus Hagander
Tom Lane wrote: > Magnus Hagander writes: >> Passes my tests, but I can't really reproduce the requirement to retry, >> so I haven't been able to test that part :( > > The patch looks sane to me. If you want to test, perhaps reducing the > sleep to 1 msec or so would reproduce the need to go aro

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Tom Lane
Magnus Hagander writes: > Passes my tests, but I can't really reproduce the requirement to retry, > so I haven't been able to test that part :( The patch looks sane to me. If you want to test, perhaps reducing the sleep to 1 msec or so would reproduce the need to go around the loop more than onc

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Tom Lane
Alvaro Herrera writes: > I'm disappointed :-( I thought this thread (without reading it too > deeply) was about fixing the problem that backends sometimes fail to > connect to shmem, on a system that's been running for a while. Nobody knows yet what's wrong there or how to fix it. This thread i

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Alvaro Herrera
Magnus Hagander wrote: > How does this look? > > Passes my tests, but I can't really reproduce the requirement to retry, > so I haven't been able to test that part :( I'm disappointed :-( I thought this thread (without reading it too deeply) was about fixing the problem that backends sometimes

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Magnus Hagander
Andrew Dunstan wrote: > > > Magnus Hagander wrote: >> >> Andrew, you want to write up a patch or do you want me to do it? >> >> >> > > Go for it. How does this look? Passes my tests, but I can't really reproduce the requirement to retry, so I haven't been able to test that part :( //Magnus

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Andrew Dunstan
Magnus Hagander wrote: Andrew, you want to write up a patch or do you want me to do it? Go for it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Tom Lane
Magnus Hagander writes: > Tom Lane wrote: >> I still think there's absolutely no evidence suggesting that a variable >> backoff is necessary. Given how little this code is going to be >> exercised in the real world, how long will it take till we find out >> if you get it wrong? Use a simple retr

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Tom Lane
Alvaro Herrera writes: > This is going to be backpatched, I assume? Yeah, back to 8.2 I suppose. regards, tom lane

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Alvaro Herrera
Magnus Hagander wrote: > > Andrew, you want to write up a patch or do you want me to do it? This is going to be backpatched, I assume? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Magnus Hagander
Tom Lane wrote: > Andrew Dunstan writes: >> Magnus Hagander wrote: >>> The actual 1 second value was completely random - it fixed all the >>> issues on my test VM at the time. I don't recall exactly the details, >>> but I do recall having to run a lot of tests before I managed to provoke >>> an er

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Andrew Dunstan
Tom Lane wrote: I still think there's absolutely no evidence suggesting that a variable backoff is necessary. Given how little this code is going to be exercised in the real world, how long will it take till we find out if you get it wrong? Use a simple retry loop and be done with it.

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Tom Lane
Andrew Dunstan writes: > Magnus Hagander wrote: >> The actual 1 second value was completely random - it fixed all the >> issues on my test VM at the time. I don't recall exactly the details, >> but I do recall having to run a lot of tests before I managed to provoke >> an error, and that with the

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Andrew Dunstan
Magnus Hagander wrote: Tom Lane wrote: Andrew Dunstan writes: Now presumably we sleep for 1 sec between the CloseHandle() call and the CreateFileMapping() call in that code for a reason. I'm not sure. Magnus never did answer my question about why the sleep and retry was put

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Magnus Hagander
Tom Lane wrote: > Magnus Hagander writes: >> Tom Lane wrote: >>> It says here: >>> http://msdn.microsoft.com/en-us/library/ms885627.aspx > >> FWIW, this is the Windows CE documentation. The one for win32 is at: >> http://msdn.microsoft.com/en-us/library/ms679360(VS.85).aspx > > Sorry, that was t

Re: [HACKERS] windows shared memory error

2009-05-04 Thread Magnus Hagander
Tom Lane wrote: > Andrew Dunstan writes: >> Now presumably we sleep for 1 sec between the CloseHandle() call and the >> CreateFileMapping() call in that code for a reason. > > I'm not sure. Magnus never did answer my question about why the sleep > and retry was put in at all; it seems not unlik

Re: [HACKERS] windows shared memory error

2009-05-03 Thread Tom Lane
Andrew Dunstan writes: > Now presumably we sleep for 1 sec between the CloseHandle() call and the > CreateFileMapping() call in that code for a reason. I'm not sure. Magnus never did answer my question about why the sleep and retry was put in at all; it seems not unlikely from here that it was

Re: [HACKERS] windows shared memory error

2009-05-03 Thread Andrew Dunstan
Tom Lane wrote: The quick try would be to stick a SetLastError(0) in there, just to be sure... Could be worth a try? I kinda think we should do that whether or not it can be proven to have anything to do with Andrew's report. It's just like "errno = 0" for Unix --- sometimes you have to
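The "errno = 0" comparison above can be illustrated with a small portable sketch. It is not code from the patch: `parse_long_checked` is a hypothetical helper, and `strtol` merely stands in for any call that reports some outcomes only through the error indicator, leaving it untouched on success. Calling `SetLastError(0)` before `CreateFileMapping()` and then checking `GetLastError()` would play the same role on Windows.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the PostgreSQL source): parse a decimal
 * long. Returns 0 and stores the value on success, -1 on overflow or
 * when the text contains no digits.
 */
int
parse_long_checked(const char *text, long *result)
{
    char *end;
    long  value;

    assert(text != NULL && result != NULL);

    /*
     * strtol() does not reset errno when it succeeds, so a stale value
     * left behind by an earlier call would look like a failure of this
     * one. Clear it first, then trust what we read afterwards.
     */
    errno = 0;
    value = strtol(text, &end, 10);

    if (errno == ERANGE)
        return -1;              /* value out of range for long */
    if (end == text)
        return -1;              /* no digits were consumed */

    *result = value;
    return 0;
}
```

Without the `errno = 0` line, a leftover `ERANGE` from unrelated code would make a perfectly good parse of "42" report failure, which is the kind of false positive being discussed for the `ERROR_ALREADY_EXISTS` check.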

Re: [HACKERS] windows shared memory error

2009-05-03 Thread Tom Lane
Magnus Hagander writes: > Tom Lane wrote: >> It says here: >> http://msdn.microsoft.com/en-us/library/ms885627.aspx > FWIW, this is the Windows CE documentation. The one for win32 is at: > http://msdn.microsoft.com/en-us/library/ms679360(VS.85).aspx Sorry, that was the one that came up first in

Re: [HACKERS] windows shared memory error

2009-05-03 Thread Andrew Dunstan
Magnus Hagander wrote: Andrew, just to confirm: you've found a case where this happens *repeatably*? That's what we've failed to do before - it's happened now and then, but never during testing... Well, it happened several times to my client within a matter of hours. I didn't see any s

Re: [HACKERS] windows shared memory error

2009-05-03 Thread Magnus Hagander
Andrew Dunstan wrote: > > > Tom Lane wrote: >> >> Now this would only explain problems if there were some code path >> through the postmaster that could leave the errno set to >> ERROR_ALREADY_EXISTS (a/k/a EEXIST) when this code is reached. I'm not >> sure there is one, and I have even less of

Re: [HACKERS] windows shared memory error

2009-05-03 Thread Magnus Hagander
Tom Lane wrote: > Andrew Dunstan writes: >> I am seeing Postgres 8.3.7 running as a service on Windows Server 2003 >> repeatedly fail to restart after a backend crash because of the >> following code in port/win32_shmem.c: > > On further review, I see an entirely different explanation for possi

Re: [HACKERS] windows shared memory error

2009-05-02 Thread Tom Lane
Andrew Dunstan writes: > Maybe we need to look at all the places we call GetLastError(). There > are quite a few of them. It would only be an issue with syscalls that have badly designed APIs like this one. Most of the time you know that the function has failed and is supposed to have set the e

Re: [HACKERS] windows shared memory error

2009-05-02 Thread Andrew Dunstan
Tom Lane wrote: Now this would only explain problems if there were some code path through the postmaster that could leave the errno set to ERROR_ALREADY_EXISTS (a/k/a EEXIST) when this code is reached. I'm not sure there is one, and I have even less of a theory as to why system load might mak

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Tom Lane
Andrew Dunstan writes: > I am seeing Postgres 8.3.7 running as a service on Windows Server 2003 > repeatedly fail to restart after a backend crash because of the > following code in port/win32_shmem.c: On further review, I see an entirely different explanation for possible failures of that code

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan writes: We've seen similar things with other Windows file operations, IIRC. What bothers me is that the problem might be precisely because the 1 second sleep between the CloseHandle() call and the CreateFileMapping() call might not be enough due to system l

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Tom Lane
Andrew Dunstan writes: > We've seen similar things with other Windows file operations, IIRC. What > bothers me is that the problem might be precisely because the 1 second > sleep between the CloseHandle() call and the CreateFileMapping() call > might not be enough due to system load, so repeati

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan writes: It strikes me that we really need to try reconnecting to the shared memory here several times, and maybe the backoff needs to increase each time. Adding a backoff would make the code significantly more complex, with no gain that I can see. Jus

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Tom Lane
Andrew Dunstan writes: > It strikes me that we really need to try reconnecting to the shared > memory here several times, and maybe the backoff needs to increase each > time. Adding a backoff would make the code significantly more complex, with no gain that I can see. Just loop a few times arou
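A simple retry loop without backoff, as suggested above, might look roughly like the following. This is a minimal sketch, not the patch discussed in this thread: `retry_fixed_delay`, `attempt_fn`, and `succeed_after` are hypothetical names, the operation being retried is simulated, and POSIX `nanosleep` stands in for whatever sleep call the real code uses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L     /* for nanosleep() under strict modes */
#endif

#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type: returns true when the attempt succeeded. */
typedef bool (*attempt_fn) (void *arg);

/*
 * Try the operation up to max_attempts times, sleeping the same fixed
 * delay between attempts (no backoff). Returns the 1-based number of
 * the attempt that succeeded, or -1 if every attempt failed.
 */
int
retry_fixed_delay(attempt_fn attempt, void *arg, int max_attempts, long delay_ms)
{
    struct timespec delay;
    int             i;

    assert(attempt != NULL && max_attempts > 0 && delay_ms >= 0);

    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (i = 1; i <= max_attempts; i++)
    {
        if (attempt(arg))
            return i;
        if (i < max_attempts)
            nanosleep(&delay, NULL);    /* fixed pause before retrying */
    }
    return -1;
}

/* Simulated operation: fails until the counter in *arg reaches zero. */
bool
succeed_after(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0)
    {
        (*remaining)--;
        return false;
    }
    return true;
}
```

The fixed delay keeps the control flow to a single loop with one exit condition, which is the simplicity argument made above; a variable backoff would add state for little demonstrated benefit.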

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Heikki Linnakangas
Dave Page wrote: On Fri, May 1, 2009 at 4:10 PM, Heikki Linnakangas wrote: Dave Page wrote: On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote: It strikes me that we really need to try reconnecting to the shared memory here several times, and maybe the backoff needs to increase each time.

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Andrew Dunstan
Heikki Linnakangas wrote: Dave Page wrote: On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote: It strikes me that we really need to try reconnecting to the shared memory here several times, and maybe the backoff needs to increase each time. On a loaded server this causes postgres to fail

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Dave Page
On Fri, May 1, 2009 at 4:10 PM, Heikki Linnakangas wrote: > Dave Page wrote: >> >> On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan >> wrote: >>> >>> It strikes me that we really need to try reconnecting to the shared >>> memory >>> here several times, and maybe the backoff needs to increase each t

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Heikki Linnakangas
Dave Page wrote: On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote: It strikes me that we really need to try reconnecting to the shared memory here several times, and maybe the backoff needs to increase each time. On a loaded server this causes postgres to fail to restart fairly reliably. A

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Dave Page
On Fri, May 1, 2009 at 11:05 AM, Greg Stark wrote: > Do we have any idea why "it may take a short while before it gets > dropped from the global namespace"? Is there some daemon running which > only wakes up periodically? Or any specific reason it takes so long? > That might give us a clue exactly

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Greg Stark
On Fri, May 1, 2009 at 8:42 AM, Dave Page wrote: > On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote: >> >> It strikes me that we really need to try reconnecting to the shared memory >> here several times, and maybe the backoff needs to increase each time. On a >> loaded server this causes post

Re: [HACKERS] windows shared memory error

2009-05-01 Thread Dave Page
On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote: > > It strikes me that we really need to try reconnecting to the shared memory > here several times, and maybe the backoff needs to increase each time. On a > loaded server this causes postgres to fail to restart fairly reliably. At the risk of

[HACKERS] windows shared memory error

2009-05-01 Thread Andrew Dunstan
I am seeing Postgres 8.3.7 running as a service on Windows Server 2003 repeatedly fail to restart after a backend crash because of the following code in port/win32_shmem.c: /* * If the segment already existed, CreateFileMapping() will return a * handle to the existing one. */