Tom Lane wrote:
> Magnus Hagander writes:
>> Passes my tests, but I can't really reproduce the requirement to retry,
>> so I haven't been able to test that part :(
>
> The patch looks sane to me. If you want to test, perhaps reducing the
> sleep to 1 msec or so would reproduce the need to go aro
Magnus Hagander writes:
> Passes my tests, but I can't really reproduce the requirement to retry,
> so I haven't been able to test that part :(
The patch looks sane to me. If you want to test, perhaps reducing the
sleep to 1 msec or so would reproduce the need to go around the loop
more than onc
Alvaro Herrera writes:
> I'm disappointed :-( I thought this thread (without reading it too
> deeply) was about fixing the problem that backends sometimes fail to
> connect to shmem, on a system that's been running for a while.
Nobody knows yet what's wrong there or how to fix it. This thread
i
Magnus Hagander wrote:
> How does this look?
>
> Passes my tests, but I can't really reproduce the requirement to retry,
> so I haven't been able to test that part :(
I'm disappointed :-( I thought this thread (without reading it too
deeply) was about fixing the problem that backends sometimes
Andrew Dunstan wrote:
>
>
> Magnus Hagander wrote:
>>
>> Andrew, you want to write up a patch or do you want me to do it?
>>
>>
>>
>
> Go for it.
How does this look?
Passes my tests, but I can't really reproduce the requirement to retry,
so I haven't been able to test that part :(
//Magnus
Magnus Hagander wrote:
Andrew, you want to write up a patch or do you want me to do it?
Go for it.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Magnus Hagander writes:
> Tom Lane wrote:
>> I still think there's absolutely no evidence suggesting that a variable
>> backoff is necessary. Given how little this code is going to be
>> exercised in the real world, how long will it take till we find out
>> if you get it wrong? Use a simple retr
Alvaro Herrera writes:
> This is going to be backpatched, I assume?
Yeah, back to 8.2 I suppose.
regards, tom lane
Magnus Hagander wrote:
>
> Andrew, you want to write up a patch or do you want me to do it?
This is going to be backpatched, I assume?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Tom Lane wrote:
> Andrew Dunstan writes:
>> Magnus Hagander wrote:
>>> The actual 1 second value was completely random - it fixed all the
>>> issues on my test VM at the time. I don't recall exactly the details,
>>> but I do recall having to run a lot of tests before I managed to provoke
>>> an er
Tom Lane wrote:
I still think there's absolutely no evidence suggesting that a variable
backoff is necessary. Given how little this code is going to be
exercised in the real world, how long will it take till we find out
if you get it wrong? Use a simple retry loop and be done with it.
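A minimal sketch of the kind of simple retry loop suggested here: a fixed number of attempts with a constant delay and no backoff. This is not the patch under discussion. The names `retry_fixed_delay`, `attempt_fn`, `sleep_fn`, `flaky_attempt` and `no_sleep` are hypothetical. The attempt and the sleep are passed in as callbacks so the sketch stays portable; in the real code the attempt would presumably be the CreateFileMapping() call and the sleep a wrapper around a platform sleep function.

```c
#include <stdbool.h>

/* Hypothetical callback types, introduced only for this sketch. */
typedef bool (*attempt_fn)(void *state);   /* true once the attempt succeeds */
typedef void (*sleep_fn)(long msec);        /* e.g. a wrapper around a platform sleep */

/*
 * Try attempt() up to max_tries times, calling nap(delay_ms) between failed
 * attempts. The delay is constant; there is no backoff.
 * Returns the 1-based number of the successful attempt, or 0 if all failed.
 */
int
retry_fixed_delay(attempt_fn attempt, void *state,
                  sleep_fn nap, int max_tries, long delay_ms)
{
    for (int i = 1; i <= max_tries; i++)
    {
        if (attempt(state))
            return i;
        if (i < max_tries)      /* no point sleeping after the last try */
            nap(delay_ms);
    }
    return 0;
}

/* Stand-in for a resource that only becomes available on the Nth call. */
struct flaky
{
    int calls;          /* attempts made so far */
    int succeed_on;     /* attempt number that succeeds */
};

bool
flaky_attempt(void *state)
{
    struct flaky *f = state;

    return ++f->calls >= f->succeed_on;
}

/* No-op sleep so the sketch runs instantly. */
void
no_sleep(long msec)
{
    (void) msec;
}
```

With a `struct flaky` whose `succeed_on` is 3, `retry_fixed_delay(flaky_attempt, &f, no_sleep, 10, 1000)` stops at the third attempt. The loop has only two knobs, the retry count and the delay, which is the whole appeal over a variable backoff.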
Andrew Dunstan writes:
> Magnus Hagander wrote:
>> The actual 1 second value was completely random - it fixed all the
>> issues on my test VM at the time. I don't recall exactly the details,
>> but I do recall having to run a lot of tests before I managed to provoke
>> an error, and that with the
Magnus Hagander wrote:
Tom Lane wrote:
Andrew Dunstan writes:
Now presumably we sleep for 1 sec between the CloseHandle() call and the
CreateFileMapping() call in that code for a reason.
I'm not sure. Magnus never did answer my question about why the sleep
and retry was put
Tom Lane wrote:
> Magnus Hagander writes:
>> Tom Lane wrote:
>>> It says here:
>>> http://msdn.microsoft.com/en-us/library/ms885627.aspx
>
>> FWIW, this is the Windows CE documentation. The one for win32 is at:
>> http://msdn.microsoft.com/en-us/library/ms679360(VS.85).aspx
>
> Sorry, that was t
Tom Lane wrote:
> Andrew Dunstan writes:
>> Now presumably we sleep for 1 sec between the CloseHandle() call and the
>> CreateFileMapping() call in that code for a reason.
>
> I'm not sure. Magnus never did answer my question about why the sleep
> and retry was put in at all; it seems not unlik
Andrew Dunstan writes:
> Now presumably we sleep for 1 sec between the CloseHandle() call and the
> CreateFileMapping() call in that code for a reason.
I'm not sure. Magnus never did answer my question about why the sleep
and retry was put in at all; it seems not unlikely from here that it
was
Tom Lane wrote:
The quick try would be to stick a SetLastError(0) in there, just to be
sure... Could be worth a try?
I kinda think we should do that whether or not it can be proven to
have anything to do with Andrew's report. It's just like "errno = 0"
for Unix --- sometimes you have to
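The Unix idiom mentioned here can be sketched with strtol(), which, like the CreateFileMapping() case discussed in this thread, can return a usable-looking value while reporting a condition only through the error variable. The helper name `parse_long` is hypothetical; the point is only the `errno = 0` before the call, so that a stale value left by earlier code cannot be mistaken for a fresh error. On Windows the analogous step would be SetLastError(0) before the call, as proposed above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a base-10 long from s.
 * Returns true and stores the value in *out on success.
 */
bool
parse_long(const char *s, long *out)
{
    char *end;
    long  val;

    errno = 0;                      /* clear stale error state first */
    val = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        return false;               /* no digits, or trailing junk */
    if (errno == ERANGE)
        return false;               /* value out of range for long */

    *out = val;
    return true;
}
```

Without the reset, a caller that arrived with `errno` already set to `ERANGE` would see "42" rejected, because a successful strtol() is not required to clear `errno`.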
Magnus Hagander writes:
> Tom Lane wrote:
>> It says here:
>> http://msdn.microsoft.com/en-us/library/ms885627.aspx
> FWIW, this is the Windows CE documentation. The one for win32 is at:
> http://msdn.microsoft.com/en-us/library/ms679360(VS.85).aspx
Sorry, that was the one that came up first in
Magnus Hagander wrote:
Andrew, just to confirm: you've found a case where this happens
*repeatably*? That's what we've failed to do before - it's happened now
and then, but never during testing...
Well, it happened several times to my client within a matter of hours. I
didn't see any s
Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
>>
>> Now this would only explain problems if there were some code path
>> through the postmaster that could leave the errno set to
>> ERROR_ALREADY_EXISTS (a/k/a EEXIST) when this code is reached. I'm not
>> sure there is one, and I have even less of
Tom Lane wrote:
> Andrew Dunstan writes:
>> I am seeing Postgres 8.3.7 running as a service on Windows Server 2003
>> repeatedly fail to restart after a backend crash because of the
>> following code in port/win32_shmem.c:
>
> On further review, I see an entirely different explanation for possi
Andrew Dunstan writes:
> Maybe we need to look at all the places we call GetLastError(). There
> are quite a few of them.
It would only be an issue with syscalls that have badly designed APIs
like this one. Most of the time you know that the function has failed
and is supposed to have set the e
Tom Lane wrote:
Now this would only explain problems if there were some code path
through the postmaster that could leave the errno set to
ERROR_ALREADY_EXISTS (a/k/a EEXIST) when this code is reached. I'm not
sure there is one, and I have even less of a theory as to why system
load might mak
Andrew Dunstan writes:
> I am seeing Postgres 8.3.7 running as a service on Windows Server 2003
> repeatedly fail to restart after a backend crash because of the
> following code in port/win32_shmem.c:
On further review, I see an entirely different explanation for possible
failures of that code
Tom Lane wrote:
Andrew Dunstan writes:
We've seen similar things with other Windows file operations, IIRC. What
bothers me is that the problem might be precisely because the 1 second
sleep between the CloseHandle() call and the CreateFileMapping() call
might not be enough due to system l
Andrew Dunstan writes:
> We've seen similar things with other Windows file operations, IIRC. What
> bothers me is that the problem might be precisely because the 1 second
> sleep between the CloseHandle() call and the CreateFileMapping() call
> might not be enough due to system load, so repeati
Tom Lane wrote:
Andrew Dunstan writes:
It strikes me that we really need to try reconnecting to the shared
memory here several times, and maybe the backoff need to increase each
time.
Adding a backoff would make the code significantly more complex, with
no gain that I can see. Jus
Andrew Dunstan writes:
> It strikes me that we really need to try reconnecting to the shared
> memory here several times, and maybe the backoff need to increase each
> time.
Adding a backoff would make the code significantly more complex, with
no gain that I can see. Just loop a few times arou
Dave Page wrote:
On Fri, May 1, 2009 at 4:10 PM, Heikki Linnakangas
wrote:
Dave Page wrote:
On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan
wrote:
It strikes me that we really need to try reconnecting to the shared
memory
here several times, and maybe the backoff need to increase each time.
Heikki Linnakangas wrote:
Dave Page wrote:
On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan
wrote:
It strikes me that we really need to try reconnecting to the shared
memory
here several times, and maybe the backoff need to increase each
time. On a
loaded server this cause postgres to fail
On Fri, May 1, 2009 at 4:10 PM, Heikki Linnakangas
wrote:
> Dave Page wrote:
>>
>> On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan
>> wrote:
>>>
>>> It strikes me that we really need to try reconnecting to the shared
>>> memory
>>> here several times, and maybe the backoff need to increase each t
Dave Page wrote:
On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote:
It strikes me that we really need to try reconnecting to the shared memory
here several times, and maybe the backoff need to increase each time. On a
loaded server this cause postgres to fail to restart fairly reliably.
A
On Fri, May 1, 2009 at 11:05 AM, Greg Stark wrote:
> Do we have any idea why "it may take a short while before it gets
> dropped from the global namespace"? Is there some demon running which
> only wakes up periodically? Or any specific reason it takes so long?
> That might give us a clue exactly
On Fri, May 1, 2009 at 8:42 AM, Dave Page wrote:
> On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote:
>>
>> It strikes me that we really need to try reconnecting to the shared memory
>> here several times, and maybe the backoff need to increase each time. On a
>> loaded server this cause post
On Fri, May 1, 2009 at 12:59 AM, Andrew Dunstan wrote:
>
> It strikes me that we really need to try reconnecting to the shared memory
> here several times, and maybe the backoff need to increase each time. On a
> loaded server this cause postgres to fail to restart fairly reliably.
At the risk of
I am seeing Postgres 8.3.7 running as a service on Windows Server 2003
repeatedly fail to restart after a backend crash because of the
following code in port/win32_shmem.c:
	/*
	 * If the segment already existed, CreateFileMapping() will return a
	 * handle to the existing one.
	 */