Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-14 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> The bug only affected fsync/forget requests that are forwarded from
> backends, not the ones that bgwriter puts into the hash table itself.

Oh, of course. So the actual sequence of events was:

* bgwriter queues an fsync request for a FSM

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-14 Thread Heikki Linnakangas
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Zdenek Kotala wrote:
>>> For security reason any OS should clean memory pages before process first touches them.
>> Yeah. But it doesn't necessarily need to fill them with zeros, any garbage will do.
> Yeah, but the observed symptoms se

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-14 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Zdenek Kotala wrote:
>> Tom Lane wrote:
>>> Hmm ... AFAICS this mistake would mean that no forknum field of the
>>> requests[] array ever gets set at all, so they would stay at whatever
>>> the virgin value in the shmem segment had been. Perhaps

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-14 Thread Heikki Linnakangas
Zdenek Kotala wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>>> I still wonder, though, why we're seeing the error consistently on kudu, and not on any other animal. Perhaps the forknum field that's left uninitialized gets a different value there than on other platforms

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-14 Thread Zdenek Kotala
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> I still wonder, though, why we're seeing the error consistently on kudu, and not on any other animal. Perhaps the forknum field that's left uninitialized gets a different value there than on other platforms.
> Hmm ... AFAICS thi

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-14 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I still wonder, though, why we're seeing the error consistently on kudu,
> and not on any other animal. Perhaps the forknum field that's left
> uninitialized gets a different value there than on other platforms.

Hmm ... AFAICS this mistake would m

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-14 Thread Heikki Linnakangas
Tom Lane wrote:
> I wrote:
>> Two different buildfarm machines are currently showing the same failure:
>> ERROR: could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
>> ERROR: checkpoint request failed
> Some tests show that when the serial regression tests are run in a fr

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-13 Thread Kris Jurka
On Mon, 13 Oct 2008, Tom Lane wrote:
> I notice now that kudu and dragonfly are actually the same machine ... could this be an OS-specific problem? Kris, has there been any system-software change on that machine recently?

This is a VM that I haven't touched in some time. It was turned off a

Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-13 Thread Tom Lane
I wrote:
> Two different buildfarm machines are currently showing the same failure:
> ERROR: could not fsync segment 0 of relation 1663/16384/29270/1: No such
> file or directory
> ERROR: checkpoint request failed

Some tests show that when the serial regression tests are run in a freshly initdb

[HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-13 Thread Tom Lane
Two different buildfarm machines are currently showing the same failure:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=kudu&dt=2008-10-13%2015:30:00
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dragonfly&dt=2008-10-13%2016:30:01

The postmaster log in each case shows

ERROR: could not fsy