On 13/09/10 20:43, Jeff Davis wrote:
On Mon, 2010-09-13 at 09:10 +0300, Heikki Linnakangas wrote:
but we should be consistent and document that:
(a) it shouldn't happen
(b) that it's just a sanity check and we're ignoring the race
Would this be sufficient?
--- a/src/backend/port/unix_la
On Mon, 2010-09-13 at 09:10 +0300, Heikki Linnakangas wrote:
> > but we should be consistent and document that:
> > (a) it shouldn't happen
> > (b) that it's just a sanity check and we're ignoring the race
>
> Would this be sufficient?
>
> --- a/src/backend/port/unix_latch.c
> +++ b/src/backe
On 12/09/10 20:13, Jeff Davis wrote:
On Sun, 2010-09-12 at 12:29 -0400, Tom Lane wrote:
... why throw an ERROR there if it can't happen (or
indicates an inconsistent state when it does happen)?
Are you suggesting that an Assert would be sufficient?
I'm not too picky about whether it's Assert
On Sun, 2010-09-12 at 14:12 -0400, Tom Lane wrote:
> Uh, this is nonsense. You have to have something like these functions
> to support transferring ownership of a latch from one process to
> another, which is required at least for the walreceiver usage.
Oh, I see. It's needed to know where to se
Jeff Davis writes:
> However, that also means that the whole concept of OwnLatch/DisownLatch
> is entirely redundant, and only there for asserts because it doesn't do
> anything else. That seems a little strange to me, as well, so (at
> minimum) it should be documented that the functions really ha
On Sun, 2010-09-12 at 10:13 -0700, Jeff Davis wrote:
> I'm not too picky about whether it's Assert, ERROR, or PANIC (Asserts
> aren't available in production systems, which I assume is why elog was
> used); but we should be consistent and document that:
> (a) it shouldn't happen
> (b) that it's j
On Sun, 2010-09-12 at 12:29 -0400, Tom Lane wrote:
> > ... why throw an ERROR there if it can't happen (or
> > indicates an inconsistent state when it does happen)?
>
> Are you suggesting that an Assert would be sufficient?
>
I'm not too picky about whether it's Assert, ERROR, or PANIC (Asserts
Jeff Davis writes:
> I glanced at the code, and I see (in OwnLatch()):
> + if (latch->owner_pid != 0)
> + elog(ERROR, "latch already owned");
> + latch->owner_pid = MyProcPid;
> But it looks like there may be a race there.
Yeah, that error check is only intended to cat
On Sat, 2010-09-11 at 19:15 +0300, Heikki Linnakangas wrote:
> Committed. I'll take a look at making walreceiver respond quickly when
> WAL arrives in the standby, using latches, but that shouldn't interfere
> with what you're doing.
I glanced at the code, and I see (in OwnLatch()):
+ if
On 11/09/10 18:02, Tom Lane wrote:
Heikki Linnakangas writes:
Barring any last-minute objections, I'll commit this in the next few
days. This patch doesn't affect walreceiver yet; I think the next step
is to use the latches to eliminate the polling loop in walreceiver too,
so that as soon as a
Heikki Linnakangas writes:
> On 06/09/10 19:27, Heikki Linnakangas wrote:
>> It seems that NumSharedLatches() is entirely wrongly placed if it's in
>> the win32 specific code! That needs to be somewhere shared, so people find
>> it,
> Yeah. There's a notice of that in OwnLatch(), but it would be
On 06/09/10 19:27, Heikki Linnakangas wrote:
On 06/09/10 17:18, Tom Lane wrote:
BTW, on reflection the AcquireLatch/ReleaseLatch terminology seems a bit
ill chosen: ReleaseLatch sounds way too much like something that would
just unlock or clear the latch. Perhaps OwnLatch/DisownLatch, or
somethi
On 09/08/2010 08:01 PM, Heikki Linnakangas wrote:
Yeah, there isn't much you can do about it. Perhaps you could set a
"mayday flag" (a global boolean variable) if it fails, and check that in
the main loop, elogging a warning there instead. But I don't think we
need to go to such lengths, realisti
On 09/08/2010 08:18 PM, Tom Lane wrote:
Considering that we know that major platforms
such as FreeBSD have changed their implementations *very* recently,
it seems foolish to assume that an executable built on a machine with
corrected pselect could not be run on one with an older implementation.
On 08/09/10 23:07, Martijn van Oosterhout wrote:
On Mon, Sep 06, 2010 at 07:24:59PM +0200, Markus Wanner wrote:
Do I understand correctly that the purpose of this patch is to work
around the brokenness of select() on very few platforms? Or is there any
additional feature that plain signals don't
On Mon, Sep 06, 2010 at 07:24:59PM +0200, Markus Wanner wrote:
> Do I understand correctly that the purpose of this patch is to work
> around the brokenness of select() on very few platforms? Or is there any
> additional feature that plain signals don't give us?
If the issue is just that selec
Martijn van Oosterhout writes:
> If the issue is just that select() doesn't get interrupted and we don't
> care about a couple of syscalls, would it not be better to simply use
> sigaction to turn on SA_RESTART just prior to the select() and turn it
> off just after. Or are these systems so broken
Heikki Linnakangas writes:
> On 08/09/10 20:36, Markus Wanner wrote:
>> It should be possible to reliably determine the platforms that provide
>> an atomic pselect(). For those, I'm hesitant to use a "trick", where
>> pselect() clearly provides a simpler and more "official" alternative.
>> Especia
On 08/09/10 20:36, Markus Wanner wrote:
On 09/06/2010 11:03 PM, Tom Lane wrote:
I don't entirely see the point of opening ourselves up to the risk of
using a pselect that's not safe under the hood.
It should be possible to reliably determine the platforms that provide
an atomic pselect(). For
Hi,
On 09/06/2010 11:03 PM, Tom Lane wrote:
I don't entirely see the point of opening ourselves up to the risk of
using a pselect that's not safe under the hood.
It should be possible to reliably determine the platforms that provide
an atomic pselect(). For those, I'm hesitant to use a "trick
On 09/07/2010 09:06 AM, Heikki Linnakangas wrote:
Setting a latch that's already set is very fast, so you want to keep it
set until the last moment. See the coding in walsender for example, it
goes to some lengths to avoid clearing the latch until it's very sure
there's no more work for it to do.
On 06/09/10 20:24, Markus Wanner wrote:
On 09/06/2010 06:27 PM, Heikki Linnakangas wrote:
+ * It's important to reset the latch*before* checking if there's work to
+ * do. Otherwise, if someone sets the latch between the check and the
+ * ResetLatch call, you will miss it and Wait will block.
On 06/09/10 23:10, Markus Wanner wrote:
Good. How about syscall overhead? One more write operation to the
self-pipe per signal from within the signal handler and one read to
actually clear the 'ready' state of the pipe from the waiter portion of
the code, right?
Right.
Do we plan to replace a
Markus Wanner writes:
> AFAICT the custom select() implementation we are using for Windows could
> easily be changed to mimic pselect() instead. Thus most reasonably
> up-to-date Linux distributions plus Windows certainly provide a workable
> pselect() syscall. Would it be worth using pselect()
On 09/06/2010 08:46 PM, Tom Lane wrote:
Well, it's not defined in the Single Unix Spec, which is our customary
reference for portability.
FWIW, I bet the self-pipe trick isn't mentioned there, either... any
good precedence that it actually works as expected on all of the target
platforms? Exi
Markus Wanner writes:
> Is pselect() really as unportable as stated in the patch? What platforms
> have problems with pselect()?
Well, it's not defined in the Single Unix Spec, which is our customary
reference for portability. Also, it's alleged that some platforms have
it but in a form that's
Hi,
On 09/06/2010 06:27 PM, Heikki Linnakangas wrote:
Here's an updated patch, with all the issues reported this far fixed,
except for that naming issue, and Fujii's suggestion to use poll()
instead of select() where available. I've also polished it quite a bit,
improving comments etc. Magnus, c
On 06/09/10 17:18, Tom Lane wrote:
Heikki Linnakangas writes:
I think we have just a terminology issue. What you're describing is
exactly how it works now, if you just s/InitLatch/AcquireLatch.
No, it isn't. What I'm suggesting requires breaking InitLatch into two
operations.
We also need
Heikki Linnakangas writes:
> On 03/09/10 21:50, Tom Lane wrote:
>> Well, in that case what we need to do is presume that the latch object
>> has a continuing existence but the owner/receiver can come and go.
>> I would suggest that InitLatch needs to initialize the object into a
>> valid but unown
On 03/09/10 21:50, Tom Lane wrote:
Heikki Linnakangas writes:
On 03/09/10 21:16, Tom Lane wrote:
It's probably not too unreasonable to assume that pid_t assignment is
atomic. But I'm still thinking that we have bigger problems than that
if there are really cases where SetLatch can execute at
On Fri, 2010-09-03 at 18:24 -0400, Tom Lane wrote:
> Now the HS case likewise appears to be set up so that the signal can
> only directly interrupt ProcWaitForSignal, so I think the core issue
> is
> whether any deadlock situations are possible. Given that this gets
> called from a low-level place
On Fri, Sep 3, 2010 at 6:24 PM, Tom Lane wrote:
> Now the HS case likewise appears to be set up so that the signal can
> only directly interrupt ProcWaitForSignal, so I think the core issue is
> whether any deadlock situations are possible. Given that this gets
> called from a low-level place lik
Robert Haas writes:
> Color me confused; I may need to backpedal rapidly here. I had
> thought that what Tom was complaining about was the fact that the
> signal handler was taking LWLocks, which I would have thought to be
> totally unsafe.
Well, it's unsafe if the signal could interrupt mainlin
Heikki Linnakangas writes:
> A safer approach would be to just PGSemaphoreUnlock() in the signal
> handler, and do all the other processing outside it.
I don't see any particularly good reason to assume that
PGSemaphoreUnlock is safe either: you're still talking about nested
semop operations.
T
On Fri, Sep 3, 2010 at 4:20 PM, Heikki Linnakangas
wrote:
> Maybe that's ok, if I'm reading the deadlock checker code correctly, it also
> calls semop() to increment the another process' semaphore, and the deadlock
> checker can be invoked from a signal handler while in semop() to wait on our
> pr
On 03/09/10 19:38, Robert Haas wrote:
On Fri, Sep 3, 2010 at 12:10 PM, Tom Lane wrote:
Robert Haas writes:
On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane wrote:
[ shrug... ] I stated before that the Hot Standby patch is doing
utterly unsafe things in signal handlers. Simon rejected that.
I am
On Fri, Sep 3, 2010 at 3:11 PM, Ron Mayer wrote:
> Tom Lane wrote:
>> Robert Haas writes:
>>> On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane wrote:
[ shrug... ] I stated before that the Hot Standby patch is doing
utterly unsafe things in signal handlers. Simon rejected that.
I am wai
Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane wrote:
>>> [ shrug... ] I stated before that the Hot Standby patch is doing
>>> utterly unsafe things in signal handlers. Simon rejected that.
>>> I am waiting for irrefutable evidence to emerge from the field
>>
Heikki Linnakangas writes:
> On 03/09/10 21:16, Tom Lane wrote:
>> It's probably not too unreasonable to assume that pid_t assignment is
>> atomic. But I'm still thinking that we have bigger problems than that
>> if there are really cases where SetLatch can execute at approximately
>> the same ti
On 03/09/10 21:16, Tom Lane wrote:
Heikki Linnakangas writes:
WaitLatch had to set the pid on the Latch struct to allow other
processes to send the signal. Another process could call SetLatch and
read the pid field, while WaitLatch is just setting it. I think we'll
have to put a spinlock there,
Heikki Linnakangas writes:
> On 03/09/10 17:51, Tom Lane wrote:
>> If there is *any* possibility of that happening then you have far worse
>> problems than whether the field is atomically readable or not: the
>> behavior will be unpredictable at just slightly larger timescales.
> Each Walsender n
On 03/09/10 17:51, Tom Lane wrote:
Heikki Linnakangas writes:
On 02/09/10 23:13, Tom Lane wrote:
Also, using sig_atomic_t for owner_pid is entirely not sane.
Hmm, true, it doesn't need to be set from signal handler, but is there
an atomicity problem if one process calls ReleaseLatch while a
On Fri, Sep 3, 2010 at 12:10 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane wrote:
>>> [ shrug... ] I stated before that the Hot Standby patch is doing
>>> utterly unsafe things in signal handlers. Simon rejected that.
>>> I am waiting for irrefutable ev
Robert Haas writes:
> On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane wrote:
>> [ shrug... ] I stated before that the Hot Standby patch is doing
>> utterly unsafe things in signal handlers. Simon rejected that.
>> I am waiting for irrefutable evidence to emerge from the field
>> (and am very confiden
On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane wrote:
> Fujii Masao writes:
>> On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane wrote:
>>> elog(FATAL) is *certainly* not a better idea. I think there's really
>>> nothing that can be done, you just have to silently ignore the error.
>
>> Hmm.. some functions
Heikki Linnakangas writes:
> On 02/09/10 23:13, Tom Lane wrote:
>>> (Yeah, I realize the caller
>>> could look into the latch to find that out, but callers really ought to
>>> treat latches as opaque structs.)
> Hmm, maybe we need a TestLatch function to check if a latch is set.
+1. It could be
Fujii Masao writes:
> On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane wrote:
>> elog(FATAL) is *certainly* not a better idea. I think there's really
>> nothing that can be done, you just have to silently ignore the error.
> Hmm.. some functions called by a signal handler use elog(FATAL), e.g.,
> Reco
On 02/09/10 23:13, Tom Lane wrote:
The WaitLatch ...timeout API could use a bit of refinement. I'd suggest
defining negative timeout as meaning wait forever, so that timeout = 0
can be used for "check but don't wait". Also, it seems like the
function shouldn't just return void but should return
On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane wrote:
> Fujii Masao writes:
>>> + * XXX: Is it safe to elog(ERROR) in a signal handler?
>>>
>>> No, it isn't.
>
>> We should use elog(FATAL) or check proc_exit_inprogress, instead?
>
> elog(FATAL) is *certainly* not a better idea. I think
Fujii Masao writes:
>> + * XXX: Is it safe to elog(ERROR) in a signal handler?
>>
>> No, it isn't.
> We should use elog(FATAL) or check proc_exit_inprogress, instead?
elog(FATAL) is *certainly* not a better idea. I think there's really
nothing that can be done, you just have to s
On Fri, Sep 3, 2010 at 5:13 AM, Tom Lane wrote:
> Does ReleaseLatch() have any actual use-case, and if so what would it be?
> I think we could do without it.
In Unix, probably we can live without that. But in Windows, we need to
free SharedEventHandles slot for upcoming process using a latch when
Heikki Linnakangas writes:
> Ok, here's an updated patch with WaitLatchOrSocket that let's you do that.
Minor code review items:
s/There is three/There are three/ in unix_latch.c header comment
The header comment points out the correct usage of ResetLatch, but I
think it should also emphasize t
On 02/09/10 06:46, Fujii Masao wrote:
On Wed, Sep 1, 2010 at 4:11 PM, Heikki Linnakangas
wrote:
The obvious next question is how to wait for multiple sockets and a latch at
the same time? Perhaps we should have a select()-like interface where you
can pass multiple file descriptors. Then again,
On Wed, Sep 1, 2010 at 4:11 PM, Heikki Linnakangas
wrote:
> The obvious next question is how to wait for multiple sockets and a latch at
> the same time? Perhaps we should have a select()-like interface where you
> can pass multiple file descriptors. Then again, looking at the current
> callers of
On 31/08/10 15:47, Fujii Masao wrote:
On Fri, Aug 27, 2010 at 4:39 PM, Fujii Masao wrote:
/*
* XXX: Should we invent an API to wait for data coming from the
* client connection too? It's not critical, but we could then
* eliminate the timeout altogether and go to sleep for good.
*/
Ye
Tom Lane wrote:
> Greg Smith writes:
> > Tom Lane wrote:
> >> Well, yes they are. They cause unnecessary process wakeups and thereby
> >> consume cycles even when the database is idle. See for example a
> >> longstanding complaint here:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=252129
>
On Fri, Aug 27, 2010 at 4:39 PM, Fujii Masao wrote:
>> /*
>> * XXX: Should we invent an API to wait for data coming from the
>> * client connection too? It's not critical, but we could then
>> * eliminate the timeout altogether and go to sleep for good.
>> */
>
> Yes, it would be very helpful
On Tue, Aug 31, 2010 at 4:06 PM, Heikki Linnakangas
wrote:
> Here's a 2nd version of the "latch" patch. Now with a Windows
> implementation. Comments welcome.
Seems good.
Two minor comments:
> rc = WaitForSingleObject(latch->event, timeout / 1000);
> if (rc == WAIT_FAILED)
> {
> ereport(E
Here's a 2nd version of the "latch" patch. Now with a Windows
implementation. Comments welcome.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
diff --git a/configure b/configure
index bd9b347..432cd58 100755
--- a/configure
+++ b/configure
@@ -27773,6 +27773,13 @@ _ACEOF
Tom Lane wrote:
Greg Smith writes:
... The only clear case where this is
always a win is when the system it totally idle.
If you'll climb down off that horse for a moment: yeah, the idle case is
*exactly* what they're complaining about.
I wasn't on a horse here--I saw the specific c
Greg Smith writes:
> Tom Lane wrote:
>> Well, yes they are. They cause unnecessary process wakeups and thereby
>> consume cycles even when the database is idle. See for example a
>> longstanding complaint here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=252129
> ... The only clear case whe
Robert Haas writes:
> On Fri, Aug 27, 2010 at 5:54 PM, Greg Smith wrote:
>> The way the background writer wakes up periodically to absorb fsync requests
>> is already way too infrequent on a busy system.
> Maybe instead of a fixed-duration sleep we could wake it up when it
> needs to do somethin
On Fri, Aug 27, 2010 at 5:54 PM, Greg Smith wrote:
> Tom Lane wrote:
>>
>> Well, yes they are. They cause unnecessary process wakeups and thereby
>> consume cycles even when the database is idle. See for example a
>> longstanding complaint here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=252
Tom Lane wrote:
Well, yes they are. They cause unnecessary process wakeups and thereby
consume cycles even when the database is idle. See for example a
longstanding complaint here:
https://bugzilla.redhat.com/show_bug.cgi?id=252129
If we're going to go to the trouble of having a mechanism like
On 27/08/10 10:39, Fujii Masao wrote:
On Thu, Aug 26, 2010 at 7:40 PM, Heikki Linnakangas
wrote:
There's two kinds of latches, local and global. Local latches can only be
set from the same process - allowing you to replace pg_usleep() with
something that is always interruptible by signals (by
On Thu, Aug 26, 2010 at 7:40 PM, Heikki Linnakangas
wrote:
> Here's a first attempt at implementing that. To demonstrate how it works, I
> modified walsender to use the new latch facility, also to respond quickly to
> SIGHUP and SIGTERM.
Great!
> There's two kinds of latches, local and global. L
On 24/08/10 02:44, Tom Lane wrote:
Heikki Linnakangas writes:
[ "latch" proposal ]
This seems reasonably clean as far as signal conditions generated
internally to Postgres go, but I remain unclear on how it helps for
response to actual signals.
You can can set the latch in the signal handle
On 24/08/10 04:08, Alvaro Herrera wrote:
Excerpts from Tom Lane's message of lun ago 23 19:44:02 -0400 2010:
Heikki Linnakangas writes:
[ "latch" proposal ]
This seems reasonably clean as far as signal conditions generated
internally to Postgres go, but I remain unclear on how it helps for
Excerpts from Tom Lane's message of lun ago 23 19:44:02 -0400 2010:
> Heikki Linnakangas writes:
> > [ "latch" proposal ]
>
> This seems reasonably clean as far as signal conditions generated
> internally to Postgres go, but I remain unclear on how it helps for
> response to actual signals.
Thi
Heikki Linnakangas writes:
> On 20/08/10 17:28, Tom Lane wrote:
>> Well, yes they are. They cause unnecessary process wakeups and thereby
>> consume cycles even when the database is idle. See for example a
>> longstanding complaint here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=252129
>>
On 20/08/10 17:28, Tom Lane wrote:
[ It's way past time to change the thread title ]
Heikki Linnakangas writes:
On 20/08/10 16:24, Tom Lane wrote:
You keep on proposing solutions that only work for walsender :-(.
Well yes, the other places where we use pg_usleep() are not really a
problem
[ It's way past time to change the thread title ]
Heikki Linnakangas writes:
> On 20/08/10 16:24, Tom Lane wrote:
>> You keep on proposing solutions that only work for walsender :-(.
> Well yes, the other places where we use pg_usleep() are not really a
> problem as is.
Well, yes they are. Th
72 matches
Mail list logo