Re: [HACKERS] Hot standby and b-tree killed items

2009-01-13 Thread Simon Riggs
On Tue, 2008-12-30 at 18:31 +0200, Heikki Linnakangas wrote: > You have to be careful to ignore the flags in read-only transactions > that started in hot standby mode, even if recovery has since ended and > we're in normal operation now. My initial implementation in v6 worked, but had a corner

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-31 Thread Simon Riggs
On Tue, 2008-12-30 at 18:31 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > (a) always ignore LP_DEAD flags we see when reading index during > > recovery. > > This sounds simplest, and it's nice to not clear the flags for the > benefit of transactions running after the recovery is done

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-30 Thread Heikki Linnakangas
Simon Riggs wrote: (a) always ignore LP_DEAD flags we see when reading index during recovery. This sounds simplest, and it's nice to not clear the flags for the benefit of transactions running after the recovery is done. You have to be careful to ignore the flags in read-only transactions t

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-30 Thread Heikki Linnakangas
Simon Riggs wrote: Issues (2) and (3) would go away entirely if both standby and primary always had the same xmin value as a system-wide setting. i.e. the standby and primary are locked together at their xmins. Perhaps that was Heikki's intention in recent suggestions? No, I only suggested tha

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-30 Thread Simon Riggs
On Fri, 2008-12-19 at 09:22 -0500, Greg Stark wrote: > I'm confused shouldn't read-only transactions on the slave just be > hacked to not set any hint bits including lp_delete? It seems there are multiple issues involved and I saw only the first of these initially. I want to explicitly separat

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-29 Thread Heikki Linnakangas
marcin mank wrote: Perhaps we should listen to the people that have said they don't want queries cancelled, even if the alternative is inconsistent answers. I don't like that much. PostgreSQL has traditionally avoided that very hard. It's hard to tell what kind of inconsistencies you'd get, as

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-25 Thread marcin mank
> Perhaps we should listen to the people that have said they don't want > queries cancelled, even if the alternative is inconsistent answers. I think an alternative to that would be "if the wal backlog is too big, let current queries finish and let incoming queries wait till the backlog gets small

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Pavan Deolasee
On Wed, Dec 24, 2008 at 7:18 PM, Simon Riggs wrote: > > > > With respect, I was hoping you might look in the patch and see if you > agree with the way it is handled. No need to remember. The whole > latestRemovedXid concept is designed to do help. > Well, that's common for all cleanup record incl

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs
On Wed, 2008-12-24 at 09:59 -0500, Robert Treat wrote: > I think the uncertainty comes from people's experience with typical > replication > use cases vs a lack of experience with this current implementation. Quite possibly. Publishing user feedback on this will be very important in making t

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Robert Treat
On Wednesday 24 December 2008 08:48:04 Simon Riggs wrote: > On Wed, 2008-12-24 at 17:56 +0530, Pavan Deolasee wrote: > > Again, I haven't seen how frequently queries may get canceled. Or if > > the delay is set to a large value, how far behind standby may get > > during replication, so I can't real

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs
On Wed, 2008-12-24 at 17:56 +0530, Pavan Deolasee wrote: > On Wed, Dec 24, 2008 at 5:26 PM, Simon Riggs wrote: > > > > > > > The patch does go to some trouble to handle that case, as I'm sure > > you've seen. Are you saying that part of the patch is ineffective and > > should be removed, or? > >

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Pavan Deolasee
On Wed, Dec 24, 2008 at 5:26 PM, Simon Riggs wrote: > > > The patch does go to some trouble to handle that case, as I'm sure > you've seen. Are you saying that part of the patch is ineffective and > should be removed, or? > Umm.. are you talking about the "wait" mechanism ? That's the only thing

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs
On Wed, 2008-12-24 at 16:48 +0530, Pavan Deolasee wrote: > On Wed, Dec 24, 2008 at 4:41 PM, Simon Riggs wrote: > > > > > > Greg and Heikki have highlighted in this thread some aspects of btree > > garbage collection that will increase the chance of queries being > > cancelled in various circumsta

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Pavan Deolasee
On Wed, Dec 24, 2008 at 4:41 PM, Simon Riggs wrote: > > > Greg and Heikki have highlighted in this thread some aspects of btree > garbage collection that will increase the chance of queries being > cancelled in various circumstances Even HOT-prune may lead to frequent query cancellations and unli

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs
On Tue, 2008-12-23 at 23:59 -0500, Robert Treat wrote: > On Friday 19 December 2008 19:36:42 Simon Riggs wrote: > > Perhaps we should listen to the people that have said they don't want > > queries cancelled, even if the alternative is inconsistent answers. That > > is easily possible yet is not c

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Robert Treat
On Friday 19 December 2008 19:36:42 Simon Riggs wrote: > Perhaps we should listen to the people that have said they don't want > queries cancelled, even if the alternative is inconsistent answers. That > is easily possible yet is not currently an option. Plus we have the > option I referred to up t

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Robert Treat
On Saturday 20 December 2008 04:10:21 Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Sat, 2008-12-20 at 09:21 +0200, Heikki Linnakangas wrote: > >> Gregory Stark wrote: > >>> Simon Riggs writes: > Increasing the waiting time increases the failover time and thus > decreases the val

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Simon Riggs
On Fri, 2008-12-19 at 14:23 -0600, Kevin Grittner wrote: > > I guess making it that SQLSTATE would make it simpler to understand > why > > the error occurs and also how to handle it (i.e. resubmit). > > Precisely. Just confirming I will implement the SQLSTATE as requested. I recognize my own
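
The "resubmit" handling discussed here can be sketched from the client side. This is a minimal illustration only, assuming the cancelled standby query is reported with SQLSTATE 40001 (serialization_failure). `QueryError` and `run_with_retry` are hypothetical names invented for this sketch and do not belong to any real driver.

```python
import time

SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization_failure


class QueryError(Exception):
    """Hypothetical driver error that carries an SQLSTATE code."""

    def __init__(self, sqlstate, message=""):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(run_query, max_attempts=3, backoff_seconds=0.0):
    """Call run_query(), resubmitting only on SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryError as err:
            # Any other SQLSTATE, or the final attempt, is passed to the caller.
            if err.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)
```

A caller would wrap each read-only query in `run_with_retry`, so that a cancellation on the standby looks like any other retryable failure rather than a hard error.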

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Simon Riggs
On Sat, 2008-12-20 at 20:09 -0300, Alvaro Herrera wrote: > Heikki Linnakangas wrote: > > Gregory Stark wrote: > >> A vacuum being replayed -- even in a different database -- could trigger > >> the > >> error. Or with the btree split issue, a data load -- again even in a > >> different > >> databa

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Simon Riggs
On Sat, 2008-12-20 at 22:07 +0200, Heikki Linnakangas wrote: > Gregory Stark wrote: > > A vacuum being replayed -- even in a different database -- could trigger the > > error. Or with the btree split issue, a data load -- again even in a > > different > > database -- would be quite likely to cause yo

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-21 Thread Alvaro Herrera
Heikki Linnakangas wrote: > Alvaro Herrera wrote: >> Heikki Linnakangas wrote: >>> Gregory Stark wrote: A vacuum being replayed -- even in a different database -- could trigger the error. Or with the btree split issue, a data load -- again even in a different database --

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-21 Thread Heikki Linnakangas
Alvaro Herrera wrote: Heikki Linnakangas wrote: Gregory Stark wrote: A vacuum being replayed -- even in a different database -- could trigger the error. Or with the btree split issue, a data load -- again even in a different database -- would be quite likely to cause your SELECT to be killed. Hmm,

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-20 Thread Alvaro Herrera
Heikki Linnakangas wrote: > Gregory Stark wrote: >> A vacuum being replayed -- even in a different database -- could trigger the >> error. Or with the btree split issue, a data load -- again even in a >> different >> database -- would be quite likely to cause your SELECT to be killed. > > Hmm, I wond

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-20 Thread Heikki Linnakangas
Gregory Stark wrote: A vacuum being replayed -- even in a different database -- could trigger the error. Or with the btree split issue, a data load -- again even in a different database -- would be quite likely to cause your SELECT to be killed. Hmm, I wonder if we should/could track the "latestRe

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-20 Thread Heikki Linnakangas
Simon Riggs wrote: On Sat, 2008-12-20 at 09:21 +0200, Heikki Linnakangas wrote: Gregory Stark wrote: Simon Riggs writes: Increasing the waiting time increases the failover time and thus decreases the value of the standby as an HA system. Others value high availability higher than you and so

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-20 Thread Simon Riggs
On Sat, 2008-12-20 at 09:21 +0200, Heikki Linnakangas wrote: > Gregory Stark wrote: > > Simon Riggs writes: > > > >> Increasing the waiting time increases the failover time and thus > >> decreases the value of the standby as an HA system. Others value high > >> availability higher than you and s

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Heikki Linnakangas
Heikki Linnakangas wrote: Gregory Stark wrote: The question I had was whether your solution for btree pointers marked dead and later dropped from the index works when the user hasn't configured a timeout and doesn't want standby queries killed. Yes, it's not any different from vacuum WAL reco

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Heikki Linnakangas
Gregory Stark wrote: Simon Riggs writes: Increasing the waiting time increases the failover time and thus decreases the value of the standby as an HA system. Others value high availability higher than you and so we had agreed to provide an option to allow the max waiting time to be set. Sure

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 19:29 -0500, Robert Treat wrote: > On Friday 19 December 2008 05:52:42 Simon Riggs wrote: > > BTW, I noticed the other day that Oracle 11g only allows you to have a > > read only slave *or* allows you to continue replaying. You need to > > manually switch back and forth betwe

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 20:54 +0000, Gregory Stark wrote: > "Kevin Grittner" writes: > > > PostgreSQL is much less prone to serialization failures, but it is > > certainly understandable if hot standby replication introduces new > > cases of it. > > In this case it will be possible to get this er

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Robert Treat
On Friday 19 December 2008 05:52:42 Simon Riggs wrote: > BTW, I noticed the other day that Oracle 11g only allows you to have a > read only slave *or* allows you to continue replaying. You need to > manually switch back and forth between those modes. They can't do > *both*, as Postgres will be able

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
>>> Gregory Stark wrote: > I think the fundamental difference is that a deadlock or serialization > failure > can be predicted as a potential problem when writing the code. This is > something that can happen for any query any time, even plain old read-only > select queries. I've heard that

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
>>> Gregory Stark wrote: > "Kevin Grittner" writes: > >> PostgreSQL is much less prone to serialization failures, but it is >> certainly understandable if hot standby replication introduces new >> cases of it. > > In this case it will be possible to get this error even if you're just > runnin

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
"Kevin Grittner" writes: > PostgreSQL is much less prone to serialization failures, but it is > certainly understandable if hot standby replication introduces new > cases of it. In this case it will be possible to get this error even if you're just running a single SELECT query -- and that's the

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
>>> Simon Riggs wrote: > The SQL Standard specifically names this error as thrown when "it > detects the inability to guarantee the serializability of two or more > concurrent SQL-transactions". Now that really should only apply when > running with SERIALIZABLE transactions, I disagree. Data

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
"Kevin Grittner" writes: Simon Riggs wrote: > >> max_standby_delay is set in recovery.conf, value 0 (forever) - > 2,000,000 >> secs, settable in milliseconds. So think of it like a deadlock > detector >> for recovery apply. > > Aha! A deadlock is a type of serialization failure. (In

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
Simon Riggs writes: > Increasing the waiting time increases the failover time and thus > decreases the value of the standby as an HA system. Others value high > availability higher than you and so we had agreed to provide an option > to allow the max waiting time to be set. Sure, it's a nice opt

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 13:47 -0600, Kevin Grittner wrote: > >>> Simon Riggs wrote: > > > max_standby_delay is set in recovery.conf, value 0 (forever) - > 2,000,000 > > secs, settable in milliseconds. So think of it like a deadlock > detector > > for recovery apply. > > Aha! A deadlock is a t

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
>>> Simon Riggs wrote: > max_standby_delay is set in recovery.conf, value 0 (forever) - 2,000,000 > secs, settable in milliseconds. So think of it like a deadlock detector > for recovery apply. Aha! A deadlock is a type of serialization failure. (In fact, on databases with lock-based concur

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 18:59 +0000, Gregory Stark wrote: > Simon Riggs writes: > > > The error message ought to be "snapshot too old", which could raise a > > chuckle, so I called it something else. > > > > The point you raise is a good one and I think we should publish a list > > of retryable er

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
Simon Riggs writes: > The error message ought to be "snapshot too old", which could raise a > chuckle, so I called it something else. > > The point you raise is a good one and I think we should publish a list > of retryable error messages. I contemplated once proposing a special log > level for a

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
>>> Simon Riggs wrote: > I understand the need, but we won't be using SQLSTATE = 40001. > > That corresponds to ERRCODE_T_R_SERIALIZATION_FAILURE, which that error > would not be. Isn't it a problem with serialization of database transactions? You hit it in a different way, but if it is a t

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 10:52 +0000, Simon Riggs wrote: > > You could > > conservatively use OldestXmin as latestRemovedXid, but that could stall > > the WAL redo a lot more than necessary. Or you could store > > latestRemovedXid in the page header, but that would need to be > > WAL-logged to

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 11:54 -0600, Kevin Grittner wrote: > >>> Simon Riggs wrote: > > > If I was going to add anything to the btree page header, it would be > > latestRemovedLSN, only set during recovery. That way we don't have > to > > explicitly kill queries, we can do a wait on OldestXm

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
>>> Simon Riggs wrote: > If I was going to add anything to the btree page header, it would be > latestRemovedLSN, only set during recovery. That way we don't have to > explicitly kill queries, we can do a wait on OldestXmin then let > them ERROR out when they find a page that has been modif

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 09:22 -0500, Greg Stark wrote: > I'm confused shouldn't read-only transactions on the slave just be > hacked to not set any hint bits including lp_delete? They could be, though I see no value in doing so. But that is not Heikki's point. He is discussing what happens on

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Greg Stark
I'm confused shouldn't read-only transactions on the slave just be hacked to not set any hint bits including lp_delete? -- Greg On 19 Dec 2008, at 03:49, Heikki Linnakangas wrote: Whenever a B-tree index scan fetches a heap tuple that turns out to be dead, the B-tree item is marked as k

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 12:24 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > We have infrastructure in place to make this work correctly, just need > > to add latestRemovedXid field to xl_btree_vacuum. So that part is easily > > solved. > > That's tricky because there's no xmin/xmax on i

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Heikki Linnakangas
Simon Riggs wrote: We have infrastructure in place to make this work correctly, just need to add latestRemovedXid field to xl_btree_vacuum. So that part is easily solved. That's tricky because there's no xmin/xmax on index tuples. You could conservatively use OldestXmin as latestRemovedXid, bu

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs
On Fri, 2008-12-19 at 10:49 +0200, Heikki Linnakangas wrote: > Whenever a B-tree index scan fetches a heap tuple that turns out to be > dead, the B-tree item is marked as killed by calling _bt_killitems. When > the page gets full, all the killed items are removed by calling > _bt_vacuum_one_pa
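
The killed-item mechanism described in that opening message can be illustrated with a toy model. This is only a sketch under simplifying assumptions and is not PostgreSQL code: `IndexItem`, `IndexPage`, and the `heap_is_dead` callback are hypothetical names, and the `killed` field stands in for the LP_DEAD flag. A scan marks an index item as killed when its heap tuple turns out to be dead, and a later insert into a full page first discards the killed items.

```python
from dataclasses import dataclass, field


@dataclass
class IndexItem:
    key: int
    heap_tid: int
    killed: bool = False  # stands in for the LP_DEAD flag


@dataclass
class IndexPage:
    capacity: int
    items: list = field(default_factory=list)

    def scan(self, key, heap_is_dead):
        """Return live heap TIDs for key, marking items whose heap tuple is dead."""
        live = []
        for item in self.items:
            if item.key != key or item.killed:
                continue  # other keys and already-killed items are skipped
            if heap_is_dead(item.heap_tid):
                item.killed = True  # remember it so later scans can skip it
            else:
                live.append(item.heap_tid)
        return live

    def insert(self, key, heap_tid):
        """Add an item, first removing killed items if the page is full."""
        if len(self.items) >= self.capacity:
            self.items = [it for it in self.items if not it.killed]
        if len(self.items) >= self.capacity:
            raise RuntimeError("page still full; a real B-tree would split here")
        self.items.append(IndexItem(key, heap_tid))
```

The point the thread turns on is visible even in this toy: removing killed items is triggered by an ordinary insert, not by VACUUM, so a standby replaying that removal needs some way to know which snapshots it may conflict with.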