Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-18 Thread Heikki Linnakangas
On 02/06/2014 06:42 AM, Peter Geoghegan wrote: I'm not sure about this: *** _bt_findinsertloc(Relation rel, *** 675,680 --- 701,707 static void _bt_insertonpg(Relation rel, Buffer buf, + Buffer cbuf,

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-18 Thread Heikki Linnakangas
On 02/06/2014 01:54 AM, Peter Geoghegan wrote: On Thu, Jan 23, 2014 at 1:36 PM, Peter Geoghegan p...@heroku.com wrote: So while post-recovery callbacks no longer exist for any rmgr-managed-resource, 100% of remaining startup and cleanup callbacks concern the simple management of memory of

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-18 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: Yeah, it's a bit silly that each resource manager has to do that on their own. It would be useful to have a memory context that was automatically reset between each WAL record. In fact that should probably be the default memory context you
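
The idea of a scratch memory area that is reset automatically after every WAL record can be illustrated with a toy bump allocator. This is only a sketch of the general technique, not PostgreSQL's memory context API: `ToyArena`, `toy_alloc`, `toy_reset`, `toy_replay` and `toy_redo` are hypothetical names invented for this example, and WAL records are modeled as plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ARENA_SIZE 1024

/* Hypothetical scratch area; stands in for a per-record memory context. */
typedef struct ToyArena {
    unsigned char buf[TOY_ARENA_SIZE];
    size_t used;
} ToyArena;

/* Bump-allocate n bytes (rounded up to 8); NULL if the arena is full. */
static void *toy_alloc(ToyArena *a, size_t n)
{
    size_t need;

    if (n > TOY_ARENA_SIZE)
        return NULL;
    need = (n + 7) & ~(size_t) 7;
    if (need > TOY_ARENA_SIZE - a->used)
        return NULL;
    a->used += need;
    return a->buf + (a->used - need);
}

/* Release everything allocated since the last reset. */
static void toy_reset(ToyArena *a)
{
    a->used = 0;
}

typedef void (*ToyRedoFn)(ToyArena *scratch, int record);

/*
 * Replay loop: the driver, not each redo routine, resets the scratch
 * area after every record. Returns the peak usage seen for any record.
 */
static size_t toy_replay(ToyArena *scratch, const int *records, size_t n,
                         ToyRedoFn redo)
{
    size_t peak = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        redo(scratch, records[i]);
        if (scratch->used > peak)
            peak = scratch->used;
        toy_reset(scratch);
    }
    return peak;
}

/* Example redo routine: allocates scratch space and never frees it. */
static void toy_redo(ToyArena *scratch, int record)
{
    int *tmp = toy_alloc(scratch, 64);

    if (tmp != NULL)
        tmp[0] = record;
}
```

Because the reset lives in the replay loop, a redo routine that forgets to free its scratch allocations cannot accumulate memory across records; peak usage stays bounded by what a single record needs.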

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-14 Thread Peter Geoghegan
Ping? -- Peter Geoghegan

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-14 Thread Heikki Linnakangas
On 03/14/2014 01:03 PM, Peter Geoghegan wrote: Ping? I committed the other patch this depends on now. I'll take another stab at this one next. - Heikki

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
On Tue, Feb 4, 2014 at 11:56 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Since, as I mentioned, _bt_finish_split() ultimately unlocks *and unpins*, it may not be the same buffer as before, so even with the refactoring there are race conditions. Care to elaborate? Or are you just

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
On Thu, Jan 23, 2014 at 1:36 PM, Peter Geoghegan p...@heroku.com wrote: So while post-recovery callbacks no longer exist for any rmgr-managed-resource, 100% of remaining startup and cleanup callbacks concern the simple management of memory of AM-specific recovery contexts (for GiST, GiN and

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
On Tue, Feb 4, 2014 at 11:56 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I also changed _bt_moveright to never return a write-locked buffer, when the caller asked for a read-lock (an issue you pointed out earlier in this thread). I think that _bt_moveright() looks good now. There is

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
Some more thoughts: Please add comments above _bt_mark_page_halfdead(), a new routine from the dependency patch. I realize that this is substantially similar to part of how _bt_pagedel() used to work, but it's still incongruous. ! Our approach is to create any missing downlinks on-they-fly,

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-04 Thread Heikki Linnakangas
On 02/04/2014 02:40 AM, Peter Geoghegan wrote: On Fri, Jan 31, 2014 at 9:09 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I refactored the loop in _bt_moveright to, well, not have that bug anymore. The 'page' and 'opaque' pointers are now fetched at the beginning of the loop. Did I miss

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-03 Thread Peter Geoghegan
On Fri, Jan 31, 2014 at 9:09 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I refactored the loop in _bt_moveright to, well, not have that bug anymore. The 'page' and 'opaque' pointers are now fetched at the beginning of the loop. Did I miss something? I think so, yes. You still aren't

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-31 Thread Heikki Linnakangas
On 01/30/2014 12:46 AM, Peter Geoghegan wrote: On Mon, Jan 27, 2014 at 10:54 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail,

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-29 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:54 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Heikki Linnakangas
On 01/23/2014 11:36 PM, Peter Geoghegan wrote: The first thing I noticed about this patchset is that it completely expunges btree_xlog_startup(), btree_xlog_cleanup() and btree_safe_restartpoint(). The post-recovery cleanup that previously occurred to address both sets of problems (the problem

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write buffer lock that the caller will have provided (per its comments) - it

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:58 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Okay, promise not to laugh. I did write a bunch of hacks, to generate graphviz .dot files from the btree pages, and render them into pictures. It consists of multiple parts, all in the attached tarball. It's

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-23 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 9:23 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Ok, here's a new version of the patch to handle incomplete B-tree splits. I finally got around to taking a look at this. Unlike with the as-yet uncommitted "Race condition in b-tree page deletion" patch that Kevin

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-25 Thread Heikki Linnakangas
On 22.10.2013 19:55, Heikki Linnakangas wrote: I fixed the same problem in GiST a few years back, by making it tolerate missing downlinks, and inserting them lazily. The B-tree code tolerates them already on scans, but gets confused on insertion, as seen above. I propose that we use the same

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Peter Geoghegan
On Tue, Oct 22, 2013 at 9:55 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I propose that we use the same approach I used with GiST, and add a flag to the page header to indicate the downlink hasn't been inserted yet. When insertion (or vacuum) bumps into a flagged page, it can finish
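
The proposal quoted above (flag a page whose split has not yet been followed by a downlink insertion, and let a later insertion or vacuum finish the job) can be sketched with a toy model. This is not the nbtree code: `ToyPage`, `TOY_INCOMPLETE_SPLIT` and the `toy_*` functions are hypothetical names, the parent page is reduced to a single boolean, and placing the flag on the left half of the split is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, standing in for a BTP_*-style page flag. */
#define TOY_INCOMPLETE_SPLIT 0x01u

typedef struct ToyPage {
    unsigned flags;
    struct ToyPage *right;   /* right sibling, or NULL */
    bool has_downlink;       /* simplification: parent points at this page */
} ToyPage;

/*
 * Stage 1: split 'left' and link in 'right'. The downlink for 'right'
 * does not exist yet, so 'left' is flagged (an assumption of this sketch).
 */
static void toy_split(ToyPage *left, ToyPage *right)
{
    right->flags = 0;
    right->right = left->right;
    right->has_downlink = false;
    left->right = right;
    left->flags |= TOY_INCOMPLETE_SPLIT;
}

/* Stage 2: insert the missing downlink and clear the flag. */
static void toy_finish_split(ToyPage *left)
{
    assert(left->flags & TOY_INCOMPLETE_SPLIT);
    assert(left->right != NULL);
    left->right->has_downlink = true;
    left->flags &= ~TOY_INCOMPLETE_SPLIT;
}

/*
 * Called when an inserter (or vacuum) lands on a page: if an earlier
 * split was interrupted, complete it first. Returns true if it did so.
 */
static bool toy_visit_for_insert(ToyPage *page)
{
    if (page->flags & TOY_INCOMPLETE_SPLIT) {
        toy_finish_split(page);
        return true;
    }
    return false;
}
```

If stage 2 never runs (for example because of an error between the two stages), the flag stays set and the next visitor repairs the tree lazily, so no separate post-recovery cleanup pass is needed in this model.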

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Heikki Linnakangas
On 22.10.2013 20:27, Peter Geoghegan wrote: On Tue, Oct 22, 2013 at 9:55 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I propose that we use the same approach I used with GiST, and add a flag to the page header to indicate the downlink hasn't been inserted yet. When insertion (or

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Peter Geoghegan
On Tue, Oct 22, 2013 at 10:30 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I may be missing something, but there are already plenty of b-tree specific flags. See BTP_* in nbtree.h. I'll just add another to that list. Based on your remarks, I thought that you were intent on directly

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote: Splitting a B-tree page is a two-stage process: First, the page is split, and then a downlink for the new right page is inserted into the parent (which might recurse to split the parent page, too). What happens if inserting the downlink

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Heikki Linnakangas
On 22.10.2013 21:25, Andres Freund wrote: On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote: Splitting a B-tree page is a two-stage process: First, the page is split, and then a downlink for the new right page is inserted into the parent (which might recurse to split the parent page, too).

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 21:29:13 +0300, Heikki Linnakangas wrote: On 22.10.2013 21:25, Andres Freund wrote: On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote: Splitting a B-tree page is a two-stage process: First, the page is split, and then a downlink for the new right page is inserted into the

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2013-10-22 21:29:13 +0300, Heikki Linnakangas wrote: We could put a critical section around the whole recursion that inserts the downlinks, so that you would get a PANIC and the incomplete split mechanism would fix it at recovery. But that would

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Heikki Linnakangas
On 22.10.2013 22:24, Tom Lane wrote: I wonder whether Heikki's approach could be used to remove the need for the incomplete-split-fixup code altogether, thus eliminating a class of recovery failure possibilities. Yes. I intend to do that, too. - Heikki

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 15:24:40 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-10-22 21:29:13 +0300, Heikki Linnakangas wrote: We could put a critical section around the whole recursion that inserts the downlinks, so that you would get a PANIC and the incomplete split

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2013-10-22 15:24:40 -0400, Tom Lane wrote: No, that's hardly a good idea. As Heikki says, that would amount to converting an entirely foreseeable situation into a PANIC. But IIUC this can currently lead to an index giving wrong answers, not

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 16:38:05 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-10-22 15:24:40 -0400, Tom Lane wrote: No, that's hardly a good idea. As Heikki says, that would amount to converting an entirely foreseeable situation into a PANIC. But IIUC this can