Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote: I still haven't categorically ruled out pl/sh yet; that's something to keep in mind. Well, after bisection proved not to be fruitful, I replaced the pl/sh calls with dummy calls that approximated the same behavior and the

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes: ...hm, I spoke too soon. So I deleted everything, and booted up a new instance 9.4 vanilla with asserts on and took no other action. Applying the script with no data activity fails an assertion every single time: TRAP: FailedAssertion(!(flags

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote: I still haven't categorically ruled out pl/sh yet; that's something to keep in mind. Well, after bisection proved not to be fruitful, I replaced

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: ...hm, I spoke too soon. So I deleted everything, and booted up a new instance 9.4 vanilla with asserts on and took no other action. Applying the script with no data activity fails an

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-24 Thread Martijn van Oosterhout
On Thu, Jan 22, 2015 at 03:50:03PM -0600, Merlin Moncure wrote: Quick update: not done yet, but I'm making consistent progress, with several false starts. (for example, I had a .conf problem with the new dynamic shared memory setting and git merrily bisected down to the introduction of the

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-23 Thread Jeff Janes
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure mmonc...@gmail.com wrote: So far, the 'nasty' damage seems to generally if not always follow a checksum failure and the checksum failures are always numerically adjacent. For example: [cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING:

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-23 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure mmonc...@gmail.com wrote: Quick update: not done yet, but I'm making consistent progress, with several false starts. (for example, I had a .conf problem with the new dynamic shared memory setting and git merrily bisected down to the

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-22 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 5:20 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure mmonc...@gmail.com wrote: ISTM the next step is to bisect the problem down over the weekend in order to narrow the search. If that doesn't turn up anything productive

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure mmonc...@gmail.com wrote: Running this test on another set of hardware to verify -- if this turns out to be a false alarm which it may very well be, I can only offer my

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:05 AM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure mmonc...@gmail.com wrote: Running this test on another set of hardware to verify -- if this turns

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Heikki Linnakangas
On 01/16/2015 04:05 PM, Merlin Moncure wrote: On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure mmonc...@gmail.com wrote: Running this test on another set of hardware to verify -- if this turns out to be a false alarm which

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Andres Freund
Hi, On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure mmonc...@gmail.com wrote: Running this test on another set of hardware to verify -- if this turns out to be a

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund and...@2ndquadrant.com wrote: Is there any chance you can package this somehow so that others can run it locally? It looks hard to find the actual bug here without adding instrumentation to postgres. That's possible but involves a lot of

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure mmonc...@gmail.com wrote: Running

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure mmonc...@gmail.com wrote: ISTM the next step is to bisect the problem down over the weekend in order to narrow the search. If that doesn't turn up anything productive I'll look into taking other steps. That might be the quickest way to do

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: It looks very much like that a page has for some reason been moved to a different block number. And that's exactly what Peter found out in his investigation too; an index page was mysteriously copied to a

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Thu, Jan 15, 2015 at 6:02 AM, Merlin Moncure mmonc...@gmail.com wrote: Question: Coming in this morning I did an immediate restart and logged into the database and queried pg_class via index. Everything was fine, and the leftright verify returns nothing. How did it repair itself without

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 8:02 AM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/15/2015 03:23 AM, Peter Geoghegan wrote: So now the question is: how did that inconsistency arise? It didn't necessarily arise at

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, The plot thickens! I looped the test, still stock 9.4 as of this time and went to lunch. When I came back, the database was in recovery mode. Here is the rough sequence of events. Whoa. That looks scary. Did

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Andres Freund
On 2015-01-15 20:15:42 +0100, Andres Freund wrote: WARNING: did not find subXID 14955 in MyProc CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup WARNING: you don't own a lock of type RowExclusiveLock CONTEXT: PL/pgSQL function

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure mmonc...@gmail.com wrote: Since it's possible the database is a loss, do you see any value in bootstrapping it again with checksums turned on? One point of note is that this is a brand spanking new SSD, maybe we need to rule out hardware based

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 4:03 PM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure mmonc...@gmail.com wrote: Since it's possible the database is a loss, do you see any value in bootstrapping it again with checksums turned on? One point of note is that

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure mmonc...@gmail.com wrote: Running this test on another set of hardware to verify -- if this turns out to be a false alarm which it may very well be, I can only offer my apologies! I've never had a new drive fail like that, in that manner. I'll

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 8:50 PM, Peter Geoghegan p...@heroku.com wrote: I am mistaken on one detail here - blocks 2 and 9 are actually fully identical. I still have no idea why, though. So, I've looked at it in more detail and it appears that the page of block 2 split at some point, thereby

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Heikki Linnakangas
On 01/15/2015 03:23 AM, Peter Geoghegan wrote: So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/15/2015 03:23 AM, Peter Geoghegan wrote: So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Andres Freund
Hi, The plot thickens! I looped the test, still stock 9.4 as of this time and went to lunch. When I came back, the database was in recovery mode. Here is the rough sequence of events. Whoa. That looks scary. Did you see (some of) those errors before? Most of them should have been emitted

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund and...@2ndquadrant.com wrote: If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure mmonc...@gmail.com wrote: yeah. via: cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv header; My immediate observation here is that blocks 2 and 9 have
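For anyone repeating this kind of check, the CSV produced by the \copy query above can be scanned mechanically for blocks whose item lists are identical. The following is only a rough sketch, not something posted in the thread: it assumes the CSV has a `page` column plus the `itemoffset`, `ctid` and `data` columns that bt_page_items() returns, the function name `find_identical_pages` is made up for illustration, and the sample rows are synthetic values, not the data from this incident.

```python
import csv
import io
from collections import defaultdict


def find_identical_pages(csv_text):
    """Return groups of page numbers whose item lists are identical.

    Hypothetical helper: expects CSV text with a header containing at least
    the columns page, itemoffset, ctid and data.
    """
    # Collect the items of each page in file order.
    pages = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        pages[int(row["page"])].append(
            (row["itemoffset"], row["ctid"], row["data"])
        )

    # Group pages by their full item list; identical lists share a key.
    by_content = defaultdict(list)
    for page, items in pages.items():
        by_content[tuple(items)].append(page)

    return sorted(sorted(group) for group in by_content.values() if len(group) > 1)


# Synthetic sample rows (made up for illustration only).
SAMPLE = """page,itemoffset,ctid,itemlen,nulls,vars,data
1,1,"(0,1)",16,f,f,01 00 00 00
2,1,"(0,2)",16,f,f,02 00 00 00
2,2,"(0,3)",16,f,f,03 00 00 00
3,1,"(0,2)",16,f,f,02 00 00 00
3,2,"(0,3)",16,f,f,03 00 00 00
"""

if __name__ == "__main__":
    print(find_identical_pages(SAMPLE))
```

Comparing only the item columns says nothing about the page opaque area (sibling links, level, flags); that would need a separate look via bt_page_stats().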

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan p...@heroku.com wrote: This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan p...@heroku.com wrote: Could you write some code to print out the block number (i.e. BlockNumber blkno) if there are more than, say, 5 retries within _bt_moveright()?
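The instrumentation being requested here amounts to a retry counter inside the move-right loop that reports the current block number once the count passes a threshold. The real change would be C inside _bt_moveright(); what follows is only a toy Python model of the idea, under the assumption that sibling links can be represented as a plain dictionary. The names `move_right`, `right_links`, `is_target` and `max_steps` are all hypothetical, and the step cap exists only so the model terminates.

```python
RETRY_WARN_THRESHOLD = 5  # "more than, say, 5 retries", as suggested in the thread


def move_right(right_links, start, is_target, max_steps=50):
    """Follow right-links from `start` until `is_target(blkno)` is true.

    Toy model only. Returns (final_blkno, warnings), where `warnings` holds
    the block number observed on every step taken after the retry count
    exceeded RETRY_WARN_THRESHOLD.
    """
    blkno = start
    retries = 0
    warnings = []
    while not is_target(blkno):
        if retries >= max_steps:
            # Safety cap for the model; the real loop has no such bound.
            break
        retries += 1
        if retries > RETRY_WARN_THRESHOLD:
            # This is where the block number would be logged.
            warnings.append(blkno)
        blkno = right_links[blkno]
    return blkno, warnings


if __name__ == "__main__":
    # A healthy chain reaches its target quickly and produces no warnings.
    print(move_right({1: 2, 2: 3}, 1, lambda b: b == 3))
    # A page whose right-link points back at itself never gets anywhere.
    print(move_right({9: 9}, 9, lambda b: False, max_steps=8))
```

In this model a block number that keeps repeating in the warnings is the signal to go inspect that page.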

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure mmonc...@gmail.com wrote: The index is the oid index on pg_class. Some more info: *) temp table churn is fairly high. Several dozen get spawned and destroyed at the start of a replication run, all at once, due to some dodgy coding via dblink.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure mmonc...@gmail.com wrote: (gdb) print BufferGetBlockNumber(buf) $15 = 9 ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure mmonc...@gmail.com wrote: (gdb) print BufferGetBlockNumber(buf) $15 = 9 ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm pretty sure that this is an internal page, no? -- Peter Geoghegan --

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure mmonc...@gmail.com wrote: (gdb) print BufferGetBlockNumber(buf) $15 = 9 ..and it stays 9,

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan p...@heroku.com wrote: My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan p...@heroku.com wrote: My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class Hah, I suspected

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes: There were seven processes with that exact backtrace (except that randomly they are sleeping in the spinloop). Something else interesting: autovacuum has been running all night as well. Unlike the other processes however, cpu utilization does

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:05 AM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: There were seven processes with that exact backtrace (except that randomly they are sleeping in the spinloop). Something else interesting: autovacuum has been running all

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 7105,00:28:40.789221,f,autovacuum:

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: Yes, it is pg_class is coming from LockBufferForCleanup (). As you can see above, it has a shorter runtime. So it was killed off once about a half hour ago which did not free up the logjam. However,

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Hah, I suspected as much. Is that the one that's stuck in LockBufferForCleanup, or the other one that's got a similar backtrace to all the user processes? Do you have a theory? Right now it primarily

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes: Yes, it is pg_class is coming from LockBufferForCleanup (). As you can see above, it has a shorter runtime. So it was killed off once about a half hour ago which did not free up the logjam. However, AV spawned it again and now it does not respond

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund and...@2ndquadrant.com wrote: If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund and...@2ndquadrant.com wrote: If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 7:22 AM, Merlin Moncure mmonc...@gmail.com wrote: I'll try to pull commits that Peter suggested and see if that helps (I'm getting ready to bring the database down). I can send the code off-list if you guys think it'd help. Thanks for the code! I think it would be

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 7:24 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure mmonc...@gmail.com wrote: Some more information what's happening: This is a ghetto logical replication engine that migrates data from SQL Server to postgres, consolidating a

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 8:03 AM, Merlin Moncure mmonc...@gmail.com wrote: Here's a backtrace: #0 0x00750a97 in spin_delay () #1 0x00750b19 in s_lock () #2 0x00750844 in LWLockRelease () #3 0x0073 in LockBuffer () #4 0x004b2db4 in

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Andres Freund and...@2ndquadrant.com wrote: I don't remember seeing _bt_moveright() or _bt_compare() figuring so prominently, where _bt_binsrch() is nowhere to be seen. I can't see a reference to _bt_binsrch() in either profile. Well, we do a _bt_moveright

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure mmonc...@gmail.com wrote: Some more information what's happening: This is a ghetto logical replication engine that migrates data from SQL Server to postgres, consolidating a sharded database into a single set of tables (of which there are only

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm not convinced that Peter is barking up the right tree. I'm noticing that the profiles seem rather skewed towards parser/planner work; so I suspect the contention is probably on access to system catalogs. No idea exactly

[HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. It ran ok several dozen times and froze up with no external trigger. There were at most 8 active backends that were stuck (the loader is threaded to a cap) --

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
Hi, On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. It ran ok several dozen times and froze up with no external trigger. There were at most 8 active

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 4:33 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. It ran ok several

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 2:29 PM, Merlin Moncure mmonc...@gmail.com wrote: On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. Could you give more details, please? For example, I'd like to see representative

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: I'm inclined to think that this is a livelock, and so the problem isn't evident from the structure of the B-Tree, but it can't hurt to check. My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: I'm inclined to think that this is a livelock, and so the problem isn't evident from the structure of the B-Tree, but it can't hurt to check. My guess is rather

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: I'm inclined to think that this is a livelock, and so the problem isn't evident from the structure of the

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund and...@2ndquadrant.com wrote: My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen profiles like this due to exactly that before - and it fits to parallel loading quite well. I'm not saying you're wrong,

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:49 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund and...@2ndquadrant.com wrote: My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen profiles like this due to exactly that before -

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 15:49:33 -0800, Peter Geoghegan wrote: On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund and...@2ndquadrant.com wrote: My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen profiles like this due to exactly that before - and it fits to

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote: I don't remember seeing _bt_moveright() or _bt_compare() figuring so prominently, where _bt_binsrch() is nowhere to be seen. I can't see a reference to _bt_binsrch() in either profile. hm, this is hand compiled now, I

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote: I don't remember seeing _bt_moveright() or _bt_compare() figuring so prominently, where _bt_binsrch() is nowhere to be seen. I can't see a

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:42 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: I'm inclined to think that this

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes: On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan p...@heroku.com wrote: In case it isn't clear, I think that the proximate cause here may well be either one (or both) of commits efada2b8e920adfdf7418862e939925d2acd1b89 and/or

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 19:05:10 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan p...@heroku.com wrote: In case it isn't clear, I think that the proximate cause here may well be either one (or both) of commits