Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 12:47 PM, Tom Lane wrote: > Merlin Moncure writes: >> ...hm, I spoke too soon. So I deleted everything, and booted up a new >> instance 9.4 vanilla with asserts on and took no other action. >> Applying the script with no data activity fails an assertion every >> single tim

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Tom Lane
Merlin Moncure writes: > ...hm, I spoke too soon. So I deleted everything, and booted up a new > instance 9.4 vanilla with asserts on and took no other action. > Applying the script with no data activity fails an assertion every > single time: > TRAP: FailedAssertion("!(flags & 0x0010)", File: "d

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure wrote: > On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure wrote: >> I still haven't categorically ruled out pl/sh yet; that's something to >> keep in mind. > > Well, after bisection proved not to be fruitful, I replaced the pl/sh > calls with dummy c

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure wrote: > I still haven't categorically ruled out pl/sh yet; that's something to > keep in mind. Well, after bisection proved not to be fruitful, I replaced the pl/sh calls with dummy calls that approximated the same behavior and the problem went awa

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-24 Thread Martijn van Oosterhout
On Thu, Jan 22, 2015 at 03:50:03PM -0600, Merlin Moncure wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of th

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-23 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of the featur

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-23 Thread Jeff Janes
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure wrote: > > So far, the 'nasty' damage seems to generally if not always follow a > checksum failure and the checksum failures are always numerically > adjacent. For example: > > [cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING: page > verificat

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-22 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 5:20 PM, Peter Geoghegan wrote: > On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure wrote: >> ISTM the next step is to bisect the problem down over the weekend in >> order to narrow the search. If that doesn't turn up anything >> productive I'll look into taking other s

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas wrote: > It looks very much like a page has for some reason been moved to a > different block number. And that's exactly what Peter found out in his > investigation too; an index page was mysteriously copied to a different > block with ident

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure wrote: > ISTM the next step is to bisect the problem down over the weekend in > order to narrow the search. If that doesn't turn up anything > productive I'll look into taking other steps. That might be the quickest way to do it, provided you c

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund wrote: > Hi, > > On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: >> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: >> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: >> >> Running this test on another set of hardware to verify

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund wrote: > Is there any chance you can package this somehow so that others can run > it locally? It looks hard to find the actual bug here without adding > instrumentation to postgres. That's possible but involves a lot of complexity in the setup be

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Andres Freund
Hi, On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: > > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: > >> Running this test on another set of hardware to verify -- if this > >> turns out to be a false alarm which it may very

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Heikki Linnakangas
On 01/16/2015 04:05 PM, Merlin Moncure wrote: On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: Running this test on another set of hardware to verify -- if this turns out to be a false alarm which it may very well be, I can only of

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:05 AM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: >> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: >>> Running this test on another set of hardware to verify -- if this >>> turns out to be a false alarm which it may very wel

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan wrote: > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: >> Running this test on another set of hardware to verify -- if this >> turns out to be a false alarm which it may very well be, I can only >> offer my apologies! I've never had a new

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote: > Running this test on another set of hardware to verify -- if this > turns out to be a false alarm which it may very well be, I can only > offer my apologies! I've never had a new drive fail like that, in > that manner. I'll burn the other

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 4:03 PM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure wrote: >> Since it's possible the database is a loss, do you see any value in >> bootstrapping it again with checksums turned on? One point of note >> is that this is a brand spanking new SS

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure wrote: > Since it's possible the database is a loss, do you see any value in > bootstrapping it again with checksums turned on? One point of note > is that this is a brand spanking new SSD, maybe we need to rule out > hardware based corruption? hm!

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 1:15 PM, Andres Freund wrote: > Hi, > >> The plot thickens! I looped the test, still stock 9.4 as of this time >> and went to lunch. When I came back, the database was in recovery >> mode. Here is the rough sequence of events. >> > > Whoa. That looks scary. Did you see (s

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Andres Freund
On 2015-01-15 20:15:42 +0100, Andres Freund wrote: > > WARNING: did not find subXID 14955 in MyProc > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > WARNING: you don't own a lock of type RowExclusiveLock > > CONTEXT: PL/pgSQL function cdsrecon

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Andres Freund
Hi, > The plot thickens! I looped the test, still stock 9.4 as of this time > and went to lunch. When I came back, the database was in recovery > mode. Here is the rough sequence of events. > Whoa. That looks scary. Did you see (some of) those errors before? Most of them should have been emitte

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 8:02 AM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas > wrote: >> On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >>> >>> So now the question is: how did that inconsistency arise? It didn't >>> necessarily arise at the time of the (presumed) s

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Thu, Jan 15, 2015 at 6:02 AM, Merlin Moncure wrote: > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? May

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas wrote: > On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >> >> So now the question is: how did that inconsistency arise? It didn't >> necessarily arise at the time of the (presumed) split of block 2 to >> create 9. It could be that the opaque area

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Heikki Linnakangas
On 01/15/2015 03:23 AM, Peter Geoghegan wrote: So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-15 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 8:50 PM, Peter Geoghegan wrote: > I am mistaken on one detail here - blocks 2 and 9 are actually fully > identical. I still have no idea why, though. So, I've looked at it in more detail and it appears that the page of block 2 split at some point, thereby creating a new pa

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure wrote: > yeah. via: > cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', > s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv > header; My immediate observation here is that blocks 2 and 9 have identical metadata (fr

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan wrote: > This is great, but it's not exactly clear which bt_page_items() page > is which - some are skipped, but I can't be sure which. Would you mind > rewriting that query to indicate which block is under consideration by > bt_page_items()? yeah.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote: >> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: >>> (gdb) print BufferGetBlockNumber(buf) >>> $15 = 9 >>> >>> ..and it stays 9, continuing several times having set breakpoi

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure wrote: > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroyed at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the re

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote: > On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: >> (gdb) print BufferGetBlockNumber(buf) >> $15 = 9 >> >> ..and it stays 9, continuing several times having set breakpoint. > > > And the index involved? I'm pretty sure that this is

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: > (gdb) print BufferGetBlockNumber(buf) > $15 = 9 > > ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm pretty sure that this is an internal page, no? -- Peter Geoghegan

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan wrote: > On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: >> Could you write some code to print out the block number (i.e. >> "BlockNumber blkno") if there are more than, say, 5 retries within >> _bt_moveright()? > > Obviously I mean that t

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: > Could you write some code to print out the block number (i.e. > "BlockNumber blkno") if there are more than, say, 5 retries within > _bt_moveright()? Obviously I mean that the block number should be printed, no matter whether or not the P

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 11:49 AM, Merlin Moncure wrote: > so it looks like nobody ever exits from _bt_moveright. any last > requests before I start bisecting down? Could you write some code to print out the block number (i.e. "BlockNumber blkno") if there are more than, say, 5 retries within _b

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund wrote: > On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: >> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund >> wrote: >> > If you gdb in, and type 'fin' a couple times, to wait till the function >> > finishes, is there actually any progress? I'm

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Peter Geoghegan
On Wed, Jan 14, 2015 at 7:22 AM, Merlin Moncure wrote: > I'll try to pull commits that Peter suggested and see if that helps > (I'm getting ready to bring the database down). I can send the code > off-list if you guys think it'd help. Thanks for the code! I think it would be interesting to see

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > > If you gdb in, and type 'fin' a couple times, to wait till the function > > finishes, is there actually any progress? I'm wondering whether it's > > just many catalog accesses + contenti

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > If you gdb in, and type 'fin' a couple times, to wait till the function > finishes, is there actually any progress? I'm wondering whether it's > just many catalog accesses + contention, or some other > problem. Alternatively set a breakpoint

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > >> Merlin Moncure writes: > >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > >> >> What are the autovac processes doing (accordin

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Merlin Moncure writes: >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >> >> What are the autovac processes doing (according to pg_stat_activity)? >> >> > pid,running,waiting,query >> >

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: > Merlin Moncure writes: > > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > > can see above, it has a shorter runtime. So it was killed off once > > about a half hour ago which did not free up the logjam. However, AV > > spaw

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Andres Freund writes: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Hah, I suspected as much. Is that the one that's stuck in >> LockBufferForCleanup, or the other one that's got a similar backtrace >> to all the user processes? > Do you have a theory? Right now it primarily looks like cont

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure writes: > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > can see above, it has a shorter runtime. So it was killed off once > about a half hour ago which did not free up the logjam. However, AV > spawned it again and now it does not respond to cancel. Int

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Andres Freund
On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > Merlin Moncure writes: > > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > >> What are the autovac processes doing (according to pg_stat_activity)? > > > pid,running,waiting,query > > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 9:05 AM, Tom Lane wrote: > Merlin Moncure writes: >> On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >>> What are the autovac processes doing (according to pg_stat_activity)? > >> pid,running,waiting,query >> 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.p

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure writes: > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >> What are the autovac processes doing (according to pg_stat_activity)? > pid,running,waiting,query > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class Hah, I suspected as much. Is that the one that'

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > Merlin Moncure writes: >> There were seven processes with that exact backtrace (except >> that randomly they are sleeping in the spinloop). Something else >> interesting: autovacuum has been running all night as well. Unlike >> the oth

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Tom Lane
Merlin Moncure writes: > There were seven processes with that exact backtrace (except > that randomly they are sleeping in the spinloop). Something else > interesting: autovacuum has been running all night as well. Unlike > the other processes however, cpu utilization does not register on

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Wed, Jan 14, 2015 at 8:03 AM, Merlin Moncure wrote: > Here's a backtrace: > > #0 0x00750a97 in spin_delay () > #1 0x00750b19 in s_lock () > #2 0x00750844 in LWLockRelease () > #3 0x0073 in LockBuffer () > #4 0x004b2db4 in _bt_relandgetbuf () > #5

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-14 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 7:24 PM, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure wrote: >> Some more information on what's happening: >> This is a ghetto logical replication engine that migrates data from >> sql server to postgres, consolidating a sharded database into a sing

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure wrote: > Some more information on what's happening: > This is a ghetto logical replication engine that migrates data from > sql server to postgres, consolidating a sharded database into a single > set of tables (of which there are only two). There is onl

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Andres Freund wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > Well, we do a _bt_moveright pretty early on,

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 4:05 PM, Tom Lane wrote: > I'm not convinced that Peter is barking up the right tree. I'm noticing > that the profiles seem rather skewed towards parser/planner work; so I > suspect the contention is probably on access to system catalogs. No > idea exactly why though. I

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 19:05:10 -0500, Tom Lane wrote: > Merlin Moncure writes: > > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan wrote: > >> In case it isn't clear, I think that the proximate cause here may well > >> be either one (or both) of commits > >> efada2b8e920adfdf7418862e939925d2acd1b89 and/

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Tom Lane
Merlin Moncure writes: > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan wrote: >> In case it isn't clear, I think that the proximate cause here may well >> be either one (or both) of commits >> efada2b8e920adfdf7418862e939925d2acd1b89 and/or >> 40dae7ec537c5619fc93ad602c62f37be786d161. Probably

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure wrote: >>> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >>> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >>> reference to _bt_binsrch() in ei

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:42 PM, Andres Freund wrote: > On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: >> On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund >> wrote: >> > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> >> I'm inclined to think that this is a livelock, and so the proble

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > hm, this is hand compiled now, I bet the sym

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 15:49:33 -0800, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund wrote: > > My guess is rather that it's contention on the freelist lock via > > StrategyGetBuffer's. I've seen profiles like this due to exactly that > > before - and it fits to parallel loading q

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:49 PM, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund wrote: >> My guess is rather that it's contention on the freelist lock via >> StrategyGetBuffer's. I've seen profiles like this due to exactly that >> before - and it fits to parallel loading

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund wrote: > My guess is rather that it's contention on the freelist lock via > StrategyGetBuffer's. I've seen profiles like this due to exactly that > before - and it fits to parallel loading quite well. I'm not saying you're wrong, but the breakdown of

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: > On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund wrote: > > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > >> I'm inclined to think that this is a livelock, and so the problem > >> isn't evident from the structure of the B-Tree, but it ca

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund wrote: > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> I'm inclined to think that this is a livelock, and so the problem >> isn't evident from the structure of the B-Tree, but it can't hurt to >> check. > > My guess is rather that it's conte

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > I'm inclined to think that this is a livelock, and so the problem > isn't evident from the structure of the B-Tree, but it can't hurt to > check. My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen p

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Peter Geoghegan
On Tue, Jan 13, 2015 at 2:29 PM, Merlin Moncure wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. Could you give more details, please? For example, I'd like to see representative data, or at least th

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On Tue, Jan 13, 2015 at 4:33 PM, Andres Freund wrote: > Hi, > > On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: >> On my workstation today (running vanilla 9.4.0) I was testing some new >> code that does aggressive parallel loading to a couple of tables. It >> ran ok several dozen times and fr

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Andres Freund
Hi, On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. It > ran ok several dozen times and froze up with no external trigger. > There were at most 8 active

[HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-13 Thread Merlin Moncure
On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. It ran ok several dozen times and froze up with no external trigger. There were at most 8 active backends that were stuck (the loader is threaded to a cap) -- eac

[HACKERS] Hung backends

2000-11-23 Thread Schmidt, Peter
Hi, I'm new to PostgreSQL and have been asked to determine the cause of what appear to be hung processes on FreeBSD after one or more frontend apps crash. I did a lot of searching through the msg lists and found a few discussions that seem related, but I was unable to find