Re: [HACKERS] assertion failure 9.3.4

2014-04-24 Thread Alvaro Herrera
Alvaro Herrera wrote: > Alvaro Herrera wrote: > > > I'm thinking about the comparison of full infomask as you propose > > instead of just the bits that we actually care about. I think the only > > thing that could cause a spurious failure (causing an extra execution of > > the HeapTupleSatisfies

Re: [HACKERS] assertion failure 9.3.4

2014-04-23 Thread Alvaro Herrera
Alvaro Herrera wrote: > I'm thinking about the comparison of full infomask as you propose > instead of just the bits that we actually care about. I think the only > thing that could cause a spurious failure (causing an extra execution of > the HeapTupleSatisfiesUpdate call and the stuff below) i

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 05:07 PM, Alvaro Herrera wrote: > If you want to make it easier to reproduce, you need to insert some > pg_usleep() calls in carefully selected spots. As Andres says, the > window is small normally. Yeah, but the whole point of this is that having pg_stat_statements/auto_explain loa

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Andres Freund wrote: > On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote: > > Thanks for the analysis and patches. I've been playing with this on my > > own a bit, and one thing that I just noticed is that at least for > > heap_update I cannot reproduce a problem when the xmax is originally a > >

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Josh Berkus wrote: > > >> In order to encounter this issue, I'd need to have two concurrent > >> processes update the child records of the same parent record? That is: > >> > >> A ---> B1 > >> \---> B2 > >> > >> ... and the issue should only happen if I update both B1 and B2 > >> concurrently i

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 14:49:00 -0700, Josh Berkus wrote: > > >> In order to encounter this issue, I'd need to have two concurrent > >> processes update the child records of the same parent record? That is: > >> > >> A ---> B1 > >> \---> B2 > >> > >> ... and the issue should only happen if I update both

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
>> In order to encounter this issue, I'd need to have two concurrent >> processes update the child records of the same parent record? That is: >> >> A ---> B1 >> \---> B2 >> >> ... and the issue should only happen if I update both B1 and B2 >> concurrently in separate sessions? > > I don't thi

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote: > Thanks for the analysis and patches. I've been playing with this on my > own a bit, and one thing that I just noticed is that at least for > heap_update I cannot reproduce a problem when the xmax is originally a > multixact, so AFAICT the numbe

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 14:40:46 -0700, Josh Berkus wrote: > On 04/22/2014 02:01 PM, Alvaro Herrera wrote: > > Some testing later, I think the issue only occurs if we determine that > > we don't need to wait for the xid/multi to complete, because otherwise > > the wait itself saves us. (It's easy to cause t

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 17:36:42 -0400, Andrew Dunstan wrote: > > On 04/22/2014 05:20 PM, Josh Berkus wrote: > >On 04/22/2014 02:01 PM, Alvaro Herrera wrote: > >>I think I should push this patch first, so that Andrew and Josh can try > >>their respective test cases which should start throwing errors, then >

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 02:01 PM, Alvaro Herrera wrote: > Some testing later, I think the issue only occurs if we determine that > we don't need to wait for the xid/multi to complete, because otherwise > the wait itself saves us. (It's easy to cause the problem by adding a > breakpoint in heapam.c:3325, i.e

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andrew Dunstan
On 04/22/2014 05:20 PM, Josh Berkus wrote: On 04/22/2014 02:01 PM, Alvaro Herrera wrote: I think I should push this patch first, so that Andrew and Josh can try their respective test cases which should start throwing errors, then push the actual fixes. Does that sound okay? Note that I have a

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 02:01 PM, Alvaro Herrera wrote: > I think I should push this patch first, so that Andrew and Josh can try > their respective test cases which should start throwing errors, then > push the actual fixes. Does that sound okay? Note that I have a limited ability to actually test my fail

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Andres Freund wrote: > On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote: > > > > On 04/21/2014 02:54 PM, Andres Freund wrote: > > >Hi, > > > > > >I spent the last two hours poking around in the environment Andrew > > >provided and I was able to reproduce the issue, find an assert to > > >reprodu

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund
On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote: > > On 04/21/2014 02:54 PM, Andres Freund wrote: > >Hi, > > > >I spent the last two hours poking around in the environment Andrew > >provided and I was able to reproduce the issue, find an assert to > >reproduce it much faster and find a possible

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
All, I've taken a stab at creating a reproducible test case based on the characteristics of the production issues I'm seeing. But clearly there's an element I'm missing, because I'm not able to produce the bug with a pgbench-based test case. My current test has FKs, updating both FK'd tables,

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andrew Dunstan
On 04/21/2014 02:54 PM, Andres Freund wrote: Hi, I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. What's the assert that makes it happen faster? That m

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
> Josh, how long does it take you to reproduce the issue? A couple hours. > And can you > reproduce it outside of a production environment? Not yet, still working on that. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund
On 2014-04-21 12:31:09 -0700, Josh Berkus wrote: > On 04/21/2014 12:26 PM, Tom Lane wrote: > > Andres Freund writes: > >> I spent the last two hours poking around in the environment Andrew > >> provided and I was able to reproduce the issue, find an assert to > >> reproduce it much faster and find

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund
On 2014-04-21 15:26:03 -0400, Tom Lane wrote: > Andres Freund writes: > > I spent the last two hours poking around in the environment Andrew > > provided and I was able to reproduce the issue, find an assert to > > reproduce it much faster and find a possible root cause. > > Hmm ... is this the s

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andrew Dunstan
On 04/21/2014 03:26 PM, Tom Lane wrote: Andres Freund writes: I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. Hmm ... is this the same thing Josh is rep

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
On 04/21/2014 12:26 PM, Tom Lane wrote: > Andres Freund writes: >> I spent the last two hours poking around in the environment Andrew >> provided and I was able to reproduce the issue, find an assert to >> reproduce it much faster and find a possible root cause. > > Hmm ... is this the same thing

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Tom Lane
Andres Freund writes: > I spent the last two hours poking around in the environment Andrew > provided and I was able to reproduce the issue, find an assert to > reproduce it much faster and find a possible root cause. Hmm ... is this the same thing Josh is reporting? If so, why the apparent conn

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund
Hi, I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. Since the symptom of the problem seems to be multixacts with more than one updating xid, I added a check

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Tom Lane
Josh Berkus writes: > 1) I've confirmed at the 2nd site that the issue doesn't happen if > pg_stat_statements.so is not loaded. So this seems to be confirmation > that either auto_explain, pg_stat_statements, or both need to be loaded > (but not necessarily created as extensions) in order to have

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
> Can you get the infomask bits..? What does pg_controldata report wrt > the MultiXid's? Can't get the infomask bits. pg_controldata attached, with some redactions. Unfortunately, it appears that they've continued to do tests on this system, so the XID counter has advanced somewhat. pg_cont

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote: > 1) I've confirmed at the 2nd site that the issue doesn't happen if > pg_stat_statements.so is not loaded. So this seems to be confirmation > that either auto_explain, pg_stat_statements, or both need to be loaded > (but not necessarily created as extensio

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
All, More on this: 1) I've confirmed at the 2nd site that the issue doesn't happen if pg_stat_statements.so is not loaded. So this seems to be confirmation that either auto_explain, pg_stat_statements, or both need to be loaded (but not necessarily created as extensions) in order to have the iss

Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Josh Berkus
On 04/18/2014 09:42 AM, Andrew Dunstan wrote: > There definitely seems to be something going on involving these two > pre-loaded modules. With both auto_explain and pg_stat_statements > preloaded I can reproduce the error fairly reliably. I have also > reproduced it, but less reliably, with auto_e

Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Andrew Dunstan
On 04/17/2014 10:15 AM, Andrew Dunstan wrote: On 04/16/2014 10:28 PM, Tom Lane wrote: Andrew Dunstan writes: On 04/16/2014 07:19 PM, Tom Lane wrote: Yeah, it would be real nice to see a self-contained test case for this. Well, that might be hard to put together, but I did try running witho

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Andrew Dunstan
On 04/17/2014 09:04 PM, Peter Geoghegan wrote: On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote: track_activity_query_size = 10240 shared_preload_libraries = 'auto_explain,pg_stat_statements' As you can see, auto_explain's log_min_duration hasn't been set, so it shouldn't be doing

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote: > track_activity_query_size = 10240 >shared_preload_libraries = 'auto_explain,pg_stat_statements' > > As you can see, auto_explain's log_min_duration hasn't been set, so it > shouldn't be doing anything very much, I should think. track_a

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Josh Berkus
All, So I have encountered a 2nd report of this issue, or of an issue which sounds very similar: - corruption in two "queue" tables - the tables are written in a high-concurrency, lock-contested environment - user uses SELECT FOR UPDATE with these tables. - pg_stat_statements.so is loaded, but

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Andrew Dunstan
On 04/16/2014 10:28 PM, Tom Lane wrote: Andrew Dunstan writes: On 04/16/2014 07:19 PM, Tom Lane wrote: Yeah, it would be real nice to see a self-contained test case for this. Well, that might be hard to put together, but I did try running without pg_stat_statements and auto_explain loaded an

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Alvaro Herrera
Andrew Dunstan wrote: > > On 04/16/2014 07:19 PM, Tom Lane wrote: > >Alvaro Herrera writes: > >>I'm not quite clear on why the third query, the one in ri_PerformCheck, > >>is invoking a sequence. > >It's not --- SeqNext is the next-tuple function for a sequential scan. > >Nothing to do with seque

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Tom Lane
Andrew Dunstan writes: > On 04/16/2014 07:19 PM, Tom Lane wrote: >> Yeah, it would be real nice to see a self-contained test case for this. > Well, that might be hard to put together, but I did try running without > pg_stat_statements and auto_explain loaded and the error did not occur. > Not s

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Andrew Dunstan
On 04/16/2014 07:19 PM, Tom Lane wrote: Alvaro Herrera writes: I'm not quite clear on why the third query, the one in ri_PerformCheck, is invoking a sequence. It's not --- SeqNext is the next-tuple function for a sequential scan. Nothing to do with sequences. Now, it *is* worth wondering why

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Tom Lane
Alvaro Herrera writes: > I'm not quite clear on why the third query, the one in ri_PerformCheck, > is invoking a sequence. It's not --- SeqNext is the next-tuple function for a sequential scan. Nothing to do with sequences. Now, it *is* worth wondering why the heck a query on the table's primary

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Alvaro Herrera
So, from top to bottom I see the following elements: * backend is executing a query * this query is getting captured by pg_stat_statements * the query is also getting captured by auto_explain, in chain from pg_stat_statements * auto_explain runs the query, which invokes a plpgsql function * this

Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan
On 04/14/2014 10:02 PM, Alvaro Herrera wrote: Andrew Dunstan wrote: and here the stack trace: #0 0x00361ba36285 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00361ba37b9b in __GI_abort () at abort.c:91 #2 0x0075c157 in ExceptionalC

Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Alvaro Herrera
Andrew Dunstan wrote: > and here the stack trace: > >#0 0x00361ba36285 in __GI_raise (sig=6) at >../nptl/sysdeps/unix/sysv/linux/raise.c:64 >#1 0x00361ba37b9b in __GI_abort () at abort.c:91 >#2 0x0075c157 in ExceptionalCondition >(conditionName=, errorType=,

Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan
On 04/14/2014 09:28 PM, Andrew Dunstan wrote: With a client's code I have just managed to produce the following assertion failure on 9.3.4: 2014-04-15 01:02:46 GMT [19854] 76299: LOG: execute : select * from "asp_ins_event_task_log"( job_id:=1, event_id:=3164, task_name:='EventUtcC

[HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan
With a client's code I have just managed to produce the following assertion failure on 9.3.4: 2014-04-15 01:02:46 GMT [19854] 76299: LOG: execute : select * from "asp_ins_event_task_log"( job_id:=1, event_id:=3164, task_name:='EventUtcComputeTask', task_status_code:='VALID' , task