Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
> > I'm thinking about the comparison of full infomask as you propose
> > instead of just the bits that we actually care about. I think the only
> > thing that could cause a spurious failure (causing an extra execution of
> > the HeapTupleSatisfies
Alvaro Herrera wrote:
> I'm thinking about the comparison of full infomask as you propose
> instead of just the bits that we actually care about. I think the only
> thing that could cause a spurious failure (causing an extra execution of
> the HeapTupleSatisfiesUpdate call and the stuff below) i
On 04/22/2014 05:07 PM, Alvaro Herrera wrote:
> If you want to make it easier to reproduce, you need to insert some
> pg_usleep() calls in carefully selected spots. As Andres says, the
> window is small normally.
Yeah, but the whole point of this is that having
pg_stat_statements/auto_explain loa
Andres Freund wrote:
> On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote:
> > Thanks for the analysis and patches. I've been playing with this on my
> > own a bit, and one thing that I just noticed is that at least for
> > heap_update I cannot reproduce a problem when the xmax is originally a
> >
Josh Berkus wrote:
>
> >> In order to encounter this issue, I'd need to have two concurrent
> >> processes update the child records of the same parent record? That is:
> >>
> >> A ---> B1
> >>  \---> B2
> >>
> >> ... and the issue should only happen if I update both B1 and B2
> >> concurrently i
On 2014-04-22 14:49:00 -0700, Josh Berkus wrote:
>
> >> In order to encounter this issue, I'd need to have two concurrent
> >> processes update the child records of the same parent record? That is:
> >>
> >> A ---> B1
> >>  \---> B2
> >>
> >> ... and the issue should only happen if I update both
>> In order to encounter this issue, I'd need to have two concurrent
>> processes update the child records of the same parent record? That is:
>>
>> A ---> B1
>>  \---> B2
>>
>> ... and the issue should only happen if I update both B1 and B2
>> concurrently in separate sessions?
>
> I don't thi
On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote:
> Thanks for the analysis and patches. I've been playing with this on my
> own a bit, and one thing that I just noticed is that at least for
> heap_update I cannot reproduce a problem when the xmax is originally a
> multixact, so AFAICT the numbe
On 2014-04-22 14:40:46 -0700, Josh Berkus wrote:
> On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
> > Some testing later, I think the issue only occurs if we determine that
> > we don't need to wait for the xid/multi to complete, because otherwise
> > the wait itself saves us. (It's easy to cause t
On 2014-04-22 17:36:42 -0400, Andrew Dunstan wrote:
>
> On 04/22/2014 05:20 PM, Josh Berkus wrote:
> >On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
> >>I think I should push this patch first, so that Andrew and Josh can try
> >>their respective test cases which should start throwing errors, then
>
On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
> Some testing later, I think the issue only occurs if we determine that
> we don't need to wait for the xid/multi to complete, because otherwise
> the wait itself saves us. (It's easy to cause the problem by adding a
> breakpoint in heapam.c:3325, i.e
On 04/22/2014 05:20 PM, Josh Berkus wrote:
On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
I think I should push this patch first, so that Andrew and Josh can try
their respective test cases which should start throwing errors, then
push the actual fixes. Does that sound okay?
Note that I have a
On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
> I think I should push this patch first, so that Andrew and Josh can try
> their respective test cases which should start throwing errors, then
> push the actual fixes. Does that sound okay?
Note that I have a limited ability to actually test my fail
Andres Freund wrote:
> On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote:
> >
> > On 04/21/2014 02:54 PM, Andres Freund wrote:
> > >Hi,
> > >
> > >I spent the last two hours poking around in the environment Andrew
> > >provided and I was able to reproduce the issue, find an assert to
> > >reprodu
On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote:
>
> On 04/21/2014 02:54 PM, Andres Freund wrote:
> >Hi,
> >
> >I spent the last two hours poking around in the environment Andrew
> >provided and I was able to reproduce the issue, find an assert to
> >reproduce it much faster and find a possible
All,
I've taken a stab at creating a reproducible test case based on the
characteristics of the production issues I'm seeing. But clearly
there's an element I'm missing, because I'm not able to produce the bug
with a pgbench-based test case.
My current test has FKs, updating both FK'd tables,
On 04/21/2014 02:54 PM, Andres Freund wrote:
Hi,
I spent the last two hours poking around in the environment Andrew
provided and I was able to reproduce the issue, find an assert to
reproduce it much faster and find a possible root cause.
What's the assert that makes it happen faster? That m
> Josh, how long does it take you to reproduce the issue?
A couple hours.
> And can you
> reproduce it outside of a production environment?
Not yet, still working on that.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@pos
On 2014-04-21 12:31:09 -0700, Josh Berkus wrote:
> On 04/21/2014 12:26 PM, Tom Lane wrote:
> > Andres Freund writes:
> >> I spent the last two hours poking around in the environment Andrew
> >> provided and I was able to reproduce the issue, find an assert to
> >> reproduce it much faster and find
On 2014-04-21 15:26:03 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I spent the last two hours poking around in the environment Andrew
> > provided and I was able to reproduce the issue, find an assert to
> > reproduce it much faster and find a possible root cause.
>
> Hmm ... is this the s
On 04/21/2014 03:26 PM, Tom Lane wrote:
Andres Freund writes:
I spent the last two hours poking around in the environment Andrew
provided and I was able to reproduce the issue, find an assert to
reproduce it much faster and find a possible root cause.
Hmm ... is this the same thing Josh is rep
On 04/21/2014 12:26 PM, Tom Lane wrote:
> Andres Freund writes:
>> I spent the last two hours poking around in the environment Andrew
>> provided and I was able to reproduce the issue, find an assert to
>> reproduce it much faster and find a possible root cause.
>
> Hmm ... is this the same thing
Andres Freund writes:
> I spent the last two hours poking around in the environment Andrew
> provided and I was able to reproduce the issue, find an assert to
> reproduce it much faster and find a possible root cause.
Hmm ... is this the same thing Josh is reporting? If so, why the
apparent conn
Hi,
I spent the last two hours poking around in the environment Andrew
provided and I was able to reproduce the issue, find an assert to
reproduce it much faster and find a possible root cause.
Since the symptom of the problem seems to be multixacts with more than
one updating xid, I added a check
Josh Berkus writes:
> 1) I've confirmed at the 2nd site that the issue doesn't happen if
> pg_stat_statements.so is not loaded. So this seems to be confirmation
> that either auto_explain, pg_stat_statements, or both need to be loaded
> (but not necessarily created as extensions) in order to have
> Can you get the infomask bits..? What does pg_controldata report wrt
> the MultiXid's?
Can't get the infomask bits.
pg_controldata attached, with some redactions. Unfortunately, it
appears that they've continued to do tests on this system, so the XID
counter has advanced somewhat.
pg_cont
* Josh Berkus (j...@agliodbs.com) wrote:
> 1) I've confirmed at the 2nd site that the issue doesn't happen if
> pg_stat_statements.so is not loaded. So this seems to be confirmation
> that either auto_explain, pg_stat_statements, or both need to be loaded
> (but not necessarily created as extensio
All,
More on this:
1) I've confirmed at the 2nd site that the issue doesn't happen if
pg_stat_statements.so is not loaded. So this seems to be confirmation
that either auto_explain, pg_stat_statements, or both need to be loaded
(but not necessarily created as extensions) in order to have the iss
On 04/18/2014 09:42 AM, Andrew Dunstan wrote:
> There definitely seems to be something going on involving these two
> pre-loaded modules. With both auto_explain and pg_stat_statements
> preloaded I can reproduce the error fairly reliably. I have also
> reproduced it, but less reliably, with auto_e
On 04/17/2014 10:15 AM, Andrew Dunstan wrote:
On 04/16/2014 10:28 PM, Tom Lane wrote:
Andrew Dunstan writes:
On 04/16/2014 07:19 PM, Tom Lane wrote:
Yeah, it would be real nice to see a self-contained test case for
this.
Well, that might be hard to put together, but I did try running witho
On 04/17/2014 09:04 PM, Peter Geoghegan wrote:
On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote:
track_activity_query_size = 10240
shared_preload_libraries = 'auto_explain,pg_stat_statements'
As you can see, auto_explain's log_min_duration hasn't been set, so it
shouldn't be doing
On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote:
> track_activity_query_size = 10240
>shared_preload_libraries = 'auto_explain,pg_stat_statements'
>
> As you can see, auto_explain's log_min_duration hasn't been set, so it
> shouldn't be doing anything very much, I should think.
track_a
All,
So I have encountered a 2nd report of this issue, or of an issue which
sounds very similar:
- corruption in two "queue" tables
- the tables are written in a high-concurrency, lock-contested environment
- user uses SELECT FOR UPDATE with these tables.
- pg_stat_statements.so is loaded, but
On 04/16/2014 10:28 PM, Tom Lane wrote:
Andrew Dunstan writes:
On 04/16/2014 07:19 PM, Tom Lane wrote:
Yeah, it would be real nice to see a self-contained test case for this.
Well, that might be hard to put together, but I did try running without
pg_stat_statements and auto_explain loaded an
Andrew Dunstan wrote:
>
> On 04/16/2014 07:19 PM, Tom Lane wrote:
> >Alvaro Herrera writes:
> >>I'm not quite clear on why the third query, the one in ri_PerformCheck,
> >>is invoking a sequence.
> >It's not --- SeqNext is the next-tuple function for a sequential scan.
> >Nothing to do with seque
Andrew Dunstan writes:
> On 04/16/2014 07:19 PM, Tom Lane wrote:
>> Yeah, it would be real nice to see a self-contained test case for this.
> Well, that might be hard to put together, but I did try running without
> pg_stat_statements and auto_explain loaded and the error did not occur.
> Not s
On 04/16/2014 07:19 PM, Tom Lane wrote:
Alvaro Herrera writes:
I'm not quite clear on why the third query, the one in ri_PerformCheck,
is invoking a sequence.
It's not --- SeqNext is the next-tuple function for a sequential scan.
Nothing to do with sequences.
Now, it *is* worth wondering why
Alvaro Herrera writes:
> I'm not quite clear on why the third query, the one in ri_PerformCheck,
> is invoking a sequence.
It's not --- SeqNext is the next-tuple function for a sequential scan.
Nothing to do with sequences.
Now, it *is* worth wondering why the heck a query on the table's primary
So, from top to bottom I see the following elements:
* backend is executing a query
* this query is getting captured by pg_stat_statements
* the query is also getting captured by auto_explain, in chain from
pg_stat_statements
* auto_explain runs the query, which invokes a plpgsql function
* this
On 04/14/2014 10:02 PM, Alvaro Herrera wrote:
Andrew Dunstan wrote:
and here the stack trace:
#0 0x00361ba36285 in __GI_raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00361ba37b9b in __GI_abort () at abort.c:91
#2 0x0075c157 in ExceptionalC
Andrew Dunstan wrote:
> and here the stack trace:
>
>#0 0x00361ba36285 in __GI_raise (sig=6) at
>../nptl/sysdeps/unix/sysv/linux/raise.c:64
>#1 0x00361ba37b9b in __GI_abort () at abort.c:91
>#2 0x0075c157 in ExceptionalCondition
>(conditionName=, errorType=,
On 04/14/2014 09:28 PM, Andrew Dunstan wrote:
With a client's code I have just managed to produce the following
assertion failure on 9.3.4:
2014-04-15 01:02:46 GMT [19854] 76299: LOG: execute :
select * from "asp_ins_event_task_log"( job_id:=1, event_id:=3164,
task_name:='EventUtcC
With a client's code I have just managed to produce the following
assertion failure on 9.3.4:
2014-04-15 01:02:46 GMT [19854] 76299: LOG: execute :
select * from "asp_ins_event_task_log"( job_id:=1, event_id:=3164,
task_name:='EventUtcComputeTask', task_status_code:='VALID'
, task