Greg Smith writes:
> On 1/27/13 2:32 AM, Satoshi Nagayasu wrote:
>> This patch is intended to improve warning message at
>> AtEOXact_Buffers(), but I guess another function,
>> AtProcExit_Buffers(), needs to be modified as well. Right?
> Yes, good catch. I've attached an updated patch that does
On 1/27/13 2:32 AM, Satoshi Nagayasu wrote:
This patch is intended to improve warning message at
AtEOXact_Buffers(), but I guess another function,
AtProcExit_Buffers(), needs to be modified as well. Right?
Yes, good catch. I've attached an updated patch that does the same sort
of modificatio
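Satoshi's point is that AtEOXact_Buffers() and AtProcExit_Buffers() need the same change. The sketch below shows one way to keep the two consistent: the message layout lives in a single function that both callers use. The layout is the one seen in the warnings quoted further down this thread. The `BufInfo` struct and the `format_leak_warning()` helper are hypothetical. Neither is PostgreSQL code, and this is not the attached patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one buffer's bookkeeping. These field
 * names are illustrative only; they are not PostgreSQL's BufferDesc. */
typedef struct
{
    const char *relpath;     /* relation file path, e.g. "base/16384/57610" */
    unsigned    block_num;   /* block number within that relation */
    unsigned    flags;       /* buffer state flags */
    int         local_refs;  /* this backend's pin count */
    int         global_refs; /* shared pin count across all backends */
} BufInfo;

/*
 * Write the leak warning into `out` using the layout of the log lines quoted
 * in this thread. Keeping it in one helper lets both the end-of-transaction
 * check and the process-exit check print identical text.
 * Returns the snprintf() result (the length the full message needs).
 */
static int
format_leak_warning(char *out, size_t len, const BufInfo *b)
{
    assert(out != NULL && b != NULL);
    return snprintf(out, len,
                    "refcount of %s blockNum=%u, flags=0x%x is %d should be 0, globally: %d",
                    b->relpath, b->block_num, b->flags,
                    b->local_refs, b->global_refs);
}
```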
Hi,
I just reviewed this patch.
https://commitfest.postgresql.org/action/patch_view?id=1035
2013/1/13 Greg Smith:
> On 12/26/12 7:23 PM, Greg Stark wrote:
>>
>> It's also possible it's a bad cpu, not bad memory. If it affects
>> decrement or increment in particular it's possible that the patter
On Sun, Jan 13, 2013 at 12:34:07AM -0500, Greg Smith wrote:
> On 12/26/12 7:23 PM, Greg Stark wrote:
> >It's also possible it's a bad cpu, not bad memory. If it affects
> >decrement or increment in particular it's possible that the pattern of
> >usage on LocalRefCount is particularly prone to trigg
On 12/26/12 7:23 PM, Greg Stark wrote:
It's also possible it's a bad cpu, not bad memory. If it affects
decrement or increment in particular it's possible that the pattern of
usage on LocalRefCount is particularly prone to triggering it.
This looks to be the winning answer. It turns out that u
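Greg Stark's theory is that a bad CPU might corrupt increments or decrements specifically. A toy model shows why that fits the symptom. Suppose one extra bit, here bit 30, is set during an otherwise balanced pin/unpin pair on the local count. The matching decrement cannot clear that bit, so the count comes back as exactly 1073741824 instead of 0. The `pin_unpin_cycle()` function is hypothetical. It models only the backend-local counter and says nothing about how the shared count behaves.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical fault model, not PostgreSQL code: one pin increments the
 * backend-local refcount and the matching unpin decrements it. A faulty CPU
 * or memory cell is modeled as setting one extra bit (fault_bit) right after
 * the increment; pass -1 for a fault-free cycle. Unsigned arithmetic keeps
 * any wraparound well defined.
 */
static uint32_t
pin_unpin_cycle(uint32_t local, int fault_bit)
{
    local += 1;                                  /* pin */
    if (fault_bit >= 0 && fault_bit < 32)
        local |= (uint32_t) 1 << fault_bit;      /* injected bit flip */
    local -= 1;                                  /* matching unpin */
    return local;
}
```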
On 12/23/12 3:17 PM, Simon Riggs wrote:
We already have PrintBufferLeakWarning() for this, which might be a bit neater.
It does look like basically the same info. I hacked the code to
generate this warning all the time. Patch from Andres I've been using:
WARNING: refcount of buf 1 contain
On Sun, Dec 30, 2012 at 3:07 AM, Greg Smith wrote:
> It is a strange power of two to be appearing there. I can follow your
> reasoning for why this could be a bit flipping error. There's no sign of
> that elsewhere though, no other crashes under load. I'm using this server
> here because it's w
On 30 December 2012 04:37, Robert Haas wrote:
> On Sat, Dec 29, 2012 at 10:07 PM, Greg Smith wrote:
>> It is a strange power of two to be appearing there. I can follow your
>> reasoning for why this could be a bit flipping error. There's no sign of
>> that elsewhere though, no other crashes und
On Sat, Dec 29, 2012 at 10:07 PM, Greg Smith wrote:
> It is a strange power of two to be appearing there. I can follow your
> reasoning for why this could be a bit flipping error. There's no sign of
> that elsewhere though, no other crashes under load. I'm using this server
> here because it's
On 12/27/12 7:43 AM, Greg Stark wrote:
If it's always the first buffer then it could conceivably still be
some other heap allocated object that always lands before
LocalRefCount. It does seem a bit weird to be storing 1<<30 though --
there are no 1<<30 constants that we might be storing for examp
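Greg Stark's alternative is a stray write from some other heap object that happens to sit just before LocalRefCount. A common way to test that kind of theory is to allocate a guard word (canary) directly in front of the array and check it later. Nobody proposed this in the thread. The sketch below is minimal and all of its names are hypothetical (`GuardedCounts`, `guarded_alloc()`, `guard_intact()`). A clobbered guard points at an overrun from the adjacent object. An intact guard alongside a bad counter 0 makes that explanation less likely. The guard catches only writes that actually touch the guard word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GUARD_VALUE 0xDEADBEEFu  /* arbitrary sentinel pattern */

/* Hypothetical layout: one guard word immediately followed by the per-buffer
 * counters, all in a single allocation. */
typedef struct
{
    uint32_t *raw;    /* raw[0] is the guard word */
    uint32_t *counts; /* raw + 1: counters for buffers 0..n-1 */
} GuardedCounts;

/* Allocate n zeroed counters preceded by a guard word. Returns 0 on success. */
static int
guarded_alloc(GuardedCounts *g, size_t n)
{
    g->raw = calloc(n + 1, sizeof(uint32_t));
    if (g->raw == NULL)
        return -1;
    g->raw[0] = GUARD_VALUE;
    g->counts = g->raw + 1;
    return 0;
}

/* Returns 1 if nothing has overwritten the guard word. */
static int
guard_intact(const GuardedCounts *g)
{
    assert(g->raw != NULL);
    return g->raw[0] == GUARD_VALUE;
}

static void
guarded_free(GuardedCounts *g)
{
    free(g->raw);
    g->raw = NULL;
    g->counts = NULL;
}
```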
On Thu, Dec 27, 2012 at 11:33 AM, Tom Lane wrote:
> Greg Stark writes:
>> On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane wrote:
>>> The thing that this theory has a hard time with is that the buffer's
>>> global refcount is zero. If you assume that there's a bit that
>>> sometimes randomly goes to 1
Greg Stark writes:
> On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane wrote:
>> The thing that this theory has a hard time with is that the buffer's
>> global refcount is zero. If you assume that there's a bit that
>> sometimes randomly goes to 1 when it should be 0, then what I'd expect
>> to typicall
On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane wrote:
> The thing that this theory has a hard time with is that the buffer's
> global refcount is zero. If you assume that there's a bit that
> sometimes randomly goes to 1 when it should be 0, then what I'd expect
> to typically happen is that UnpinBuff
Greg Stark writes:
> On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith wrote:
>> It would be nice if this were just something like a memory issue on this
>> system. That I'm getting the same very odd value every time--this refcount
>> of 1073741824--makes it seem less random than I expect from bad me
On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith wrote:
> It would be nice if this were just something like a memory issue on this
> system. That I'm getting the same very odd value every time--this refcount
> of 1073741824--makes it seem less random than I expect from bad memory.
> Once I get a few
On 12/26/12 5:28 PM, Greg Stark wrote:
Did you ever say what kind of hardware it was? This is the local
reference count so I can't see how it could be a race condition or
anything like that but it sure smells a bit like one.
Agreed, that smell is the reason I'm proceeding so far like this is an
On 12/26/12 5:40 PM, Greg Stark wrote:
Also, do you have the buffer id of the broken buffer? I wonder if it's
not just any buffer but always the same buffer even if it's a
different block in that buffer.
I just added something looking for that.
Before I got to that I found another crash:
On Wed, Dec 26, 2012 at 6:33 PM, Tom Lane wrote:
> Yeah, that destroys my theory that there's something broken about index
> management specifically. Now we're looking for something that can
> affect any buffer's refcount, which more than likely means it has
> nothing to do with the buffer's cont
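Greg Stark asked whether the corruption hits any buffer or always the same buffer slot, even when a different block is in it. Once the buffer id is logged with each crash, that question can be answered mechanically. The sketch below is minimal, and the `CrashRecord` type and both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical crash record: the shared buffer slot that held the bad
 * refcount and the relation block it contained at the time. */
typedef struct
{
    int      buffer_id;
    unsigned block_num;
} CrashRecord;

/* Returns 1 if there is at least one record and all name the same slot. */
static int
same_buffer_every_time(const CrashRecord *r, size_t n)
{
    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++)
        if (r[i].buffer_id != r[0].buffer_id)
            return 0;
    return 1;
}

/* Returns 1 if at least two records name different blocks. */
static int
blocks_differ(const CrashRecord *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (r[i].block_num != r[0].block_num)
            return 1;
    return 0;
}
```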
On 12/26/12 1:58 PM, anara...@anarazel.de wrote:
I don't think it's necessarily only one buffer - if I read the above output
correctly, Greg used the suggested debug output, which just put the elog(WARN)
before the Assert...
Greg, could you output all "bad" buffers and only assert after the loop
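Andres here, and Tom elsewhere in the thread, ask for the same change to the check: report every bad buffer, then fail once after the loop. The sketch below shows that shape. `report_leaked_pins()` is hypothetical and uses a plain `int` array in place of the real refcount array. The 1-based buffer numbering is also an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical version of the requested check: rather than asserting on the
 * first leaked pin, report every buffer whose backend-local refcount is
 * nonzero, then let the caller fail once after the loop.
 * Buffers are reported with 1-based numbers (an assumption for illustration).
 * Returns the number of leaked buffers found.
 */
static int
report_leaked_pins(const int *local_refcount, int nbuffers, FILE *log)
{
    int leaked = 0;

    assert(local_refcount != NULL || nbuffers == 0);
    for (int i = 0; i < nbuffers; i++)
    {
        if (local_refcount[i] != 0)
        {
            fprintf(log, "WARNING: refcount of buf %d is %d should be 0\n",
                    i + 1, local_refcount[i]);
            leaked++;
        }
    }
    return leaked;
}
```

A caller would then do something like `if (report_leaked_pins(refs, n, stderr) > 0) abort();`, so a single crash yields the full list of bad buffers instead of only the first one.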
Tom Lane wrote:
>Greg Smith writes:
>> To try and speed up replicating this problem I switched to a smaller
>> database scale, 100, and I was able to get a crash there. Here's the
>> latest:
>
>> 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610
>> blockNum=118571,
Greg Smith writes:
> To try and speed up replicating this problem I switched to a smaller
> database scale, 100, and I was able to get a crash there. Here's the
> latest:
> 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610
> blockNum=118571, flags=0x106 is 1073741824 shou
To try and speed up replicating this problem I switched to a smaller
database scale, 100, and I was able to get a crash there. Here's the
latest:
2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610
blockNum=118571, flags=0x106 is 1073741824 should be 0, globally: 0
2012-12-
Greg Smith writes:
> I kicked off another test that includes the block number just before Tom
> suggested it, so I should have the block by tomorrow at the latest. The
> range of runtime before crash is 3 to 14 hours so far.
Cool. Once you get the crash, please also capture the contents of th
On 24 December 2012 16:25, Greg Smith wrote:
> On 12/24/12 11:10 AM, Simon Riggs wrote:
>
>> I wonder if you're having a hardware problem?
>
>
> Always possible. I didn't report this until I had replicated the crash and
> seen exactly the same thing twice. I've seen it crash on this assertion 6
On 12/24/12 11:10 AM, Simon Riggs wrote:
I wonder if you're having a hardware problem?
Always possible. I didn't report this until I had replicated the crash
and seen exactly the same thing twice. I've seen it crash on this
assertion 6 times now. Bad hardware is not normally so consistent
On 24 December 2012 16:07, Tom Lane wrote:
> Huh. Looks a bit like overflow of the refcount, which would explain why
> it takes such a long test case to reproduce it. But how could that be
> happening without somebody forgetting to decrement the refcount, which
> ought to lead to a visible fail
On 24 December 2012 15:57, Greg Smith wrote:
> 2012-12-24 06:08:46 EST [26015]: WARNING: refcount of base/16384/49169 is
> 1073741824 should be 0, globally: 0
>
> That is pgbench_accounts_pkey. 1073741824 =
> 0100 0000 0000 0000 0000 0000 0000 0000 = 2^30
>
> Pretty odd value to find in a Priva
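The arithmetic behind "1073741824 = 2^30" is simple: the value has exactly one bit set, bit 30. That is why it looks like a flipped bit rather than a count. It is also below INT32_MAX (2^31 - 1). If the counter is a signed 32-bit integer, which is assumed here, this value is not a wraparound point either. The small helper pair below checks any such value. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if exactly one bit of v is set, i.e. v is a power of two. */
static int
single_bit_set(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns the index of the lowest set bit of v; v must be nonzero. */
static int
lowest_bit_index(uint32_t v)
{
    int i = 0;

    assert(v != 0);
    while ((v & 1u) == 0)
    {
        v >>= 1;
        i++;
    }
    return i;
}
```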
Greg Smith writes:
> I did get some output from the variation Andres suggested. There was
> exactly one screwed up buffer:
> 2012-12-24 06:08:46 EST [26015]: WARNING: refcount of base/16384/49169
> is 1073741824 should be 0, globally: 0
> That is pgbench_accounts_pkey. 1073741824 =
> 0100 0
On 12/23/12 3:17 PM, Simon Riggs wrote:
We already have PrintBufferLeakWarning() for this, which might be a bit neater.
Maybe. I tried using this, and I just got a seg fault within that code.
I can't figure out if I called it incorrectly or if the buffer
involved is so damaged that PrintBuff
On 23 December 2012 21:52, Greg Smith wrote:
> On 12/23/12 3:17 PM, Simon Riggs wrote:
>>
>> If that last change was the cause, then it's caused within VACUUM. I'm
>> running a thrash test with autovacuums set much more frequently but
>> nothing yet.
>
>
> I am not very suspicious of that VACUUM ch
On 12/23/12 3:17 PM, Simon Riggs wrote:
If that last change was the cause, then it's caused within VACUUM. I'm
running a thrash test with autovacuums set much more frequently but
nothing yet.
I am not very suspicious of that VACUUM change; just pointed it out for
completeness' sake.
Are you b
On 23 December 2012 19:42, Greg Smith wrote:
> diff --git a/src/backend/storage/buffer/bufmgr.c
> b/src/backend/storage/buffer/bufmgr.c
> index dddb6c0..df43643 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -1697,11 +1697,21 @@ AtEOXact_Buffer
On 12/23/12 1:10 PM, Tom Lane wrote:
It might also be interesting to know if there is more than one
still-pinned buffer --- that is, if you're going to hack the code, fix
it to elog(LOG) each pinned buffer and then panic after completing the
loop.
Easy enough; I kept it so the actual source o
Andres Freund writes:
>> This is my first test like this against 9.3 development though, so the cause
>> could be an earlier commit. I'm just starting with the most recent work as
>> the first suspect. Next I think I'll try autovacuum=off and see if the
>> crash goes away. Other ideas are welco
Hi,
On 2012-12-23 02:36:42 -0500, Greg Smith wrote:
> I'm testing a checkout from a few days ago and trying to complete a day long
> pgbench stress test, with assertions and debugging on. I want to make sure
> the base code works as expected before moving on to testing checksums. It's
> crashing
I'm testing a checkout from a few days ago and trying to complete a day
long pgbench stress test, with assertions and debugging on. I want to
make sure the base code works as expected before moving on to testing
checksums. It's crashing before finishing though. Here's a sample:
2012-12-20 2