Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-02 Thread Duncan Rance
I recently raised "BUG #6425: Bus error in slot_deform_tuple". During the last reproduction of the problem I saw this: Client 2 aborted in state 0: ERROR: invalid memory alloc request size 18446744073709551613 So like Tom said, these two issues could well be related. I just wanted to mention

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-01 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Wed Feb 01 18:06:27 -0300 2012: > Robert Haas writes: > >>> No, I wasn't thinking about a tuple descriptor mismatch. I was > >>> imagining that the page contents themselves might be in flux while > >>> we're trying to read from it. > > > It would be nice to ge

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-01 Thread Tom Lane
Robert Haas writes: >>> No, I wasn't thinking about a tuple descriptor mismatch.  I was >>> imagining that the page contents themselves might be in flux while >>> we're trying to read from it. > It would be nice to get a dump of what PostgreSQL thought the entire > block looked like at the time t

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-01 Thread Robert Haas
On Wed, Feb 1, 2012 at 11:19 AM, Tom Lane wrote: > Robert Haas writes: >> No, I wasn't thinking about a tuple descriptor mismatch.  I was >> imagining that the page contents themselves might be in flux while >> we're trying to read from it. > > Oh, gotcha.  Yes, that's a horribly plausible idea.

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-01 Thread Tom Lane
Robert Haas writes: > No, I wasn't thinking about a tuple descriptor mismatch. I was > imagining that the page contents themselves might be in flux while > we're trying to read from it. Oh, gotcha. Yes, that's a horribly plausible idea. All it'd take is one WAL replay routine that hasn't been

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-01 Thread Robert Haas
On Tue, Jan 31, 2012 at 4:25 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane wrote: >>> BTW, after a bit more reflection it occurs to me that it's not so much >>> that the data is necessarily *bad*, as that it seemingly doesn't match >>> the tuple descript

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-01 Thread Bridget Frey
So here's a better stack trace for the segfault issue (again, just to summarize, since this is a long thread, we're seeing two issues: 1) alloc errors that do not crash the DB (although we modified postgres to panic when this happens in our test environment, and posted a stack earlier) 2) a postgre

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-31 Thread Tom Lane
Robert Haas writes: > On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane wrote: >> BTW, after a bit more reflection it occurs to me that it's not so much >> that the data is necessarily *bad*, as that it seemingly doesn't match >> the tuple descriptor that the backend's trying to interpret it with. > Hm

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-31 Thread Robert Haas
On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane wrote: > I wrote: >> Hm.  The stack trace is definitive that it's finding the bad data in a >> tuple that it's trying to print to the client, not in an index. > > BTW, after a bit more reflection it occurs to me that it's not so much > that the data is ne

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-31 Thread Alvaro Herrera
Excerpts from Bridget Frey's message of Mon Jan 30 18:59:08 -0300 2012: > Anyway, here goes... Maybe a "bt full" could give more insight into what's going on ... > #0 0x003a83e30265 in raise () from /lib64/libc.so.6 > #1 0x003a83e31d10 in abort () from /lib64/libc.so.6 > #2 0x000

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-31 Thread Bridget Frey
We have no DDL whatsoever in the code. We do update rows in the logins table frequently, but we basically have a policy of only doing DDL changes during scheduled upgrades when we bring the site down. We have been discussing this issue a lot and we really haven't come up with anything that would

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-30 Thread Tom Lane
I wrote: > Hm. The stack trace is definitive that it's finding the bad data in a > tuple that it's trying to print to the client, not in an index. BTW, after a bit more reflection it occurs to me that it's not so much that the data is necessarily *bad*, as that it seemingly doesn't match the tupl

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-30 Thread Tom Lane
Bridget Frey writes: > Thanks for the reply, we appreciate your time on this. The alloc error > queries all seem to be selects from a btree primary index. I gave an > example in my initial post from the logins table. Usually for us it > is logins but sometimes we have seen it on a few other tab

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-30 Thread Bridget Frey
Hi Tom, Thanks for the reply, we appreciate your time on this. The alloc error queries all seem to be selects from a btree primary index. I gave an example in my initial post from the logins table. Usually for us it is logins but sometimes we have seen it on a few other tables, and it's always a

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-30 Thread Tom Lane
Bridget Frey writes: > The second error is an invalid memory alloc error that we're getting ~2 > dozen times per day in production. The bt for this alloc error is below. This trace is consistent with the idea that we're getting a corrupt tuple out of a table, although it doesn't entirely preclud

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-30 Thread Bridget Frey
All right, so we were able to get a full bt of the alloc error on a test system. Also, since we have a lot of emails going around on this - I wanted to make it clear that we're seeing *two* production errors, which may or may not be related. (The OP for bug #6200 also sees both issues.) One is a

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-30 Thread Robert Haas
On Sat, Jan 28, 2012 at 8:45 PM, Michael Brauwerman wrote: > We did try that with a postgres 9.1.2, compiled from source with debug > flags, but we got 0x10 bad address in gdb. (Obviously we did it wrong > somehow) > > We will keep trying to get a good set of symbols set up. Hmm. Your backtrace

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-28 Thread Michael Brauwerman
We did try that with a postgres 9.1.2, compiled from source with debug flags, but we got 0x10 bad address in gdb. (Obviously we did it wrong somehow) We will keep trying to get a good set of symbols set up. On Jan 28, 2012 2:34 PM, "Peter Geoghegan" wrote: > On 28 January 2012 21:34, Michael Bra

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-28 Thread Peter Geoghegan
On 28 January 2012 21:34, Michael Brauwerman wrote: > We have the (5GB) core file, and are happy to do any more forensics anyone > can advise. Ideally, you'd be able to install debug information packages, which should give a more detailed and useful stack trace, as described here: http://wiki.po

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-28 Thread Michael Brauwerman
I work with Bridget at Redfin. We have a core dump from a once-in-5-days (multi-million queries) hot standby segfault in pg 9.1.2. (It might or might not be the same root issue as the "alloc" errors. If I should file a new bug report, let me know.) The postgres executable that crashed did not have de

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-27 Thread Bridget Frey
Thanks for the info - that's very helpful. We had also noted that the alloc seems to be -3 bytes. We have run pg_check and it found no instances of corruption. We've also replayed queries that have failed, and have never been able to get the same query to fail twice. In the case you investigated
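The "-3 bytes" reading of the reported request size can be checked with a little arithmetic. The sketch below is not from the thread; it assumes the size is a 64-bit unsigned value, and the helper name `as_signed_64` is made up for illustration.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 unsigned bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


# Size taken from the error message quoted in this thread.
reported = 18446744073709551613
print(as_signed_64(reported))
```

In other words, the huge number in the log is what a small negative length looks like once it has been converted to an unsigned size.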

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-27 Thread Robert Haas
On Fri, Jan 27, 2012 at 1:31 PM, Bridget Frey wrote: > Thanks for the info - that's very helpful.  We had also noted that the alloc > seems to be -3 bytes.  We have run pg_check and it found no instances of > corruption. We've also replayed queries that have failed, and have never > been able to g

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-27 Thread Robert Haas
On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey wrote: > Hello, > We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an > issue that seems very similar to the one reported as bug 6200.  We see > approximately 2 dozen alloc errors per day across 3 slaves, and we are > getting o

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-01-23 Thread Bridget Frey
Hello, We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an issue that seems very similar to the one reported as bug 6200. We see approximately 2 dozen alloc errors per day across 3 slaves, and we are getting one segfault approximately every 3 days. We did not experience t

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2011-09-09 Thread Simon Riggs
On Thu, Sep 8, 2011 at 11:33 PM, Daniel Farina wrote: >  ERROR: invalid memory alloc request size 18446744073709551613 > At least once, a hot standby was promoted to a primary and the errors seem > to discontinue, but then reappear on a newly-provisioned standby. So the query that fails is a bt

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2011-09-09 Thread Heikki Linnakangas
On 09.09.2011 18:02, Tom Lane wrote: The way that I'd personally proceed to investigate it would probably be to change the "invalid memory alloc request size" size errors (in src/backend/utils/mmgr/mcxt.c; there are about four occurrences) from ERROR to PANIC so that they'll provoke a core dump,
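For readers unfamiliar with the error being discussed, here is a rough Python model of the kind of size check that rejects such a request. It is only a sketch: the limit of 0x3FFFFFFF (1 GB minus one byte) is an assumption recalled from PostgreSQL's memutils.h, and the function names are hypothetical, not PostgreSQL APIs.

```python
# Assumed upper bound on a single allocation request (1 GB - 1).
MAX_ALLOC_SIZE = 0x3FFFFFFF


def to_size_t(n: int) -> int:
    """Model the conversion of a signed length to a 64-bit unsigned size."""
    return n % 2**64


def alloc_size_is_valid(size: int) -> bool:
    """Return True if an unsigned request size is within the assumed limit."""
    return 0 <= size <= MAX_ALLOC_SIZE


request = to_size_t(-3)  # a bogus negative length read from a tuple
if not alloc_size_is_valid(request):
    print(f"invalid memory alloc request size {request}")
```

In the real server the check raises an ERROR, which aborts only the current query; raising it to PANIC, as suggested in the message above, would bring the process down and leave a core dump to inspect.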

Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2011-09-09 Thread Tom Lane
"Daniel Farina" writes: > A huge thanks to Conrad Irwin of Rapportive for furnishing virtually all the > details of this bug report. This isn't really enough information to reproduce the problem ... > The occurrence rate is somewhere in the one per tens-of-millions of > queries. ... and that st

[BUGS] BUG #6200: standby bad memory allocations on SELECT

2011-09-08 Thread Daniel Farina
The following bug has been logged online:
Bug reference: 6200
Logged by: Daniel Farina
Email address: dan...@heroku.com
PostgreSQL version: 9.0.4
Operating system: Ubuntu 10.04
Description: standby bad memory allocations on SELECT
Details: A huge thanks to Conrad Irw