Just a quick update: we have now deployed the patch to all three of our
production slave databases, and none has experienced an alloc error or
segfault since receiving the patch. So it's looking very good! We would
not be able to deploy the whole 9.1 stable build to our production
environment sin
We deployed the patch to one of our production slaves at 3:30 PM yesterday
(so roughly 20 hours ago), and since then we have not seen any alloc
errors. On Feb 2nd, the last full day in which we ran without the patch,
we saw 13 alloc errors. We're going to continue monitoring this slave, but
we're
individual WAL replay routines. I think we'd better go over
> all of them with a fine-tooth comb. In general, a WAL replay routine
> can no longer be allowed to create transiently invalid page states
> that would not have occurred in the "live" version of the page change.
>
> I am even more troubled than I was before about what this says about
> the amount of testing Hot Standby has gotten, because AFAICS absolutely
> any use of Hot Standby, no matter the particulars, ought to be heavily
> exposed to this bug.
>
> regards, tom lane
>
--
Bridget Frey Director, Data & Analytics Engineering | Redfin
bridget.f...@redfin.com | tel: 206.576.5894
So here's a better stack trace for the segfault issue (again, just to
summarize, since this is a long thread, we're seeing two issues: 1) alloc
errors that do not crash the DB (although we modified postgres to panic
when this happens in our test environment, and posted a stack earlier) 2) a
postgre
We have no DDL whatsoever in the code. We do update rows in the
logins table frequently, but we basically have a policy of only doing
DDL changes during scheduled upgrades when we bring the site down. We
have been discussing this issue a lot and we really haven't come up
with anything that would
e any other thoughts on what we should look at...
On Jan 30, 2012, at 7:01 PM, Tom Lane wrote:
> Bridget Frey writes:
>> The second error is an invalid memory alloc error that we're getting ~2
>> dozen times per day in production. The bt for this alloc
All right, so we were able to get a full bt of the alloc error on a test
system. Also, since we have a lot of emails going around on this - I
wanted to make it clear that we're seeing *two* production errors, which
may or may not be related. (The OP for bug #6200 also sees both issues.)
One is a
bert Haas wrote:
> On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey wrote:
> > Hello,
> > We upgraded to postgres 9.1.2 two weeks ago, and we are also
> > experiencing an issue that seems very similar to the one reported
> > as bug 6200. We see
> > approximately 2 do
osting
about what we're seeing.
Thanks,
-Bridget Frey
Redfin