On Wed, Jun 17, 2015 at 6:58 AM, Alvaro Herrera
wrote:
> Thomas Munro wrote:
>
>> Thanks. As mentioned elsewhere in the thread, I discovered that the
>> same problem exists for page boundaries, with a different error
>> message. I've tried the attached repro scripts on 9.3.0, 9.3.5, 9.4.1
>> an
Thomas Munro wrote:
> Thanks. As mentioned elsewhere in the thread, I discovered that the
> same problem exists for page boundaries, with a different error
> message. I've tried the attached repro scripts on 9.3.0, 9.3.5, 9.4.1
> and master with the same results:
>
> FATAL: could not access s
Robert Haas wrote:
> On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch wrote:
> > On Thu, Jun 04, 2015 at 05:29:51PM -0400, Robert Haas wrote:
> >> Here's a new version with some more fixes and improvements:
> >
> > I read through this version and found nothing to change. I encourage other
> > hackers t
On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch wrote:
> On Thu, Jun 04, 2015 at 05:29:51PM -0400, Robert Haas wrote:
>> Here's a new version with some more fixes and improvements:
>
> I read through this version and found nothing to change. I encourage other
> hackers to study the patch, though. The
On Fri, Jun 5, 2015 at 1:47 PM, Thomas Munro
wrote:
> On Fri, Jun 5, 2015 at 11:47 AM, Thomas Munro
> wrote:
>> On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas wrote:
>>> Here's a new version with some more fixes and improvements:
>>> [...]
>>
>> With this patch, when I run the script
>> "checkpoint
On Thu, Jun 04, 2015 at 05:29:51PM -0400, Robert Haas wrote:
> Here's a new version with some more fixes and improvements:
I read through this version and found nothing to change. I encourage other
hackers to study the patch, though. The surrounding code is challenging.
> With this version, I'm
On Fri, Jun 5, 2015 at 11:47 AM, Thomas Munro
wrote:
> On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas wrote:
>> Here's a new version with some more fixes and improvements:
>>
>> - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset
>> when the oldest offset became known if the now-k
On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas wrote:
> Here's a new version with some more fixes and improvements:
>
> - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset
> when the oldest offset became known if the now-known value happened to
> be zero. Fixed.
>
> - SetOffsetVac
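
[Editor's sketch] The first fix listed above (a known oldest offset that happens to be zero) can be pictured with a toy model. This is not the code from multixact.c. It only assumes, hypothetically, that the earlier logic let zero double as "not known yet", and shows how a separate flag avoids that. Every name except oldestOffset is invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactOffset;

/* Hypothetical stand-in for the shared state discussed in the thread. */
typedef struct
{
    MultiXactOffset oldestOffset;
    bool            oldestOffsetKnown;
} ToyMultiXactState;

/*
 * Buggy shape (assumed for illustration): zero is treated as "unknown",
 * so a genuinely known offset of 0 is never recorded.
 */
static void
toy_set_offset_buggy(ToyMultiXactState *state, MultiXactOffset newOffset)
{
    if (newOffset != 0)
    {
        state->oldestOffset = newOffset;
        state->oldestOffsetKnown = true;
    }
}

/* Fixed shape: the caller states explicitly whether the value is known. */
static void
toy_set_offset_fixed(ToyMultiXactState *state, MultiXactOffset newOffset,
                     bool known)
{
    if (known)
    {
        state->oldestOffset = newOffset;
        state->oldestOffsetKnown = true;
    }
}
```

With the buggy shape, a state holding a stale offset keeps it when the real answer is 0. With the fixed shape, 0 is stored like any other value.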
On Thu, Jun 4, 2015 at 5:29 PM, Robert Haas wrote:
> - Forces aggressive autovacuuming when the control file's
> oldestMultiXid doesn't point to a valid MultiXact and enables member
> wraparound at the next checkpoint following the correction of that
> problem.
Err, enables member wraparound *pro
On Thu, Jun 4, 2015 at 12:57 PM, Robert Haas wrote:
> On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas wrote:
>> Thanks for the review.
>
> Here's a new version. I've fixed the things Alvaro and Noah noted,
> and some compiler warnings about set but unused variables.
>
> I also tested it, and it does
Alvaro Herrera wrote:
> Robert Haas wrote:
>
> > So here's a patch taking a different approach.
>
> I tried to apply this to 9.3 but it's messy because of pgindent. Anyone
> would have a problem with me backpatching a pgindent run of multixact.c?
Done.
--
Álvaro Herrera http://
On Thu, Jun 4, 2015 at 1:27 PM, Andres Freund wrote:
> On 2015-06-04 12:57:42 -0400, Robert Haas wrote:
>> + /*
>> + * Do we need an emergency autovacuum? If we're not sure, assume yes.
>> + */
>> + return !oldestOffsetKnown ||
>> + (nextOffset - oldestOffset > MULTI
Hi,
On 2015-06-04 12:57:42 -0400, Robert Haas wrote:
> + /*
> + * Do we need an emergency autovacuum? If we're not sure, assume yes.
> + */
> + return !oldestOffsetKnown ||
> + (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
I think without teaching a
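
[Editor's sketch] The quoted return expression relies on unsigned arithmetic: subtracting two 32-bit offsets gives the distance between them even after the counter has wrapped. A standalone version is below. The threshold value is an assumption (half of the 32-bit offset space); the real MULTIXACT_MEMBER_SAFE_THRESHOLD is defined in multixact.c and may differ. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactOffset;

/* Assumed value for illustration: half of the 32-bit member offset space. */
#define TOY_MEMBER_SAFE_THRESHOLD (((MultiXactOffset) 0xFFFFFFFF) / 2)

/*
 * Mirrors the shape of the quoted expression: if the oldest offset is not
 * known, answer "yes"; otherwise compare the distance with the threshold.
 */
static bool
toy_needs_emergency_autovacuum(bool oldestOffsetKnown,
                               MultiXactOffset oldestOffset,
                               MultiXactOffset nextOffset)
{
    /* Unsigned subtraction wraps modulo 2^32, so this is a distance. */
    MultiXactOffset used = (MultiXactOffset) (nextOffset - oldestOffset);

    return !oldestOffsetKnown || used > TOY_MEMBER_SAFE_THRESHOLD;
}
```

Note that a next offset numerically smaller than the oldest offset is not an error here; it just means the counter wrapped, and the distance is still computed correctly.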
On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas wrote:
> Thanks for the review.
Here's a new version. I've fixed the things Alvaro and Noah noted,
and some compiler warnings about set but unused variables.
I also tested it, and it doesn't quite work as hoped. If started on a
cluster where oldestMu
On Thu, Jun 4, 2015 at 2:42 AM, Noah Misch wrote:
> I like that change a lot. It's much easier to seek forgiveness for wasting <=
> 28 GiB of disk than for deleting visibility information wrongly.
I'm glad you like it. I concur.
>> 2. If setting the offset stop limit (the point where we refuse
On Wed, Jun 03, 2015 at 04:53:46PM -0400, Robert Haas wrote:
> So here's a patch taking a different approach. In this approach, if
> the multixact whose members we want to look up doesn't exist, we don't
> use a later one (that might or might not be valid). Instead, we
> attempt to cope with the
On Mon, Jun 1, 2015 at 4:55 PM, Noah Misch wrote:
> While testing this (with inconsistent-multixact-fix-master.patch applied,
> FWIW), I noticed a nearby bug with a similar symptom. TruncateMultiXact()
> omits the nextMXact==oldestMXact special case found in each other
> find_multixact_start() ca
Robert Haas wrote:
> So here's a patch taking a different approach.
I tried to apply this to 9.3 but it's messy because of pgindent. Anyone
would have a problem with me backpatching a pgindent run of multixact.c?
Also, you have a new function SlruPageExists, but we already have
SimpleLruDoesPhy
On Wed, Jun 3, 2015 at 8:24 AM, Robert Haas wrote:
> On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund wrote:
>>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>>> > that's actually ok; because th
Andres Freund wrote:
> On 2015-06-03 15:01:46 -0300, Alvaro Herrera wrote:
> > One idea I had was: what if the oldestMulti pointed to another multi
> > earlier in the same 0046 file, so that it is read-as-zeroes (and the
> > file is created), and then a subsequent multixact truncate tries to read
On 2015-06-03 15:01:46 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > That's not necessarily the case though, given how the code currently
> > works. In a bunch of places the SLRUs are accessed *before* having been
> > made consistent by WAL replay. Especially if several checkpoints/vacuum
Alvaro Herrera wrote:
> Really, the whole question of how this code goes past the open() failure
> in SlruPhysicalReadPage baffles me. I don't see any possible way for
> the file to be created ...
Hmm, the checkpointer can call TruncateMultiXact when in recovery, on
restartpoints. I wonder if in
Andres Freund wrote:
> On 2015-06-03 00:42:55 -0300, Alvaro Herrera wrote:
> > Thomas Munro wrote:
> > > On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera
> > > wrote:
> > > > My guess is that the file existed, and perhaps had one or more pages,
> > > > but the wanted page doesn't exist, so we tried
On 2015-06-03 00:42:55 -0300, Alvaro Herrera wrote:
> Thomas Munro wrote:
> > On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera
> > wrote:
> > > My guess is that the file existed, and perhaps had one or more pages,
> > > but the wanted page doesn't exist, so we tried to read but got 0 bytes
> > > ba
Thomas Munro wrote:
> I have finally reproduced that error! See attached repro shell script.
>
> The conditions are:
>
> 1. next multixact == oldest multixact (no active multixacts, pointing
> past the end)
> 2. next multixact would be the first item on a new page (multixact % 2048 ==
> 0)
>
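
[Editor's sketch] The two conditions listed above can be written as a small predicate. The figure 2048 comes from the message itself; the explanation that it is an 8192-byte page divided by a 4-byte offset is an assumption about the default build. Function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Assumed: 8192-byte pages holding 4-byte offsets, hence 2048 per page. */
#define TOY_OFFSETS_PER_PAGE 2048

/* Condition 2: the multixact would be the first item on a new page. */
static bool
toy_is_first_on_page(MultiXactId multi)
{
    return multi % TOY_OFFSETS_PER_PAGE == 0;
}

/* Both conditions together: no active multixacts, and a page boundary. */
static bool
toy_hits_page_boundary_case(MultiXactId oldestMulti, MultiXactId nextMulti)
{
    return nextMulti == oldestMulti && toy_is_first_on_page(nextMulti);
}
```

For example, oldest == next == 4096 satisfies both conditions, while oldest == next == 4097 satisfies only the first.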
On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund wrote:
>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>> > that's actually ok; because the value at the beginning of the segment
>> > can very wel
On Wed, Jun 3, 2015 at 4:48 AM, Thomas Munro
wrote:
> On Wed, Jun 3, 2015 at 3:42 PM, Alvaro Herrera
> wrote:
>> Thomas Munro wrote:
>>> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera
>>> wrote:
>>> > My guess is that the file existed, and perhaps had one or more pages,
>>> > but the wanted pa
On Wed, Jun 3, 2015 at 3:42 PM, Alvaro Herrera wrote:
> Thomas Munro wrote:
>> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera
>> wrote:
>> > My guess is that the file existed, and perhaps had one or more pages,
>> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
>> > back
Thomas Munro wrote:
> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera
> wrote:
> > My guess is that the file existed, and perhaps had one or more pages,
> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
> > back. read() returns 0 in this case but doesn't set errno.
> >
>
On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera wrote:
> My guess is that the file existed, and perhaps had one or more pages,
> but the wanted page doesn't exist, so we tried to read but got 0 bytes
> back. read() returns 0 in this case but doesn't set errno.
>
> I didn't find a way to set things
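
[Editor's sketch] The point about read() is standard POSIX behaviour: a return of 0 means end of file, not failure, and errno is only meaningful after a return of -1. A caller that wants to tell "page not there" apart from "I/O error" has to check the count separately. The helper below is a hypothetical illustration, not the SLRU code, and it ignores EINTR for brevity.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum
{
    TOY_READ_OK,        /* got every byte that was asked for */
    TOY_READ_SHORT,     /* got 0 or too few bytes; errno is not meaningful */
    TOY_READ_ERROR      /* read() returned -1; errno says why */
} ToyReadResult;

/* Read nbytes from fd at its current position and classify the outcome. */
static ToyReadResult
toy_read_exact(int fd, char *buf, size_t nbytes)
{
    ssize_t nread = read(fd, buf, nbytes);

    if (nread < 0)
        return TOY_READ_ERROR;
    if ((size_t) nread < nbytes)
        return TOY_READ_SHORT;
    return TOY_READ_OK;
}
```

Reporting errno for the TOY_READ_SHORT case would print whatever an earlier, unrelated call left behind, which is one way to end up with a misleading error message.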
> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
> > the disk it'll always get one at a segment boundary, right? I'm not sure
> > that's actually ok; because the value at the beginning of the segment
> > can very well end up being a 0, as MaybeExtendOffsetSlru() will have
>
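
[Editor's sketch] The segment-boundary concern can be made concrete with a little arithmetic. If the oldest multixact is inferred from which segment files exist, the best available answer is the first multixact of the oldest segment, which may be earlier than the true oldest value. The constants below are assumptions (2048 offsets per page, as in the thread's "% 2048", and 32 pages per SLRU segment), and all names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t MultiXactId;

#define TOY_OFFSETS_PER_PAGE  2048  /* assumed */
#define TOY_PAGES_PER_SEGMENT 32    /* assumed SLRU segment size */

/* Which offsets segment file a given multixact lives in. */
static uint32_t
toy_multi_to_segment(MultiXactId multi)
{
    return multi / TOY_OFFSETS_PER_PAGE / TOY_PAGES_PER_SEGMENT;
}

/* First multixact stored in a given segment. */
static MultiXactId
toy_segment_start(uint32_t segno)
{
    return segno * TOY_PAGES_PER_SEGMENT * TOY_OFFSETS_PER_PAGE;
}

/*
 * What a directory scan could report for a given true oldest multixact:
 * always the start of the containing segment, never a later value.
 */
static MultiXactId
toy_oldest_from_disk_scan(MultiXactId trueOldest)
{
    return toy_segment_start(toy_multi_to_segment(trueOldest));
}
```

Under these assumptions a true oldest multixact of 70000 would be reported as 65536, the start of segment 1. Whether the entry at that position holds a usable value is exactly the question raised in the quoted text.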
On Tue, Jun 2, 2015 at 4:19 PM, Andres Freund wrote:
> I'm not really convinced tying things closer to having done trimming is
> easier to understand than tying things to recovery having finished.
>
> E.g.
> if (did_trim)
> oldestOffset = GetOldestReferencedOffset(oldest_da
On 2015-06-01 14:22:32 -0400, Robert Haas wrote:
> commit d33b4eb0167f465edb00bd6c0e1bcaa67dd69fe9
> Author: Robert Haas
> Date: Fri May 29 14:35:53 2015 -0400
>
> foo
Hehe!
> diff --git a/src/backend/access/transam/multixact.c
> b/src/backend/access/transam/multixact.c
> index 9568ff1.
On 2015-06-02 11:49:56 -0400, Robert Haas wrote:
> On Tue, Jun 2, 2015 at 11:44 AM, Andres Freund wrote:
> > On 2015-06-02 11:37:02 -0400, Robert Haas wrote:
> >> The exact circumstances under which we're willing to replace a
> >> relminmxid with a newly-computed one that differs are not altogethe
On Tue, Jun 02, 2015 at 11:16:22AM -0400, Robert Haas wrote:
> On Tue, Jun 2, 2015 at 1:21 AM, Noah Misch wrote:
> > On Mon, Jun 01, 2015 at 02:06:05PM -0400, Robert Haas wrote:
> > Granted. Would it be better to update both functions at the same time, and
> > perhaps to make that a master-only
On Tue, Jun 2, 2015 at 11:44 AM, Andres Freund wrote:
> On 2015-06-02 11:37:02 -0400, Robert Haas wrote:
>> The exact circumstances under which we're willing to replace a
>> relminmxid with a newly-computed one that differs are not altogether
>> clear to me, but there's an "if" statement protectin
On Tue, Jun 2, 2015 at 11:36 AM, Andres Freund wrote:
>> That would be a departure from the behavior of every existing release
>> that includes this code based on, to my knowledge, zero trouble
>> reports.
>
> On the other hand we're now at about bug #5 attributable to the odd way
> truncation wo
On 2015-06-02 11:37:02 -0400, Robert Haas wrote:
> The exact circumstances under which we're willing to replace a
> relminmxid with a newly-computed one that differs are not altogether
> clear to me, but there's an "if" statement protecting that logic, so
> there are some circumstances in which we'
On Tue, Jun 2, 2015 at 11:27 AM, Andres Freund wrote:
> On 2015-06-02 11:16:22 -0400, Robert Haas wrote:
>> I'm having trouble figuring out what to do about this. I mean, the
>> essential principle of this patch is that if we can't count on
>> relminmxid, datminmxid, or the control file to be acc
On 2015-06-02 11:29:24 -0400, Robert Haas wrote:
> On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund wrote:
> > But what *definitely* looks wrong to me is that a TruncateMultiXact() in
> > this scenario now (since a couple weeks ago) does a
> > SimpleLruReadPage_ReadOnly() in the members slru via
> >
On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund wrote:
> But what *definitely* looks wrong to me is that a TruncateMultiXact() in
> this scenario now (since a couple weeks ago) does a
> SimpleLruReadPage_ReadOnly() in the members slru via
> find_multixact_start(). That just won't work acceptably whe
On 2015-06-02 11:16:22 -0400, Robert Haas wrote:
> I'm having trouble figuring out what to do about this. I mean, the
> essential principle of this patch is that if we can't count on
> relminmxid, datminmxid, or the control file to be accurate, we can at
> least look at what is present on the disk
On Tue, Jun 2, 2015 at 1:21 AM, Noah Misch wrote:
> On Mon, Jun 01, 2015 at 02:06:05PM -0400, Robert Haas wrote:
>> On Mon, Jun 1, 2015 at 12:46 AM, Noah Misch wrote:
>> > On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
>> >> SetMultiXactIdLimit() bracketed certain parts of its
>> >>
On 2015-06-01 14:22:32 -0400, Robert Haas wrote:
> On Mon, Jun 1, 2015 at 4:58 AM, Andres Freund wrote:
> > The lack of WAL logging actually has caused problems in the 9.3.3 (?)
> > era, where we didn't do any truncation during recovery...
>
> Right, but now we're piggybacking on the checkpoint r
On Mon, Jun 01, 2015 at 02:06:05PM -0400, Robert Haas wrote:
> On Mon, Jun 1, 2015 at 12:46 AM, Noah Misch wrote:
> > On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
> >> SetMultiXactIdLimit() bracketed certain parts of its
> >> logic with if (!InRecovery), but those guards were ineff
Alvaro Herrera wrote:
> Anyway here's a quick script to almost-reproduce the problem.
Meh. Really attached now.
I also wanted to post the error messages we got:
2015-05-27 16:15:17 UTC [4782]: [3-1] user=,db= LOG: entering standby mode
2015-05-27 16:15:18 UTC [4782]: [4-1] user=,db= LOG: resto
Alvaro Herrera wrote:
> Robert Haas wrote:
> > In the process of investigating this, we found a few other things that
> > seem like they may also be bugs:
> >
> > - As noted upthread, replaying an older checkpoint after a newer
> > checkpoint has already happened may lead to similar problems. Th
Thomas Munro wrote:
> > - There's a third possible problem related to boundary cases in
> > SlruScanDirCbRemoveMembers, but I don't understand that one well
> > enough to explain it. Maybe Thomas can jump in here and explain the
> > concern.
>
> I noticed something in passing which is probably n
On Mon, Jun 1, 2015 at 4:58 AM, Andres Freund wrote:
>> I'm probably biased here, but I think we should finish reviewing,
>> testing, and committing my patch before we embark on designing this.
>
> Probably, yes. I am wondering whether doing this immediately won't end
> up making some things simpl
On Mon, Jun 1, 2015 at 12:46 AM, Noah Misch wrote:
> Incomplete review, done in a relative rush:
Thanks.
> On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
>> OK, here's a patch. Actually two patches, differing only in
>> whitespace, for 9.3 and for master (ha!). I now think that t
On 2015-05-31 07:51:59 -0400, Robert Haas wrote:
> > 1) We continue determining the oldest
> > SlruScanDirectory(SlruScanDirCbFindEarliest)
> >on the master to find the oldest offsets segment to
> >truncate. Alternatively, if we determine it to be safe, we could use
> >oldestMulti to f
On Fri, May 29, 2015 at 10:37:57AM +1200, Thomas Munro wrote:
> On Fri, May 29, 2015 at 7:56 AM, Robert Haas wrote:
> > - There's a third possible problem related to boundary cases in
> > SlruScanDirCbRemoveMembers, but I don't understand that one well
> > enough to explain it. Maybe Thomas can j
Incomplete review, done in a relative rush:
On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
> OK, here's a patch. Actually two patches, differing only in
> whitespace, for 9.3 and for master (ha!). I now think that the root
> of the problem here is that DetermineSafeOldestOffset() a
On Sat, May 30, 2015 at 8:55 PM, Andres Freund wrote:
> Is oldestMulti, nextMulti - 1 really suitable for this? Are both
> actually guaranteed to exist in the offsets slru and be valid? Hm. I
> guess you intend to simply truncate everything else, but just in
> offsets?
oldestMulti in theory is t
On 2015-05-30 00:52:37 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > I considered for a second whether the solution for that could be to not
> > truncate while inconsistent - but I think that doesn't solve anything as
> > then we can end up with directories where every single offsets/me
Bruce Momjian wrote:
> I think we need to step back and look at the brain power required to
> unravel the mess we have made regarding multi-xact and fixes. (I bet
> few people can even remember which multi-xact fixes went into which
> releases --- I can't.) Instead of working on actual features,
Andres Freund wrote:
> I considered for a second whether the solution for that could be to not
> truncate while inconsistent - but I think that doesn't solve anything as
> then we can end up with directories where every single offsets/member
> file exists.
Hang on a minute. We don't need to scan
On Sat, May 30, 2015 at 1:46 PM, Andres Freund wrote:
> On 2015-05-29 15:08:11 -0400, Robert Haas wrote:
>> It seems pretty clear that we can't effectively determine anything
>> about member wraparound until the cluster is consistent.
>
> I wonder if this doesn't actually hint at a bigger problem
On Fri, May 29, 2015 at 9:46 PM, Andres Freund wrote:
> On 2015-05-29 15:08:11 -0400, Robert Haas wrote:
>> It seems pretty clear that we can't effectively determine anything
>> about member wraparound until the cluster is consistent.
>
> I wonder if this doesn't actually hint at a bigger problem
On Fri, May 29, 2015 at 3:08 PM, Robert Haas wrote:
> It won't fix the fact that pg_upgrade is putting
> a wrong value into everybody's datminmxid field, which should really
> be addressed too, but I've been working on this for about three days
> virtually non-stop and I don't have the energy to t
On 2015-05-29 15:08:11 -0400, Robert Haas wrote:
> It seems pretty clear that we can't effectively determine anything
> about member wraparound until the cluster is consistent.
I wonder if this doesn't actually hint at a bigger problem. Currently,
to determine where we need to truncate SlruScanD
On 2015-05-29 15:49:53 -0400, Bruce Momjian wrote:
> I think we need to step back and look at the brain power required to
> unravel the mess we have made regarding multi-xact and fixes. (I bet
> few people can even remember which multi-xact fixes went into which
> releases --- I can't.) Instead o
On 2015-05-30 10:55:30 +1200, Thomas Munro wrote:
> That's the error message, but then further down:
Ooops.
> "I have confirmed that directory "pg_multixact/members" does not
> exist in the restored data directory.
>
> I can see this directory and the file if I restore a few days old
> backup.
On Sat, May 30, 2015 at 10:48 AM, Andres Freund wrote:
> On 2015-05-30 10:41:01 +1200, Thomas Munro wrote:
>> On Sat, May 30, 2015 at 10:29 AM, Robert Haas wrote:
>> > On Fri, May 29, 2015 at 5:14 PM, Josh Berkus wrote:
>> >> Just saw what looks like a report of this issue on 9.2.
>> >>
>> >> ht
On 2015-05-30 10:41:01 +1200, Thomas Munro wrote:
> On Sat, May 30, 2015 at 10:29 AM, Robert Haas wrote:
> > On Fri, May 29, 2015 at 5:14 PM, Josh Berkus wrote:
> >> Just saw what looks like a report of this issue on 9.2.
> >>
> >> https://github.com/wal-e/wal-e/issues/177
> >
> > Urk. That look
On Sat, May 30, 2015 at 10:29 AM, Robert Haas wrote:
> On Fri, May 29, 2015 at 5:14 PM, Josh Berkus wrote:
>> Just saw what looks like a report of this issue on 9.2.
>>
>> https://github.com/wal-e/wal-e/issues/177
>
> Urk. That looks awfully similar, but I don't think any of the code
> that is a
On Fri, May 29, 2015 at 5:14 PM, Josh Berkus wrote:
> Just saw what looks like a report of this issue on 9.2.
>
> https://github.com/wal-e/wal-e/issues/177
Urk. That looks awfully similar, but I don't think any of the code
that is affected here exists in 9.2, or that any of the fixes involved
we
On Fri, May 29, 2015 at 12:08 PM Robert Haas wrote:
> OK, here's a patch.
>
I grabbed branch REL9_4_STABLE from git, and Robert got me a 9.4-specific
patch. I rebuilt, installed, and postgres started up successfully! I did a
bunch of checks, had our app run several thousand SQL queries against
All,
Just saw what looks like a report of this issue on 9.2.
https://github.com/wal-e/wal-e/issues/177
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql
On Thu, May 28, 2015 at 07:24:26PM -0400, Robert Haas wrote:
> On Thu, May 28, 2015 at 4:06 PM, Joshua D. Drake
> wrote:
> > FTR: Robert, you have been a Samurai on this issue. Our many thanks.
>
> Thanks! I really appreciate the kind words.
>
> So, in thinking through this situation further,
On Fri, May 29, 2015 at 12:43 PM, Robert Haas wrote:
> Working on that now.
OK, here's a patch. Actually two patches, differing only in
whitespace, for 9.3 and for master (ha!). I now think that the root
of the problem here is that DetermineSafeOldestOffset() and
SetMultiXactIdLimit() were larg
On Fri, May 29, 2015 at 10:17 AM, Tom Lane wrote:
> Thomas Munro writes:
>> On Fri, May 29, 2015 at 11:24 AM, Robert Haas wrote:
>>> B. We need to change find_multixact_start() to fail softly.
>
>> Here is an experimental WIP patch that changes StartupMultiXact and
>> SetMultiXactIdLimit to find
Thomas Munro writes:
> On Fri, May 29, 2015 at 11:24 AM, Robert Haas wrote:
>> B. We need to change find_multixact_start() to fail softly.
> Here is an experimental WIP patch that changes StartupMultiXact and
> SetMultiXactIdLimit to find the oldest multixact that exists on disk
> (by scanning t
Re: Robert Haas 2015-05-29
> > FTR: Robert, you have been a Samurai on this issue. Our many thanks.
>
> Thanks! I really appreciate the kind words.
I'm still watching with admiration. This list of steps-to-reproduce is
the longest and at the same time best I've ever seen.
If anyone ever asks
On Fri, May 29, 2015 at 11:24 AM, Robert Haas wrote:
> A. Most obviously, we should fix pg_upgrade so that it installs
> chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so
> that we stop creating new instances of this problem. That won't get
> us out of the hole we've dug for ours
On Thu, May 28, 2015 at 10:41 PM, Alvaro Herrera
wrote:
>> 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid
>> values which are equal to the next-mxid counter instead of the correct
>> value; in other words, they are too new.
>
> [ discussion of how the control file's oldestMul
Alvaro Herrera wrote:
> Robert Haas wrote:
>
> > 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid
> > values which are equal to the next-mxid counter instead of the correct
> > value; in other words, they are too new.
>
> What you describe is what happens if you upgrade from 9
Robert Haas wrote:
> 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid
> values which are equal to the next-mxid counter instead of the correct
> value; in other words, they are too new.
What you describe is what happens if you upgrade from 9.2 or earlier.
For this case we use
On Thu, May 28, 2015 at 4:06 PM, Joshua D. Drake wrote:
> FTR: Robert, you have been a Samurai on this issue. Our many thanks.
Thanks! I really appreciate the kind words.
So, in thinking through this situation further, it seems to me that
the situation is pretty dire:
1. If you pg_upgrade to 9
On Fri, May 29, 2015 at 7:56 AM, Robert Haas wrote:
> On Thu, May 28, 2015 at 8:51 AM, Robert Haas wrote:
>> [ speculation ]
>
> [...] However, since
> the vacuum did advance relfrozenxid, it will call vac_truncate_clog,
> which will call SetMultiXactIdLimit, which will propagate the bogus
> dat
Robert Haas wrote:
> On Thu, May 28, 2015 at 8:51 AM, Robert Haas wrote:
> > [ speculation ]
>
> OK, I finally managed to reproduce this, after some off-list help from
> Steve Kehlet (the reporter), Alvaro, and Thomas Munro. Here's how to
> do it:
It's a long list of steps, but if you consider
On 05/28/2015 12:56 PM, Robert Haas wrote:
FTR: Robert, you have been a Samurai on this issue. Our many thanks.
Sincerely,
jD
--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended"
On Thu, May 28, 2015 at 8:51 AM, Robert Haas wrote:
> [ speculation ]
OK, I finally managed to reproduce this, after some off-list help from
Steve Kehlet (the reporter), Alvaro, and Thomas Munro. Here's how to
do it:
1. Install any pre-9.3 version of the server and generate enough
multixacts to
On Thu, May 28, 2015 at 8:03 AM, Robert Haas wrote:
>> Steve, is there any chance we can get your pg_controldata output and a
>> list of all the files in pg_clog?
>
> Err, make that pg_multixact/members, which I assume is at issue here.
> You didn't show us the DETAIL line from this message, which
On Thu, May 28, 2015 at 8:01 AM, Robert Haas wrote:
> On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
> wrote:
>> Steve Kehlet wrote:
>>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>>> just dropped new binaries in place) but it wouldn't start up. I found this
>>> i
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>> just dropped new binaries in place) but it wouldn't start up. I found this
>> in the logs:
>>
>> waiting for server to start2015-05-27 1
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>> just dropped new binaries in place) but it wouldn't start up. I found this
>> in the logs:
>>
>> waiting for server to start2015-05-27 1
On Wed, May 27, 2015 at 10:14 PM, Alvaro Herrera
wrote:
> Well I'm not very clear on what's the problematic case. The scenario I
> actually saw this first reported was a pg_basebackup taken on a very
> large database, so the master could have truncated multixact and the
> standby receives a trunc
Robert Haas wrote:
> On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
> wrote:
> > Steve Kehlet wrote:
> >> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
> >> just dropped new binaries in place) but it wouldn't start up. I found this
> >> in the logs:
> >>
> >> waiting
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>> just dropped new binaries in place) but it wouldn't start up. I found this
>> in the logs:
>>
>> waiting for server to start2015-05-27 1
Steve Kehlet wrote:
> On Wed, May 27, 2015 at 3:21 PM Alvaro Herrera
> wrote:
>
> > I think a patch like this should be able to fix it ... not tested yet.
> >
>
> Thanks Alvaro. I got a compile error, so looked for other uses of
> SimpleLruDoesPhysicalPageExist and added MultiXactOffsetCtl, does
On Wed, May 27, 2015 at 3:21 PM Alvaro Herrera
wrote:
> I think a patch like this should be able to fix it ... not tested yet.
>
Thanks Alvaro. I got a compile error, so looked for other uses of
SimpleLruDoesPhysicalPageExist and added MultiXactOffsetCtl, does this look
right?
+ (!InRecovery |
Steve Kehlet wrote:
> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
> just dropped new binaries in place) but it wouldn't start up. I found this
> in the logs:
>
> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG:
> database system was shut do