ended, we register that.
I can see one advantage that block nested loop join would retain here:
it does block-based accesses on both sides of the join. Since it
"looks ahead" on both sides of the join, more repeat accesses are
likely to be avoided.
Not too sure how much that matters in practice, though.
--
Peter Geoghegan
ery execution much faster in some specific context, while avoiding
harmful second order effects. Intuitively, I think that it should be
possible to do this with the transformations performed by your patch.
In other words, "helpful serendipity" is an important advantage, while
"harmful anti-serendipity" is what we really want to avoid. Ideally by
making the harmful cases impossible "by construction".
--
Peter Geoghegan
n important condition for
triggering freezing in practice.
Your question about 2 seems equivalent to "why not just always
freeze?". I don't think that that's a bad question -- quite the
opposite. Even trying to give an answer to this question would amount
to getting involved in new work on VACUUM, t
I'm no longer interested in working on VACUUM (though I do
hope that Melanie or somebody else picks up where I left off). I have
nothing to say about any new work in this area. If you want me to do
something in the scope of the work on 16, as a release management
task, please be clear about what that is.
--
Peter Geoghegan
stop working on
VACUUM, though, so I'm afraid I won't be able to offer much help with
any of this. (Happy to give more background information, though.)
--
Peter Geoghegan
A=1t1fx_kmrdovopzxkpa-t...@mail.gmail.com
https://www.postgresql.org/message-id/attachment/146830/routine-vacuuming.html
I've been meaning to get back to that, but other commitments have kept
me from it. I'd welcome your involvement with that effort.
--
Peter Geoghegan
e executor stuff -- it's almost all just describing the
preprocessing/transformation process. It seems as if optimizations
like the one from my patch were considered too obvious to talk about
and/or out of scope by the authors. Thinking about the MDAM paper like
that was what made everything fall into place for me. Remember,
"missing key predicates" isn't all that special.
--
Peter Geoghegan
safety issues notwithstanding).
--
Peter Geoghegan
ing through each distinct set of array keys in the
patch.
What you describe is a problem in theory, but I doubt that it's a
problem in practice. You don't actually have to materialize the
predicates up-front, or at all. Plus you can skip over them using the
next index tuple. So skipping works both ways.
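
A toy sketch of "skipping works both ways", under loud assumptions: the index is modelled as a sorted Python list of (a, b) tuples, the two SAOP arrays are sorted lists of distinct values, and the name scan_with_array_keys is hypothetical. None of this is nbtree code or the patch itself. The current set of array keys is used to skip ahead in the index, and the next index tuple is used to skip ahead in the array keys, so the combinations are never materialized up front.

```python
from bisect import bisect_left


def scan_with_array_keys(index, a_keys, b_keys):
    """Return index tuples matching a IN a_keys AND b IN b_keys, in index order.

    index  -- sorted list of (a, b) tuples (stand-in for the leaf level)
    a_keys -- sorted list of distinct values for the first column
    b_keys -- sorted list of distinct values for the second column
    """
    results = []
    if not a_keys or not b_keys:
        return results

    pos = 0      # current position in the index
    ai = bi = 0  # current position within each array
    while pos < len(index) and ai < len(a_keys):
        target = (a_keys[ai], b_keys[bi])

        # Direction 1: use the current array keys to skip index tuples.
        pos = bisect_left(index, target, lo=pos)
        if pos == len(index):
            break

        tup = index[pos]
        if tup == target:
            # Emit every duplicate of the matching tuple.
            while pos < len(index) and index[pos] == target:
                results.append(index[pos])
                pos += 1
            # Step to the next set of array keys, generated on demand.
            bi += 1
            if bi == len(b_keys):
                bi = 0
                ai += 1
            continue

        # Direction 2: tup > target, so use the tuple to skip array keys.
        ai = bisect_left(a_keys, tup[0], lo=ai)
        if ai == len(a_keys):
            break
        if a_keys[ai] == tup[0]:
            bi = bisect_left(b_keys, tup[1])
            if bi == len(b_keys):
                # No b key can match for this a value; move to the next one.
                bi = 0
                ai += 1
        else:
            bi = 0
    return results


index = sorted([(1, 1), (1, 2), (1, 5), (2, 3), (3, 1), (3, 3), (3, 3), (7, 9)])
matches = scan_with_array_keys(index, [1, 3, 5], [1, 3])
```

In this model only the array-key combinations that the index contents make relevant are ever formed; the rest are skipped by the second bisect step.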
--
Peter Geoghegan
urrently, the optimizer doesn't recognize multi-column indexes with
SAOPs on every column as having a valid sort order, except on the
first column. It seems possible that that has consequences for your
patch. (I'm really only guessing, though; don't trust anything that I
say about the optimizer too much.)
--
Peter Geoghegan
On Mon, Jul 10, 2023 at 10:29 PM Peter Geoghegan wrote:
> > Let's add a src/backend/access/rmgrdesc/README file. We don't currently
> > have any explanation anywhere why the rmgr desc functions are in a
> > separate directory. The README would be a good place to explain tha
term, many queries that don't make use
of SAOPs should ultimately see similar benefits. For example, the
paper also describes transformations that apply to BETWEEN/range
predicates. We might end up needing a third type of expression for
those. They're all just DNF single value predicates, under
another.
I think that you're probably right about it being overly restrictive
-- that was just a starting point for discussion. Perhaps there is an
identifiable class of clauses that can benefit, but don't have the
downside that I'm concerned about.
--
Peter Geoghegan
ust path
in the sense that I've described. This is obviously true because there
can't possibly be index quals/scan keys for non-key columns within the
index AM.
--
Peter Geoghegan
s just an example that jumped out at me.)
Perhaps this example will make my confusion about the boundaries
between each of our patches a bit more understandable. I was confused
-- and I still am. I look forward to being less confused at some point
in the future.
--
Peter Geoghegan
bles skipping, in all
its forms (skipping individual comparisons, skipping whole subsections
of the index, etc).
I'm not saying that this is 100% problem free. But it seems like a
promising high level direction.
> In a way, focusing on the worst case does that by assuming the worst
> combination - which is fine, although it may choose the slower (but
> safer) approach in some cases.
I don't think that it has to be slower on average (even by a tiny
bit). It might just end up being slightly faster on average, and way
faster on occasion.
--
Peter Geoghegan
ke perfect decisions.
The attribute value independence assumption is wishful thinking, in no
small part -- it's quite surprising that it works as well as it does,
really.
--
Peter Geoghegan
to what I need to do with my patch. Right now my
patch assumes that making SAOP clauses into proper index quals (that
usually preserve index ordering) is an unalloyed good (when safe!).
This assumption is approximately true on average, as far as I can
tell. But it's probably quite untrue in various specific cases, that
somebody is bound to care about.
--
Peter Geoghegan
be a good place to explain that,
> and to have the formatting guidelines. See attached.
I agree that it's better this way, though.
--
Peter Geoghegan
We're just keeping our options open in more cases.
(My thinking on these topics was influenced by Goetz Graefe -- "choice
is confusion" [2]).
[1]
https://www.postgresql.org/message-id/flat/1397.1486598083%40sss.pgh.pa.us#310f974a8dc84478d6d3c70f336807bb
[2]
https://sigmodrecord.org/pub
On Mon, Jun 26, 2023 at 11:27 PM Andres Freund wrote:
> On 2023-06-26 21:53:12 -0700, Peter Geoghegan wrote:
> > It should be safe to allow searchers to see a version of the root page
> > that is out of date. The Lehman & Yao design is very permissive about
> > these
root page isn't
really special, except in the obvious way. We can even have two roots
at the same time (the true root, and the fast root).
--
Peter Geoghegan
page LSN?
--
Peter Geoghegan
the universe than the use case I was focusing on at
> the time.
They're not just the hottest. They're also among the least likely to
change from one moment to the next. (If that ever failed to hold then
it wouldn't take long for the index to become grotesquely tall.)
--
Peter Geoghegan
s index quals. It thinks (correctly) that the query plan is
very inefficient. That happens to match reality right now, but the
underlying reality could change significantly. Something to think
about.
--
Peter Geoghegan
saop_patch_test.sql
Description: Binary data
On Tue, Jun 20, 2023 at 11:13 PM Peter Geoghegan wrote:
> FWIW, I'm almost certain that I'll completely run out of ERRORs to
> demote to LOGs before too long. In fact, this might very well be the
> last ERROR that I ever have to demote to a LOG to harden nbtree
> VACUUM.
Pushed t
id you know that ginInsertCleanup() is the only code that uses
heavyweight page locks these days? Though only on the index metapage!
Isn't this the kind of thing that VACUUM's relation level lock is
supposed to take care of?
--
Peter Geoghegan
addressed come up very infrequently in practice.
--
Peter Geoghegan
On Fri, Jun 16, 2023 at 2:15 PM Peter Geoghegan wrote:
> Attached patch adds additional hardening to nbtree page deletion. It
> makes nbtree VACUUM tolerate a certain sort of cross-page
> inconsistencies in the structure of an index (corruption). VACUUM can
> press on, avoiding
On Mon, Jun 19, 2023 at 4:28 PM Peter Geoghegan wrote:
> We still fall short when it comes to handling boundary cases optimally
> during backwards scans. This is at least true for a subset of
> backwards scans that request "goback=true" processing inside
> _bt_first. A
real
behavioral change takes place in _bt_first, the higher level calling
code. It has been taught to set its insertion/initial positioning
scankey's pivotsearch/goback field to "true" in the patch. Before now,
this option was used exclusively during VACUUM, for page deletion. It turns
out that so-called "pivotsearch" behavior is far more general than
currently supposed.)
Thoughts?
--
Peter Geoghegan
v1-0001-Add-nbtree-goback-boundary-case-optimization.patch
Description: Binary data
ule enforced by the assertion.
--
Peter Geoghegan
ch to
page deletion (after 2014 commit efada2b8e9).
--
Peter Geoghegan
v1-0001-nbtree-VACUUM-cope-with-topparent-inconsistencies.patch
Description: Binary data
On Fri, Jun 9, 2023 at 12:23 PM Peter Geoghegan wrote:
> > I'm not sure there is that consensus (for me half the changes shouldn't be
> > done, the rest should be in 17), but in the end it doesn't matter that much.
I pushed this just now. I have also closed out the open item.
>
at all to GiST, even though
the relevant parts of GiST are heavily based on nbtree. Did you just
forget to plaster similar heaprel arguments all over GiST and SP-GiST?
I'm really disappointed that you're still pushing back here, even
after I got a +1 on backpatching from Heikki. This should have been
straightforward.
--
Peter Geoghegan
te coordinate,
> >
> > int sortopt);
>
> I think we should continue to provide the table here, even if we don't need it
> today.
I don't see why, but okay. I'll do it that way.
--
Peter Geoghegan
Interactions like that tend to be really pernicious -- they
lead to bad performance that goes unnoticed and unfixed because the
problem effectively camouflages itself. It may even be easier to make
the conservative (perhaps paranoid) assumption that weird nasty
interactions will cause harm somewhere down the line...why take a
chance?
I might end up prototyping this myself. I may have to put my money
where my mouth is. :-)
--
Peter Geoghegan
On Thu, Jun 8, 2023 at 4:38 PM Peter Geoghegan wrote:
> This is conceptually a "mini bitmap index scan", though one that takes
> place "inside" a plain index scan, as it processes one particular leaf
> page. That's the kind of design that "plain index scan vs
future someone.
Right. You probably noticed that this is another case where we'd be
making index scans behave more like bitmap index scans (perhaps even
including the downsides for kill_prior_tuple that accompany not
processing each leaf page inline). There is probably a point where
th
ry where bitmapscan is terrible
> (much worse than seqscan, in fact), and the patch is a massive
> improvement over master (about an order of magnitude).
>
> Of course, if you only scan a couple rows, the benefits are much more
> modest (say 40% for 100 rows, which is still significant).
Nice! And, it'll be nice to be able to use the kill_prior_tuple
optimization in many more cases (possible by teaching the optimizer to
favor index scans over bitmap index scans more often).
--
Peter Geoghegan
tached is v4, which goes back to using "heaprel" in new-to-16 code.
As a result, it is slightly smaller than v3.
My new plan is to commit this tomorrow, since the clear consensus is
that we should go ahead with this for 16.
--
Peter Geoghegan
v4-0001-nbtree-Allocate-new-pages-in-separate-function.patch
Description: Binary data
t. I had no intention of making a fuss about it, but then I
never expected this push back.
--
Peter Geoghegan
significant
robustness advantages. Maybe they do, but it's hard to say either way
because these benefits only apply "when the impossible happens". In
any given case it's reasonable to wonder if the user was protected by
our multi-process architecture, or protected by dumb luck. Could even
be both.
--
Peter Geoghegan
ITY() contains its own START_CRIT_SECTION(),
despite not being involved in WAL logging. And so critical sections
could indeed be described as something that we use whenever shared
memory cannot be left in an inconsistent state (which often coincides
with WAL logging, but need not).
--
Peter Geoghegan
one and only choke point where
new pages/buffers can be allocated by nbtree, and the only possible
source of recovery conflicts during REDO besides opportunistic
deletion record conflicts -- so it really isn't strange for _bt_search
callers to be thinking about whether _bt_allocbuf is safe to call.)
--
icular issue, the error is "right sibling's
left-link doesn't match". Per:
https://stackoverflow.com/questions/49307292/error-in-postgresql-right-siblings-left-link-doesnt-match-block-5-links-to-8
--
Peter Geoghegan
=on; $QUERY". You'll see lots of LOG messages with
specific information about the use of abbreviated keys and the
progress of each sort.
Thanks
--
Peter Geoghegan
On Sun, May 28, 2023 at 9:34 AM Peter Geoghegan wrote:
> I'll try to come up with a standard abi-compliance-checker Postgres
> workflow once I'm back from pgCon.
Ideally, we'd be able to produce reports that cover an entire stable
release branch at once, including details about how
at that time (this was with glibc). Low cardinality inputs were more
like 2.5x.
I believe that ICU is faster than glibc in general -- even with
TRUST_STRXFRM enabled. But the TRUST_STRXFRM thing is bound to be the
most important factor here, by far.
--
Peter Geoghegan
of ABI compatibility in stable releases, without any real downside,
which is encouraging. I have spent very little time on this, so it's
quite possible that some detail or other was overlooked.
--
Peter Geoghegan
tical/actual x86_64 ABI breaks in each point release. I'd
appreciate having greater visibility into these issues.
[1] https://github.com/lvc/abi-dumper
[2] https://manpages.debian.org/unstable/abi-dumper/abi-dumper.1.en.html
--
Peter Geoghegan
Title: libTest: X to Y compatibility report
AP
https://postgr.es/m/CAH2-Wz=jgryxwm74g1khst0znpunhezyjnvsjno2t3jswtb...@mail.gmail.com
--
Peter Geoghegan
er tuplesort, no matter what.
This has nothing to do with any underlying implementation detail from
nbtree, or from any other index AM.
--
Peter Geoghegan
On Fri, May 26, 2023 at 10:28 AM Peter Geoghegan wrote:
> I've added several defensive assertions that make it hard to get the
> details wrong. These will catch the issue much earlier than the main
> "heapRel != NULL" assertion in _bt_allocbuf(). So, the rules are
> reas
h the issue much earlier than the main
"heapRel != NULL" assertion in _bt_allocbuf(). So, the rules are
reasonably straightforward and enforceable.
--
Peter Geoghegan
function. This structure seems like a clear
improvement, since such logging is largely the point of having a
separate _bt_allocbuf() function that deals with new page allocations
and requires a valid heapRel in all cases.
v2 also renames "heaprel" to "heapRel" in function signa
that Bertrand would have done it this way to begin with
were it not for the admittedly pretty bad nbtree convention around
P_NEW. It would be nice to get rid of P_NEW in the near future, too --
I gather that there was discussion of that in the context of recent
work in this area.
--
Peter Geoghegan
v1-0001
On Mon, May 22, 2023 at 10:59 AM Peter Geoghegan wrote:
> Attached is v2, which does it that way. It also adjusts the approach
> taken to release locks and pins when the left sibling validation check
> fails.
I pushed this just now, backpatching all the way.
> Not including a rev
On Mon, May 22, 2023 at 9:22 AM Peter Geoghegan wrote:
> > This comment notes that this is similar to what we did with the left
> > sibling, but there isn't really any mention at the left sibling code
> > about avoiding hard ERRORs. Feels a bit backwards. Maybe move the
> >
ed. I'm unsure.
> ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode.
Agreed.
--
Peter Geoghegan
ut the fix that I've come up with is very
well targeted. It seems just about impossible for it to affect any
user that didn't already have a serious problem (without the fix).
--
Peter Geoghegan
v1-0001-nbtree-VACUUM-cope-with-right-sibling-link-corrup.patch
Description: Binary data
v1-0002-a
ree pg_walinspect items now.
The wording for this item as it appears in the patch is: "Improve
descriptions of pg_walinspect WAL record descriptions". I suggest the
following wording be used instead: "Provide more detailed descriptions
of certain WAL records in the output of pg_walinspect and pg_waldump".
--
Peter Geoghegan
d) is unrelated
to all of the other changes. Plus it's just not very important.
> Okay, I went with:
>
> Improve descriptions of pg_walinspect WAL record descriptions
> (Melanie Plageman, Peter Geoghegan)
>
> > Note also that the item "Add pg_waldump option --s
y related to
pg_get_wal_block_info(), since you can also get FPIs using
pg_get_wal_block_info() (in fact, that was originally its main
purpose). I'm not saying that you necessarily need to connect them
together in any way, but you might consider it.
--
Peter Geoghegan
On Sun, May 14, 2023 at 1:59 PM Peter Geoghegan wrote:
> Have you read the documentation in question recently? The first two
> paragraphs, in particular:
>
> https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
>
> As I keep pointing out, we l
1. Fixing this
in Postgres is long overdue.
--
Peter Geoghegan
they
must already think about it. That is, to prominently point out that
"wraparound" actually refers to a protective mode of operation where
XID allocations are temporarily disallowed. And has pretty much
nothing to do with "wraparound" of the kind that the user may be
familiar with from other contexts. Including (and especially) all
earlier versions of the Postgres docs.
--
Peter Geoghegan
same generic policy that controls which
pages are frozen."
Now, since freezing works at the level of physical heap pages in 16,
the thing that triggers aggressive VACUUM matters less (just as the
thing that triggers freezing of individual pages matters much less --
freezing is freezing). There is minimal risk of freezing the same page
3 times during each of 3 different aggressive VACUUMs. To a much
greater extent, 3 aggressive VACUUMs isn't that different to only 1
aggressive VACUUM for those pages that were already "settled" from the
start. As a result, the addition of page-level freezing made
vacuum_freeze_min_age somewhat less bad -- in 16, its behavior was a
little less dependent on the phase of the moon (especially during
aggressive VACUUMs).
I really value stuff like that -- cases where you as a user can think
of something as independent to some other thing that you also need to
tune. There needs to be a lot more such improvements, but at least we
have this one now.
--
Peter Geoghegan
On Thu, May 11, 2023 at 1:40 PM Peter Geoghegan wrote:
> Just to be clear, I am not proposing changing the name of
> anti-wraparound autovacuum at all. What I'd like to do is use a term
> like "XID exhaustion" to refer to the state that we internally refer
> to as xidSt
y, my consistent experience (particularly back in my Heroku
days) has been that people imagine that data corruption would happen
when the system reached what we'd call xidStopLimit. Can you blame
them for thinking that? Almost any name for xidStopLimit that doesn't
have that historical baggage seems likely to be a vast improvement.
--
Peter Geoghegan
On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan wrote:
> Coming up with a new user-facing name for xidStopLimit is already on
> my TODO list (it's surprisingly hard). I have used that name so far
> because it unambiguously refers to the exact thing that I want to talk
> about whe
HINT about single user mode).
Of course, that other patch is closely related to this patch -- the
precise boundaries are unclear at this point. In any case I think that
this should happen, because I think that it's a good idea.
> There are a few other small things I noticed along the way but my goal was to
> look at the overall structure.
Thanks again! This is very helpful.
[1]
https://www.postgresql.org/message-id/flat/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg%40mail.gmail.com
--
Peter Geoghegan
to one of my actual
flaws. I'm not a petty man -- I don't resent the success of others.
I've always thought that you do rather good work. Plus I'm just not in
the habit of obstructing things that I directly benefit from.
--
Peter Geoghegan
es/m/CAFBsxsGJMp43QO2cLAh0==ueYVL35pbbEHeXZ0cnZkU=q8s...@mail.gmail.com
--
Peter Geoghegan
so based in part on previous discussions).
John's volley of abuse seemed to come from nowhere at all.
--
Peter Geoghegan
ld does that amount to 95% of my review, or anything like it?
--
Peter Geoghegan
ed up until now -- that
much is clear. To have you talk to me like this when I'm working on
such a difficult, thankless task is a real slap in the face.
> 3. Claim that others are holding you back, and then try to move the goalposts
> in their work.
When did I say that? When did I even suggest it?
--
Peter Geoghegan
On Mon, May 1, 2023 at 7:55 PM Peter Geoghegan wrote:
> Obviously there are certain things that can hold back OldestMXact by a
> wildly excessive amount. But I don't think that there is anything that
> can hold back OldestMXact by a wildly excessive amount that won't more
> or less
EZE sometimes allocates new Multis, just to be able to do that.
Obviously there are certain things that can hold back OldestMXact by a
wildly excessive amount. But I don't think that there is anything that
can hold back OldestMXact by a wildly excessive amount that won't more
or less do the same thing to OldestXmin.
--
Peter Geoghegan
-bit XID space. Today we compare 64-bit XIDs using
simple unsigned integer comparisons. That's the same way that 32-bit
XID comparisons worked before freezing was invented in 2001. So it
really does seem like the natural way to explain it.
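
A minimal sketch of the contrast being drawn, with hypothetical function names that are not PostgreSQL's own. It compares a plain unsigned comparison of full 64-bit XIDs with the circular (modulo 2^32) comparison needed once XIDs are truncated to 32 bits. Special XIDs are ignored here for simplicity.

```python
XID32_MASK = 0xFFFFFFFF


def xid64_precedes(a, b):
    """Plain unsigned comparison; sufficient when the full 64-bit value is kept."""
    return a < b


def xid32_precedes_plain(a, b):
    """Plain unsigned comparison on truncated XIDs; breaks across a 2^32 boundary."""
    return (a & XID32_MASK) < (b & XID32_MASK)


def xid32_precedes_circular(a, b):
    """Circular comparison: a precedes b if (a - b), read as signed 32-bit, is negative."""
    diff = (a - b) & XID32_MASK
    return diff >= 0x80000000


# Two XIDs that straddle the 2^32 boundary (values chosen for illustration).
older = 2**32 - 10
newer = 2**32 + 5
```

With these values the 64-bit comparison and the circular 32-bit comparison both report that older precedes newer, while the plain comparison on truncated values gets it backwards. The circular form is only meaningful while the two XIDs are less than 2^31 apart.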
--
Peter Geoghegan
On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan wrote:
> On Mon, May 1, 2023 at 9:08 AM Robert Haas wrote:
> > I disagree. If you start the cluster in single-user mode, you can
> > actually wrap it around, unless something has changed that I don't
> > know about.
>
>
usly. The main reason to frame it this way is because
it seems to make the material easier to understand.
--
Peter Geoghegan
me vague awareness of truncated XIDs
being insufficient at some point is all you really need, even if
you're an advanced user.
--
Peter Geoghegan
blem is a leaked
prepared transaction, or something along those lines. That is
increasingly likely to turn out to be the underlying cause of entering
xidStopLimit, given the work we've done on VACUUM over the years. I
still think that "imbalance" is the right way to frame discussion of
xidStopLimit. After all, autovacuum/VACUUM will still spin its wheels
in a futile effort to "restore balance". So it's kinda still about
restoring imbalance IMV.
--
Peter Geoghegan
e
problem gets completely out of hand". Naturally we'd link to this new
section from "Routine Vacuuming". What do you think of that general
approach?
> This is good information, but I wonder about:
> (Various points)
That's good feedback. I'll get to this in a couple of days.
--
Peter Geoghegan
ay of
> > not fixing it.)
>
> If it helps, I've gone ahead with some testing and polishing on that, and
> it's close to ready, I think (CC'd you). I'd like that piece to be separate
> and small enough to be backpatchable (at least in theory).
That's great news. Not least because it unblocks this patch series of mine.
--
Peter Geoghegan
ge, and just making a long hint even
> longer doesn't seem worth doing. I'd like to set that aside and come back to
> it. I've left it out of the attached set.
Yeah, 0003 can be treated as independent work IMV.
--
Peter Geoghegan
On Thu, Apr 27, 2023 at 11:17 AM Peter Geoghegan wrote:
> I'm asking this (at least in part) because it affects the answer. Lots
> of stuff that GIN does that seems like it would be particularly tricky
> to integrate with a system catalog is non-essential. It could be (and
>
ntial. It could be (and
sometimes is) selectively disabled. Whereas B-Tree indexes don't
really have any optional features (you can disable deduplication
selectively, but I believe that approximately nobody ever found it
useful to do so).
--
Peter Geoghegan
On Thu, Apr 20, 2023 at 10:56 AM Pavel Borisov wrote:
> It's much deserved! Congratulations, Nathan, Amit and Masahiko!
Congratulations to all three!
--
Peter Geoghegan
standard, there should
also be significant wiggle-room. Kind of like with the guidelines for
rmgr desc authors discussion.
--
Peter Geoghegan
t generalizes across all users quite well. This doesn't seem
particularly subjective.
--
Peter Geoghegan
On Tue, Apr 18, 2023 at 11:10 PM Michael Paquier wrote:
> Yeah, I agree that your suggestion is more useful for debugging when a
> record includes both a block image and some data associated to it.
> So, +1.
Okay, pushed that fix just now.
--
Peter Geoghegan
On Sat, Apr 15, 2023 at 5:15 PM Peter Geoghegan wrote:
> ISTM that b6a0d469ca has created an unmet need for a "--suite
> setup-running", which is analogous to "--suite setup" but works with
> "--setup running". That way there'd at least be a
> "post
On Tue, Apr 18, 2023 at 4:30 PM Peter Geoghegan wrote:
> > I'd be interested to know if you could tell me if SKIP_LOCKED has more
> > importance than INDEX_CLEANUP, for example. If you can, it would seem
> > like trying to say apples are more important than oranges, or
> >
for example. If you can, it would seem
> like trying to say apples are more important than oranges, or
> vice-versa.
I don't accept your premise that the only thing that matters (or the
most important thing) is adherence to some unambiguous and consistent
order.
--
Peter Geoghegan
to summarize that also happen to have an FPI that the
REDO routine isn't supposed to apply (i.e. an FPI that is included in
the record purely so that verifyBackupPageConsistency can verify that
the REDO routine produces a matching image).
Attached patch fixes this bug.
--
Peter Geoghegan
v1-0001
FREEZE, and VERBOSE all come
first. Those options are approximately the most important options --
especially VERBOSE. But your patch places VERBOSE dead last.
--
Peter Geoghegan
in the REDO routines that shouldn't be performed in
the locked-updated-tuple case -- I had to invent XLH_LOCK_UPDATED to
deal with the issue. There may be a better approach there, but I
haven't thought about it in enough detail to feel confident either
way.
--
Peter Geoghegan
v1-0001-Remove
rd).
So even if we thought that the situation with strxfrm() had improved,
we'd still have little motivation to do anything about it.
--
Peter Geoghegan