ing about the B-Tree on-disk format -- right?
--
Peter Geoghegan
in
the same way as the earliest version. This is the same as v4 in every
other way. Perhaps you can test this.
--
Peter Geoghegan
v5-0001-Avoid-pipeline-stall-in-_bt_compare.patch
Description: Binary data
v5-0002-Inline-_bt_compare.patch
Description: Binary data
hed a fix for this.
Thanks
--
Peter Geoghegan
On Thu, Jan 23, 2020 at 5:31 PM Peter Geoghegan wrote:
> In summary: I suppose that we can also solve "the cousin problem"
> quite easily, but only for rightmost cousins within a subtree --
> leftmost cousins might be too messy to verify for it to be worth it.
> We don't wa
t;absolute" negative infinity in the leftmost leaf page on the leaf
level, and "subtree relative" negative infinity for other leaf pages
that are merely leftmost within a subtree).
In summary: I suppose that we can also solve "the cousin problem"
quite easily, but only for rightmost cousins within a subtree --
leftmost cousins might be too messy to verify for it to be worth it.
We don't want to have to jump two or three levels up within
bt_downlink_connectivity_check() just for leftmost cousin pages. But
maybe you're feeling ambitious! What do you think?
Note: There is an existing comment about this exact negative infinity
business within bt_downlink_check(). It starts with "Negative
inifinity items can be thought of as a strict lower bound that works
transitively...". There should probably be some comment updates to
this comment block as part of this patch.
[1]
https://pdfs.semanticscholar.org/fd45/15ab23c00231d96c95c1091459d0d1eebfae.pdf
--
Peter Geoghegan
On Thu, Jan 23, 2020 at 5:40 PM Peter Geoghegan wrote:
> I suppose the alternative is to get the high key of the parent's left
> sibling, rather than going to the parent's parent (i.e. our
> grandparent). That would probably be the best way to get a separator
> key to compare against
is a considerable
undertaking, since you need to use prefix compression to get a high
entropy abbreviated key. It would probably take me the best part of a
whole release cycle to write such a patch. The attached patches get
us a relatively easy win in the short term, though.
Thoughts?
--
Peter Geoghegan
extent, it's okay
because the heuristics are self-limiting -- we can only make an
incorrect inference about what to do because we were unlucky, but
there is no reason to think that we'll consistently make the wrong
choice. It feels a little bit like quicksort to me.
--
Peter Geoghegan
ason to treat each case differently.
Again, maybe I'm making an excessively thin distinction. I really want
to be able to enable the feature everywhere, while also not getting
even one complaint about it. Perhaps that's just not a realistic or
useful goal.
--
Peter Geoghegan
a good idea of the benefits in one specific case.
For this prototype, I can either not do prefix compression to get a
high entropy abbreviated key, or do the prefix compression in a way
that is totally inflexible, but still works well enough for this
initial test workload.
My estimate is that it would take me 4 - 6 weeks to write a prototype
along those lines. That isn't so bad.
--
Peter Geoghegan
cond time for completeness.
--
Peter Geoghegan
From 8f55bcedaa9c109543627e9c785dab0ffb0e5c75 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Wed, 22 Jan 2020 20:59:57 -0800
Subject: [PATCH v2 3/3] Remove "negative infinity" check from _bt_compare().
---
src/backend/access/nbtr
atch LGTM.
Thanks
--
Peter Geoghegan
On Wed, Jan 29, 2020 at 11:50 AM Peter Geoghegan wrote:
> I should stop talking about it for now, and go back to reassessing the
> extent of the regression in highly unsympathetic cases. The patch has
> become faster in a couple of different ways since I last looked at
> this questi
ng?
[1]
https://postgr.es/m/CAH2-Wzmb_QOmHX=uWjCFV4Gf1810kz-yVzK6RA=VS41EFcKh=g...@mail.gmail.com
--
Peter Geoghegan
ozen. If you're adding a new
capability to logtape.c, it makes sense to be clear on the
requirements on tapeset state or individual tape state.
--
Peter Geoghegan
) seems to work in a way that assumes that the
tape is frozen. It would be good to document that assumption, and
possible enforce it by way of an assertion. The same remark applies to
any other assumptions you're making there.
--
Peter Geoghegan
two tests as part of
performing this third test, out of general paranoia. Intel seem to
roll out a microcode update for a spectre-like security issue about
every other day.
Thanks again
--
Peter Geoghegan
On Thu, Jan 30, 2020 at 2:13 PM Peter Geoghegan wrote:
> My approach to showing the downsides of the patch wasn't particularly
> obvious, or easy to come up with. I could have contrived a case like
> the insert benchmark, but with more low cardinality non-unique
> indexes.
So
, but that isn't taken into account. I think that the patch will
tend to bring B-Tree indexes closer to heap tables in terms of their
overall sensitivity to how frequently VACUUM runs.
--
Peter Geoghegan
ize.
--
Peter Geoghegan
On Thu, Jan 30, 2020 at 11:16 AM Peter Geoghegan wrote:
> I prefer to think of the patch as being about improving the stability
> and predictability of Postgres with certain workloads, rather than
> being about overall throughput. Postgres has an ungoing need to VACUUM
> indexe
lute overhead of tapes was not that important
back in 2016. Using many tapes within tuplesort.c can never happen
anyway (with the 500 MAXORDER cap). Maybe the use of logtape.c by hash
aggregate changes the picture there now. Even if it doesn't, I still
think that your patch is a good idea.
--
Peter Geoghegan
b3d68f9. That can
perform very badly when the input is almost sorted, but has a few
tuples that are out of order towards the end. (I have called these
"banana skin tuples" in the past.)
--
Peter Geoghegan
On Thu, Jan 16, 2020 at 12:05 PM Peter Geoghegan wrote:
> > It does seem odd to me to treat them differently, but it's possible
> > that this is a reflection of my own lack of understanding. What do
> > other database systems do?
>
> Other database systems treat unique
gt; in the mix, too. I'd still not wish to rely on this for debugging.
I agree that there are a lot of moving pieces here. I wouldn't like to
have to rely on this working myself.
--
Peter Geoghegan
mething along those lines.
--
Peter Geoghegan
s works against the goal of
having good metadata about patches that are worked on over multiple
releases or years. We have a fair few of those.
--
Peter Geoghegan
ally pingpong back and forth between the same two patches on
each thread as e-mail is sent to each thread, without anybody ever
posting a new patch.
Thanks
[1] https://commitfest.postgresql.org/27/2429/#
[2] https://commitfest.postgresql.org/27/2202/
--
Peter Geoghegan
to debug this today, but I suspect that
the problem is that the lowkey thing is kind of brittle. If you can't
figure it out, let me know and I'll help with it.
* Can we reverse the order here, so that the common case (i.e. the
offset_is_negative_infinity() case) comes first?:
> + if (offset_is_negative_infinity(topaque, pivotkey_offset))
> + {
> + /*
> +* We're going to try match child high key to "negative
> +* infinity key". That means we should match to high key of
> +* left sibling of target page.
> +*/
*** SNIP ***
> + }
> + else
> + {
*** SNIP ***
> + }
* Can the offset_is_negative_infinity() branch explain what's going on
at a much higher level than this?
--
Peter Geoghegan
repeat
something like the previous experiments with v3, Floris. master vs v3
(both patches together). With a variable number of clients.
Thanks
--
Peter Geoghegan
v3-0002-Inline-_bt_compare.patch
Description: Binary data
v3-0001-Avoid-pipeline-stall-in-_bt_compare.patch
Description: Binary data
n it. Just like the
decision to enable it by default. It will work this way in the
committed version, but that isn't supposed to be the final word on it.
--
Peter Geoghegan
something went wrong.
You're probably right about that. I just wish that there was some way
of showing the same information that was discoverable, and didn't
require the use of pageinspect. If I make it a DEBUG1 message, then it
cannot really be documented.
--
Peter Geoghegan
rchitectural analysis
before proceeding with commit. This is still unsettled.
--
Peter Geoghegan
On Thu, Feb 20, 2020 at 10:58 AM Peter Geoghegan wrote:
> I think that there is a good chance that it just won't matter. The
> number of indexes that won't be able to support deduplication will be
> very small in practice. The important exceptions are INCLUDE indexes
> and nond
e master branch is the easy way of
resolving the situation, even if we don't really expect anyone to use
it.
--
Peter Geoghegan
pe comparison code?
--
Peter Geoghegan
binary string). Note also that a "one pass"
representation that we can just memcmp() will have to be significantly
larger in some cases, especially when collatable text is used. A
strxfrm() blob is typically about 3.3x larger than the original string
IIRC.
--
Peter Geoghegan
nce
of being noticed and becoming a big distraction in all sorts of ways.
That might happen anyway, but I think it's less likely this way.
ISTM that the smart thing to do is to ignore it completely. Don't even
try to preempt a silly headline written by some tech journalist
wiseacre.
--
Peter Geoghegan
On Thu, Feb 13, 2020 at 1:34 PM Andrew Dunstan
wrote:
> I also object because 20 is *my* unlucky number ...
I don't think we're going to do this, so you don't have to worry on that score.
--
Peter Geoghegan
On Sat, Feb 8, 2020 at 6:50 PM Peter Geoghegan wrote:
> My working assumption is that I only need to care about
> opclass-declared input data types (pg_opclass.opcintype), plus the
> corresponding collations -- the former can be used to lookup an
> appropriate pg_amproc entry (i.e. B-
On Wed, Feb 19, 2020 at 2:45 PM Tom Lane wrote:
> The buildfarm would only tell you if it compiles, not whether the
> performance results are what you hoped for.
Right. Plus I think that more granular control might make more sense
in this particular instance anyway.
--
Peter Geoghegan
On Wed, Feb 19, 2020 at 11:16 AM Peter Geoghegan wrote:
> On Wed, Feb 19, 2020 at 8:14 AM Anastasia Lubennikova
> wrote:
> > Thank you for this work. I've looked through the patches and they seem
> > to be ready for commit.
> > I haven't yet read recent documentation and
ugh you really need some kind of prefix
compression to make that effective.
--
Peter Geoghegan
fest.postgresql.org/24/2202/
[2] https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py
[3]
https://www.postgresql.org/message-id/flat/CAH2-Wz%3DSfAKVMv1x9Jh19EJ8am8TZn9f-yECipS9HrrRqSswnA%40mail.gmail.com#b20ead9675225f12b6a80e53e19eed9d
--
Peter Geoghegan
don't recall hearing any complaints about
that, but it still doesn't seem great.
--
Peter Geoghegan
On Thu, Jan 16, 2020 at 5:11 PM Peter Geoghegan wrote:
> I find this argument convincing. I'll try to get this committed soon.
>
> While you could have used bt_index_parent_check() or heapallindexed to
> detect the issue, those two options are a lot more expensive (plus the
> form
split" problem from two completely different angles, with an
overall effect that is greater than the sum of its parts.
While the difference in size of each filler index on the master branch
wasn't that significant on its own, it's still interesting. It's
probably quite workload dependent.
--
Peter Geoghegan
f at all.
I think it would be fine if the deduplication docs (not the docs for
this infrastructure) just pointed out specific cases that we *cannot*
support -- there are not many exceptions (numeric, text with a
nondeterministic collation, a few others like that).
--
Peter Geoghegan
On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra
wrote:
> Understood. Is that a reason to not commit of this patch now, though?
It could use some polishing. Are you interested in committing it?
--
Peter Geoghegan
On Fri, Jan 10, 2020 at 1:36 PM Peter Geoghegan wrote:
> * HEIKKI: Do we only generate one posting list in one WAL record? I
> would assume it's better to deduplicate everything on the page, since
> we're modifying it anyway.
Still thinking about this one.
> * HEIKKI: Does xl_btre
On Wed, Jan 22, 2020 at 10:55 AM Peter Geoghegan wrote:
> This is a good summary. This is the kind of scenario I had in mind
> when I expressed a general concern about "stopping between pages".
> Processing a whole page at a time is a crucial part of how
> _bt_read
rtion. It's
good that it verifies what is actually a fragile assumption, even
though I'd prefer to not make a fragile assumption.
--
Peter Geoghegan
or a time. Also, I think that this will only be safe with MVCC scans,
because otherwise the page could be concurrently deleted by VACUUM.
--
Peter Geoghegan
On Fri, Jan 17, 2020 at 5:43 PM Peter Geoghegan wrote:
> I tried to come up with a specific example of how this could be
> unsafe, but my explanation was all over the place (this could have had
> something to do with it being Friday evening). Even still, it's up to
> the patch to just
the executor
(but apparently they're not, so no need to worry about it). This may
have been mentioned somewhere already. If it is then I must have
missed it.
--
Peter Geoghegan
ck for races, possibly returning to the original page when we
think that we might have overlooked something and need to make sure).
It's an exception to the general rule, you could say.
--
Peter Geoghegan
On Wed, Jan 22, 2020 at 10:55 AM Peter Geoghegan wrote:
> On Tue, Jan 21, 2020 at 11:50 PM Floris Van Nee
> wrote:
> > As far as I know, a page split could happen at any random element in the
> > page. One of the situations we could be left with is:
> > Leaf page 1
k we'd run into torn page issues, if we
> were to do so without WAL logging, because we'd rely on the LP_DEAD bits
> and BTPageOpaqueData.xact to be in sync. I *think* we might be safe to
> do so *iff* the page's LSN indicates that there has been a WAL record
> covering it since the last redo location.
That sounds like a huge mess.
--
Peter Geoghegan
On Thu, Jan 16, 2020 at 10:55 AM Robert Haas wrote:
> On Wed, Jan 15, 2020 at 6:38 PM Peter Geoghegan wrote:
> > There are some outstanding questions about how B-Tree deduplication
> > [1] should be configured, and whether or not it should be enabled by
> > default. I'm sta
orth avoiding wherever possible.
--
Peter Geoghegan
On Mon, Jan 13, 2020 at 8:47 PM Andrey Borodin wrote:
> > 11 янв. 2020 г., в 7:49, Peter Geoghegan написал(а):
> > I'm curious why Andrey's corruption problems were not detected by the
> > cross-page amcheck test, though. We compare the first non-pivot tuple
> > on the
letes only
> LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
> you have some feedback for this ?
It sounds like the design of the patch relies on doing something other
than stopping a scan "between" pages, in the sense that is outlined in
the commit message of commit 09cb5c0e. If so, then that's a serious
flaw in its design.
--
Peter Geoghegan
On Mon, Jan 20, 2020 at 1:19 PM Peter Geoghegan wrote:
> On Mon, Jan 20, 2020 at 11:01 AM Jesper Pedersen
> wrote:
> > > - nbtsearch.c _bt_skip line 1440
> > > if (BTScanPosIsValid(so->currPos) &&
> > > _bt_scankey_within_page(scan,
On Wed, Jan 8, 2020 at 5:56 AM Heikki Linnakangas wrote:
> On 04/01/2020 03:47, Peter Geoghegan wrote:
> > Attached is v28, which fixes bitrot from my recent commits to refactor
> > VACUUM-related code in nbtpage.c.
>
> I started to read through this gigantic patch.
Oh come
On Thu, Jan 2, 2020 at 6:42 AM Robert Haas wrote:
> On Mon, Dec 30, 2019 at 6:58 PM Peter Geoghegan wrote:
> > I propose that we adopt the following definition: For an operator
> > class to be safe, its equality operator has to always agree with
> > datum_image_eq()
On Wed, Jan 1, 2020 at 1:00 PM Peter Geoghegan wrote:
> Attached patch shows what I have in mind. The new comment block has
> been copied from _bt_delitems_vacuum().
I also think that the WAL record and function signature of
_bt_delitems_delete() should be brought closer to
_bt_delitems_
h/build/src/bin'
GNUmakefile:70: recipe for target 'check-world-src/bin-recurse' failed
make: *** [check-world-src/bin-recurse] Error 2
--
Peter Geoghegan
On Fri, Jan 3, 2020 at 10:39 PM Tom Lane wrote:
> Attached is a blind attempt to fix this by allowing escape
> sequence(s) instead of spaces between the words. Does this
> work for you?
I'm afraid not; no apparent change. No change in the "Actual output
was" line, either.
--
Peter Geoghegan
m curious also what is your prevailing setting
> of TERM?
I use zsh, with a fairly customized setup. $TERM is "xterm-256color"
in the affected shell. (I have a feeling that this has something to do
with my amazing technicolor terminal.)
--
Peter Geoghegan
ackage for who-knows-what
reason. This is indicated by the fact that the Version string is
"6.1-1ubuntu1.18.04", as opposed to a Debian style Version without the
"ubuntu" (I believe that that's the convention they follow).
--
Peter Geoghegan
On Fri, Jan 3, 2020 at 6:51 PM Tom Lane wrote:
> Hmm. If you set it to plain "xterm", does the test pass?
No. Also tried setting PG_COLOR="off" and CLICOLOR=0 -- that also
didn't help. (This was based on possibly-relevant vars that "env"
showed were set).
--
Peter Geoghegan
On Fri, Jan 3, 2020 at 7:06 PM Peter Geoghegan wrote:
> No. Also tried setting PG_COLOR="off" and CLICOLOR=0 -- that also
> didn't help. (This was based on possibly-relevant vars that "env"
> showed were set).
Removing the single check_completion() test from 010_ta
rchitecture
Description
+++--===-===-==
ii libtinfo5:amd64 6.1-1ubuntu1.18.04 amd64
shared low-level terminfo library for terminal handling
--
Peter Geoghegan
.e. whether
they've been placed by the FSM for recycling) by using
contrib/pg_freespacemap.
--
Peter Geoghegan
n insertion scankey.
We only need to determine whether or not the optimization is safe at
CREATE INDEX time. (Actually, I don't want to commit to the idea that
nbtree should only call this support function at CREATE INDEX time
right now. I'm sure that it will hardly ever need to be called,
though.)
--
Peter Geoghegan
postgresql.org/docs/devel/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON
[2]
https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a%402ndquadrant.com
--
Peter Geoghegan
copied from _bt_delitems_vacuum().
--
Peter Geoghegan
From a383d30b3c0404ea0095ea8f130214f4c8dd4dd9 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Wed, 1 Jan 2020 12:30:26 -0800
Subject: [PATCH] Associate LP_DEAD offsets with WAL record's buffer.
Commit 558a9165e08 taught _bt_delitems_delete
On Sat, Jan 4, 2020 at 6:17 PM Tom Lane wrote:
> Cool, I'll go commit a fix along those lines. Thanks for tracing
> this down!
Glad to help!
--
Peter Geoghegan
k that
it's fine to go that way, provided it isn't hard to work around.
FWIW, I find it very surprising that it was possible for the test to
fail on my workstation/server, without it failing on any buildfarm
animals.
--
Peter Geoghegan
ocs suggests that the
> best way to do that would be to set envar INPUTRC to /dev/null --- could
> you confirm that that works for you?
Yes -- "export INPUTRC=/dev/null" also makes it work for me (on HEAD +
HEAD-unset-TERM-in-tab-completion-test.patch, but with my
original/problem
7 wallclock secs ( 0.02 usr 0.00 sys + 0.36 cusr
0.12 csys = 0.50 CPU)
Result: FAIL
Makefile:87: recipe for target 'check' failed
make: *** [check] Error 1
--
Peter Geoghegan
HEAD-unset-TERM-in-tab-completion-test.patch
Description: Binary data
t there must be something about
> your colorized setup that triggers use of at least the first one.
> But why didn't clearing the relevant environment variables
> change anything?
I don't know. It also failed with bash, which doesn't have any of that stuff.
--
Peter Geoghegan
ow probability bug due to a major version change in
ncurses-base.
--
Peter Geoghegan
0.00 sys + 0.37 cusr
0.09 csys = 0.47 CPU)
Result: FAIL
Makefile:87: recipe for target 'check' failed
make: *** [check] Error 1
--
Peter Geoghegan
-tab-completion-test.patch:
set colored-completion-prefix on
set colored-stats on
--
Peter Geoghegan
s in the index structure.
>bt_index_parent_check follows the general
>convention of raising an error if it finds a logical
>inconsistency or other problem.
This is very close now. I would be okay with you committing the patch
once you deal with this feedback. If you prefer, I can take another
look at a new revision.
--
Peter Geoghegan
il of amcheck, but that doesn't apply here.
> Thank you. I'd like to have another feedback from you assuming there
> are logic changes.
This looks committable. I only noticed one thing: The comments above
bt_target_page_check() need to be updated to reflect the new check,
which no longer h
On Fri, Mar 6, 2020 at 11:00 AM Andres Freund wrote:
> > Pushed.
>
> Congrats!
Thanks Andres!
--
Peter Geoghegan
On Wed, Mar 11, 2020 at 2:02 AM Alexander Korotkov
wrote:
> Thank you! Pushed with this comment revised!
Thanks!
--
Peter Geoghegan
ther condition
of killing the item -- there is clearly no point in doing work to kill
an item that is already dead. I don't like the idea of using an
exclusive buffer lock (even if it's just with wal_log_hints = on),
though.
--
Peter Geoghegan
cribe? If it was,
then we can hope (and maybe even verify) that the Postgres 12 work
noticeably ameliorates the problem.
--
Peter Geoghegan
here. I
believe that Active Record will sometimes generate created_at columns
that sometimes end up containing NULL values. Not sure why.
--
Peter Geoghegan
On Thu, Apr 9, 2020 at 5:25 PM Peter Geoghegan wrote:
> Was this a low cardinality index in the way I describe? If it was,
> then we can hope (and maybe even verify) that the Postgres 12 work
> noticeably ameliorates the problem.
What I really meant was an index where hundreds or even
;night and day" level difference. And we're willing to account
for FPIs on the primary (and the LP_DEAD bits set there) just to be
able to also set LP_DEAD bits on the standby.
Right?
--
Peter Geoghegan
On Wed, Apr 8, 2020 at 6:45 PM David Fetter wrote:
> Thanks so much for your hard work managing this one!
+1
--
Peter Geoghegan
> modules.
There is also pg_hexedit:
https://github.com/petergeoghegan/pg_hexedit
--
Peter Geoghegan
mmit message of 6655a729 says that nbtree has had this
problem "since time immemorial". I am planning to work on that
problem, eventually.
--
Peter Geoghegan
age, which is converted into
bytes to form a tolerance). That's the way it's done in the attached
patch.
I plan to commit this patch next week, barring any objections.
--
Peter Geoghegan
0001-Redefine-split-interval-to-be-space-wise.patch
Description: Binary data
any
deadlock hazards. Unlinking the page from the tree itself (deleting)
is really complicated.
> I'm not sure if the current locking
> model assumes anywhere that there is only one process (vacuum) unlinking
> pages though?
I'm not sure, though _bt_unlink_halfdead_page() has comments supposing
that there could be concurrent page deletions like that.
--
Peter Geoghegan
ver. I
agree that it's just an accident that it works that way. VACUUM could
probably remember deleted pages, and then revisited those pages at the
end of the index vacuuming -- that might make a big difference in a
lot of workloads. Or it could chain them together as a linked list
which can b
901 - 1000 of 3113 matches
Mail list logo