using a static analysis tool?
--
Peter Geoghegan
ouldn't make any sense to fix it in this specific esoteric function,
which is called when we've already decided to split the page (but only
sometimes). Sanitization needs to happen at some central choke point.
> Yes,two static tools, but reviewed by me.
I strongly suggest confining all of this to a single thread, and
stating your reasoning upfront.
--
Peter Geoghegan
>stating your reasoning upfront.
> I don't know what that means.
Instead of starting new email threads for each issue, confine the
entire discussion to just one thread. This makes the discussion much
more manageable for everyone else. This is a high traffic mailing
list.
--
Peter Geoghegan
On Tue, Nov 12, 2019 at 3:21 PM Peter Geoghegan wrote:
> * Decided to go back to turning deduplication on by default with
> non-unique indexes, and off by default using unique indexes.
>
> The unique index stuff was regressed enough with INSERT-heavy
> workloads that I was put
port for XFS is rather immature.
How did you invoke pg_upgrade? Did you use the --link (hard link) option?
--
Peter Geoghegan
ctions will throw an error with an !indisvalid
index.)
--
Peter Geoghegan
On Fri, Nov 1, 2019 at 1:19 PM Robert Haas wrote:
> On Fri, Nov 1, 2019 at 3:15 PM Peter Geoghegan wrote:
> > I don't think that those two things are equivalent at all. There may
> > even be workloads that will benefit when run on 32-bit hardware.
> > Having to palloc
On Wed, Nov 20, 2019 at 1:43 PM Peter Geoghegan wrote:
> My understanding is that we can trust RecentGlobalXmin to be something
> useful and current during recovery, in general, so the selfuncs.c
> index-only scan (which uses SnapshotNonVacuumable + RecentGlobalXmin)
> can be trusted
#x27;ve also been able to observe increases of 15%-20% in
TPS with similar workloads (with commensurate reductions in query
latency) more recently. This was with a simple gaussian distribution
for pgbench_accounts.aid, and a non-unique index with deduplication
enabled on pgbench_accounts.abalance. (The patch helps control the
size of both indexes, especially the extra non-unique one.)
[1]
https://postgr.es/m/CAH2-WzkXHhjhmUYfVvu6afbojU97MST8RUT1U=hld2w-gc5...@mail.gmail.com
--
Peter Geoghegan
On Tue, Dec 17, 2019 at 5:18 PM Bruce Momjian wrote:
> On Tue, Dec 17, 2019 at 03:30:33PM -0800, Peter Geoghegan wrote:
> > With many real world unique indexes, the true reason behind most or
> > all B-Tree page splits is "version churn". I view these page splits as
>
On Thu, Dec 12, 2019 at 6:21 PM Peter Geoghegan wrote:
> Still waiting for some review of the first patch, to get it out of the
> way. Anastasia?
I plan to commit this first patch [1] in the next day or two, barring
any objections.
It's clear that the nbtree "pin scan" V
ut back the assignment of "opaque". The sequence of BufferGetPage()
> and PageGetSpecialPointer() is a very standard switch-our-attention-
> to-another-page locution in nbtree and other index AMs.
+1
--
Peter Geoghegan
operator class
BITWISE. Better to drop everything, and recreate everything, since
your indexes should be considered corrupt anyway. (Also, I don't think
that it's that hard to get it right, so this will probably never
happen.)
--
Peter Geoghegan
.e. whether
they've been placed by the FSM for recycling) by using
contrib/pg_freespacemap.
--
Peter Geoghegan
page -- we pass that down through an insertion scankey.
We only need to determine whether or not the optimization is safe at
CREATE INDEX time. (Actually, I don't want to commit to the idea that
nbtree should only call this support function at CREATE INDEX time
right now. I'm sure that it will hardly ever need to be called,
though.)
--
Peter Geoghegan
the same issue.
[1]
https://www.postgresql.org/docs/devel/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON
[2]
https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a%402ndquadrant.com
--
Peter Geoghegan
k has
been copied from _bt_delitems_vacuum().
--
Peter Geoghegan
From a383d30b3c0404ea0095ea8f130214f4c8dd4dd9 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Wed, 1 Jan 2020 12:30:26 -0800
Subject: [PATCH] Associate LP_DEAD offsets with WAL record's buffer.
Commit 558a9165e08 taught _bt_
On Thu, Jan 2, 2020 at 6:42 AM Robert Haas wrote:
> On Mon, Dec 30, 2019 at 6:58 PM Peter Geoghegan wrote:
> > I propose that we adopt the following definition: For an operator
> > class to be safe, its equality operator has to always agree with
> > datum_image_eq() (i.
On Wed, Jan 1, 2020 at 1:00 PM Peter Geoghegan wrote:
> Attached patch shows what I have in mind. The new comment block has
> been copied from _bt_delitems_vacuum().
I also think that the WAL record and function signature of
_bt_delitems_delete() should be brought closer to
_bt_delitems_
'check-psql-recurse' failed
make[1]: *** [check-psql-recurse] Error 2
make[1]: Leaving directory '/code/postgresql/patch/build/src/bin'
GNUmakefile:70: recipe for target 'check-world-src/bin-recurse' failed
make: *** [check-world-src/bin-recurse] Error 2
--
Peter Geoghegan
t; I'm curious also what is your prevailing setting
> of TERM?
I use zsh, with a fairly customized setup. $TERM is "xterm-256color"
in the affected shell. (I have a feeling that this has something to do
with my amazing technicolor terminal.)
--
Peter Geoghegan
On Fri, Jan 3, 2020 at 6:51 PM Tom Lane wrote:
> Hmm. If you set it to plain "xterm", does the test pass?
No. Also tried setting PG_COLOR="off" and CLICOLOR=0 -- that also
didn't help. (This was based on possibly-relevant vars that "env"
showed were set).
--
Peter Geoghegan
On Fri, Jan 3, 2020 at 7:06 PM Peter Geoghegan wrote:
> No. Also tried setting PG_COLOR="off" and CLICOLOR=0 -- that also
> didn't help. (This was based on possibly-relevant vars that "env"
> showed were set).
Removing the single check_completion() test from 0
Architecture
Description
+++--===-===-==
ii libtinfo5:amd64 6.1-1ubuntu1.18.04 amd64
shared low-level terminfo library for terminal handling
--
Peter Geoghegan
On Fri, Jan 3, 2020 at 10:39 PM Tom Lane wrote:
> Attached is a blind attempt to fix this by allowing escape
> sequence(s) instead of spaces between the words. Does this
> work for you?
I'm afraid not; no apparent change. No change in the "Actual output
was" line, either.
--
Peter Geoghegan
ian package for who-knows-what
reason. This is indicated by the fact that the Version string is
"6.1-1ubuntu1.18.04", as opposed to a Debian style Version without the
"ubuntu" (I believe that that's the convention they follow).
--
Peter Geoghegan
n that there must be something about
> your colorized setup that triggers use of at least the first one.
> But why didn't clearing the relevant environment variables
> change anything?
I don't know. It also failed with bash, which doesn't have any of that stuff.
--
Peter Geoghegan
llate.linux.utf8". I think that
it's fine to go that way, provided it isn't hard to work around.
FWIW, I find it very surprising that it was possible for the test to
fail on my workstation/server, without it failing on any buildfarm
animals.
--
Peter Geoghegan
8
Non-zero exit status: 1
Files=1, Tests=12, 7 wallclock secs ( 0.01 usr 0.00 sys + 0.37 cusr
0.09 csys = 0.47 CPU)
Result: FAIL
Makefile:87: recipe for target 'check' failed
make: *** [check] Error 1
--
Peter Geoghegan
btle, low probability bug due to a major version change in
ncurses-base.
--
Peter Geoghegan
test: 8
Non-zero exit status: 1
Files=1, Tests=12, 7 wallclock secs ( 0.02 usr 0.00 sys + 0.36 cusr
0.12 csys = 0.50 CPU)
Result: FAIL
Makefile:87: recipe for target 'check' failed
make: *** [check] Error 1
--
Peter Geoghegan
HEAD-unset-TERM-in-tab-completion-test.patch
Description: Binary data
HEAD + HEAD-unset-TERM-in-tab-completion-test.patch:
set colored-completion-prefix on
set colored-stats on
--
Peter Geoghegan
uggests that the
> best way to do that would be to set envar INPUTRC to /dev/null --- could
> you confirm that that works for you?
Yes -- "export INPUTRC=/dev/null" also makes it work for me (on HEAD +
HEAD-unset-TERM-in-tab-completion-test.patch, but with my
original/problem
On Sat, Jan 4, 2020 at 6:17 PM Tom Lane wrote:
> Cool, I'll go commit a fix along those lines. Thanks for tracing
> this down!
Glad to help!
--
Peter Geoghegan
On Wed, Jan 8, 2020 at 5:56 AM Heikki Linnakangas wrote:
> On 04/01/2020 03:47, Peter Geoghegan wrote:
> > Attached is v28, which fixes bitrot from my recent commits to refactor
> > VACUUM-related code in nbtpage.c.
>
> I started to read through this gigantic patch.
Oh come
ade the test I mentioned less effective. I care about the
specifics.
--
Peter Geoghegan
ses that have support for deduplication later, if at all.
I think it would be fine if the deduplication docs (not the docs for
this infrastructure) just pointed out specific cases that we *cannot*
support -- there are not many exceptions (numeric, text with a
nondeterministic collation, a few others like that).
--
Peter Geoghegan
On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra
wrote:
> Understood. Is that a reason to not commit of this patch now, though?
It could use some polishing. Are you interested in committing it?
--
Peter Geoghegan
On Fri, Jan 10, 2020 at 1:36 PM Peter Geoghegan wrote:
> * HEIKKI: Do we only generate one posting list in one WAL record? I
> would assume it's better to deduplicate everything on the page, since
> we're modifying it anyway.
Still thinking about this one.
> * HEIKKI: Do
enabling deduplication by default (by having the
deduplicate_btree_items GUC default to 'on')?
[1] https://commitfest.postgresql.org/24/2202/
[2] https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py
[3]
https://www.postgresql.org/message-id/flat/CAH2-Wz%3DSfAKVMv1x9Jh19EJ8am8TZn9f-yECipS9HrrRqSswnA%40mail.gmail.com#b20ead9675225f12b6a80e53e19eed9d
--
Peter Geoghegan
ake the primary's LP_DEADs
> into account on a standby. I think we'd run into torn page issues, if we
> were to do so without WAL logging, because we'd rely on the LP_DEAD bits
> and BTPageOpaqueData.xact to be in sync. I *think* we might be safe to
> do so *iff* the page's LSN indicates that there has been a WAL record
> covering it since the last redo location.
That sounds like a huge mess.
--
Peter Geoghegan
On Thu, Jan 16, 2020 at 10:55 AM Robert Haas wrote:
> On Wed, Jan 15, 2020 at 6:38 PM Peter Geoghegan wrote:
> > There are some outstanding questions about how B-Tree deduplication
> > [1] should be configured, and whether or not it should be enabled by
> > default. I'
ncurrently is worth avoiding wherever possible.
--
Peter Geoghegan
On Mon, Jan 13, 2020 at 8:47 PM Andrey Borodin wrote:
> > 11 янв. 2020 г., в 7:49, Peter Geoghegan написал(а):
> > I'm curious why Andrey's corruption problems were not detected by the
> > cross-page amcheck test, though. We compare the first non-pivot tuple
> &
sorts. I don't recall hearing any complaints about
that, but it still doesn't seem great.
--
Peter Geoghegan
On Thu, Jan 16, 2020 at 5:11 PM Peter Geoghegan wrote:
> I find this argument convincing. I'll try to get this committed soon.
>
> While you could have used bt_index_parent_check() or heapallindexed to
> detect the issue, those two options are a lot more expensive (plus the
>
ssary
page split" problem from two completely different angles, with an
overall effect that is greater than the sum of its parts.
While the difference in size of each filler index on the master branch
wasn't that significant on its own, it's still interesting. It's
probably quite workload dependent.
--
Peter Geoghegan
itself. _bt_vacuum_one_page deletes only
> LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
> you have some feedback for this ?
It sounds like the design of the patch relies on doing something other
than stopping a scan "between" pages, in the sense that is outlined in
the commit message of commit 09cb5c0e. If so, then that's a serious
flaw in its design.
--
Peter Geoghegan
On Mon, Jan 20, 2020 at 1:19 PM Peter Geoghegan wrote:
> On Mon, Jan 20, 2020 at 11:01 AM Jesper Pedersen
> wrote:
> > > - nbtsearch.c _bt_skip line 1440
> > > if (BTScanPosIsValid(so->currPos) &&
> > > _bt_scankey_within_page(scan,
() assertion. It's
good that it verifies what is actually a fragile assumption, even
though I'd prefer to not make a fragile assumption.
--
Peter Geoghegan
lock on a leaf page
for a time. Also, I think that this will only be safe with MVCC scans,
because otherwise the page could be concurrently deleted by VACUUM.
--
Peter Geoghegan
On Wed, Jan 22, 2020 at 10:55 AM Peter Geoghegan wrote:
> This is a good summary. This is the kind of scenario I had in mind
> when I expressed a general concern about "stopping between pages".
> Processing a whole page at a time is a crucial part of how
> _bt_readpage(
On Fri, Jan 17, 2020 at 5:43 PM Peter Geoghegan wrote:
> I tried to come up with a specific example of how this could be
> unsafe, but my explanation was all over the place (this could have had
> something to do with it being Friday evening). Even still, it's up to
> the patch to
dered in the wider context of the executor
(but apparently they're not, so no need to worry about it). This may
have been mentioned somewhere already. If it is then I must have
missed it.
--
Peter Geoghegan
compensate as described by the README file (basically, we check and
re-check for races, possibly returning to the original page when we
think that we might have overlooked something and need to make sure).
It's an exception to the general rule, you could say.
--
Peter Geoghegan
On Wed, Jan 22, 2020 at 10:55 AM Peter Geoghegan wrote:
> On Tue, Jan 21, 2020 at 11:50 PM Floris Van Nee
> wrote:
> > As far as I know, a page split could happen at any random element in the
> > page. One of the situations we could be left with is:
> > Leaf page 1
t), which is
significantly more complexity -- that may not be worth it. (If you did
this, you would have to teach the code the difference between
"absolute" negative infinity in the leftmost leaf page on the leaf
level, and "subtree relative" negative infinity for other leaf pages
that are merely leftmost within a subtree).
In summary: I suppose that we can also solve "the cousin problem"
quite easily, but only for rightmost cousins within a subtree --
leftmost cousins might be too messy to verify for it to be worth it.
We don't want to have to jump two or three levels up within
bt_downlink_connectivity_check() just for leftmost cousin pages. But
maybe you're feeling ambitious! What do you think?
Note: There is an existing comment about this exact negative infinity
business within bt_downlink_check(). It starts with "Negative
inifinity items can be thought of as a strict lower bound that works
transitively...". There should probably be some comment updates to
this comment block as part of this patch.
[1]
https://pdfs.semanticscholar.org/fd45/15ab23c00231d96c95c1091459d0d1eebfae.pdf
--
Peter Geoghegan
On Thu, Jan 23, 2020 at 5:31 PM Peter Geoghegan wrote:
> In summary: I suppose that we can also solve "the cousin problem"
> quite easily, but only for rightmost cousins within a subtree --
> leftmost cousins might be too messy to verify for it to be worth it.
> We don'
On Thu, Jan 23, 2020 at 5:40 PM Peter Geoghegan wrote:
> I suppose the alternative is to get the high key of the parent's left
> sibling, rather than going to the parent's parent (i.e. our
> grandparent). That would probably be the best way to get a separator
> key to compar
ommit a3f0b3d68f9. That can
perform very badly when the input is almost sorted, but has a few
tuples that are out of order towards the end. (I have called these
"banana skin tuples" in the past.)
--
Peter Geoghegan
to
store an abbreviated key. Making that work well is a considerable
undertaking, since you need to use prefix compression to get a high
entropy abbreviated key. It would probably take me the best part of a
whole release cycle to write such a patch. The attached patches get
us a relatively easy win i
hard
work of making abbreviated keys work with a variety of workloads,
while still getting a good idea of the benefits in one specific case.
For this prototype, I can either not do prefix compression to get a
high entropy abbreviated key, or do the prefix compression in a way
that is totally inflexible, but still works well enough for this
initial test workload.
My estimate is that it would take me 4 - 6 weeks to write a prototype
along those lines. That isn't so bad.
--
Peter Geoghegan
the same patches here a second time for completeness.
--
Peter Geoghegan
From 8f55bcedaa9c109543627e9c785dab0ffb0e5c75 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Wed, 22 Jan 2020 20:59:57 -0800
Subject: [PATCH v2 3/3] Remove "negative infinity" check from _bt_compare().
On Thu, Jan 16, 2020 at 12:05 PM Peter Geoghegan wrote:
> > It does seem odd to me to treat them differently, but it's possible
> > that this is a reflection of my own lack of understanding. What do
> > other database systems do?
>
> Other database systems treat un
ter than it needs
to be, or something along those lines.
--
Peter Geoghegan
in the mix, too. I'd still not wish to rely on this for debugging.
I agree that there are a lot of moving pieces here. I wouldn't like to
have to rely on this working myself.
--
Peter Geoghegan
On Fri, Mar 22, 2019 at 2:15 PM Peter Geoghegan wrote:
> On Thu, Mar 21, 2019 at 10:28 AM Peter Geoghegan wrote:
> > I'll likely push the remaining two patches on Sunday or Monday.
>
> I noticed that if I initidb and run "make installcheck" with and
> w
uperflous line to make gist logically
> broken. This triggers regression test of amcheck.
Any thoughts on this, Heikki?
It would be nice to be able to squeeze this into Postgres 12,
especially given that GiST has been enhanced for 12 already.
--
Peter Geoghegan
pushed something very close to the patch you posted. (I added
an additional assertion, and tweaked the comments.)
Thanks for the report and patch!
--
Peter Geoghegan
the item
pointer overhead?
I wonder if we should add an assertion based on the pd_special offset
of the page to PageGetFreeSpace(). That would make it harder to make
mistakes like this in the future. Maybe it would be better to revise
the whole API instead, though.
--
Peter Geoghegan
hat use conventional line pointers.
--
Peter Geoghegan
;t be able to call whatever we rename PageGetFreeSpace()
to.
--
Peter Geoghegan
C-Tree?
Peter Geoghegan
(Sent from my phone)
deal with this situation, by
consolidating the old and new tests in some way. I don't think that
your work needs to block on that, though.
> Thoughts? Anyone object to making these sorts of changes
> post-feature-freeze?
IMV there should be no problem with pushing ahead with this after
feature freeze.
--
Peter Geoghegan
e for every tuple on v4
indexes. The original fastpath tests won't tickle the implementation
in any interesting way in my opinion.
--
Peter Geoghegan
On Wed, Apr 10, 2019 at 4:56 PM Peter Geoghegan wrote:
> The original fastpath tests don't seem particularly effective to me,
> even without the oversight I mentioned. I suggest that you remove
> them, since the minimal btree_index.sql fast path test is sufficient.
To be clear:
a useful resource. I would at least like
to know the settings used by its builds.
--
Peter Geoghegan
efer to
have optimizations enabled if I was optimizing my code, but that's not
what the web resource is for, really.
Thanks
--
Peter Geoghegan
ul than the alternative.
--
Peter Geoghegan
able-debug would help, since llvm-gcov requires it,
but that doesn't seem particularly likely.
It's definitely generally recommended that "-O0" be used, so I think
that we can agree that that was an improvement, even if it doesn't fix
the remaining problem that I notice
On Fri, Apr 12, 2019 at 10:49 AM Peter Geoghegan wrote:
> It's definitely generally recommended that "-O0" be used, so I think
> that we can agree that that was an improvement, even if it doesn't fix
> the remaining problem that I noticed when I rechecked nbtutils
ible for that reason, but at
the same time the per-datum overhead seems very high to me. Maybe
prefix compression could help here, which a low key and high key can
do rather well.
--
Peter Geoghegan
would be difficult to
impossible as a practical matter.
Unfortunately, you're both right. I don't know where that leaves us.
--
Peter Geoghegan
e until the project begins to mature. I have little faith in our
ability to predict which approach will be the least painful at this
early stage.
--
Peter Geoghegan
overing no more than few hundred
contiguous logical values, you can justify aggressively compressing
the representation in the B-Tree entries. Compression would still be
based on prefix compression, but the representation itself can be
specialized.
--
Peter Geoghegan
That's one of the reasons why I've been trying to get you to get on
> board with allowing different leaf-level "item pointer equivalents"
> widths inside nbtree...
Getting me to agree that that would be nice and getting me to do the
work are two very different things. ;-)
--
Peter Geoghegan
us infinity items, and actually store a "low key"
at P_FIRSTDATAKEY() within internal pages instead. That would be
useful for other things anyway (e.g. prefix compression).
--
Peter Geoghegan
On Tue, Apr 16, 2019 at 12:00 PM Peter Geoghegan wrote:
> Can you be more specific? What was the cause of the corruption? I'm
> always very interested in hearing about cases that amcheck could have
> detected, but didn't.
FWIW, v4 indexes in Postgres 12 will support t
, rather than counting tuples per se.
I like that idea, but I'm pretty sure that there are very few users
that are aware of these distinctions at all. It would be a good idea
to clearly document them.
--
Peter Geoghegan
r us to detect.
I'll create an open item for this, and begin work on a patch tomorrow.
--
Peter Geoghegan
On Wed, Apr 17, 2019 at 7:37 PM Peter Geoghegan wrote:
> Tentatively, I think that the fix here is to not "itup_key->scantid =
> NULL" when a NULL value is involved (i.e. don't "itup_key->scantid =
> NULL" when we see IndexTupleHasNulls(itup) within _bt_
bute was NULL. It should only do that when a key attribute is
NULL.
--
Peter Geoghegan
On Thu, Apr 18, 2019 at 8:13 PM Peter Geoghegan wrote:
> It just occurred to me that the final patch will need to be more
> careful about non-key attributes in INCLUDE indexes. It's not okay for
> it to avoid calling _bt_check_unique() just because a non-key
> attribute was NUL
casionally saving a MAXALIGN() quantum in space this
way. It is unlikely that anyone would actually care very much about
these kinds of space savings, but at the same time it feels more
elegant to me. The heap TID may not have a pg_attribute entry, but
ISTM that the on-disk representation should not have padding "in the
wrong place", on general principle.
Thoughts?
--
Peter Geoghegan
reeTupleGetHeapTID() works just as
well with either the existing scheme, or this new one. Having the
"real" tuple length available will make it easier to implement "true"
suffix truncation, where we truncate *within* a text attribute (i.e.
generate a new, shorter value using new opclass infrastructure).
--
Peter Geoghegan
to the identifier (or perhaps
its length), plus the usual t_info stuff. We'd almost invariably waste
4 or 5 bytes, which seems like a problem to me.
--
Peter Geoghegan
e
nbtree (nbtree won't when I'm done, though, because it will actively
try to preserve the "real" tuple size). It's convenient to me that no
caller seems to rely on the index_form_tuple() MAXALIGN() that I want
to remove.
--
Peter Geoghegan
of
bugs in quite a variety of contexts.
--
Peter Geoghegan
/fd.c layering is confusing in a number of ways IMV.
--
Peter Geoghegan
On Fri, Apr 19, 2019 at 6:34 PM Peter Geoghegan wrote:
> Attached revision does it that way, specifically by adding a new field
> to the insertion scankey struct (BTScanInsertData).
Pushed.
--
Peter Geoghegan
901 - 1000 of 3245 matches
Mail list logo