On Tue, Jun 28, 2022 at 1:35 PM Bruce Momjian wrote:
> Okay, text updated, thanks. Applied patch attached.
I have some notes on these items:
1. "Allow vacuum to be more aggressive in setting the oldest frozenxid
(Peter Geoghegan)"
2. "Add additional information to
y points via a PRNG. During CREATE TABLE, for
example. This approach could make it easier to reproduce failures on the
buildfarm.
--
Peter Geoghegan
ebt than it can today, but you can't expect to avoid freezing
altogether (without significant work elsewhere). My general sense is
that freezing isn't a particularly good thing to try to do lazily --
even if we ignore the risk of an eventual wraparound failure.
--
Peter Geoghegan
datum tuplesorts for
'SELECT(DISTINCT mycol) ...' style queries on low cardinality columns
extremely fast. We're not really sorting so much as bucketing. This is
based on Dijkstra's Dutch national flag problem.
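
To illustrate the "bucketing" idea, here is a minimal sketch of a generic three-way (Dutch national flag) partition quicksort. It is not the tuplesort code from Postgres; the function names `three_way_partition` and `quicksort_3way` are made up for this example. Elements equal to the pivot end up in a middle region that is never recursed into, which is why inputs with few distinct values need very little work.

```python
def three_way_partition(items, lo, hi):
    """Partition items[lo..hi] in place into <pivot, ==pivot, >pivot regions.

    Returns (lt, gt): items[lt..gt] holds every element equal to the pivot.
    """
    pivot = items[(lo + hi) // 2]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if items[i] < pivot:
            # Grow the "less than" region from the left.
            items[lt], items[i] = items[i], items[lt]
            lt += 1
            i += 1
        elif items[i] > pivot:
            # Grow the "greater than" region from the right; the element
            # swapped into position i has not been examined yet.
            items[i], items[gt] = items[gt], items[i]
            gt -= 1
        else:
            # Equal to the pivot: leave it in the middle region.
            i += 1
    return lt, gt


def quicksort_3way(items, lo=0, hi=None):
    """Sort items in place, skipping the run of pivot-equal elements."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    lt, gt = three_way_partition(items, lo, hi)
    quicksort_3way(items, lo, lt - 1)
    quicksort_3way(items, gt + 1, hi)


if __name__ == "__main__":
    data = [2, 1, 3, 1, 2, 3, 1, 2]
    quicksort_3way(data)
    print(data)
```

With only k distinct values, each one is fully placed the first time it is chosen as the pivot, so the number of partitioning rounds depends on k rather than on the input size.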
--
Peter Geoghegan
On Wed, Jun 22, 2022 at 8:45 AM Bruce Momjian wrote:
> That is a big help for committers who want to email a URL of the new
> doc commit output.
Yes, it's a nice "quality of life" improvement.
Thanks for finally getting this done!
--
Peter Geoghegan
past (when working on B-Tree deduplication).
It's quite straightforward to set up and run.
[1] http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html
--
Peter Geoghegan
ta will be *extremely* valuable, and
not merely somewhat more valuable?
I guess it doesn't matter much now (since you have all but conceded
that using a bit for this makes sense), but FWIW that's the main
reason why I almost took it for granted that we'd need to use a status
bit (or bits) for this.
--
Peter Geoghegan
On Tue, Jun 14, 2022 at 7:21 PM Robert Haas wrote:
> On Tue, Jun 14, 2022 at 9:56 PM Peter Geoghegan wrote:
> > Technically we don't already do that today, with the 16-bit checksums
> > that are stored in PageHeaderData.pd_checksum. But we do something
> > equivalent:
I can't see it
being more than 2 or 3. Which seems absolutely fine. They *definitely*
have no value if nobody ever uses them for anything.
--
Peter Geoghegan
On Tue, Jun 14, 2022 at 1:32 PM Peter Geoghegan wrote:
> On Tue, Jun 14, 2022 at 1:22 PM Robert Haas wrote:
> > I still am not clear on precisely what you are proposing here. I do
> > agree that there is significant bit space available in pd_flags and
> > that consuming s
e for now that we don't leave pd_flags unencrypted, as you
have suggested. We're still discussing new approaches to checksumming
in the scope of this work, which of course includes many individual
cases that don't involve any encryption. Plus even with encryption
there are things like defensive assertions that can be added by using
a flag bit for this.
--
Peter Geoghegan
the server, perhaps for performance reasons
(though it's unclear how much it matters). In which case the status
bit is technically redundant information as far as the code is
concerned. That may well be fine.
--
Peter Geoghegan
to be negotiated at
that level, because it will in fact affect a lot of callers to the
bufpage.h functions.
--
Peter Geoghegan
the door open to a world in which that assumption no longer
holds. Like when you do finally get around to making TDE something
that can work at the relation level, for example. Even if there is
only a small chance of that ever happening, why wouldn't we be
prepared for it, just on general principle?
--
Peter Geoghegan
hange to the on-disk format of this kind of
magnitude taken place, post-pg_upgrade? I would argue that this would
be the first, since it is the moral equivalent of extending the size
of the generic page header.
For all I know the overhead will be perfectly fine, and everybody
wins. I just want to be adamant that we're making the right
trade-offs, and maximizing the benefit from any new cost imposed on
access method code.
--
Peter Geoghegan
t
it'll never be useful to apply encryption selectively, perhaps at the
relation level?
--
Peter Geoghegan
On Tue, Jun 14, 2022 at 8:48 AM Robert Haas wrote:
> On Mon, Jun 13, 2022 at 6:26 PM Peter Geoghegan wrote:
> > Anyway, I can see how it would be useful to be able to know the offset
> > of a nonce or of a hash digest on any given page, without access to a
> > running serv
mit
> I wasn't aware myself of all the gotchas described there.
I didn't realize that it was that bad. Even if it's only 10% as bad as
you say, it would still be very valuable to do something about it
(ideally with an approach that is non-invasive).
--
Peter Geoghegan
g clearer in a doc patch that recently became
commit 8ac700ac.
--
Peter Geoghegan
the system lib C to be authoritative for the OS as a whole, in
the way that Postgres supposes. At least in the case of Mac OS, which
is after all purely a desktop operating system.
--
Peter Geoghegan
ber of pages may make
all the difference.
--
Peter Geoghegan
me to be totally inflexible, and doesn't
compose with other things?
--
Peter Geoghegan
On Mon, Jun 13, 2022 at 2:54 PM Bruce Momjian wrote:
> On Mon, Jun 13, 2022 at 02:44:41PM -0700, Peter Geoghegan wrote:
> > Is that how the block-level encryption feature from EDB Advanced Server
> > does it?
>
> Uh, EDB Advanced Server doesn't have a block-level encry
locate the checksum that's going to tell you
> whether the page contents are messed up. Perhaps this could be worked
> around if you tried hard enough, but I don't see what we get out of
> it.
Is that how the block-level encryption feature from EDB Advanced Server does it?
--
Peter Geoghegan
res package aren't forced to use the same ancient ICU
version.
--
Peter Geoghegan
and get a change in behavior,
then surely any related indexes must have been rebuilt too. The
interesting part may be what that upgrade looks like in detail.
--
Peter Geoghegan
atabase, and involves exactly 2 ICU versions. You
should probably be able to back out of it once it begins, but mostly
it's an inflexible process that just does what we need it to do.
Does something like that seem sensible to you?
--
Peter Geoghegan
have old physical
collations. Defining the problem as a problem with old
indexes/constraints only seems like it might make things a lot easier.
--
Peter Geoghegan
t's more realistically and robustly
> and simply implementable. Hmm.
That may be a decisive reason to go with your proposal. I really don't know.
--
Peter Geoghegan
on for their logical collation). So
directly tackling that seems natural to me.
--
Peter Geoghegan
t it would be to go as far as we can in the direction
of decoupling the concerns that we have as database people from the
concerns of natural language experts. Let's not step on their toes,
and let's avoid having our toes trampled on.
--
Peter Geoghegan
ady what we advise for users that use advanced
tailorings of custom ICU collations, such as a custom collation for
"natural sorting", often used for things like alphanumeric invoice
numbers. That might break if you downgrade ICU version, and maybe even
if you upgrade ICU version.
--
Peter Geoghegan
ld
largely be an implementation detail, perhaps only used to
unambiguously identify which specific ICU version and locale string
relate to which on-disk relfilenode structure currently.
--
Peter Geoghegan
On Thu, Jun 9, 2022 at 2:33 PM Peter Geoghegan wrote:
> My preference is for an approach that builds on that, or at least
> doesn't significantly complicate it. So a cryptographic hash or nonce
> can go in the special area proper (structs like BTPageOpaqueData don't
> need
ial area proper (structs like BTPageOpaqueData don't
need any changes), but at a page offset before the special area proper
-- not after.
What disadvantages does that approach have, if any, from your point of view?
--
Peter Geoghegan
at experience has shown are relatively common.
That's what the search_path case seems like to me.
If somebody else wants to write another patch that adds on that,
great. If not, then having this much still seems useful.
--
Peter Geoghegan
o store the index metapage. (Actually unlogged
indexes that run on a standby don't, but that's accounted for
directly.)
--
Peter Geoghegan
what not to do.
--
Peter Geoghegan
f ICU
will eventually become a "compelling feature" in its own right.
I believe that EDB adopted ICU many years ago, and stuck with one
vendored version for quite a few years. And eventually being on a very
old version of ICU became a real problem.
--
Peter Geoghegan
On Thu, Jun 9, 2022 at 6:40 AM Robert Haas wrote:
> Are you going to code up a patch?
I can, but feel free to fix it yourself if you prefer. Your analysis
seems sound.
--
Peter Geoghegan
On Wed, Jun 8, 2022 at 10:39 PM Peter Geoghegan wrote:
> They simply REINDEX, without changing anything. The details are still
> fuzzy, but at least that's what I was thinking of.
As I said before, BCP47 format tags are incredibly forgiving by
design. So it should be reasonable to
original/old environment (which is likely), you can avoid reindexing,
and so reserve the option of backing out of a complex upgrade until
very late in the process. You're going to have to do it eventually,
but it can probably just be an afterthought.
--
Peter Geoghegan
ly, and necessitates thinking about
multiple evaluation hazards, which is enough to discourage good
defensive coding practices.
--
Peter Geoghegan
ItemSize()
looked like prior to Postgres 12. So the limit on internal pages never
changed, even in Postgres 12. There was no separate leaf page limit
prior to 12. Only the rules on the leaf level ever really changed.
Note also that amcheck has tests for this stuff. Though that probably
doesn't matter at all.
--
Peter Geoghegan
. It took as long as 30 minutes or more to run the
test.
I think that we should fix this on HEAD, on general principle. There
is no reason to believe that this is a live bug, so a backpatch seems
unnecessary.
--
Peter Geoghegan
ficient
approach to implementing strxfrm() is another example of the same
thing. (The Apple strxfrm() produces huge low entropy binary strings,
unlike the glibc version, which is pretty well optimized.)
--
Peter Geoghegan
rhead with mixed
reads and writes, so it's a performance all-rounder that can still be
beaten by specialized techniques that come with their own downsides.
--
Peter Geoghegan
s once we
gain the ability to use multiple versions of ICU at the same time? For
example, do we want to generalize the definition of a collation, so
that it's associated with one particular ICU version and collation for
the purposes of on-disk compatibility, but isn't necessarily tied to
the same ICU version in other contexts, such as on a dump and restore?
--
Peter Geoghegan
le different ICU versions doesn't really seem like
overkill to me. Or if it is then I can easily think of far better
examples of software bloat. Defining "stable behavior for collations"
as "uses exactly the same software artifact over time" is defensive
(compared to always linking to one ICU version that does it all), but
we have plenty that we need to defend against here.
--
Peter Geoghegan
"best effort" approach, because throwing a "locale not
found" error message usually isn't helpful from the point of view of
the end user. Note that this is a broader standard than ICU or CLDR or
even Unicode.
[1] https://www.ietf.org/rfc/rfc6067.txt
--
Peter Geoghegan
ant. Even if glibc theoretically does a
perfect job of versioning, I still think that their priorities are
very much unlike our priorities, and that that should be a relevant
consideration for us.
--
Peter Geoghegan
is scheme wouldn't technically be under our direct control, but
would still be something that we could influence. We could have a back
and forth conversation about what's not working in the field.
--
Peter Geoghegan
port by the distro (while
actively discouraging its use in new databases). This isn't the same
thing as forking ICU. It's a compromise between that extreme, and
the current situation.
--
Peter Geoghegan
hat there are many
near-misses that we never get to hear about already. That's rather
beside the point. The index must be assumed to be corrupt.
--
Peter Geoghegan
space efficiency matters, especially with B-Tree
index-only scans that scan a significant fraction of the entire index,
or even the entire index.
--
Peter Geoghegan
more
attributes of scalar types.
The abbreviated keys optimization is very much something that comes
from the world of databases, not the world of sorting. It's pretty much a
domain-specific technique. That seems relevant to me.
--
Peter Geoghegan
otQuicksort.pdf
At one point quite a few years back I planned on investigating it
myself, but never followed through.
--
Peter Geoghegan
ic
mean? That's pretty standard practice when summarizing a set of
benchmark results that are expressed as ratios to some baseline.
If I tweak your spreadsheet to use the geometric mean, the patch looks
slightly better -- 89%.
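
As a rough illustration of why the choice of mean matters for ratios, here is a small sketch. The ratios below are invented for the example and are not taken from the spreadsheet discussed in this thread; the helper names are likewise hypothetical.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def arithmetic_mean(ratios):
    """Plain average, shown only for comparison."""
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical runtimes relative to a baseline (1.0 means no change):
    # one test ran in half the time, another took twice as long.
    ratios = [0.5, 2.0]
    print("arithmetic:", arithmetic_mean(ratios))
    print("geometric: ", geometric_mean(ratios))
```

A halving and a doubling cancel out under the geometric mean, while the arithmetic mean reports a net slowdown. That symmetry is the usual argument for using the geometric mean when results are expressed as ratios to a baseline.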
--
Peter Geoghegan
On Fri, May 27, 2022 at 11:59 AM Andres Freund wrote:
> On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:
> > I find it hard to believe that there wasn't even a cursory effort at
> > performance validation before this was committed, but that's what it
> > looks
speed up anything useful! There's not a
> single benchmark for the patch.
I find it hard to believe that there wasn't even a cursory effort at
performance validation before this was committed, but that's what it
looks like.
--
Peter Geoghegan
On Thu, May 19, 2022 at 1:12 PM Justin Pryzby wrote:
> Should these debug lines be removed ?
>
> elog(DEBUG1, "qsort_tuple");
I agree -- DEBUG1 seems too chatty for something like this. DEBUG2
would be more appropriate IMV. Though I don't feel very strongly about
it.
--
Peter Geoghegan
that's just obviously false.
+1
--
Peter Geoghegan
ble for a
variety of reasons. All of which boil down to "the current FSM design
cannot be totally trusted, so we verify redundantly".
--
Peter Geoghegan
-- I'm really
looking for bottlenecks, where Postgres does entirely the wrong thing.
It's especially interesting to me as somebody that focuses on B-Tree
indexing.
--
Peter Geoghegan
mpossible to overlook), then I
would object -- why even take a small chance? Fortunately I don't
believe that we're even taking a small chance here, all things
considered. And so I agree; this issue isn't a concern.
--
Peter Geoghegan
On Thu, Apr 21, 2022 at 4:28 PM Peter Geoghegan wrote:
> I don't think that there is any risk of one user of either variable
> "clobbering" some other user -- the current values of the variables
> are not actually meaningful at all. They're only useful as a way that
On Wed, Apr 20, 2022 at 8:00 PM Peter Geoghegan wrote:
> I knew about pgBufferUsage, and I knew about
> VacuumPage{Hit,Miss,Dirty} for a long time. But somehow I didn't make
> the very obvious connection between the two until today. I am probably
> not the only
n that path and added the others too?
I knew about pgBufferUsage, and I knew about
VacuumPage{Hit,Miss,Dirty} for a long time. But somehow I didn't make
the very obvious connection between the two until today. I am probably
not the only one.
--
Peter Geoghegan
On Tue, Apr 12, 2022 at 11:01 AM Peter Geoghegan wrote:
> Attached patch fixes the issue, and includes the test case that you posted.
Pushed a similar patch just now. Backpatched to all supported branches.
--
Peter Geoghegan
ack_io_timing is off, so are fields like
pgBufferUsage.shared_blks_hit (i.e. those that don't have a
time/duration component) officially okay to rely on across the board?
It looks like they are okay to rely on (even when track_io_timing is
off), but it would be nice to put that on a formal footing, if it
isn't already.
--
Peter Geoghegan
ith using DBT5 on a modern
Linux distribution. Perhaps I gave up too easily at the time, but I'm
definitely still interested. Has there been work on that since?
Thanks
--
Peter Geoghegan
s correct according to the
spec.
--
Peter Geoghegan
On Mon, Apr 18, 2022 at 1:12 PM Peter Geoghegan wrote:
> I would argue that it would be correct for the first time -- at least
> if we take the behavior within heapam_index_build_range_scan (and
> everywhere else) as authoritative. That's a feature, not a bug.
Attached draft patc
rgue that it would be correct for the first time -- at least
if we take the behavior within heapam_index_build_range_scan (and
everywhere else) as authoritative. That's a feature, not a bug.
--
Peter Geoghegan
uumInfo.num_heap_tuples value
in the amvacuumcleanup path (instead of new_rel_tuples). That way the
rule about IndexVacuumInfo.num_heap_tuples is simple: it's always
taken from pg_class.reltuples (for the heap rel). Either the existing
value, or the new value.
--
Peter Geoghegan
.num_index_tuples, which is related. Granted,
that won't be used to update pg_class for the index in the case where
it's just an estimate anyway.
--
Peter Geoghegan
. I believe that the "pg_class.reltuples is -1 even after a
VACUUM" case is completely impossible following the Postgres 15 work
on VACUUM, but we should still clamp for safety in
update_relstats_all_indexes (though not in the amvacuumcleanup path).
--
Peter Geoghegan
> And FreezeLimit doesn't affect "dead but not yet removable".
But OldestXmin affects FreezeLimit.
Anyway, I'm not opposed to showing the age at the start as well. But
from the point of view of issues like this tenk1 issue, it would be
more useful to just report on new_rel_allvisible. It would also be
more useful to users.
--
Peter Geoghegan
eneral approach to calculating
FreezeLimit makes little sense.
--
Peter Geoghegan
ally in the kinds of extreme cases I'm thinking about.
--
Peter Geoghegan
what's currently running.
As well as the age of OldestXmin at the start of VACUUM.
--
Peter Geoghegan
ind, though. Your new wording is fine.
I'll update the log output some time today.
--
Peter Geoghegan
mples of both. This could easily be changed to "XIDs".
--
Peter Geoghegan
fsync = off'. And did so in the
script as well.
That seems like it definitely could matter.
--
Peter Geoghegan
t theory (just putting
dinner on here). Just a wild guess at this point.
--
Peter Geoghegan
but I thought that the synchronous_commit thing was new
information that made that worth revisiting.
--
Peter Geoghegan
could plausibly have had that effect, whose
commit fits with our timeline for the problems seen on wrasse?
--
Peter Geoghegan
On Thu, Apr 14, 2022 at 3:28 PM Peter Geoghegan wrote:
> A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and
> "2022-04-14 22:49:19.088" all have the same "removable cutoff".
Are you aware of Andres' commit 02fea8fd? That work preven
o cannot go up as we're doing it (or it'll be less of an
issue, at least).
It would also help if VACUUM didn't scan pages that it already knows
don't have any dead tuples. The current SKIP_PAGES_THRESHOLD rule
could easily be improved. That's almost the same problem.
--
Peter Geoghegan
ee seconds
(likely more) where something held back OldestXmin generally.
That does seem a bit fishy to me, even though it happened about a
minute after the failure itself took place.
--
Peter Geoghegan
o XIDs to work
off of in the log_line_prefix that's in use on wrasse.
The CITester log_line_prefix is pretty useful -- I wonder if we can
standardize on that within the buildfarm, too.
--
Peter Geoghegan
On Thu, Apr 14, 2022 at 10:07 AM Peter Geoghegan wrote:
> It looks like you're changing the elevel convention for these "extra"
> messages with this patch. That might be fine, but don't forget about
> similar ereports() in vacuumparallel.c. I think that the elevel sho
, but don't forget about
similar ereports() in vacuumparallel.c. I think that the elevel should
probably remain uniform across all of these messages. Though I don't
particularly care if it's DEBUG2 or DEBUG5.
--
Peter Geoghegan
e'd know what the xid horizon is, whether pages were
> skipped, etc.
I like the idea of making VACUUM log the VERBOSE output as a
configurable user-visible feature. We'll then be able to log all
VACUUM statements (not just autovacuum worker VACUUMs).
--
Peter Geoghegan
to the
> horizon potentially going backwards (in otherwise harmless ways)?
I agree, since vacuumlazy.c would need to either be given its own
OldestXmin, or knowledge of a wait-up-to XID. Either way we have to
make non-trivial changes to vacuumlazy.c.
--
Peter Geoghegan
On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan wrote:
> I think that it's more likely that FREEZE will correct problems, out of the
> two:
>
> * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
> cutoff value as possible (FreezeLimit will be equal to Olde
an SQL function for other
reasons, though. Users already think that there are several different
flavors of VACUUM, which isn't really true.
--
Peter Geoghegan
tgr.es/m/cah2-wzkib-qcsbmwrpzp0nxvrqexouts1d7tyshg_drkohe...@mail.gmail.com
--
Peter Geoghegan
problem, really. I wonder if it's worth inventing
a comprehensive solution. Some kind of infrastructure that makes
VACUUM establish a next XID up-front (by calling
ReadNextTransactionId()), and then find a way to run with an
OldestXmin that's >= the earlier "next" XID value. If necessary by
waiting.
--
Peter Geoghegan
rel->NewRelfrozenXid == OldestXmin", and run the
regression tests, the remaining assertion will fail quite easily.
Though perhaps not with a serial "make check".
--
Peter Geoghegan