...@mail.gmail.com
[2] http://www.informatics.jax.org/software.shtml
--
Peter Geoghegan
On Tue, Jul 2, 2019 at 3:51 PM Peter Geoghegan wrote:
> I've already written a rough patch that fixes the issue by taking this
> second view of the problem. The patch makes nbtsplitloc.c more
> skeptical about finishing with the "many duplicates" strategy,
> avoiding the
On Sat, Jul 6, 2019 at 4:08 PM Peter Geoghegan wrote:
> I took a closer look at this patch, and have some general thoughts on
> its design, and specific feedback on the implementation.
I have some high level concerns about how the patch might increase
contention, which could make queries
ould definitely have an open mind about unique
indexes, even with non-NULL values. If we can prevent a page split by
deduplicating the contents of a unique index page, then we'll probably
win. Why not try? This will need to be tested.
--
Peter Geoghegan
t.
> Regarding bitmap indexes itself, I think our BRIN could provide them.
> However, it would be useful to have opclass parameters to make them
> tunable.
I thought that we might implement them in nbtree myself. But we don't
need to decide now.
--
Peter Geoghegan
the version number isn't changed. I think that we may be
able to get away with not increasing the B-Tree version from 4 to 5,
actually. Deduplication is performed lazily when it looks like we
might have to split the page, so there isn't any expectation that
tuples will either be compressed or uncompressed in any context.
--
Peter Geoghegan
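A minimal sketch of the lazy approach described in the message above: deduplication is attempted only when an incoming tuple would not fit on the page, and a split is reported only if merging duplicates still does not free enough space. The cost model, the capacity, and every name here (`KEY_COST`, `TID_COST`, `PAGE_CAPACITY`, `used_space`, `deduplicate`, `insert`) are hypothetical illustrations, not taken from the nbtree code.

```python
# Toy cost model (assumed, not nbtree's real layout): every tuple pays
# KEY_COST once for its key plus TID_COST per heap TID it carries.
KEY_COST = 2
TID_COST = 1
PAGE_CAPACITY = 12


def used_space(page):
    """Total space consumed by a page, given as a list of (key, [tids])."""
    return sum(KEY_COST + TID_COST * len(tids) for _, tids in page)


def deduplicate(page):
    """Merge adjacent tuples with equal keys into one posting-list tuple."""
    merged = []
    for key, tids in page:
        if merged and merged[-1][0] == key:
            merged[-1][1].extend(tids)
        else:
            merged.append((key, list(tids)))
    for _, tids in merged:
        tids.sort()
    return merged


def insert(page, key, tid):
    """Insert (key, tid); deduplicate only if the page would overflow.

    Returns (new_page, action), where action is "plain", "dedup" or
    "split". On "split" the caller would have to split the page.
    """
    candidate = sorted(page + [(key, [tid])], key=lambda t: (t[0], t[1][0]))
    if used_space(candidate) <= PAGE_CAPACITY:
        return candidate, "plain"      # fits as-is: do no extra work
    compact = deduplicate(candidate)   # lazy step: only under space pressure
    if used_space(compact) <= PAGE_CAPACITY:
        return compact, "dedup"        # page split avoided
    return compact, "split"
```

Because the merge step runs only on demand, a page in this sketch can hold plain tuples and posting-list tuples side by side, which matches the remark that there is no expectation that tuples are compressed or uncompressed in any given context.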
Database System" provides additional
background information (I should have suggested reading both 6.6 and
6.7 together).
--
Peter Geoghegan
On Thu, Jul 11, 2019 at 10:42 AM Peter Geoghegan wrote:
> > I think unique indexes may benefit from deduplication not only because
> > of NULL values. Non-HOT updates produce duplicates of non-NULL values
> > in unique indexes. And those duplicates can take significant
On Sun, Jul 7, 2019 at 7:53 PM Thomas Munro wrote:
> On Wed, May 1, 2019 at 12:58 PM Peter Geoghegan wrote:
> > I will think about a simple fix, but after the upcoming point release.
> > There is no hurry.
>
> A bureaucratic question: What should the status be for this CF
y of my test cases --
the fix barely affects the splits chosen for my real-world test data,
and TPC test data. As far as I know, I already have a comprehensive
fix. I will need to think about it much more carefully before
proceeding, though.
Thoughts?
--
Peter Geoghegan
derstand old deleted pages, where the deletion XID is
> stored in the page opaque field.
What Postgres versions will the B-Tree fix end up targeting? Sounds
like you plan to backpatch all the way?
--
Peter Geoghegan
unnecessary impediments in the way of making that
> happen, at least IMHO.
+1.
pg_stat_statements will already lose all the statistics that it
aggregated in the event of a hard crash. The trade-off that the query
jumbling logic makes is not a bad one, all things considered.
--
Peter Geoghegan
gine
doing quite a lot better still. Application developers love UUIDs. We
should try to meet them where they are.
[1] https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/
--
Peter Geoghegan
layering is confusing in a number of ways IMV.
--
Peter Geoghegan
actually care very much about
these kinds of space savings, but at the same time it feels more
elegant to me. The heap TID may not have a pg_attribute entry, but
ISTM that the on-disk representation should not have padding "in the
wrong place", on general principle.
Thoughts?
--
Peter Geoghegan
o disable the trace_sort instrumentation by commenting out
the TRACE_SORT entry in pg_config_manual.h. I recall being opposed on
this point by Robert Haas. Possibly because he just didn't want to
deal with it at the time.
--
Peter Geoghegan
uite meeting the traditional definition of a "developer
option".
--
Peter Geoghegan
seem to be saying that it is, I
> think we should just remove the symbol and be done with it.
Sounds like a plan. Do you want to take care of it, Joe?
--
Peter Geoghegan
le by the new
pg_stat_progress_create_index view, but with getrusage() stats.
--
Peter Geoghegan
On Fri, Apr 19, 2019 at 6:34 PM Peter Geoghegan wrote:
> Attached revision does it that way, specifically by adding a new field
> to the insertion scankey struct (BTScanInsertData).
Pushed.
--
Peter Geoghegan
orthwhile to keep the heap TID in the tuple header; it seems
inherently necessary to have a MAXALIGN()'d tuple header, so finding a
way to consistently put the first MAXALIGN() quantum to good use seems
wise.
--
Peter Geoghegan
On Wed, Apr 24, 2019 at 10:43 AM Peter Geoghegan wrote:
> The hard part is how to do varwidth encoding for space-efficient
> partition numbers while continuing to use IndexTuple fields for heap
> TID on the leaf level, *and* also having a
> BTreeTupleGetHeapTID()-style macro to g
er the
other, though I don't really know how to assess this layering
business. I'm glad that either approach will prevent oversights,
though.
--
Peter Geoghegan
The documentation has a section called "Routine Reindexing", which
explains how to simulate REINDEX CONCURRENTLY with a sequence of
creation and replacement steps. This should be updated to reference
the REINDEX CONCURRENTLY command.
--
Peter Geoghegan
hink about or define
developer options frames this discussion.
--
Peter Geoghegan
On Thu, Apr 25, 2019 at 1:56 PM Tom Lane wrote:
> Well, I was suggesting that we ought to consider the alternative of
> making it *not* always compiled, and Jeff was pushing back on that.
Right. Sorry.
--
Peter Geoghegan
On Tue, Apr 16, 2019 at 12:00 PM Peter Geoghegan wrote:
> On Mon, Apr 15, 2019 at 7:30 PM Alexander Korotkov
> wrote:
> > Currently amcheck supports lossy checking for missing parent
> > downlinks. It collects a bitmap of downlink hashes and uses it to check
> > subse
nds of
bugs in quite a variety of contexts.
--
Peter Geoghegan
scheme, or this new one. Having the
"real" tuple length available will make it easier to implement "true"
suffix truncation, where we truncate *within* a text attribute (i.e.
generate a new, shorter value using new opclass infrastructure).
--
Peter Geoghegan
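To illustrate what truncating *within* a text attribute could mean, here is a small sketch that generates a new, shorter value separating the last key on the left page from the first key on the right page. It assumes plain code-point comparison (roughly a "C"-like ordering); other collations would need the kind of opclass support the message alludes to. The function name `shortest_separator` is hypothetical.

```python
def shortest_separator(left: str, right: str) -> str:
    """Return the shortest prefix s of `right` with left < s <= right.

    `left` is the last key that stays on the left page and `right` is the
    first key on the right page. Comparison is by code point, which is an
    assumption made for this sketch.
    """
    if not left < right:
        raise ValueError("left key must sort strictly before right key")
    for n in range(1, len(right) + 1):
        prefix = right[:n]
        if left < prefix:
            return prefix
    return right  # not reached when left < right; kept for clarity


# A one-character separator is enough to tell these two keys apart.
print(shortest_separator("Geoghegan", "Haas"))
```

Any search key that is less than or equal to `left` still compares below the separator, and any key greater than or equal to `right` compares at or above it, so the shorter value routes searches the same way the full key would.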
ength), plus the usual t_info stuff. We'd almost invariably waste
4 or 5 bytes, which seems like a problem to me.
--
Peter Geoghegan
use it will actively
try to preserve the "real" tuple size). It's convenient to me that no
caller seems to rely on the index_form_tuple() MAXALIGN() that I want
to remove.
--
Peter Geoghegan
continue;
I would expect the "break" statement to have a line count that is no
greater than that of the first two lines that immediately precede, and
yet it's far far greater (1292 is greater than 4). It looks like there
has been some kind of loop transformation.
--
Peter Geoghegan
http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
Search the PDF for "-O0" to see numerous references to this. It seems
to be impossible to turn off all GCC optimizations.
--
Peter Geoghegan
On Tue, Jul 16, 2019 at 9:01 AM Robert Haas wrote:
> I cast my vote in the other direction i.e. for sticking with qsort.
I do too.
--
Peter Geoghegan
as painless as possible.
Note that ICU does at least provide a standard way to use multiple
versions at once; the symbol names have the ICU version baked in.
You're actually calling the functions using the versioned symbol names
without realizing it, because there is macro trickery involved
urs that I mentioned.
> > I think that the whole sentence about "the standard class of race
> > conditions" should go. There is no more dance. Nothing in
> > _bt_getroot() is surprising to me. The other comments explain things
> > comprehensively.
>
> +1
I'll take care of it soon.
--
Peter Geoghegan
h sounds
like a seriously bad approach to me.
I think that the whole sentence about "the standard class of race
conditions" should go. There is no more dance. Nothing in
_bt_getroot() is surprising to me. The other comments explain things
comprehensively.
--
Peter Geoghegan
t
only when its value is non-zero.
--
Peter Geoghegan
lever about ignorable/half-dead/deleted pages, to be conservative.)
--
Peter Geoghegan
ically different page
(even after masking within btree_mask()). However, I eventually
decided that you had it right. Your _bt_mark_page_halfdead() change is
clearer overall and doesn't break WAL consistency checking in
practice, for reasons that are no less obvious than before.
Thanks!
--
Peter Geoghegan
iary (e.g.,
* they are current target's child pages). Conceptually, problems are only
* ever found in the current target page (or for a particular heap tuple during
* heapallindexed verification). Each page found by verification's left/right,
* top/bottom scan becomes the target exactly once.
*/
--
Peter Geoghegan
"Row forwarding" across heap pages is the
traditional way of ensuring that TIDs in indexes are stable even in
the worst case, apparently, but other approaches also seem possible.
[1] http://www.vldb.org/pvldb/vol10/p781-Wu.pdf
--
Peter Geoghegan
random unused OID: 9099
I would like to push this patch shortly. How do people feel about this
wording? (It's based on the documentation added by commit a6417078.)
--
Peter Geoghegan
v2-0001-unused_oids-suggestion.patch
Description: Binary data
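The kind of suggestion quoted above ("random unused OID: 9099") can be sketched as follows. This is only an illustration in Python; the actual patch modifies a Perl script that is not shown here. The default range of 8000 to 9999 is an assumption about the development OID range associated with commit a6417078, and the name `suggest_unused_oid` is hypothetical.

```python
import random


def suggest_unused_oid(used_oids, low=8000, high=9999, rng=random):
    """Pick a random OID in [low, high] that is not in `used_oids`.

    `used_oids` should be a set of OIDs already assigned in the catalogs.
    Returns None when the whole range is taken. The default range is an
    assumption made for this sketch.
    """
    candidates = [oid for oid in range(low, high + 1) if oid not in used_oids]
    if not candidates:
        return None
    return rng.choice(candidates)


# Example: with a handful of OIDs taken, suggest a free one at random.
taken = {8000, 8001, 8002, 9099}
print("random unused OID:", suggest_unused_oid(taken))
```

Choosing at random rather than taking the lowest free value makes it less likely that two patches developed in parallel pick the same OID.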
On Fri, Aug 2, 2019 at 3:52 PM Tom Lane wrote:
> Better ... but I'm the world's second worst Perl programmer,
> so I have little to say about whether it's idiomatic.
Perhaps Michael can weigh in here? I'd rather hear a second opinion on
v4 of the patch before proceeding.
--
Peter Geoghegan
implements your suggestion, generating output
like the above. I haven't written a line of Perl in my life prior to
today, so basic code review would be helpful.
--
Peter Geoghegan
v3-0001-unused_oids-suggestion.patch
Description: Binary data
u have to be fairly
unlucky to have that happen under the system introduced by commit
a6417078.)
It's probably the case that most patches that create a new pg_proc
entry only create one. The question of consecutive OIDs only comes up
with a fairly small number of patches.
--
Peter Geoghegan
tuples (say based on the
"k=:val" constant) seems like it might generalize well enough. I
suggest Floris look into that possibility. This paper might be worth a
read:
https://dl.acm.org/citation.cfm?id=582278
(Though it also might not be worth a read -- I haven't actually read it myself.)
--
Peter Geoghegan
On Fri, Aug 2, 2019 at 5:34 PM Peter Geoghegan wrote:
> I wonder if some variety of block nested loop join would be helpful
> here. I'm not aware of any specific design that would help with
> Floris' case, but the idea of reducing the number of scans required on
> the inner side
l programmer is no excuse.)
How about the attached? I've simply removed the "if ($oid > $prev_oid
+ 2)" test.
--
Peter Geoghegan
v4-0001-unused_oids-suggestion.patch
Description: Binary data
sed_oids would
*maximize* the number of OID collisions.
> We could
> recommend the range if there are at least 10 OIDs available in the
> range from the lowest position, and there are few patches eating more
> than 5-10 OIDs at once.
That sounds like an over-engineered solution to a pr
s all of the same
tricks as our existing Bentley & McIlroy implementation, but is
more cache efficient. It's considered the successor to B&M, and had
input from Bentley himself. It is provably faster than B&M for a wide
variety of inputs, at least on modern hardware.
[1] http://www.vldb.org/journal/VLDBJ4/P603.pdf
[2] https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf
--
Peter Geoghegan
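The snippet above is truncated and does not name the algorithm, but reference [2] points to a dual-pivot quicksort paper. Assuming that is the algorithm meant, its core partitioning step can be sketched as below. This is a plain illustration of the technique only: it omits the insertion-sort cutoff and pivot sampling that production implementations use, and it says nothing about cache behaviour. The function name is hypothetical.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort list `a` in place using two pivots per partitioning pass."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Use the two ends as pivots, ordered so that p <= q.
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]

    lt = lo + 1  # a[lo+1 : lt] holds elements < p
    gt = hi - 1  # a[gt+1 : hi] holds elements > q
    i = lo + 1   # a[lt : i] holds elements in [p, q]
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1  # the swapped-in element is examined next
        else:
            i += 1

    # Move both pivots into their final positions.
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]

    # Recurse on the three partitions.
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)
```

Each pass splits the range into three parts instead of two, which is the main structural difference from a single-pivot quicksort.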
uppose that
bt_right_page_check_scankey() helps with transposed pages, but doesn't
help so much when you have WAL-level inconsistencies.
--
Peter Geoghegan
ther than letting an ambiguous "can't happen" error get
raised by low-level code. This might be possible with system catalog
corruption, for example. Finally, I thought that the WARNING was a bit
strong -- a NOTICE is more appropriate.
Thanks!
--
Peter Geoghegan
ready for review again.
I'm looking at it now. I'm going to spend a significant amount of time
on this tomorrow.
I think that we should start to think about efficient WAL-logging now.
> In the meantime, I'll run more stress-tests.
As you probably realize, wal_consistency_checking is a good thing to
use with your tests here.
--
Peter Geoghegan
ve that you came up with anyway.
> > How do you feel about officially calling this deduplication, not
> > compression? I think that it's a more accurate name for the technique.
> I agree.
> Should I rename all related names of functions and variables in the patch?
Please rename them when convenient.
--
Peter Geoghegan
d). This seemed like something that was really up to the callers.
Pushed a version with that change. Thanks for the review!
--
Peter Geoghegan
hmee19@news-spur.riddles.org.uk
That was a BufFile that was under the control of a tuplestore, so it
was similar to but different from your case. I suspect it's related.
--
Peter Geoghegan
which would improve matters
further with low cardinality indexes.)
--
Peter Geoghegan
ke?
Perhaps this is a problem that isn't worth solving right now, but it
is definitely a real problem.
[1]
https://www.postgresql.org/message-id/66ce997fb523c04e9749452273184c6c137cb88...@exch-mbx-113.vmware.com
--
Peter Geoghegan
isadvantages, including the fact that you have to know that
your data is amenable to BRIN indexing in order to use a BRIN index.
--
Peter Geoghegan
, because it doesn't care about the actual content of posting
lists. And, we can fix the "fake new item is not actually real new
item" issue at one point within _bt_split(), just as we're about to
WAL log.
What do you think of that approach?
--
Peter Geoghegan
On Thu, Aug 29, 2019 at 10:10 PM Peter Geoghegan wrote:
> I see some Valgrind errors on v9, all of which look like the following
> two sample errors I go into below.
I've found a fix for these Valgrind issues. It's a matter of making
sure that _bt_truncate() sizes new pivot tuples pr
On Thu, Aug 29, 2019 at 5:07 PM Peter Geoghegan wrote:
> I agree that v9 might be ever so slightly more space efficient than v5
> was, on balance.
I see some Valgrind errors on v9, all of which look like the following
two sample errors I go into below.
First one:
==11193== VALGRINDERROR
d is correct -- the NULL handling within
ApplySortAbbrevFullComparator() cannot actually be used currently. I
wouldn't change anything about the code, though, since it's useful to
defensively handle NULLs.
--
Peter Geoghegan
would have a lot of
advantages in the long term. It is certainly theoretically appealing.
Could this make it easier to use merge join with containment
operators? I'm thinking of things like geospatial joins, which can
generally only be performed as nested loop joins at the moment. This
is often wildly inefficient.
--
Peter Geoghegan
eel about this CREATE INDEX index-size-is-larger business?
--
Peter Geoghegan
lues. We've prototyped that, see [1].
I'm pretty sure that spatial joins generally need two spatial indexes
(usually R-Trees). There seems to have been quite a lot of research in
it in the 1990s.
--
Peter Geoghegan
On Sun, Aug 25, 2019 at 2:55 PM Peter Geoghegan wrote:
> I suppose that we'd add something new to CREATE OPERATOR CLASS to make
> this work? My instinct is to avoid adding things that are only
> meaningful for a single AM to interfaces like CREATE OPERATOR CLASS,
> but the system
On Sun, Aug 25, 2019 at 2:18 PM Peter Geoghegan wrote:
> > Indeed, we run up against this sort of thing all the time in, eg, planner
> > optimizations. I think some sort of "equality is precise" indicator
> > would be really useful for a lot of things.
>
>
letely, because
we're not directly concerned with the physical representation used
within an index. In fact, a major goal for this new infrastructure is
that nbtree gets to fully own the representation (it just needs to
know about the high level or logical requirements).
--
Peter Geoghegan
" collation isn't otherwise usable.
Perhaps there are far more compelling planner optimizations that I
haven't considered, though. This idea probably has problems with
interesting sort orders that aren't actually that interesting.
--
Peter Geoghegan
the btree/numeric display
scale problem are simply not worth solving directly. That would add a
huge amount of complexity for very little benefit.
[1] https://commitfest.postgresql.org/24/2202/
--
Peter Geoghegan
5x+ reduction), along with a very small reduction in the
number of leaf pages. Users that happen to have a lot of indexes that
look like this are likely to find classic suffix truncation
compelling, but that doesn't seem like a good enough reason to push
ahead with the patch.
--
Peter Geoghegan
on. This code is a few years old, but I still wouldn't be
surprised if it turned out to be slightly wrong in a way that was
important. We still have no way of detecting if a buffer is accessed
without a pin. There have been numerous bugs like that before. (We
have talked about teaching Valgrind to detect the case, but that never
actually happened.)
--
Peter Geoghegan
THRESHOLD
stuff really helped with those indexes.
Want me to send this data and the associated test script over to you?
--
Peter Geoghegan
m/cah2-wzmrt_0ybhf05axqb2oituqiqakr0lznntj8x3kadkz...@mail.gmail.com
--
Peter Geoghegan
er or not
incrementally doing all the work (not just the WAL logging) makes
sense. It's still too early to be sure about whether or not that's a
good idea.
--
Peter Geoghegan
nbtree_wal_test.sql
Description: Binary data
l debug this myself in a few days,
though you may prefer to do it before then.
--
Peter Geoghegan
On Fri, Sep 6, 2019 at 7:02 AM Alvaro Herrera from 2ndQuadrant
wrote:
> Peter, Heikki, are you going to do [at least] one more round of
> design/functional review?
I didn't plan on it, but somebody probably should. Are you offering to
commit the patch? If not, I can take care of it.
--
On Fri, Sep 6, 2019 at 2:35 PM Alvaro Herrera from 2ndQuadrant
wrote:
> I'd welcome it more if you did it; thanks.
I'll take care of it, then.
--
Peter Geoghegan
, and hopefully the next VACUUM will clean
it up.
"""
Why is this not a problem for the new amcheck checks? Maybe this is a
very naive question. I don't claim to be a GiST expert.
--
Peter Geoghegan
On Wed, Sep 11, 2019 at 3:09 PM Peter Geoghegan wrote:
> Hmm. So v12 seems to have some problems with the WAL logging for
> posting list splits. With wal_debug = on and
> wal_consistency_checking='all', I can get a replica to fail
> consistency checking very quickly when "
The patch has been committed already.
Peter Geoghegan
(Sent from my phone)
On Wed, Sep 11, 2019 at 7:10 PM Peter Geoghegan wrote:
> The patch has been committed already.
Oh, wait. It hasn't. Andrey didn't create a new thread for his largely
independent patch, so I incorrectly assumed he created a CF entry for
his original bugfix.
--
Peter Geoghegan
On Thu, Sep 12, 2019 at 11:30 AM Peter Geoghegan wrote:
> I wonder if it's possible to display a localized version of the
> display string in the NOTICE message? Does that work, or could it? For
> example, do you see the message in French?
BTW, I already know for sure that ICU supports
For
example, do you see the message in French?
--
Peter Geoghegan
indirectly triggered
more FPIs, which contributed to triggering a checkpoint even
earlier...and so on. Synthetic test cases can avoid this. A useful
synthetic test should have no checkpoints at all, so that we can see
the broken down costs, without any second order effects that add more
cost in weird ways.
--
Peter Geoghegan
eleted_in_Newsletter_I-8
--
Peter Geoghegan
On Fri, Sep 6, 2019 at 3:22 PM Peter Geoghegan wrote:
> I'll take care of it, then.
Attached is v10, which has some comment and style fix-ups, including
the stuff Alvaro mentioned. It also adds line pointer sanitization to
match what I added to verify_nbtree.c in commit a9ce839a (we use a
cus
th progress reporting infrastructure. I think that it's okay
to redefine how progress reporting works with CLUSTER now, in order to
fix the REINDEX/CLUSTER state clobbering bug.
--
Peter Geoghegan
As I went into at the start of this
e-mail, unnecessarily doing expensive things like copying large
posting lists around is a real concern. Even if it isn't truly useful
for _bt_dedup_one_page() to operate in a very incremental fashion,
incrementalism is probably still a good thing to aim for -- it seems
to make deduplication faster in all cases.
--
Peter Geoghegan
On Wed, Sep 18, 2019 at 10:43 AM Peter Geoghegan wrote:
> This also suggests that making _bt_dedup_one_page() do raw page adds
> and page deletes to the page in shared_buffers (i.e. don't use a temp
> buffer page) could pay off. As I went into at the start of this
> e-mail, unneces
have very
small BufFileWrite() size arguments. tuplestore.c, for one.
--
Peter Geoghegan
nal script
> is not really TPC-B. That's treading on being false advertising.
IANAL, but it may not even be permissible to claim that we have
implemented "standard TPC-B".
--
Peter Geoghegan
does the bit mean? It could mean "please check the undo
> log," in which case it'd have to be set on insert, eventually cleared,
> and then reset on delete, but I think that's likely to suck. I think
> therefore that the bit should mean
> is-deleted-but-not-necessarily-all-visible-yet, which avoids that
> problem.
That sounds about right to me.
--
Peter Geoghegan
e.
Not sure where that leaves this patch. What problem is it actually
trying to solve?
[1] http://www.tpc.org/tpcb/
--
Peter Geoghegan
On Fri, Jul 26, 2019 at 7:25 PM Peter Geoghegan wrote:
> I guess that the idea here was to prevent masking on ipv6 addresses,
> though not on ipv4 addresses. Obviously we're only dealing with a
> prefix with ipv6 addresses, whereas we usually have the whole raw
> ipaddr with ipv4. Not
the reserved
range? It seems preferable for everybody to consistently use the
reserved OID range.
--
Peter Geoghegan
ct same mail at
> CAH2-WzmCzNMebiN4-8p=ON92m0Rz0ybxNEKrO_2J+9DqWfWP=a...@mail.gmail.com :)
Seems like I should propose a patch this time around. I don't do Perl,
but I suppose I could manage something as trivial as this.
--
Peter Geoghegan
On Fri, Jul 26, 2019 at 6:58 PM Peter Geoghegan wrote:
> I found this part of your approach confusing:
>
> > + /*
> > +* Number of bits in subnet. e.g. An IPv4 that's /24 is 32 - 24 = 8.
> > +*
> > +* However, only some of the bits may hav