alignment rules seem to make it
worthwhile to keep the heap TID in the tuple header; it seems
inherently necessary to have a MAXALIGN()'d tuple header, so finding a
way to consistently put the first MAXALIGN() quantum to good use seems
wise.
--
Peter Geoghegan
On Wed, Apr 24, 2019 at 10:43 AM Peter Geoghegan wrote:
> The hard part is how to do varwidth encoding for space-efficient
> partition numbers while continuing to use IndexTuple fields for heap
> TID on the leaf level, *and* also having a
> BTreeTupleGetHeapTID()-style macro to g
to disable the trace_sort instrumentation by commenting out
the TRACE_SORT entry in pg_config_manual.h. I recall being opposed on
this point by Robert Haas. Possibly because he just didn't want to
deal with it at the time.
--
Peter Geoghegan
awkward position of no
longer quite meeting the traditional definition of a "developer
option".
--
Peter Geoghegan
> instrumentation is good, and you seem to be saying that it is, I
> think we should just remove the symbol and be done with it.
Sounds like a plan. Do you want to take care of it, Joe?
--
Peter Geoghegan
available by the new
pg_stat_progress_create_index view, but with getrusage() stats.
--
Peter Geoghegan
approaches over the
other, though I don't really know how to assess this layering
business. I'm glad that either approach will prevent oversights,
though.
--
Peter Geoghegan
The documentation has a section called "Routine Reindexing", which
explains how to simulate REINDEX CONCURRENTLY with a sequence of
creation and replacement steps. This should be updated to reference
the REINDEX CONCURRENTLY command.
--
Peter Geoghegan
think about or define
developer options frames this discussion.
--
Peter Geoghegan
On Thu, Apr 25, 2019 at 1:56 PM Tom Lane wrote:
> Well, I was suggesting that we ought to consider the alternative of
> making it *not* always compiled, and Jeff was pushing back on that.
Right. Sorry.
--
Peter Geoghegan
On Tue, Apr 16, 2019 at 12:00 PM Peter Geoghegan wrote:
> On Mon, Apr 15, 2019 at 7:30 PM Alexander Korotkov
> wrote:
> > Currently we amcheck supports lossy checking for missing parent
> > downlinks. It collects bitmap of downlink hashes and use it to check
> > subse
K_ROW_END)
1552  1292 :     break;
1553     0 :     subkey++;
1554     0 :     continue;
I would expect the "break" statement to have an execution count that
is no greater than that of the two lines that immediately precede it,
and yet it's far greater (1292 versus 4). It looks like there has been
some kind of loop transformation.
--
Peter Geoghegan
http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
Search the PDF for "-O0" to see numerous references to this. It seems
to be impossible to turn off all GCC optimizations.
--
Peter Geoghegan
dable, and should be avoided. ISTM that the
least confusing way of removing the ambiguity would be to no longer
refer to ItemIds as item pointers, without changing anything else.
--
Peter Geoghegan
ly-logical
identifier in your prototype, but it's not a great long term solution.
--
Peter Geoghegan
rs" only. This would involve removing
the comment in itemid.h that confusingly refers to line pointers as
"item pointers" (plus any other comments that fail to make a clear
distinction).
I think that the total number of comments that would be affected by
this approach is quite low.
--
Peter Geoghegan
That
would probably cause problems without any real benefit. OTOH, calling
two closely related but distinct things by the same name is atrocious.
--
Peter Geoghegan
at that's very
unlikely. That's why I would like to understand the problem that you
found with the check.
--
Peter Geoghegan
nity items in
internal pages.
I think that the simple answer to your question is yes. It would be
more complicated that way, and the only extra check would be the check
of high keys against their parent, but overall this does seem
possible.
--
Peter Geoghegan
inconsistencies
even with such a large index. I admit that it's unfriendly that users
are not warned about the shortage currently, but that is something we
can probably find a simple (backpatchable) fix for.
--
Peter Geoghegan
index builds.
Also, "off_t" is only 32-bits on Windows, which broke parallel CREATE
INDEX (issue fixed by commit aa551830). I suppose that "off_t" is
really a variant of the same problem.
--
Peter Geoghegan
that seems totally impractical, and without any real upside. But it
would be nice to identify integer types where there is a real risk of
making an incorrect assumption, and then eliminate that risk once and
for all.
--
Peter Geoghegan
that I don't know that much about, whereas off_t is used all over the
backend code.
--
Peter Geoghegan
choices elsewhere.
I was aware of that, but I wasn't aware of how many places that bleeds
into until I checked just now.
It would be nice if we could figure out how to make it obvious that
the idioms around the use of long for work_mem stuff are idioms that
have a specific rationale. It'
a lot of reward.
Whether or not *fully* banning the use of "long" is something that
will simplify the code is debatable. However, we could substantially
reduce the use of "long" across the backend without any real downside.
The work_mem question can be considered later. Does that seem
reasonable?
--
Peter Geoghegan
downside to using int64, and
being able to support negative integers can be useful in some contexts
(though not this context).
--
Peter Geoghegan
tly I shouldn't have done so).
I am interested in making the code less complicated. If we can remove
the work_mem kludge for Windows as a consequence of that, then so much
the better.
--
Peter Geoghegan
n future same can happen as well.
I believe that the test coverage of GiST index builds is something
that is being actively worked on right now. It's a recognized problem
[1].
[1] https://postgr.es/m/24954.1556130...@sss.pgh.pa.us
--
Peter Geoghegan
On Tue, Apr 30, 2019 at 9:44 AM Andreas Joseph Krogh
wrote:
> I have a 1.4GB dump (only one table) which reliably reproduces this error.
> Shall I share it off-list?
I would be quite interested in this, too, since there is a chance that
it's my bug.
--
Peter Geoghegan
I've sent dumps that were larger than that by providing a Google Drive
link. Something like that should work reasonably well.
--
Peter Geoghegan
On Tue, Apr 30, 2019 at 9:56 AM Andreas Joseph Krogh wrote:
> I've sent you guys a link (Google Drive) off-list.
I'll start investigating the problem right away.
Thanks
--
Peter Geoghegan
On Tue, Apr 30, 2019 at 9:59 AM Peter Geoghegan wrote:
> I'll start investigating the problem right away.
I have found what the problem is. I simply neglected to make a
conservative assumption about suffix truncation needing to add a heap
TID to a leaf page's new high key in nbtsort
On Tue, Apr 30, 2019 at 11:54 AM Andreas Joseph Krogh
wrote:
> Nice, thanks!
Thanks for the report!
--
Peter Geoghegan
On Sun, Apr 28, 2019 at 10:15 AM Alexander Korotkov
wrote:
> I think this definitely not bug fix. Bloom filter was designed to be
> lossy, no way blaming it for that :)
I will think about a simple fix, but after the upcoming point release.
There is no hurry.
--
Peter Geoghegan
On Tue, Apr 30, 2019 at 10:58 AM Peter Geoghegan wrote:
> I have found what the problem is. I simply neglected to make a
> conservative assumption about suffix truncation needing to add a heap
> TID to a leaf page's new high key in nbtsort.c (following commit
> dd299df8189), eve
deadlocks that
cannot be fixed by reordering something in application code).
--
Peter Geoghegan
On Tue, Apr 30, 2019 at 6:28 PM Peter Geoghegan wrote:
> Attached is a much more polished version of the same patch. I tried to
> make clear how the "page full" test (the test that has been fixed to
> take heap TID space for high key into account) is related to other
> clos
On Fri, Mar 1, 2019 at 3:59 PM Peter Geoghegan wrote:
> /*
> * Perform the same check on this internal level that
> * _bt_mark_page_halfdead performed on the leaf level.
> */
> if (_bt_is_page_halfdead(rel, *rightsib
On Fri, Apr 26, 2019 at 5:13 PM Peter Geoghegan wrote:
> On Fri, Apr 26, 2019 at 5:05 PM Tom Lane wrote:
> > Yeah, I'd be fine with that, although the disconnect between the type
> > name and the comment terminology might confuse some people.
>
> Maybe, but the fact
If so,
that would be a risk that was introduced in Postgres 11, and made much
more likely in practice in Postgres 12. (I haven't got as far as doing
an analysis of the risks to page deletion, though. The "fastpath"
rightmost page insertion optimization that was also added to Postgres
11
On Mon, May 6, 2019 at 12:48 PM Peter Geoghegan wrote:
> On the other other hand, it seems to me that the PageGetTempPage()
> thing might have been okay, because it happens before the high key is
> inserted on the new right buffer page. The same cannot be said for the
> way we generat
s, and that we'll probably make
_bt_truncate() itself do more, the actual business of performing a
split has no reason to change that I can think of. I would like to
keep _bt_split() as simple as possible anyway -- it should only be
copying tuples using simple primitives like the bufpage.c routines.
Living with what we have now (not using a temp buffer for the right
page) seems fine.
--
Peter Geoghegan
On Mon, May 6, 2019 at 4:11 PM Peter Geoghegan wrote:
> The important question is how VACUUM will recognize it. It's clearly
> not as bad as something that causes "failed to re-find parent key"
> errors, but I think that VACUUM might not be reclaiming it for the FSM
> (
On Mon, May 6, 2019 at 5:15 PM Peter Geoghegan wrote:
> VACUUM asserts P_FIRSTDATAKEY(opaque) > PageGetMaxOffsetNumber(page)
> within _bt_mark_page_halfdead(), but doesn't test that condition in
> release builds. This means that the earliest modifications of the
> right page,
me?
Actually, maybe we won't get that error, because we're talking about a
corrupt index, and all bets are off -- no reason to think that the
half-dead internal page would be consistent with other pages in any
way. But even then, you'll go on to report it in the usual way, since
VACUUM scans nbtree indexes in physical order.
--
Peter Geoghegan
rather
than doing a lot of typing each time, as in:
:ea SELECT ...
--
Peter Geoghegan
On Mon, May 6, 2019 at 4:11 PM Peter Geoghegan wrote:
> I am tempted to move the call to _bt_truncate() out of _bt_split()
> entirely on HEAD, possibly relocating it to
> nbtsplitloc.c/_bt_findsplitloc(). That way, there is a clearer
> separation between how split points are ch
On Tue, May 7, 2019 at 6:15 PM Peter Geoghegan wrote:
> I suppose I'm biased, but I prefer the new approach anyway. Adding the
> left high key first, and then the right high key seems simpler and
> more logical. It emphasizes the similarities and differences between
> leftpage
On Wed, May 8, 2019 at 3:37 PM Peter Geoghegan wrote:
> It makes perfect sense for _bt_split() to call _bt_findsplitloc()
> directly, since _bt_findsplitloc() is already aware of almost every
> _bt_split() implementation detail, whereas those same details are not
> of interest anywh
hat anyone will be
affected, because tuples that are that wide are already compressed in
almost all cases -- it seems like it would be hard to be just at the
edge of the limit already.
Thanks
--
Peter Geoghegan
0001-Suggested-changes-to-v12-release-notes.patch
On Fri, May 10, 2019 at 6:02 PM Bruce Momjian wrote:
> Have new btree indexes sort duplicate index entries in heap-storage
> order (Peter Geoghegan)
>
> This slightly reduces the maximum-allowed length of in
ne --
even in instances where you make the right call. I don't think that I
am alone in seeing it this way.
--
Peter Geoghegan
r enhancement
> made possible by previous commits, or a "fix" for a previous commit.
Yes. It's a bug fix that went in after feature freeze.
--
Peter Geoghegan
On Sat, May 11, 2019 at 11:02 AM Bruce Momjian wrote:
> OK, commit removed.
You're mistaken -- nothing has been pushed to master in the last 3 hours.
--
Peter Geoghegan
On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan wrote:
> Attached draft patch adjusts code comments and error messages where a
> line pointer is referred to as an item pointer. It turns out that this
> practice isn't all that prevalent. Here are some specific concerns
> that I h
heap-only tuple),
>
> Should "that item" also be re-worded, for consistency?
Yes, it should be -- I'll fix it.
I'm going to backpatch the storage.sgml change on its own, while
pushing everything else in a separate HEAD-only commit.
Thanks
--
Peter Geoghegan
On Tue, May 7, 2019 at 9:59 AM Peter Geoghegan wrote:
> On Tue, May 7, 2019 at 12:27 AM Heikki Linnakangas wrote:
> > I don't understand that reasoning. Yes, _bt_pagedel() will complain if
> > it finds a half-dead internal page. But how does that mean that
> > _bt_
On Mon, May 13, 2019 at 8:30 PM Tom Lane wrote:
> Peter Geoghegan writes:
> > To be fair, I suppose that the code made more sense when it first went
> > in, because at the time there was a chance that there could be
> > leftover half-dead internal pages. But, that was a long
On Mon, May 13, 2019 at 9:09 PM Peter Geoghegan wrote:
> Even when that happens, the index is already considered corrupt by
> VACUUM, so the same VACUUM process that could in theory be adversely
> affected by removing the half-dead internal page check will complain
> about the page
On Thu, May 16, 2019 at 1:05 PM Peter Geoghegan wrote:
> Actually, now that I look back at how page deletion worked 5+ years
> ago, I realize that I have this slightly wrong: the leaf level check
> is not sufficient to figure out if the parent's right sibling is
> pending d
ng about what the expectations are for
> fd.c, and some rejiggering of PrepareTempTablespaces's API too.
> I'm not sufficiently excited about this issue to do that.
+1. Let's close this one out.
--
Peter Geoghegan
On Fri, May 17, 2019 at 6:36 PM Tom Lane wrote:
> Will do so tomorrow. Should we back-patch this?
I wouldn't, because I see no reason to. Somebody else might.
--
Peter Geoghegan
re
willing to treat TIDs as a whole new type of tuple with its own set of
specialized functions in tuplesort.c, which has problems of its own,
then it's kind of awkward to do it some other way.
--
Peter Geoghegan
On Mon, May 20, 2019 at 3:17 PM Andres Freund wrote:
>
>
>
> Improve speed of btree index insertions (Peter Geoghegan,
> Alexander Korotkov)
>
My concern here (which I believe Alexander shares) is that it doesn't
make sense to group t
"negative infinity" sentinel values) also contributes to
making indexes smaller.
The page split stuff was mostly added by commit fab250243 ("Consider
secondary factors during nbtree splits"), but commit f21668f32 ("Add
"split after new tuple" nbtree optimization") a
wraparound point under normal circumstances.
This does seem like an unfriendly behavior. Moving the thread over to
the -hackers list for further discussion...
--
Peter Geoghegan
inable index corruption. Admittedly, amcheck didn't
find any bugs in my code after the first couple of versions of the
patch series, so this approach seems unlikely to find any problems
now. Even still, it wouldn't be very difficult to do this extra step.
It seems worthwhile to be thoroug
i.e. we'll
definitely do it that way if there is sufficient work_mem)?
--
Peter Geoghegan
gh. Perhaps some hack with effects confined to
> pg_upgrade's test would be better. I don't have a good idea what
> that would look like, however.
>
> Or we could just say this isn't annoying enough to fix.
I think it's worth fixing.
--
Peter Geoghegan
On Fri, May 24, 2019 at 12:31 PM Peter Geoghegan wrote:
>
> On Wed, May 22, 2019 at 3:57 PM Tom Lane wrote:
> > I experimented with the attached quick-hack patch to make pg_regress
> > suppress notices from its various initial DROP/CREATE IF [NOT] EXISTS
> > commands. I&
when I
use the optimum number of jobs, so I don't consider it to be urgent.
I'm happy with the new approach, since it avoids the problem of
regression.diffs files that get deleted before I have a chance to take
a look. I should thank Noah for his work on this.
--
Peter Geoghegan
pick a number at random?
It also looks like pg_proc.dat should be updated, since it still
mentions the old custom of trying to use contiguous OIDs, and it
discourages the practice of picking OIDs at random -- which is almost
the opposite of what it should now say.
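For illustration, a toy sketch of the random-pick convention being discussed. The 8000-9999 development range is what the unused_oids script suggests these days; treat the exact range and the function name as my assumptions, not something stated in this thread:

```python
import random

def suggest_oid(used, low=8000, high=9999):
    """Pick a random unused OID from the (assumed) development range.

    Picking at random, rather than taking the lowest free value,
    minimizes the chance that two concurrently developed patches
    collide on the same OID.
    """
    candidates = [oid for oid in range(low, high + 1) if oid not in used]
    return random.choice(candidates)

# Example: avoid two OIDs already taken by another in-flight patch.
print(suggest_oid({8000, 8001}))
```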
--
Peter Geoghegan
particular query and taking the median
execution time as representative seems to be the best approach.
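A minimal timing harness along these lines (the names and run count are mine, not from the thread):

```python
import statistics
import time

def median_runtime(fn, runs=9):
    """Run fn several times and return the median wall-clock duration.

    The median is robust against one-off outliers (cache warmup,
    checkpoints, autovacuum kicking in), unlike the mean or the minimum.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Usage: wrap whatever executes the query under test.
print(median_runtime(lambda: sum(range(10_000))))
```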
--
Peter Geoghegan
y necessary to memset() at the
equivalent point for macaddr8, since we cannot "run out of bytes from
the authoritative representation" that go in the Datum/abbreviated
key. I suppose that the memset() should simply be removed, since it is
superfluous here.
--
Peter Geoghegan
that that would make sense, but I don't think that this
patch needs to do that. There is at least one pre-existing case that
does this -- macaddr.
--
Peter Geoghegan
lly be fixed,
though, since you have one point of contention for each of the really
hot values at the far left of the leaf level.
[1] https://commitfest.postgresql.org/20/1787/
--
Peter Geoghegan
"data" field instead of a
text "data" field.)
--
Peter Geoghegan
On Wed, Oct 3, 2018 at 4:39 PM Peter Geoghegan wrote:
> I did find a pretty clear regression, though only with writes to
> unique indexes. Attached is v6, which fixes the issue. More on that
> below.
I've been benchmarking my patch using oltpbench's TPC-C benchmark
these pa
shared_buffers is 10GB, IIRC. The server has 32GB of memory. Yes, 'public'
is the patch case. Sorry for not mentioning it initially.
--
Peter Geoghegan
(Sent from my phone)
> 'd make sense to hack up a patch that logs when evicting a
> buffer while already holding another lwlock. That shouldn't be too hard.
I'll look into this.
Thanks
--
Peter Geoghegan
ant the extra LOG instrumentation to influence
the outcome.
--
Peter Geoghegan
's a particularly good target for my patch. I can
find a way of only executing the read TPC-C queries, to see where they
are on their own. TPC-C is particularly write-heavy, especially
compared to the much more recent though less influential TPC-E
benchmark.
--
Peter Geoghegan
I'm debugging, but it does still make
the process a bit easier. rr is supposed to be particularly helpful
with non-deterministic or difficult to recreate bugs; hopefully I'll
be able to apply it in that way when I gain some more experience with
it.
--
Peter Geoghegan
search of a leaf
page needs to last for a relatively long time. I'll look into it
again, though.
> Due to the optimization the _bt_binsrch() size has grown twice. May be
> you move this to some service routine?
Maybe. There are some tricky details that seem to work against it.
I'll see if it's possible to polish that some more, though.
--
Peter Geoghegan
On Fri, Nov 2, 2018 at 5:00 PM Peter Geoghegan wrote:
> I had the opportunity to discuss this patch at length with Heikki
> during pgConf.EU.
> The DESC heap TID sort order thing probably needs to go. I'll probably
> have to go fix the regression test failures that occur wh
d"
order. I think that this is an example of the more general problem --
what you call the 'drop cascades to 62 other objects' problem is a
more specific subproblem -- or, if you prefer, a symptom -- of the
same problem.
Since I'm going to have to fix the problem head-on, I'll have to study
it in detail anyway.
--
Peter Geoghegan
because we also wrote regression
tests that exercised the useful behavior.
> May be you know any another problems of the patch?
Just the lack of pg_upgrade support. That is progressing nicely,
though. I'll probably have that part in the next revision of the
patch. I've found what looks like a workable approach, though I need
to work on a testing strategy for pg_upgrade.
--
Peter Geoghegan
epend on it
-DETAIL: extension earthdistance depends on extension cube
+DETAIL: extension earthdistance depends on function cube_out(cube)
Can anyone think of a workable, scalable approach to fixing the
processing order of this findDependentObjects() pg_depend scan so that
we reliably get the
on gin page type into
"can't happen" elog errors in places like scanPendingInsert() and
ginInsertCleanup() would be a good start. Note that we already did
similar Assert-to-elog(ERROR) promotion for posting tree code, when
similar bugs appeared way back in 2013.
--
Peter Geoghegan
<--- deadlocks
list, nlist, NULL);
}
Clearly this particular call to ginEntryInsert() from within
ginInsertCleanup() is generally supposed to be a rare occurrence. It's
not surprising that it's hard for you to hit this issue.
BTW, I strongly doubt that disabling hugepages has fixed anything,
though it may have accidentally masked the problem. This is an issue
around the basic correctness of the locking protocols used by GIN.
--
Peter Geoghegan
c4ab97e from 2013 -- that detail is
probably important, because it seems to be what breaks the subtree
design (we don't delete in two phases or anything in GIN).
I think that you have to be doing a multi-level delete for a deadlock
to take place, which isn't particularly likely to coincide with a
concurrent insertion in general. That may explain why it's taken a
year to get a high-quality bug report.
--
Peter Geoghegan
On Wed, Nov 7, 2018 at 5:46 PM Peter Geoghegan wrote:
> I think that you have to be doing a multi-level delete for a deadlock
> to take place, which isn't particularly likely to coincide with a
> concurrent insertion in general. That may explain why it's taken a
> year to
On Wed, Nov 7, 2018 at 5:46 PM Peter Geoghegan wrote:
> Teodor: Do you think that the issue is fixable? It looks like there
> are serious issues with the design of 218f51584d5 to me. I don't think
> the general "there can't be any inserters at this subtree" thing
The best way to fix
the problem is to force the scan order to be the one that we
accidentally depend on using some mechanism or other. Though not
necessarily the one I've sketched out.
[1] https://postgr.es/m/24279.1525401...@sss.pgh.pa.us
--
Peter Geoghegan
On Tue, Nov 13, 2018 at 1:29 PM Peter Geoghegan wrote:
> A solution does occur to me that I'm kind of embarrassed to suggest,
> but that would nonetheless probably do the job:
> Why not vary the objsubid value among entries that don't use it
> anyway, so that they h
y changes along similar lines, since of course I'm only doing
this with the pg_depend indexes, and not for every system catalog
index.
--
Peter Geoghegan
From 7158fa1f93447d1f9bb296605a65b0fecfd54210 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Tue, 13 Nov 2018 18:14:23
I think that reverting commit
218f51584d5 is the only option on the table for the back branches. Its
design is based on ideas on locking protocols that are fundamentally
incorrect and unworkable. I don't have a lot of faith in our ability
to retrofit a design that fixes the issue without causing problems
elsewhere.
--
Peter Geoghegan
On Sun, Feb 11, 2018 at 11:09 PM, Pavan Deolasee
wrote:
> On Fri, Feb 9, 2018 at 6:53 AM, Peter Geoghegan wrote:
>> My concern is mostly just that MERGE manages to behave in a way that
>> actually "provides a single SQL statement that can conditionally
>> INSERT, UPD
hared_memory_type
== DSM_IMPL_NONE", so that's a second entry for the "nonstandard
configuration" list.
--
Peter Geoghegan
false positive rate, then
you can summarize a set of 40 million distinct values with only
22.85MB of memory and 3 hash functions. I think that any hash table of
40 million elements would require much more memory than 22.85MB --
certainly closer to 20x than to 8x. Even that is pretty conservative.
I bet there are plenty of hash joins where the ratio could be 30x or
more. Not to mention cases with multiple batches.
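The arithmetic here is the standard Bloom filter sizing formula. A quick sketch; note that the roughly 10% false positive rate below is my inference from the 22.85MB and 3-hash-function figures, not something stated in the message:

```python
import math

def bloom_bits(n, p):
    """Bits needed for a Bloom filter holding n elements at
    false positive rate p: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

def optimal_k(n, m):
    """Optimal number of hash functions: k = (m / n) * ln 2."""
    return max(1, round((m / n) * math.log(2)))

n = 40_000_000
m = bloom_bits(n, 0.1)            # assume ~10% false positive rate
print(m / 8 / 1024 / 1024)        # filter size in MiB (roughly 22.85)
print(optimal_k(n, m))            # 3 hash functions
```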
--
Peter Geoghegan