On Fri, Mar 23, 2018 at 11:15 PM, Peter Geoghegan wrote:
> I agree that this is very similar, as far as the RTEs go. What is
> dissimilar is the fact that there is hard-coded knowledge of both
> through parsing, planning, and execution. It's everything, taken
> together.
>
&
s worth.
Clearly I jumped the gun on this one. I agree that this is fine.
--
Peter Geoghegan
that
necessitates the use of multiple RTEs. I don't think it would make
sense to use a second RTE only when needed.
--
Peter Geoghegan
th 3 being determined
> purely experimentally though.
A quick look at the DWARF4 standard [1] suggests that this refers to
lexical depth. So, I agree that that doesn't seem like a great idea.
[1] http://dwarfstd.org/doc/DWARF4.pdf
--
Peter Geoghegan
ions.
Sorry for going off in a tangent, but I think it's somewhat necessary
to have a strategy here. Of course, we don't have to get everything
right now, but we should be looking in this direction whenever we talk
about on-disk nbtree changes.
--
Peter Geoghegan
ey_normalization#How_big_can_normalized_keys_get.2C_and_is_it_worth_it.3F
--
Peter Geoghegan
oss-check workaround is ugly, but apparently there is a
precedent in copy.c. I didn't know that detail until Robert pointed it
out. That makes me feel a lot better about this general question of
how the target relation is represented, having two RTEs, etc.
--
Peter Geoghegan
e committed shortly.
--
Peter Geoghegan
From ede1ba731dc818172a94adbb6331323c1f2b1170 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Thu, 24 Aug 2017 20:58:21 -0700
Subject: [PATCH v7 1/2] Add Bloom filter data structure implementation.
A Bloom filter is a space-efficient, probabilistic dat
ch goal is when coming up with a back-patchable fix like
this. When the goals are somewhat in competition with each other, a
second or a third opinion is particularly appreciated.
--
Peter Geoghegan
rmat should not need to
>> be interpreted in a context-sensitive way, if we can avoid it.
>
> Both pg_filedump and amcheck could correclty parse any tuple based on
> BTP_LEAF flags and length of tuple.
amcheck doesn't just care about the length of the tuple. It would have
to rely on catalog metadata about this being an INCLUDE index.
--
Peter Geoghegan
-coded
assumptions about the relationship between those two RTEs.
--
Peter Geoghegan
t must have a snapshot that is at least
as old as the first snapshot that the first call to bt_check_index()
acquires. It's not a special case; it's exactly as bad as any
statement that takes the same amount of time to execute.
> Is
> there anything we can do to lessen that burden l
ould use suffix truncation to save at least one byte in
almost all cases, even with the thinnest possible
single-integer-attribute IndexTuples. What you describe just isn't
going to happen.
--
Peter Geoghegan
. That's what we see for the RETURNING +
ON CONFLICT projections within nodeModifyTable.c, which the new
projections are very similar to, and clearly modeled on.
--
Peter Geoghegan
he patch as ready
for committer, actually. I was just expressing that your input was
valuable. Sorry for being unclear.
--
Peter Geoghegan
uld take a pretty steep penalty for me to want to take
> the risk of refactoring to avoid that.
>
> I've pushed it with some cosmetic adjustments.
Thank you, Tom.
--
Peter Geoghegan
tuples? I don't
> know answer to that, but given that sometimes bt_check_index() may take
> hours if not days to finish, it seems worth thinking or at least documenting
> the side-effects somewhere.
That seems like a worthwhile goal for a heap checker that only checks
the structure of the heap, rather than something that checks the
consistency of an index against its table. Especially one that can run
without a connection to the database, for things like backup tools,
where performance is really important. There is certainly room for
that. For this particular enhancement, the similarity to CREATE INDEX
seems like an asset.
--
Peter Geoghegan
patch and it looks fine to me.
I'm grateful that you didn't feel any need to encourage me to use
whatever the novel/variant filter du jour is! :-)
--
Peter Geoghegan
d it when investigating problems after the fact, and
as a general smoke-test, where it works well. I would perhaps
recommend running it once a week, although that's a fairly arbitrary
choice. The docs in v10 don't take a position on this, so while I tend
to agree we could do better, it is a preexisting issue.
[1] https://www.postgresql.org/docs/10/static/amcheck.html#id-1.11.7.11.7
--
Peter Geoghegan
. You make a good point about Java. Furthermore, when people talk
about just in time compilation in database systems, which is far from
a novel thing, they generally use the word JIT.
Why confuse things?
--
Peter Geoghegan
item */
if (newitemonleft)
leftfree -= (int) state->newitemsz;
else
rightfree -= (int) state->newitemsz;
With an extreme enough case, this could result in a failure to find a
split point. Or at least, if that isn't true then it's not clear why,
and I think it needs to be explained.
--
Peter Geoghegan
The margin of error is tiny - certainly much less than a
practical use-case could ever care about.
>> +/*
>> + * Calculate "val MOD m" inexpensively.
>> + *
>> + * Assumes that m (which is bitset size) is a power-of-two.
>> + *
>> + * Using a power-of-two
On Thu, Mar 29, 2018 at 7:42 PM, Peter Geoghegan wrote:
>> We should be able to get this into v11...
>
> That's a relief. Thanks.
I have a new revision lined up. I won't send anything to the list
until you clear up what you meant in those few cases where it seemed
unclear.
the same functionality at a coarser granularity (e.g.
database, table), certainly, but I don't see that there is much more
that we can do while starting with a B-Tree index as our target.
Please propose an alternative user interface for the new check. If you
prefer, I can invent new bt_index_check_heap() +
bt_index_parent_check_heap() variants. That would be okay with me.
--
Peter Geoghegan
On Fri, Mar 30, 2018 at 6:55 PM, Peter Geoghegan wrote:
> On Fri, Mar 30, 2018 at 6:20 PM, Andres Freund wrote:
>>> What would the upcasting you have in mind look like?
>>
>> Just use UINT64CONST()? Let's try not to introduce further code that'll
>> need
idea to stick covering indexes and suffix truncation features together.
> That wouldn't accelerate appearance one feature after another, but rather
> likely would RIP both of them...
I think that the thing that's more likely to kill this patch is the
fact that after the first year, it only ever got discussed in the
final CF. That's not something that happened because of my choices. I
made several offers of my time. I did not create this urgency.
[1] https://www.postgresql.org/message-id/18788.963953...@sss.pgh.pa.us
[2]
https://wiki.postgresql.org/wiki/Key_normalization#Making_all_items_in_the_index_unique_by_treating_heap_TID_as_an_implicit_last_attribute
--
Peter Geoghegan
On Fri, Mar 30, 2018 at 10:39 PM, Peter Geoghegan wrote:
> It's safe, although I admit that that's a bit hard to see.
> Specifically, look at this code in _bt_insert_parent():
>
> /*
> * Find the parent buffer and get the parent page.
> *
&g
arent_check_heap()? Or, are you talking about function
overloading?
--
Peter Geoghegan
tion and
> causing issues due to dependencies.
WFM. I have all the information I need to produce the next revision now.
--
Peter Geoghegan
On Sat, Mar 31, 2018 at 2:59 PM, Peter Geoghegan wrote:
> WFM. I have all the information I need to produce the next revision now.
I might as well post this one first. I'll have 0002 for you in a short while.
--
Peter Geoghegan
v8-0001-Add-Bloom-filter-data-structure-implementati
On Sat, Mar 31, 2018 at 3:15 PM, Peter Geoghegan wrote:
>> WFM. I have all the information I need to produce the next revision now.
>
> I might as well post this one first. I'll have 0002 for you in a short while.
Attached is 0002 -- the amcheck enhancement itself. As requeste
onal overhead of having an
accurate high key size for every candidate split point would be
significant.
--
Peter Geoghegan
On Sat, Mar 31, 2018 at 3:15 PM, Peter Geoghegan wrote:
> On Sat, Mar 31, 2018 at 2:59 PM, Peter Geoghegan wrote:
>> WFM. I have all the information I need to produce the next revision now.
>
> I might as well post this one first. I'll have 0002 for you in a short while
On Sat, Mar 31, 2018 at 8:08 PM, Andres Freund wrote:
>> round() is from C99, apparently. I'll investigate a fix.
>
> Just replacing it with a floor(val + 0.5) ought to do the trick, right?
I was thinking of using rint(), which is what you get if you call
round(float8) fro
On Sat, Mar 31, 2018 at 8:09 PM, Peter Geoghegan wrote:
> I was thinking of using rint(), which is what you get if you call
> round(float8) from SQL.
Attached patch does it that way. Note that there are float/int cast
regression tests that ensure that rint() behaves consistently on
sup
On Sat, Mar 31, 2018 at 8:32 PM, Andres Freund wrote:
> LGTM, pushed. Closing CF entry. Yay! Only 110 to go.
Thanks Andres!
--
Peter Geoghegan
fset number...
Maybe I was confused.
> I'd like to note that I really appreciate your attention to this patch
> as well as other patches.
Thanks. I would like to thank Anastasia and you for your patience and
perseverance, despite what I see as mistakes in how this project was
manged. I really want for it to be possible for there to be more
patches in the nbtree code, because they're really needed. That was a
big part of my motivation for writing amcheck, in fact. It's tedious
to link this patch to a bigger picture about what we need to do with
nbtree in the next 5 years, but I think that that's what it will take
to get this patch in. That's my opinion.
--
Peter Geoghegan
And Craig is the main
> proponent of it?
I wonder how bad it will be in practice if we PANIC. Craig said "This
isn't as bad as it seems because AFAICS fsync only returns EIO in
cases where we should be stopping the world anyway, and many FSes will
do that for us". It would be nice to get more information on that.
--
Peter Geoghegan
ot;)));
>
> I guess we can add that later, but it's a bit sad that this won't work
> against views with INSTEAD OF triggers.
It also makes it hard to test deparsing support, which actually made
it in in the end. That must be untested.
--
Peter Geoghegan
join is bad. That's what MERGE is
> doing anyways, so it closely matches with the overall procedure.
The issue is not that there is a join as such. It's how it's
represented in the parser, and how that affects other things. There is
a lot of special case logic to make it work.
--
Peter Geoghegan
ccurred somewhere in the file is all that is necessary;
applications that require better granularity already use O_DIRECT."
--
Peter Geoghegan
), then the first difficulty isn't high
> too.
I think it's possible.
I didn't have time to look at this properly today, but I will try to
do so tomorrow.
Thanks
--
Peter Geoghegan
't ideal, and
particularly hinders automated testing.
--
Peter Geoghegan
dd it to _bt_check_natts().
* README-SSI says:
* The effects of page splits, overflows, consolidations, and
removals must be carefully reviewed to ensure that predicate locks
aren't "lost" during those operations, or kept with pages which could
get re-used for different parts of the index.
Do we need to worry about that here? I guess not, because this is just
like having many duplicates. But a note just above the _bt_doinsert()
call to CheckForSerializableConflictIn() might be a good idea.
That's all I have for today.
--
Peter Geoghegan
it.
I agree that this was handled in a way that was just unsatisfactory.
It was also unnecessary, since we still have a few days left until
feature freeze.
--
Peter Geoghegan
otally untested. That was a consequence
of the representation in the executor.
--
Peter Geoghegan
problem, but it probably wouldn't have made a big difference in the
end. There is only so much time available.
--
Peter Geoghegan
okup and so on. Yes, we change size of some tuples,
> but B-tree already worked with tuples of variable sizes. So, the fact
> that tuples now have different size shouldn't affect SSI. Right now,
> I'm not sure if CheckForSerializableConflictIn() just above the
> _bt_doinsert() is good idea. But even if so, I think that should be
> a subject of separate patch.
My point was that that nothing changes, because we already use what
_bt_doinsert() calls the "first valid" page. Maybe just add: "(This
reasoning also applies to INCLUDE indexes, whose extra attributes are
not considered part of the key space.)".
That's it for today.
--
Peter Geoghegan
marks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.
--
Peter Geoghegan
was just one patch. I can think of several.
--
Peter Geoghegan
ll
* truncate_useless_pathkeys() to possibly remove more pathkeys.
* I don't think that there is much point in having separate 0003 +
0004 patches. For the next revision, please squash those down into
0002. Actually, maybe there should be only one patch for the next
revision. Up to you.
* Please write commit messages for your patches. I like to make these
part of the review process.
That's all for now.
--
Peter Geoghegan
m going to look at this again today, and will post something within
12 hours. Please hold off on committing until then.
--
Peter Geoghegan
about applying following patch on the top of v14?
It's clearly necessary. Looks fine to me.
--
Peter Geoghegan
t were not within
scope for v11. In some cases, these were addressed with only modest
effort. It's pretty obvious to me that some aspects of MERGE's
architecture are accidents. Not necessarily happy accidents.
--
Peter Geoghegan
On Fri, Apr 6, 2018 at 4:39 PM, Robert Haas wrote:
> Committed. Thanks to David for the report and analysis and to Peter
> for the patch and study.
Thanks!
--
Peter Geoghegan
eature freeze, to
make sure that it is in good shape. I have a general interest in
making sure that amcheck gains acceptance as a way of validating a
complicated patch like this one after commit.
[1] https://www.postgresql.org/message-id/15195.1490988897%40sss.pgh.pa.us
--
Peter Geoghegan
ssive usage of
> ItemPointerGetBlockNumberNoCheck(&(itup->t_tid)), suggest to wrap it to
> macro something like this:
> #define BTreeInnerTupleGetDownLink(itup) \
> ItemPointerGetBlockNumberNoCheck(&(itup->t_tid))
Agreed. We do that with GIN.
--
Peter Geoghegan
On Sat, Apr 7, 2018 at 1:02 PM, Teodor Sigaev wrote:
> Thanks to everyone, pushed.
I'll keep an eye on the buildfarm, since it's late in Russia.
--
Peter Geoghegan
robably be returning NULL for most
> properties except 'returnable'.
>
> I can look at fixing these for you if you like?
I'm happy to accept your help with it, for one.
--
Peter Geoghegan
least add a note to the nbtree README that
explains the high level theory behind the optimization, as part of
post-commit clean-up. I'll ask him to say something about how it might
affect extent-based storage, too.
--
Peter Geoghegan
On Sun, Apr 8, 2018 at 11:18 AM, Teodor Sigaev wrote:
> Thank you, fixed
I suggest that we remove some unneeded amcheck tests, as in the
attached patch. They don't seem to add anything.
--
Peter Geoghegan
From 0dbbee5bfff8816cddf86961bf4959192f62f1ff Mon Sep 17 00:00:00 2001
Fro
re serious practical problems. I have personally scoffed at stories
involving Postgres databases corruption that gets attributed to
running out of disk space. Looks like I was dead wrong.
[1] https://danluu.com/file-consistency/ -- "Filesystem correctness"
--
Peter Geoghegan
backpatch it, isn't it?
+1 to committing to master as a documented behavioral change.
--
Peter Geoghegan
ing about (perhaps
just renaming the macro).
I do not have the time to write a patch right away, but I should be
able to post one in a few days. I want to avoid sending several small
patches.
--
Peter Geoghegan
_bt_mark_page_halfdead() looked like it had a problem, but it now
looks like I was wrong. I also verified every other
BTreeInnerTupleGetDownLink() caller. It now looks like everything is
good here.
--
Peter Geoghegan
ing down the server using immediate mode
could lead to torn pages, that crash recovery will need to repair at a
later stage. I think that some strong caveats around this are required
in the pg_verify_checksums docs, at a minimum.
--
Peter Geoghegan
On Tue, Apr 10, 2018 at 1:37 PM, Peter Geoghegan wrote:
> _bt_mark_page_halfdead() looked like it had a problem, but it now
> looks like I was wrong.
I did find another problem, though. Looks like the idea to explicitly
represent the number of attributes directly has paid off already:
pg@
eds to be offline, which can be anything as long as
> the postmaster is stopped. That's how I understand the current
> wording.
I see. The problem is clearly the documentation, then.
--
Peter Geoghegan
is theoretically allowed to affect the behavior of equality,
even though so far we've never tried to make that work for any
collatable datatype.
Maybe the right thing to do is to say that any collation will work
equally well (as will any opclass). Maybe that's the same thing as
what you said, though.
> +1 for ii_IndexAttrNumbers.
+1
--
Peter Geoghegan
han
that they're wrong to ask about it. Though we should probably actually
just throw an error.
--
Peter Geoghegan
On Tue, Apr 10, 2018 at 5:45 PM, Peter Geoghegan wrote:
> I did find another problem, though. Looks like the idea to explicitly
> represent the number of attributes directly has paid off already:
>
> pg@~[3711]=# create table covering_bug (f1 int, f2 int, f3 text);
> create uniqu
le also referring to more advanced
techniques for compressing (not truncating) leaf tuples as prefix
compression.
I suggested suffix truncation because it seemed to be the dominant way
of referring to the technique. And, because it seemed more logical:
the suffix is what gets truncated away.
--
Peter Geoghegan
b
= 15, c = 120 WHERE b = 4;
In short, it looks like the tests added to update.sql by commit
2f178441 ("Allow UPDATE to move rows between partitions") lead to this
failure, since I always hit a problem when update.sql is reached. I
haven't gone to the trouble of digging any deeper than that just yet.
--
Peter Geoghegan
On Thu, Apr 12, 2018 at 11:47 AM, Peter Geoghegan wrote:
> In short, it looks like the tests added to update.sql by commit
> 2f178441 ("Allow UPDATE to move rows between partitions") lead to this
> failure, since I always hit a problem when update.sql is reached. I
> haven
t most once
per page. Is it worth noting the exception?
--
Peter Geoghegan
;t replace the equivalent subexpression
> of MaxIndexTuplesPerPage with MinIndexTupleSize?".
I had the same reaction.
--
Peter Geoghegan
since fuzzing is inherently a computationally intensive
process; CPU costs are bound to dominate. A strategy like this would
also make it easier to run amcheck on indexes left behind by the
regression tests.
How can we do better?
--
Peter Geoghegan
tch added tests to. At no point do they leave behind any
INCLUDE indexes. I'll do something about that as part of the INCLUDE
patch that I'm working on at the moment.
--
Peter Geoghegan
On Sat, Apr 14, 2018 at 8:02 PM, Tom Lane wrote:
> So we've got *some*, but it sure looks like they were all added by the
> patch to fix covering indexes for partitions. I'd want to see some for
> plain-table cases as well.
Will do.
--
Peter Geoghegan
uot;can't happen" elog error). Finally, I made sure that we
don't drop all tables in the regression tests, so that we have some
pg_dump coverage for INCLUDE indexes, per a request from Tom.
[1] https://queue.acm.org/detail.cfm?id=2220317
--
Peter Geoghegan
From db777cad48f1
t your database.
I'm fine if Teodor wants to commit that change separately, of course.
--
Peter Geoghegan
ot to change.
Andres mentioned that he has prototyped an approach to buffer
management that uses a Radix tree, which is generally assumed to be
the right long-term fix.
--
Peter Geoghegan
ou posted.
> Attached patch is rebased to current head and contains some comment
> improvement in index_truncate_tuple() - you save some amount of memory with
> TupleDescCopy() call but didn't explain why pfree is enough to free all
> allocated memory.
Makes sense.
--
Peter Geoghegan
ached patch contains all
> changes suggested in my previous email.
Looks new BTreeTupSetNAtts () assertion good to me.
I suggest committing this patch as-is.
Thank you
--
Peter Geoghegan
On Wed, Apr 18, 2018 at 1:45 PM, Peter Geoghegan wrote:
> I suggest committing this patch as-is.
Actually, I see one tiny issue with extra '*' characters here:
> +* The number of attributes won't be explicitly represented if the
> +* negative infi
seem like a good idea? This could get pretty expensive if it
was overused, even by the standards of what we expect from
assertion-enabled builds, but we could make it optional if the
overhead got out of hand.
--
Peter Geoghegan
_attr(). AssertNotInCriticalSection() certainly found
several bugs when it was first added.
--
Peter Geoghegan
s first added.
>
> Yep.
I wrote a simple prototype of this. No assertion failed during a "make
installcheck". Assertions were added to heap_tuple_fetch_attr(), and a
couple of other places.
--
Peter Geoghegan
inity
items to the same comment block. While it's true that _bt_insertonpg()
cannot truncate downlinks to make new minus infinity items, I see that
as a narrower issue.
--
Peter Geoghegan
From dced00be29775965a45c7889ca99e19b96d9e4d0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Wed
On Wed, Apr 18, 2018 at 10:29 PM, Teodor Sigaev wrote:
> Will see...
I'll take a look tomorrow.
--
Peter Geoghegan
UDE patch didn't actually
change the page deletion algorithm in any way.
> Trivial and naive fix is attached, but for me it looks a bit annoing that we
> store pointer (leafhikey) somewhere inside unlocked page.
Well, it has worked that way for a long time. No reason to change it now.
I also think that we could have better conventional regression test
coverage here.
--
Peter Geoghegan
g will break unless we do a detailed audit of all that code.
+1. I think that it is plainly a bad idea to do something like that at
this point in the cycle.
--
Peter Geoghegan
In summary, I think that we should find a way to never use
ItemPointerGetBlockNumberNoCheck() in nbtree, while still using it in
amcheck.
--
Peter Geoghegan
been
there already). Maybe we can add it to the amcheck regression tests
instead, since those are run less often. This also allows us to add a
test case specifically as part of the amcheck enhancement, which makes
a bit more sense.
--
Peter Geoghegan
ing soon.
> Hm, it seems to me, that 350ms is short enough to place it in both core and
> amcheck test. I think, basic functionality should be covered by core tests
> as we test insert/create.
I think you're right. There is really no excuse for not having
thorough code coverage for a module like nbtree.
--
Peter Geoghegan
ut what general area this deals with.
Perhaps something like the following:
"Get/set leaf page highkey's link. During the second phase of
deletion, the target leaf page's high key may point to an ancestor
page (at all other times, the leaf level high key's link is not used).
See th
em around the
absence of a needed downlink. There is an absent downlink, but that's
beside the point, and in any case missing a downlink is not
*inherently* wrong (as I said, it's not inherently wrong because of
the issue of interrupted multi-level page deletes).
--
Peter Geoghegan
think that that needs to be updated. The docs should
explain that 'I' relations are not backed by storage, since that will
probably affect users of one or two external tools.
--
Peter Geoghegan
On Sat, Apr 21, 2018 at 6:02 PM, Peter Geoghegan wrote:
> I refined the amcheck enhancement quite a bit today. It will not just
> check that a downlink is not missing; It will also confirm that it
> wasn't a legitimately interrupted multi-level deletion, by descending
> to the l
nnected to this patch, but allocation of
> BtreeCheckState is not needed, it could be allocated on stack.
>
> 4) Nevertheless, I suggest to use palloc0 (or memset(0)) for
> BtreeCheckState. Now several fields of that structure could be not inited.
I like the idea of using palloc0(). Do
501 - 600 of 3245 matches
Mail list logo