Interesting. Does this mean that down the road a postgis index
might be GIT-ified?
Only if GiST will support GIT, but I don't follow on this discussion. In any
case, GIT on GiST will not be so effective as on Btree, because GiST doesn't
guarantee that all close values will be close in index:
On Wed, 2008-04-23 at 22:26 -0400, Bruce Momjian wrote:
Simon Riggs wrote:
GIT significantly reduces the size of clustered indexes, greatly
improving the number of index pointers that can be held in memory for
very large indexes. That translates directly into a reduction in I/O for
large
On Wed, 2008-04-23 at 19:43 -0700, Ron Mayer wrote:
Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote:
To be acceptable, a GIT patch would have to be optional and it
...
I was considering a new pg_index column. Or else we'd have to
On Thu, Apr 24, 2008 at 10:11:02AM +0100, Simon Riggs wrote:
Index compression is possible in many ways, depending upon the
situation. All of the following sound similar at a high level, but each
covers a different use case.
True, but there is one significant difference:
* For Long, Similar
On Thu, 2008-04-24 at 13:13 +0200, Martijn van Oosterhout wrote:
On Thu, Apr 24, 2008 at 10:11:02AM +0100, Simon Riggs wrote:
Index compression is possible in many ways, depending upon the
situation. All of the following sound similar at a high level, but each
covers a different use case.
Simon Riggs [EMAIL PROTECTED] wrote:
* For Highly Non-Unique Data we can use Duplicate Compression
The latter is the technique used by Bitmap Indexes. Efficient, but not
useful for unique/nearly-unique data
I heard that GIN has already had duplicate-compression feature.
Martijn van Oosterhout [EMAIL PROTECTED] writes:
On Thu, Apr 24, 2008 at 10:11:02AM +0100, Simon Riggs wrote:
Index compression is possible in many ways, depending upon the
situation. All of the following sound similar at a high level, but each
covers a different use case.
True, but there
Simon Riggs wrote:
Many users would be very interested if we could significantly reduce the
size of the main index on their largest tables.
Yes, basically GIT allows index compression for clustered tables, and
stated that way it is clear it would be a big feature if we had it for
On Thu, 2008-04-24 at 11:43 -0400, Bruce Momjian wrote:
Index compression is possible in many ways, depending upon the
situation. All of the following sound similar at a high level, but each
covers a different use case.
* For Long, Similar data e.g. Text we can use Prefix Compression
On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote:
Bruce asked if these should be TODOs...
Index compression is possible in many ways, depending upon the
situation. All of the following sound similar at a high level, but
each
covers a different use case.
* For Long, Similar data e.g. Text
Well, for these two:
* For Highly Non-Unique Data we can use Duplicate Compression
* Multi-Column Leading Value Compression - if you have a multi-column
You don't need any new logic at all. If _bt_compare says they're equal
you can compact them.
The long similar case is harder, ISTM there
Martijn van Oosterhout [EMAIL PROTECTED] writes:
Well, for these two:
* For Highly Non-Unique Data we can use Duplicate Compression
* Multi-Column Leading Value Compression - if you have a multi-column
You don't need any new logic at all. If _bt_compare says they're equal
you can compact
On Wed, 2008-04-09 at 20:30 -0400, Tom Lane wrote:
* GIT (Grouped Index Tuple) indexes, which achieve index space savings
in btrees by having a single index tuple represent multiple heap tuples
(on a single heap page) containing a range of key values. I am not sure
what the development
Simon Riggs [EMAIL PROTECTED] writes:
I don't see the returns index keys idea as being killed by or killing
this concept. Returning keys is valid and useful when we can, but there
are other considerations that, in some use cases, will be a dominant
factor.
The patch as-submitted was a killer
On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
I don't see the returns index keys idea as being killed by or killing
this concept. Returning keys is valid and useful when we can, but there
are other considerations that, in some use cases, will be a
Simon Riggs [EMAIL PROTECTED] writes:
On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote:
To be acceptable, a GIT patch would have to be optional and it
would have to expose in the catalogs whether a given index was lossy
in this way or not (so that the planner could know whether a plan based
On Wed, 2008-04-23 at 14:06 -0400, Tom Lane wrote:
I think storage parameter is no good also, given the current design that
assumes those can be changed on-the-fly. It'd be okay to GIT-ify an
existing index, perhaps, but not the other way round.
I was considering a new pg_index column. Or
Simon Riggs wrote:
GIT significantly reduces the size of clustered indexes, greatly
improving the number of index pointers that can be held in memory for
very large indexes. That translates directly into a reduction in I/O for
large databases on typical hardware, for primary operations, file
Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote:
To be acceptable, a GIT patch would have to be optional and it
...
I was considering a new pg_index column. Or else we'd have to fix
the storage-parameter infrastructure to support
Heikki Linnakangas wrote:
Ron Mayer wrote:
One use case that I think GIT would help a lot with are my
large address tables that are clustered by zip-code but
often queried by State, City, County, School District,
Police Beat, etc.
I imagine a GIT index on state would just occupy
a couple
Ron Mayer wrote:
Then I wonder if I can conceive of yet another related
index type that'd be useful for such clustered tables. If
I had something like GIT that stored something like
values State='CA' can be found on pages 1000 through 1
and 2 through 21000 would it be even more
Ron Mayer wrote:
One use case that I think GIT would help a lot with are my
large address tables that are clustered by zip-code but
often queried by State, City, County, School District,
Police Beat, etc.
Yep, GIT would shrink the index on zip-code tremendously...
I imagine a GIT index on
I looked over the issue of making the regular-indexscan code path able
to handle runtime determination of operator lossiness. Here's my
implementation plan:
* Extend amgettuple API to allow a boolean recheck flag to be passed back.
* Remove index_getnext_indexitem, which is dead code and has
... The really serious problem I've got with it is that
it'd foreclose the possibility of returning actual index keys from
btree
indexes, thus basically killing the usefulness of that idea. I'm
not
convinced it would offer enough gain to be worth paying that price.
I do not see the
Teodor, do you have any thoughts about exactly how you'd fix @@@ ?
I suppose that the recheck-need is not really a property of specific
tuples, but of a particular query, for that case. Where would you
want to detect that?
tsquery may include restriction by weight of search terms: 'sea
Teodor Sigaev [EMAIL PROTECTED] writes:
So, I planned to add pointer to bool to consistent method, so signature will
be
bool consistent( bool check[], StrategyNumber n, Datum query, bool
*needRecheck)
Returning value of needRecheck should be ignored for operation not marked by
RECHECK
Tom Lane wrote:
Perhaps it would be better to initialize needRecheck to the opclass
RECHECK flag value? If the consistent function does nothing, the
behavior is the same as before, but it can flip the flag in either
direction if it wants.
I remember that last spring, in the context of GIT,
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
Perhaps it would be better to initialize needRecheck to the opclass
RECHECK flag value? If the consistent function does nothing, the
behavior is the same as before, but it can flip the flag in either
direction if it wants.
I
Perhaps it would be better to initialize needRecheck to the opclass
RECHECK flag value? If the consistent function does nothing, the
behavior is the same as before, but it can flip the flag in either
direction if it wants.
I have not any objections. In this case (and preparing of rechecking
I wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
I remember that last spring, in the context of GIT, you were worried
about the performance implication of preparing to recheck rows when no
rechecks are needed. I didn't quite buy that back then, but this would
have the same issue.
As
Slightly offtopic. How to get benefit on tuple level ? For example,
we mark GiST tsearch index as lossy, while for not very big documents it's
actually exact and we could save a lot not rechecking them.
Oleg
On Fri, 11 Apr 2008, Teodor Sigaev wrote:
Teodor, do you have any thoughts about
Oleg Bartunov [EMAIL PROTECTED] writes:
Slightly offtopic. How to get benefit on tuple level ? For example,
we mark GiST tsearch index as lossy, while for not very big documents it's
actually exact and we could save a lot not rechecking them.
Won't that just fall out of this? Assuming the
On Fri, 11 Apr 2008, Tom Lane wrote:
Oleg Bartunov [EMAIL PROTECTED] writes:
Slightly offtopic. How to get benefit on tuple level ? For example,
we mark GiST tsearch index as lossy, while for not very big documents it's
actually exact and we could save a lot not rechecking them.
Won't that
Heikki Linnakangas wrote:
* GIT (Grouped Index Tuple) indexes, which achieve index space savings
in btrees by having a single index tuple represent multiple heap tuples
[...]
Another issue is that we'd need to check how much of the use-case for
GIT has been taken over by HOT.
There is,
* Proposed changes to allow amgetnext to return the actual index keys,
allowing certain types of unindexable quals to be checked without
having to fetch the heap entry. For example a LIKE condition could be
fully checked against the index entry.
In the extreme we could build tuples and
* Proposed change to let both amgetnext and amgetmulti mark returned
tuples as candidate matches, that is in need of rechecking of quals
indexes). There seemed to be some possible marginal use for it in GIST
indexes, but I'm not convinced that's a sufficient reason to complicate
the APIs.
* GIT (Grouped Index Tuple) indexes, which achieve index space savings
in btrees by having a single index tuple represent multiple heap
tuples
(on a single heap page) containing a range of key values. I am not
sure
what the development status is --- Heikki had submitted a completed
patch
Zeugswetter Andreas OSB SD [EMAIL PROTECTED] writes:
... The really serious problem I've got with it is that
it'd foreclose the possibility of returning actual index keys from btree
indexes, thus basically killing the usefulness of that idea. I'm not
convinced it would offer enough gain to be
Teodor Sigaev [EMAIL PROTECTED] writes:
* Proposed change to let both amgetnext and amgetmulti mark returned
tuples as candidate matches, that is in need of rechecking of quals
...
indexes). There seemed to be some possible marginal use for it in GIST
indexes, but I'm not convinced that's a
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
A bigger issue is whether this is worth applying when nobody seems to be
working on either of the main uses for it (bitmap indexes and GIT
indexes). There seemed to be some possible marginal use for it in GIST
indexes, but I'm not
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
A bigger issue is whether this is worth applying when nobody seems to be
working on either of the main uses for it (bitmap indexes and GIT
indexes). There seemed to be some possible marginal use for it in GIST
GiST too, because type of storage may be differ from type to be indexed. The
same touches GIN too.
Is this the case for *all* GiST and GIN indexes, or might we need a
per-index (rather than per-AM) flag to tell whether the original keys
are available?
Ughm. GiST and GIN are different here.
Bruce Momjian [EMAIL PROTECTED] writes:
Tom Lane wrote:
The remaining topics associated with index AMs are closed for this
commit fest, unless anyone has specific questions they want discussed
right now...
OK, do you want the items moved to the next commit-fest, discarded, or
made into
Teodor Sigaev [EMAIL PROTECTED] writes:
Both GIN and GiST make a call of transformation function before indexing
value.
I suppose, there is no automatic way to set this flag even in case when type
of
storage and type of indexing value are the same.
So, I see three variant:
- add flag
Yeah, just as messy as I feared :-(. My inclination for the first pass
would be to just make it a per-AM flag in pg_am. We could always
complicate matters later.
Agree, and the single existing suitable opclass hasn't operation marked with
RECHECK flag.
--
Teodor Sigaev
Tom Lane wrote:
Yeah, and Teodor's point about cleaning up the @@@ hack pretty much
seals the deal for me.
Note that we'll need to add candidate match support to the regular index
scan API as well for that.
Unless anyone has objections, I will review and apply Heikki's patch
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
Yeah, and Teodor's point about cleaning up the @@@ hack pretty much
seals the deal for me.
Note that we'll need to add candidate match support to the regular index
scan API as well for that.
[ confused... ] Thought you'd done
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Note that we'll need to add candidate match support to the regular index
scan API as well for that.
[ confused... ] Thought you'd done that in this patch?
No, I only modified the bitmap scan API.
(Heiiki, you don't have a later
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Note that we'll need to add candidate match support to the regular index
scan API as well for that.
[ confused... ] Thought you'd done that in this patch?
No, I only modified the
I just finished looking through the various threads about index AM
changes that Bruce has been holding onto in the commit-fest queue.
There seem to be the following issues:
* Proposed change to have amgetmulti return its results by ORing bits
into a caller-supplied TIDBitmap, rather than the
I am glad you have summarized this. The details of exactly what was
being proposed was too complex for me to understand before.
---
Tom Lane wrote:
I just finished looking through the various threads about index AM
Tom Lane wrote:
* Proposed change to let both amgetnext and amgetmulti mark returned
tuples as candidate matches, that is in need of rechecking of quals
against the real heap tuple. I had originally objected to this on
the grounds that it would require setup work that doesn't happen now,
but
52 matches
Mail list logo