Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Teodor Sigaev
Interesting. Does this mean that down the road a postgis index might be GIT-ified? Only if GiST will support GIT, but I don't follow on this discussion. In any case, GIT on GiST will not be so effective as on Btree, because GiST doesn't guarantee that all close values will be close in index:

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Simon Riggs
On Wed, 2008-04-23 at 22:26 -0400, Bruce Momjian wrote: Simon Riggs wrote: GIT significantly reduces the size of clustered indexes, greatly improving the number of index pointers that can be held in memory for very large indexes. That translates directly into a reduction in I/O for large

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Simon Riggs
On Wed, 2008-04-23 at 19:43 -0700, Ron Mayer wrote: Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote: To be acceptable, a GIT patch would have to be optional and it ... I was considering a new pg_index column. Or else we'd have to

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Martijn van Oosterhout
On Thu, Apr 24, 2008 at 10:11:02AM +0100, Simon Riggs wrote: Index compression is possible in many ways, depending upon the situation. All of the following sound similar at a high level, but each covers a different use case. True, but there is one significant difference: * For Long, Similar

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Simon Riggs
On Thu, 2008-04-24 at 13:13 +0200, Martijn van Oosterhout wrote: On Thu, Apr 24, 2008 at 10:11:02AM +0100, Simon Riggs wrote: Index compression is possible in many ways, depending upon the situation. All of the following sound similar at a high level, but each covers a different use case.

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread ITAGAKI Takahiro
Simon Riggs [EMAIL PROTECTED] wrote: * For Highly Non-Unique Data we can use Duplicate Compression The latter is the technique used by Bitmap Indexes. Efficient, but not useful for unique/nearly-unique data I heard that GIN has already had duplicate-compression feature.

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Gregory Stark
Martijn van Oosterhout [EMAIL PROTECTED] writes: On Thu, Apr 24, 2008 at 10:11:02AM +0100, Simon Riggs wrote: Index compression is possible in many ways, depending upon the situation. All of the following sound similar at a high level, but each covers a different use case. True, but there

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Bruce Momjian
Simon Riggs wrote: Many users would be very interested if we could significantly reduce the size of the main index on their largest tables. Yes, basically GIT allows index compression for clustered tables, and stated that way it is clear it would be a big feature if we had it for

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Simon Riggs
On Thu, 2008-04-24 at 11:43 -0400, Bruce Momjian wrote: Index compression is possible in many ways, depending upon the situation. All of the following sound similar at a high level, but each covers a different use case. * For Long, Similar data e.g. Text we can use Prefix Compression

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Decibel!
On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote: Bruce asked if these should be TODOs... Index compression is possible in many ways, depending upon the situation. All of the following sound similar at a high level, but each covers a different use case. * For Long, Similar data e.g. Text

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Martijn van Oosterhout
Well, for these two: * For Highly Non-Unique Data we can use Duplicate Compression * Multi-Column Leading Value Compression - if you have a multi-column You don't need any new logic at all. If _bt_compare says they're equal you can compact them. The long similar case is harder, ISTM there

Re: [HACKERS] Index AM change proposals, redux

2008-04-24 Thread Tom Lane
Martijn van Oosterhout [EMAIL PROTECTED] writes: Well, for these two: * For Highly Non-Unique Data we can use Duplicate Compression * Multi-Column Leading Value Compression - if you have a multi-column You don't need any new logic at all. If _bt_compare says they're equal you can compact

Re: [HACKERS] Index AM change proposals, redux

2008-04-23 Thread Simon Riggs
On Wed, 2008-04-09 at 20:30 -0400, Tom Lane wrote: * GIT (Grouped Index Tuple) indexes, which achieve index space savings in btrees by having a single index tuple represent multiple heap tuples (on a single heap page) containing a range of key values. I am not sure what the development

Re: [HACKERS] Index AM change proposals, redux

2008-04-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I don't see the returns index keys idea as being killed by or killing this concept. Returning keys is valid and useful when we can, but there are other considerations that, in some use cases, will be a dominant factor. The patch as-submitted was a killer

Re: [HACKERS] Index AM change proposals, redux

2008-04-23 Thread Simon Riggs
On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I don't see the returns index keys idea as being killed by or killing this concept. Returning keys is valid and useful when we can, but there are other considerations that, in some use cases, will be a

Re: [HACKERS] Index AM change proposals, redux

2008-04-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote: To be acceptable, a GIT patch would have to be optional and it would have to expose in the catalogs whether a given index was lossy in this way or not (so that the planner could know whether a plan based

Re: [HACKERS] Index AM change proposals, redux

2008-04-23 Thread Simon Riggs
On Wed, 2008-04-23 at 14:06 -0400, Tom Lane wrote: I think storage parameter is no good also, given the current design that assumes those can be changed on-the-fly. It'd be okay to GIT-ify an existing index, perhaps, but not the other way round. I was considering a new pg_index column. Or

Re: [HACKERS] Index AM change proposals, redux

2008-04-23 Thread Bruce Momjian
Simon Riggs wrote: GIT significantly reduces the size of clustered indexes, greatly improving the number of index pointers that can be held in memory for very large indexes. That translates directly into a reduction in I/O for large databases on typical hardware, for primary operations, file

Re: [HACKERS] Index AM change proposals, redux

2008-04-23 Thread Ron Mayer
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2008-04-23 at 12:07 -0400, Tom Lane wrote: To be acceptable, a GIT patch would have to be optional and it ... I was considering a new pg_index column. Or else we'd have to fix the storage-parameter infrastructure to support

Re: [HACKERS] Index AM change proposals, redux

2008-04-14 Thread Ron Mayer
Heikki Linnakangas wrote: Ron Mayer wrote: One use case that I think GIT would help a lot with are my large address tables that are clustered by zip-code but often queried by State, City, County, School District, Police Beat, etc. I imagine a GIT index on state would just occupy a couple

Re: [HACKERS] Index AM change proposals, redux

2008-04-14 Thread Heikki Linnakangas
Ron Mayer wrote: Then I wonder if I can conceive of yet another related index type that'd be useful for such clustered tables. If I had something like GIT that stored something like values State='CA' can be found on pages 1000 through 1 and 2 through 21000 would it be even more

Re: [HACKERS] Index AM change proposals, redux

2008-04-12 Thread Heikki Linnakangas
Ron Mayer wrote: One use case that I think GIT would help a lot with are my large address tables that are clustered by zip-code but often queried by State, City, County, School District, Police Beat, etc. Yep, GIT would shrink the index on zip-code tremendously... I imagine a GIT index on

Re: [HACKERS] Index AM change proposals, redux

2008-04-12 Thread Tom Lane
I looked over the issue of making the regular-indexscan code path able to handle runtime determination of operator lossiness. Here's my implementation plan: * Extend amgettuple API to allow a boolean recheck flag to be passed back. * Remove index_getnext_indexitem, which is dead code and has

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Zeugswetter Andreas OSB SD
... The really serious problem I've got with it is that it'd foreclose the possibility of returning actual index keys from btree indexes, thus basically killing the usefulness of that idea. I'm not convinced it would offer enough gain to be worth paying that price. I do not see the

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Teodor Sigaev
Teodor, do you have any thoughts about exactly how you'd fix @@@ ? I suppose that the recheck-need is not really a property of specific tuples, but of a particular query, for that case. Where would you want to detect that? tsquery may include restriction by weight of search terms: 'sea

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: So, I planned to add pointer to bool to consistent method, so signature will be bool consistent( bool check[], StrategyNumber n, Datum query, bool *needRecheck) Returning value of needRecheck should be ignored for operation not marked by RECHECK

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Heikki Linnakangas
Tom Lane wrote: Perhaps it would be better to initialize needRecheck to the opclass RECHECK flag value? If the consistent function does nothing, the behavior is the same as before, but it can flip the flag in either direction if it wants. I remember that last spring, in the context of GIT,

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Perhaps it would be better to initialize needRecheck to the opclass RECHECK flag value? If the consistent function does nothing, the behavior is the same as before, but it can flip the flag in either direction if it wants. I

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Teodor Sigaev
Perhaps it would be better to initialize needRecheck to the opclass RECHECK flag value? If the consistent function does nothing, the behavior is the same as before, but it can flip the flag in either direction if it wants. I have not any objections. In this case (and preparing of rechecking

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
I wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: I remember that last spring, in the context of GIT, you were worried about the performance implication of preparing to recheck rows when no rechecks are needed. I didn't quite buy that back then, but this would have the same issue. As

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Oleg Bartunov
Slightly offtopic. How to get benefit on tuple level ? For example, we mark GiST tsearch index as lossy, while for not very big documents it's actually exact and we could save a lot not rechecking them. Oleg On Fri, 11 Apr 2008, Teodor Sigaev wrote: Teodor, do you have any thoughts about

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes: Slightly offtopic. How to get benefit on tuple level ? For example, we mark GiST tsearch index as lossy, while for not very big documents it's actually exact and we could save a lot not rechecking them. Won't that just fall out of this? Assuming the

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Oleg Bartunov
On Fri, 11 Apr 2008, Tom Lane wrote: Oleg Bartunov [EMAIL PROTECTED] writes: Slightly offtopic. How to get benefit on tuple level ? For example, we mark GiST tsearch index as lossy, while for not very big documents it's actually exact and we could save a lot not rechecking them. Won't that

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Ron Mayer
Heikki Linnakangas wrote: * GIT (Grouped Index Tuple) indexes, which achieve index space savings in btrees by having a single index tuple represent multiple heap tuples [...] Another issue is that we'd need to check how much of the use-case for GIT has been taken over by HOT. There is,

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Gregory Stark
* Proposed changes to allow amgetnext to return the actual index keys, allowing certain types of unindexable quals to be checked without having to fetch the heap entry. For example a LIKE condition could be fully checked against the index entry. In the extreme we could build tuples and

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Teodor Sigaev
* Proposed change to let both amgetnext and amgetmulti mark returned tuples as candidate matches, that is in need of rechecking of quals indexes). There seemed to be some possible marginal use for it in GIST indexes, but I'm not convinced that's a sufficient reason to complicate the APIs.

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Zeugswetter Andreas OSB SD
* GIT (Grouped Index Tuple) indexes, which achieve index space savings in btrees by having a single index tuple represent multiple heap tuples (on a single heap page) containing a range of key values. I am not sure what the development status is --- Heikki had submitted a completed patch

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Tom Lane
Zeugswetter Andreas OSB SD [EMAIL PROTECTED] writes: ... The really serious problem I've got with it is that it'd foreclose the possibility of returning actual index keys from btree indexes, thus basically killing the usefulness of that idea. I'm not convinced it would offer enough gain to be

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: * Proposed change to let both amgetnext and amgetmulti mark returned tuples as candidate matches, that is in need of rechecking of quals ... indexes). There seemed to be some possible marginal use for it in GIST indexes, but I'm not convinced that's a

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: A bigger issue is whether this is worth applying when nobody seems to be working on either of the main uses for it (bitmap indexes and GIT indexes). There seemed to be some possible marginal use for it in GIST indexes, but I'm not

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Bruce Momjian
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: A bigger issue is whether this is worth applying when nobody seems to be working on either of the main uses for it (bitmap indexes and GIT indexes). There seemed to be some possible marginal use for it in GIST

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Teodor Sigaev
GiST too, because type of storage may be differ from type to be indexed. The same touches GIN too. Is this the case for *all* GiST and GIN indexes, or might we need a per-index (rather than per-AM) flag to tell whether the original keys are available? Ughm. GiST and GIN are different here.

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: The remaining topics associated with index AMs are closed for this commit fest, unless anyone has specific questions they want discussed right now... OK, do you want the items moved to the next commit-fest, discarded, or made into

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: Both GIN and GiST make a call of transformation function before indexing value. I suppose, there is no automatic way to set this flag even in case when type of storage and type of indexing value are the same. So, I see three variant: - add flag

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Teodor Sigaev
Yeah, just as messy as I feared :-(. My inclination for the first pass would be to just make it a per-AM flag in pg_am. We could always complicate matters later. Agree, and the single existing suitable opclass hasn't operation marked with RECHECK flag. -- Teodor Sigaev

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Heikki Linnakangas
Tom Lane wrote: Yeah, and Teodor's point about cleaning up the @@@ hack pretty much seals the deal for me. Note that we'll need to add candidate match support to the regular index scan API as well for that. Unless anyone has objections, I will review and apply Heikki's patch

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Yeah, and Teodor's point about cleaning up the @@@ hack pretty much seals the deal for me. Note that we'll need to add candidate match support to the regular index scan API as well for that. [ confused... ] Thought you'd done

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Note that we'll need to add candidate match support to the regular index scan API as well for that. [ confused... ] Thought you'd done that in this patch? No, I only modified the bitmap scan API. (Heiiki, you don't have a later

Re: [HACKERS] Index AM change proposals, redux

2008-04-10 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Note that we'll need to add candidate match support to the regular index scan API as well for that. [ confused... ] Thought you'd done that in this patch? No, I only modified the

[HACKERS] Index AM change proposals, redux

2008-04-09 Thread Tom Lane
I just finished looking through the various threads about index AM changes that Bruce has been holding onto in the commit-fest queue. There seem to be the following issues: * Proposed change to have amgetmulti return its results by ORing bits into a caller-supplied TIDBitmap, rather than the

Re: [HACKERS] Index AM change proposals, redux

2008-04-09 Thread Bruce Momjian
I am glad you have summarized this. The details of exactly what was being proposed was too complex for me to understand before. --- Tom Lane wrote: I just finished looking through the various threads about index AM

Re: [HACKERS] Index AM change proposals, redux

2008-04-09 Thread Heikki Linnakangas
Tom Lane wrote: * Proposed change to let both amgetnext and amgetmulti mark returned tuples as candidate matches, that is in need of rechecking of quals against the real heap tuple. I had originally objected to this on the grounds that it would require setup work that doesn't happen now, but