Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-09-19 Thread Tom Lane
Jan Urbański <[EMAIL PROTECTED]> writes:
> Attached is a version that stores the minimal and maximal frequencies in the Numbers array, has the aforementioned assertion and more nicely ordered functions in ts_selfuncs.c.
Applied with some small corrections.
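A minimal sketch of the layout described above. It assumes the minimum and maximum are appended as two trailing entries after the per-element frequencies. The message only says they live in the Numbers array, so the exact position and the helper name `append_min_max` are hypothetical. Storing them explicitly matters once the elements are sorted by lexeme rather than by frequency, because the smallest frequency can then no longer be read off the last slot.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: given the frequencies of the n most common
 * elements, return a newly allocated array of n + 2 floats holding the
 * per-element frequencies followed by their minimum and maximum.
 * Returns NULL if n is not positive or allocation fails.
 */
float *
append_min_max(const float *freqs, int n)
{
    float      *numbers;
    float       minfreq,
                maxfreq;
    int         i;

    if (n <= 0)
        return NULL;
    numbers = malloc(sizeof(float) * (size_t) (n + 2));
    if (numbers == NULL)
        return NULL;

    minfreq = maxfreq = freqs[0];
    for (i = 0; i < n; i++)
    {
        numbers[i] = freqs[i];
        if (freqs[i] < minfreq)
            minfreq = freqs[i];
        if (freqs[i] > maxfreq)
            maxfreq = freqs[i];
    }
    /* the two extra trailing slots carry the summary statistics */
    numbers[n] = minfreq;
    numbers[n + 1] = maxfreq;
    return numbers;
}
```

The caller owns the returned array and must free() it.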

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-09-19 Thread Jan Urbański
Tom Lane wrote: Jan Urbański <[EMAIL PROTECTED]> writes: [EMAIL PROTECTED] wrote: Well whaddya know. It turned out that my new company has a 'Fridays-are-for-any-opensource-hacking-you-like' policy, so I got a full day to work on the patch. Hm, does their name start with

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-09-19 Thread Tom Lane
Jan Urbański <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] wrote:
> Well whaddya know. It turned out that my new company has a 'Fridays-are-for-any-opensource-hacking-you-like' policy, so I got a full day to work on the patch.
Hm, does their name start with G?
> Attach

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-09-19 Thread Jan Urbański
[EMAIL PROTECTED] wrote: Quoting Tom Lane <[EMAIL PROTECTED]>: I wrote: ... One possibly performance-relevant point is to use DatumGetTextPP for detoasting; you've already paid the costs by using VARDATA_ANY etc, so you might as well get the benefit. Actually, wait a second. That code does

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-09-19 Thread ju219721
Quoting Tom Lane <[EMAIL PROTECTED]>: I wrote: ... One possibly performance-relevant point is to use DatumGetTextPP for detoasting; you've already paid the costs by using VARDATA_ANY etc, so you might as well get the benefit. Actually, wait a second. That code doesn't work at all on toasted

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-09-02 Thread Tom Lane
I wrote:
> ... One possibly performance-relevant point is to use DatumGetTextPP for detoasting; you've already paid the costs by using VARDATA_ANY etc, so you might as well get the benefit.
Actually, wait a second. That code doesn't work at all on toasted data, because it's trying to use V

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-09-02 Thread Tom Lane
Jan Urbański <[EMAIL PROTECTED]> writes:
> Pre-sorting introduced one problem (see XXX in code): it's not easy anymore to get the minimal frequency of MCELEM values. I was using it to assert that the selectivity of a tsquery node containing a lexeme not in MCELEM is no
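One way to picture the check Jan mentions. This is a sketch with hypothetical names (`min_frequency`, `non_mcelem_selectivity`) operating on a plain float array rather than a real pg_statistic slot. The estimate for an absent lexeme, half the minimum frequency, is an assumed choice. The point is the assertion that the estimate never exceeds the least common tracked element, and that finding that minimum costs a linear scan once the array is ordered by lexeme.

```c
#include <assert.h>

/*
 * Hypothetical: with entries sorted by lexeme, the minimum frequency is
 * no longer at a fixed position, so it has to be found by scanning.
 * Requires n >= 1.
 */
float
min_frequency(const float *freqs, int n)
{
    float       m = freqs[0];
    int         i;

    for (i = 1; i < n; i++)
        if (freqs[i] < m)
            m = freqs[i];
    return m;
}

/*
 * Hypothetical estimate for a lexeme absent from MCELEM: assume it is
 * rarer than every tracked element and take half the minimum frequency.
 */
float
non_mcelem_selectivity(const float *freqs, int n)
{
    float       minfreq = min_frequency(freqs, n);
    float       selec = minfreq / 2.0f;

    /* sanity check: never estimate above the least common MCELEM */
    assert(selec <= minfreq);
    return selec;
}
```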

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-27 Thread Simon Riggs
On Tue, 2008-08-26 at 12:45 +0200, Jan Urbański wrote:
> > put it in a file called selfuncs_ts.c so it is similar to the existing filename?
>
> I followed the pattern of ts_parse.c, ts_utils.c and so on.
> Also, I see geo_selfuncs.c. No big deal, though, I can move it.
No don't worry. You'r

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-27 Thread Heikki Linnakangas
Jan Urbański wrote: Tom Lane wrote: Jan Urbański <[EMAIL PROTECTED]> writes: Simon Riggs wrote: put it in a file called selfuncs_ts.c so it is similar to the existing filename? I followed the pattern of ts_parse.c, ts_utils.c and so on. Also, I see geo_selfuncs.c. No big

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-27 Thread Jan Urbański
Tom Lane wrote: Jan Urbański <[EMAIL PROTECTED]> writes: Simon Riggs wrote: put it in a file called selfuncs_ts.c so it is similar to the existing filename? I followed the pattern of ts_parse.c, ts_utils.c and so on. Also, I see geo_selfuncs.c. No big deal, though, I can

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-26 Thread Tom Lane
Jan Urbański <[EMAIL PROTECTED]> writes:
> Simon Riggs wrote:
>> put it in a file called selfuncs_ts.c so it is similar to the existing filename?
> I followed the pattern of ts_parse.c, ts_utils.c and so on.
> Also, I see geo_selfuncs.c. No big deal, though, I can move it.

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-26 Thread Jan Urbański
Simon Riggs wrote: On Thu, 2008-08-14 at 22:27 +0200, Jan Urbański wrote: Jan Urbański wrote: + * ts_selfuncs.c Not sure why this is in its own file I couldn't decide where to put it, so I came up with this. put it in a file called selfuncs_ts.c so it is similar to the existing filenam

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-26 Thread Simon Riggs
On Thu, 2008-08-14 at 22:27 +0200, Jan Urbański wrote:
> Jan Urbański wrote:
> + * ts_selfuncs.c
Not sure why this is in its own file, but if it must be could we please put it in a file called selfuncs_ts.c so it is similar to the existing filename?
-- Simon Riggs www.2ndQuadrant.c

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Alvaro Herrera
Jan Urbański wrote:
> Yeah, I got that idea, but then I thought the chances of touching the same element during binary search twice were very small. Especially now when the detoasting occurs only when we hit a text Datum that has the same length as the sought lexeme.
> Still, I can do
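A sketch of the lazy scheme. It assumes the sorted array is ordered by length first and bytewise only among equal lengths, which is one ordering consistent with the remark that detoasting happens only for a same-length Datum. `LazyText`, `lazy_data`, `compare_lexeme_lazy` and the counter are hypothetical stand-ins. No real TOAST access is performed; the "detoast" is simulated by setting a pointer.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for a possibly-toasted text value: the length is
 * known up front, but the bytes are only materialized on first access.
 */
typedef struct LazyText
{
    int         len;            /* length, cheap to read */
    const char *source;         /* stands in for the compressed/external datum */
    const char *data;           /* NULL until materialized */
} LazyText;

static int  detoast_count = 0;  /* counts simulated detoast operations */

static const char *
lazy_data(LazyText *t)
{
    if (t->data == NULL)
    {
        detoast_count++;        /* a real detoast would decompress or fetch here */
        t->data = t->source;
    }
    return t->data;
}

/*
 * Compare a lexeme (ptr, len) against a lazy entry, ordering by length
 * first and by bytes only on equal length, so a length mismatch never
 * triggers a detoast.
 */
int
compare_lexeme_lazy(const char *lexeme, int len, LazyText *t)
{
    if (len != t->len)
        return (len < t->len) ? -1 : 1;
    return memcmp(lexeme, lazy_data(t), (size_t) len);
}
```

Because a length mismatch decides the comparison on its own, most probes of a binary search never touch the value bytes, and a value touched twice is detoasted only once.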

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański
Alvaro Herrera wrote: Jan Urbański wrote: Heikki Linnakangas wrote: Sounds like a plan. In (2), it's even better to detoast the values lazily. For a typical one-word tsquery, the binary search will only look at a small portion of the elements. Hm, how can I do that? Toast is still a bit bla

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Alvaro Herrera
Jan Urbański wrote:
> Heikki Linnakangas wrote:
>> Sounds like a plan. In (2), it's even better to detoast the values lazily. For a typical one-word tsquery, the binary search will only look at a small portion of the elements.
>
> Hm, how can I do that? Toast is still a bit black magic to

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański
Jan Urbański wrote:
> Heikki Linnakangas wrote:
>> Jan Urbański wrote:
>>> So right now the idea is to:
>>> (1) pre-sort STATISTIC_KIND_MCELEM values
>>> (2) build an array of pointers to detoasted values in tssel()
>>> (3) use binary search when looking for MCELEMs during tsquery analysis
>> Sounds like a plan.

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański
Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> So right now the idea is to:
>> (1) pre-sort STATISTIC_KIND_MCELEM values
>> (2) build an array of pointers to detoasted values in tssel()
>> (3) use binary search when looking for MCELEMs during tsquery analysis
> Sounds like a plan. In (2), it's even bet

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Gregory Stark
Jan Urbański <[EMAIL PROTECTED]> writes:
> Heikki Linnakangas wrote:
>> Speaking of which, a lot of time seems to be spent on detoasting. I'd like to understand that a better. Where is the detoasting coming from?
>
> Hmm, maybe bttext_pattern_cmp does some detoasting? It calls PG_GETARG_TEXT_

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Heikki Linnakangas
Jan Urbański wrote:
> So right now the idea is to:
> (1) pre-sort STATISTIC_KIND_MCELEM values
> (2) build an array of pointers to detoasted values in tssel()
> (3) use binary search when looking for MCELEMs during tsquery analysis
Sounds like a plan. In (2), it's even better to detoast the values
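The plan above can be sketched as a hand-written binary search over entries assumed to be already sorted (step 1). `McelemEntry`, `cmp_bytes` and `find_mcelem` are hypothetical names, not the patch's. The probe counter illustrates Heikki's point: a lookup examines about log2(n) entries, so detoasting only those is cheaper than detoasting every element up front.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical MCELEM entry: a lexeme that is not NUL-terminated, plus its frequency. */
typedef struct McelemEntry
{
    const char *lexeme;
    int         len;
    float       freq;
} McelemEntry;

/* Bytewise comparison of two (pointer, length) strings; a proper prefix sorts first. */
static int
cmp_bytes(const char *a, int alen, const char *b, int blen)
{
    int         r = memcmp(a, b, (size_t) (alen < blen ? alen : blen));

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}

/*
 * Binary search over entries sorted with cmp_bytes.  Returns the match or
 * NULL; *probes counts the entries examined, i.e. how many would need
 * detoasting if that were done lazily.
 */
const McelemEntry *
find_mcelem(const McelemEntry *entries, int n,
            const char *lexeme, int len, int *probes)
{
    int         lo = 0,
                hi = n - 1;

    *probes = 0;
    while (lo <= hi)
    {
        int         mid = lo + (hi - lo) / 2;
        int         c;

        (*probes)++;
        c = cmp_bytes(lexeme, len, entries[mid].lexeme, entries[mid].len);
        if (c == 0)
            return &entries[mid];
        if (c < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return NULL;
}
```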

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański
Heikki Linnakangas wrote: Jan Urbański wrote: Not good... Shall I try sorting pg_statistics arrays on text values instead of frequencies? Yeah, I'd go with that. If you only do it for the new STATISTIC_KIND_MCV_ELEMENT statistics, you shouldn't need to change any other code. OK, will do.

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Heikki Linnakangas
Jan Urbański wrote: Not good... Shall I try sorting pg_statistics arrays on text values instead of frequencies? Yeah, I'd go with that. If you only do it for the new STATISTIC_KIND_MCV_ELEMENT statistics, you shouldn't need to change any other code. Hmm. There has been discussion on raising

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-13 Thread Jan Urbański
Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> 26763 3.5451 AllocSetCheck
> Make sure you disable assertions before profiling.
Awww, darn. OK, here goes another set of results, without casserts this time.
=== CVS HEAD ===
number of clients: 10
number of transactions per client: 10

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-13 Thread Heikki Linnakangas
Jan Urbański wrote:
> 26763 3.5451 AllocSetCheck
Make sure you disable assertions before profiling. Although I'm actually a bit surprised the overhead isn't more than 3.5%, I've seen much higher overheads on other tests, but it's still skewing the results.
- Heikki

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-13 Thread Jan Urbański
Heikki Linnakangas wrote: Jan Urbański wrote: through it. The only tiny ugliness is that there's one function used for qsort() and another for bsearch(), because I'm sorting an array of texts (from pg_statistic) and I'm binary searching for a lexeme (non-NULL terminated string with length).
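A sketch of why two comparators are needed, using hypothetical types `TextFreq` and `LexemeKey`. qsort() passes two array elements, while bsearch() passes the search key first and an array element second (as the C standard specifies). When the key type differs from the element type, each call therefore needs its own function. Both delegate to the same byte ordering, so the sort order and the search order agree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sorted-array element: a length-delimited lexeme and its frequency. */
typedef struct TextFreq
{
    const char *data;           /* not NUL-terminated */
    int         len;
    float       freq;
} TextFreq;

/* Hypothetical search key: the lexeme as it appears in the tsquery. */
typedef struct LexemeKey
{
    const char *lexeme;
    int         len;
} LexemeKey;

/* Shared ordering: bytewise, with a proper prefix sorting first. */
static int
compare_bytes(const char *a, int alen, const char *b, int blen)
{
    int         r = memcmp(a, b, (size_t) (alen < blen ? alen : blen));

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}

/* qsort() comparator: both arguments point to TextFreq elements. */
int
textfreq_cmp(const void *a, const void *b)
{
    const TextFreq *x = a;
    const TextFreq *y = b;

    return compare_bytes(x->data, x->len, y->data, y->len);
}

/* bsearch() comparator: first argument is the key, second an array element. */
int
lexeme_textfreq_cmp(const void *key, const void *elem)
{
    const LexemeKey *k = key;
    const TextFreq *t = elem;

    return compare_bytes(k->lexeme, k->len, t->data, t->len);
}
```

Sorting with `textfreq_cmp` and then searching with `lexeme_textfreq_cmp` works only because both use `compare_bytes`; a mismatch between the two orderings would make bsearch() miss entries.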

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-11 Thread Heikki Linnakangas
Jan Urbański wrote: Heikki Linnakangas wrote: Jan Urbański wrote: Another thing are cstring_to_text_with_len calls. I'm doing them so I can use bttextcmp in bsearch(). I think I could come up with a dedicated function to return text Datums and WordEntries (read: non-NULL terminated strings wi

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-07-29 Thread Jan Urbański
Heikki Linnakangas wrote: Jan Urbański wrote: Another thing are cstring_to_text_with_len calls. I'm doing them so I can use bttextcmp in bsearch(). I think I could come up with a dedicated function to return text Datums and WordEntries (read: non-NULL terminated strings with a given length).

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-07-29 Thread Heikki Linnakangas
Jan Urbański wrote: Another thing are cstring_to_text_with_len calls. I'm doing them so I can use bttextcmp in bsearch(). I think I could come up with a dedicated function to return text Datums and WordEntries (read: non-NULL terminated strings with a given length). Just keep them as cstrings
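A sketch of a comparator that avoids building a text value per comparison (the cstring_to_text_with_len calls discussed above). It assumes the stored values are kept as NUL-terminated C strings and the lexeme is a (pointer, length) pair with no NUL bytes inside. `compare_cstring_lexeme` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical comparator: order a NUL-terminated C string against a
 * lexeme given as (pointer, length) with no terminator, without copying
 * the lexeme.  Assumes the lexeme contains no NUL bytes.
 * Returns <0, 0 or >0 like strcmp().
 */
int
compare_cstring_lexeme(const char *cstr, const char *lexeme, int len)
{
    /* strncmp reads at most len bytes of the lexeme, so no terminator is needed */
    int         r = strncmp(cstr, lexeme, (size_t) len);

    if (r != 0)
        return r;
    /* first len bytes match: cstr sorts after the lexeme if it is longer */
    return cstr[len] != '\0' ? 1 : 0;
}
```

If the C string is shorter than the lexeme, its terminating NUL compares below the lexeme's next byte, so strncmp() already reports it as smaller.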

Re: [HACKERS] gsoc, oprrest function for text search

2008-07-29 Thread Jan Urbański
Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> Here's a WIP patch implementing an oprrest function for tsvector @@ tsquery and tsquery @@ tsvector. The idea is (quoting a comment)
>> /*
>>  * Traverse the tsquery preorder, calculating selectivity as:
>>  *
>>  *   selec(left_oper) * selec(right_oper) in A

Re: [HACKERS] gsoc, oprrest function for text search

2008-07-29 Thread Heikki Linnakangas
Jan Urbański wrote:
> Here's a WIP patch implementing an oprrest function for tsvector @@ tsquery and tsquery @@ tsvector. The idea is (quoting a comment)
> /*
>  * Traverse the tsquery preorder, calculating selectivity as:
>  *
>  *   selec(left_oper) * selec(right_oper) in AND nodes,
>  *
>  *   selec(lef

[HACKERS] gsoc, oprrest function for text search take 2

2008-07-28 Thread Jan Urbański
Hi, I know Commit Fest is in progress, as well as the holiday season. But the Summer of Code ends in about three weeks, so I'd like to request a bit of out-of-order processing :) My previous mail sent to -hackers is here: http://archives.postgresql.org/message-id/[EMAIL PROTECTED] I had prob

Re: [HACKERS] gsoc, oprrest function for text search

2008-07-19 Thread Jan Urbański
Jan Urbański wrote:
> The idea is (quoting a comment)
> /*
>  * Traverse the tsquery preorder, calculating selectivity as:
Ekhm. This should of course read "postorder"...
-- Jan Urbanski GPG key ID: E583D7D2 ouden estin

[HACKERS] gsoc, oprrest function for text search

2008-07-19 Thread Jan Urbański
Here's a WIP patch implementing an oprrest function for tsvector @@ tsquery and tsquery @@ tsvector. The idea is (quoting a comment)
/*
 * Traverse the tsquery preorder, calculating selectivity as:
 *
 *   selec(left_oper) * selec(right_oper) in AND nodes,
 *
 *   selec(left_oper) + selec(right
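A sketch of the recursive estimate the comment describes, with hypothetical node types in place of PostgreSQL's QueryItem array and leaf frequencies assumed to be already looked up. The AND rule follows the comment. The OR rule is assumed to be the usual independence formula s1 + s2 - s1*s2, since the preview is cut off mid-expression, and the NOT rule 1 - s is likewise an assumption. Operands are evaluated before their operator, i.e. in post-order, as Jan's follow-up correction notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tsquery node kinds. */
typedef enum
{
    NODE_VAL, NODE_AND, NODE_OR, NODE_NOT
} NodeKind;

typedef struct QueryNode
{
    NodeKind    kind;
    double      freq;           /* NODE_VAL only: frequency found for the lexeme */
    const struct QueryNode *left;   /* NODE_NOT uses left only */
    const struct QueryNode *right;
} QueryNode;

/*
 * Post-order traversal: both operands are estimated before the operator
 * combines them.  Operands are treated as independent events.
 */
double
node_selectivity(const QueryNode *n)
{
    double      l,
                r;

    switch (n->kind)
    {
        case NODE_VAL:
            return n->freq;
        case NODE_NOT:
            return 1.0 - node_selectivity(n->left);
        case NODE_AND:
            l = node_selectivity(n->left);
            r = node_selectivity(n->right);
            return l * r;
        case NODE_OR:
            l = node_selectivity(n->left);
            r = node_selectivity(n->right);
            return l + r - l * r;   /* assumed inclusion-exclusion term */
    }
    return 0.0;                 /* not reached for valid kinds */
}
```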