Re: [HACKERS] pg_trgm

2010-05-29 Thread Tatsuo Ishii
It's not a practical solution for people working with prebuilt Postgres versions, which is most people. I don't object to finding a way to provide a not-space behavior instead of an is-alnum behavior, but as noted upthread a GUC isn't the right way. How do you feel about a new set of

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Jesper Krogh
On 2010-05-28 23:47, Jan Urbański wrote: On 28/05/10 22:22, Tom Lane wrote: The idea that I was toying with is to assume a Zipfian distribution of the input (with some reasonable parameter), and use that to estimate what the frequency of the K'th element will be, where K is the target

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Jesper Krogh
On 2010-05-28 04:47, Tom Lane wrote: Cranking up the stats target actually makes it worse not better, since low-frequency items are then more likely to get into the MCV list I should have been more precise in the wording. Cranking up the stats target gave me overall a better plan, but that

[HACKERS] Statistics for tsvector wildcards. term*

2010-05-29 Thread Jesper Krogh
Hi. There seems to be an unimplemented area around getting statistics for wildcard searches done. Wildcards anchored to the left can be matched up by the gin-index and the ts_match_vq operator: testdb=# select to_tsvector('project') @@ to_tsquery('proj:*'); ?column? -- t (1 row)

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Jan Urbański
On 29/05/10 12:34, Jesper Krogh wrote: On 2010-05-28 23:47, Jan Urbański wrote: On 28/05/10 22:22, Tom Lane wrote: Now I tried to substitute some numbers there, and so assuming the English language has ~1e6 words H(W) is around 6.5. Let's assume the statistics target to be 100. I chose s as

Re: [HACKERS] pg_trgm

2010-05-29 Thread Greg Stark
On Sat, May 29, 2010 at 9:13 AM, Tatsuo Ishii is...@postgresql.org wrote: ! #define iswordchr(c)  (lc_ctype_is_c()? \ !                                                               ((*(c) 0x80)? !t_isspace(c) : (t_isalpha(c) || t_isdigit(c))) : \ Surely isspace(c) will always be false for

Re: [HACKERS] pg_trgm

2010-05-29 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes: After thinking a little bit more, I think following patch would not break existing behavior and also adopts mutibyte + C locale case. What do you think? This is still ignoring the point: arbitrarily changing the module's longstanding standard behavior

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: Now I tried to substitute some numbers there, and so assuming the English language has ~1e6 words H(W) is around 6.5. Let's assume the statistics target to be 100. I chose s as 1/(st + 10)*H(W) because the top 10 English words will

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: Hm, I am now thinking that maybe this theory is flawed, because tsvecors contain only *unique* words, and Zipf's law is talking about words in documents in general. Normally a word like the would appear lots of times in a document,

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Jan Urbański
On 29/05/10 17:09, Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: Now I tried to substitute some numbers there, and so assuming the English language has ~1e6 words H(W) is around 6.5. Let's assume the statistics target to be 100. I chose s as 1/(st + 10)*H(W)

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: On 29/05/10 17:09, Tom Lane wrote: There is definitely something wrong with your math there. It's not possible for the 100'th most common word to have a frequency as high as 0.06 --- the ones above it presumably have larger

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Jan Urbański
On 29/05/10 17:34, Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: On 29/05/10 17:09, Tom Lane wrote: There is definitely something wrong with your math there. It's not possible for the 100'th most common word to have a frequency as high as 0.06 --- the ones

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-29 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: [ e of ] s/2 or s/3 look reasonable. The examples in the LC paper seem to all use e = s/10. Note the stated assumption e s. So, should I just write a patch that sets the bucket width and pruning count using 0.07 as the assumed

[HACKERS] PG 9.0 release timetable

2010-05-29 Thread Bruce Momjian
Assuming we want a release Postgres 9.0 by mid-August, here is how the timetable would look: Need RC release to be stable for 1-2 weeks before final RC must be released by August 1 Beta must be stable for 2-3 weeks before RC Stable beta must be

Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-29 Thread Robert Haas
On Mon, May 17, 2010 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, May 17, 2010 at 4:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Perhaps this is a backpatchable bug fix.  Comments? I can't say whether this is safe enough to back-patch, but the

Re: [HACKERS] PG 9.0 release timetable

2010-05-29 Thread Robert Haas
On Sat, May 29, 2010 at 4:19 PM, Bruce Momjian br...@momjian.us wrote: Assuming we want a release Postgres 9.0 by mid-August, here is how the timetable would look:        Need RC release to be stable for 1-2 weeks before final                RC must be released by August 1        Beta must

Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: This is still on the 9.0 open items list, but ISTM you fixed it with two commits on May 27th. Is that correct? Oh, sorry, forgot to update the open items. Done now. regards, tom lane -- Sent via pgsql-hackers mailing list

Re: [HACKERS] PG 9.0 release timetable

2010-05-29 Thread Robert Haas
On Sat, May 29, 2010 at 5:09 PM, Robert Haas robertmh...@gmail.com wrote: This is a really short list. Thoughts on a few of the remaining items: Type Mismatch Error in Set Returning Functions - tgl says this is a deliberate change per link I just added to the wiki. do we think more is required

Re: [HACKERS] PG 9.0 release timetable

2010-05-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Thoughts on a few of the remaining items: Should we revert the default output format for bytea to the old style before shipping 9.0.0? - Consensus seems to be no, thus no action is required. I think we should leave that there for awhile, though I

Re: [HACKERS] small exclusion constraints patch

2010-05-29 Thread Robert Haas
On Fri, May 28, 2010 at 10:32 PM, Bruce Momjian br...@momjian.us wrote: Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: Currently, the check for exclusion constraints performs a sanity check that's slightly too strict -- it assumes that a tuple will conflict with itself. That is not

Re: [HACKERS] small exclusion constraints patch

2010-05-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: The only disadvantage I see of just documenting this is that someone might write a user-defined index opclass that works like this, and they won't be able to use this until at least 9.1 (or at least, not without patching the source). I don't actually

Re: [HACKERS] small exclusion constraints patch

2010-05-29 Thread David Fetter
On Sat, May 29, 2010 at 06:11:57PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: The only disadvantage I see of just documenting this is that someone might write a user-defined index opclass that works like this, and they won't be able to use this until at least 9.1 (or

Re: [HACKERS] Regression testing for psql

2010-05-29 Thread Selena Deckelmann
On Wed, May 26, 2010 at 6:25 PM, Stephen Frost sfr...@snowman.net wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: There might be some value in psql backslash command tests that are designed to depend on just one or a few tables (or other appropriate objects). Updated, much much smaller, patch

Re: [HACKERS] pg_trgm

2010-05-29 Thread Tatsuo Ishii
This is still ignoring the point: arbitrarily changing the module's longstanding standard behavior isn't acceptable. You need to provide a way for the user to control the behavior. (Once you've done that, I think it can be just either alnum or !isspace, but maybe some other behaviors would

Re: [HACKERS] pg_trgm

2010-05-29 Thread Tatsuo Ishii
Wait. This works fine for me with stock pg_trgm. local is C and encoding is UTF8. What version of PostgreSQL are you using? Mine is 8.4.4. This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If you run this in 8.4, you're just comparing a sequence of ASCII letters and

Re: [HACKERS] Specification for Trusted PLs?

2010-05-29 Thread Bruce Momjian
Robert Haas wrote: On Sat, May 22, 2010 at 4:53 PM, C?dric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/5/21 Jan Wieck janwi...@yahoo.com: The original idea was that a trusted language does not allow an unprivileged user to gain access to any object or data, he does not have

Re: [HACKERS] [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

2010-05-29 Thread Bruce Momjian
Added to TODO: Consider a faster CRC32 algorithm * http://archives.postgresql.org/pgsql-hackers/2010-05/msg01112.php --- Andres Freund wrote: Hi, I started to analyze XLogInsert because it

Re: [HACKERS] small exclusion constraints patch

2010-05-29 Thread Robert Haas
On Sat, May 29, 2010 at 6:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: The only disadvantage I see of just documenting this is that someone might write a user-defined index opclass that works like this, and they won't be able to use this until at least

[HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

2010-05-29 Thread Greg Stark
This sounds familiar. If you search back in the archives around 2004 or so I think you'll find a similar discussion when we replaced the crc32 implementation with what we have now. We put a fair amount of effort into searching for faster implementations so if you've found one 3x faster I'm pretty

[HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

2010-05-29 Thread Greg Stark
On Sun, May 30, 2010 at 3:56 AM, Greg Stark gsst...@mit.edu wrote: This sounds familiar. If you search back in the archives around 2004 or so I think you'll find a similar discussion when we replaced the crc32 implementation with what we have now. Fwiw here's the thread (from 2005):

Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-29 Thread Fujii Masao
On Fri, May 28, 2010 at 11:12 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, May 27, 2010 at 11:13 PM, Robert Haas robertmh...@gmail.com wrote: I guess this happens because the frequency of checkpoint on the standby is too lower than that on the master. In the master, checkpoint occurs

Re: [HACKERS] PG 9.0 release timetable

2010-05-29 Thread Robert Haas
On Sat, May 29, 2010 at 5:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Thoughts on a few of the remaining items: Should we revert the default output format for bytea to the old style before shipping 9.0.0? - Consensus seems to be no, thus no action is

[HACKERS] Is there anyway to get list of table name, before raw parser is analyze?

2010-05-29 Thread Mohammad Heykal Abdillah
Hi all, Right now i am trying to understand how SQL parser is work. My question is there anyway to get list of table name and its atribut before raw parser is analyze? Because i like to understand how PostgreSQL break-down the asterik at target list, specialy in natural join case where

Re: [HACKERS] small exclusion constraints patch

2010-05-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Sat, May 29, 2010 at 6:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Or, to put it differently: if nobody's done that in the past twenty years, why is it likely to happen before 9.1? Hmm. Well suppose we bet a dollar on whether that will happen or

Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

2010-05-29 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: On Sun, May 30, 2010 at 3:56 AM, Greg Stark gsst...@mit.edu wrote: This sounds familiar. If you search back in the archives around 2004 or so I think you'll find a similar discussion when we replaced the crc32 implementation with what we have now. Fwiw

Re: [HACKERS] small exclusion constraints patch

2010-05-29 Thread Robert Haas
On Sat, May 29, 2010 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, May 29, 2010 at 6:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Or, to put it differently: if nobody's done that in the past twenty years, why is it likely to happen before 9.1?

Re: [HACKERS] Is there anyway to get list of table name, before raw parser is analyze?

2010-05-29 Thread Andrew Dunstan
Mohammad Heykal Abdillah wrote: Hi all, Right now i am trying to understand how SQL parser is work. My question is there anyway to get list of table name and its atribut before raw parser is analyze? Because i like to understand how PostgreSQL break-down the asterik at target list,

Re: [HACKERS] Is there anyway to get list of table name, before raw parser is analyze?

2010-05-29 Thread Mohammad Heykal Abdillah
On Min, 2010-05-30 at 00:44 -0400, Andrew Dunstan wrote: Mohammad Heykal Abdillah wrote: Hi all, Right now i am trying to understand how SQL parser is work. My question is there anyway to get list of table name and its atribut before raw parser is analyze? Because i like to