Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-03 Thread Oleg Bartunov
;-) Greg Williamson DBA GlobeXplorer LLC -Original Message- From: [EMAIL PROTECTED] on behalf of Joshua D. Drake Sent: Tue 10/31/2006 7:46 PM To: Teodor Sigaev Cc: Darcy Buskermolen; PgSQL General; PostgreSQL-development Subject:Re: [HACKERS] [GENERAL] Index greater

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-03 Thread Joshua D. Drake
Oleg Bartunov wrote: Gregory, All you described you do with fti is possible with tsearch2. Just need some think, of course. If you don't need stemming, just don't use it, if you need something like %txt%, just write simple dictionary, which produce any substrings from input word. Is there

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-03 Thread Oleg Bartunov
On Fri, 3 Nov 2006, Joshua D. Drake wrote: Oleg Bartunov wrote: Gregory, All you described you do with fti is possible with tsearch2. Just need some think, of course. If you don't need stemming, just don't use it, if you need something like %txt%, just write simple dictionary, which produce

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-02 Thread tomas
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Wed, Nov 01, 2006 at 07:16:37PM -0300, Alvaro Herrera wrote: [EMAIL PROTECTED] wrote: [...] a functional trigram index? (this would be very cool). Heh :-) I meant an index, using the pg_trgm opclass (which indexes trigrams; hence the

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Teodor Sigaev
We are trying to get something faster than ~ '%foo%'; Which Tsearch2 does not give us :) Hmm, why not with GIN? :) Although, right now GIN doesn't support prefix search, it's possible to use it. Brain storm method: Develop a dictionary which returns all substring for lexeme, for example for

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Joshua D. Drake
Teodor Sigaev wrote: We are trying to get something faster than ~ '%foo%'; Which Tsearch2 does not give us :) Hmm, why not with GIN? :) Although, right now GIN doesn't support prefix search, it's possible to use it. Well they run 8.1 :) Joshua D. Drake Brain storm method: Develop a

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread tomas
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, Oct 31, 2006 at 08:55:04PM -0800, Joshua D. Drake wrote: We are not storing bytea [...] [...] Hmm, have you tried to create a functional trigram index on the equivalent of strings(bytea_column) or something like that? Hrm. Sorry for

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Gregory Maxwell
On 11/1/06, Teodor Sigaev [EMAIL PROTECTED] wrote: [snip] Brain storm method: Develop a dictionary which returns all substring for lexeme, for example for word foobar it will be 'foobar fooba foob foo fo oobar ooba oob oo obar oba ob bar ba ar'. And make GIN functional index over your column

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: On Tue, Oct 31, 2006 at 08:55:04PM -0800, Joshua D. Drake wrote: We are not storing bytea [...] [...] Hmm, have you tried to create a functional trigram index on the equivalent of strings(bytea_column) or something like that? Hrm. Sorry for my impolite

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Oleg Bartunov
On Tue, 31 Oct 2006, Joshua D. Drake wrote: Yes we do (and can) expect to find text among the bytes. We have searches running, we are just running into the maximum size issues for certain rows. you can use substr() to be safe, if schema change doesn't available Sincerely, Joshua D.

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Teodor Sigaev
The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792 bytes, maximum size is 8191' Uh, I see. But I'm really surprised why do you use pg_trgm on big text? pg_trgm is

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
Teodor Sigaev wrote: The problem I am after is the 8k index size issue. It is very easy to get a GIST index (especially when using tsearch2) that is larger than that. Hmm, tsearch2 GIST index is specially designed for support huge index entry: first, every lexemes in tsvectore are

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Darcy Buskermolen
On October 31, 2006 06:42 am, Joshua D. Drake wrote: Teodor Sigaev wrote: The problem I am after is the 8k index size issue. It is very easy to get a GIST index (especially when using tsearch2) that is larger than that. The problem as I remember it is pg_tgrm not tsearch2 directly, I've

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Darcy Buskermolen
On October 31, 2006 08:53 am, Teodor Sigaev wrote: The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792 bytes, maximum size is 8191' Uh, I see. But I'm really

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Alvaro Herrera
Darcy Buskermolen wrote: On October 31, 2006 08:53 am, Teodor Sigaev wrote: The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792 bytes, maximum size is 8191'

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
Teodor Sigaev wrote: The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792 bytes, maximum size is 8191' Uh, I see. But I'm really surprised why do you use pg_trgm on big

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
Alvaro Herrera wrote: Darcy Buskermolen wrote: On October 31, 2006 08:53 am, Teodor Sigaev wrote: The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792 bytes, maximum

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Gregory S. Williamson
DBA GlobeXplorer LLC -Original Message- From: [EMAIL PROTECTED] on behalf of Joshua D. Drake Sent: Tue 10/31/2006 7:46 PM To: Teodor Sigaev Cc: Darcy Buskermolen; PgSQL General; PostgreSQL-development Subject:Re: [HACKERS] [GENERAL] Index greater than 8k Teodor Sigaev

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Alvaro Herrera
Joshua D. Drake wrote: Alvaro Herrera wrote: Darcy Buskermolen wrote: On October 31, 2006 08:53 am, Teodor Sigaev wrote: The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
We are not storing bytea, a customer is. We are trying to work around customer requirements. The data that is being stored is not always text, sometimes it is binary (a flash file or jpeg). We are using escaped text to be able to search the string contents of that file . Hmm, have you

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Do you mean you actually find stuff based on text attributes in JPEG images and the like? I thought those were compressed ... Typically not --- the design assumption is that the text size wouldn't amount to anything anyway compared to the image data,