Re: [GENERAL] bytea encode performance issues

2008-08-08 Thread Daniel Verite
Steve Atkins wrote: So, yeah, you're right. Generally, email is too complex to deal with in the database as anything other than an opaque bytea blob, along with some metadata Only because that's the choice made by dbmail. As an IMAP server, it doesn't _have_ to do more. The downs

Re: [GENERAL] bytea encode performance issues

2008-08-07 Thread Steve Atkins
On Aug 7, 2008, at 5:28 PM, Klint Gore wrote: Alvaro Herrera wrote: Merlin Moncure escribió: > er, I see the problem (single piece of text with multiple encodings > inside) :-). ok, it's more complicated than I thought. still, you > need to convert the email to utf8. There simply must be a

Re: [GENERAL] bytea encode performance issues

2008-08-07 Thread Klint Gore
Alvaro Herrera wrote: Merlin Moncure escribió: > er, I see the problem (single piece of text with multiple encodings > inside) :-). ok, it's more complicated than I thought. still, you > need to convert the email to utf8. There simply must be a way, > otherwise your emails are not well define

Re: [GENERAL] bytea encode performance issues

2008-08-07 Thread Sim Zacks
Merlin, You are suggesting a fight with the flexible dynamics of email by fitting it into a UTF shell - it doesn't always work. I would suggest you read the postgresql definition of SQL-ASCII: > The SQL_ASCII setting behaves considerably differently from the other > settings. When the server cha

Re: [GENERAL] bytea encode performance issues

2008-08-07 Thread Alvaro Herrera
Merlin Moncure escribió: > er, I see the problem (single piece of text with multiple encodings > inside) :-). ok, it's more complicated than I thought. still, you > need to convert the email to utf8. There simply must be a way, > otherwise your emails are not well defined. This is a client sid

Re: [GENERAL] bytea encode performance issues

2008-08-07 Thread Merlin Moncure
On Thu, Aug 7, 2008 at 9:38 AM, Merlin Moncure <[EMAIL PROTECTED]> wrote: > On Thu, Aug 7, 2008 at 1:16 AM, Sim Zacks <[EMAIL PROTECTED]> wrote: >> >>> I don't quite follow that...the whole point of utf8 encoded database >>> is so that you can use text functions and operators without the bytea >>>

Re: [GENERAL] bytea encode performance issues

2008-08-07 Thread Merlin Moncure
On Thu, Aug 7, 2008 at 1:16 AM, Sim Zacks <[EMAIL PROTECTED]> wrote: > >> I don't quite follow that...the whole point of utf8 encoded database >> is so that you can use text functions and operators without the bytea >> treatment. As long as your client encoding is set up properly (so >> that data

Re: [GENERAL] bytea encode performance issues

2008-08-06 Thread Sim Zacks
I ran the update, but now (obviously) it wants to vacuum again and vacuum on that table took 9 hours yesterday. Do the statistics change when changing the storage type? Meaning does it really need to vacuum? Thank you Sim Tom Lane wrote: > Sim Zacks <[EMAIL PROTECTED]> writes: >> After the alter

Re: [GENERAL] bytea encode performance issues

2008-08-06 Thread Sim Zacks
> I don't quite follow that...the whole point of utf8 encoded database > is so that you can use text functions and operators without the bytea > treatment. As long as your client encoding is set up properly (so > that data coming in and out is computed to utf8), then you should be > ok. Dropping

Re: [GENERAL] bytea encode performance issues

2008-08-06 Thread Merlin Moncure
On Wed, Aug 6, 2008 at 9:16 AM, Sim Zacks <[EMAIL PROTECTED]> wrote: > We are using UTF-8, and I am testing SQL-ASCII at the moment. DBMail is > a pre-built application, so until I am ready to start playing with its > internals I don't really have a choice about a number of its features. > The reas

Re: [GENERAL] bytea encode performance issues

2008-08-06 Thread Richard Huxton
Sim Zacks wrote: DBMail is a pre-built application, so until I am ready to start playing with its internals I don't really have a choice about a number of its features. Have you heard of this? Might be worth a quick look: http://www.archiveopteryx.org/overview -- Richard Huxton Archonet L

Re: [GENERAL] bytea encode performance issues

2008-08-06 Thread Tom Lane
Sim Zacks <[EMAIL PROTECTED]> writes: > After the alter table, then I have to update each row with an > update dbmail_messageblks set messageblk=messageblk; > so that it uses the new storage. I wouldn't actually bet on that changing anything at all ... I'd try something like messageblk = messagebl

Re: [GENERAL] bytea encode performance issues

2008-08-06 Thread Sim Zacks
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tom Lane wrote: > We've seen complaints about toast fetch time before. I don't think > there's any really simple solution. You could experiment with disabling > compression (SET STORAGE external) but I'd bet on that being a net loss > unless the dat

Re: [GENERAL] bytea encode performance issues

2008-08-05 Thread Tom Lane
Sim Zacks <[EMAIL PROTECTED]> writes: > Results below: >> ... but given that, I wonder whether the cost isn't from fetching >> the toasted messageblk data, and nothing directly to do with either >> the encode() call or the ~~ test. It would be interesting to compare >> the results of Okay, so sub

Re: [GENERAL] bytea encode performance issues

2008-08-04 Thread Sim Zacks
Results below: > ... but given that, I wonder whether the cost isn't from fetching > the toasted messageblk data, and nothing directly to do with either > the encode() call or the ~~ test. It would be interesting to compare > the results of > > explain analyze select encode(messageblk, 'escape')

Re: [GENERAL] bytea encode performance issues

2008-08-04 Thread Tom Lane
Sim Zacks <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Could we see EXPLAIN ANALYZE, not EXPLAIN? Without actual facts >> to work from, any suggestions would be mere guesswork. > " -> Seq Scan on dbmail_messageblks k > (cost=0.00..39193.21 rows=259 width=764) (actual time=30.662

Re: [GENERAL] bytea encode performance issues

2008-08-04 Thread Tomasz Ostrowski
On 2008-08-03 12:12, Sim Zacks wrote: > SELECT m.message_idnr,k.messageblk > FROM dbmail_messageblks k > JOIN dbmail_physmessage p ON k.physmessage_id = p.id > JOIN dbmail_messages m ON p.id = m.physmessage_id > WHERE > mailbox_idnr = 8 > AND status IN (0,1 ) > AND k.is_header = '0' > GROUP BY

Re: [GENERAL] bytea encode performance issues

2008-08-03 Thread Sim Zacks
Tom Lane wrote: > Could we see EXPLAIN ANALYZE, not EXPLAIN? Without actual facts > to work from, any suggestions would be mere guesswork. This was taken immediately after a vacuum analyze on the database. "HashAggregate (cost=41596.68..41596.84 rows=16 width=764) (actual time=488263.802..48826

Re: [GENERAL] bytea encode performance issues

2008-08-03 Thread Tom Lane
Sim Zacks <[EMAIL PROTECTED]> writes: > The explain of the query is: Could we see EXPLAIN ANALYZE, not EXPLAIN? Without actual facts to work from, any suggestions would be mere guesswork. Also, what can you tell us about the sizes of the messageblk strings (max and avg would be interesting)?

Re: [GENERAL] bytea encode performance issues

2008-08-03 Thread Sim Zacks
The LIKE operator is likely the problem, but it is a critical part of an email application. Searches are done by, "Show me all emails containing the following word." I've tried using TSearch2's full text index. It made the query 50% faster, taking 5 minutes. This is still not even close to the les

Re: [GENERAL] bytea encode performance issues

2008-08-03 Thread Alban Hertroys
On Aug 3, 2008, at 2:36 PM, Sim Zacks wrote: The LIKE operator is likely the problem, but it is a critical part of an email application. Searches are done by, "Show me all emails containing the following word." I've tried using TSearch2's full text index. It made the query 50% faster, takin

Re: [GENERAL] bytea encode performance issues

2008-08-03 Thread Alban Hertroys
On Aug 3, 2008, at 12:12 PM, Sim Zacks wrote: This is the query that is used (I know it is not as efficient as it could be, but this is the query it comes with): SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_message

[GENERAL] bytea encode performance issues

2008-08-03 Thread Sim Zacks
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I am using postgresql 8.2.7 on gentoo for my dbmail backend. I am also testing it on mysql 5. I am trying to figure out if I need to tune my database configuration or if querying a bytea field is just not practical in postgresql. Searching with the m