On Thu, Feb 12, 2009 at 04:13:45PM +0200, Kenneth Kalmer wrote:
> How would couch fare as a backend for a mail delivery system (in concept)?
> Considering you need high availability and very fast IO. Documents (email
> messages) will be created and deleted very often, some almost
> instantaneously.
I was thinking about this too, and I think it would perform well. You could, for example, store the headers and IMAP flags in the JSON document, and the body as an attachment (or as multiple attachments, already broken down into MIME parts).

I can see a few things that need working through.

(1) One of the key performance issues with real-world SMTP servers is the need to fsync() the file to disk before sending back an acknowledgement, in order to guarantee delivery. I asked about this recently, and it seems that Couch takes an optimistic view: it writes the file to the OS but doesn't fsync unless you explicitly ask it to. There is a special HTTP header you can provide for this. If you don't, your database won't be corrupted if the plug is pulled, but it may be missing recently written data.

Of course, SMTP clients don't mind a small delay before they get their 250 OK at the end of the message. You can therefore write messages in batches and do a single fsync() every second or so, as long as you remember not to send the acknowledgement back to each client until *after* that fsync has completed.

(2) Couch won't let you write a document to disk in chunks; if it doesn't get an up-front Content-Length: header, it will buffer the whole thing in RAM. So if you receive very large E-mail messages, you may wish to buffer them locally (e.g. in a tempfile on another disk) before sending each one as a single document to Couch. Note that you sometimes get an indication of the message size in an SMTP transaction, but it's not guaranteed to be accurate, so you won't know the true size until you've read the whole message in.

(3) As you say, messages are stored and deleted frequently. You may end up having to compact your message store often, which basically means reading the whole store from start to end and rewriting it to a new file. This has to be done when the write load isn't too high, because compaction has to catch up with any writes that arrive while it is running, and it will only complete if it can do so.

Regards,
Brian.
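The suggested layout, with headers and IMAP flags in the JSON and the body as an attachment, might look roughly like the minimal sketch below. It assumes CouchDB's inline-attachment format (an `_attachments` object whose entries carry `content_type` and base64 `data`). The field names `type`, `headers`, `flags` and the attachment name `body` are hypothetical choices, not anything Couch requires. Headers are kept as a list of pairs because mail headers such as Received: can repeat, and a plain JSON object would lose the duplicates.

```python
import base64
import json


def build_message_doc(headers, flags, body, content_type="text/plain"):
    """Build a CouchDB document: headers and IMAP flags as JSON, body as an attachment.

    headers is a list of (name, value) pairs so repeated headers keep their order.
    """
    return {
        "type": "message",  # hypothetical field, handy for filtering in views
        "headers": [[name, value] for name, value in headers],
        "flags": sorted(flags),  # IMAP flags, e.g. "\\Seen"
        "_attachments": {
            "body": {  # hypothetical attachment name
                "content_type": content_type,
                # Inline attachments travel base64-encoded inside the JSON body.
                "data": base64.b64encode(body).decode("ascii"),
            }
        },
    }


doc = build_message_doc(
    [("From", "alice@example.com"), ("Subject", "Hi")], {"\\Seen"}, b"Hello, Bob.\r\n"
)
payload = json.dumps(doc)  # the JSON you would PUT to /<db>/<docid>
```

A MIME-split variant would simply add one `_attachments` entry per part.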
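The batching idea in (1) is a form of group commit. Writes happen immediately, and many writes share one durability sync. Each client's 250 OK is released only after the sync that covers its message. The sketch below is hedged and keeps the storage side abstract. `store` and `sync` are caller-supplied callables. In a Couch-backed server, `store` might be a normal document PUT, and `sync` might be a request carrying the full-commit header, or a POST to `_ensure_full_commit` in CouchDB versions that offer it. Both are assumptions to check against your Couch version. The class name and the one-second default are illustrative.

```python
import time


class GroupCommitter:
    """Batch writes and acknowledge them only after one shared durability sync."""

    def __init__(self, store, sync, interval=1.0, clock=time.monotonic):
        self.store = store          # writes a message, not necessarily durably
        self.sync = sync            # makes everything written so far durable
        self.interval = interval    # seconds between syncs
        self.clock = clock
        self.pending_acks = []      # callbacks waiting for the next sync
        self.last_sync = clock()

    def submit(self, msg, ack):
        self.store(msg)             # written, but not yet guaranteed to survive
        self.pending_acks.append(ack)  # do NOT send "250 OK" yet

    def maybe_flush(self):
        """Call periodically, e.g. from the server's event loop or a timer."""
        if self.pending_acks and self.clock() - self.last_sync >= self.interval:
            self.flush()

    def flush(self):
        self.sync()                 # one sync covers the whole batch; if it
                                    # raises, the acks stay pending
        acks, self.pending_acks = self.pending_acks, []
        self.last_sync = self.clock()
        for ack in acks:
            ack()                   # only now is each client told 250 OK
```

Making `clock` injectable lets the batching be tested deterministically. In a real server, `maybe_flush` would be driven by a timer, so the worst-case extra latency per message is about one `interval`.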
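Point (2) can be handled by spooling the incoming message first, learning its exact size, and then streaming it to Couch with an explicit Content-Length. The minimal sketch below uses the standard library's `tempfile.SpooledTemporaryFile`. It keeps small messages in memory and rolls larger ones to disk. The `dir` argument is where "a tempfile on another disk" would go. The upload uses `http.client`'s low-level `putrequest`/`putheader`/`endheaders`/`send` calls so that the header is set explicitly. The target `path` is left to the caller. A standalone-attachment URL such as `/mail/<docid>/body?rev=<rev>` is one hypothetical choice, and the function names and size thresholds are illustrative.

```python
import http.client
import shutil
import tempfile

CHUNK = 64 * 1024


def spool_message(stream, spool_dir=None, max_in_memory=1024 * 1024):
    """Copy an incoming message into a temp file and return (file, size).

    The true size is only known once the whole message has been read.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory, dir=spool_dir)
    shutil.copyfileobj(stream, spool, CHUNK)  # rolls over to disk past max_in_memory
    size = spool.tell()
    spool.seek(0)
    return spool, size


def put_with_length(host, port, path, spool, size, content_type="message/rfc822"):
    """Stream the spooled message to the server with an exact Content-Length."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.putrequest("PUT", path)
        conn.putheader("Content-Type", content_type)
        conn.putheader("Content-Length", str(size))
        conn.endheaders()
        while chunk := spool.read(CHUNK):
            conn.send(chunk)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Because the size comes from the bytes actually received, it does not matter if the size hinted at during the SMTP transaction was wrong.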
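For (3), compaction in CouchDB is started with a POST to `/<db>/_compact`. It runs in the background, so the request returns before the rewrite is finished. The sketch below only builds and sends that request. Deciding *when* write load is low enough (off-peak cron, a load check) is left to the caller. Sending `Content-Type: application/json` and needing admin credentials reflect later CouchDB releases and are assumptions for older ones. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def compaction_request(base_url, db):
    """Build the request that asks CouchDB to compact one database."""
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(db, safe='')}/_compact"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def compact(base_url, db):
    """Trigger compaction; the server returns before compaction has finished."""
    with urllib.request.urlopen(compaction_request(base_url, db)) as resp:
        return resp.status
```

The database name is percent-encoded because Couch allows characters such as `/` in database names.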
