On Sat, Jun 10, 2006 at 01:23:35PM -0600, qqqq wrote:
> >I would defer to the smart people to figure out the details. However I do 
> >wonder if the actual body content of the message would be best stored in a 
> >file and the SQL used to store anything and everything you would want to 
> >index. That would keep the SQL file size down if that's an issue. However, 
> >SQL databases might have to be changed to accommodate the needs to store 
> >email.
> 
> I think this is what I was getting at early in the thread.  I would think 
> that a 5 MB body would do better on file but I don't know enough in regards 
> to DBs to even make a call.

A good rule of thumb about storing something in the database is: are you
going to search that data? If you're going to search the text of an
email body, that makes it a more likely candidate for storing it in the
database (though there are ways to do this searching while storing the
file externally).
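
As a rough illustration of searching while the body stays in an
external file, here is a minimal Python sketch using the standard
sqlite3 module. The table name body_words, the functions index_body and
find_messages, and the example paths are all made up for illustration;
a real setup would more likely lean on whatever full-text indexing its
database provides.

```python
import re
import sqlite3

# Hypothetical schema: one row per (word, message file) pair.
INDEX_SCHEMA = """
CREATE TABLE body_words (
    word     TEXT NOT NULL,
    msg_path TEXT NOT NULL,
    PRIMARY KEY (word, msg_path)
)
"""


def index_body(conn, msg_path, body_text):
    """Record which words occur in a body that lives in an external file."""
    words = set(re.findall(r"[a-z0-9]+", body_text.lower()))
    conn.executemany(
        "INSERT OR IGNORE INTO body_words (word, msg_path) VALUES (?, ?)",
        [(w, msg_path) for w in words],
    )
    conn.commit()


def find_messages(conn, word):
    """Return the paths of messages whose body contains the given word."""
    rows = conn.execute(
        "SELECT msg_path FROM body_words WHERE word = ? ORDER BY msg_path",
        (word.lower(),),
    )
    return [r[0] for r in rows]
```

Only the word index lives in the database here; the bodies themselves
are read from disk once a search has narrowed down which files matter.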

Another consideration is that storing everything in the database is
substantially easier than splitting between a database and the
filesystem. If you think this is a non-issue, consider how to deal with
all the error conditions where either the database or the filesystem is
updated, but not both.
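
To make that concrete, here is a minimal Python sketch of one way to
order the two writes, again with the standard sqlite3 module. The
messages table, the store_message function and the "<id>.eml" naming
are assumptions made for the example, not a description of any
particular mail system.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: headers in SQL, body in a file named by path.
MESSAGES_SCHEMA = """
CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    sender  TEXT NOT NULL,
    subject TEXT NOT NULL,
    path    TEXT NOT NULL
)
"""


def store_message(conn, spool_dir, sender, subject, body):
    """Store body bytes as a file and the headers as a row, or neither."""
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, prefix="tmp.")
    final_path = None
    try:
        # 1. Get the body safely onto disk under a temporary name.
        with os.fdopen(fd, "wb") as f:
            f.write(body)
            f.flush()
            os.fsync(f.fileno())
        # 2. Insert the row; sqlite3 opens a transaction implicitly.
        cur = conn.execute(
            "INSERT INTO messages (sender, subject, path) VALUES (?, ?, '')",
            (sender, subject),
        )
        msg_id = cur.lastrowid
        final_path = os.path.join(spool_dir, "%d.eml" % msg_id)
        conn.execute(
            "UPDATE messages SET path = ? WHERE id = ?", (final_path, msg_id)
        )
        # 3. Give the file its final name, then commit the row.
        os.rename(tmp_path, final_path)
        conn.commit()
        return msg_id
    except Exception:
        # Undo both sides as far as we can, then report the failure.
        conn.rollback()
        for p in (tmp_path, final_path):
            if p and os.path.exists(p):
                os.remove(p)
        raise
```

Even this is not airtight: a crash between the rename and the commit
leaves a file with no row, so something still has to sweep for orphans
periodically. Keeping everything in the database avoids that whole
class of problem.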

Of course, storing anything in a database is going to have more
overhead than storing it as raw bytes on the filesystem, and there's
not really a way around that. Different databases will impose different
amounts of overhead.

As for all the arguments about how databases won't scale, or how they're
a single point of failure... what exactly do you think a single mail
server is? Answer: not scalable and a single point of failure. Of
course there are ways to work around that, and those methods apply just
as well to databases (though the implementation can be different). Most
databases support at least some form of replication, and many support
clustering. And of course you don't have to try and cram all your users
into a single database.
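
Splitting users across databases can be as simple as hashing the
address to pick one. A minimal Python sketch follows; the function
shard_for_user and the database names are hypothetical, and note that
plain modulo hashing reshuffles most users whenever the number of
databases changes.

```python
import hashlib


def shard_for_user(address, shard_names):
    """Pick a database for a mail address, deterministically."""
    # Lowercase first so the same mailbox always maps to the same shard.
    digest = hashlib.sha256(address.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shard_names)
    return shard_names[index]


# Usage: every lookup for this user goes to the same database.
shards = ["maildb0", "maildb1", "maildb2"]
chosen = shard_for_user("someone@example.org", shards)
```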

Having said all that, it's nearly impossible to get a general-purpose RDBMS
to outperform an optimized storage format (if you find an example where
it is possible, I'd wager that's only true because the original format
wasn't very well thought-out). It's all but certain that a given set
of hardware will be able to handle a higher load of storing and
retrieving emails using maildir rather than a database (unless you get
enough messages in a directory that it starts choking the filesystem).
But if you want to do something like search for specific emails, there's
a much better chance that a database will outperform maildir, especially
if you're searching the message body. And there are other potential
applications where a database would outperform maildir as well.

So, in a nutshell, if you're not going to try doing something more
advanced than just storing and retrieving email, it's unlikely that
you'll be happy with storing that email in a database. The further off
that 'beaten path' you get, the more likely you are to see benefit from
using a database.
-- 
Jim C. Nasby, Database Architect                [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
