> What we're seeing is that our network and RAID 5 IDE-based disk array on
> our central mail store server is not able to keep up with the 'client'
> servers doing the POP3, IMAP, Webmail, and SMTP legwork.
I've found an interesting bottleneck with webmail. When people use POP or IMAP clients (Outlook, Mozilla, Opera, Thunderbird, etc.), the client application caches a lot of the information locally and synchronizes occasionally with the server to see if there are new messages. Things like browsing and searching run really fast because the user is utilizing the resources of their local PC to do most of the work.

With webmail, the session state is neither saved nor cached, so the mailbox can be rescanned on each new request. A relatively modest webmail application might only rescan all headers and show subject lines. A complex application might scan all content in a folder to present content more fully. Without anything to throttle back the webmail server, it's possible that the webmail server software can pound the mail spool server to death.

I used to run a Qmail-based infrastructure for 4000 clients on a single slow machine without much memory. They used POP as their only pickup mechanism. We recently reimplemented on a Dell 1750 with two Xeon procs, a lot of RAM and a GigE backend to a NetApp filer with 14 fast disks, and I STILL notice that the machine sometimes slows down while people try to read their 140MB mailboxes via webmail. <sigh>

I put some bottlenecks on the "search" and retrieval algorithms of the webmail software to help protect the filer from a flood of queries, and we've been better since then. The power users with super-large mailboxes complain that it's "slow", but now it's a localized problem rather than a problem that affects everyone.

Jeremy's comments are great for scaling the database, but it sounds to me that you're just maxed out on what you can serve over NFS. An SQL select might take at most a few kilobytes of data on the network whereas a webmail scan of a 30MB mailbox will take, well, 30MB. Doh!

So.... what to do?
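One stopgap is the kind of throttle mentioned above: cap how many mailbox scans can hit the spool at the same time, so a few huge mailboxes slow down only their owners. What follows is a minimal Python sketch of that idea, not the code actually used in the webmail software. The names `ScanThrottle` and `scan_headers`, the limits, and the example path are all hypothetical.

```python
import threading


class ScanThrottle:
    """Cap concurrent mailbox scans against a shared spool (hypothetical helper)."""

    def __init__(self, max_concurrent, wait_seconds):
        # Each slot represents one scan allowed to read from the spool.
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._wait_seconds = wait_seconds

    def run(self, scan, *args):
        # Wait briefly for a free slot; give up rather than pile onto the filer.
        if not self._slots.acquire(timeout=self._wait_seconds):
            raise TimeoutError("mail store busy, try again shortly")
        try:
            return scan(*args)
        finally:
            # Always free the slot, even if the scan itself fails.
            self._slots.release()


def scan_headers(mailbox_path):
    # Stand-in for the expensive read over NFS (assumption for illustration).
    return f"scanned {mailbox_path}"


# Illustrative limits only; real values would come from watching the filer.
throttle = ScanThrottle(max_concurrent=4, wait_seconds=2.0)
print(throttle.run(scan_headers, "/home/vpopmail/example"))
```

The trade-off is the one described above: a user with a very large mailbox may wait or get a "busy" message, but the spool server stays responsive for everyone else.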
Instead of the centralized NFS mail spool (where the central spool becomes the bottleneck), you might consider splitting the user base across several machines. Each machine would have its own RAID1 mail spool. Each machine would be responsible for its own inbound SMTP and POP/IMAP/Webmail and use the local disk for the spool. Use lots of RAM for "buffer cache" to make sure your disk is hit less frequently. You might be able to centralize outbound SMTP. Once a machine "fills up", you add another machine. This is one way to scale.

The big boys in the mailbox size wars (Google, Yahoo, Hotmail) can't afford centralized storage for their mailboxes. Look for each to roll out racks of distributed storage where each "storage server" is a 1/2 U box with a couple of large ATA disks in it. We might learn from this method of scaling.

> Before we take this costly step, what have you noticed for user / system
> loads before you start hitting the limits of your hardware?

Yes. I serve 6000 users right now. They used to all be POP, and life was good. Now a significant percentage of my new customers use webmail, and I'm not happy with how my current web-based mail reading software scales. I may have to hack it a lot to get it to perform well.

Something that would help is if we rolled out spam/virus filtering for everyone, which would cut 50% of inbound mail and 10% viruses from being processed/stored/read and reread/reread/reread.

BTW: I separate SMTP processing (/var/qmail local RAID1 fast SCSI with battery cache) from user mail spool storage (/home/vpopmail NFS mount to filer). Putting /var/qmail on the NFS server might be another source of overload.

--
Eric Ziegast