[CC'ed to perf-discuss]

Gary Mills wrote:
We have an IMAP server with ZFS for mailbox storage that has recently
become extremely slow on most weekday mornings and afternoons.  When
one of these incidents happens, the number of processes increases, the
load average increases, but ZFS I/O bandwidth decreases.  Users notice
very slow response to IMAP requests.  On the server, even `ps' becomes
slow.

We've tried a number of things, each of which made an improvement, but
the problem still occurs.  The ZFS ARC size was about 10 GB, but was
shrinking to 1 GB when the server was busy.  In fact, the server was
unusable when that happened.  Upgrading memory from 16 GB to 64 GB
certainly made a difference.  The ARC size is always over 30 GB now.
Next, we limited the number of `lmtpd' (local delivery) processes to
64.  With those two changes, the server still became very slow at busy
times, but no longer became unresponsive.  The final change was to
disable ZFS prefetch.  It's not clear whether this made an improvement.

If memory is being stolen from the ARC, then the consumer must be outside
of ZFS.  I think this is a case for a traditional performance assessment.
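
As a starting point for that assessment, here is a minimal sketch that
tracks ARC size against its target.  It assumes the text comes from
something like `kstat -p zfs:0:arcstats', which prints one
"module:instance:name:statistic<TAB>value" pair per line.  The sample
values are invented for illustration, and the helper names
(parse_kstat, arc_report) are hypothetical.

```python
def parse_kstat(text):
    """Parse `kstat -p` style lines into a {statistic: int} dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        stat = key.rsplit(":", 1)[-1]  # keep only the statistic name
        try:
            stats[stat] = int(value)
        except ValueError:
            continue  # skip non-integer entries such as class or snaptime
    return stats


def arc_report(stats):
    """Convert the ARC size, target (c) and ceiling (c_max) to GiB."""
    gib = 1024 ** 3
    return {
        "size_gib": stats["size"] / gib,
        "target_gib": stats["c"] / gib,
        "max_gib": stats["c_max"] / gib,
    }


# Invented sample: ARC squeezed down to 1 GiB, target 2 GiB, ceiling 32 GiB.
SAMPLE = (
    "zfs:0:arcstats:c\t2147483648\n"
    "zfs:0:arcstats:c_max\t34359738368\n"
    "zfs:0:arcstats:class\tmisc\n"
    "zfs:0:arcstats:size\t1073741824\n"
    "zfs:0:arcstats:snaptime\t123.45\n"
)

if __name__ == "__main__":
    for name, value in arc_report(parse_kstat(SAMPLE)).items():
        print(f"{name}: {value:.1f}")
```

Sampling this every few seconds through a slow period would show
whether the ARC target itself is being pushed down, which would point
at memory pressure from outside ZFS.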

The server is a T2000 running Solaris 10.  It's a Cyrus murder back-
end, essentially only an IMAP server.  We did recently upgrade the
front-end, from a 4-CPU SPARC box to a 16-core Intel box with more
memory, also running Solaris 10.  The front-end runs sendmail and
proxies IMAP and POP connections to the back-end, and also forwards
SMTP for local deliveries to the back-end, using LMTP.

Cyrus runs thousands of `imapd' processes, along with many `pop3d' and
`lmtpd' processes.  This should be an ideal workload for a
Niagara box.  All of these memory-map several moderate-sized
databases, both Berkeley DB and skiplist types, and occasionally
update those databases.  Our EMC NetWorker client also often runs
during the day, doing backups.  All of the IMAP mailboxes reside on
six ZFS filesystems, using a single 2-TB pool.  It's only 51% occupied
at the moment.

Many other layers are involved in this server.  We use scsi_vhci for
redundant I/O paths and Sun's iSCSI initiator to connect to the
storage on our NetApp filer.  The kernel plays a part as well.  How
do we determine which layer is responsible for the slow performance?


prstat is your friend.  Find out who is consuming the resources and work
from there.
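
With thousands of imapd processes, a raw prstat listing is hard to
read, so it can help to total it by program name.  The sketch below
assumes the default prstat column layout (PID USERNAME SIZE RSS STATE
PRI NICE TIME CPU PROCESS/NLWP) and works on captured text; the sample
rows are invented and the helper names are hypothetical.

```python
from collections import defaultdict

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(field):
    """Convert a prstat size field such as '12M' or '8192K' to bytes."""
    unit = field[-1].upper()
    if unit in UNITS:
        return int(float(field[:-1]) * UNITS[unit])
    return int(field)


def summarize(text):
    """Total process count, CPU percent and RSS per program name."""
    totals = defaultdict(lambda: {"procs": 0, "cpu": 0.0, "rss": 0})
    for line in text.splitlines():
        f = line.split()
        # Process rows have ten fields and start with a numeric PID;
        # this skips the header and the trailing "Total:" line.
        if len(f) != 10 or not f[0].isdigit():
            continue
        name = f[9].rsplit("/", 1)[0]  # drop the "/NLWP" suffix
        entry = totals[name]
        entry["procs"] += 1
        entry["cpu"] += float(f[8].rstrip("%"))
        entry["rss"] += to_bytes(f[3])
    return dict(totals)


# Invented sample in the assumed default prstat layout.
SAMPLE = """\
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  1201 cyrus      45M   12M sleep   59    0   0:00:01 1.5% imapd/1
  1202 cyrus      47M   20M cpu3    49    0   0:00:04 2.5% imapd/1
  1310 cyrus      30M 8192K sleep   59    0   0:00:00 0.5% lmtpd/1
  2044 root       60M   32M sleep   59    0   0:01:10 4.0% save/1
Total: 4 processes, 4 lwps, load averages: 1.00, 1.00, 1.00
"""

if __name__ == "__main__":
    rows = sorted(summarize(SAMPLE).items(), key=lambda kv: -kv[1]["rss"])
    for name, t in rows:
        print(f"{name:8s} procs={t['procs']} cpu={t['cpu']:.1f}% rss={t['rss']}")
```

Note that summed RSS counts shared and memory-mapped pages once per
process, so the totals overstate real memory use; they are only useful
for ranking consumers.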

I've found that it often makes sense to create processor sets and segregate
dissimilar apps into different processor sets. mpstat can then clearly show
how each processor set consumes its processors.  IMAP workloads can
be very tricky, both because of the sort of I/O they generate and because
IMAP allows searching to be done on the server rather than on the client
(as with POP, for example).
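
For the processor-set approach, once the sets exist (psrset creates
them and binds processes to them), the per-set view can be reduced to
one busy figure per set.  This is only a sketch: it assumes a
per-set mpstat report whose header includes SET, usr and sys columns,
finds those columns by name rather than position, and runs on an
invented sample.  busy_by_set is a hypothetical helper.

```python
def busy_by_set(text):
    """Return {set_id: usr + sys percent} from a per-set mpstat report."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    col = {name.lower(): i for i, name in enumerate(header)}
    busy = {}
    for row in rows[1:]:
        # Skip repeated headers and any malformed lines.
        if row == header or len(row) != len(header):
            continue
        set_id = int(row[col["set"]])
        busy[set_id] = int(row[col["usr"]]) + int(row[col["sys"]])
    return busy


# Invented sample: set 0 holds the mail daemons, set 1 the backup client.
SAMPLE = """\
SET minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
  0   10   0  200   400  100  900   20   50   30    0  5000   60  30   0  10  24
  1    5   0   50   100   20  200    5   10    5    0   800    5   5   0  90   8
"""

if __name__ == "__main__":
    for set_id, pct in sorted(busy_by_set(SAMPLE).items()):
        print(f"set {set_id}: {pct}% busy")
```

If one set is saturated while another sits idle, that narrows down
which application is short of CPU.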
-- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
