Benchmarking mbox versus maildir


Copyright 2001-2003 Sam Varshavchik. Excerpts from this document can be freely reproduced as long as a URL to this document is provided.

Last updated on: March 25, 2003

Introduction - Phase I

This paper presents the results of a series of benchmarks that compare the relative performance of mbox-based and maildir-based mail access in several different environments. The comparative advantages of each mail storage format are a recurring subject that comes up whenever someone asks which IMAP server is the best one for them to use.


mbox mail storage format

This is the traditional way to store mail on UNIX-based mail servers. Individual messages are simply concatenated together, and saved in a single file. A special marker is placed where one message ends and the next message begins. Only one process can access the mbox file in read/write mode. Concurrent access requires a locking mechanism. Anytime someone needs to update the mbox file, everyone else must wait for the update to complete.
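
As an illustration of the marker: the common mbox convention is that each message starts with a line beginning with "From " (a body line that starts the same way is stored as ">From "). Assuming that convention, here is a minimal sketch that counts the messages in an mbox file. The function name count_mbox_messages and the sample addresses are made up for this example.

```sh
#!/bin/sh

# Count the messages in an mbox file by counting "From " separator lines.
# Assumes body lines that start with "From " were quoted as ">From ".
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# Demonstration with a throwaway two-message mbox.
demo_mbox=$(mktemp)
cat > "$demo_mbox" <<'EOF'
From alice@example.com Tue Mar 25 10:00:00 2003
Subject: first

Hello.

From bob@example.com Tue Mar 25 10:05:00 2003
Subject: second

>From the body, quoted so that it is not mistaken for a separator.

EOF
count_mbox_messages "$demo_mbox"
rm -f "$demo_mbox"
```

Note that even this trivial operation has to read the whole file, which is the property that the benchmarks below keep running into.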

maildir mail storage format

Maildirs were originally implemented in the Qmail mail server, supposedly to address the inadequacies of mbox files. Individual messages are saved in separate files, one file per message. There is a defined method for naming each file. There's a defined procedure for adding new messages to the maildir. No locking is required. Multiple processes can use maildirs at the same time.
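
The procedure for adding a message can be sketched roughly as follows. This is a simplified illustration of the usual maildir convention (write the message under tmp/, then rename it into new/), not the code of any particular mail server. The function name maildir_deliver and the time.pid.hostname naming recipe are assumptions made for this example; real implementations use stronger unique names.

```sh
#!/bin/sh

# Deliver one message, read from stdin, into a maildir without any locking.
maildir_deliver() {
    maildir=$1
    mkdir -p "$maildir/tmp" "$maildir/new" "$maildir/cur"
    # Simplified unique name; two deliveries by one process in the same
    # second would collide, which a real implementation must prevent.
    name="$(date +%s).$$.$(uname -n)"
    # Write the whole message under tmp/ first, so that readers never
    # see a partially written file.
    cat > "$maildir/tmp/$name" || return 1
    # Moving it into new/ is a single rename within one filesystem.
    mv "$maildir/tmp/$name" "$maildir/new/$name"
}

# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
printf 'Subject: test\n\nHello.\n' | maildir_deliver "$demo_dir/Maildir"
ls "$demo_dir/Maildir/new"
rm -rf "$demo_dir"
```

Because a message only ever appears in new/ as a complete file, a reader scanning that directory does not need to coordinate with the process that is delivering mail.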

mbx mail storage format

This is a slightly modified version of the original mbox format that's offered by the UW-IMAP server. mbx mailboxes still require locking. The main difference from the mbox format is that each message in the file is preceded by a record that carries some message-specific metadata. As such, certain operations that used to require the entire mbox file to be rewritten can now be implemented by updating the fixed-size header record.

This benchmark focuses mainly on the mbox and maildir formats. In March of 2003 an unrelated party conducted a similar benchmark for the mbx format. See http://www.decisionsoft.com/pdw/mailbench.html for more details.


Documentation included with the University of Washington IMAP server (UW-IMAP) states that maildirs have many "performance disadvantages" and that the maildir format "doesn't scale." Furthermore, maildirs are supposedly vulnerable to "filesystem trashing" due to multiple "open() and stat()" calls, because "just about every filesystem in existence serializes" file creation and access[1]. The document concludes that this results in performance degradation for "moderately sized" mailboxes of about 2,000 messages.

Painting "just about" every filesystem in existence with the same brush, and assuming that every filesystem works pretty much in the same way, is very misleading. Many contemporary high-performance filesystems are designed explicitly for parallel access. For example, consider the SGI XFS filesystem:

The free space and inodes within each AG are managed independently and in parallel so multiple processes can allocate free space throughout the file system simultaneously.[2]

It took me about 6 months to write the first revision of the maildir-based Courier-IMAP server. The absence of maildir support in the UW-IMAP server is the reason I wrote it. Many people have found that it needed less memory, and was faster than UW-IMAP. Many people observed that upgrading to Courier-IMAP lowered their overall system load, and increased performance. Large mail clusters with a network-based, fault-tolerant, scalable architecture frequently have problems deploying mbox-based mailboxes, because file locking (which mbox-based mailboxes require) has many documented problems on network-based filesystems.[3] As referenced in [3], maildirs have no issues with NFS (the most common type of network-based filesystem) since maildirs do not use locking.

After looking around for some time, I did not find any independent benchmarks that directly measured the relative performance of mboxes and maildirs. Therefore I decided to run some actual benchmarks myself. I defined the test conditions according to UW-IMAP server's documentation. I created a test environment that stacked the deck in favor of mboxes. This was done in accordance with the claimed shortcomings of maildirs as stated in UW-IMAP server's documentation, in order to accurately measure the magnitude of the claimed problems.

Test environment - Phase I

For this benchmark, I used the UW-IMAP 2000 server, which uses mbox files, and the Courier-IMAP 1.3.6 server, which uses maildirs. Initially I created a mailbox with 100 messages, and ran the same benchmarking script for each server. I then reran the same script a second time, this time with 2,000 messages. This benchmarking script put each IMAP server through several tasks. Each task was profiled with the time command.

The benchmarks were initially run on very low-end, obsolete hardware, and then repeated on a more robust, modern server. This makes it possible to observe both kinds of scalability: larger mailboxes, and faster hardware.

Here's the script that generated the test data for the benchmarks:


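#!/bin/sh

# Mail 100 test messages to the current user.
n=0
while test $n -lt 100
do
    # 3k of random data, uuencoded so that the message body is plain text.
    dd if=/dev/urandom bs=3k count=1 | uuencode - | \
            mail -s "Test message $n" `whoami`

    n=`expr $n + 1`
done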

The mail server was configured to deliver mail either to /var/spool/mail or to $HOME/Maildir, for its respective IMAP server. This test script created 100 messages, each one approximately 4.5Kb in size. In the second half of the benchmark, the script was modified to create 2,000 messages.

After the mailbox was primed with dummy messages, the following script benchmarked each IMAP server:


#!/bin/sh

# For IMAP-2000:
#
# PATH=/usr/sbin:$PATH
# export PATH

# For Courier-IMAP 1.3.6:

# PATH=/usr/lib/courier-imap/bin:$PATH
# MAILDIR=$HOME/Maildir
# export PATH
# export MAILDIR

echo "=============="
echo "SELECT.1"
echo ""
time imapd <<EOF
001 SELECT INBOX
002 LOGOUT
EOF
echo "=============="
echo ""
echo "SELECT.2"
echo ""
time imapd <<EOF
001 SELECT INBOX
002 LOGOUT
EOF
echo ""
echo "=============="
echo ""
echo "DELETE.1"
echo ""
time imapd <<EOF
001 SELECT INBOX
002 STORE 50 +FLAGS.SILENT (\Deleted)
003 EXPUNGE
004 LOGOUT
EOF
echo ""
echo "=============="
echo ""
echo "FETCH.1"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 FETCH 1:* (BODYSTRUCTURE)
003 EXPUNGE
004 LOGOUT
EOF
echo ""
echo "=============="
echo ""
echo "FETCH.2"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 FETCH 1:* (BODYSTRUCTURE)
003 EXPUNGE
004 LOGOUT
EOF
echo "=============="
echo ""
echo "SEARCH.1"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 SEARCH 1:* TEXT "This text will not be found"
003 EXPUNGE
004 LOGOUT
EOF
echo "=============="
echo ""
echo "SEARCH.2"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 SEARCH 1:* TEXT "This text will not be found"
003 EXPUNGE
004 LOGOUT
EOF

Here's a brief explanation for those who are not familiar with the IMAP protocol syntax. These tests carried out the following tasks:

  1. SELECT.1 - open a mailbox with 100 or 2,000 new messages.
  2. SELECT.2 - open a mailbox with 100 or 2,000 messages that have already been seen.
  3. DELETE.1 - delete a message from the mailbox.
  4. FETCH.1 - retrieve the MIME structure of all messages in the mailbox.
  5. FETCH.2 - same command as FETCH.1
  6. SEARCH.1 - search all messages for a text string.
  7. SEARCH.2 - same command as SEARCH.1

Benchmark results on low-end hardware

Hardware:

This slow hardware was chosen to highlight any bottlenecks or performance problems that are inherent with maildirs. The raw results are given below. Analysis follows:


                  UW-IMAP 2000                      Courier-IMAP 1.3.6
Benchmark  Time   100 messages     2,000 messages   100 messages     2,000 messages
SELECT.1   real   0m1.552s         0m9.069s         0m0.313s         0m4.408s
           user   0m0.090s         0m2.120s         0m0.150s         0m0.160s
           sys    0m0.300s         0m2.440s         0m0.100s         0m4.210s
SELECT.2   real   0m0.208s         0m1.068s         0m0.030s         0m0.169s
           user   0m0.080s         0m0.630s         0m0.010s         0m0.150s
           sys    0m0.060s         0m0.380s         0m0.020s         0m0.020s
DELETE.1   real   0m0.710s         0m5.250s         0m0.040s         0m0.362s
           user   0m0.120s         0m1.190s         0m0.010s         0m0.330s
           sys    0m0.020s         0m1.510s         0m0.030s         0m0.030s
FETCH.1    real   0m0.455s         0m6.061s         0m0.728s         0m27.713s
           user   0m0.260s         0m5.060s         0m0.200s         0m3.250s
           sys    0m0.120s         0m0.890s         0m0.080s         0m1.860s
FETCH.2    real   0m0.455s         0m6.219s         0m0.246s         0m4.466s
           user   0m0.270s         0m5.220s         0m0.140s         0m2.930s
           sys    0m0.120s         0m0.890s         0m0.110s         0m1.500s
SEARCH.1   real   0m0.551s         0m7.935s         0m0.482s         0m9.251s
           user   0m0.410s         0m6.450s         0m0.350s         0m7.480s
           sys    0m0.080s         0m1.380s         0m0.140s         0m1.760s
SEARCH.2   real   0m0.553s         0m8.167s         0m0.484s         0m9.246s
           user   0m0.400s         0m6.920s         0m0.390s         0m7.300s
           sys    0m0.100s         0m1.140s         0m0.090s         0m1.870s

Analysis

The time command reports the following data:

user and sys, added together, can be interpreted as the total amount of CPU time the process took to execute. The difference between their sum and the amount of real time is the time the system spent waiting for a pending I/O operation to complete (or was busy with something else) before continuing to execute the process. user is the time spent executing the program's own code, while sys is the time the kernel spent executing on behalf of this process. Typical examples of the latter are the kernel code that opens or closes files, or reads and writes the contents of a file.
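
As a worked example of this arithmetic, here is a small sketch that subtracts the CPU time from the real time. The function name wait_time is made up for this example, and its arguments are plain seconds (27.713 rather than 0m27.713s). The sample figures are the Courier-IMAP FETCH.1 result with 2,000 messages on low-end hardware, taken from the raw results above.

```sh
#!/bin/sh

# Time not accounted for by CPU usage: real - (user + sys).
# All three arguments are given in seconds.
wait_time() {
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.3f\n", real - (user + sys) }'
}

# Courier-IMAP, FETCH.1, 2,000 messages, low-end hardware.
wait_time 27.713 3.250 1.860
```

For that run the result is about 22.6 seconds, which means the process spent most of its real time waiting rather than executing.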

SELECT.1

Here, the IMAP server opened a mailbox with a bunch of messages it never saw before. Even with 2,000 messages, maildirs are twice as fast as mboxes. Why? The IMAP server must assign unique message identifiers, UIDs, to each message. The UW-IMAP server saves UIDs in the mbox file, and must essentially read the entire mailbox, assign UIDs, and then save the UIDs in the mbox file. The Courier-IMAP server doesn't need to read the contents of each message. It only needs to rename each file in the maildir (note the high sys time). The Courier-IMAP server keeps track of UIDs separately.
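
The renaming step can be pictured with the following sketch. It follows the general maildir convention of moving each file from new/ to cur/ and appending an ":2," suffix with an empty flag list. It is not taken from Courier-IMAP's source, it leaves out the UID bookkeeping entirely, and the function name maildir_take_new is made up for this example.

```sh
#!/bin/sh

# Move every message from new/ to cur/, appending the ":2," info suffix
# (maildir info version 2, with no flags set yet).
maildir_take_new() {
    maildir=$1
    for f in "$maildir"/new/*; do
        [ -e "$f" ] || continue      # new/ was empty; the glob did not match
        mv "$f" "$maildir/cur/$(basename "$f"):2,"
    done
}

# Demonstration in a throwaway maildir holding two new messages.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/new" "$demo_dir/cur"
: > "$demo_dir/new/1048586400.100.host"
: > "$demo_dir/new/1048586401.101.host"
maildir_take_new "$demo_dir"
ls "$demo_dir/cur"
rm -rf "$demo_dir"
```

No message content is read here. The cost is one rename per message, which is work done by the kernel and is consistent with the high sys time noted above.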

SELECT.2

The same IMAP folder is reopened, and closed. The UW-IMAP server runs much faster this time, because it doesn't have to rewrite the mailbox, and the contents of the mailbox file are already cached in memory by the operating system (note that the process almost never waits for I/O - the sum of user and sys is almost the same as real). But Courier-IMAP is still faster.

According to the raw numbers, Courier-IMAP is about seven times faster than UW-IMAP, but this ratio should be considered as a mere approximation. The total execution time, in both cases, is very small, and the actual timings are less meaningful because of the granularity of the system clock. Other factors include the context switch time, and the behavior of the operating system process scheduler.

DELETE.1

UW-IMAP's performance noticeably deteriorates with a 2,000-message mailbox. This is because deleting a message also requires the entire mbox file to be rewritten. The UW-IMAP process spends half of its time waiting for pending I/O to complete.
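
To see why a single deletion touches the whole file, consider this minimal sketch of removing one message from an mbox. It is an illustration only, not UW-IMAP's implementation: it assumes the "From " separator convention, performs no locking, and the function name mbox_delete is made up for this example.

```sh
#!/bin/sh

# Remove message number N (counting from 1) by rewriting the whole mbox.
mbox_delete() {
    mbox=$1
    n=$2
    tmp=$(mktemp) || return 1
    # Every line is read, and every message except number N is written
    # back out, no matter how small the deleted message is.
    awk -v n="$n" '/^From / { msg++ } msg != n { print }' "$mbox" > "$tmp" &&
        cat "$tmp" > "$mbox"
    rm -f "$tmp"
}

# Demonstration: delete the second of three messages.
demo_mbox=$(mktemp)
for who in alice bob carol; do
    printf 'From %s@example.com Tue Mar 25 10:00:00 2003\n\nHello from %s.\n\n' \
        "$who" "$who"
done > "$demo_mbox"
mbox_delete "$demo_mbox" 2
grep -c '^From ' "$demo_mbox"
rm -f "$demo_mbox"
```

The amount of I/O grows with the size of the mailbox rather than with the size of the deleted message.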

Courier-IMAP doesn't need to do much I/O here. It only needs to rename and then delete a single file from the maildir.

FETCH.1

Courier-IMAP's execution time degrades drastically in this test, especially with a 2,000 message mailbox. This is the first time Courier-IMAP needs to read the contents of the entire mailbox, and the slow IDE disk really grinds things to a virtual halt. Note that the actual process time is the same for both UW-IMAP and Courier-IMAP. The difference is entirely in the I/O time. UW-IMAP already had to read the mailbox several times earlier in this benchmark, and the operating system already had the mailbox's contents cached. Courier-IMAP managed to avoid reading the mailbox's contents, so far. But, it can't avoid the inevitable, and it's time to pay the piper.

FETCH.2

Same exact task as FETCH.1, but this time Courier-IMAP is faster than UW-IMAP by a small margin. There's no disk I/O this time, so both servers are on equal footing. Both servers have the exact same task at hand, and Courier-IMAP is slightly faster. Why?

I do not believe that the differences between mboxes and maildirs are a direct factor. I believe that the internal design of each IMAP server is in play here. The UW-IMAP server has a number of internal abstraction and indirection layers, in order to be able to support many different mail storage formats. All that translates into additional overhead, and a less optimal internal design. Courier-IMAP is designed to support maildirs only, and its internal code is optimized, in most places, for the maildir format. Note that Courier-IMAP's user time is consistently half of UW-IMAP's. That shows the much smaller internal execution path in Courier-IMAP, which is entirely based on the way that maildirs store mail. UW-IMAP's execution path is much longer. At its top level, the execution path is more generic, and is not particularly geared for any mail storage format. Eventually, it winds its way down to the driver for each particular mailbox, and its specific code. With all things being equal, Courier-IMAP's much simpler internal architecture saves enough process time to make up for the larger number of I/O calls.

SEARCH.1 and SEARCH.2

Both benchmarks show more or less equivalent results. With everything cached at this point, UW-IMAP is faster by about a second, with 2,000 messages in the mailbox. Courier-IMAP uses a slightly more sophisticated search algorithm that can find alternate encodings of the same search string (in alternate encodings of the same base character set). This additional complexity results in a slight performance penalty.

Benchmark results on high-end hardware

Hardware:

Raw results:


                  UW-IMAP 2000                      Courier-IMAP 1.3.6
Benchmark  Time   100 messages     2,000 messages   100 messages     2,000 messages
SELECT.1   real   0m0.198s         0m3.068s         0m0.026s         0m2.147s
           user   0m0.020s         0m0.560s         0m0.030s         0m0.030s
           sys    0m0.030s         0m0.240s         0m0.010s         0m2.120s
SELECT.2   real   0m0.035s         0m0.201s         0m0.009s         0m0.052s
           user   0m0.030s         0m0.130s         0m0.010s         0m0.040s
           sys    0m0.010s         0m0.060s         0m0.000s         0m0.010s
DELETE.1   real   0m0.135s         0m2.195s         0m0.014s         0m0.113s
           user   0m0.020s         0m0.220s         0m0.000s         0m0.090s
           sys    0m0.030s         0m0.250s         0m0.010s         0m0.020s
FETCH.1    real   0m0.093s         0m1.359s         0m0.057s         0m1.004s
           user   0m0.060s         0m1.140s         0m0.050s         0m0.800s
           sys    0m0.030s         0m0.220s         0m0.000s         0m0.200s
FETCH.2    real   0m0.093s         0m1.358s         0m0.058s         0m0.994s
           user   0m0.080s         0m1.110s         0m0.050s         0m0.790s
           sys    0m0.020s         0m0.240s         0m0.010s         0m0.200s
SEARCH.1   real   0m0.111s         0m1.729s         0m0.115s         0m2.198s
           user   0m0.110s         0m1.460s         0m0.100s         0m1.870s
           sys    0m0.000s         0m0.270s         0m0.020s         0m0.330s
SEARCH.2   real   0m0.112s         0m1.712s         0m0.115s         0m2.201s
           user   0m0.090s         0m1.450s         0m0.100s         0m1.910s
           sys    0m0.010s         0m0.250s         0m0.010s         0m0.290s

Analysis

Things look very different on larger hardware. It should be noted that the hardware used in this benchmark -- although much more powerful -- was not even considered to be state of the art at the time these benchmarks were performed. Modern mail servers usually have two or four Pentium III (or Xeon) CPUs running at 800 MHz or higher; at least half a gigabyte of PC-133 SDRAM; and wide-SCSI hard drives running at 160MB/s DMA.

SELECT.1, SELECT.2, DELETE.1

Courier-IMAP continues to maintain its performance edge over the UW-IMAP server, pretty much by the same margin as it did on low-end hardware.

FETCH.1, FETCH.2, SEARCH.1, SEARCH.2

With better hardware, Courier-IMAP was slightly faster than UW-IMAP with 100 messages, and slightly slower with 2,000 messages. The severe performance degradation in the FETCH.1 benchmark with 2,000 messages -- that was caused by a slow IDE disk and limited amount of RAM -- is nowhere to be found. Therefore, Courier-IMAP technically scaled better than UW-IMAP, when moving from low-end to high-end hardware.

Memory usage comparison

These benchmarks show the memory usage of each IMAP server. The memory usage numbers were obtained by:

The raw results:


          UW-IMAP 2000                      Courier-IMAP 1.3.6
Field     100 messages     2,000 messages   100 messages     2,000 messages
VmSize    3832 kB          5344 kB          1596 kB          2340 kB
VmRSS     1656 kB          3168 kB          688 kB           1444 kB
VmData    192 kB           1704 kB          92 kB            832 kB
VmStk     28 kB            28 kB            28 kB            32 kB
VmExe     688 kB           688 kB           160 kB           160 kB
VmLib     2788 kB          2788 kB          1284 kB          1284 kB

Analysis

These numbers report the following information:

Multiple instances of the same program share a single copy of the VmExe and VmLib segments. Therefore it's the size of the VmData, VmRSS, and VmStk segments that determines how many processes can be running, before the server runs out of memory.
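
The field names in the table match the ones that the Linux kernel reports in /proc/PID/status. Assuming that file as the source (an assumption on my part for this sketch), the six fields can be extracted for any process along these lines. The function name vm_fields is made up for this example.

```sh
#!/bin/sh

# Print the six Vm* lines of interest from a file in /proc/PID/status format.
vm_fields() {
    grep -E '^Vm(Size|RSS|Data|Stk|Exe|Lib):' "$1"
}

# On Linux, report the fields for the current shell; skipped elsewhere.
if [ -r "/proc/$$/status" ]; then
    vm_fields "/proc/$$/status"
fi
```

To inspect a running IMAP server process instead, pass the status file of its process ID to the function.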

Courier-IMAP's memory needs grew at a slightly faster pace than UW-IMAP's. However, Courier-IMAP needs much less memory than UW-IMAP to open a folder. Even at 2,000 messages, Courier-IMAP's VmData and VmRSS were less than half of UW-IMAP's. A mail server should be able to support at least twice as many IMAP clients with Courier-IMAP, before running out of RAM. This assumes that other system resources (filesystem handles, maximum number of processes, etc.) are not exhausted before then.

Introduction - Phase II

The parameters for Phase II were defined after reviewing the results of Phase I. The same benchmarking script was used for Phase II, except that the INBOX folder was loaded with 10,000 random messages. The total size of INBOX was approximately 40 megabytes.

In Phase I, maildirs showed some weaknesses on low-end hardware, but achieved slightly better scalability than mbox files on high-end hardware. Phase II tries to determine whether this scaling trend continues with even larger mail folders.

Test environment - Phase II

The same scripts generated the test data. The scripts were modified to generate 10,000 messages, about 4K per message. Phase II used the same test machines as in Phase I. Refer to Phase I for the specifications of each test machine.

Benchmark results


                  UW-IMAP 2000                          Courier-IMAP 1.3.6
Benchmark  Time   Low-end hardware   High-end hardware  Low-end hardware   High-end hardware
SELECT.1   real   1m16.516s          0m27.873s          1m32.598s          1m2.735s
           user   0m40.260s          0m15.780s          0m0.850s           0m0.320s
           sys    0m25.680s          0m2.060s           1m30.220s          1m1.880s
SELECT.2   real   0m5.221s           0m0.721s           0m0.786s           0m0.240s
           user   0m2.360s           0m0.470s           0m0.720s           0m0.180s
           sys    0m2.080s           0m0.260s           0m0.070s           0m0.050s
DELETE.1   real   0m24.241s          0m8.690s           0m1.756s           0m0.553s
           user   0m4.470s           0m0.890s           0m1.570s           0m0.460s
           sys    0m14.320s          0m1.200s           0m0.190s           0m0.090s
FETCH.1    real   0m26.383s          0m5.921s           2m26.612s          0m3.898s
           user   0m22.330s          0m5.000s           0m12.670s          0m2.770s
           sys    0m4.000s           0m0.920s           0m34.310s          0m1.060s
FETCH.2    real   0m26.401s          0m5.867s           2m28.187s          0m3.756s
           user   0m22.370s          0m4.820s           0m12.140s          0m2.840s
           sys    0m3.970s           0m1.050s           0m34.500s          0m0.920s
SEARCH.1   real   0m36.359s          0m8.938s           2m56.652s          0m7.520s
           user   0m30.050s          0m7.210s           0m29.760s          0m6.160s
           sys    0m5.980s           0m1.660s           0m37.050s          0m1.360s
SEARCH.2   real   0m35.390s          0m8.784s           2m55.936s          0m7.572s
           user   0m30.070s          0m7.430s           0m29.450s          0m6.290s
           sys    0m5.280s           0m1.360s           0m37.320s          0m1.280s

Analysis

Phase II's results were consistent with Phase I's. Maildirs continued to fall behind on low-end hardware. Mboxes lagged behind maildirs on high-end hardware. Except for the SELECT.1 benchmark, maildirs scaled much better (from 2,000 messages) on high-end hardware than mboxes. In fact, in terms of absolute numbers, maildirs were faster than mbox files. The Courier-IMAP server even managed to beat UW-IMAP on the FETCH and SEARCH benchmarks, for the very first time. The most likely explanation is that Courier-IMAP's smaller code size means that a larger percentage of its code can be kept in the CPU's Level 1 cache. Celerons do not have Level 2 cache, but they do have Level 1 cache.

The SELECT.1 benchmark involved opening a folder with 10,000 new messages. In this test, the UW-IMAP server only needed to rewrite the mbox file. The Courier-IMAP server had to rename every one of the 10,000 files in the maildir. Note that the maildir results show almost no CPU user time. All the CPU time came from the kernel.

Memory usage comparison


Field     UW-IMAP 2000     Courier-IMAP 1.3.6
VmSize    9656 kB          5488 kB
VmRSS     7520 kB          4620 kB
VmData    6036 kB          4008 kB
VmStk     28 kB            32 kB
VmExe     688 kB           160 kB
VmLib     2752 kB          1256 kB

Even with 10,000 messages in a folder, Courier-IMAP needed much less memory than UW-IMAP.

Introduction - Phase III

The parameters for Phase III were designed to determine a different kind of scalability. Phases I and II had a large number of small messages in the folder. In Phase III the mail folder had a small number of large messages. This environment is more like a corporate environment than an ISP environment, with middle management constantly exchanging large documents and presentation files. The parameters for Phase III were defined after reviewing the results of Phase I and Phase II. The same benchmarking script was used for Phase III. The INBOX folder in Phase III was about the same size as in Phase II - about 40 megabytes - except that it contained 200 messages, and each message was 200Kb long.

Test environment - Phase III

The same scripts generated the test data. The scripts were modified to generate 200 messages, about 200Kb per message. Phase III used the same test machines as in Phase I. Refer to Phase I for the specifications of each test machine.

Benchmark results


                  UW-IMAP 2000                          Courier-IMAP 1.3.6
Benchmark  Time   Low-end hardware   High-end hardware  Low-end hardware   High-end hardware
SELECT.1   real   0m43.238s          0m13.712s          0m0.141s           0m0.290s
           user   0m2.820s           0m1.130s           0m0.030s           0m0.140s
           sys    0m27.310s          0m1.610s           0m0.110s           0m0.140s
SELECT.2   real   0m3.393s           0m0.518s           0m0.036s           0m0.013s
           user   0m1.760s           0m0.350s           0m0.030s           0m0.010s
           sys    0m1.510s           0m0.160s           0m0.010s           0m0.000s
DELETE.1   real   0m3.335s           0m7.139s           0m0.059s           0m0.022s
           user   0m1.790s           0m0.940s           0m0.050s           0m0.010s
           sys    0m1.410s           0m1.260s           0m0.010s           0m0.010s
FETCH.1    real   0m21.806s          0m4.587s           0m25.223s          0m1.171s
           user   0m18.280s          0m4.010s           0m3.530s           0m1.000s
           sys    0m3.430s           0m0.580s           0m14.080s          0m0.180s
FETCH.2    real   0m21.783s          0m4.586s           0m5.404s           0m1.201s
           user   0m18.260s          0m4.020s           0m3.370s           0m0.970s
           sys    0m3.490s           0m0.560s           0m0.800s           0m0.240s
SEARCH.1   real   0m32.841s          0m6.582s           0m18.875s          0m4.899s
           user   0m27.560s          0m5.690s           0m16.960s          0m4.480s
           sys    0m5.220s           0m0.880s           0m1.900s           0m0.420s
SEARCH.2   real   0m32.841s          0m6.609s           0m18.878s          0m4.961s
           user   0m27.560s          0m5.600s           0m17.300s          0m4.470s
           sys    0m5.220s           0m0.990s           0m1.580s           0m0.490s

Analysis

With large messages, maildirs did better than mboxes pretty much all across the board, on both low-end and high-end hardware. Expensive disk I/O on low-end hardware dragged down maildirs on the FETCH.1 benchmark, though. Recall that FETCH.1 is the first benchmark where the Courier-IMAP server actually has to read the entire mailbox. The remaining benchmarks reflect the fact that the operating system caches a few large files better than many small files. The Courier-IMAP process didn't spend much time in kernel space even on low-end hardware, indicating that virtually no disk I/O took place.

One unexpected result is the UW-IMAP server's poor performance in the SEARCH and FETCH benchmarks. It appears that the server has some kind of a problem here, scaling to mailboxes that contain large messages. Note that the UW-IMAP server spends most of its time in "user" state. There's very little system activity. The process spent pretty much all of its time in user space, and that is entirely responsible for its poor performance.

Memory usage comparison


Field     UW-IMAP 2000     Courier-IMAP 1.3.6
VmSize    4036 kB          1604 kB
VmRSS     1900 kB          712 kB
VmData    416 kB           128 kB
VmStk     28 kB            28 kB
VmExe     688 kB           160 kB
VmLib     2752 kB          1256 kB

Graphs

The following graphs visually represent the performance data gathered in Phases I-III. They were derived using the following process.

For each individual benchmark, the user and sys times were added together to obtain the total CPU time used in the benchmark. Then, the average of the total CPU time and the real time was computed. Essentially, the formula was (real+user+sys)/2. Justification: user+sys represents the total CPU time, which is a factor in how many mail clients the server can support; the real time is the apparent performance from the mail client's point of view. Both measurements are reasonable factors in determining the overall system performance. A small CPU time means that the system can handle more processes. But if the real time is 2 minutes - for example - the fact that the total CPU time is only a couple of seconds isn't going to play very well with a mail client that now must wait 2 minutes for a response. Averaging them together computes a metric where both factors are given equal weight. That is, both the real time and the CPU time are considered equally in evaluating the overall system performance.
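
As a worked example of the formula, here is a small sketch that computes the metric directly from the output format of the time command. The function names to_seconds and metric are made up for this example. The sample figures are the Phase II FETCH.1 result for Courier-IMAP on low-end hardware.

```sh
#!/bin/sh

# Convert a time value such as "2m26.612s" to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, part, "m")           # part[1] = minutes, part[2] = "26.612s"
        sub(/s$/, "", part[2])        # drop the trailing "s"
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# The graphing metric: (real + user + sys) / 2, in seconds.
metric() {
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
        -v s="$(to_seconds "$3")" \
        'BEGIN { printf "%.3f\n", (r + u + s) / 2 }'
}

# Phase II, Courier-IMAP, low-end hardware, FETCH.1.
metric 2m26.612s 0m12.670s 0m34.310s
```

For that run the metric works out to 96.796: the 146.612 seconds of real time dominate the 46.980 seconds of CPU time, so a long wait on disk I/O is penalized even though the CPU was mostly idle.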

Phases I and II

The following graphs represent the combined results of Phases I and II. The metric described above is the Y axis, and the number of messages in the mailbox is the X axis. The steeper a line, the poorer the scalability it represents. A nearly horizontal line represents nearly perfect, constant scalability.

[Graphs, one per benchmark: SELECT.1, SELECT.2, DELETE.1, FETCH.1, FETCH.2, SEARCH.1, SEARCH.2]

Phase III

The following graphs represent the scalability from Phase III, with Phase I as a reference point. The same formula computed the metric for an individual benchmark. The initial value on the graph is the metric from Phase I, with 100 messages each approximately 4Kb long, for a total mailbox size of about 400Kb. The final value on the graph is the metric from the 40Mb mailbox from Phase III (200 messages, 200Kb each message).

[Graphs, one per benchmark: SELECT.1, SELECT.2, DELETE.1, FETCH.1, FETCH.2, SEARCH.1, SEARCH.2]

Final Analysis

These results easily reject an absolute claim that maildirs always fail to scale to large mail folders. These benchmarks show that a big factor is the underlying hardware and the operating system. The ext2 filesystem, as implemented by the Linux kernel, is known for its speed and good performance.[4]

Maildirs will not scale very well on servers that use old, slow hardware. Maildirs will also do poorly with an inefficient filesystem that stores very large folders which are frequently searched for specific content. However, maildirs' performance should be adequate even on slow machines with very large folders, as long as the mail activity is just occasional read/write access, and browsing. Even with large folders containing unread messages, maildirs will require less system load than mboxes. On fast hardware, these benchmarks indicate that maildirs scale better more often than not. Maildirs scale much better with mail folders that contain large messages. Even with folders that have a large number of smaller messages, maildirs did better than mboxes on many benchmarks.

It should be noted that some of these numbers reflect the overall system performance that may differ from the apparent performance seen by a mail client. When running the benchmark, the UW-IMAP server did not actually take much longer to open a 2,000 message folder than Courier-IMAP -- it postponed the mbox file rewrite until the folder was closed. However, this benchmark takes both measurements into account. From the user's standpoint, some of the delay in opening a large folder is postponed until the folder is closed. This results in a slightly faster response when opening a folder, but from the system's viewpoint the load's the same. This is why both measurements are important. Whether you take the load up front, or spread it around, the grand total is still the same. The decision to postpone rewriting the mbox file can result in some savings in time (mostly by consolidating multiple rewrites into one). However, there's also a down side to this approach. An IMAP server can always be killed by an abnormal system event, for example. When that happens to the UW-IMAP server, any unsaved changes to the folder will be lost.

Mail clients that do not cache IMAP metadata may also result in degraded maildir performance. The Pine mail client doesn't do any caching; it pretty much reads the message index every time it opens the folder, which is usually an expensive operation for maildirs. Most Windows mail clients cache IMAP metadata extensively. IMAP mail clients that support offline use MUST cache IMAP metadata. Netscape Mail, Outlook, and Outlook Express usually cache everything they receive from the IMAP server. They will not ask for the entire message index again, and therefore avoid most of maildir's message index penalty. If they open a folder and see no changes since the last IMAP session, they will do absolutely nothing. Therefore, another factor to consider is the mail client software that will be used to access the mailbox.

The final conclusion is that -- except in some specific instances -- maildirs will be just as fast as, and sometimes much faster than, mbox files, while placing less of a load on the rest of the mail system. The claims in the UW-IMAP server's documentation regarding maildir performance can be supported only in certain, specific, very narrowly-defined conditions. There is no simple answer on which mail storage format is better. A lot depends on variables that differ widely from one situation to another. Besides the raw benchmarks shown above, other factors include the mail server software being used, what kind of storage is being used, and the available network bandwidth. The final answer depends on all of the above.

References

[1] http://www.washington.edu/imap/documentation/formats.txt.html.

[2] "Scalability and Performance in Modern File Systems", SGI.

[3] A Google search on "nfs locking errors" provides plenty of reading material. See also "Using sendmail in a NFS safe way".

[4] Independent benchmarks show that Linux's ext2 filesystem outperforms Solaris's tmpfs RAM-based filesystem!


http://www.courier-mta.org