Benoit Tellier created JAMES-3740:
-------------------------------------

             Summary: IMAP UID <-> MSN mapping occupies too much memory
                 Key: JAMES-3740
                 URL: https://issues.apache.org/jira/browse/JAMES-3740
             Project: James Server
          Issue Type: Improvement
          Components: IMAPServer
    Affects Versions: 3.7.0
            Reporter: Benoit Tellier


h3. What is UID <-> MSN mapping ?

In IMAP RFC-3501 there is two ways one addresses a message:
 - By its UID (Unique ID) that is unique (until UID_VALIDITY changes...)
 - By its MSN (Message Sequence  Number) which is the (mutable) position of a 
message in the mailbox.

We then need:
 - Given a UID return its MSN which is for instance compulsory upon EXPUNGED 
notifications when QRESYNCH is not enabled.
 - Given a MSN based request we need to convert it back to a UID (rare).

We do store the list of UIDs, sorted, in RAM and perform binarysearches to 
resolve those.

h3. What is the impact on heap?

Each uid is wrapped in a MessageUID object. This object wrapping comes with an 
overhead of at least 12 bytes in addition to the 8 bytes payload (long). Quick 
benchmarks shows it's actually worse: 10 million uids did take up to 275 MB.

{code:java}
    @Test
    void measureHeapUsage() throws InterruptedException {
        int count =10000000;
        testee.addAll(IntStream.range(0, count)
            .mapToObj(i -> MessageUid.of(i + 1))
            .collect(Collectors.toList()));
        Thread.sleep(1000);
        System.out.println("GCing");
        System.gc();
        Thread.sleep(1000);

        
System.out.println(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
    }
{code}

Now, from let's take a classical production deployment I get:
 - Some users have up to 2.5 million messages in their INBOX
 - I can get an average of 100.000 messages for each user

So for a small scale deployment, we are already "consuming" ~300 MB of memory 
just for the UID <-> mapping.

Scaling to 1.000 users on a single James instance we clearly see that HEAP 
consumption will start being a problem (~3GB) without even speaking of target 
of 10.000 users per James I do have in mind.

It's worth mentioning that IMAP being statefull, and UID <-> MSN mapping 
attached to a selected mailbox, such a mapping is long lived:
 - Multiple small objects would need to be copied individually by the GC, 
putting pressure during long gen
 - Those long lived object will eventually be promoted to old gen, thus the 
more there is the longer the resulting stop-the-world GC pauses will be.

h3. Temporary fix ?

We can get rid of the object boxing in UidMsnConverter by using primitive type 
collections for instance provided by fastutils project.

The same bench was down to 84MB.

Also, we could get things more compact by using an INT representation of UIDs. 
(Those are most of the case below 2 billions, to be above this there need to be 
more than 2 billion emails transiting through one's mailbox which is highly 
unlikely). A fallback to "long" storage can be setted up if a UID above 2 
billion is observed.

This such a compact int storage we are down to 46MB.

So taking the prior mentioned numbers we could expect a 1.000 people deployment 
to require ~400 MB and a larger scale 10.000 people deployment on a single 
James to consume up to 4GB. Not that enjoyable but definitly more manageable.

Please note that primitive collections are more GC friendly as their elements 
are manages together, as a single object (backing array).

h3. What other mail servers do

I found references to Dovecote, which does a similar algorithm compared to us: 
binary search on a list of uids. The noticeable difference is that this list of 
UIDs is held on disk and not in memory as we do.

References: 
https://doc.dovecot.org/developer_manual/design/indexes/mail_index_api/?highlight=time

Of course, such a solution would be attractive... We could imagine keeping the 
last 1.000 uids in memory, which would most of the time be the ones used for 
MSN resolution and locate the rest on-disk, use them only when needed and thus 
dramatically reduce heap pressure.

Making UidMsnConverter an interface with a backing factory would enable 
different implementation to co-exist and allow some experimentation ;-)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org

Reply via email to