[ https://issues.apache.org/jira/browse/JAMES-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514545#comment-17514545 ]
Andreas Joseph Krogh commented on JAMES-3740: --------------------------------------------- We use a custom implementation of `SelectedMailbox` which holds a private variable `uidMappings` of type `OrigoUidMsnMapper` (this is the one switching to using `org.mapdb` if numMsg > 5000). Our implementation of `SelectedMailbox` uses JDBC to build the UID<->MSN mappings so it's not quite portable outside our implementation. When `SelectMailbox` is initialized (a Mailbox is SELECT'ed) we do: {code:java} uidMappings = imapService.getUidMsnMappingForFolder(origoMailbox.getMailboxId){code} Then we override getFirst etc. (Scala-code): {code:java} override def getFirstUid: Optional[MessageUid] = this.synchronized { uidMappings.getFirstUid() } override def getLastUid: Optional[MessageUid] = this.synchronized { uidMappings.getLastUid() }{code} > IMAP UID <-> MSN mapping occupies too much memory > ------------------------------------------------- > > Key: JAMES-3740 > URL: https://issues.apache.org/jira/browse/JAMES-3740 > Project: James Server > Issue Type: Improvement > Components: IMAPServer > Affects Versions: 3.7.0 > Reporter: Benoit Tellier > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > h3. What is UID <-> MSN mapping ? > In IMAP RFC-3501 there is two ways one addresses a message: > - By its UID (Unique ID) that is unique (until UID_VALIDITY changes...) > - By its MSN (Message Sequence Number) which is the (mutable) position of a > message in the mailbox. > We then need: > - Given a UID return its MSN which is for instance compulsory upon EXPUNGED > notifications when QRESYNCH is not enabled. > - Given a MSN based request we need to convert it back to a UID (rare). > We do store the list of UIDs, sorted, in RAM and perform binarysearches to > resolve those. > h3. What is the impact on heap? > Each uid is wrapped in a MessageUID object. This object wrapping comes with > an overhead of at least 12 bytes in addition to the 8 bytes payload (long). > Quick benchmarks shows it's actually worse: 10 million uids did take up to > 275 MB. > {code:java} > @Test > void measureHeapUsage() throws InterruptedException { > int count =10000000; > testee.addAll(IntStream.range(0, count) > .mapToObj(i -> MessageUid.of(i + 1)) > .collect(Collectors.toList())); > Thread.sleep(1000); > System.out.println("GCing"); > System.gc(); > Thread.sleep(1000); > > System.out.println(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed()); > } > {code} > Now, from let's take a classical production deployment I get: > - Some users have up to 2.5 million messages in their INBOX > - I can get an average of 100.000 messages for each user > So for a small scale deployment, we are already "consuming" ~300 MB of memory > just for the UID <-> mapping. > Scaling to 1.000 users on a single James instance we clearly see that HEAP > consumption will start being a problem (~3GB) without even speaking of target > of 10.000 users per James I do have in mind. > It's worth mentioning that IMAP being statefull, and UID <-> MSN mapping > attached to a selected mailbox, such a mapping is long lived: > - Multiple small objects would need to be copied individually by the GC, > putting pressure during long gen > - Those long lived object will eventually be promoted to old gen, thus the > more there is the longer the resulting stop-the-world GC pauses will be. > h3. Temporary fix ? > We can get rid of the object boxing in UidMsnConverter by using primitive > type collections for instance provided by fastutils project. > The same bench was down to 84MB. > Also, we could get things more compact by using an INT representation of > UIDs. (Those are most of the case below 2 billions, to be above this there > need to be more than 2 billion emails transiting through one's mailbox which > is highly unlikely). A fallback to "long" storage can be setted up if a UID > above 2 billion is observed. > This such a compact int storage we are down to 46MB. > So taking the prior mentioned numbers we could expect a 1.000 people > deployment to require ~400 MB and a larger scale 10.000 people deployment on > a single James to consume up to 4GB. Not that enjoyable but definitly more > manageable. > Please note that primitive collections are more GC friendly as their elements > are manages together, as a single object (backing array). > h3. What other mail servers do > I found references to Dovecote, which does a similar algorithm compared to > us: binary search on a list of uids. The noticeable difference is that this > list of UIDs is held on disk and not in memory as we do. > References: > https://doc.dovecot.org/developer_manual/design/indexes/mail_index_api/?highlight=time > Of course, such a solution would be attractive... We could imagine keeping > the last 1.000 uids in memory, which would most of the time be the ones used > for MSN resolution and locate the rest on-disk, use them only when needed and > thus dramatically reduce heap pressure. > Making UidMsnConverter an interface with a backing factory would enable > different implementation to co-exist and allow some experimentation ;-) -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org For additional commands, e-mail: server-dev-h...@james.apache.org