Hello Jean,

Thank you for your reply once again, but I guess I headed in the wrong direction altogether.

Anyway - one issue at a time:

1) multiple James instances connect to a single database server
What I am trying to achieve: set up additional servers that receive e-mails independently. How I tried to achieve it: by connecting several mail-server instances to a central database server. The issue: when the main James is down, it is usually for updates or maintenance, so the main database will be down as well. So although this sounded fun at first, I noticed my mistake while thinking about it on my way home today.
Back to square one!

2) unique IDs
This is likely a misunderstanding:
I was not talking about the uuid-message-id but rather the UIDs in the database table like
JAMES_MAILBOX.MAILBOX_ID
or
JAMES_MAIL.MAIL_UID
Currently my JAMES_MAILBOX.MAILBOX_ID is all over the place: it started sequentially from 1 to 7, then jumped to 101, then 251-253, 401, 651, then all the way to 1601, 2252, 5801, and is currently at 8452. It is not correlated with the user account, nor is there a pattern such as xx1 always being INBOX and xx2 always being Sent; it is seemingly random. If James itself is stateless, then these IDs have to come from somewhere, most likely the database. My point here is: if this is some number initialized once on James startup, could it cause a de-sync if another instance writes a record "out of sequence"? Although my initial idea is a dead end, I will still try it just to see what happens.
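
One possible explanation for jumps like these, offered as an assumption and not as a confirmed description of James or its JPA provider, is block allocation: each process reserves a range of IDs from a counter kept in the database and hands them out locally, so every restart abandons the unused rest of its range. The sketch below only illustrates that idea; the class names and the block size of 50 are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of block-wise ID allocation; not James code.
final class BlockIdSketch {

    // Stands in for a counter row kept in the shared database (assumption).
    static final class SharedSequence {
        private final AtomicLong nextBlockStart = new AtomicLong(1);

        // Atomically reserves `size` consecutive IDs and returns the first one.
        long reserveBlock(int size) {
            return nextBlockStart.getAndAdd(size);
        }
    }

    // Stands in for one server process handing out IDs from its reserved block.
    static final class Instance {
        private final SharedSequence sequence;
        private final int blockSize;
        private long next; // next ID to hand out
        private long end;  // first ID that is no longer part of the current block

        Instance(SharedSequence sequence, int blockSize) {
            this.sequence = sequence;
            this.blockSize = blockSize;
        }

        long nextId() {
            if (next == end) { // block exhausted, or nothing reserved yet
                next = sequence.reserveBlock(blockSize);
                end = next + blockSize;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        SharedSequence sequence = new SharedSequence();
        Instance first = new Instance(sequence, 50);
        System.out.println(first.nextId());
        // A "restart" is modelled as a fresh Instance: the rest of the old block is lost.
        Instance restarted = new Instance(sequence, 50);
        System.out.println(restarted.nextId());
    }
}
```

Under this assumption, two instances sharing one database would produce gaps but no duplicates, because each block is reserved atomically at the shared counter.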

So I guess it will likely come down to a simple fetchmail-style approach: collecting all mails from the slaves and dropping them into a collecting mailbox. I may also try to use the java-mail-api to get the raw data stream and ingest it through an incoming SMTP connection at the master. I guess I will have to tinker with either the mailetconfig or the SMTP config, as this will fail SPF unless I write some exemption for the slaves.
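
A minimal sketch of what re-submitting a raw message over SMTP involves, assuming a plain SMTP dialogue as described in RFC 5321: the envelope commands are sent first, then the raw message is dot-stuffed and terminated with a line containing a single dot. The class and method names are hypothetical, nothing here opens a socket, and how the master treats SPF for such a connection is not covered.

```java
import java.util.List;

// Hypothetical helper for preparing an SMTP re-submission; not part of James.
final class SmtpIngestSketch {

    // Envelope commands to send before the message body.
    static List<String> envelope(String heloName, String sender, String recipient) {
        return List.of(
                "EHLO " + heloName,
                "MAIL FROM:<" + sender + ">",
                "RCPT TO:<" + recipient + ">",
                "DATA");
    }

    // Prepares a raw message for the DATA phase: normalizes line endings to CRLF,
    // doubles a leading dot on any line, and appends the terminating ".".
    // Note: String.split drops trailing empty lines, so trailing blank lines are not kept.
    static String dotStuff(String rawMessage) {
        StringBuilder out = new StringBuilder();
        for (String line : rawMessage.split("\r?\n")) {
            if (line.startsWith(".")) {
                out.append('.');
            }
            out.append(line).append("\r\n");
        }
        return out.append(".\r\n").toString();
    }

    public static void main(String[] args) {
        // Placeholder host and addresses, for illustration only.
        envelope("vps1.example", "sender@example.org", "matt@example.org")
                .forEach(System.out::println);
        System.out.print(dotStuff("Subject: test\r\n\r\n.a line starting with a dot\r\n"));
    }
}
```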

I'll report back tomorrow.

Have a good one

Matt

On 10.03.25 at 11:21, Jean Helou wrote:
Hi Matt

On Sun, Mar 9, 2025 at 8:15 PM cryptearth <cryptea...@cryptearth.de.invalid>
wrote:

3) Multiple James instances connecting to a single MariaDB server?
So, where and how are the IDs used to identify the mails, mailboxes, users,
etc. generated? I didn't find the classic UNIQUE PRIMARY_KEY AUTO_INCREMENT
that is often shown in beginner books.
Is it even possible to connect several James instances to one database
at once? Or will this cause synchronization issues? Can this be solved
by having each James run its own MariaDB server, with only the databases
synced via database replication?

The mail key is created in James by the SMTP protocol handler once it
starts processing the data part of the mail; see
https://github.com/apache/james-project/blob/master/server/protocols/protocols-smtp/src/main/java/org/apache/james/smtpserver/JamesDataCmdHandler.java#L54
The key is built from a combination of a timestamp (in milliseconds) and a
random 20-byte value; see
https://github.com/apache/james-project/blob/master/server/container/core/src/main/java/org/apache/james/server/core/MailImpl.java#L789
It is stored in the JPA repository under the column message_name
https://github.com/apache/james-project/blob/master/server/data/data-jpa/src/main/java/org/apache/james/mailrepository/jpa/model/JPAMail.java#L79
which is part of the composite primary key for this table.
So technically a key collision is possible, but it is extremely unlikely.
Having several James servers connect to the same datastore is quite safe.
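
To illustrate why such a key needs no coordination between instances, here is a minimal sketch that follows the description above (millisecond timestamp plus 20 random bytes). It is not the code from MailImpl; the "Mail" prefix, the hex encoding and the class name are assumptions made for the example.

```java
import java.security.SecureRandom;

// Hypothetical sketch of a timestamp-plus-random key; not the James implementation.
final class MailKeySketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds a key from the current time in milliseconds and 20 random bytes (hex-encoded).
    static String newKey() {
        byte[] random = new byte[20];
        RANDOM.nextBytes(random);
        StringBuilder key = new StringBuilder("Mail")
                .append(System.currentTimeMillis())
                .append('-');
        for (byte b : random) {
            key.append(String.format("%02x", b));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // Two keys created in the same millisecond still differ in the random part.
        System.out.println(newKey());
        System.out.println(newKey());
    }
}
```

Because each instance can generate such a key locally, no shared counter is involved; a collision would require the same millisecond and the same 160 random bits.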


As for the mentioned "slow performance": that's no issue, as all the
slaves are supposed to do is drop any received mails back at the
master, which I use to send and retrieve via IMAP. If it takes
some time for the VPS in Sydney to drop a mail back in Frankfurt, so be
it. I guess the additional round trips and a few 100 ms of latency don't
matter in my use case.

For pure SMTP the latency is definitely not an issue. For IMAP it could be.

cheers
jean

Anyway - thanks again for your input. I'll see how I proceed from here.
Have a good one.

Matt

On 08.03.25 at 16:12, Benoit TELLIER wrote:
Though Cassandra supports multi-DC, cross-availability-zone deployments well,

this doesn't mean all Cassandra implementations do.

And James doesn't:
   - IMAP's reliance on incremental monotonic counters requires strong
consistency, which doesn't play well with high latencies (2-4 round trips)
   - multiple levels of metadata make it prone to inconsistencies if not
operated with quorum consistency, and quorum consistency means
cross-availability-zone reads and writes, which is a latency and
throughput showstopper.
TL;DR: the James distributed server can work on multi-DC, but with
significant shortcomings, and only with a region-wide setup, not a
worldwide setup
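
A rough sketch of why a monotonic counter is latency-sensitive, assuming a read followed by a conditional write against one authoritative store. This is an illustration only, not how James or Cassandra implements UID allocation; all names are hypothetical, and each store call stands for one network round trip.

```java
// Hypothetical illustration of a strongly consistent counter; not James code.
final class UidCounterSketch {

    // Stands in for the authoritative, strongly consistent store.
    static final class Store {
        private long value;
        private int roundTrips;

        synchronized long read() {
            roundTrips++;
            return value;
        }

        // Writes `update` only if the stored value still equals `expected`.
        synchronized boolean compareAndSet(long expected, long update) {
            roundTrips++;
            if (value != expected) {
                return false;
            }
            value = update;
            return true;
        }

        synchronized int roundTrips() {
            return roundTrips;
        }
    }

    // Allocates the next UID; retries when another writer got in between.
    static long nextUid(Store store) {
        while (true) {
            long current = store.read();
            if (store.compareAndSet(current, current + 1)) {
                return current + 1;
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        System.out.println(nextUid(store));
        System.out.println(nextUid(store));
        System.out.println(store.roundTrips() + " round trips");
    }
}
```

Even without contention, each allocation here costs two round trips to the store, so the per-message cost grows directly with the distance between the James node and the store.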
--

Best regards,

Benoit TELLIER

General manager of Linagora VIETNAM.
Product owner for Team-Mail product.
Chairman of the Apache James project.

Mail: btell...@linagora.com
Tel: (0033) 6 77 26 04 58 (WhatsApp, Signal)


On Mar 8, 2025 10:48 AM, from Jean Helou <jhe...@apache.org>

Hi Matt,

This has turned into a rather long answer. The first part is more about
James in general, the second is more about your specific setup :)


As far as I'm aware, James itself is stateless. I don't think you lose
counter values when you restart your main server.

Thus, you should be able to spin up as many James instances as you want and
point them to the same storage without issues. Even if there are some
asynchronous state updates, the state should eventually converge.

The difficulty is distributed storage, not distributed processing.

For instance, if you spin up a MariaDB on one of your new VPSs and reload a
backup from your main MariaDB, the states of both databases will
immediately start to diverge, as they are unaware of each other: new
messages delivered to your main server since the backup will not be
visible to the VPS, and messages read on the VPS will still appear unread
on the main server.

From there you will want to look into replication, but with simple
primary/secondary replication, writes to the secondary will throw errors,
making your secondary James instance fill its error logs with failed
writes.

The next step is multi-master replication, which is something I have never
tried.
The distributed James app demonstrates a fully distributed system,
including a distributed database (Cassandra), a distributed message
broker (RabbitMQ, iirc), a distributed search engine (OpenSearch), etc.

This allows you to have as many James nodes as you want, all talking to
as many messaging/storage nodes as you want, all fully synced and with
write semantics that offer reasonable consistency. This is a setup that
makes sense for massive deployments, for example if you wanted to build
the next Google Mail.

The use of blob storage (S3-like) to store message contents is an
orthogonal concern. Database storage is fairly expensive compared to blob
storage, and storing large blobs in databases, while doable, is usually
not recommended, at least not without specific table design. The same is
true for message brokers.

The alternatives are storing on the file system, which is not distributed
or using a blob store.

I'm almost certain you can configure the distributed app (or build a
variant of it) so that it does not use blob storage, but I wouldn't
recommend it.

Now, how all this applies to your setup :)

My understanding is that for now you have a single, rather powerful
machine hosting both James and MariaDB. The James instance handles both
SMTP and IMAP or POP.

I'll also assume that you don't intend to start operating a multi DC
Cassandra cluster :)

Finally I'll assume the VPS are rather small at this price :)

If they are large enough to host a clone of your main MariaDB and its
data, you can use one for MariaDB and another for James. Start from a
backup of the main MariaDB, then use IMAP sync to get eventual consistency
between the mailboxes on your main server and the replica.

You can go further and spread the workload of the main server too.

Start a James instance configured for IMAP/POP on a couple of VPS
instances, and keep the DB config pointing to the main MariaDB. Change your
clients' config, and eventually you can drop the corresponding listeners on
the main server if you want.

Do the same for SMTP and put the new instances at a higher priority than
the instance running on the main server. After a while you can even stop
the James process on the main server entirely :)

The downside of course is increased latency both from client to vps but
also from vps to vps or to the main database server.

I hope that opens avenues for exploration :)


Have fun

On Sat, Mar 8, 2025 at 03:06, cryptearth <cryptea...@cryptearth.de.invalid>
wrote:

Hello there dear James devs and fellow James users,

my hoster OVH currently offers me a great deal on VPSs for less than 12
bucks a year (less than 1 buck per month) in several datacenters around
the world. I am really tempted to take that deal, as I have some ideas for
using multiple servers; having them around the world, such as in
Australia and Canada, is just a bonus.
One thing I plan to do is set up James on each of the servers.
But then the question came up: how do I synchronize them?
Currently I use my home server only as a backup, without any
synchronization with my main root server. In fact, it is currently not
running because of some issues with my home server that I have to fix
before I can get James running again.
Now, when scaling up to several servers around the world, it would be cool
to take advantage of that by combining them through synchronization. But
as the additional systems are only VPSs, I'd like a master-slave
setup, with each slave James on a VPS syncing up to the master James on
my powerful root server.
First I thought about fetchmail to at least pull in mails from the
slaves to the master, but fetchmail is only part of the deprecated
spring build. As I like to have my mail storage in a database, I would
like to keep using the guice-jpa build instead of switching to the
guice-distributed build, which doesn't use jpa and seems to be meant for
use with AWS S3 buckets.
I could also write some Java code using the java mail api that works the
way fetchmail does, but I'm unsure how to inject mails from other
servers properly into the main server so that they look as if they were
received by the master server itself.
Could it be done by just synchronizing the MariaDB databases in the
background, or would fiddling with the database while James is running
break things such as the various counters for mails and mailboxes?
If James 3.x isn't suited to such a use case, maybe that's something to
be considered for 4.0? Or is it too late in the current development
cycle, and would it delay a 4.0 release?

I would like to explore this idea further to see if and how James can be
used in a distributed cluster like other mailers can. Building a James
mail server cluster just sounds cool, and given that big companies like
Google have several hundred to thousands of mail servers deployed around
the globe, all working together, it surely has to be possible with James
as well. Broken down, it's just some listeners on some server sockets with
a database backend synchronized by a message bus. This should be
extendable across multiple servers.

Have a nice weekend everyone.

Greetings from Germany,

Matt

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
For additional commands, e-mail: server-user-h...@james.apache.org


