API Boundaries

David Leangen Sat, 16 May 2020 23:52:27 -0700

Hi,


Sorry for the barrage of emails, but I hope to get help from the community. I 
hope the benefits of what I plan to contribute will be worth the efforts that 
you make to answer my questions. Thanks for bearing with me. :-)


I wanted to ask about API boundaries. As I mentioned in a different email 
thread, I think that:

> the organization of the Guice Modules is perhaps THE most important 
> abstraction available to allow people to understand the system.
> 
> Ideally, to help provide a better understanding of the system and its 
> compile-time (and even to some extent its runtime) organization, I think it 
> is important to:
> 
> […]
> 
> * Ensure that each Module is well-contained (i.e. no “leaks” or coupling to 
> other implementations)
> —> I found this part to be quite problematic

I noticed what, to me, is a problem with API boundaries. I would like to use 
Cassandra as my example.

My shallow understanding of James is that it requires a module to store emails. 
If I understand correctly, that is the “Maildir” module, which can be 
implemented in many different ways (filesystem, RDB, Cassandra, in-memory…). If 
my understanding is correct, then I think the concept is quite simple. If the 
simplicity of the concept can be maintained in the code, then the system should 
in principle remain easy to understand.

The whole point of a framework like Guice is to separate the API from its 
implementations. All that James should care about is that it has a Maildir 
instance, and for all intents and purposes, it shouldn’t care at all about 
which implementation it has.

That is where the resolution/wiring comes into play. With Guice, this happens 
to be done statically at compile time in Java code (and of course executed at 
runtime). Nothing fancy, but that’s quite ok as it gets the job done. I like 
having that configuration in code because it makes things like documentation 
and refactoring easier. Thanks to this type of DI approach (which requires 
clean separation of API from implementation code), it should be trivial for an 
application assembler to swap out an RDB implementation of a Maildir for a 
Cassandra implementation of a Maildir.
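
To make the swap concrete, here is a minimal sketch in plain Java. All of the 
type names (Maildir as an interface, InMemoryMaildir, CassandraMaildir, 
MailServer, Assembly) are hypothetical and are not actual James classes, and 
the "Cassandra" class is only an in-memory stand-in. The wiring is done by hand 
so the example is self-contained; the comments note roughly what the equivalent 
Guice binding would look like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical API: the only type the rest of the application depends on.
interface Maildir {
    void store(String message);
    int count();
}

// Hypothetical implementation backed by a plain list.
final class InMemoryMaildir implements Maildir {
    private final List<String> messages = new ArrayList<>();
    public void store(String message) { messages.add(message); }
    public int count() { return messages.size(); }
}

// Stand-in for a Cassandra-backed implementation (no real driver calls here).
final class CassandraMaildir implements Maildir {
    private final List<String> rows = new ArrayList<>();
    public void store(String message) { rows.add(message); }
    public int count() { return rows.size(); }
}

// The consumer sees only the Maildir API, never a concrete class.
final class MailServer {
    private final Maildir maildir;
    MailServer(Maildir maildir) { this.maildir = maildir; }
    void receive(String message) { maildir.store(message); }
    int stored() { return maildir.count(); }
}

// The "application assembler": the only place that names an implementation.
final class Assembly {
    // With Guice this would roughly be, inside an AbstractModule's configure():
    //   bind(Maildir.class).to(InMemoryMaildir.class);
    static MailServer withInMemory() {
        return new MailServer(new InMemoryMaildir());
    }

    // Swapping the backend changes this one line and nothing in MailServer:
    //   bind(Maildir.class).to(CassandraMaildir.class);
    static MailServer withCassandra() {
        return new MailServer(new CassandraMaildir());
    }
}
```

The point of the sketch is that MailServer compiles against Maildir alone, so 
choosing a backend is a decision made in exactly one place.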

Now, because James is a complex system, there are actually several APIs. For 
instance, by inspecting the code I gather that the EventStore also requires an 
implementation, and one of those implementations is Cassandra.

If the API is well-designed, then it should be very easy to swap 
implementations:

 Mailbox (interface)
   MailboxAImpl
   MailboxBImpl
   MailboxCImpl
   ...

EventStore
   EventStoreAImpl
   EventStoreBImpl
   EventStoreCImpl
   ...

If Cassandra could implement both of these, then this configuration could be 
possible:

Mailbox
 Mailbox (interface)
   MailboxAImpl
   MailboxBImpl
   CassandraMailbox
   ...

EventStore
   EventStoreAImpl
   EventStoreBImpl
   CassandraEventStore
   …


However, this configuration should also be possible:

Mailbox
 Mailbox (interface)
   CassandraJames

EventStore
   CassandraJames (yes! the same instance as above)


In other words, the “CassandraJames” Module could very well implement more than 
one API. There is absolutely nothing wrong with a single implementation 
implementing multiple APIs.

Actually, ANY implementation, including the Cassandra implementation, does not 
necessarily need to reside in the same code base. It should ideally be packaged 
into a single JAR that gets dropped into the framework. That means that ideally 
there should be a single CassandraJames jar that is used to wire up the system. 
Perhaps all of the APIs will be used, but maybe not. Doesn’t matter. The point 
is to make the system easy to understand and easy to wire up.

However, and I could be wrong, in the current code base there appear to be 
bits of “Cassandra” code all over the place.


In any case, this is just one example. The point is that I get the feeling that 
the Modules are not clearly defined, and the implementations are leaking into 
different places instead of keeping cohesive code together. Or rather, the 
cohesiveness is not being defined along the same dimensions, which IMO does not 
seem quite right.

The result is that the code base does not seem to match the high-level system 
concepts well, which makes the code very difficult to understand even though 
the concepts behind James are not so difficult.


So my question is: what are the thoughts behind how the APIs are designed? Is 
there any documentation or something that can help me understand? Right now I 
can only base my statements on my impressions, and my impression could be 
completely wrong.

My impression is that the important James concepts which require a 
well-designed API are:

 * Mailbox (is this the same as “backend”??? and what about “blob”??? and 
“data”???)
 * EventStore (is this always required???)
 * Mailet
 * Server
 * Protocols (and the various flavours)
 * Admin (implemented by CLI and API)
 * Queue (or is this part of “server”???)

These do not include the external projects, which do seem to be well defined as 
separate components:

 * Mime4j
 * jSieve
 * jSPF
 * jDKIM

Did I miss any other important component of the system (excluding Hupa)?


Thanks!
=David

