RE: Proposal: new approach to spooling

Noel J. Bergman Mon, 12 Jun 2006 11:43:49 -0700

Steve Brewin wrote:

> I think there is a lot of merit in coming up with a new queueing mechanism


> we should explain the benefits any proposed change is seeking to achieve.

Considering the amount of available "free time", there had better be some
serious benefits, no?  ;-)

> might these be support for distributed operation

Yes.  Absolutely.

> integration into other service oriented architectures?

Somewhat.  Probably hard to argue for as a primary goal.  And we may also
foster the use of the Mailet API well beyond JAMES.

> > Concepts:
> >   - Each processor is a named queue entry.

Our core architecture for the mailet pipeline would be message-based,
reusing well-established patterns from distributed queuing platforms, such
as MQ, JMS and others.  The use of a named queue is basically what we have
today: each processor is named.  I also assert that processor names are
locally scoped references, disassociated from real resources.  So if we have
a distributed scenario, we may end up with something like:

  <processor name="my-remote-processor" class="QAlias">
    <queue class="JMSQueue">
      <queuefactory>jms/myQF</queuefactory>
      <queuename>jms/myQ</queuename>
    </queue>
  </processor>

  <processor name="another-remote-processor" class="QAlias">
    <queue class="JDBCQueue">
      <datasource>jdbc/myDS</datasource>
    </queue>
  </processor>

  <processor name="mq-remote-processor" class="QAlias">
    <queue class="MQLink">
      <queuename>mQueue</queuename>
    </queue>
  </processor>

  <processor name="root">
    <queue jndi="queue/myRoot">
      ...
    </queue>
    <mailet ...>
  </processor>

Those are just quick and not necessarily complete examples, but basically
they define local processors so that we can do a put operation locally
without having to know the network topology, nor the real queue resource
name, and the local processor is just there to define in one place the queue
technology and address required.  Notice how we can mix and match entirely
different queuing technologies, since the queue manager is responsible for
providing put and get operations for both the processor (consumer) and any
senders.  The QAlias class, in this case, wouldn't do much, but it would
complain if there were a pipeline actually defined here.  The final example
defines a processor that is locally addressed, processes locally, but might
also be remotely addressable.  For that matter, any of the QAlias examples
could be defining an alias to that processor, since we don't know from this
textual context what context.lookup("queue/myRoot") will instantiate.
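To make the aliasing idea a bit more concrete, here is a rough Java sketch.
None of these types exist in JAMES; MessageQueue, InMemoryQueue and
ProcessorRegistry are hypothetical names, and the in-memory queue merely
stands in for a JMS, JDBC or MQ backend.  The point is only that senders
address a local processor name, and that two names may resolve to the same
underlying queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: the minimal put/get a queue manager might expose.
interface MessageQueue<M> {
    void put(M message);
    M get(); // returns null when the queue is empty
}

// In-memory stand-in for a real backend (JMS, JDBC, MQ, ...).
class InMemoryQueue<M> implements MessageQueue<M> {
    private final Deque<M> entries = new ArrayDeque<>();

    public synchronized void put(M message) { entries.addLast(message); }

    public synchronized M get() { return entries.pollFirst(); }
}

// Maps locally scoped processor names to whatever queue backs them.
class ProcessorRegistry<M> {
    private final Map<String, MessageQueue<M>> queues = new HashMap<>();

    void bind(String processorName, MessageQueue<M> queue) {
        queues.put(processorName, queue);
    }

    // Senders use the local name only; topology and queue technology stay hidden.
    void put(String processorName, M message) {
        MessageQueue<M> queue = queues.get(processorName);
        if (queue == null) {
            throw new IllegalArgumentException("No such processor: " + processorName);
        }
        queue.put(message);
    }
}
```

Binding both "root" and "my-remote-processor" to the same queue instance
models the case where a QAlias turns out to be an alias for the root
processor.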

And a cool thing is that in an incoming message protocol handler, we could
have:

  <queue jndi="queue/myRoot"/>

and that would define the root processor for that particular message
handler.  You could have separate root processors for SMTP vs SMTPS, for
example.  This raises the question of what to do about
MailetContext.sendMail() within the pipeline, which starts at the implicit
root processor.  I see that as implementation-specific, and somewhat ripe
for discussion.

This is a bit primitive, albeit not dissimilar from MQSeries.  We can
improve upon it, e.g., by looking up queue managers -- implementation and
all -- from JNDI as shown, but I am trying to not assume that every
implementation will have JNDI or JMS or JDBC pervasive throughout the
system.

> >   - A queue entry would normally contain a JAMES Mail
> >     object.

No real change from what we have today.  Just identifying the players in the
architecture.

>   - Each processor [defines] a transaction.

This is a key concept.  We are supposed to behave this way, but we have
failure scenarios today because we do not have transactional behavior in
JAMES.  So I'm defining the transaction boundary.  The processor is the
transaction.  Either everything completes successfully or nothing does.  In
the event of a failure, the get operation rolls back so that the message is
available to be processed again.
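As a rough illustration of that boundary, the following Java sketch treats
one processor run as one transaction.  The names MessageProcessor and
TransactionalQueueManager are hypothetical, and the sketch only covers
rolling back the get; a real implementation would also have to undo any
puts the processor made before failing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical processor contract: throwing signals that the transaction failed.
interface MessageProcessor<M> {
    void process(M message) throws Exception;
}

// Minimal in-memory queue manager treating one processor run as one transaction.
class TransactionalQueueManager<M> {
    private final Deque<M> entries = new ArrayDeque<>();

    synchronized void put(M message) { entries.addLast(message); }

    synchronized int size() { return entries.size(); }

    // Returns true on commit, false on rollback or when the queue is empty.
    synchronized boolean processNext(MessageProcessor<M> processor) {
        M message = entries.pollFirst();   // the "get"
        if (message == null) {
            return false;
        }
        try {
            processor.process(message);
            return true;                   // commit: the message stays consumed
        } catch (Exception e) {
            entries.addFirst(message);     // rollback: the message is available again
            return false;
        }
    }
}
```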

>   - Each processor is associated with a queue manager
>     and, optionally, a retry schedule.

This takes what we had to do in RemoteDelivery, and generalizes it.  For
example, what happens if [clamd | DNS | spamd] is not available?  We can
queue up and wait for the service to become available.  Perhaps we might
want to add something to allow notification (think queue events for those of
you who know MQ), but the real issue is that every processor can be made
more reliable.

I am a bit surprised that this is an area that Stefano asked about, because
one of the earliest messages from him that I recall was about wanting
multiple spoolers because he wanted finer grained control over threads
available to specific processors.  Perhaps he is wondering why I didn't
express things as:

  <spoolmanager>
    <processor>
      ...
    </processor>
  </spoolmanager>

For one thing, the processor is more the mental focus for an administrator.
But in addition, the spoolmanager, at this level of discourse, would not
have multiple queues (and thus not multiple processors), unless we did
something like:

  <spoolmanager>
    <processor name="myprocessor">
      <queue binding="..."/>
      ...
    </processor>

    <processor name="anotherprocessor">
      <queue binding="..."/>
      ...
    </processor>
  </spoolmanager>

Which gets us back to what I expressed.  Recalling that processors are the
named targets, and therefore are what is logically attached to a queue, and
that the queue manager is the entity bridging the processor and the queue,
it seems to make the most sense to describe it as I have in the proposal.
But that's why we post these things for discussion.

And, yes, the queue manager would continue to be responsible for calling the
processor to handle each message.  Each queue manager would be registered
with the MailetContext, which would be provided to the processor in order to
allow it to put messages sent to a new processor (if we keep the current
Mailet API).  We might provide a suitable error or exception on the
Mail.setState call if we try to address a queue (processor) that does not
exist.

> >   - I believe that a queue implementation independent
> >     scheduler that provides the next time at which a
> >     message should be processed may be sufficient.
> >     Each queue entry would carry a timestamp before
> >     which it should not be processed.  "Restarting"
> >     the queue would be as simple as changing that
> >     timestamp entry.

We've often wanted a nice way to restart a message, and I've already
described the use of retrying for more than just RemoteDelivery.  I do feel
that even though the code providing the schedule can be independent of the
queue implementation, the query that finds the next eligible message
belongs with the spool manager, so that it can be optimized for the
underlying technology.
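A minimal Java sketch of the timestamp idea follows.  RetrySchedule,
ScheduledEntry and ScheduledQueue are hypothetical names, the delays are
arbitrary, and synchronization is omitted for brevity.  The linear scan in
get() is exactly the part that a real queue implementation would replace
with something suited to its backend, such as a WHERE clause in a JDBC
queue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical schedule: delay in milliseconds before each retry, -1 once exhausted.
class RetrySchedule {
    private final long[] delaysMillis;

    RetrySchedule(long... delaysMillis) { this.delaysMillis = delaysMillis.clone(); }

    long delayFor(int attempt) {
        return attempt < delaysMillis.length ? delaysMillis[attempt] : -1;
    }
}

// A queue entry carrying the timestamp before which it must not be processed.
class ScheduledEntry<M> {
    final M message;
    long notBefore;
    int attempts;

    ScheduledEntry(M message, long notBefore) {
        this.message = message;
        this.notBefore = notBefore;
    }
}

class ScheduledQueue<M> {
    private final List<ScheduledEntry<M>> entries = new ArrayList<>();

    void put(M message, long notBefore) {
        entries.add(new ScheduledEntry<>(message, notBefore));
    }

    // Removes and returns the first entry eligible at time "now", or null.
    ScheduledEntry<M> get(long now) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).notBefore <= now) {
                return entries.remove(i);
            }
        }
        return null;
    }

    // Re-queues a failed entry; returns false when the schedule is exhausted.
    boolean retry(ScheduledEntry<M> entry, RetrySchedule schedule, long now) {
        long delay = schedule.delayFor(entry.attempts);
        if (delay < 0) {
            return false;
        }
        entry.attempts++;
        entry.notBefore = now + delay;
        entries.add(entry);
        return true;
    }

    // "Restarting" the queue: make every waiting entry eligible immediately.
    void restartAll(long now) {
        for (ScheduledEntry<M> entry : entries) {
            entry.notBefore = Math.min(entry.notBefore, now);
        }
    }
}
```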

> >   - A new RETRY Mail state can be set to rollback the
> >     transaction and put the Mail back into the queue.
> >     We should decide on commit and rollback semantics.

> >   - The processor acquires a new attribute that explicitly
> >     sets the fall-through state.  The default shall be the
> >     new RETRY state, except for messages that exhaust the
> >     retry schedule.

This is just extending the current Mailet API semantics to allow the Mailet
to express the need for a scheduled retry, and it defaults to doing a RETRY
instead of a GHOST if we fall off the end of a processor, which seems safer.
Plus I made the fallthrough state configurable, which seems a nice little
win.  If we do express the operation differently, that's fine, too.
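One possible shape for the configurable fall-through, as a hedged Java
sketch: FallThroughProcessor is a hypothetical class, states are modeled as
plain strings, and each step stands in for a matcher/mailet pair that either
leaves the state alone or redirects the message.  The proposal does not say
what happens once the retry schedule is exhausted, so sending the message to
an "error" state here is purely an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class FallThroughProcessor {
    static final String RETRY = "retry"; // proposed new state; the literal is assumed
    static final String ERROR = "error"; // assumed destination once retries run out

    private final String name;
    private final String fallThroughState;
    private final List<UnaryOperator<String>> steps;

    FallThroughProcessor(String name, String fallThroughState,
                         List<UnaryOperator<String>> steps) {
        this.name = name;
        this.fallThroughState = fallThroughState;
        this.steps = new ArrayList<>(steps);
    }

    // Returns the state the message ends up in after this processor runs.
    String run(boolean retriesExhausted) {
        String state = name;
        for (UnaryOperator<String> step : steps) {
            state = step.apply(state);
            if (!name.equals(state)) {
                return state;            // a step redirected the message
            }
        }
        // The message fell off the end of the processor.
        if (RETRY.equals(fallThroughState) && retriesExhausted) {
            return ERROR;                // assumption, not part of the proposal
        }
        return fallThroughState;
    }
}
```

Configuring the fall-through state as "ghost" would reproduce the current
behavior.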

> > one might implement a processor as an MDB.

Actually, that is wrong.  The queue manager is responsible for taking things
off of the queue, and therefore the MDB would be part of that package.  The
processor should be independent of, and reusable with, any queue
implementation.

Oh, and if you really want to have some fun, consider that except where the
current API does refer to Mail (as in Mailet API and Mail object), nothing
in the above says anything about mail.  It is just about defining queues,
transactions, workflow and processing for messages.

So this is a bit more discussion of what I have in mind, and why.

        --- Noel

