Noel J. Bergman wrote:
> Robert Burrell Donkin wrote:
>
>> the disadvantage with using a byte array rather than a bytebuffer is
>> that direct bytebuffers would have to copy their data out into a byte
>> array. using a byte buffer at the lowest level would solve this issue
>> without really an added overhead for the bio case (just create a byte
>> array backed buffer and then fill that buffer from the inputstream).
>
> Given my background in real-time, embedded systems, I'd like to see us
> improving performance and doing a lot less movement of data. So I'm in
> favor of changes that reduce data movement.
>
> Here's a question for the lot of you: is this similar to DOM vs SAX, and if
> so, can we come up with a StAX solution? Just go with the analogy, but the
> issue is a best of both worlds.
>
> --- Noel
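Robert's point about the BIO case can be sketched quickly: a heap ByteBuffer is backed by a byte array, so filling it from an InputStream can read straight into that array with no extra copy. A minimal sketch (the class and method names are mine, just for illustration):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class BufferFill {

    // Fill an array-backed ByteBuffer from a blocking InputStream.
    // Because the buffer wraps a plain byte array, we can read directly
    // into the backing array: the BIO case costs no additional copy.
    // Returns the number of bytes read, or -1 on end of stream.
    static int fill(InputStream in, ByteBuffer buffer) throws IOException {
        int read = in.read(buffer.array(),
                           buffer.arrayOffset() + buffer.position(),
                           buffer.remaining());
        if (read > 0) {
            buffer.position(buffer.position() + read);
        }
        return read;
    }
}
```

A direct (off-heap) buffer has no backing array, which is exactly why an API whose lowest level takes a byte array would force a copy out of it, while a ByteBuffer-based lowest level handles both cases.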
I think that comparing this design issue to the existing XML parsing/manipulation APIs may help (at least it helps me, and I see you refer to it, too). Manipulating a MIME document is not so different from manipulating an XML document. The key here is that, as with XML parsers, there is no single best solution for every use case. If you need backward navigation then you can't use StAX or SAX; if you want to manipulate the XML at the source then your only option is DOM. If you are mainly interested in transformation then TrAX could be the best solution. StAX is the best of both worlds only if you need SAX features but want a simpler interface; with StAX you can't alter the original "document". As far as I understand, Robert's Cursor interface *is* the "StAX-like" solution. After all these years I think no API lets you do everything you can do via the DOM: the newer APIs simply removed features to give us much better performance.

Let's also take into consideration the main differences between "generic" XML documents and MIME documents: a MIME document will usually have a smaller structure tree, but bigger parts, than most XML documents. E.g., keeping a lazily loaded MIME skeleton in memory is feasible with small memory usage. Another difference is that most of the time XML parsers do not need to know what the exact source looked like, while we have identified use cases where the original text is very important (S/MIME). Can you see other big differences between XML and MIME in this comparison?

The problem is, IMHO, that we already have a lot of very different use cases for a MIME document "handling" library:

A. SMTPServer receiving a message: receive data asynchronously via NIO buffers and be able to create MIME events without blocking (push API). To avoid big memory usage we need streaming here, and a lot of processing should be done "on the fly" *while* the message is temporarily saved in the queue (anti-spam, content filtering, size limits, other).

B. IMAPServer sending data to a client: needs a fast way to know the message structure and to retrieve each part without streaming the whole message, while being able to stream (without blocking) a part (or the full message) to the client (SEDA).

C. Mailets altering messages: most mailets will probably want to alter the message in a DOM-like way: getParts, addAttachment, removeAttachment and similar things. Again, we should find a way to avoid loading big parts into memory when not needed. A few mailets may prefer a transformation API based on an InputStream/OutputStream solution.

The three above are use cases for our "parsing" needs, but we also have to take into consideration that there are two very different source "transports" for these documents:

1. "The Internet": a BIO or NIO connection streaming the data in our direction.

2. "The local storage": a file, a database record (an abstraction of file access), a JCR node (an abstraction of one of the previous).

The BIG difference between the two is that the former gives us a "one-shot" document (we can't go back in the document once we have received it), while the latter allows us to read it as many times as we like. Another big aspect of "transports" is BIO vs NIO, and this applies both to "the Internet" and to "the local storage".

The last aspect is synchronous vs asynchronous processing and the related patterns (thread pools, SEDA...). (E.g., the SAX style probably fits the SEDA scenario better than any other XML API.)

My conclusion is that we can add as much complexity as we want to this design discussion, but we'll probably never find a single API for all of the use cases we can think of, and we probably can't even find a single API that is the best one for our *current* use cases.
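To make the StAX analogy concrete: a pull-style MIME cursor would let the consumer ask for the next event instead of receiving callbacks (SAX/push) or a fully built tree (DOM). This is only a rough illustration of the idea; every name below is hypothetical, not the actual proposed Cursor interface:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a StAX-like cursor over MIME parse events.
// The caller pulls events one at a time and can simply stop (or skip
// ahead) when it has seen enough, e.g. after reading the structure
// without touching part bodies.
public class MimeCursorSketch {

    enum Event { START_MESSAGE, START_HEADER, FIELD, END_HEADER, BODY, END_MESSAGE }

    static class Cursor {
        private final Iterator<Event> events;
        private Event current;

        Cursor(List<Event> events) {
            this.events = events.iterator();
        }

        boolean hasNext() {
            return events.hasNext();
        }

        // Advance to the next event and return it (pull model: the
        // consumer drives parsing, unlike SAX callbacks).
        Event next() {
            current = events.next();
            return current;
        }

        Event current() {
            return current;
        }
    }
}
```

The pull model matches the IMAP use case above reasonably well (walk the structure, skip bodies), but, like StAX, it cannot alter the original document, which is why the mailet use case still pulls toward something DOM-like.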
Maybe the mistake is trying to create one API to accomplish tasks as different as removing/adding an attachment to a MIME document stored in a local random-access file, fast-failing when an incoming SMTP MIME message has an attachment bigger than 1MB, and converting an 8bit message to 7bit and vice versa. Maybe it is better to create separate APIs and simply reuse code under the hood.

Well, this was much more than my 2 cents,

Stefano

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
