On Sat, Jan 3, 2009 at 8:15 PM, Markus Wiederkehr
<markus.wiederk...@gmail.com> wrote:
> On Wed, Dec 24, 2008 at 7:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
>> Folks
>>
>> I took liberty to commit an ultra-simple benchmark I use for testing
>> performance of the MIME stream parser.
>>
>> http://svn.apache.org/viewvc?view=rev&revision=729347
>>
>> Feel free to improve / extend / remove if useless.
>
> I have extended the class a bit. It is now possible to choose from
> four different tests.
>
> Test 0 is the one Oleg wrote. It reads from a MimeTokenStream until
> its end is reached.
> Test 1 uses a MimeStreamParser and reports to an empty AbstractContentHandler.
> Test 2 uses a MimeStreamParser and reports to an empty SimpleContentHandler.
> Test 3 creates Message objects in memory.
>
> On my machine the results are:
> Test 0: ~ 8 sec
> Test 1: ~ 8 sec
> Test 2: ~ 41 sec
> Test 3: ~ 47 sec
>
> So it looks like parsing the header fields consumes about 80 percent of test 2
> (41 sec versus 8 sec for test 1).
>
> The difference between #2 and 3 is probably caused by copying the
> message bodies into Storage objects.
>
> Maybe the header fields should be parsed lazily?

IIRC there are a few wrinkles with this (at least some headers need to be
parsed, and some care needs to be taken with folded values) but i think
only structural headers really need to be parsed on the first pass.
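A minimal sketch of such a first pass, assuming the only structural value needed is the multipart boundary from Content-Type (a real parser would probably also want Content-Transfer-Encoding). `FirstPassHeader` is a hypothetical name, not mime4j API. Folded lines are grouped into fields, but only Content-Type is unfolded and inspected; every other field stays raw so it can be parsed lazily later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical first-pass header scan: group folded lines, extract only the boundary. */
final class FirstPassHeader {
    // Simplified parameter match: boundary="..." or boundary=token (not full RFC 2045 syntax)
    private static final Pattern BOUNDARY = Pattern.compile(
        "boundary\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    /** Every field in its raw, still-folded form, kept for lazy parsing later. */
    final List<String> rawFields = new ArrayList<>();
    /** The only structural value extracted on this pass; null if not multipart. */
    String boundary;

    FirstPassHeader(List<String> headerLines) {
        StringBuilder current = null;
        for (String line : headerLines) {
            // A line starting with space or tab continues the previous field
            boolean continuation = !line.isEmpty()
                && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && current != null) {
                current.append("\r\n").append(line); // keep the fold in the raw text
            } else {
                if (current != null) {
                    rawFields.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            rawFields.add(current.toString());
        }
        for (String field : rawFields) {
            if (field.toLowerCase(Locale.ROOT).startsWith("content-type:")) {
                // Unfold (drop CRLF before whitespace) only for this structural field
                String unfolded = field.replaceAll("\\r\\n(?=[ \\t])", "");
                Matcher m = BOUNDARY.matcher(unfolded);
                if (m.find()) {
                    boundary = m.group(1) != null ? m.group(1) : m.group(2);
                }
            }
        }
    }
}
```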

> Does anybody have a better idea?

(this one isn't really a better idea but it's a little different so
i'll throw it out there and see what happens...)

the minimal useful MIME parser would read just the structural headers
and the boundaries: dividing the stream into header lines and body
parts without unnecessary parsing of the contents.
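A rough sketch of that minimal parser for one multipart level, assuming the boundary is already known (for example from a first pass over Content-Type). `BoundarySplitter` is a hypothetical helper, not mime4j API. For brevity it works on text lines rather than bytes, does not recurse into nested multiparts, and does not handle transport padding (trailing whitespace after a boundary).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical structural splitter: recognises only delimiter lines and the
 * blank line ending each part's header; header and body text are not interpreted.
 */
final class BoundarySplitter {

    static final class Part {
        final List<String> headerLines = new ArrayList<>();
        final List<String> bodyLines = new ArrayList<>();
    }

    static List<Part> split(Reader in, String boundary) throws IOException {
        String delimiter = "--" + boundary;
        String closeDelimiter = delimiter + "--";
        List<Part> parts = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        Part current = null;
        boolean inHeader = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.equals(closeDelimiter)) {
                break; // the epilogue after the close delimiter is ignored
            }
            if (line.equals(delimiter)) {
                current = new Part();
                parts.add(current);
                inHeader = true;
                continue;
            }
            if (current == null) {
                continue; // preamble before the first delimiter
            }
            if (inHeader) {
                if (line.isEmpty()) {
                    inHeader = false; // blank line separates header from body
                } else {
                    current.headerLines.add(line);
                }
            } else {
                current.bodyLines.add(line);
            }
        }
        return parts;
    }
}
```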

the generalised use case i have in mind is streaming into storage.
this use case occurs naturally when dealing with mail protocols but
has other applications (for example, in CMRs).

1. a MIME message starts to be delivered to a socket
2. the protocol processor feeds the stream to a parser
3. the processor analyzes the boundaries and streams header lines and body
parts to permanent storage without unnecessary semantic parsing of the
meta-data
4. when the message is complete, the processor continues to parse the
incoming stream
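Steps 2 and 3 might look roughly like the sketch below, assuming a single-level multipart with a known boundary and a caller-supplied storage factory; the `Function<Integer, OutputStream>` parameter is a hypothetical stand-in for whatever storage API a server actually uses, and `StreamingStore` is likewise a made-up name. Each part's header lines and body go straight to storage as they arrive, so only one line is held in memory at a time. ISO-8859-1 is used because it maps every byte to exactly one char, so arbitrary bytes survive the round trip; line endings are normalised to CRLF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical streaming store: copies each part into storage as lines arrive. */
final class StreamingStore {

    /** Returns the number of parts written; storage supplies one stream per part index. */
    static int store(InputStream in, String boundary,
                     Function<Integer, OutputStream> storage) throws IOException {
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        String delimiter = "--" + boundary;
        String closeDelimiter = delimiter + "--";
        Writer out = null;          // stream for the part currently being stored
        boolean firstLine = true;   // the CRLF before a delimiter belongs to the delimiter
        int parts = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            boolean isDelimiter = line.equals(delimiter);
            boolean isClose = line.equals(closeDelimiter);
            if (isDelimiter || isClose) {
                if (out != null) {
                    out.close(); // flush the finished part to storage
                    out = null;
                }
                if (isClose) {
                    break; // epilogue is ignored
                }
                out = new OutputStreamWriter(storage.apply(parts), StandardCharsets.ISO_8859_1);
                parts++;
                firstLine = true;
                continue;
            }
            if (out != null) {       // preamble lines (out == null) are skipped
                if (!firstLine) {
                    out.write("\r\n");
                }
                out.write(line);
                firstLine = false;
            }
        }
        if (out != null) {
            out.close(); // stream ended without a close delimiter
        }
        return parts;
    }
}
```

One caveat for step 4: `BufferedReader` (and `InputStreamReader`) read ahead, so after `store` returns the underlying socket stream may already be positioned past the end of the message. A real protocol processor would need a reader that stops exactly at the close delimiter.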

one problem with full DOMs (as used by JavaMail) is that large MIME
documents are too big to fit in memory. this causes problems for
protocol servers. a structural DOM (maintaining at most meta-data in
memory whilst allowing access to content through streams) backed by
storage would be much more useful in this case.
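A structural DOM along those lines could be as small as the sketch below, assuming bodies are reachable through some storage handle. `ContentRef`, `StructuralEntity` and `BytesRef` are hypothetical names, not JavaMail or mime4j types; `BytesRef` keeps bytes in memory only so the example runs on its own, whereas a server would back `ContentRef` with files or another store.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical storage handle: the content itself lives outside the tree. */
interface ContentRef {
    InputStream open() throws IOException; // each call returns a fresh stream
    long size();
}

/** Hypothetical structural node: headers and tree shape in memory, bodies in storage. */
final class StructuralEntity {
    final List<String> rawHeaderFields;
    final List<StructuralEntity> children = new ArrayList<>();
    final ContentRef body; // null for multipart containers, whose content is the children

    StructuralEntity(List<String> rawHeaderFields, ContentRef body) {
        this.rawHeaderFields = Collections.unmodifiableList(new ArrayList<>(rawHeaderFields));
        this.body = body;
    }

    /** Total stored body bytes in this subtree, computed without reading any content. */
    long totalBodySize() {
        long total = body == null ? 0 : body.size();
        for (StructuralEntity child : children) {
            total += child.totalBodySize();
        }
        return total;
    }
}

/** In-memory stand-in so the sketch is self-contained; not meant for large content. */
final class BytesRef implements ContentRef {
    private final byte[] data;

    BytesRef(byte[] data) { this.data = data.clone(); }

    public InputStream open() { return new ByteArrayInputStream(data); }

    public long size() { return data.length; }
}
```

The design choice here is that structural questions (how many parts, which headers, how big) are answered from memory, while `open()` is the only path to content, so memory use grows with the number of entities rather than with the size of the message.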

- robert

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org