On Sat, Jan 3, 2009 at 8:15 PM, Markus Wiederkehr <markus.wiederk...@gmail.com> wrote:
> On Wed, Dec 24, 2008 at 7:21 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
>> Folks
>>
>> I took liberty to commit an ultra-simple benchmark I use for testing
>> performance of the MIME stream parser.
>>
>> http://svn.apache.org/viewvc?view=rev&revision=729347
>>
>> Feel free to improve / extend / remove if useless.
>
> I have extended the class a bit. It is now possible to choose from
> four different tests.
>
> Test 0 is the one Oleg wrote. It reads from a MimeTokenStream until
> its end is reached.
> Test 1 uses a MimeStreamParser and reports to an empty AbstractContentHandler.
> Test 2 uses a MimeStreamParser and reports to an empty SimpleContentHandler.
> Test 3 creates Message objects in memory.
>
> On my machine the results are:
> Test 0: ~ 8 sec
> Test 1: ~ 8 sec
> Test 2: ~ 41 sec
> Test 3: ~ 47 sec
>
> So it looks like parsing the header fields consumes about 80 percent of
> test 2.
>
> The difference between #2 and 3 is probably caused by copying the
> message bodies into Storage objects.
>
> Maybe the header fields should be parsed lazily?
IIRC there are a few wrinkles with this (at least some headers need to be parsed, and some care needs to be taken with folded values) but i think only structural headers really need to be parsed on the first pass.

> Does anybody have a better idea?

(this one isn't really a better idea but it's a little different so i'll throw it out there and see what happens...)

the minimal useful MIME parser would read just the structural headers and the boundaries: dividing the stream into header lines and body parts without unnecessary parsing of the contents.

the generalised use case i have in mind is streaming into storage. this use case occurs naturally when dealing with mail protocols but has other applications (for example, in CMRs).

1. a MIME message starts to be delivered to a socket
2. the protocol processor feeds the stream to a parser
3. the processor analyzes the boundaries and streams header lines and body parts to permanent storage without unnecessary semantic parsing of the meta-data
4. when the message is complete, the processor continues to parse the incoming stream

one problem with full DOMs (as used by JavaMail) is that large MIME documents are too big to fit in memory. this causes problems for protocol servers. a structural DOM (maintaining at most meta-data in memory whilst allowing access to content through streams) backed by storage would be much more useful in this case.

- robert

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
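
To make the "structural only" idea above more concrete, here is a rough sketch in Python rather than Java. It is not mime4j code and does not use any mime4j API. The function names (read_header_lines, find_boundary, split_structure) and the event names are hypothetical, chosen only for illustration. It assumes a single-level multipart message, unfolds folded header lines, interprets only Content-Type (to find the boundary), and passes every other header line and all body data through untouched. As a simplification, the line break before a boundary is left attached to the body data.

```python
import io
import re


def read_header_lines(stream):
    """Read raw header lines up to the blank line, joining folded lines.

    Field values are NOT parsed here; each header stays an opaque byte string.
    """
    lines = []
    for raw in iter(stream.readline, b""):
        line = raw.rstrip(b"\r\n")
        if not line:
            break  # blank line ends the header block
        if line[:1] in (b" ", b"\t") and lines:
            # Folded continuation: append to the previous header line.
            lines[-1] += b" " + line.strip()
        else:
            lines.append(line)
    return lines


def find_boundary(header_lines):
    """Look only at the structural Content-Type header to find the boundary."""
    for line in header_lines:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-type":
            match = re.search(rb'boundary="?([^";]+)"?', value, re.IGNORECASE)
            if match:
                return match.group(1).strip()
    return None


def split_structure(stream):
    """Yield (event, data) pairs: header, start-part, body, end-part.

    Body data is yielded line by line so a caller can stream it to storage
    instead of holding a whole part in memory.
    """
    headers = read_header_lines(stream)
    for header in headers:
        yield ("header", header)

    boundary = find_boundary(headers)
    if boundary is None:
        # Not multipart: everything after the headers is one body.
        for raw in iter(stream.readline, b""):
            yield ("body", raw)
        return

    delimiter = b"--" + boundary
    closing = delimiter + b"--"
    in_part = False
    for raw in iter(stream.readline, b""):
        line = raw.rstrip(b"\r\n")
        if line == closing:
            if in_part:
                yield ("end-part", None)
            return
        if line == delimiter:
            if in_part:
                yield ("end-part", None)
            in_part = True
            yield ("start-part", None)
            for header in read_header_lines(stream):
                yield ("header", header)
        elif in_part:
            yield ("body", raw)
        # Lines before the first delimiter are preamble and are skipped.
```

A protocol processor could iterate over split_structure(socket_file) and write each "header" and "body" item straight to storage, keeping at most the current line in memory. Nested multiparts, the epilogue, and malformed input would all need more care than this sketch gives them.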