Noel J. Bergman wrote:
> Robert Burrell Donkin wrote:
>
>> the disadvantage with using a byte array rather than a bytebuffer is
>> that direct bytebuffers would have to copy their data out into a byte
>> array. using a byte buffer at the lowest level would solve this issue
>> without really an added overhead for the bio case (just create a byte
>> array backed buffer and then fill that buffer from the inputstream).
>
> Given my background in real-time, embedded systems, I'd like to see us
> improving performance and doing a lot less movement of data. So I'm in
> favor of changes that reduce data movement.
>
> Here's a question for the lot of you: is this similar to DOM vs SAX, and if
> so, can we come up with a StAX solution? Just go with the analogy, but the
> issue is a best of both worlds.
>
> --- Noel
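Robert's point about the BIO case can be sketched quickly: a heap ByteBuffer is backed by a byte array, so filling it from an InputStream can read straight into that array with no extra copy. A minimal sketch (the class and method names are mine, just for illustration):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class BufferFill {

    // Fill an array-backed ByteBuffer from a blocking InputStream.
    // Because the buffer wraps a plain byte array, we can read directly
    // into the backing array: the BIO case costs no additional copy.
    // Returns the number of bytes read, or -1 on end of stream.
    static int fill(InputStream in, ByteBuffer buffer) throws IOException {
        int read = in.read(buffer.array(),
                           buffer.arrayOffset() + buffer.position(),
                           buffer.remaining());
        if (read > 0) {
            buffer.position(buffer.position() + read);
        }
        return read;
    }
}
```

A direct (off-heap) buffer has no backing array, which is exactly why an API whose lowest level takes a byte array would force a copy out of it, while a ByteBuffer-based lowest level handles both cases.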
I think that comparing this design issue to the existing XML parsing/manipulation APIs may help (at least it helps me, and I see you refer to it, too). Manipulating a MIME document is not so different from manipulating an XML document. The key here is that, as with XML parsers, there is no single best solution for every use case. If you need backward navigation then you can't use StAX or SAX; if you want to manipulate the XML at the source then your only option is DOM. If you are mainly interested in transformation then TrAX could be the best solution. StAX is the best of both worlds only if you need SAX features but want a simpler interface; with StAX you can't alter the original "document". As far as I understand, Robert's Cursor interface *is* the "StAX-like" solution. After all these years I think no API lets you do everything you can do via the DOM: the newer APIs simply removed features to give us much better performance.

Let's also take into consideration the main differences between "generic" XML documents and MIME documents: a MIME document will usually have a smaller structure tree, but bigger parts, than most XML documents. E.g., keeping a lazily loaded MIME skeleton in memory is feasible with small memory usage. Another difference is that most of the time XML parsers do not need to know what the exact source looked like, while we have identified use cases where the original text is very important (S/MIME). Can you see other big differences between XML and MIME in this comparison?

The problem is, IMHO, that we already have a lot of very different use cases for a MIME document "handling" library:

A. SMTPServer receiving a message: receive data asynchronously via NIO buffers and be able to create MIME events without blocking (push API). To avoid big memory usage we need streaming here, and a lot of processing should be done "on the fly" *while* the message is temporarily saved in the queue (anti-spam, content filtering, size limits, other).

B. IMAPServer sending data to a client: needs a fast way to know the message structure and to retrieve each part without streaming the whole message, while being able to stream (without blocking) a part (or the full message) to the client (SEDA).

C. Mailets altering messages: most mailets will probably want to alter the message in a DOM-like way: getParts, addAttachment, removeAttachment and similar things. Again, we should find a way to avoid loading big parts into memory when not needed. A few mailets may prefer a transformation API based on an InputStream/OutputStream solution.

The three above are use cases for our "parsing" needs, but we also have to take into consideration that there are two very different source "transports" for these documents:

1. "The Internet": a BIO or NIO connection streaming the data in our direction.

2. "The local storage": a file, a database record (an abstraction of file access), a JCR node (an abstraction of one of the previous).

The BIG difference between the two is that the former gives us a "one-shot" document (we can't go back in the document once we have received it), while the latter allows us to read it as many times as we like. Another big aspect of "transports" is BIO vs NIO, and this applies both to "the Internet" and to "the local storage".

The last aspect is synchronous vs asynchronous processing and the related patterns (thread pools, SEDA...). (E.g., the SAX style probably fits the SEDA scenario better than any other XML API.)

My conclusion is that we can add as much complexity as we want to this design discussion, but we'll probably never find a single API for all of the use cases we can think of, and we probably can't even find a single API that is the best one for our *current* use cases.
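To make the StAX analogy concrete: a pull-style MIME cursor would let the consumer ask for the next event instead of receiving callbacks (SAX/push) or a fully built tree (DOM). This is only a rough illustration of the idea; every name below is hypothetical, not the actual proposed Cursor interface:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a StAX-like cursor over MIME parse events.
// The caller pulls events one at a time and can simply stop (or skip
// ahead) when it has seen enough, e.g. after reading the structure
// without touching part bodies.
public class MimeCursorSketch {

    enum Event { START_MESSAGE, START_HEADER, FIELD, END_HEADER, BODY, END_MESSAGE }

    static class Cursor {
        private final Iterator<Event> events;
        private Event current;

        Cursor(List<Event> events) {
            this.events = events.iterator();
        }

        boolean hasNext() {
            return events.hasNext();
        }

        // Advance to the next event and return it (pull model: the
        // consumer drives parsing, unlike SAX callbacks).
        Event next() {
            current = events.next();
            return current;
        }

        Event current() {
            return current;
        }
    }
}
```

The pull model matches the IMAP use case above reasonably well (walk the structure, skip bodies), but, like StAX, it cannot alter the original document, which is why the mailet use case still pulls toward something DOM-like.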
Maybe the mistake is trying to create one API to accomplish tasks as different as removing/adding an attachment to a MIME document stored in a local random-access file, fast-failing when an incoming SMTP MIME message has an attachment bigger than 1MB, and converting an 8bit message to 7bit and vice versa. Maybe it is better to create separate APIs and simply reuse code under the hood.

Well, this was much more than my 2 cents,

Stefano

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
