Re: [mime4j] newlines and parsing of nested (encoded) rfc822 messages

Oleg Kalnichevski Thu, 17 Jul 2008 10:16:09 -0700

Stefano Bagnara wrote:

Stefano Bagnara wrote:
I noticed that at some point in the past the EOLConvertingInputStream was removed from the chain.
I think this creates issues when we parse an input file having only \n and write it to output.
- It seems that most of the code parses only by checking for \n (what happens when there are only \r instead? what should we do?)

As far as I know a single CR is not used as a valid line delimiter anywhere. Please correct me if I am wrong.

- If the message has only newlines it seems mime4j ends up outputting headers with CRLF and body with LF.

Why is it a problem? Headers serve a specific role. They convey metadata about a content body. The transport aspects of metadata are irrelevant, whereas one _usually_ does not want a content body to go through a process of unnecessary transformation.

- If the input message has lines ending in CR, they are not considered by mime4j.
IMHO either we accept LF, CR, and CRLF as CRLF or we only accept CRLF.


I respectfully disagree.

If we do that we have to take care of encoded nested messages: they could again have LF, CR and CRLF like the top stream.
What is the right approach? Should we add an EOLConvertingInputStream (CONVERT_BOTH) to every level of parsing or should we fail to parse messages with bad newlines?
I don't like the current behaviour where we accept some malformed data (LF alone is treated as CRLF by our parser), we change some of it (the ones between headers are converted to CRLF) and we still output malformed data.
Opinions?
I tried this patch and it seems to work fine (even if it breaks one of our core tests that does not expect a CR in a header to be considered a newline):

Not only does this change completely revert the performance gains and make the whole refactoring exercise completely pointless due to an utterly inefficient implementation of EOLConvertingInputStream, it is also conceptually wrong (in my humble opinion), as it causes mime4j to corrupt 8bit encoded 'application/octet-stream' content. This basically renders mime4j incompatible with common browsers and HttpClient.
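
To make the corruption concern concrete, here is a minimal sketch. It does not use mime4j; normalizeEol is a hypothetical helper that applies the same kind of rule (lone CR or lone LF becomes CRLF) to a byte array. It only illustrates that 8bit binary content containing the bytes 0x0A or 0x0D does not survive such a conversion unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class EolCorruptionDemo {

    // Hypothetical helper (not part of mime4j): rewrites a lone CR or a
    // lone LF as CRLF and leaves existing CRLF pairs untouched.
    static byte[] normalizeEol(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 16);
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b == '\r') {
                out.write('\r');
                out.write('\n');
                // Skip the LF of an existing CRLF pair so it is not doubled.
                if (i + 1 < in.length && in[i + 1] == '\n') {
                    i++;
                }
            } else if (b == '\n') {
                out.write('\r');
                out.write('\n');
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Arbitrary binary payload that happens to contain 0x0A and 0x0D.
        byte[] binary = {0x00, 0x0A, (byte) 0xFF, 0x0D, 0x01};
        byte[] converted = normalizeEol(binary);
        System.out.println(binary.length + " bytes in, " + converted.length + " bytes out");
        System.out.println("unchanged: " + Arrays.equals(binary, converted));
    }
}
```

For this payload the converted array is two bytes longer than the input, so a binary body passed through such a filter no longer matches what the sender transmitted.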

If you commit this change could you please provide an option to exclude the EOLConvertingInputStream filter?
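
One possible shape for such an option, as a rough sketch only: the names wrap, convertEol and CrlfStream are hypothetical and are not mime4j API. CrlfStream is a stand-in filter written just for this example, and its byte-at-a-time reads are exactly the kind of overhead a real implementation would want to avoid.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class OptionalEolFilter {

    // Stand-in converting filter (NOT mime4j's class): lone CR or LF -> CRLF.
    static class CrlfStream extends FilterInputStream {
        private int pending = -1;   // byte queued for the next read(), or -1
        private boolean lastWasCr;  // true if the previous source byte was CR

        CrlfStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            if (pending != -1) {
                int queued = pending;
                pending = -1;
                return queued;
            }
            int b = in.read();
            if (b == '\n' && lastWasCr) {
                // LF of a CRLF pair: the pair was already emitted, skip it.
                lastWasCr = false;
                b = in.read();
            }
            lastWasCr = (b == '\r');
            if (b == '\r' || b == '\n') {
                pending = '\n';
                return '\r';
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int n = 0;
            while (n < len) {
                int b = read();
                if (b == -1) {
                    break;
                }
                buf[off + n++] = (byte) b;
            }
            return n == 0 ? -1 : n;
        }
    }

    // Hypothetical switch: callers that need byte-exact bodies pass false.
    static InputStream wrap(InputStream raw, boolean convertEol) {
        return convertEol ? new CrlfStream(raw) : raw;
    }

    // Reads a stream to the end, one byte at a time.
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {'a', '\n', 'b', '\r', 'c', '\r', '\n'};
        System.out.println(drain(wrap(new ByteArrayInputStream(data), false)).length);
        System.out.println(drain(wrap(new ByteArrayInputStream(data), true)).length);
    }
}
```

With the flag off the stream is returned as-is and the content is passed through byte for byte; with it on, every line ending comes out as CRLF.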


Thank you

Oleg

Index: src/main/java/org/apache/james/mime4j/MimeEntity.java
===================================================================
--- src/main/java/org/apache/james/mime4j/MimeEntity.java (revision 677582)
+++ src/main/java/org/apache/james/mime4j/MimeEntity.java    (working copy)
@@ -197,7 +197,7 @@
         InputStream instream;
         if (MimeUtil.isBase64Encoding(transferEncoding)) {
             log.debug("base64 encoded message/rfc822 detected");
-            instream = new Base64InputStream(dataStream);
+            instream = new EOLConvertingInputStream(new Base64InputStream(dataStream));
         } else if (MimeUtil.isQuotedPrintableEncoded(transferEncoding)) {
             log.debug("quoted-printable encoded message/rfc822 detected");
             instream = new QuotedPrintableInputStream(dataStream);
Index: src/main/java/org/apache/james/mime4j/MimeTokenStream.java
===================================================================
--- src/main/java/org/apache/james/mime4j/MimeTokenStream.java (revision 676846)
+++ src/main/java/org/apache/james/mime4j/MimeTokenStream.java (working copy)
@@ -143,7 +143,7 @@

     private void doParse(InputStream stream, String contentType) {
         entities.clear();
-        rootInputStream = new RootInputStream(stream);
+        rootInputStream = new RootInputStream(new EOLConvertingInputStream(stream));
         inbuffer = new BufferedLineReaderInputStream(rootInputStream, 4 * 1024);
         switch (recursionMode) {
         case M_RAW:


IIRC the EOLConvertingInputStream was removed because of performance issues.

Stefano

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


