Lenient dealing with headless messages or malformed header/body separation
--------------------------------------------------------------------------

                 Key: MIME4J-58
                 URL: https://issues.apache.org/jira/browse/MIME4J-58
             Project: Mime4j
          Issue Type: Task
    Affects Versions: 0.3
            Reporter: Stefano Bagnara
             Fix For: 0.5


Define how to deal with non-canonical messages like this one:
-----------------------
This is a simple message not having headers.
The whole text should be recognized as body.
-----------------------
or this one:
-----------------------
Subject: this is a subject
This is an invalid header
AnotherHeader: is this an header or the first part of the body?

Body text
-----------------------

In the first case mime4j reports an "invalid header" error twice, and a 
round-trip write results in an empty message.
This is unfortunate for SMTP, because messages are sometimes sent without 
any headers.

In the second case mime4j currently takes Subject and AnotherHeader as headers, 
"This is an invalid header" triggers an "invalid header" monitor event, and 
"Body text" is considered the body.
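
The behaviour described for the two cases can be modelled roughly as follows.
This is only a sketch consistent with the description above, not mime4j's
actual code: the class name CurrentBehaviourModel and its fields are
hypothetical, the colon test is a simplification, and folded (continuation)
header lines are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behaviour described above (not mime4j code).
final class CurrentBehaviourModel {
    final List<String> headers = new ArrayList<>();
    final List<String> invalid = new ArrayList<>(); // lines reported as "invalid header"
    final List<String> body = new ArrayList<>();

    static CurrentBehaviourModel scan(List<String> lines) {
        CurrentBehaviourModel r = new CurrentBehaviourModel();
        int i = 0;
        // Everything up to the first empty line is treated as header area.
        while (i < lines.size() && !lines.get(i).isEmpty()) {
            String line = lines.get(i++);
            if (line.indexOf(':') > 0) {
                r.headers.add(line);   // looks like "Name: value"
            } else {
                r.invalid.add(line);   // reported, then skipped
            }
        }
        // Skip the empty separator line, if there is one.
        if (i < lines.size()) {
            i++;
        }
        r.body.addAll(lines.subList(i, lines.size()));
        return r;
    }
}
```

Under this model the headless message yields two invalid lines, no headers
and an empty body, which matches the two errors and the empty round-trip
result mentioned above.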

A compromise between compliance, leniency and performance that we evaluated in 
the past was to replace the requirement for CRLFCRLF between headers and body 
with a different rule: if, while parsing the headers, we find a line that is 
not a continuation (folded) line and does not have the form 
"HeaderName: something", we virtually insert a CRLF *before* that line and 
treat it as the first line of the body. This lets us buffer only a single line 
(as opposed to scanning the whole message for a CRLFCRLF and treating the full 
message as body if none is found) while being very lenient with input. The 
"side effect" (maybe not a bad one) is that an invalid header in the middle of 
the headers results in all the headers after it being moved to the body.

With this algorithm the messages above would be "virtually" parsed as if they were:
-----------------------

This is a simple message not having headers.
The whole text should be recognized as body.
-----------------------
or this one:
-----------------------
Subject: this is a subject

This is an invalid header
AnotherHeader: is this an header or the first part of the body?

Body text
-----------------------
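
A minimal sketch of the proposed lenient rule, to make the single-line
lookahead concrete. All names here (LenientHeaderSplit, Split,
looksLikeHeader, isContinuation) are hypothetical and not part of mime4j.
The header test assumes a field name made of printable ASCII characters
other than space and colon, followed by a colon. Input is assumed to be
already split into lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the lenient header/body split (not mime4j code).
final class LenientHeaderSplit {

    static final class Split {
        final List<String> headers = new ArrayList<>();
        final List<String> body = new ArrayList<>();
    }

    // True if the line has the form "HeaderName: something".
    static boolean looksLikeHeader(String line) {
        int colon = line.indexOf(':');
        if (colon <= 0) {
            return false;
        }
        for (int i = 0; i < colon; i++) {
            char c = line.charAt(i);
            if (c < 33 || c > 126) {
                return false; // space or control character in the name
            }
        }
        return true;
    }

    // True if the line starts with a space or tab (folded header line).
    static boolean isContinuation(String line) {
        return !line.isEmpty() && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
    }

    static Split split(List<String> lines) {
        Split result = new Split();
        int i = 0;
        for (; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                i++;   // real separator: consume it, the body starts after it
                break;
            }
            boolean folded = isContinuation(line) && !result.headers.isEmpty();
            if (!folded && !looksLikeHeader(line)) {
                break; // virtual separator: this line becomes the first body line
            }
            result.headers.add(line);
        }
        result.body.addAll(lines.subList(i, lines.size()));
        return result;
    }
}
```

Only the current line is inspected at each step, so nothing beyond a single
line needs to be buffered. For the second message, split() keeps only the
Subject line as a header and puts the remaining four lines, including the
empty one, in the body.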

If we think in terms of strict and lenient approaches, I think the current 
mime4j result is OK for strict parsing, while the one I propose is a good 
lenient alternative.

Opinions? Alternatives?


