Re: [mime4j] newlines and parsing of nested (encoded) rfc822 messages

Stefano Bagnara Fri, 18 Jul 2008 05:45:57 -0700

Oleg Kalnichevski wrote:

On Fri, 2008-07-18 at 10:58 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Thu, 2008-07-17 at 20:21 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
Stefano Bagnara wrote:
...
E.g.: I'm slowly working toward a possible proposal about parsing.
- strict mode: no conversion is done; a CR or LF in headers (or other non-7bit content) makes mime4j fail parsing.
- permissive modes:
  - default binary: no conversion happens; isolated CR and LF are accepted everywhere but are not considered newlines (just like other 8bit bytes), and the default content-transfer-encoding is "binary" when not specified (7bit, 8bit and binary are read as binary).
  - default text: we convert isolated CR and LF to CRLF almost everywhere except in "binary" content-transfer-encoding parts.

I'm not proposing this yet (I'm not sure this is enough, or whether we will need more granular tweaking), but it is something I'm evaluating right now... The strict mode is desirable to have, but less important than permissive parsing (we want to be strict in output, not in input). OTOH someone may want to use mime4j to validate whether content is well-formed (with respect to the RFCs), and in that case a strict mode would be necessary.
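The three modes above can be sketched as a single byte-level pass over raw message data. This is a hypothetical illustration, not mime4j code: the names ParsingMode, normalize and the binaryPart flag are made up for the sketch, and STRICT treats its input as header bytes, so non-7bit bytes are rejected as well.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the three proposed modes; not part of the mime4j API.
public class ParsingModesSketch {

    enum ParsingMode { STRICT, DEFAULT_BINARY, DEFAULT_TEXT }

    /**
     * Handles line delimiters in raw bytes according to the mode.
     * binaryPart: true when the part declares Content-Transfer-Encoding: binary.
     */
    static byte[] normalize(byte[] in, ParsingMode mode, boolean binaryPart) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;
            if (b == '\r' && i + 1 < in.length && in[i + 1] == '\n') {
                out.write('\r'); // a proper CRLF is a newline in every mode
                out.write('\n');
                i++;             // the LF of the pair is consumed too
                continue;
            }
            boolean isolated = b == '\r' || b == '\n';
            if (mode == ParsingMode.STRICT && (isolated || b > 0x7F)) {
                // strict (header bytes assumed): isolated CR/LF and non-7bit bytes fail
                throw new IllegalArgumentException(
                        "invalid byte 0x" + Integer.toHexString(b) + " at offset " + i);
            }
            if (isolated && mode == ParsingMode.DEFAULT_TEXT && !binaryPart) {
                out.write('\r'); // default text: an isolated CR or LF becomes CRLF
                out.write('\n');
            } else {
                out.write(b);    // default binary, or a binary part: kept as a data byte
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "Subject: a\nb\r\n".getBytes(StandardCharsets.US_ASCII);
        for (ParsingMode mode : new ParsingMode[] {ParsingMode.DEFAULT_BINARY, ParsingMode.DEFAULT_TEXT}) {
            String result = new String(normalize(raw, mode, false), StandardCharsets.US_ASCII);
            System.out.println(mode + ": " + result.replace("\r", "\\r").replace("\n", "\\n"));
        }
    }
}
```

The lone LF after "Subject: a" survives untouched under DEFAULT_BINARY and becomes a CRLF under DEFAULT_TEXT; under STRICT the same input is rejected.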
Stefano
Stefano,

With all due respect, I see strict handling of line delimiters as
_pointless_ orthodoxy that really does not help anyone. Would you really
ship an application to a client of yours that rejects a message as
invalid because it contains a lone LF in it? So what is the _point_ of
being strict about line delimiters?

As I said, the strict mode would only be useful to users of mime4j who want to use it as a validator to check RFC compliance. You know, mime4j was born for SMTP, but now you need it for HTTP, and someone else may want to write a validator. So let's not keep our eyes closed once again.

Anyways, let's talk code now. How about this?

(1)

import org.apache.james.mime4j.MimeException;

interface LineDelimiterStrategy {

 // int rather than char: a Java char cannot hold -1
 boolean isNewLine(int ch1, int ch2) // both can be -1
        throws MimeException;

}
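One way to read the proposed interface is sketched below. The mail does not say what the two parameters mean, so this sketch assumes ch1 is the previous character and ch2 the current one (-1 meaning no character), and that isNewLine answers whether a line ends at ch2. The three strategies, and the local MimeException standing in for mime4j's class so the file compiles on its own, are hypothetical.

```java
// Stand-in for mime4j's MimeException so this sketch compiles on its own.
class MimeException extends Exception {
    MimeException(String message) { super(message); }
}

// The proposed interface with int parameters, since a Java char cannot be -1.
// Assumed contract: ch1 = previous character, ch2 = current character,
// -1 = no character; returns true if a line ends at ch2.
interface LineDelimiterStrategy {
    boolean isNewLine(int ch1, int ch2) throws MimeException;
}

public class LineDelimiterStrategies {

    // "default binary": only CRLF ends a line; isolated CR and LF are data.
    static final LineDelimiterStrategy CRLF_ONLY =
            (prev, cur) -> prev == '\r' && cur == '\n';

    // Tolerant: CRLF or a lone LF ends a line; a lone CR stays data.
    static final LineDelimiterStrategy LF_TOLERANT =
            (prev, cur) -> cur == '\n';

    // Strict: only CRLF ends a line; any isolated CR or LF is an error.
    static final LineDelimiterStrategy STRICT = (prev, cur) -> {
        if (cur == '\n' && prev != '\r') {
            throw new MimeException("isolated LF");
        }
        if (prev == '\r' && cur != '\n') {
            throw new MimeException("isolated CR"); // also a CR just before end of input (-1)
        }
        return prev == '\r' && cur == '\n';
    };

    public static void main(String[] args) {
        LineDelimiterStrategy[] strategies = {CRLF_ONLY, LF_TOLERANT, STRICT};
        String[] names = {"CRLF_ONLY", "LF_TOLERANT", "STRICT"};
        for (int s = 0; s < strategies.length; s++) {
            try {
                // a lone LF following ordinary text
                System.out.println(names[s] + ": " + strategies[s].isNewLine('a', '\n'));
            } catch (MimeException e) {
                System.out.println(names[s] + ": rejected (" + e.getMessage() + ")");
            }
        }
    }
}
```

Under this assumed contract, treating a lone CR as a newline would mean reporting the line end one character late (at the character after the CR), which every caller would have to compensate for; that is one reason the meaning of the two parameters matters.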

One can provide MimeTokenStream with an implementation of this interface
at construction time. MimeTokenStream in turn passes a
reference to that instance to all parser components that need to deal with
line delimiters.
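The wiring described here, where the strategy is chosen once and handed to every component that needs it, might look like the following. LineSplitter is a hypothetical stand-in for a mime4j parser component; the interface is redeclared with int parameters and the same assumed contract (ch1 = previous character, ch2 = current one, true if a line ends at ch2), and MimeException is again a local stand-in.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for mime4j's MimeException so the sketch compiles on its own.
class MimeException extends Exception {
    MimeException(String message) { super(message); }
}

// Assumed contract: ch1 = previous char, ch2 = current char (-1 = none);
// returns true if a line ends at ch2.
interface LineDelimiterStrategy {
    boolean isNewLine(int ch1, int ch2) throws MimeException;
}

// Hypothetical parser component: it never decides what a newline is itself,
// it asks the strategy it received at construction time.
class LineSplitter {
    private final LineDelimiterStrategy strategy;

    LineSplitter(LineDelimiterStrategy strategy) {
        this.strategy = strategy;
    }

    // Splits input into lines; each returned line keeps its delimiter characters.
    List<String> split(String input) throws MimeException {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int prev = -1;
        for (int i = 0; i <= input.length(); i++) {
            int cur = i < input.length() ? input.charAt(i) : -1; // -1 = end of input
            // always consult the strategy (a strict one may reject a trailing CR),
            // but only split when there is a real character to end the line on
            if (strategy.isNewLine(prev, cur) && cur != -1) {
                lines.add(input.substring(start, i + 1));
                start = i + 1;
            }
            prev = cur;
        }
        if (start < input.length()) {
            lines.add(input.substring(start)); // trailing text without a delimiter
        }
        return lines;
    }
}

public class TokenStreamSketch {
    public static void main(String[] args) throws MimeException {
        // The "token stream" picks one strategy and passes it to its components.
        LineDelimiterStrategy lfTolerant = (prev, cur) -> cur == '\n';
        LineSplitter splitter = new LineSplitter(lfTolerant);
        System.out.println(splitter.split("To: a\nSubject: b\r\n\r\nbody").size()); // 4
    }
}
```

Because the interface has a single method, a strategy can be passed as a lambda, as main does, and swapping strict for permissive behaviour touches only the line that builds the strategy.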

I'm not sure I understand what the two parameters passed to isNewLine are, and what code will invoke this service.

(2) The issue of CR / LF handling in content bodies should be taken care of
when formatting output, _not_ when parsing input.
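Point (2) amounts to keeping body bytes exactly as parsed and canonicalizing line endings only when the message is written out. A minimal sketch, assuming that canonical output means CRLF everywhere: CrlfOutputStream and canonicalize are hypothetical names, built on the JDK's java.io.FilterOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical output filter: body bytes are kept as parsed, and any lone CR
// or lone LF is turned into CRLF only while the message is being written.
class CrlfOutputStream extends FilterOutputStream {
    private boolean lastWasCr = false;

    CrlfOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        int c = b & 0xFF;
        if (c == '\n') {
            if (!lastWasCr) {
                out.write('\r'); // lone LF: add the missing CR
            }
            out.write('\n');
            lastWasCr = false;
            return;
        }
        if (lastWasCr) {
            out.write('\n');     // the previous CR was not followed by LF
        }
        out.write(c);
        lastWasCr = c == '\r';
    }

    @Override
    public void close() throws IOException {
        if (lastWasCr) {
            out.write('\n');     // the stream ended right after a lone CR
            lastWasCr = false;
        }
        super.close();
    }
}

public class CanonicalOutputSketch {

    static byte[] canonicalize(byte[] body) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CrlfOutputStream out = new CrlfOutputStream(sink)) {
            out.write(body); // FilterOutputStream routes each byte through write(int)
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] parsed = "a\nb\rc\r\n".getBytes(StandardCharsets.US_ASCII);
        String written = new String(canonicalize(parsed), StandardCharsets.US_ASCII);
        System.out.println(written.replace("\r", "\\r").replace("\n", "\\n")); // a\r\nb\r\nc\r\n
    }
}
```

This only covers bodies: whether a header was split into lines at a lone LF has already been decided by the parser before anything reaches the output stream.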

Would that work for you?


I'm not sure this is enough.

In output we format what we parsed: if we parsed the input as multiple lines then we output multiple lines, otherwise we output a single line. So it is during parsing that we have to decide whether an isolated LF is a newline delimiter or not.

This issue is closely related to charsets: when you read content you have to deal with its charset during parsing; you cannot do that while formatting output. So if you find something invalid for that charset, you have to deal with it during parsing, not during output formatting.
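The charset analogy maps directly onto the JDK's java.nio decoder API: whether malformed bytes are rejected or replaced is configured on the CharsetDecoder before decoding, i.e. at parse time, and once bytes have become characters the original sequence is gone. The sketch uses only JDK classes; decode and its strict flag are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ParseTimeDecodingSketch {

    // Strict mode reports malformed input; lenient mode substitutes U+FFFD.
    static String decode(byte[] raw, Charset charset, boolean strict) throws CharacterCodingException {
        CodingErrorAction action = strict ? CodingErrorAction.REPORT : CodingErrorAction.REPLACE;
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9}; // Latin-1 "café", not valid UTF-8
        System.out.println(decode(raw, StandardCharsets.UTF_8, false)); // ends with U+FFFD
        try {
            decode(raw, StandardCharsets.UTF_8, true);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

The same holds for an isolated LF: once a parser has turned it into a line boundary, a formatter downstream can no longer tell it apart from a real CRLF.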

I think this document (an excerpt from RFC 1521) is key to forming an opinion about the best approach: http://www.math-inf.uni-greifswald.de/~teumer/mime/1521/Appendix_G.html
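As far as we can tell, the appendix linked above is RFC 1521's canonical encoding model: text is first converted to canonical form, where line breaks are CRLF, and only then is a content-transfer-encoding such as base64 applied; binary data skips the line-break conversion. The sketch illustrates that order with java.util.Base64; toCanonicalText and encodeBase64 are hypothetical helper names, and ISO-8859-1 is used only as a lossless byte-to-char mapping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CanonicalEncodingSketch {

    // "Convert to canonical form" for text: every line break becomes CRLF.
    static String toCanonicalText(String local) {
        // Collapse CRLF first so it is not doubled, then lone CR, then expand LF.
        return local.replace("\r\n", "\n").replace('\r', '\n').replace("\n", "\r\n");
    }

    // "Apply transfer encoding": text is canonicalized first, binary is not.
    static String encodeBase64(byte[] content, boolean isText) {
        byte[] canonical = isText
                ? toCanonicalText(new String(content, StandardCharsets.ISO_8859_1))
                        .getBytes(StandardCharsets.ISO_8859_1)
                : content;
        return Base64.getMimeEncoder().encodeToString(canonical);
    }

    public static void main(String[] args) {
        byte[] local = "line 1\nline 2\n".getBytes(StandardCharsets.ISO_8859_1);
        for (boolean isText : new boolean[] {true, false}) {
            byte[] decoded = Base64.getMimeDecoder().decode(encodeBase64(local, isText));
            String shown = new String(decoded, StandardCharsets.ISO_8859_1)
                    .replace("\r", "\\r").replace("\n", "\\n");
            System.out.println((isText ? "text:   " : "binary: ") + shown);
        }
    }
}
```

Decoding reverses only the base64 step: if the sender skipped canonicalization, decoding a base64 message/rfc822 part hands the parser back LF-only line breaks, which is exactly the nested-message case this thread is about.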

I hope we get more opinions from other contributors, so that we also have multiple "interpretations" of what the best thing to do is.


Stefano
