Re: [mime4j] newlines and parsing of nested (encoded) rfc822 messages

Stefano Bagnara Fri, 18 Jul 2008 08:46:04 -0700

Oleg Kalnichevski wrote:

On Fri, 2008-07-18 at 16:19 +0200, Stefano Bagnara wrote:

Oleg Kalnichevski wrote:
On Fri, 2008-07-18 at 14:45 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Fri, 2008-07-18 at 10:58 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Thu, 2008-07-17 at 20:21 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
Stefano Bagnara wrote:
...
As I said, the strict mode would only be useful to users of mime4j
wanting to use mime4j as a validator to check RFC compliance. You know,
mime4j was born for SMTP, but now you need it for HTTP, and someone else
may want to build a validator. So let's not keep our eyes closed once again.
OK, I fail to see any practical benefit of that aside from a nice warm
feeling about being 100% compliant, but I admit I am biased.
Anyways, let's talk code now. How about this?

(1)

interface LineDelimiterStrategy {

 boolean isNewLine(int ch1, int ch2) // int, so that both can be -1
        throws MimeException;

}

One can provide MimeTokenStream with an implementation of this interface
at construction time. MimeTokenStream in its turn passes a
reference to that class to all parser components that need to deal with
line delimiters.
I'm not sure I understand what the 2 params passed to isNewLine are and
what code will invoke this service.
2 consecutive characters read from the data stream or -1 if any of those
characters is not available.
so "a\r\nb" would result in the calls:
isNewLine(-1,'a');
isNewLine('a','\r');
isNewLine('\r','\n');
isNewLine('\n','b');
isNewLine('b',-1);
Is this correct? What would be the result for the 5 calls above from an
implementation that would be fine for HTTP?


Anything that allows:

line delimiter = (LF|CRLF)

I understood this, but I'm not following how you would do this with
the interface you were proposing. Given your rule, do you get true on the
3rd and the 4th call? Wouldn't this result in 2 newlines?
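
To make the question concrete, here is a minimal sketch (not mime4j code) of the five pairwise calls listed above. It assumes one possible reading of the proposal: isNewLine(ch1, ch2) answers "does a line end at ch2?". The interface is simplified for illustration (int parameters, no MimeException), and the names PairwiseDelimiterDemo, LENIENT, STRICT and answers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PairwiseDelimiterDemo {

    /** Simplified, hypothetical variant of the proposed interface. */
    interface LineDelimiterStrategy {
        boolean isNewLine(int ch1, int ch2); // either argument may be -1
    }

    // Lenient (LF | CRLF): a line ends at every LF, whatever precedes it.
    static final LineDelimiterStrategy LENIENT = (ch1, ch2) -> ch2 == '\n';

    // Strict (CRLF only): a line ends only at an LF that directly follows a CR.
    static final LineDelimiterStrategy STRICT =
            (ch1, ch2) -> ch1 == '\r' && ch2 == '\n';

    /** Feeds every consecutive pair, padded with -1 at both ends, to the strategy. */
    static List<Boolean> answers(String data, LineDelimiterStrategy strategy) {
        List<Boolean> result = new ArrayList<>();
        int prev = -1;
        for (int i = 0; i <= data.length(); i++) {
            int cur = i < data.length() ? data.charAt(i) : -1;
            result.add(strategy.isNewLine(prev, cur));
            prev = cur;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(answers("a\r\nb", LENIENT)); // [false, false, true, false, false]
        System.out.println(answers("a\nb", LENIENT));   // [false, true, false, false]
        System.out.println(answers("a\nb", STRICT));    // [false, false, false, false]
    }
}
```

Under this reading only the 3rd call returns true for "a\r\nb", so the 4th call does not produce a second newline. A rule that returned true whenever either character is a LF would count two, which is the concern raised above.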

(2) The issue of CR / LF handling in content bodies should be taken care of
when formatting output, _not_ when parsing input.

Would that work for you?
I'm not sure this is enough.
In output we format what we parsed: if we parsed the input as multiple
lines then we output multiple lines, otherwise we output a single line.
So it is during parsing that we have to decide whether an isolated LF is
a newline delimiter or not.
But mime4j does not parse _content bodies_ as multiple lines, does it?
TextBody.getReader()
But that does not necessarily imply parsing into multiple lines, does
it? Anyway, I am perfectly fine with TextBody automatically converting
line delimiters. IMHO this is the right place to do the conversion, but
not the MimeTokenStream
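
As an illustration of converting at output time rather than in the parser, here is a minimal sketch that rewrites bare LF as CRLF and leaves existing CRLF pairs alone. It is not how TextBody behaves in mime4j; the class CrlfNormalizer and the method toCrlf are hypothetical names, and a bare CR is deliberately left untouched because the thread only discusses LF and CRLF.

```java
final class CrlfNormalizer {

    /** Returns text with every LF that is not preceded by CR rewritten as CRLF. */
    static String toCrlf(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Insert the missing CR in front of a bare LF.
            if (c == '\n' && (i == 0 || text.charAt(i - 1) != '\r')) {
                out.append('\r');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed input: one bare LF, one CRLF.
        String normalized = toCrlf("a\nb\r\nc");
        System.out.println(normalized.equals("a\r\nb\r\nc")); // true
    }
}
```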

You are right, the Reader does not imply line parsing, but somewhere we
have to deal with lines anyway. Mime4J's basic classes (the whole
LineReaderInputStream hierarchy) do indeed have a readLine method. This
just made me realize that the internal buffer is filled with lines and
that sending a very long binary makes mime4j die with an OOM. We can fix
this OOM during standard parsing by having a hard limit on the size (and
throwing an exception otherwise), but we have to do this differently
during the streaming of "binary" encoded parts (line reading makes no
sense there).
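
A minimal sketch of the hard-limit idea for standard parsing, assuming a plain InputStream and a caller-chosen maximum. This is not the LineReaderInputStream code; BoundedLineReader, readLine and maxLineLength are hypothetical names, and the separate handling needed for "binary" encoded parts is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedLineReader {

    /**
     * Reads bytes up to and including the next LF, or up to end of stream.
     * Returns null if the stream is already at its end.
     * Throws IOException as soon as the line would exceed maxLineLength bytes,
     * so the buffer can never grow without bound.
     */
    static byte[] readLine(InputStream in, int maxLineLength) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (line.size() >= maxLineLength) {
                throw new IOException("Line exceeds " + maxLineLength + " bytes");
            }
            line.write(b);
            if (b == '\n') {
                break; // the delimiter is kept as part of the returned line
            }
        }
        return line.size() == 0 ? null : line.toByteArray();
    }
}
```

The limit counts the delimiter bytes too, so a limit of 4 accepts "ab\r\n" but rejects "abc\r\n".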

Furthermore, at the very minimum we have a RootInputStream that only
counts lines if they are CRLF terminated. It seems weird that we count
lines only if they are CRLF terminated but also recognize them if they
are LF terminated (this is one more issue to take into consideration,
not the one we were talking about).


Stefano
