Oleg Kalnichevski wrote:
Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Fri, 2008-07-18 at 16:19 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Fri, 2008-07-18 at 14:45 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Fri, 2008-07-18 at 10:58 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Thu, 2008-07-17 at 20:21 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
Stefano Bagnara wrote:
...

As I said, the strict mode would only be useful to users of mime4j wanting to use it as a validator to check RFC compliance. You know, mime4j was born for SMTP, but now you need it for HTTP, and someone else may want to write a validator. So let's not keep our eyes closed once again.

OK, I fail to see any practical benefit of that aside from a nice warm
feeling about being 100% compliant, but I admit I am biased.

Anyways, let's talk code now. How about this?

(1)

interface LineDelimiterStrategy {

 // ch1 and ch2 are ints rather than chars so that either can be -1
 boolean isNewLine(int ch1, int ch2)
    throws MimeException;

}

One can provide MimeTokenStream with an implementation of this interface
at construction time. MimeTokenStream in its turn passes a
reference to that class to all parser components that need to deal with
line delimiters.
I'm not sure I understand what the 2 params passed to isNewLine are and what code will invoke this service.

2 consecutive characters read from the data stream or -1 if any of those characters is not available.
so "a\r\nb" would result in the calls:
isNewLine(-1,'a');
isNewLine('a','\r');
isNewLine('\r','\n');
isNewLine('\n','b');
isNewLine('b',-1);
Is this correct? What would the 5 calls above return from an implementation that is fine for HTTP?


Anything that allows:

line delimiter = (LF|CRLF)

I understood this, but I'm not following how you would do this with the interface you were proposing. Given your rule, do you get true on both the 3rd and the 4th call? Wouldn't this result in 2 newlines?


I do not think so; only a sequence with ch2 = '\n' would be considered a valid line delimiter. I realized, though, that the problem with this interface is that it implies one-byte reads, which I thought we wanted to get rid of.

I understand it now, thank you!
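
For illustration, the exchange above can be sketched in Java roughly as follows. This is only a sketch under assumptions: the parameters are ints so that -1 can be passed, the checked MimeException is left out to keep it self-contained, and the names LenientDelimiter, StrictDelimiter and DelimiterDemo are hypothetical and not part of mime4j.

```java
// Sketch of the proposed strategy interface (MimeException omitted here).
interface LineDelimiterStrategy {
    // ch1, ch2: two consecutive characters from the stream, or -1 if unavailable.
    boolean isNewLine(int ch1, int ch2);
}

// Hypothetical lenient strategy: line delimiter = (LF | CRLF).
// Only ch2 is inspected, so "\r\n" yields exactly one delimiter.
final class LenientDelimiter implements LineDelimiterStrategy {
    public boolean isNewLine(int ch1, int ch2) {
        return ch2 == '\n';
    }
}

// Hypothetical strict strategy: only CRLF counts as a delimiter.
final class StrictDelimiter implements LineDelimiterStrategy {
    public boolean isNewLine(int ch1, int ch2) {
        return ch1 == '\r' && ch2 == '\n';
    }
}

final class DelimiterDemo {
    // Slides a two-character window over the input, including the
    // (-1, first) and (last, -1) calls listed in the thread.
    static int countNewLines(String s, LineDelimiterStrategy strategy) {
        int count = 0;
        int prev = -1;
        for (int i = 0; i <= s.length(); i++) {
            int cur = i < s.length() ? s.charAt(i) : -1;
            if (strategy.isNewLine(prev, cur)) {
                count++;
            }
            prev = cur;
        }
        return count;
    }
}
```

With the lenient strategy, the input "a\r\nb" reports true only for the ('\r','\n') call, so it counts as a single line delimiter rather than two.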

(2) The issue of CR / LF handling in content bodies should be taken care of
when formatting output, _not_ when parsing input.

Would that work for you?
I'm not sure this is enough.
In output we format what we parsed: if we parsed the input as multiple lines then we output multiple lines, otherwise we output a single line. So it is during parsing that we have to decide whether an isolated LF is a newline delimiter or not.
But mime4j does not parse _content bodies_ as multiple lines, does it?
TextBody.getReader()


But that does not necessarily imply parsing into multiple lines, does
it? Anyways, I am perfectly fine with TextBody automatically converting
line delimiters. IMHO this is the right place to do the conversion,
not the MimeTokenStream.
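
As a rough illustration of converting line delimiters at output time rather than in the parser, a conversion step might look like the sketch below. The class and method names (LineDelimiterConverter, toCrlf) are hypothetical and do not describe how TextBody actually works; the sketch only turns bare LF into CRLF and leaves existing CRLF pairs alone.

```java
// Hypothetical helper, not a mime4j class: normalizes bare LF to CRLF
// when text is written out, leaving existing CRLF sequences untouched.
final class LineDelimiterConverter {
    static String toCrlf(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        char prev = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Insert the missing CR only in front of an isolated LF.
            if (c == '\n' && prev != '\r') {
                out.append('\r');
            }
            out.append(c);
            prev = c;
        }
        return out.toString();
    }
}
```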

You are right, the Reader does not imply line parsing, but anyway we have to deal with lines somewhere. The Mime4J basic classes (the whole LineReaderInputStream hierarchy) do indeed have a readLine method. This just made me realize that the internal buffer is filled with lines and that sending a very long binary makes mime4j die with an OOM.

No, it would not. Binary content is not read line by line. The #readLine method is only used when parsing metadata (header fields), where we do need to put a cap on the max line length, as discussed before.

My fault: I had code casting to LineReaderInputStream and using readLine to get the content, but the method indeed returns only an InputStream, and there is no way to trigger the OOM without using a cast.

About the line length limit: we really need it. A random sequence of non-LF chars currently makes our code throw an OOM.
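
To make the point about the limit concrete, here is a minimal sketch of a line reader with a cap on line length. It is not the LineReaderInputStream code; the class name BoundedLineReader, the maxLineLen parameter and the use of a plain IOException are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of mime4j: reads one line (up to and
// including LF) but refuses to buffer more than maxLineLen bytes.
final class BoundedLineReader {
    // Returns the line bytes, or null at end of stream.
    static byte[] readLine(InputStream in, int maxLineLen) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            // Fail fast instead of growing the buffer without bound.
            if (buf.size() >= maxLineLen) {
                throw new IOException("Maximum line length limit exceeded");
            }
            buf.write(b);
            if (b == '\n') {
                break;
            }
        }
        return (buf.size() == 0 && b == -1) ? null : buf.toByteArray();
    }
}
```

With such a cap, a long run of non-LF bytes ends in an exception after at most maxLineLen buffered bytes instead of exhausting the heap.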

Stefano
