Oleg Kalnichevski wrote:
On Fri, 2008-07-18 at 10:58 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
On Thu, 2008-07-17 at 20:21 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
Stefano Bagnara wrote:
...
E.g.: I'm slowly coming to a possible proposal about parsing.
- strict mode: no conversion is done; a CR or LF in headers (or other
  non-7bit content) makes mime4j fail parsing.
- permissive modes:
  - default binary: no conversion happens; isolated CR and LF are
    accepted everywhere but not considered newlines (just like other
    8-bit bytes), and the default content-transfer-encoding is "binary"
    when not specified (7bit, 8bit and binary are read as binary).
  - default text: we convert isolated CR and LF to CRLF almost
    everywhere, except in parts with "binary" content-transfer-encoding.
I'm not proposing this yet (I'm not sure this is enough, or whether we
need more granular tweaks), but this is something I'm evaluating right
now... Strict mode is desirable to have, but less important than
permissive parsing (we want to be strict in output, not in input).
OTOH someone may want to use mime4j to validate whether content is
well-formed (wrt the RFC), and in that case a strict mode would be
necessary.
Stefano
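
The three modes proposed above could be sketched roughly as follows. This is only an illustration, not mime4j code: the names LineEndingMode and LineEndingSketch are hypothetical, "a CR or LF" in strict mode is read as an isolated one (an assumption), and the sketch works on a String for brevity where a real parser would work on bytes.

```java
/** Hypothetical mode names for the proposal; not part of the mime4j API. */
enum LineEndingMode { STRICT, PERMISSIVE_BINARY, PERMISSIVE_TEXT }

final class LineEndingSketch {
    private LineEndingSketch() {}

    /**
     * Applies one of the proposed modes to the given text.
     * binaryPart marks a part whose content-transfer-encoding is "binary".
     */
    static String process(String in, LineEndingMode mode, boolean binaryPart) {
        StringBuilder out = new StringBuilder(in.length() + 16);
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            boolean crlf = c == '\r' && i + 1 < in.length() && in.charAt(i + 1) == '\n';
            if (crlf) {
                out.append("\r\n"); // a proper CRLF is always kept as-is
                i++;                // skip the LF that belongs to this CRLF
                continue;
            }
            if (c != '\r' && c != '\n') {
                out.append(c);
                continue;
            }
            // From here on, c is an isolated CR or LF.
            switch (mode) {
                case STRICT:
                    throw new IllegalArgumentException("isolated CR or LF at offset " + i);
                case PERMISSIVE_BINARY:
                    out.append(c); // accepted, but left untouched
                    break;
                case PERMISSIVE_TEXT:
                    if (binaryPart) {
                        out.append(c);      // binary parts are never converted
                    } else {
                        out.append("\r\n"); // normalize to CRLF
                    }
                    break;
            }
        }
        return out.toString();
    }
}
```

For instance, LineEndingSketch.process("a\nb", LineEndingMode.PERMISSIVE_TEXT, false) would give "a\r\nb", while the same call with LineEndingMode.STRICT would throw.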
Stefano,
With all due respect, I see strict handling of line delimiters as
_pointless_ orthodoxy that really does not help anyone. Would you really
ship an application to a client of yours that rejects a message as
invalid because it contains a lone LF? So what is the _point_ of
being strict about line delimiters?
As I said, strict mode would only be useful to users who want to use
mime4j as a validator to check RFC compliance. You know, mime4j was
born for SMTP, but now you need it for HTTP, and someone else may
want to build a validator. So let's not keep our eyes closed once again.
Anyways, let's talk code now. How about this?
(1)
interface LineDelimiterStrategy {
    // int rather than char: both can be -1, which a Java char cannot hold
    boolean isNewLine(int ch1, int ch2)
        throws MimeException;
}
One can provide MimeTokenStream with an implementation of this interface
at construction time. MimeTokenStream in its turn passes a
reference to that class to all parser components that need to deal with
line delimiters.
I'm not sure I understand what the two parameters passed to isNewLine
are, or what code will invoke this service.
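
One possible reading, which nothing in this thread confirms, is that ch1 is the current character and ch2 the one after it, with -1 meaning "no character available", and that the parser asks the strategy at each position. The sketch below follows that reading. The parameters are declared as int because a Java char cannot hold -1; MimeException is a local stand-in so the example compiles on its own, and LenientDelimiters, StrictDelimiters and DelimiterCounter are hypothetical names.

```java
/** Stand-in for mime4j's MimeException so the sketch compiles on its own. */
class MimeException extends Exception {
    MimeException(String message) { super(message); }
}

/** The proposed interface, with int parameters so that -1 can mean "no character". */
interface LineDelimiterStrategy {
    boolean isNewLine(int ch1, int ch2) throws MimeException;
}

/** Hypothetical lenient strategy: CRLF, a lone CR and a lone LF all end a line. */
final class LenientDelimiters implements LineDelimiterStrategy {
    public boolean isNewLine(int ch1, int ch2) {
        return ch1 == '\r' || ch1 == '\n';
    }
}

/** Hypothetical strict strategy: only CRLF ends a line; a lone CR or LF is an error. */
final class StrictDelimiters implements LineDelimiterStrategy {
    public boolean isNewLine(int ch1, int ch2) throws MimeException {
        if (ch1 == '\r' && ch2 == '\n') {
            return true;
        }
        if (ch1 == '\r' || ch1 == '\n') {
            throw new MimeException("isolated CR or LF");
        }
        return false;
    }
}

/** Hypothetical caller: counts the line delimiters that a strategy recognizes. */
final class DelimiterCounter {
    private DelimiterCounter() {}

    static int count(String text, LineDelimiterStrategy strategy) throws MimeException {
        int delimiters = 0;
        for (int i = 0; i < text.length(); i++) {
            int ch1 = text.charAt(i);
            int ch2 = i + 1 < text.length() ? text.charAt(i + 1) : -1; // -1: end of input
            if (strategy.isNewLine(ch1, ch2)) {
                delimiters++;
                if (ch1 == '\r' && ch2 == '\n') {
                    i++; // CRLF counts as a single delimiter
                }
            }
        }
        return delimiters;
    }
}
```

Under this reading, the lenient strategy finds two delimiters in "a\r\nb\nc", while the strict one rejects the same input because of the lone LF.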
(2) The issue of CR / LF handling in content bodies should be taken
care of when formatting output, _not_ when parsing input.
Would that work for you?
I'm not sure this is enough.
In output we format what we parsed: if we parsed the input as multiple
lines then we output multiple lines, otherwise we output a single line.
So it is during parsing that we have to decide whether an isolated LF is
a newline delimiter or not.
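
A small illustration of this point, using plain String.split rather than any mime4j code (LineSplitSketch is a hypothetical name): the same body yields a different number of lines depending on the rule chosen at parse time, and an output formatter only ever sees the result.

```java
final class LineSplitSketch {
    private LineSplitSketch() {}

    /** Treats only CRLF as a line delimiter; an isolated CR or LF stays inside the line. */
    static String[] crlfOnly(String body) {
        return body.split("\r\n", -1);
    }

    /** Treats CRLF, a lone CR and a lone LF as line delimiters. */
    static String[] anyDelimiter(String body) {
        return body.split("\r\n|\r|\n", -1);
    }
}
```

For the body "a\nb\r\nc", crlfOnly returns two lines ("a\nb" and "c"), whereas anyDelimiter returns three ("a", "b" and "c").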
This issue is closely related to charset handling: when you read
content you have to deal with the charset during parsing; you cannot do
that while formatting output. So if you find something invalid for that
charset, you have to deal with it during parsing and not during output
formatting.
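
The charset analogy can be seen with the standard java.nio.charset API. In this sketch (StrictAsciiDecode is a hypothetical name, and US-ASCII is chosen only as an example charset), an invalid byte is either reported or replaced at decoding time; once it has been replaced, the original byte is no longer available to whatever formats the output.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictAsciiDecode {
    private StrictAsciiDecode() {}

    /** Strict decoding: a byte that is not valid US-ASCII raises an exception. */
    static String decode(byte[] raw) throws CharacterCodingException {
        return StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    /** Permissive decoding: invalid bytes are silently replaced with U+FFFD. */
    static String decodeLeniently(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }
}
```

Given the bytes 'a' and 0xE9, decode throws a CharacterCodingException, while decodeLeniently returns "a" followed by the replacement character U+FFFD.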
I think this document (an excerpt from RFC 1521) is key to forming an
opinion about the best approach:
http://www.math-inf.uni-greifswald.de/~teumer/mime/1521/Appendix_G.html
I hope we get some more opinions from other contributors, so that we
have multiple "interpretations" of what the best thing to do is, too.
Stefano