Robert Burrell Donkin wrote:
On 7/18/08, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
Robert Burrell Donkin wrote:
On Fri, Jul 18, 2008 at 9:34 AM, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
Robert Burrell Donkin wrote:
On Thu, Jul 17, 2008 at 4:02 PM, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
Stefano Bagnara wrote:
<snip>
can we rewind a little
- If the message has only LF newlines, it seems mime4j ends up outputting
headers with CRLF and body with LF.
am i right in assuming that this is about using Mime4J for
roundtripping via org.apache.james.mime4j.message.Message?
It involves both reading and writing.
In our specific case, I note that we accept an LF as a separator in
headers,
but we treat a CR as a character that is part of the header (even though
it is invalid).
E.g., I would say that in the case of an isolated CR in headers we have 3
options:
1) consider it a newline
1a) output it as-is when roundtripping
1b) convert it to CRLF when roundtripping
2) fail parsing (malformed message)
3) use it as part of the header value.
Currently we do #3, and I think this is the worst solution.
I don't know if mime4j should support all 4 of the solutions above for a
CR
(4 configurations seem too many to me), but I think we should discuss the
merits of each solution and decide which ones we want to support.
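
To make the options above concrete, here is a minimal sketch of how each one would treat an isolated CR while splitting a header block into lines. All names (`BareCrHeaderDemo`, `BareCrPolicy`, `splitHeaderLines`) are hypothetical and not part of mime4j; the sketch works on a `String` for readability and ignores header folding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in mime4j.
final class BareCrHeaderDemo {

    /** The four behaviours discussed for an isolated CR in a header (1a, 1b, 2, 3). */
    enum BareCrPolicy { NEWLINE_AS_IS, NEWLINE_TO_CRLF, FAIL, PART_OF_VALUE }

    /**
     * Splits a raw header block into lines. CRLF and a bare LF always end a line;
     * a bare CR is handled according to the policy. Each returned line keeps its
     * delimiter so the effect on round-tripping is visible.
     */
    static List<String> splitHeaderLines(String raw, BareCrPolicy policy) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean endOfLine = false;
            if (c == '\r' && i + 1 < raw.length() && raw.charAt(i + 1) == '\n') {
                current.append("\r\n");   // canonical CRLF
                i++;
                endOfLine = true;
            } else if (c == '\n') {
                current.append('\n');     // bare LF, accepted leniently
                endOfLine = true;
            } else if (c == '\r') {
                switch (policy) {
                    case NEWLINE_AS_IS:   // option 1a
                        current.append('\r');
                        endOfLine = true;
                        break;
                    case NEWLINE_TO_CRLF: // option 1b
                        current.append("\r\n");
                        endOfLine = true;
                        break;
                    case FAIL:            // option 2
                        throw new IllegalArgumentException("isolated CR at offset " + i);
                    case PART_OF_VALUE:   // option 3
                        current.append('\r');
                        break;
                }
            } else {
                current.append(c);
            }
            if (endOfLine) {
                lines.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // trailing text without a delimiter
        }
        return lines;
    }
}
```

With the input `"A: 1\rB: 2\n"`, options 1a and 1b yield two header lines, option 2 throws, and option 3 yields a single line whose value contains the CR.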
i understand this argument. however, i still think we need to step
back a little and gain some perspective.
round tripping involves two distinct components. the parser parses
the message into a DOM (Message) which is then written out.
AIUI it is this complete cycle that results in the line ending
inconsistency noted between the input and the output. is my
understanding correct?
I think we should discuss parsing separately from outputting
something we have in memory.
Yes
IMHO, it's clear we'll never be able to
alter malformed MIME content while preserving the malformations, so we
have to accept that on output we always have to create a canonical MIME
message. This is currently not the case, but this is the least of my
concerns (because it is easier to fix, I think).
So I think there's rough consensus that writing the DOM should
canonicalise. Yes, I agree that this can be accommodated by altering
the DOM writer.
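
As a rough illustration of what canonicalising on output could mean for line endings, the sketch below rewrites every line break to CRLF. The class and method names are hypothetical, not mime4j API. It assumes a bare CR counts as a line break (one of the options still under discussion) and operates on a `String` for brevity, whereas a real writer would work on the output stream and would have to leave binary bodies untouched.

```java
// Hypothetical helper, not part of mime4j: normalise line endings to CRLF.
final class CrlfCanonicalizer {

    /** Returns the text with CRLF, bare LF and bare CR all written as CRLF. */
    static String toCanonical(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                out.append("\r\n");
                // Skip the LF of an existing CRLF so it is not doubled.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else if (c == '\n') {
                out.append("\r\n");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Applying this to text that is already canonical leaves it unchanged, so the writer could run it unconditionally on headers and text parts.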
So the issue is also during parsing:
1) we now have special treatment for isolated LF, but we do not have
something similar for CR. AFAIK both are special end-of-line delimiters
used on some specific platforms and not compliant with the canonical MIME
format, so I think we *should* support both special chars (in lenient
parsing).
If this logic can be accommodated easily then it sounds like we
probably should unless there are good reasons not to
2) ((TextBody) b).getReader(). This gives me a reader, so it supports
the "line" concept: I expect it to treat "non canonical"
newlines like the header/structure parser does: if headers are allowed to
terminate with an isolated LF then lines in text content should be too
(because the whole MIME message probably has LF instead of
CRLF). [The RFC seems to suggest that in this case the MIME message is
encoded using LF instead of CRLF and that this specific encoding breaks
binary parts, but we want to be smarter wrt this issue].
TextBody is part of the DOM. This can and should be addressed there
(rather than in the parser). I think that doing this should satisfy
both needs without compromising the performance of the parser.
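
For what it is worth, on the DOM side the standard `java.io.BufferedReader.readLine()` already treats LF, CR and CRLF as line terminators, so a consumer of the Reader obtained from `((TextBody) b).getReader()` could get lenient line handling without any parser change. A minimal sketch (the class and method names are hypothetical):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads lines leniently from any Reader.
final class LenientLineReading {

    /** Collects the lines of the reader; LF, CR and CRLF all end a line. */
    static List<String> readLines(Reader reader) {
        List<String> lines = new ArrayList<>();
        try {
            BufferedReader buffered = new BufferedReader(reader);
            String line;
            while ((line = buffered.readLine()) != null) {
                lines.add(line); // the terminator itself is not included
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The returned lines carry no terminators, so writing them back out with CRLF would also give canonical text content.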
If this is indeed something we can all agree on, I can try to solve the
first problem (strict/lenient line delimiter handling) using a pluggable
strategy of some kind.
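
One possible shape for such a pluggable strategy is sketched below. This is purely an illustration: the interface, its method and the two implementations are assumptions, not a proposal for the actual mime4j API. The strict variant recognises only CRLF and treats a bare CR or LF as ordinary content here; a real strict parser might reject them instead.

```java
// Hypothetical sketch of a pluggable line delimiter strategy; not mime4j API.
final class LineDelimiters {

    /** Reports the length of the line delimiter starting at pos, or 0 if none. */
    interface LineDelimiterStrategy {
        int delimiterLength(byte[] buf, int pos, int limit);
    }

    /** Strict: only CRLF ends a line. */
    static final LineDelimiterStrategy STRICT = (buf, pos, limit) ->
            (pos + 1 < limit && buf[pos] == '\r' && buf[pos + 1] == '\n') ? 2 : 0;

    /** Lenient: CRLF, bare LF and bare CR all end a line. */
    static final LineDelimiterStrategy LENIENT = (buf, pos, limit) -> {
        if (pos >= limit) {
            return 0;
        }
        if (buf[pos] == '\r') {
            return (pos + 1 < limit && buf[pos + 1] == '\n') ? 2 : 1;
        }
        return buf[pos] == '\n' ? 1 : 0;
    };

    /** Counts lines in the buffer, including a final line without a delimiter. */
    static int countLines(byte[] buf, LineDelimiterStrategy strategy) {
        int lines = 0;
        int pos = 0;
        boolean pending = false; // true if content was seen since the last delimiter
        while (pos < buf.length) {
            int len = strategy.delimiterLength(buf, pos, buf.length);
            if (len > 0) {
                lines++;
                pos += len;
                pending = false;
            } else {
                pos++;
                pending = true;
            }
        }
        return pending ? lines + 1 : lines;
    }
}
```

A streaming parser would also need to handle a CR that falls on the last byte of a buffer, since it cannot yet tell whether an LF follows; the sketch sidesteps this by looking at a complete array.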
Oleg
Robert
Stefano
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]