On 7/18/08, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
> Robert Burrell Donkin wrote:
>> On Fri, Jul 18, 2008 at 9:34 AM, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
>>> Robert Burrell Donkin wrote:
>>>> On Thu, Jul 17, 2008 at 4:02 PM, Stefano Bagnara <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> Stefano Bagnara wrote:
>>>> <snip>
>>>>
>>>> can we rewind a little
>>>>
>>>>>> - If the message has only newlines, it seems mime4j ends up outputting
>>>>>> headers with CRLF and body with LF.
>>>> am i right in assuming that this is about using Mime4J for
>>>> roundtripping via org.apache.james.mime4j.message.Message?
>>> It involves both reading and writing.
>>>
>>> In our specific case I note that we accept an LF as a separator in
>>> headers, but we treat a CR as a character that is part of the header
>>> (although it is invalid).
>>>
>>> E.g: I would say that in the case of an isolated CR in headers we have 3
>>> options:
>>> 1) consider it a newline
>>>  1a) output it as-is when roundtripping
>>>  1b) convert it to CRLF when roundtripping
>>> 2) fail parsing (malformed message)
>>> 3) use it as part of the header value.
>>>
>>> Now we do #3 and I think this is the worst solution.
>>> I don't know if mime4j should support all 4 of the solutions above for a
>>> CR (4 configurations seems too many to me), but I think we should discuss
>>> the merits of each solution and decide which ones we want to support.
>>
>> i understand this argument. however, i still think we need to step
>> back a little and gain some perspective.
>>
>> round tripping involves two distinct components.  the parser parses
>> the message into a DOM (Message) which is then written out.
>>
>> AIUI it is this complete cycle that results in the line ending
>> inconsistency noted between the input and the output.  is my
>> understanding correct?
>
> I think we should discuss parsing separately from outputting
> something we have in memory.

Yes
> IMHO, it's clear we'll never be able to
> alter malformed mime content while preserving the malformations, so we
> have to accept that on output we always have to create a canonical mime
> message. This is currently not the case, but this is the least of my
> concerns (because it is easier to fix, I think).

So I think there's rough consensus that writing the DOM should
canonicalise. Yes, I agree that this can be accommodated by altering
the DOM writer.
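
A rough sketch of what canonicalising on output could look like, assuming
the writer handles header and text content as characters. The names
CrlfCanonicalizer and toCrlf are made up for illustration and are not
existing mime4j API. Something like this would only be safe for headers
and text parts, not for binary bodies.

```java
// Hypothetical helper, not part of mime4j: normalises every line ending to CRLF.
final class CrlfCanonicalizer {
    private CrlfCanonicalizer() {}

    /** Rewrites CRLF, bare LF and bare CR so that every line ends in CRLF. */
    static String toCrlf(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // Swallow the LF of an existing CRLF pair so it is not doubled.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append("\r\n");
            } else if (c == '\n') {
                // Bare LF becomes CRLF.
                out.append("\r\n");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling CrlfCanonicalizer.toCrlf on already canonical input returns it
unchanged, so the writer could apply it unconditionally.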
> So the issue is also during parsing:
>
> 1) we now have special treatment for isolated LF, but we do not have
> something similar for CR. AFAIK both are special end-of-line delimiters
> used on some specific platforms and not compliant with the canonical mime
> format, so I think we *should* support both special chars (in lenient
> parsing).
If this logic can be accommodated easily then it sounds like we
probably should, unless there are good reasons not to.
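
To make the lenient versus strict choice concrete, here is a minimal
sketch of a line splitter that accepts CRLF, bare LF and bare CR as
terminators in lenient mode and rejects the bare forms in strict mode.
LenientLineSplitter, split and the strict flag are hypothetical names
used only for illustration; this is not how the mime4j parser is
currently written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mime4j.
final class LenientLineSplitter {
    private LenientLineSplitter() {}

    /**
     * Splits raw header text into lines.
     * Lenient mode: CRLF, bare LF and bare CR all end a line.
     * Strict mode: only CRLF is accepted; a bare CR or LF is an error.
     */
    static List<String> split(String raw, boolean strict) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean crlf = c == '\r' && i + 1 < raw.length() && raw.charAt(i + 1) == '\n';
            if (crlf) {
                i++; // consume the LF of the CRLF pair
            } else if (c == '\r' || c == '\n') {
                if (strict) {
                    throw new IllegalArgumentException(
                            "bare " + (c == '\r' ? "CR" : "LF") + " at offset " + i);
                }
            } else {
                current.append(c);
                continue;
            }
            // A terminator was seen: emit the finished line.
            lines.add(current.toString());
            current.setLength(0);
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // trailing line without a terminator
        }
        return lines;
    }
}
```

With strict set to false, a bare CR is treated the same way as the bare
LF we already accept. With strict set to true, the malformed input is
reported instead.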

> 2) ((TextBody) b).getReader(). This gives me a reader, so it supports
> the "line" concept: I expect it to treat "non canonical" newlines the
> same way the header/structure parser does. If headers are allowed to
> terminate with an isolated LF then lines in text content should be
> allowed to as well (because probably the whole mime message has LF
> instead of CRLF). [The RFC seems to suggest that in this case the MIME
> message is encoded using LF instead of CRLF and that this specific
> encoding breaks binary parts, but we want to be smarter wrt this issue].

TextBody is part of the DOM. This can and should be addressed there
(rather than in the parser). I think that doing this should satisfy
both needs without compromising the performance of the parser.
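
For what it's worth, on the consuming side the JDK already treats all
three forms as line terminators: BufferedReader.readLine() ends a line
at LF, CR or CRLF. A small sketch, where ReaderLines and readLines are
made-up names and the Reader is assumed to be whatever
((TextBody) b).getReader() returns:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mime4j.
final class ReaderLines {
    private ReaderLines() {}

    /** Reads all lines from the reader; LF, CR and CRLF all count as line ends. */
    static List<String> readLines(Reader reader) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader buffered = new BufferedReader(reader)) {
            String line;
            while ((line = buffered.readLine()) != null) {
                lines.add(line); // terminator is already stripped by readLine()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

This only covers code that reads the text line by line. Anything that
copies the reader's characters straight to an output would still need
the canonicalisation in the writer.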

Robert

>
> Stefano
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

