On 7/18/08, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
> Robert Burrell Donkin wrote:
>> On Fri, Jul 18, 2008 at 9:34 AM, Stefano Bagnara <[EMAIL PROTECTED]> wrote:
>>> Robert Burrell Donkin wrote:
>>>> On Thu, Jul 17, 2008 at 4:02 PM, Stefano Bagnara <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> Stefano Bagnara wrote:
>>>> <snip>
>>>>
>>>> can we rewind a little
>>>>
>>>>>> - If the message have only newlines it seems mime4j ends up outputting
>>>>>> headers with CRLF and body with LF.
>>>> am i right in assuming that this is about using Mime4J for
>>>> roundtripping via org.apache.james.mime4j.message.Message?
>>> It involve both reading and writing.
>>>
>>> In our specific case I record that we accept an LF as separator in
>>> headers, but we take a CR as a char part of the header (while it is
>>> invalid).
>>>
>>> E.g: I would say that in the case of an isolated CR in headers we have 3
>>> options:
>>> 1) consider it a newline
>>> 1a) output it as-is when roundtripping
>>> 1b) convert it to CRLF when roundtripping
>>> 2) fail parsing (malformed message)
>>> 3) use it as part of the header value.
>>>
>>> Now we do #3 and I think this is the worst solution.
>>> I don't know if mime4j should support all of the 4 solutions above for a
>>> CR (4 configurations seems too much to me) but I think we should discuss
>>> the merit of each solution and decide what are the one we want to support.
>>
>> i understand this argument. however, i still think we need to step
>> back a little and gain some perspective.
>>
>> round tripping involves two distinct components. the parser parses
>> the message into a DOM (Message) which is then written out.
>>
>> AIUI it is this complete cycle that results in the line ending
>> inconsistency noted between the input and the output. is my
>> understanding correct?
>
> I think we should discuss about parsing separated from outputting
> something we have in memory.

Yes

> IMHO, it's clear we'll never be able to
> alter malformed mime content while preserving the malformations, so we
> have to think that in output we always have to create a canonical mime
> message. This is currently not the case, but this is the minor of my
> concern (because it is easier to fix, I think).

So I think there's rough consensus that writing the DOM should
canonicalise. Yes, I agree that this can be accommodated by altering
the DOM writer.

> So the issue is also during parsing:
>
> 1) we now have special treatment for isolated LF, we do not have
> something similar for CR (AFAIK both are special end of line delimiters
> used in some specific platform and not compliant to the canonical mime
> format, so I think we *should* support both special chars (in a lenient
> parsing).

If this logic can be accommodated easily then it sounds like we
probably should, unless there are good reasons not to.

> 2) ((TextBody) b).getReader(). This give me a reader, so this support
> the "line" concept: I do expect this one to treat "non canonical"
> newlines like the header/structure parser: if headers are allowed to
> terminate with an isolated LF then also lines in text content should do
> the same (because probably the whole mime message has LF instead of
> CRLF). [RFC seems to suggest that the fact is that the MIME message is
> encoded using LF instead of CRLF and that this specific encoding breaks
> binary parts, but we want to be smarter wrt this issue].

TextBody is part of the DOM. This can and should be addressed there
(rather than in the parser). I think that doing this should satisfy
both needs without compromising the performance of the parser.
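
To make the canonicalising step concrete, here is a minimal sketch of the kind of line-ending normalisation a DOM writer could apply to header and text content on output. It is only an illustration: `LineEndings` and `canonicalise` are hypothetical names, not part of the Mime4J API, and it assumes the content is already available as a String. It treats an isolated CR, an isolated LF and a CRLF pair each as one line ending and writes CRLF in every case.

```java
// Hypothetical helper for illustration only; not a Mime4J class.
final class LineEndings {

    private LineEndings() {
    }

    /**
     * Returns the text with every line ending written as CRLF.
     * An isolated CR, an isolated LF and a CRLF pair each count
     * as exactly one line ending.
     */
    static String canonicalise(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // Swallow the LF of a CRLF pair so it is not counted twice.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append("\r\n");
            } else if (c == '\n') {
                // Isolated LF.
                out.append("\r\n");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Something like this would only be safe for headers and text bodies. Applying it to binary parts would corrupt them, which is the breakage Stefano mentions above.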

Robert

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]