Re: [mime4j] newlines and parsing of nested (encoded) rfc822 messages

Stefano Bagnara Sat, 19 Jul 2008 16:25:56 -0700

Oleg Kalnichevski wrote:

On Sat, 2008-07-19 at 18:29 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
...
Yes
IMHO, it's clear we'll never be able to
alter malformed mime content while preserving the malformations, so we
have to think that in output we always have to create a canonical mime
message. This is currently not the case, but this is the least of my
concerns (because it is easier to fix, I think).
So I think there's rough consensus that writing the DOM should
canonicalise. Yes, I agree that this can be accommodated by altering
the DOM writer.
So the issue is also during parsing:

1) we now have special treatment for isolated LF, but we do not have
something similar for CR. AFAIK both are special end-of-line delimiters
used on some specific platforms and not compliant with the canonical mime
format, so I think we *should* support both special chars (in lenient
parsing).
If this logic can be accommodated easily then it sounds like we
probably should, unless there are good reasons not to
2) ((TextBody) b).getReader(). This gives me a reader, so this supports
the "line" concept: I do expect this one to treat "non canonical"
newlines like the header/structure parser: if headers are allowed to
terminate with an isolated LF then also lines in text content should do
the same (because probably the whole mime message has LF instead of
CRLF). [RFC seems to suggest that the fact is that the MIME message is
encoded using LF instead of CRLF and that this specific encoding breaks
binary parts, but we want to be smarter wrt this issue].
TextBody is part of the DOM. This can and should be addressed there
(rather than in the parser). I think that doing this should satisfy
both needs without compromising the performance of the parser.
If this is indeed something we can all agree on, I can try to solve the
first problem (strict/lenient line delimiter handling) using a pluggable
strategy of some kind.
Oleg
My limited knowledge of mime4j details doesn't let me reply "+1". So I'll
simply say what I expect from mime4j as a user:
Lenient line delimiter parsing:
- consider isolated LF and CR in the mime stream as newlines as long as
a newline concept exists in that specific place (everywhere but binary
body parts having ContentTransferEncoding = "binary").
- This means that a CR in a base64 stream is a newline, a CR in a
text/plain is a newline, a "CR<boundary> CR" sequence is a valid
multipart boundary, "CRLFCR", "CRLFCR", "CRCRLF", "LFCRLF", "LFCR",
"CRCR" or "LFLF" sequences are valid separators between header and body
because they are considered as equivalent to "CRLFCRLF".
- This also means that writing in output this stuff will result in a
mime stream with NO isolated CRs or LFs (unless they are in a "binary"
encoded body).
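To make the lenient rules above concrete, here is a minimal sketch in Java. It is not mime4j code: the class LenientNewlines and the method canonicalize are hypothetical names, and it works on an in-memory String for brevity, so it assumes the input is not a "binary" encoded body. It treats CRLF, isolated CR and isolated LF each as one line break and always emits CRLF.

```java
final class LenientNewlines {

    /**
     * Hypothetical helper (not part of mime4j): rewrites every line break,
     * whether CRLF, an isolated CR or an isolated LF, as a canonical CRLF.
     * Must not be applied to bodies whose transfer encoding is "binary".
     */
    static String canonicalize(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // A CR directly followed by LF is one line break: consume the LF too.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append("\r\n");
            } else if (c == '\n') {
                // Isolated LF (any LF that followed a CR was consumed above).
                out.append("\r\n");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Subject: test\rTo: a@example.org\n\nbody\r";
        // Make the line breaks visible when printing.
        System.out.println(
            canonicalize(raw).replace("\r", "<CR>").replace("\n", "<LF>"));
    }
}
```

With this greedy reading, each of the header/body separators listed above ("CRLFCR", "CRCRLF", "LFCRLF", "LFCR", "CRCR", "LFLF") comes out as "CRLFCRLF", and the output never contains an isolated CR or LF.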
Strict line delimiter parsing (I don't care if we have this now, I just
think we should have this in mind while factoring mime4j because it
should be possible to implement this with no major changes).
- Isolated LFs and CRs are not considered newlines and result in errors
raised by the parser (invalid header, invalid content, and so on) that
will result in a parsing failure or (if the raised errors are ignored)
in an invalid DOM (I'm not sure how we currently handle this case for
unexpected 8bit content in a header, but it should be the same).
- writing this content in output should result in well-formed content, so:
  - if an LF in the header is somehow "encodable" as a valid sequence
  it should be parsed as LF and then encoded while outputting. If instead
  an LF in the header is not encodable then we should fail parsing or
  remove it (or convert it to "?" or anything similar) if we want to be
  lenient.
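As an illustration of the strict mode described above, the following Java sketch locates the first CR or LF that is not part of a CRLF pair, which is the point where a strict parser would raise an error. The names StrictNewlines and firstBareDelimiter are hypothetical and do not exist in mime4j; how the error is then reported (exception, monitor callback, ignored) is left open here.

```java
final class StrictNewlines {

    /**
     * Hypothetical helper (not part of mime4j): returns the index of the
     * first CR or LF that does not belong to a CRLF pair, or -1 if every
     * line break in the data is a well-formed CRLF.
     */
    static int firstBareDelimiter(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    i++; // well-formed CRLF: skip over the LF
                } else {
                    return i; // CR not followed by LF
                }
            } else if (data[i] == '\n') {
                return i; // LF not preceded by CR
            }
        }
        return -1;
    }
}
```

A strict parser could call this on a header block and fail when the result is not -1, while a lenient one would skip the check altogether.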
I'm not saying that I want mime4j to support all of this before a
release, I just want to understand if this is what you also expect and
if this can be considered a common goal.
Stefano
<rant disclaimer="please ignore">

HttpComponents project chose to depend on mime4j instead of developing a
similar solution because we thought it was the right thing to do. We
thought we should rather contribute to an existing project instead of
pursuing competing efforts for which we have neither resources nor the
right expertise.
As a result we had to delay the next release of HttpClient by almost two
months waiting for a mime4j release. I do not see a point in waiting any
longer. I see no other way but dropping dependency on mime4j, at least
temporarily.
I did my very best to resolve an old problem no one seemed eager to work
on for a year and a half. I will happily continue to contribute to this
project to my best abilities, but in this particular case I see no
justification for investing any more time in trying to satisfy someone
else's wish list.

</rant>

Please note that I never expected you to satisfy any wishlist. I simply
opened the project and tested some code and I found issues. I simply
reported issues. I also tried to help with the repackaging because it is
something that I always found interesting (I also wrote a tool that
automatically tries to classify packages based on dependencies and some
metrics).

I'm not asking (and never did ask) you to solve any of mime4j's issues
(the last sentence of mine that you quoted is "I'm not saying that I want
mime4j to support all of this before a release, I just want to understand
if this is what you also expect and if this can be considered a common
goal."). I'm not sure I understand your rant and why you think I'm (or
someone else is) asking you to do something. In fact we even voted to
make you a committer on this project so you could work on the code
without waiting for our limited time to review/apply patches.

Most of the issues I opened against Mime4j are simply there because they
need attention: there is no need to solve them in order to make a release.

Some other issues instead require attention (e.g. the quoted-printable
stuff no longer being decoded), but this is not something we are asking
you to solve. I'm not sure when I'll find the time but I plan to try to
understand when this was broken and how to fix it.

Infinite loops and other OOM issues are there: I think I'm not the cause
of them, and I understand that it is frustrating for you to find critical
issues in a library you introduced in your component, but please think
twice about this.

Most of these issues are regressions against old mime4j versions, and
Niklas did a good job in giving trunk a go and testing it in his
environment.

I hope you understand I'm being constructive in this discussion and I'm
simply trying to understand the common goal so that we don't break each
other's work.

I'm even willing to code the solution myself once we agree on the
expected behaviour: it is simply not worth anyone's time if we simply
commit code satisfying one specific need while breaking previous
behaviours (e.g. for the specific use in an SMTP environment, current
trunk is much faster but has many more issues than the last release. I
don't know if this also applies to other protocols: e.g. do you need
quoted-printable decoding in HTTP?).

Stefano,

(1) I _personally_ see the strict handling of line delimiters as
_completely_ and _utterly_ pointless

I hope you understood that I'm not saying that we should delay a mime4j
release for any of the issues I'm discussing here, and also that I don't
think that strict handling is a needed feature, only that it would be a
desirable option for a library like this.

(2) I encountered only two types of line delimiters in HTTP messages in
the wild: <CRLF> and <LF> (often mixed in the same message). I cannot
recall seeing messages where <CR> was used as a line delimiter. I can
live with any solution that enables me to configure mime4j to parse
messages where both <CRLF> and <LF> can be used as line delimiters.

That's fine. Just let me understand: you wouldn't like to have mime4j
also parse CR as a newline, right? What do you expect from mime4j when a
CR is found in the mime stream?


Stefano

