Oleg Kalnichevski wrote:
On Thu, 2008-07-17 at 20:25 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
Stefano Bagnara wrote:

I've had a quick read of RFC 2822 on this issue. It insists that CRLF is the only valid line delimiter in a canonical RFC 822 message. Furthermore, RFC 2822 does not allow isolated CR or LF. So whenever an isolated CR or an isolated LF is found, we have a malformed RFC 822 message and we have to define how to deal with it.
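
To make the rule concrete, here is a minimal sketch in Java of how an isolated CR or LF could be detected in a raw message. It is not mime4j code; the class and method names are hypothetical and only illustrate the check.

```java
// Hypothetical helper, not part of mime4j: finds line breaks that are not CRLF.
final class BareLineBreaks {

    /**
     * Returns the index of the first CR or LF that is not part of a
     * CRLF pair, or -1 if every line break in the data is a CRLF.
     */
    static int firstBareLineBreak(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    i++; // valid CRLF pair: skip its LF
                } else {
                    return i; // isolated CR
                }
            } else if (data[i] == '\n') {
                return i; // isolated LF (a preceding CR would have consumed it)
            }
        }
        return -1;
    }
}
```

A caller could treat any result other than -1 as "malformed" and then decide whether to reject, repair, or pass the content through.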

Tell the users of IE about it
Can you provide more information? What is the issue in IE?

Blatant disregard of all standards imaginable. Mozilla is actually
hardly any better.

I'm used to non-compliant stuff: in the SMTP world there is plenty.
What I want to be sure of is that we don't do what they did: simply working "by example" without reading the RFCs carefully tends to produce multiple incompatible results. I wouldn't like it if mime4j created more non-standard output for people to blame us for.

Does it post malformed mime? Can you be more precise about what version of IE and what kind of malformed sequences are produced?

All common browsers known to me put raw binary in the multipart/form
coded requests. I do not have an IE wire dump handy but I have a few
ones of Firefox

https://issues.apache.org/jira/browse/HTTPCLIENT-784
https://issues.apache.org/jira/browse/HTTPCLIENT-785

From the two bugs it seems they put raw binary without any header specifying that it is pure binary. In standard MIME (to my knowledge) this should always be preceded by a "Content-Transfer-Encoding: binary" header (notice "binary" and not "8bit": they are different).

"Content-Transfer-Encoding: binary" is the only place where isolated CR and LF are allowed.

First, I don't know whether the "Content-Transfer-Encoding: binary" header is missing because the HTTP world uses binary as the default (as opposed to the SMTP default of 7bit) or because they are abusing the MIME spec: does anyone know?

This makes it clear to me that, in any case, we want to support the binary encoding (at least when it is specified, or when the surrounding environment says that it is the default behaviour).

Second, I would like to understand whether this is the only case where converting isolated CR and LF to CRLF would create issues, or whether HTTP has more.

Third, I would like to understand whether it would be OK for HTTP needs to simply have mime4j not alter any isolated CR or LF, and fail parsing when an isolated CR or LF is found outside binary content.

I don't understand why a conversion is wrong for the HTTP case (when does it happen that you have to deal with an isolated LF?).
How about binary data in multipart/form encoded requests?
Can you tell me which RFC we are talking about?


We are not talking about any RFC here. We are talking about real-world content.

Well, they are using a protocol anyway, and what to do is specified in an RFC. I want to know which RFC it is, and then understand whether they are doing something wrong, whether we simply misunderstood the RFC, or whether there is an RFC we don't know about. I'm not saying that we should ignore real-world content if it is non-compliant; I'm saying that we have to understand it better.

In this case I think I was looking for this RFC:
http://www.faqs.org/rfcs/rfc1867.html

I'm not sure that this RFC is the latest or the only one involved, but there I read (about multipart/form-data):
----------------
   While the HTTP protocol can transport arbitrary BINARY data, the
   default for mail transport (e.g., if the ACTION is a "mailto:" URL)
   is the 7BIT encoding.  The value supplied for a part may need to be
   encoded and the "content-transfer-encoding" header supplied if the
   value does not conform to the default encoding.  [See section 5 of
   RFC 1521 for more details.]
---------------

http://www.faqs.org/rfcs/rfc1521.html has a long section on Content-Transfer-Encoding, but I'm not sure I grok it all. From my current understanding, it does not define a default transfer encoding and says that each protocol may define its own default (it also says that SMTP, RFC 821, defines 7bit as the default).

So maybe there is an HTTP RFC that says that in the HTTP world the default is "binary".

What is clear is that any CR/LF conversion in "binary" content is BAD and we don't want MIME4J to do that. So, if we want to be permissive with content received with bad newlines, we have to make sure we don't break binary content.
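
As an illustration of why conversion is harmful for binary content, here is a small Java sketch. The normalizer below is a hypothetical, deliberately naive routine (not mime4j's behaviour); the sample data is the standard 8-byte PNG file signature, which happens to contain one CRLF and one isolated LF.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only: a naive "repair bad newlines" pass.
final class NaiveNormalizer {

    /** Rewrites every isolated CR or LF as CRLF; existing CRLF pairs are kept. */
    static byte[] toCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 16);
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b == '\r') {
                out.write('\r');
                out.write('\n');
                if (i + 1 < in.length && in[i + 1] == '\n') {
                    i++; // the pair was already CRLF: skip its LF
                }
            } else if (b == '\n') {
                out.write('\r'); // isolated LF becomes CRLF
                out.write('\n');
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // PNG file signature: 89 50 4E 47 0D 0A 1A 0A
    static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };
}
```

Running toCrlf over PNG_SIGNATURE yields 9 bytes instead of 8, because the final isolated LF is expanded, so the file is no longer a valid PNG. Text that already uses CRLF passes through unchanged, which is why the damage only shows up on binary parts.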

Furthermore, I would say that there is a need for a "default content transfer encoding" to be used when one is not specified in the headers (because this does not seem to be part of the MIME spec, but of the specifications of specific protocols).
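
A rough sketch of what such a protocol-dependent default could look like, in Java. Everything here is hypothetical (the class, the enum, the method) and is not an existing mime4j API. In particular, "binary" as the HTTP default is only the assumption discussed above, not something confirmed by an RFC in this thread.

```java
import java.util.Locale;

// Hypothetical sketch: resolve the transfer encoding of a part, falling back
// to a per-protocol default when no Content-Transfer-Encoding header is given.
final class TransferEncodingDefaults {

    enum Protocol { SMTP, HTTP }

    /**
     * @param headerValue value of the Content-Transfer-Encoding header,
     *                    or null if the header is absent
     * @param protocol    transport the message was received over
     */
    static String effectiveEncoding(String headerValue, Protocol protocol) {
        if (headerValue != null && !headerValue.trim().isEmpty()) {
            // An explicit header always wins over the protocol default.
            return headerValue.trim().toLowerCase(Locale.ROOT);
        }
        // ASSUMPTION: HTTP defaults to "binary"; SMTP defaults to "7bit".
        return protocol == Protocol.HTTP ? "binary" : "7bit";
    }
}
```

With something like this, the parser would only need to ask for the effective encoding before deciding whether isolated CR or LF are acceptable in a given part.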

WDYT?

Stefano

PS: please note that I'm not saying that we should block the 0.4 release for this issue. I just think this issue is important and we should take care of it, but it can land in 0.5 if we want.
