Oleg Kalnichevski wrote:
On Thu, 2008-07-17 at 20:25 +0200, Stefano Bagnara wrote:
Oleg Kalnichevski wrote:
Stefano Bagnara wrote:
I've had a quick read of RFC 2822 on this issue. It insists that
CRLF is the only valid delimiter for a canonical rfc822 message.
Furthermore, rfc2822 does not allow the use of isolated CR or LF.
So, whenever an isolated CR or an isolated LF is found we have a malformed
rfc822 message and we have to define how to deal with it.
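A minimal sketch of how such isolated CR or LF bytes could be located in a raw message. It is written in Python for brevity, the helper name `find_bare_line_breaks` is hypothetical, and it is not mime4j code:

```python
import re

# A CR that is not followed by LF, or an LF that is not preceded by CR.
BARE_LINE_BREAK = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def find_bare_line_breaks(data: bytes) -> list[int]:
    """Return the byte offsets of every isolated CR or LF in data."""
    return [match.start() for match in BARE_LINE_BREAK.finditer(data)]
```

An empty result means every line break in the input is a proper CRLF pair. A non-empty result is the "malformed" case that needs a policy.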
Tell the users of IE about it
Can you provide more information? What is the issue in IE?
Blatant disregard of all standards imaginable. Mozilla is actually
hardly any better.
I'm used to non-compliant stuff: in the SMTP world there is plenty.
What I want to be sure of is that we don't do what they did: simply
working "by example" without reading the RFCs carefully tends to create
multiple incompatible results. I wouldn't like mime4j to create more
non-standard output for people to blame us for.
Does it post
malformed mime? Can you be more precise about what version of IE and
what kind of malformed sequences are produced?
All common browsers known to me put raw binary in the multipart/form
coded requests. I do not have an IE wire dump handy but I have a few
ones of Firefox
https://issues.apache.org/jira/browse/HTTPCLIENT-784
https://issues.apache.org/jira/browse/HTTPCLIENT-785
From the 2 bugs it seems they put raw binary without any header
specifying that it is pure binary. In standard MIME (to my knowledge)
this should always be preceded by a "Content-Transfer-Encoding:
binary" header (notice "binary" and not "8bit", they are different).
"Content-Transfer-Encoding: binary" is the only place where isolated CR
and LF are allowed.
First, I don't know whether the CTE-binary header is missing because the
HTTP world uses binary as the default (as opposed to the SMTP 7bit
default) or because they are abusing the MIME spec: does anyone
know this?
This makes it clear, to me, that in any case we want to support the binary
encoding (at least when it is specified and when the environment says
that it is the default behaviour).
Second, I would like to understand whether this is the only case where
conversion of isolated CR and LF to CRLF would create issues, or whether
HTTP shows more issues.
Third, I would like to understand whether simply having mime4j not alter
any isolated CR and LF, and fail parsing when an isolated CR or LF is
found outside binary content, would be ok for http needs.
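The third option could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the transfer encoding is passed in as a plain string, and nothing here reflects how mime4j is actually structured.

```python
def first_bare_line_break(body: bytes):
    """Return the offset of the first isolated CR or LF, or None if there is none."""
    i = 0
    while i < len(body):
        if body[i:i + 2] == b"\r\n":
            i += 2  # a proper CRLF pair, skip both bytes
        elif body[i] in (0x0D, 0x0A):
            return i  # a CR or LF that is not part of a CRLF pair
        else:
            i += 1
    return None


def check_strict(body: bytes, transfer_encoding: str) -> bytes:
    """Return body untouched, or raise ValueError for a bare CR/LF outside binary content."""
    if transfer_encoding.strip().lower() != "binary":
        offset = first_bare_line_break(body)
        if offset is not None:
            raise ValueError(f"isolated CR or LF at byte offset {offset}")
    return body
```

The body is never rewritten: binary content passes through as-is, and anything else either is already canonical or is rejected.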
I don't understand why a conversion is wrong for the http case (when
do you ever have to deal with an isolated LF?).
How about binary data in multipart/form encoded requests?
Can you tell me which RFC we are talking about?
We are not talking about any RFC here. We are talking about real-world content.
Well, they are using a protocol, anyway. What to do is specified in an
RFC. I want to know which RFC it is, and then understand whether they are
doing something wrong, whether we simply misunderstood the RFC, or whether
there is an RFC we don't know about.
I'm not saying that we should ignore real-world content if it is
non-compliant, I'm saying that we have to understand it better.
In this case I think I was looking for this RFC:
http://www.faqs.org/rfcs/rfc1867.html
I'm not sure this RFC is the latest or the only one involved, but
there I read (about multipart/form-data):
----------------
While the HTTP protocol can transport arbitrary BINARY data, the
default for mail transport (e.g., if the ACTION is a "mailto:" URL)
is the 7BIT encoding. The value supplied for a part may need to be
encoded and the "content-transfer-encoding" header supplied if the
value does not conform to the default encoding. [See section 5 of
RFC 1521 for more details.]
---------------
http://www.faqs.org/rfcs/rfc1521.html provides a long section about
content-transfer-encoding but I'm not sure I grok it all.
From my current understanding it does not define a default transfer
encoding and it says that each protocol can define its own default (also
stating that SMTP, rfc821, defines 7bit as the default).
So maybe there is an HTTP RFC that says that in the HTTP world the
default is "binary".
What is clear is that any CR/LF conversion in a "binary" content is BAD
and we don't want MIME4J to do that. So, if we want to be permissive
with some content received with bad newlines we have to make sure we
don't break binary content.
Furthermore, I would say that there is a need for a "default content
transfer encoding" to be used when one is not specified in the headers
(because this does not seem to be part of the MIME spec, but of the
specific protocol specifications).
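To make the two points above concrete, here is a rough sketch of a per-protocol default plus a permissive newline fix that leaves binary content alone. All names are hypothetical, the `"http": "binary"` entry is the assumption discussed in this thread rather than something confirmed by an RFC here, and the fallback to 7bit simply mirrors the mail default quoted from RFC 1867:

```python
import re

# Default transfer encoding per transport (the "http" value is an assumption).
PROTOCOL_DEFAULT_CTE = {"smtp": "7bit", "http": "binary"}


def effective_transfer_encoding(headers: dict, protocol: str) -> str:
    """Use the part's own Content-Transfer-Encoding if present, else the protocol default."""
    for name, value in headers.items():
        if name.lower() == "content-transfer-encoding":
            return value.strip().lower()
    return PROTOCOL_DEFAULT_CTE.get(protocol.lower(), "7bit")


def normalize_line_endings(body: bytes, encoding: str) -> bytes:
    """Convert CRLF, bare CR and bare LF to CRLF, except inside binary content."""
    if encoding == "binary":
        return body  # any CR/LF conversion would corrupt binary data
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)
```

With this shape, a part posted over HTTP without a CTE header would be treated as binary and passed through byte for byte, while the same part received via SMTP would default to 7bit and could have its bad newlines repaired.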
WDYT?
Stefano
PS: please note that I'm not saying that we should block the 0.4 release for
this issue. I just think this issue is important and we should take care of
it, but this can land in 0.5 if we want.