On Fri, 2008-07-18 at 10:25 +0200, Stefano Bagnara wrote:
> Oleg Kalnichevski wrote:
> > On Thu, 2008-07-17 at 20:25 +0200, Stefano Bagnara wrote:
> >> Oleg Kalnichevski wrote:
> >>> Stefano Bagnara wrote:
> >>>>
> >>>> I've had a quick read of RFC 2822 about this issue. It insists that 
> >>>> CRLF is the only valid line delimiter for a canonical RFC 822 message. 
> >>>> Furthermore, RFC 2822 does not allow the use of isolated CR or LF.
> >>>> So, whenever an isolated CR or LF is found, we have a malformed 
> >>>> RFC 822 message and we have to define how to deal with it.
> >>>>
> >>> Tell the users of IE about it
> >> Can you provide more information? What is the issue with IE?
> > 
> > Blatant disregard of all standards imaginable. Mozilla is actually
> > hardly any better.
> 
> I'm used to non-compliant stuff: in the SMTP world there is plenty.
> What I want to be sure of is that we don't do what they did: simply 
> working "by example" without reading the RFCs accurately tends to create 
> multiple incompatible results. I wouldn't like mime4j to create more 
> non-standard output that people could blame us for.
> 
> >> Does it post 
> >> malformed MIME? Can you be more precise about which version of IE and 
> >> what kind of malformed sequences it produces?
> > 
> > All common browsers known to me put raw binary in multipart/form-data
> > encoded requests. I do not have an IE wire dump handy but I have a few
> > from Firefox:
> > 
> > https://issues.apache.org/jira/browse/HTTPCLIENT-784    
> > https://issues.apache.org/jira/browse/HTTPCLIENT-785
> 
> From the two bugs it seems they put raw binary without any header 
> specifying that it is pure binary. In standard MIME (to my knowledge) 
> this should always be preceded by a "Content-Transfer-Encoding: 
> binary" header (notice "binary" and not "8bit"; they are different).
> 
> "Content-Transfer-Encoding: binary" is the only place where isolated CR 
> and LF are allowed.
> 

My bad. I assumed 8bit encoding was the same as binary.
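
For reference, RFC 2045 (the successor of RFC 1521) separates the three
identity encodings roughly like this: 7bit data is lines of at most 998
octets separated by CRLF, with no octets above 127 and no NUL; 8bit relaxes
only the "no octets above 127" rule; binary allows any octet sequence,
including isolated CR and LF. A minimal sketch of that check follows.
EncodingClassifier is a hypothetical helper written for illustration, not
mime4j API.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not mime4j API): reports the most restrictive
 * RFC 2045 encoding domain that a body's raw octets conform to.
 */
public final class EncodingClassifier {

    public enum Domain { SEVEN_BIT, EIGHT_BIT, BINARY }

    private static final int MAX_LINE_OCTETS = 998; // excluding the CRLF

    public static Domain classify(byte[] data) {
        boolean eightBit = false;
        int lineLength = 0;
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;
            if (b == '\r') {
                // CR is only legal as the first half of a CRLF pair
                if (i + 1 >= data.length || data[i + 1] != '\n') {
                    return Domain.BINARY;
                }
                i++; // consume the LF of the CRLF pair
                lineLength = 0;
                continue;
            }
            if (b == '\n' || b == 0) {
                // an isolated LF or a NUL is only allowed in binary data
                return Domain.BINARY;
            }
            if (b > 127) {
                eightBit = true;
            }
            if (++lineLength > MAX_LINE_OCTETS) {
                return Domain.BINARY;
            }
        }
        return eightBit ? Domain.EIGHT_BIT : Domain.SEVEN_BIT;
    }

    public static void main(String[] args) {
        System.out.println(classify("hello\r\nworld\r\n".getBytes(StandardCharsets.US_ASCII))); // SEVEN_BIT
        System.out.println(classify("caf\u00e9\r\n".getBytes(StandardCharsets.UTF_8)));         // EIGHT_BIT
        System.out.println(classify("bare\nLF".getBytes(StandardCharsets.US_ASCII)));           // BINARY
    }
}
```

So a part whose content contains a bare LF can only be labelled "binary"
without violating the spec; "8bit" does not cover it.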


> First, I don't know whether the "binary" CTE header is missing 
> because the HTTP world uses binary as the default (as opposed to the 
> SMTP 7bit default) or because they are abusing the MIME spec: does 
> anyone know?
> 

I am not aware of any HTTP-specific requirements, so in my opinion 7bit
should be assumed to be the default encoding regardless of the
underlying transport.
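
One way to keep that choice open is to make the fallback explicit: when a
part has no Content-Transfer-Encoding header, use a default supplied by the
caller (7bit for mail, possibly something else for HTTP if a spec ever says
so). The sketch below assumes header names have already been lower-cased;
TransferEncodingResolver and its methods are hypothetical names, not mime4j
API, and the "binary" default shown for HTTP is only an assumption, since
neither of us knows of an RFC that mandates it.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch (not mime4j API): resolve the effective
 * Content-Transfer-Encoding of a body part, falling back to a
 * caller-supplied default when the header is absent or empty.
 */
public final class TransferEncodingResolver {

    private final String defaultEncoding;

    public TransferEncodingResolver(String defaultEncoding) {
        this.defaultEncoding = defaultEncoding.trim().toLowerCase(Locale.ROOT);
    }

    /** Assumes the caller has already lower-cased the header names. */
    public String resolve(Map<String, String> headers) {
        String value = headers.get("content-transfer-encoding");
        if (value == null || value.trim().isEmpty()) {
            return defaultEncoding;
        }
        // CTE values are case-insensitive tokens (RFC 2045)
        return value.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        TransferEncodingResolver mail = new TransferEncodingResolver("7bit");
        TransferEncodingResolver http = new TransferEncodingResolver("binary"); // assumed default
        Map<String, String> noCte = Map.of("content-type", "application/octet-stream");
        System.out.println(mail.resolve(noCte)); // 7bit
        System.out.println(http.resolve(noCte)); // binary
        System.out.println(mail.resolve(Map.of("content-transfer-encoding", " Base64 "))); // base64
    }
}
```

An explicit header always wins; the default only matters when the sender
omitted the header, which is exactly the browser case discussed here.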


> This makes it clear to me that we want to support the binary 
> encoding anyway (at least when it is specified, and when other 
> environments say that it is the default behaviour).
> 
> Second, I would like to understand whether this is the only case where 
> conversion of isolated CR and LF to CRLF would create issues, or whether 
> HTTP shows more issues.
> 
> Third, I would like to understand whether simply having mime4j not alter 
> any isolated CR and LF, and fail parsing when an isolated CR or LF is 
> found outside binary content, would be OK for HTTP needs.
> 


Unfortunately not. There are lots of HTTP services that mix LF and CRLF
line delimiters in the same packet. In the HTTP world there is no way
around tolerating LFs and treating them as equivalent to CRLF.  
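
Tolerating LFs in that sense means treating a bare LF as a line terminator
when scanning header lines and boundary delimiters. A minimal sketch,
assuming the caller applies it only to header/delimiter scanning and never
to the content of a binary part; LenientLineReader is a hypothetical name,
not an existing mime4j or HttpClient class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split input into lines ending in CRLF or bare LF. */
public final class LenientLineReader {

    /**
     * Returns the lines without their terminators. CRLF and bare LF both end
     * a line; a CR that is not immediately followed by LF is kept as data.
     */
    public static List<byte[]> readLines(InputStream in) throws IOException {
        List<byte[]> lines = new ArrayList<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean pendingCr = false;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                // CRLF or bare LF: either ends the line; a pending CR is dropped
                lines.add(line.toByteArray());
                line.reset();
                pendingCr = false;
                continue;
            }
            if (pendingCr) {
                line.write('\r'); // the previous CR was isolated: keep it as data
            }
            pendingCr = (b == '\r');
            if (!pendingCr) {
                line.write(b);
            }
        }
        if (pendingCr) {
            line.write('\r'); // trailing isolated CR at end of input
        }
        if (line.size() > 0) {
            lines.add(line.toByteArray()); // last line had no terminator
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "Content-Disposition: form-data\nContent-Type: text/plain\r\n\r\nvalue"
                .getBytes(StandardCharsets.US_ASCII);
        for (byte[] line : readLines(new ByteArrayInputStream(wire))) {
            System.out.println("[" + new String(line, StandardCharsets.US_ASCII) + "]");
        }
    }
}
```

Keeping isolated CRs as data, rather than also treating them as
terminators, is a deliberate choice in this sketch: it matches the LF-only
leniency described above without inventing line breaks the sender never
intended.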


> >>>> I don't understand why a conversion is wrong for the HTTP case (when 
> >>>> do you have to deal with an isolated LF?).
> >>> How about binary data in multipart/form encoded requests?
> >> Can you tell me which RFC we are talking about?
> >>
> > 
> > We are not talking about any RFC here. We are talking about real-world content.
> 
> Well, they are using a protocol anyway, and what to do is specified in 
> an RFC. I want to know which RFC it is, and then understand whether they 
> are doing something wrong, whether we simply misunderstood the RFC, or 
> whether there is an RFC we don't know about.
> I'm not saying that we should ignore real-world content if it is 
> non-compliant; I'm saying that we have to understand it better.
> 
> In this case I think I was looking for this RFC:
> http://www.faqs.org/rfcs/rfc1867.html
> 
> I'm not sure that this RFC is the latest or the only one involved, but 
> there I read (about multipart/form-data):
> ----------------
>     While the HTTP protocol can transport arbitrary BINARY data, the
>     default for mail transport (e.g., if the ACTION is a "mailto:" URL)
>     is the 7BIT encoding.  The value supplied for a part may need to be
>     encoded and the "content-transfer-encoding" header supplied if the
>     value does not conform to the default encoding.  [See section 5 of
>     RFC 1521 for more details.]
> ---------------
> 
> http://www.faqs.org/rfcs/rfc1521.html provides a long section about 
> Content-Transfer-Encoding, but I'm not sure I grok it all.
> From my current understanding it does not define a default transfer 
> encoding; it says that each protocol may define its own default (also 
> noting that SMTP, RFC 821, defines 7bit as the default).
> 
> So maybe there is an HTTP RFC that says that in the HTTP world the 
> default is "binary".
> 

I am not aware of such RFC but it can well be I have just never come
across such a document.
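
Whatever the default turns out to be, the constraint Stefano states below
(never convert CR/LF inside "binary" content) can be enforced at a single
point: normalize isolated CR and LF to CRLF only when the effective
transfer encoding is not "binary". A minimal sketch, assuming the caller
has already resolved the effective encoding (including any protocol
default); NewlineNormalizer is a hypothetical name, not mime4j API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch: CRLF normalization that never touches binary parts. */
public final class NewlineNormalizer {

    /**
     * Rewrites isolated CR and isolated LF as CRLF. If the part's effective
     * Content-Transfer-Encoding is "binary", the octets are returned untouched.
     */
    public static byte[] normalize(byte[] body, String transferEncoding) {
        if ("binary".equals(transferEncoding.trim().toLowerCase(Locale.ROOT))) {
            return body; // binary content must never be altered
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(body.length + 16);
        for (int i = 0; i < body.length; i++) {
            byte b = body[i];
            if (b == '\r') {
                out.write('\r');
                out.write('\n');
                if (i + 1 < body.length && body[i + 1] == '\n') {
                    i++; // already a CRLF pair: consume its LF
                }
            } else if (b == '\n') {
                out.write('\r'); // bare LF becomes CRLF
                out.write('\n');
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] mixed = "a\nb\r\nc\rd".getBytes(StandardCharsets.US_ASCII);
        byte[] text = normalize(mixed, "8bit");
        byte[] raw = normalize(mixed, "binary");
        System.out.println(new String(text, StandardCharsets.US_ASCII).equals("a\r\nb\r\nc\r\nd")); // true
        System.out.println(raw == mixed); // true: same array, unmodified
    }
}
```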

Oleg


> What is clear is that any CR/LF conversion in a "binary" content is BAD 
> and we don't want MIME4J to do that. So, if we want to be permissive 
> with some content received with bad newlines we have to make sure we 
> don't break binary content.
> 
> Furthermore I would say that there is a need for a "default content 
> transfer encoding" to be used when one is not specified in headers 
> (because this does not seem part of the MIME spec, but of specific 
> protocols specifications).
> 
> WDYT?
> 
> Stefano
> 
> PS: please note that I'm not saying that we should block the 0.4 release 
> for this issue. I just think this issue is important and we should care 
> about it, but this can land in 0.5 if we want.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

