[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Niklas Therning (JIRA) Mon, 21 Jul 2008 09:11:54 -0700

    [ 
https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615289#action_12615289
 ]


Niklas Therning commented on MIME4J-62:
---------------------------------------

Stefano, the extra CRLF you see come from TempFileTextBody.writeTo(). For the 
QP case I think it's incorrect for TempFileTextBody (TempFileBinaryBody also 
does this) to do that as that will actually alter the body. If e.g. a PNG is 
encoded like this it may not render properly when decoded.

If it's ok with you guys I'll change all SPACEs to _ for now and remove the 
CRLF at the end of the QP encoded part of the MULTIPART_WITH_BINARY_ATTACHMENTS 
test message. I'll also change TempFileTextBody and TempFileBinaryBody to not 
add the extra CRLF after outputting a QP encoded body. This will fix the 
failing tests. And then we can come back to this issue later to make an 
alternative encoder for textual data. How about that?

> Unnecessary qp encoding of SPACE and TAB characters in CodecUtil
> ----------------------------------------------------------------
>
>                 Key: MIME4J-62
>                 URL: https://issues.apache.org/jira/browse/MIME4J-62
>             Project: Mime4j
>          Issue Type: Bug
>    Affects Versions: 0.4
>            Reporter: Niklas Therning
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: TextAttachmentEncodingTest.java
>
>
> ATM we always encode SPACE and TAB. The result is that the output of the 
> encoding is longer than necessary. According to the MIME RFC:
> (3)   (White Space) Octets with values of 9 and 32 MAY be
>           represented as US-ASCII TAB (HT) and SPACE characters,
>           respectively, but MUST NOT be so represented at the end
>           of an encoded line.  Any TAB (HT) or SPACE characters
>           on an encoded line MUST thus be followed on that line
>           by a printable character.  In particular, an "=" at the
>           end of an encoded line, indicating a soft line break
>           (see rule #5) may follow one or more TAB (HT) or SPACE
>           characters.  It follows that an octet with decimal
>           value 9 or 32 appearing at the end of an encoded line
>           must be represented according to Rule #1.  This rule is
>           necessary because some MTAs (Message Transport Agents,
>           programs which transport messages from one user to
>           another, or perform a portion of such transfers) are
>           known to pad lines of text with SPACEs, and others are
>           known to remove "white space" characters from the end
>           of a line.  Therefore, when decoding a Quoted-Printable
>           body, any trailing white space on a line must be
>           deleted, as it will necessarily have been added by
>           intermediate transport agents.
> To make the encoded output as short as possible we should try to not encode 
> SPACE and TAB unless they are the last character in a line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Reply via email to