[jira] Commented: (MIME4J-62) Unnecessary qp encoding of SPACE and TAB characters in CodecUtil

Stefano Bagnara (JIRA) Mon, 21 Jul 2008 08:28:23 -0700

    [ https://issues.apache.org/jira/browse/MIME4J-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615274#action_12615274 ]


Stefano Bagnara commented on MIME4J-62:
---------------------------------------

MY OPINION is that rules #3, #4 and #5 are not there for space optimization 
but to keep the content readable when decoding is not possible. But my 
opinion is not important for resolving this issue.

I see we have 3 tests in MessageWriteToTest:
- testBinaryAttachmentLenient
- testBinaryAttachmentStrictError
- testBinaryAttachmentStrictIgnore

The expected results written in these tests assume a quoted-printable encoder 
that supports at least rules #3 and #5 from RFC 1521.

Either we add these features or we change the expected results
(of course it is simpler to change the expected results).

I tried this locally and there seems to be another bug: an extra CRLF 
sequence is added during the round trip. Maybe a problem in 
QuotedPrintableInputStream or in MimeBoundaryInputStream, no clue yet.
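
For reference, a minimal sketch of what an encoder honoring rules #3 and #5 
could look like. This is NOT the current CodecUtil code: the class name 
QpSketch and the method encode are hypothetical, and the sketch assumes text 
input where a CRLF pair is a hard line break (rule #4). It keeps each encoded 
line within 76 characters, counting the trailing "=" of a soft break.

```java
/** Hypothetical sketch of a quoted-printable encoder; not Mime4j's CodecUtil. */
public class QpSketch {
    // Rule #5: encoded lines are at most 76 characters, including a trailing "=".
    private static final int MAX_LINE = 76;
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Encodes text data; a CRLF pair in the input is kept as a hard line break. */
    public static String encode(byte[] in) {
        StringBuilder out = new StringBuilder();
        int col = 0; // number of characters already on the current encoded line
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;

            // Rule #4 (assumed text mode): CRLF passes through as a line break.
            if (b == '\r' && i + 1 < in.length && in[i + 1] == '\n') {
                out.append("\r\n");
                col = 0;
                i++; // skip the LF
                continue;
            }

            // True when this octet is the last one before a CRLF or the end of input.
            boolean atLineEnd = i + 1 == in.length
                    || (in[i + 1] == '\r' && i + 2 < in.length && in[i + 2] == '\n');

            String token;
            if ((b == ' ' || b == '\t') && !atLineEnd) {
                token = String.valueOf((char) b);          // rule #3: literal SPACE/TAB
            } else if (b >= 33 && b <= 126 && b != '=') {
                token = String.valueOf((char) b);          // rule #2: literal printable
            } else {
                token = "=" + HEX[b >> 4] + HEX[b & 0x0F]; // rule #1: =XX escape
            }

            // Rule #5: insert a soft line break if the token would not leave
            // room for the "=" that ends a soft-broken line.
            if (col + token.length() > MAX_LINE - 1) {
                out.append("=\r\n");
                col = 0;
            }
            out.append(token);
            col += token.length();
        }
        return out.toString();
    }
}
```

With this sketch, "a b" stays "a b", while "a b " becomes "a b=20" because 
the last SPACE sits at the end of the line. Binary attachments would need a 
mode where CR and LF are always escaped, which the sketch does not cover.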

> Unnecessary qp encoding of SPACE and TAB characters in CodecUtil
> ----------------------------------------------------------------
>
>                 Key: MIME4J-62
>                 URL: https://issues.apache.org/jira/browse/MIME4J-62
>             Project: Mime4j
>          Issue Type: Bug
>    Affects Versions: 0.4
>            Reporter: Niklas Therning
>            Priority: Minor
>             Fix For: 0.4
>
>
> ATM we always encode SPACE and TAB. The result is that the output of the 
> encoding is longer than necessary. According to the MIME RFC:
> (3)   (White Space) Octets with values of 9 and 32 MAY be
>           represented as US-ASCII TAB (HT) and SPACE characters,
>           respectively, but MUST NOT be so represented at the end
>           of an encoded line.  Any TAB (HT) or SPACE characters
>           on an encoded line MUST thus be followed on that line
>           by a printable character.  In particular, an "=" at the
>           end of an encoded line, indicating a soft line break
>           (see rule #5) may follow one or more TAB (HT) or SPACE
>           characters.  It follows that an octet with decimal
>           value 9 or 32 appearing at the end of an encoded line
>           must be represented according to Rule #1.  This rule is
>           necessary because some MTAs (Message Transport Agents,
>           programs which transport messages from one user to
>           another, or perform a portion of such transfers) are
>           known to pad lines of text with SPACEs, and others are
>           known to remove "white space" characters from the end
>           of a line.  Therefore, when decoding a Quoted-Printable
>           body, any trailing white space on a line must be
>           deleted, as it will necessarily have been added by
>           intermediate transport agents.
> To make the encoded output as short as possible we should try to not encode 
> SPACE and TAB unless they are the last character in a line.
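
The decoding half of rule #3 (delete trailing white space on each encoded 
line before decoding it) can be sketched as below. The class and method names 
are hypothetical and this is not taken from QuotedPrintableInputStream; it 
assumes the line terminator has already been removed from the line.

```java
/** Hypothetical helper illustrating the decoder side of rule #3. */
public class QpTrailingWhitespace {
    /**
     * Returns the encoded line without trailing SPACE or TAB characters.
     * Per rule #3 such characters can only have been added in transport,
     * since an encoder must escape white space at the end of a line.
     */
    public static String stripTrailingWhitespace(String encodedLine) {
        int end = encodedLine.length();
        while (end > 0) {
            char c = encodedLine.charAt(end - 1);
            if (c != ' ' && c != '\t') {
                break; // first non-white-space character from the right
            }
            end--;
        }
        return encodedLine.substring(0, end);
    }
}
```

An escaped "=20" at the end of a line is untouched by this step, so white 
space that the encoder meant to keep survives the round trip.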

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

