Details on why some attachments are corrupt (Was: Mangled attachments)

Eddy Fri, 13 Sep 2002 06:07:20 -0700

First, I've requested that the 'messages.zip' file be removed from any
sites where it has been "published"; I never intended that particular
file to be "widely available" (it was a forward of a forward of a
forward and potentially contained email addresses which I wouldn't
want to be responsible for inadvertently making "public").
Hopefully enough (responsible) people truly interested in helping
track down the problem did get a copy.


Second, last night I compared the .TBB/.TBI files of a corrupt version
(as in the 'messages.zip' file) with those of a version which I
exported as a Unix mailbox file and then re-imported into a new TB!
folder.

To begin with, the .TBI files were identical, so they aren't involved.

        Corrupt MESSAGES.TBB file: 147,230 bytes
MESSAGES.TBB file after importing: 147,180 bytes

Here is where the differences come from (there are three):

1) In the original, corrupt file, the To: line is formatted like this:

To: Name <address>,
<tab>Name <address>,
<tab>Name <address>,
<tab>Name <address>,
<tab>Name <address>,
<tab>Name <address>,
<tab>Name <address>

In the version that was exported/imported, it is now formatted like
this:

To: Name <address>, Name <address>,
<tab>Name <address>, Name <address>,
<tab>Name <address>, Name <address>,
<tab>Name <address>

This is an insignificant difference, and both are perfectly compliant
with RFC-2822 (which replaced RFC-822).
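To see why the two layouts are equivalent: RFC 2822 "folding" just
inserts a CRLF before existing whitespace, and a reader "unfolds" the
header by deleting each CRLF that is immediately followed by a space or
tab. Here's a rough Python sketch of that (the 'Name <address>'
placeholders and the naive comma split are my own simplifications; real
address parsing has to cope with commas inside quoted names):

```python
import re

def unfold(header: str) -> str:
    """RFC 2822 unfolding: delete every CRLF that is immediately
    followed by a space or tab (the whitespace itself stays)."""
    return re.sub(r"\r\n(?=[ \t])", "", header)

def addresses(header: str) -> list:
    """Naive split of an unfolded To: header into trimmed entries.
    Assumes no commas inside quoted display names."""
    value = unfold(header).split(":", 1)[1]
    return [entry.strip() for entry in value.split(",")]

ADDR = "Name <address>"   # placeholder, as in the examples above

# Corrupt file's layout: one address per line.
one_per_line = "To: " + ",\r\n\t".join([ADDR] * 7) + "\r\n"

# Exported/imported layout: two addresses per line.
rows = [", ".join([ADDR, ADDR])] * 3 + [ADDR]
two_per_line = "To: " + ",\r\n\t".join(rows) + "\r\n"

print(addresses(one_per_line) == addresses(two_per_line))   # True
```

Both layouts unfold to the same seven recipients; only the amount of
folding whitespace differs, which is part of why the file sizes differ.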

2) In the original, corrupt file, the first few lines of the message
body were:

This is a multi-part message in MIME format.
--------------060503030600060908060405
Content-Type: multipart/alternative;
 boundary="------------030609040601080206080607"

In the exported/imported file, the first few lines of the message body
were:

--------------060503030600060908060405
Content-Type: multipart/alternative;
 boundary="------------030609040601080206080607"

Notice that the line that says "This is a multi-part message in MIME
format." is missing. This is also perfectly fine, since that line is
a carryover from when MIME first "hit the scene", so users with
non-MIME-aware mailers would have a clue what the heck the message was.
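A rough check of that claim with Python's standard email package: text
before the first boundary ends up in the message's 'preamble' and is not
part of any MIME part. This is only a sketch; the outer type
'multipart/mixed' and the 'Hello' body are made up, and the boundary
value is simply the delimiter line quoted above minus its leading '--':

```python
from email import message_from_string

OUTER = "------------060503030600060908060405"  # boundary parameter value

def sample(with_preamble: bool) -> str:
    """Build a tiny multipart message, optionally with the legacy preamble."""
    text = (
        "MIME-Version: 1.0\n"
        f'Content-Type: multipart/mixed; boundary="{OUTER}"\n'
        "\n"
    )
    if with_preamble:
        text += "This is a multi-part message in MIME format.\n"
    text += (
        f"--{OUTER}\n"
        "Content-Type: text/plain\n"
        "\n"
        "Hello\n"
        f"--{OUTER}--\n"
    )
    return text

def part_payloads(raw: str) -> list:
    """Return the payload of every MIME part (the preamble is not a part)."""
    msg = message_from_string(raw)
    return [part.get_payload() for part in msg.get_payload()]

with_pre = message_from_string(sample(True))
without_pre = message_from_string(sample(False))
print(repr(with_pre.preamble), repr(without_pre.preamble))
print(part_payloads(sample(True)) == part_payloads(sample(False)))
```

With or without the preamble line, the parts themselves come out the
same, so dropping that line can't be what breaks the attachment.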

3) Here is the real difference, and the one that is causing all of
the problems, at least in this particular case:

At around line 2020, there is this line:

jv44Qlinmzyfpnmixjsryee3EEnqxz2z+nIG4lQORB2BKyf6yJgVJbv8rfLs1/f3rPcA39s1

(This is very close to the end of the file, which explains why only the
VERY BOTTOM of the JPEG image is "corrupt" when saved from TB! and
viewed in an external viewer, like IrfanView.)

In the corrupt version, this line is the ONLY line in the ENTIRE email
that is terminated by a bare newline ('\n', 0x0A). EVERY other line in
the file is terminated by a CR/NL combination ('\r\n', 0x0D 0x0A)!
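If anyone wants to check their own corrupt messages for this, here is a
quick Python sketch that reports which lines end in a bare LF (an LF not
preceded by a CR). It works on raw message bytes; I'm assuming you run
it against an exported message rather than the .TBB file itself, and
the file name in the comment is hypothetical:

```python
def bare_lf_lines(data: bytes) -> list:
    """Return the 1-based numbers of lines that end in a bare LF (0x0A)
    which is not preceded by a CR (0x0D)."""
    found = []
    line_no = 1
    for i, byte in enumerate(data):          # iterating bytes yields ints
        if byte == 0x0A:
            if i == 0 or data[i - 1] != 0x0D:
                found.append(line_no)
            line_no += 1
    return found

sample = (b"line one\r\n"
          b"line two\r\n"
          b"jv44Qlinmzyfpnmixjsryee3EEnqxz2z+nIG4lQORB2BKyf6yJgVJbv8rfLs1/f3rPcA39s1\n"
          b"line four\r\n")
print(bare_lf_lines(sample))   # [3]

# On a real message (hypothetical file name):
# with open("corrupt-message.eml", "rb") as f:
#     print(bare_lf_lines(f.read()))
```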

In the exported/imported version, this line is properly terminated
with a CR/NL combo, and hence the attachment can be extracted properly.
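The repair that the export/import round trip apparently made can be
imitated by turning every bare LF into CR/LF while leaving existing
CR/LF pairs alone. A minimal Python sketch (this is my guess at the
net effect, not how TB! actually does it; stray bare CRs are not
touched, and the sample bytes are made up):

```python
import re

def normalize_crlf(data: bytes) -> bytes:
    """Replace each LF that is not already preceded by CR with CR/LF.
    Existing CR/LF pairs are left as they are."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

damaged = b"line two\r\nQUJD\nline four\r\n"
fixed = normalize_crlf(damaged)
print(fixed)                          # b'line two\r\nQUJD\r\nline four\r\n'
print(len(fixed) - len(damaged))      # 1 extra byte per repaired line
```

Running it twice changes nothing, so it's safe to apply to a message
that is already clean.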

I have verified this same problem with an entirely different message
and found the exact same thing: there is a line in the Base64-encoded
attachment area that was terminated solely by an NL and not a CR/NL
combo like the rest of the message.

My next task will be to do it with some corrupt attachments of
differing types (.DOC files, .ZIP files, etc.) and see if this
analysis always holds true or not.

One possibility is that TB! is very strict in how it interprets
"end-of-line", whereas other mailers (such as Outlook) are perfectly OK
with a single NL (even if other lines are terminated with CR/NL) as an
end-of-line indicator.

Comments? Other theories?


________________________________________________
Current version is 1.61 | "Using TBUDL" information:
http://www.silverstones.com/thebat/TBUDLInfo.html
