On Thu, Sep 11, 2008 at 4:22 AM, grishma govani <[EMAIL PROTECTED]> wrote:
> I have the e-mails from gmail in a file on my computer. I have used the code
> below to extract all the headers. As you can see, for now I am using the text
> stored in the document as my body. I just want to extract the plain text and
> leave out all the HTML, the duplicates of the plain text, and all the other
> information like content type, from, etc. Can anyone help me out?

Here is a program that shows the contents of an mbox file. It shows the
subject of each message, and the content type and an excerpt of each part of
the message body. It works with both single-part and multipart messages.

import mailbox

def showMbox(mboxPath):
    # Print each message's subject, then the parts of its body.
    box = mailbox.mbox(mboxPath)
    for msg in box:
        print msg['Subject']
        showPayload(msg)
        print
        print '**********************************'
        print

def showPayload(msg):
    payload = msg.get_payload()

    if msg.is_multipart():
        # payload is a list of sub-messages; recurse into each one,
        # printing a divider between them.
        div = ''
        for subMsg in payload:
            print div
            showPayload(subMsg)
            div = '------------------------------'
    else:
        # payload is the (still transfer-encoded) body text;
        # show its type and the first 200 characters.
        print msg.get_content_type()
        print payload[:200]

if __name__ == '__main__':
    showMbox('/path/to/mbox')

Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor