[Tutor] Extracting body of all email messages from an mbox file on computer

grishma govani Thu, 11 Sep 2008 01:23:45 -0700

Yes, I used the part of the code from the second link.
I am using the mailbox module too.

I have the e-mails from gmail in a file on my computer. I have used the code below to extract all the headers. As you can see, for now I am using the text stored in document as my body. I just want to extract the plain text and leave out all the HTML, the duplicates of the plain text and all the other information like Content-Type, From, etc. Can anyone help me out?


import mailbox


# Defined before the loop below so that the name exists when it is called.
def passthrough_filter(msg, document):
   """This prints the 'from' address of the message and
   returns the document unchanged.
   """
   # getaddr() returns a (real name, e-mail address) pair
   from_addr = msg.getaddr('From')[1]
   Sub = msg.get('Subject')
   ContentType = msg.get('Content-Type')
   ContentDisp = msg.get('Content-Disposition')
   print "From:",from_addr
   print "Subject:",Sub
   print "Attachment:",None
   print "Body:",document
   print '\n'
   return document


mb = mailbox.UnixMailbox(file('tmp/automated/Feedback', 'r'))
fout = file('Feedback.txt', 'w')
msg = mb.next()

while msg is not None:
   # msg.fp.read() returns the raw, undecoded body (all MIME parts)
   document = msg.fp.read()
   document = passthrough_filter(msg, document)
   msg = mb.next()

fout.close()
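
One possible way to get only the plain text asked about above is to parse each message with the email package and keep just the text/plain parts. The following is a minimal sketch, not a definitive solution: it assumes Python 3 (3.5 or later, for get_content_disposition) and uses mailbox.mbox instead of the UnixMailbox class in the code above. The function names plain_text_body and extract_bodies are made up for illustration.

```python
import mailbox


def plain_text_body(message):
    """Return the text/plain parts of a message joined together,
    skipping HTML alternatives and attachments."""
    chunks = []
    for part in message.walk():
        # Containers (multipart/*) carry no text of their own.
        if part.is_multipart():
            continue
        if part.get_content_type() != 'text/plain':
            continue
        # Skip plain-text files that were sent as attachments.
        if part.get_content_disposition() == 'attachment':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or 'utf-8'
        try:
            chunks.append(payload.decode(charset, errors='replace'))
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            chunks.append(payload.decode('utf-8', errors='replace'))
    return '\n'.join(chunks)


def extract_bodies(mbox_path):
    """Yield (sender, subject, body) for every message in the mbox file."""
    # create=False avoids silently creating an empty file for a wrong path.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            yield message['From'], message['Subject'], plain_text_body(message)
    finally:
        box.close()
```

With this sketch, looping over extract_bodies('tmp/automated/Feedback') would give one (sender, subject, body) tuple per message, which could then be printed or written to Feedback.txt. Messages that contain only HTML would come back with an empty body.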




On 10 Sep 2008, at 22:09, Kent Johnson wrote:

On Wed, Sep 10, 2008 at 4:06 PM, grishma govani<[EMAIL PROTECTED]> wrote:
Hello Everybody,
I have been trying to extract the body of all the email messages from an
mbox file.

How are you doing this? Have you seen the mailbox module and this recipe:
http://docs.python.org/lib/mailbox-mbox.html
http://code.activestate.com/recipes/157437/

Kent

