Yes, I used part of the code from the second link.
I am using the mailbox module too.

I have the e-mails from Gmail in a file on my computer. I used the code below to extract the headers. As you can see, for now I am using the text stored in document as my body. I want to extract only the plain text and leave out the HTML, the duplicate copies of the plain text, and all the other information such as the content type, the From address, etc. Can anyone help me out?

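import mailbox


def passthrough_filter(msg, document):
    """Print selected headers and the raw body of the message and
    return the document unchanged.
    """
    # getaddr() returns a (real name, address) pair; [0] is the real name.
    from_addr = msg.getaddr('From')[0]
    Sub = msg.get('Subject')
    ContentType = msg.get('Content-Type')          # read but not used yet
    ContentDisp = msg.get('Content-Disposition')   # read but not used yet
    print "From:", from_addr
    print "Subject:", Sub
    print "Attachment:", None
    print "Body:", document
    print '\n'
    return document


mb = mailbox.UnixMailbox(file('tmp/automated/Feedback', 'r'))
fout = file('Feedback.txt', 'w')   # opened for output, not written to yet
msg = mb.next()

# The filter is defined above so that it exists before this loop runs.
while msg is not None:
    document = msg.fp.read()   # raw body, including every MIME part
    document = passthrough_filter(msg, document)
    msg = mb.next()

fout.close()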
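
One possible way to keep only the plain text, as a rough sketch and not a tested solution for this mailbox: it assumes Python 3.5 or later and uses mailbox.mbox together with the email message methods (walk, get_content_type, get_payload) instead of UnixMailbox. The helper names plain_text_body and iter_plain_bodies are made up for illustration, and returning only the first text/plain part is an assumption about what counts as a duplicate.

```python
import mailbox


def plain_text_body(msg):
    """Return the first text/plain part of a message as a string.

    Returns None when the message has no usable text/plain part.
    """
    for part in msg.walk():
        # Multipart containers only hold other parts, so skip them.
        if part.is_multipart():
            continue
        # Skip attachments, even when they happen to be plain text.
        if part.get_content_disposition() == 'attachment':
            continue
        if part.get_content_type() != 'text/plain':
            continue  # this drops the text/html alternative
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or 'utf-8'
        try:
            return payload.decode(charset, 'replace')
        except LookupError:
            # Unknown charset name in the header; fall back to UTF-8.
            return payload.decode('utf-8', 'replace')
    return None


def iter_plain_bodies(path):
    """Yield (from, subject, body) for each message in the mbox file at path."""
    # create=False: raise an error instead of creating a missing file.
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield msg.get('From'), msg.get('Subject'), plain_text_body(msg)
    finally:
        box.close()
```

Looping over iter_plain_bodies('tmp/automated/Feedback') would then give one (from, subject, body) tuple per message, with body set to None for messages that carry no text/plain part.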




On 10 Sep 2008, at 22:09, Kent Johnson wrote:

> On Wed, Sep 10, 2008 at 4:06 PM, grishma govani <[EMAIL PROTECTED]> wrote:
>> Hello Everybody,
>>
>> I have been trying to extract the body of all the email messages from an
>> mbox file.
>
> How are you doing this? Have you seen the mailbox module and this recipe:
> http://docs.python.org/lib/mailbox-mbox.html
> http://code.activestate.com/recipes/157437/
>
> Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor