Yes, I used part of the code from the second link.
I am using the mailbox module too.
I have the e-mails from Gmail in a file on my computer. I have used
the code below to extract all the headers. As you can see, for now I
am using the text stored in document as my body. I just want to
extract the plain text and leave out all the HTML, the duplicates of
the plain text, and all the other information like Content-Type,
From, etc. Can anyone help me out?

import mailbox

def passthrough_filter(msg, document):
    """This prints the 'from' address of the message and
    returns the document unchanged.
    """
    # getaddr() returns a (real name, e-mail address) pair
    from_addr = msg.getaddr('From')[1]
    Sub = msg.get('Subject')
    ContentType = msg.get('Content-Type')
    ContentDisp = msg.get('Content-Disposition')
    print "From:", from_addr
    print "Subject:", Sub
    print "Attachment:", None
    print "Body:", document
    print '\n'
    return document

mb = mailbox.UnixMailbox(file('tmp/automated/Feedback', 'r'))
fout = file('Feedback.txt', 'w')
msg = mb.next()
while msg is not None:
    # msg.fp.read() returns the raw body, including any MIME parts
    document = msg.fp.read()
    document = passthrough_filter(msg, document)
    msg = mb.next()

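
For what it is worth, here is a rough sketch of the kind of thing I am after, written for a current Python 3 rather than the Python 2 code above. It assumes mailbox.mbox can read the file and that taking the first text/plain part of each message is enough to drop the HTML copy and the attachments. The names plain_text_body and bodies_from_mbox are made up for illustration and do not come from the recipe.

```python
import mailbox


def plain_text_body(msg):
    """Return the first text/plain part of msg as a string, or '' if none."""
    for part in msg.walk():
        # Multipart containers hold other parts and have no text of their own.
        if part.get_content_maintype() == "multipart":
            continue
        # Leave out anything explicitly marked as an attachment.
        if part.get_content_disposition() == "attachment":
            continue
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset name in the header: fall back to UTF-8.
                return payload.decode("utf-8", errors="replace")
    return ""


def bodies_from_mbox(path):
    """Yield (from, subject, plain-text body) for each message in an mbox file."""
    mbox = mailbox.mbox(path, create=False)
    try:
        for msg in mbox:
            yield msg.get("From"), msg.get("Subject"), plain_text_body(msg)
    finally:
        mbox.close()
```

Looping over bodies_from_mbox('tmp/automated/Feedback') and writing each body to Feedback.txt would then replace the while loop above. Returning only the first text/plain part is a deliberate simplification: a message carrying several plain-text parts would lose all but the first.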

On 10 Sep 2008, at 22:09, Kent Johnson wrote:

> On Wed, Sep 10, 2008 at 4:06 PM, grishma govani
> <[EMAIL PROTECTED]> wrote:
>> Hello Everybody,
>> I have been trying to extract the body of all the email messages
>> from an mbox file.
>
> How are you doing this? Have you seen the mailbox module and this
> recipe:
> http://docs.python.org/lib/mailbox-mbox.html
> http://code.activestate.com/recipes/157437/
>
> Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor