Hi,
I'm trying to iterate through a directory of files, some of which are email
files. For the email files I would like to extract just the body of the
message and ignore both attachments and headers.
I'm currently using the AutoDetectParser:
InputStream document = …;
Parser parse = new AutoDetectParser();
Metadata metadata = new Metadata();
StringWriter textBuffer = new StringWriter();
ContentHandler contentHandler = new BodyContentHandler(textBuffer);
ParseContext context = new ParseContext();
TikaInputStream inputStream = TikaInputStream.get(document);
parse.parse(inputStream, contentHandler, metadata, context);
This gives me the headers and footers as well as the body. How can I modify
it to get only the body? Do I have to use the Detector API to figure out
whether I'm dealing with an email message, and then use some other parser to
extract just the body of the message?
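To make the goal concrete, here is a minimal sketch of the header/body split
I'm after, written in plain Java rather than with Tika. It assumes the raw
message is already available as a String and relies only on the RFC 5322 rule
that the first empty line separates the headers from the body. The class and
method names (EmailBodySketch, extractBody) are hypothetical, and the sketch
does not decode MIME multipart messages or strip attachments.

```java
// Hypothetical helper, not part of Tika: splits a raw RFC 5322 message
// at the first empty line and returns everything after it.
final class EmailBodySketch {

    static String extractBody(String rawMessage) {
        StringBuilder body = new StringBuilder();
        boolean inBody = false;
        // Accept both CRLF and bare LF line endings. Note that split()
        // drops trailing empty strings, so trailing blank lines are lost.
        for (String line : rawMessage.split("\r?\n")) {
            if (inBody) {
                // Past the separator: keep the line, normalised to LF.
                body.append(line).append('\n');
            } else if (line.isEmpty()) {
                // The first empty line ends the header section.
                inBody = true;
            }
            // Otherwise this is a header (or folded header) line: skip it.
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String raw = "From: a@example.com\r\nSubject: Hi\r\n\r\nHello there\r\n";
        System.out.print(extractBody(raw));
    }
}
```

This only illustrates the output I want for a simple single-part message;
attachments would still need a MIME-aware parser.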
Any help is appreciated.
Thanks,
Karthik