For parsing, Nutch also uses Tika, which supports many formats [1], including several mail formats.

[1] tika.apache.org/0.9/formats.html

On Jan 29, 2014, at 12:09 AM, Tejas Patil <tejas.patil...@gmail.com> wrote:

> Nutch has these protocols implemented: http, https, ftp, file. As long as
> you get links to your documents in those schemes, Nutch would do the crawl.
>
> Thanks,
> Tejas
>
> On Tue, Jan 28, 2014 at 10:07 PM, rashmi maheshwari <
> maheshwari.ras...@gmail.com> wrote:
>
>> I could crawl internet webpage and local directory folder to some extent.
>>
>> How to implement email and intranet blogs crawling?
>>
>> --
>> Rashmi
>> Be the change that you want to see in this world!
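
To make the point about protocols in the quoted reply concrete, here is a rough sketch (plain Python, not Nutch code) that checks which candidate URLs use one of the four schemes listed there. The function names and the example URLs are made up for illustration, and the scheme set is copied from the quoted message rather than read from a Nutch configuration.

```python
from urllib.parse import urlparse

# Schemes named in the quoted reply; hard-coded here as an assumption,
# not read from any Nutch configuration.
LISTED_SCHEMES = {"http", "https", "ftp", "file"}


def uses_listed_scheme(url: str) -> bool:
    """Return True if the URL's scheme is one of the listed schemes."""
    return urlparse(url).scheme.lower() in LISTED_SCHEMES


def split_by_scheme(urls):
    """Split URLs into (listed, other) according to their scheme."""
    listed = [u for u in urls if uses_listed_scheme(u)]
    other = [u for u in urls if not uses_listed_scheme(u)]
    return listed, other


if __name__ == "__main__":
    # Hypothetical seed URLs: an intranet blog, a mail message saved as a
    # local file, and a mailbox addressed over IMAP.
    seeds = [
        "http://intranet.example/blog/",
        "file:///srv/mail/archive/msg1.eml",
        "imap://mail.example/INBOX",
    ]
    listed, other = split_by_scheme(seeds)
    print("listed schemes:", listed)
    print("other schemes:", other)
```

Read this way, an intranet blog served over http is already covered, and mail would likely need to be reachable through one of those schemes (for example, messages exported to files) before the parser gets to see it.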