For parsing, Nutch also uses Tika, which supports many formats [1], including
several mail formats.

[1] tika.apache.org/0.9/formats.html
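
As a rough illustration of the scheme point in Tejas's reply quoted below, here
is a minimal Python sketch that sorts candidate seed URLs by whether their
scheme is one of the four he lists. The function name split_seeds and the
example hostnames are hypothetical, and the set of schemes is taken from his
message rather than from any particular Nutch build, so treat it as an
assumption.

```python
from urllib.parse import urlparse

# Schemes named in the quoted reply. This set is an assumption about your
# Nutch setup; each scheme still needs its protocol plugin enabled.
SUPPORTED_SCHEMES = {"http", "https", "ftp", "file"}


def split_seeds(urls):
    """Separate candidate seed URLs into (crawlable, skipped) lists by scheme."""
    crawlable, skipped = [], []
    for url in urls:
        scheme = urlparse(url).scheme.lower()
        if scheme in SUPPORTED_SCHEMES:
            crawlable.append(url)
        else:
            skipped.append(url)
    return crawlable, skipped


if __name__ == "__main__":
    # Hypothetical seeds: an intranet blog, a saved mail file, an IMAP mailbox.
    seeds = [
        "http://intranet.example.com/blog/",
        "file:///var/mail/archive/message1.eml",
        "imap://mail.example.com/INBOX",
    ]
    ok, rejected = split_seeds(seeds)
    print("crawlable:", ok)
    print("skipped:", rejected)
```

Under these assumptions, mail that has been exported to files or exposed over
http would pass the check, while a mailbox reachable only over IMAP would not.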

On Jan 29, 2014, at 12:09 AM, Tejas Patil <tejas.patil...@gmail.com> wrote:

> Nutch implements these protocols: http, https, ftp, and file. As long as
> links to your documents use one of those schemes, Nutch can crawl them.
> 
> Thanks,
> Tejas
> 
> 
> 
> On Tue, Jan 28, 2014 at 10:07 PM, rashmi maheshwari <
> maheshwari.ras...@gmail.com> wrote:
> 
>> I could crawl internet web pages and a local directory folder to some extent.
>> 
>> How can I implement crawling of email and intranet blogs?
>> 
>> --
>> Rashmi
>> Be the change that you want to see in this world!
>> 

