Hi,

I am using Nutch 1.7 for crawling, and the crawled data is stored in
crawl/segments in HDFS. I want to extract structured data from it using
Apache Tika. Can someone suggest a reference on how to parse the data
crawled by Nutch with Apache Tika?

Regards,
Rahul


On Mon, Feb 10, 2014 at 6:29 PM, Markus Jelsma
<[email protected]> wrote:

> Did you set
>
>   <property>
>    <name>db.fetch.schedule.class</name>
>    <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
>   </property>
>
> as well? The other settings are not mandatory; they have defaults.
>
>
> -----Original message-----
> > From: Erwin Gunadi <[email protected]>
> > Sent: Monday 10th February 2014 13:05
> > To: [email protected]
> > Subject: Question about fetch interval value
> >
> > Hi,
> >
> >
> >
> > I have a question about the behavior of AdaptiveFetchSchedule in
> > combination with "db.fetch.interval.default".
> >
> > I know that one should configure:
> >
> > - db.fetch.schedule.adaptive.min_interval
> > - db.fetch.schedule.adaptive.max_interval
> >
> > in order to use AdaptiveFetchSchedule.
> >
> >
> >
> > But I've been seeing strange behavior during crawling: it always
> > tries to re-fetch using the value of "db.fetch.interval.default".
> >
> >
> >
> > Thank you for your help.
> >
> >
> >
> > Best Regards
> >
> > Erwin
> >
> >
>
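
For reference, the settings discussed in the quoted thread above could be
combined in nutch-site.xml roughly as follows. This is only a sketch: the
property names are the ones mentioned in the thread, while the interval
values (in seconds) are illustrative assumptions, not recommendations.

  <!-- Select the adaptive schedule, as suggested in the quoted reply. -->
  <property>
   <name>db.fetch.schedule.class</name>
   <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
  </property>

  <!-- Assumed example value: 30 days. -->
  <property>
   <name>db.fetch.interval.default</name>
   <value>2592000</value>
  </property>

  <!-- Assumed example value: lower bound of 1 hour. -->
  <property>
   <name>db.fetch.schedule.adaptive.min_interval</name>
   <value>3600.0</value>
  </property>

  <!-- Assumed example value: upper bound of 1 year. -->
  <property>
   <name>db.fetch.schedule.adaptive.max_interval</name>
   <value>31536000.0</value>
  </property>

As noted in the quoted reply, only db.fetch.schedule.class has to be set
explicitly; the other properties fall back to their defaults when omitted.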
