Hi,

I am using Nutch 1.7 for crawling, and the crawled data is stored in crawl/segments on HDFS. I want to extract structured data from it using Apache Tika. Can someone suggest a reference on how to parse the data crawled by Nutch with Apache Tika?

Regards,
Rahul

On Mon, Feb 10, 2014 at 6:29 PM, Markus Jelsma <[email protected]> wrote:

> did you set
>
> <property>
>   <name>db.fetch.schedule.class</name>
>   <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
> </property>
>
> as well? The other settings not mandatory, they have defaults.
>
> -----Original message-----
> > From:Erwin Gunadi <[email protected]>
> > Sent: Monday 10th February 2014 13:05
> > To: [email protected]
> > Subject: Question about fetch interval value
> >
> > Hi,
> >
> > I have a question the behavior of using AdaptiveFetchSchedule in combination
> > of "db.fetch.interval.default".
> >
> > I know that one should configure:
> >
> > - db.fetch.schedule.adaptive.min_interval
> > - db.fetch.schedule.adaptive.max_interval
> >
> > In order to use AdaptiveFetchSchedule.
> >
> > But I've been having strange behavior during crawling, because it always
> > tried to re-fetch with the value of "db.fetch.interval.default".
> >
> > Thank you for your help.
> >
> > Best Regards
> > Erwin

