Hi
I did not find the freegen tool in Nutch 2.x. What should I do?
Thanks
On Thursday 25 February 2016 12:24 PM, harsh wrote:
Dear Markus
Thanks for your help. I hope it will solve my problem. Thanks a lot.
On Wednesday 24 February 2016 06:12 PM, Markus Jelsma wrote:
Ah, forget about it: I read in your next message that you are on 2.x. But I
think it also has a freegen tool.
Markus
-----Original message-----
From:Markus Jelsma <markus.jel...@openindex.io>
Sent: Wednesday 24th February 2016 13:41
To: user@nutch.apache.org
Subject: RE: recrawling of specific URLS
Hi, the easiest method is to use the freegen tool. But if you really
want homepages, not just domain roots, you can combine the hostdb
with freegen.
# Update the hostdb
bin/nutch updatehostdb -hostdb crawl/hostdb -crawldb crawl/crawldb/
# Get list of homepages for each host
bin/nutch readhostdb crawl/hostdb/ output -dumpHomepages
Then use freegen.
Markus
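If domain roots are enough (the "easiest method" above), the seed list can be reduced to one root URL per host before it is handed to freegen. This is a minimal sketch, assuming a plain-text seed file with one URL per line; `seeds_to_roots` and the file paths are hypothetical and are not part of Nutch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): read a seed file with one URL per
# line and print the unique root URL (scheme://host[:port]/) of each entry.
# Lines that do not start with an absolute URL (blank lines, comments) are dropped.
seeds_to_roots() {
  # -n plus the trailing "p" flag prints only lines where the substitution matched.
  sed -nE 's|^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]+).*|\1/|p' "$1" | LC_ALL=C sort -u
}
```

For example, `seeds_to_roots urls/seed.txt > roots/seed-roots.txt` writes the roots into a directory that can then be given to freegen as its input.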
-----Original message-----
From:harsh <harsh.sha...@orkash.com>
Sent: Wednesday 24th February 2016 12:49
To: user@nutch.apache.org
Subject: recrawling of specific URLS
Hi All
Nutch is designed to recrawl ALL URLs after a certain interval.
But I want to recrawl only the home pages of my seed URLs so that I
can pick up new links from them to crawl.
Currently I am relying on the bug "Inject command re-inject seed URLS."
to recrawl my seed URLs, but this is not the standard way.
Please give me a suggestion. I have read articles and discussions on
recrawling but could not find a solution.
Lewis, Tejas, please help!
Thanks