I'm starting with Nutch and I ran a simple job as described in the
Nutch tutorial. After a while I get the following error:
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-12 12:32:03, elapsed: 00:00:03
LinkDb: starting at
Actually, I'm not sure I'm looking at the right log lines. Could you
explain in more detail what exactly I should look for? Anyway, I
found the following line just before the error:
Error parsing: http://eu.apachecon.com/js/jquery.akslideshow.js:
failed(2,0): Can't retrieve Tika parser for
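On the question of what to look for: one option is to pull every failing URL out of the log instead of reading it by eye. Below is a minimal sketch in shell. The helper name `list_parse_errors` is hypothetical and not part of Nutch. It reads log text on stdin and prints each URL that appears in an `Error parsing:` line like the one above. It assumes the URL is the first token after `Error parsing:`, whether the failure reason stays on the same line or is wrapped onto the next one as in this mail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the URL from every "Error parsing: <url>:" line.
# Works whether the "failed(...): <reason>" part is on the same line or wrapped.
list_parse_errors() {
  # Capture the first non-space token after "Error parsing: " (the URL plus
  # its trailing colon), print only matching lines, then drop that colon.
  sed -n 's/.*Error parsing: \([^[:space:]]*\).*/\1/p' | sed 's/:$//'
}
```

In local mode Nutch 1.x usually writes its log to `logs/hadoop.log`. Assuming that location, `list_parse_errors < logs/hadoop.log | sort | uniq -c` would count the parse failures per URL.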
I'm not sure I understood you correctly. Here is the complete output
of my crawl:
tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
solrUrl is not set, indexing will be skipped...
crawl started in:
Markus Jelsma, replying to Paul van Hoven's message of Tuesday 12 July 2011 13:38:35:
I don't see this segment 20110712114256 being parsed.
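Markus's observation can also be checked directly on disk. The sketch below uses a hypothetical helper name, `unparsed_segments`. It assumes the Nutch 1.x segment layout, in which a parsed segment holds `parse_data`, `parse_text` and `crawl_parse` alongside `content`, `crawl_fetch` and `crawl_generate`. It prints the name of every segment under `<crawl dir>/segments` that lacks the parse output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list segments under <crawl_dir>/segments with no parse output.
# Assumes the Nutch 1.x layout, where parsing adds parse_data and parse_text.
unparsed_segments() {
  local crawl_dir=$1
  local seg
  for seg in "$crawl_dir"/segments/*/; do
    [ -d "$seg" ] || continue   # glob matched nothing: no segments at all
    seg=${seg%/}                # strip the trailing slash added by the glob
    if [ ! -d "$seg/parse_data" ] || [ ! -d "$seg/parse_text" ]; then
      basename "$seg"           # segment names are timestamps, e.g. 20110712114256
    fi
  done
}
```

Run against the `-dir` path from the crawl command above, `unparsed_segments /Users/toom/Downloads/nutch-1.3/sites` should list 20110712114256 if that segment was indeed never parsed.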
Paul van Hoven, replying to Markus Jelsma (markus.jel...@openindex.io) on 2011/7/12:
Okay, and what does that mean? How can I repair the error?
From the looks of it, you need to parse all segments before
attempting to index them.
As Markus has pointed out, that specific segment hasn't been parsed. Try
parsing it as described at the following link:
http://wiki.apache.org/nutch/bin/nutch_parse
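The fix described above can be sketched as follows. The sketch assumes the Nutch 1.x command-line tools (`nutch parse <segment>` and `nutch invertlinks <linkdb> -dir <segments dir>`) and the layout created by `nutch crawl`, with `crawldb`, `linkdb` and `segments` under the `-dir` path. The function name `reparse_segments` and the `NUTCH` variable are hypothetical conveniences. Setting `NUTCH=echo` prints the commands instead of running them. The Nutch tutorial also runs `updatedb` after `parse`; this sketch leaves that step out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse the named segments, then rebuild the LinkDb.
# Usage: reparse_segments <crawl_dir> <segment_name>...
# NUTCH selects the launcher (default bin/nutch); NUTCH=echo is a dry run.
reparse_segments() {
  local crawl_dir=$1; shift
  local nutch=${NUTCH:-bin/nutch}
  local seg
  for seg in "$@"; do
    # Parse one segment; stop on the first failure.
    "$nutch" parse "$crawl_dir/segments/$seg" || return 1
  done
  # Invert links across all segments, as the crawl's LinkDb step does.
  "$nutch" invertlinks "$crawl_dir/linkdb" -dir "$crawl_dir/segments"
}
```

For example, `NUTCH=echo reparse_segments /Users/toom/Downloads/nutch-1.3/sites 20110712114256` prints the two commands that would run. From the `bin` directory shown in the prompt above, `NUTCH=./nutch` matches how the crawl was started.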
On Tue, Jul 12, 2011 at 1:50 PM, Paul van Hoven