Hi Lewis:

Thank you for the help. This is the (entire) output after I set the log4j property to debug.

==============================================================
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 2
solrUrl=http://localhost:8983/solr/
topN = 10
Injector: starting at 2012-04-11 16:37:20
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
==============================================================
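In case it helps, this is roughly the change I made in conf/log4j.properties before rebuilding. I'm going from the stock file shipped with Nutch 1.x, so the appender names (DRFA, cmdstdout) may differ in other releases:

```properties
# conf/log4j.properties (sketch; appender names are from the stock
# Nutch 1.x file and may differ in your copy)
log4j.rootLogger=INFO,DRFA

# Raise Nutch and Hadoop loggers from INFO to DEBUG so the injector
# failure shows up in the console/log output
log4j.logger.org.apache.nutch=DEBUG,cmdstdout
log4j.logger.org.apache.hadoop=DEBUG,cmdstdout
```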
By the way, the "urls" directory is correct and it does contain a txt file with a list of URLs.

Regards,
Andy

On 10 April 2012 22:08, Lewis John Mcgibbney <[email protected]> wrote:

> There is no more log information before the solrUrl stuff, no?
>
> Try setting log4j.properties to debug in conf/, rebuild the project and see
> what's going on.
>
> On Tue, Apr 10, 2012 at 1:03 PM, Andy Xue <[email protected]> wrote:
>
> > Lewis:
> > Thanks for the reply.
> > However, as far as I know, I don't have to set solrUrl unless I want to
> > index using Solr.
>
> Correct. My fault. I just assumed that this was required.
>
> Lewis

