Greetings!
This may be a Nutch question and if so, I will repost to the Nutch list.
I can run the following commands with Solr-3.5.0/Nutch-1.4:
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
then:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
successfully.
But, if I run:
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
It fails with the following messages:
SolrIndexer: starting at 2011-12-11 14:01:27
Adding 11 documents
SolrIndexer: finished at 2011-12-11 14:01:28, elapsed: 00:00:01
SolrDeleteDuplicates: starting at 2011-12-11 14:01:28
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
I am running on Ubuntu 10.10 with 12 GB of memory, Java version 1.6.0_26.
I can delete the crawl directory and replicate this error consistently.
Suggestions?
Other than "...use the way that doesn't fail." ;-)
I am concerned that having one invocation fail consistently while the other
succeeds points to an underlying problem that may cause trouble elsewhere when
least expected. (And be hard to isolate as the cause.)
Thanks!
Hope everyone is having a great weekend!
Patrick
PS: From the Hadoop log (when it fails), in case that's helpful:
2011-12-11 15:21:51,436 INFO solr.SolrWriter - Adding 11 documents
2011-12-11 15:21:52,250 INFO solr.SolrIndexer - SolrIndexer: finished at 2011-12-11 15:21:52, elapsed: 00:00:01
2011-12-11 15:21:52,251 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-12-11 15:21:52
2011-12-11 15:21:52,251 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
2011-12-11 15:21:52,330 WARN mapred.LocalJobRunner - job_local_0020
java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:388)
at org.apache.hadoop.io.Text.set(Text.java:178)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
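
For what it's worth, the innermost frames (Text.set -> Text.encode) look like a
null String reaching Hadoop's Text class. Below is a minimal standalone sketch
that reproduces the same NullPointerException; the class name and the guess
about where the null comes from are my own assumptions, not taken from the
Nutch source:

import org.apache.hadoop.io.Text;

public class TextNullNPE {
    public static void main(String[] args) {
        Text key = new Text();
        // Text.set(String) delegates to Text.encode(String), which throws a
        // NullPointerException when handed a null String -- the same pair of
        // frames shown in the log above.
        // My guess (unverified): SolrDeleteDuplicates reads a field from a
        // returned Solr document that is missing, so a null value lands here.
        key.set((String) null);
    }
}

If that guess is right, the difference between the two invocations may come
down to which fields actually end up stored in the index, but I haven't
confirmed that.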
--
Patrick Durusau
patr...@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
OASIS Technical Advisory Board (TAB) - member
Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau