On May 10, 2012, at 1:42 AM, Markus Jelsma wrote:

> Hi,
> 
> On Thu, 10 May 2012 09:10:04 +0300, Tolga <[email protected]> wrote:
>> Hi,
>> 
>> This will sound like a duplicate, but actually it differs from the
>> other one. Please bear with me. Following
>> http://wiki.apache.org/nutch/NutchTutorial, I first issued the command
>> 
>> bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
>> 
>> Then when I got the message
>> 
>> Exception in thread "main" java.io.IOException: Job failed!
>>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>>    at
>> org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
>>    at
>> org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
>>    at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> 
> Please include the relevant part of the log. This can be a known issue.
> 
>> 
>> I issued the commands
>> 
>> bin/nutch crawl urls -dir crawl -depth 3 -topN 5
>> 
>> and
>> 
>> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb -linkdb
>> crawldb/linkdb crawldb/segments/*
>> 
>> separately, after which I got no errors. When I browsed to
>> http://localhost:8983/solr/admin and attempted a search, I got the
>> error
>> 
>> 
>>   HTTP ERROR 400
>> 
>> Problem accessing /solr/select. Reason:
>> 
>>    undefined field text
> 
> But this is a Solr thing, you have no field named text. Resolve this in Solr 
> or on the Solr mailing list.


I ran into similar issues last week when I tried the Nutch tutorial. I asked
in the #Solr IRC channel but got no response. The short answer is that I had
to go back to Solr version 3.1.0 for the instructions in the Nutch tutorial
to work.

The longer answer is that following the existing Nutch tutorial gave me two 
errors.

1) SolrDeleteDuplicates exception as mentioned by Tolga above.

To fix this:

1.a) Stop Solr.
1.b) Delete the Solr index.
1.c) Copy the Nutch-provided schema.xml into the proper Solr directory
     (example/solr/conf/).
1.d) Replace Nutch's solr-solrj-xxx.jar with the matching version from Solr:
       solr/dist/apache-solr-solrj-xxx.jar  -->
       nutch/runtime/local/lib/solr-solrj-xxx.jar
1.e) Restart Solr.

The first two steps may only be necessary if Solr was already running with the
default schema it ships with. That was my situation, because I had done the
Solr tutorial first.
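
For anyone who wants to script it, steps 1.b through 1.d might look roughly
like the sketch below. This is not a tested recipe: sync_nutch_with_solr is a
hypothetical helper name, and the index location (example/solr/data) and the
schema location (runtime/local/conf/schema.xml) are assumptions about the
default layouts. Stopping and restarting Solr (1.a and 1.e) is left to you.

```shell
#!/bin/sh
# Sketch of steps 1.b-1.d. Run it only while Solr is stopped (step 1.a).
# $1 = Solr distribution directory, $2 = Nutch directory (both hypothetical).
sync_nutch_with_solr() {
    solr_home=$1
    nutch_home=$2

    # 1.b) Delete the existing Solr index.
    #      Assumption: the example index lives in example/solr/data.
    rm -rf "$solr_home/example/solr/data"

    # 1.c) Copy the Nutch-provided schema.xml into Solr's conf directory.
    #      Assumption: Nutch keeps it in runtime/local/conf/.
    cp "$nutch_home/runtime/local/conf/schema.xml" \
       "$solr_home/example/solr/conf/schema.xml"

    # 1.d) Replace Nutch's SolrJ jar with the one from Solr's dist/,
    #      dropping the "apache-" prefix as in the mapping above.
    rm -f "$nutch_home"/runtime/local/lib/solr-solrj-*.jar
    for jar in "$solr_home"/dist/apache-solr-solrj-*.jar; do
        [ -e "$jar" ] || continue   # no matching jar: nothing to copy
        name=$(basename "$jar")
        cp "$jar" "$nutch_home/runtime/local/lib/${name#apache-}"
    done
}
```

You would call it as "sync_nutch_with_solr /path/to/solr /path/to/nutch" and
then restart Solr (1.e).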

2) The HTTP 400 Error "undefined field text" issue.

This appears to be the same as
https://issues.apache.org/jira/browse/SOLR-3416. The Solr log output is here:
http://pastebin.com/YWdPnXpv and the Nutch-provided schema is here:
http://pastebin.com/LQDDKC5B

The only way I got this working was to move Solr from version 3.6.0 back to 
version 3.1.0.

I'm *totally* new to Solr/Nutch, but could this be a version mismatch between
the two?
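
One rough way to check for a mismatch might be to compare the version numbers
embedded in the two SolrJ jar names. The sketch below assumes the jars follow
the "...solrj-<version>.jar" naming shown in step 1.d; solrj_version and
same_solrj_version are hypothetical helper names, not part of Nutch or Solr.

```shell
#!/bin/sh
# Print the version embedded in a SolrJ jar file name.
# Assumption: the name ends in "solrj-<version>.jar".
solrj_version() {
    basename "$1" .jar | sed 's/^.*solrj-//'
}

# Succeed (exit status 0) only if both jar names carry the same version.
same_solrj_version() {
    [ "$(solrj_version "$1")" = "$(solrj_version "$2")" ]
}
```

For example, "solrj_version solr/dist/apache-solr-solrj-3.6.0.jar" prints
3.6.0, and same_solrj_version fails when Nutch's lib/ still holds a jar from a
different release.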


Regards,
--mike

Michael Erickson
[email protected]

