On May 10, 2012, at 1:42 AM, Markus Jelsma wrote:
> Hi,
>
> On Thu, 10 May 2012 09:10:04 +0300, Tolga <[email protected]> wrote:
>> Hi,
>>
>> This will sound like a duplicate, but actually it differs from the
>> other one. Please bear with me. Following
>> http://wiki.apache.org/nutch/NutchTutorial, I first issued the command
>>
>> bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
>>
>> Then when I got the message
>>
>> Exception in thread "main" java.io.IOException: Job failed!
>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>> at
>> org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
>> at
>> org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
>> at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>
> Please include the relevant part of the log. This can be a known issue.
>
>>
>> I issued the commands
>>
>> bin/nutch crawl urls -dir crawl -depth 3 -topN 5
>>
>> and
>>
>> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb -linkdb
>> crawldb/linkdb crawldb/segments/*
>>
>> separately, after which I got no errors. When I browsed to
>> http://localhost:8983/solr/admin and attempted a search, I got the
>> error
>>
>>
>> HTTP ERROR 400
>>
>> Problem accessing /solr/select. Reason:
>>
>> undefined field text
>
> But this is a Solr thing, you have no field named text. Resolve this in Solr
> or on the Solr mailing list.
I had similar issues last week when I tried the Nutch tutorial. I asked in the
#Solr IRC channel and got no response. The short answer is that I had to go
back to Solr version 3.1.0 for the instructions in the Nutch tutorial to work.

The longer answer is that following the existing Nutch tutorial gave me two
errors.
1) The SolrDeleteDuplicates exception, as mentioned by Tolga above.
To fix this, I did the following:
1.a) Stop Solr.
1.b) Delete Solr index.
1.c) Copy the Nutch-provided schema.xml into the proper Solr directory
(example/solr/conf/).
1.d) Replace Nutch's solr-solrj-xxx.jar with the appropriate version from Solr:
( solr/dist/apache-solr-solrj-xxx.jar -->
nutch/runtime/local/lib/solr-solrj-xxx.jar )
1.e) Restart Solr.
The first two steps are probably only necessary if Solr has already been
running with the default schema it ships with. That was my situation, because
I had done the Solr tutorial first.
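
For what it's worth, steps 1.b through 1.d can be sketched as a small shell
function. This is only a sketch under assumptions, not a tested recipe for
every install: the function name sync_nutch_solr and its two arguments are
hypothetical, the index location example/solr/data/index and the schema
location conf/schema.xml under nutch/runtime/local are my guesses at a default
layout, and the jar version is matched with a glob because I am not filling in
the "xxx". Stopping and restarting Solr (1.a and 1.e) is left to you, since it
depends on how you started it.

```shell
#!/bin/sh
# Sketch of steps 1.b to 1.d. Stop Solr before calling this (1.a) and
# restart it afterwards (1.e).
# ASSUMPTIONS: $1 is the unpacked Solr distribution, $2 is nutch/runtime/local.
# The directory layout below is a guess at the defaults, so adjust as needed.
sync_nutch_solr() {
    solr_root=$1
    nutch_local=$2

    # Refuse to touch anything unless Solr actually ships a SolrJ jar here.
    for jar in "$solr_root"/dist/apache-solr-solrj-*.jar; do
        [ -f "$jar" ] || { echo "no SolrJ jar in $solr_root/dist" >&2; return 1; }
    done

    # 1.b) Delete the existing Solr index (assumed location).
    rm -rf "$solr_root/example/solr/data/index"

    # 1.c) Copy the Nutch-provided schema.xml into Solr's conf directory.
    cp "$nutch_local/conf/schema.xml" \
       "$solr_root/example/solr/conf/schema.xml" || return 1

    # 1.d) Replace Nutch's SolrJ jar with the version shipped with Solr,
    #      dropping the "apache-" prefix so the name matches Nutch's.
    rm -f "$nutch_local"/lib/solr-solrj-*.jar
    for jar in "$solr_root"/dist/apache-solr-solrj-*.jar; do
        name=$(basename "$jar")
        cp "$jar" "$nutch_local/lib/${name#apache-}" || return 1
    done
}
```

You would call it with your own paths, something like
sync_nutch_solr /path/to/solr /path/to/nutch/runtime/local, where both paths
are placeholders.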
2) The HTTP 400 error "undefined field text".
This appears to be the same as
https://issues.apache.org/jira/browse/SOLR-3416. The Solr log output is here:
http://pastebin.com/YWdPnXpv and the Nutch-provided schema is here:
http://pastebin.com/LQDDKC5B
The only way I got this working was to move Solr from version 3.6.0 back to
version 3.1.0.
I'm *totally* new to Solr/Nutch, but could this be a version mismatch?
Regards,
--mike
Michael Erickson
[email protected]