It does indeed seem to be caused by some sort of (schema) configuration issue, which we are currently trying to resolve. Although the browse page shows an error, it is still possible to search. Use: /solr/select/?q=content%3Athe
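
For anyone who wants to script that query rather than type it, here is a minimal sketch of how the select URL could be built with the field name and term percent-encoded. The helper name build_select_url is made up for illustration, and the base URL is assumed to be the http://localhost:8983/solr/ instance used in the thread below; adjust both to your setup.

```python
from urllib.parse import urlencode


def build_select_url(base_url, field, term):
    """Return a Solr select URL for a simple field:term query.

    build_select_url is a hypothetical helper, not part of Nutch or Solr.
    """
    # urlencode percent-encodes the colon, so "content:the" becomes "content%3Athe".
    query = urlencode({"q": f"{field}:{term}"})
    return f"{base_url.rstrip('/')}/select/?{query}"


# Assumed base URL: the local Solr instance mentioned in the quoted thread.
url = build_select_url("http://localhost:8983/solr/", "content", "the")
print(url)
```

Querying the content field explicitly avoids the default search field, which is where the "undefined field text" error comes from.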

On Thu, May 10, 2012 at 3:45 PM, Markus Jelsma <[email protected]> wrote:

> On Thursday 10 May 2012 14:35:03 Lewis John Mcgibbney wrote:
> > Hi Michael,
> >
> > As I'm also not using most recent stable Solr distribution (3.6.0), I can only comment (maybe unwisely) that the most recent version of Solr that Nutch supports is maybe 3.4.0 as this is the dependency we pull with ivy. It also looks like Solr and Solrj are released in parallel so maybe try upgrading your solrj dependency if you wish to use Solr 3.6.0...
>
> This should not be a version issue. We happily index from trunk or 1.4 to Solr versions > 3.0. There must be some schema thing or bad Solr request handler defined.
>
> > If the above is correct, then this is why 3.1.0 works fine when you roll back as I would imagine backwards compatibility is always of key importance.
> >
> > I would be pleased to know that the above is not correct and that Nutch is above to index to Solr 3.6.0, however if not then maybe we should upgrade accordingly in trunk.
> >
> > Thanks
> >
> > Lewis
> >
> > On Thu, May 10, 2012 at 1:56 PM, Michael Erickson <[email protected]> wrote:
> > > On May 10, 2012, at 1:42 AM, Markus Jelsma wrote:
> > >> Hi,
> > >>
> > >> On Thu, 10 May 2012 09:10:04 +0300, Tolga <[email protected]> wrote:
> > >>> Hi,
> > >>>
> > >>> This will sound like a duplicate, but actually it differs from the other one. Please bear with me. Following http://wiki.apache.org/nutch/NutchTutorial, I first issued the command
> > >>>
> > >>> bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
> > >>>
> > >>> Then when I got the message
> > >>>
> > >>> Exception in thread "main" java.io.IOException: Job failed!
> > >>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > >>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
> > >>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
> > >>>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
> > >>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> > >>
> > >> Please include the relevant part of the log. This can be a known issue.
> > >>
> > >>> I issued the commands
> > >>>
> > >>> bin/nutch crawl urls -dir crawl -depth 3 -topN 5
> > >>>
> > >>> and
> > >>>
> > >>> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb -linkdb crawldb/linkdb crawldb/segments/*
> > >>>
> > >>> separately, after which I got no errors. When I browsed to http://localhost:8983/solr/admin and attempted a search, I got the error
> > >>>
> > >>> HTTP ERROR 400
> > >>>
> > >>> Problem accessing /solr/select. Reason:
> > >>>
> > >>> undefined field text
> > >>
> > >> But this is a Solr thing, you have no field named text. Resolve this in Solr or on the Solr mailing list.
> > >
> > > I will say that I had similar issues last week when I tried the Nutch tutorial. I went to the #Solr IRC channel and got no response. The quick answer was that I had to go back to Solr version 3.1.0 for the instructions in the Nutch tutorial to work.
> > >
> > > The longer answer is that following the existing Nutch tutorial gave me two errors.
> > >
> > > 1) SolrDeleteDuplicates exception as mentioned by Tolga above.
> > >
> > > To fix this I:
> > >
> > > 1.a) Stop Solr.
> > > 1.b) Delete Solr index.
> > > 1.c) Copy the Nutch-provided schema.xml into the proper Solr directory (example/solr/conf/).
> > > 1.d) Replace Nutch's solr-solrj-xxx.jar with the appropriate version from Solr: ( solr/dist/apache-solr-solrj-xxx.jar --> nutch/runtime/local/lib/solr-solrj-xxx.jar )
> > > 1.e) Restart Solr.
> > >
> > > The first two steps may only be necessary if you had Solr running already using the default schema that they provided as I did because I had done the Solr tutorial first.
> > >
> > > 2) The HTTP 400 Error "undefined field text" issue.
> > >
> > > This appears to be the same as: https://issues.apache.org/jira/browse/SOLR-3416. Log output from Solr output is here: http://pastebin.com/YWdPnXpv and the Nutch provided schema is here: http://pastebin.com/LQDDKC5B
> > >
> > > The only way I got this working was to move Solr from version 3.6.0 back to version 3.1.0.
> > >
> > > I'm *totally* new to Solr/Nutch, but I might suggest a versioning mismatch?
> > >
> > > Regards,
> > > --mike
> > >
> > > Michael Erickson
> > > [email protected]
>
> --
> Markus Jelsma - CTO - Openindex

