Re: Nutch-1.3 + Solr 3.3.0 = fail

Way Cool Mon, 01 Aug 2011 12:45:31 -0700

Did you restart Solr after you copied the schema.xml from nutch to Solr?
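
A quick way to confirm that the copy actually took is to compare the two
schema files byte for byte. The following is only a minimal sketch: the helper
names file_digest and schema_is_synced are hypothetical, and the example paths
in the note below are assumptions about a typical layout, not something from
your setup.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def schema_is_synced(nutch_schema, solr_schema):
    """Return True when the two schema files are byte-identical.

    Both arguments are paths supplied by the caller; nothing here
    knows where Nutch or Solr are installed.
    """
    return file_digest(nutch_schema) == file_digest(solr_schema)
```

You would call it with your own locations, for example
schema_is_synced("nutch/conf/schema.xml", "solr/example/solr/conf/schema.xml")
(both paths assumed). If it returns True and indexing still fails, Solr was
probably started before the copy and needs a restart to load the new schema.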

If you want, you can look at the tutorial I put together, although I did not
use Hadoop. Here are the URLs:
http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html
http://thetechietutorials.blogspot.com/2011/06/setup-apache-nutch-13-to-crawl-web.html


If you want to set up Solr so that you can change how the Solr browse
interface looks for Nutch data, you can look at:
http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
http://thetechietutorials.blogspot.com/2011/07/customized-solr-browser-interface-for.html

Please let me know if it does not work.

Have fun.

On Mon, Aug 1, 2011 at 1:07 PM, Jerry E. Craig, Jr.
<[email protected]> wrote:

> What are you using for your Crawl Command Line?  I remember trying to get
> mine to work and there was a line that wasn't very clear in the Tutorial.
>
> bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
>
> where you had to include where the -solr location was for it to index the
> files.  If they are working separately then I would guess it's somewhere in
> the connection and this was my problem.
>
> Jerry E. Craig, Jr.
>
>
> -----Original Message-----
> From: John R. Brinkema [mailto:[email protected]]
> Sent: Monday, August 01, 2011 11:46 AM
> To: [email protected]
> Subject: Nutch-1.3 + Solr 3.3.0 = fail
>
> Friends,
>
> I am having the worst time getting nutch and solr to play together nicely.
>
> I downloaded and installed the current binaries for both nutch and solr.  I
> edited the nutch-site.xml file to include:
>
> <property>
> <name>http.agent.name</name>
> <value>Solr/Nutch Search</value>
> </property>
> <property>
> <name>plugin.includes</name>
> <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|
> index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|
> urlnormalizer-(pass|regex|basic)</value>
> </property>
> <property>
> <name>http.content.limit</name>
> <value>65536</value>
> </property>
> <property>
> <name>searcher.dir</name>
> <value>/opt/SolrSearch</value>
> </property>
>
>
> I installed them and tested them according to each of their respective
> tutorials; in other words I believe each is working, separately.  I crawled
> a url and the 'readdb -stats' report shows that I have successfully
> collected some links.  Most of the links are to '.pdf' files.
>
> I followed the instructions to link nutch and solr; e.g. copy the nutch
> schema to become the solr schema.
>
> When I run the bin/nutch solrindex ... command I get the following error:
>
> java.io.IOException: Job failed!
>
> When I look in the log/hadoop.log file I see:
>
> 2011-08-01 13:10:00,086 INFO  solr.SolrMappingReader - source: content dest: content
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: site dest: site
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: title dest: title
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: host dest: host
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: segment dest: segment
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: id
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: url
> 2011-08-01 13:10:00,537 WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Document [null] missing required field: id
>
> Document [null] missing required field: id
>
> request: http://localhost:8983/solr/update?wt=javabin&version=2
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
>
> The same error appears in the solr log.
>
> I have tried the 'sync solrj libraries' fix; that is, I copied
> apache-solr-solrj-3.3.0.jar from the solr lib to the nutch lib with no
> effect.  Since I am running binaries, I, of course, did not run ant job.  Is
> that the magic?
>
> Any suggestions?
