Re: Nutch-1.3 + Solr 3.3.0 = fail

John R. Brinkema Wed, 24 Aug 2011 09:53:03 -0700

Markus,

What do you mean by "update the schema version"? Nutch's or Solr's? And are we talking about simple copies or line-by-line merges? And what about the schema copy specified in the RunningNutchAndSolr tutorial?


This sounds like the answer; I just don't know enough to do it. Thanks.

On 8/8/2011 8:04 PM, Markus Jelsma wrote:

3.3 will work perfectly as there are no changes to the javabin format.
However, one should update the schema version to reflect recent changes in
branch 3.4-dev. It's likely this branch version is released earlier than Nutch
1.4 that should be compatible with the most recent stable Solr release.

Glad it worked for you on Solr 3.2. I did try Nutch 1.3 and Solr 3.3,
however I did not update my blog yet with Solr 3.3. ;-)

have fun!

On Mon, Aug 8, 2011 at 1:57 PM, John R. Brinkema <[email protected]> wrote:

On 8/2/2011 11:21 PM, Way Cool wrote:

Try changing uniqueKey from id to url in schema.xml, as below, and
restart Solr:
<uniqueKey>url</uniqueKey>

If that still does not work, it means you have an empty url. We
can fix that.
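
To see what a given schema.xml declares before and after a change like this, a small script can help. The following is only a sketch: the function name summarize_schema and the cut-down sample schema are made up for illustration, and it reports only the uniqueKey, the schema's version attribute, and fields explicitly marked required="true".

```python
import xml.etree.ElementTree as ET


def summarize_schema(xml_text):
    """Return the version attribute, uniqueKey, and explicitly required fields."""
    root = ET.fromstring(xml_text)
    unique_key = root.findtext("uniqueKey")
    required = [
        field.get("name")
        for field in root.iter("field")
        if field.get("required") == "true"
    ]
    return {
        "version": root.get("version"),
        "unique_key": unique_key.strip() if unique_key else None,
        "required": required,
    }


# Hypothetical sample, not the schema shipped with Nutch or Solr.
SAMPLE = """<schema name="example" version="1.3">
  <fields>
    <field name="id" type="string" stored="true" indexed="true" required="true"/>
    <field name="url" type="string" stored="true" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>"""

if __name__ == "__main__":
    print(summarize_schema(SAMPLE))
```

To run it against a real file, read the Solr conf/schema.xml into a string and pass that to summarize_schema instead of SAMPLE.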


On Mon, Aug 1, 2011 at 12:45 PM, John R. Brinkema <[email protected]> wrote:

Friends,

I am having the worst time getting nutch and solr to play together
nicely.

I downloaded and installed the current binaries for both nutch and
solr.  I edited the nutch-site.xml file to include:

<property>
<name>http.agent.name</name>
<value>Solr/Nutch Search</value>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(text|html|tika)|index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
<name>http.content.limit</name>
<value>65536</value>
</property>
<property>
<name>searcher.dir</name>
<value>/opt/SolrSearch</value>
</property>


I installed them and tested them according to each of their respective
tutorials; in other words I believe each is working, separately.  I
crawled a url and the 'readdb -stats' report shows that I have
successfully collected some links.  Most of the links are to '.pdf' files.

I followed the instructions to link nutch and solr; e.g. copy the nutch
schema to become the solr schema.

When I run the bin/nutch solrindex ... command I get the following
error:

java.io.IOException: Job failed!

When I look in the log/hadoop.log file I see:

2011-08-01 13:10:00,086 INFO  solr.SolrMappingReader - source: content dest: content
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: site dest: site
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: title dest: title
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: host dest: host
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: segment dest: segment
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: boost dest: boost
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: digest dest: digest
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: id
2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: url
2011-08-01 13:10:00,537 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Document [null] missing required field: id

Document [null] missing required field: id

request: http://localhost:8983/solr/update?wt=javabin&version=2

        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

The same error appears in the solr log.

I have tried the 'sync solrj libraries' fix; that is, I copied
apache-solr-solrj-3.3.0.jar from the solr lib to the nutch lib with no
effect.  Since I am running binaries, I, of course, did not run ant
job.  Is that the magic?

Any suggestions?

Update from the trenches ....

I followed Way Cool's suggestion (now called Dr. Cool since he has been
so helpful) of using Nutch 1.3 and Solr 3.2 ... which worked just fine.

I am off using this pair until I get a breather and then try Nutch 1.3
and Solr 3.3 again, this time with Dr. Cool's latest suggestion.

Thanks to all.  /jb
