Actually, I do get something in the Hadoop log:

java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/update. Reason:
<pre>    Not Found</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>

</body>
</html>

        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr: Expected mime type application/octet-stream but got text/html. <html>

Searching online turns up suggestions that /solr/update is the wrong URL and
that it needs to include the nutch core. But where does that URL need to be
configured? I don't believe it's on the command line.
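
A small Java sketch of the URL arithmetic involved, assuming (as the 404 on /solr/update suggests) that the indexer posts to the configured base URL with "/update" appended. The quoted command passes that base URL via `-D solr.server.url=http://localhost:8983/solr`, which yields exactly /solr/update; a base URL that names a core would yield /solr/<core>/update instead. The class SolrUpdateUrl, its methods, and the core name "nutch" are hypothetical, chosen only for illustration.

```java
import java.net.URI;

/**
 * Hypothetical helper illustrating how an update endpoint is derived from a
 * Solr base URL, assuming the client simply appends "/update" to it.
 */
public class SolrUpdateUrl {

    /** Appends the "/update" handler path to a base URL, dropping a trailing slash first. */
    static String updateUrl(String baseUrl) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return base + "/update";
    }

    /**
     * Rough check: true if the URL path has a segment after "/solr",
     * which is where a core name would appear.
     */
    static boolean namesCore(String baseUrl) {
        String path = URI.create(baseUrl).getPath();   // e.g. "/solr" or "/solr/nutch"
        String[] segments = path.replaceAll("^/+|/+$", "").split("/");
        return segments.length >= 2 && segments[0].equals("solr");
    }

    public static void main(String[] args) {
        String withoutCore = "http://localhost:8983/solr";      // as in the quoted command
        String withCore = "http://localhost:8983/solr/nutch";   // assumption: a core named "nutch"
        System.out.println(updateUrl(withoutCore) + " names a core? " + namesCore(withoutCore));
        System.out.println(updateUrl(withCore) + " names a core? " + namesCore(withCore));
    }
}
```

If that assumption holds, the setting would live on the command line after all, e.g. `-D solr.server.url=http://localhost:8983/solr/nutch`, with "nutch" replaced by whatever core name the Solr admin UI actually lists.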

On Fri, Oct 6, 2017 at 5:28 PM, Sol Lederman <sol.leder...@gmail.com> wrote:

> Hi,
>
> I've got Nutch 1.13 and Solr 5.5.0. When I try to index some documents I
> get an error:
>
> % bin/nutch index -D solr.server.url=http://localhost:8983/solr \
>     crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20170910201610/ \
>     -filter -normalize -deleteGone
>
> Indexing 20/20 documents
> Deleting 0 documents
> Indexing 20/20 documents
> Deleting 0 documents
> Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
>
> I found an article on Stack Overflow that suggested comparing the fields in
> schema.xml. So I compared
> /home/me/apache-solr/solr-5.5.0/server/solr/configsets/nutch/conf/schema.xml
> and
> /home/me/apache-nutch/apache-nutch-1.13/conf/schema.xml
>
> There are no differences in the fields, and there is no further information
> in the Nutch log.
>
> How can I debug this?
>
> Thanks!
> Sol
>
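
The schema comparison described in the quoted message can be made repeatable with a short sketch that collects the name attribute of every <field> element in two schema.xml documents and reports names present in the first but missing from the second. It only looks at <field> elements (not <dynamicField>, <copyField>, or field types), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: compares <field> names between two schema.xml documents. */
public class SchemaFieldDiff {

    /** Collects the name attribute of every <field> element, sorted. */
    static Set<String> fieldNames(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
            // Matches only elements named exactly "field", not "dynamicField" etc.
            NodeList fields = doc.getElementsByTagName("field");
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < fields.getLength(); i++) {
                names.add(((Element) fields.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema XML", e);
        }
    }

    /** Returns field names present in schema a but missing from schema b. */
    static Set<String> missingFrom(String a, String b) {
        Set<String> result = fieldNames(a);
        result.removeAll(fieldNames(b));
        return result;
    }

    public static void main(String[] args) {
        // Tiny inline stand-ins for the Nutch and Solr copies of schema.xml.
        String nutch = "<schema><fields><field name=\"id\"/><field name=\"title\"/></fields></schema>";
        String solr = "<schema><fields><field name=\"id\"/></fields></schema>";
        System.out.println("Only in Nutch schema: " + missingFrom(nutch, solr));
        System.out.println("Only in Solr schema: " + missingFrom(solr, nutch));
    }
}
```

To run it on the two real files, their contents could be read with `Files.readString(Path.of(...))` (Java 11+) and passed in place of the inline strings.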
