Hi,
I am only crawling a single URL, not doing whole-web crawling.
So I followed option 2 and ran fetching and invertlinks successfully. This is plain Nutch 1.x.
Then I moved on to indexing into Apache Solr, so I went to the section "Setup Solr for search".
First thing that does not work:
cd ${APACHE_SOLR_HOME}/example
java -jar start.jar
There is no start.jar at that location. That part is not a problem, though: Solr 6.6.0 is started with bin/solr start.
Then the tutorial says:
Backup the original Solr example schema.xml:
mv ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml \
   ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.org

But the current Solr release, 6.6.0, has no schema.xml file anywhere in the
distribution. What should I do here?
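For reference, here is an untested sketch of one way to put a Nutch schema in place on Solr 6.6.0. It is written in shell, like the tutorial commands. As far as I know, Solr 6.x configsets keep their schema in a file named managed-schema instead of schema.xml, which would explain why no schema.xml turns up. The sketch assumes the following:
- Solr is already running (bin/solr start).
- SOLR_HOME and NUTCH_RUNTIME_HOME point at the unpacked solr-6.6.0 directory and the Nutch runtime/local directory.
- The Nutch runtime ships a conf/schema.xml.

The core name "nutch" and the helpers run/setup_nutch_core are hypothetical choices. Nutch's schema.xml may also need edits before Solr 6 accepts it.

```shell
#!/usr/bin/env bash
# Untested sketch: create a Solr 6.6.0 core that uses Nutch's schema.xml.
# Assumptions: Solr is already running (bin/solr start); the core name
# "nutch" and the helpers run/setup_nutch_core are hypothetical.
set -euo pipefail

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = Solr install dir, $2 = Nutch runtime dir, $3 = core name (default: nutch)
setup_nutch_core() {
  local solr_home="$1" nutch_home="$2" core="${3:-nutch}"
  local confdir="$solr_home/server/solr/configsets/$core/conf"

  run mkdir -p "$confdir"
  # Start from the basic_configs configset shipped with Solr 6.x.
  run cp -r "$solr_home/server/solr/configsets/basic_configs/conf/." "$confdir/"
  # Use Nutch's classic schema.xml instead of the copied managed-schema.
  run cp "$nutch_home/conf/schema.xml" "$confdir/"
  run rm -f "$confdir/managed-schema"
  # Ask the running Solr instance to create the core from this config dir.
  run "$solr_home/bin/solr" create -c "$core" -d "$confdir"
}
```

Running DRY_RUN=1 setup_nutch_core "$SOLR_HOME" "$NUTCH_RUNTIME_HOME" prints the five commands without executing them. Drop DRY_RUN to actually run them. If this works, the core should then be reachable at http://localhost:8983/solr/nutch.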
If I go straight to running the Solr index command from ${NUTCH_RUNTIME_HOME}:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb \
   crawl/linkdb crawl/segments/
it crashes (which may be expected, since I skipped some steps):
The input path at segments is not a segment... skipping
Indexer: java.lang.RuntimeException: Missing elastic.cluster and elastic.host. At least one of them should be set in nutch-site.xml
ElasticIndexWriter
        elastic.cluster : elastic prefix cluster
        elastic.host : hostname
        elastic.port : port

Clearly some configuration is missing from nutch-site.xml. Besides
http.agent.name, which the tutorial mentions, other properties need to
be set. The segments message above is also troubling.
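The ElasticIndexWriter error suggests that the indexer-elastic plugin is active in plugin.includes. That would explain why Nutch asks for Elasticsearch settings even though the target is Solr.

Below is a minimal, untested sketch. It assumes the fix is to select indexer-solr instead of indexer-elastic. The hypothetical helper write_nutch_site writes a nutch-site.xml containing two properties:
- http.agent.name
- plugin.includes, set to a value that, as far as I recall, mirrors the Nutch 1.x default. Compare it with conf/nutch-default.xml in your runtime before relying on it.

Note that the helper overwrites the target file.

```shell
#!/usr/bin/env bash
# Untested sketch: write a minimal nutch-site.xml that selects the Solr indexer.
# Assumption: the plugin.includes value mirrors the usual Nutch 1.x default,
# with indexer-solr and without indexer-elastic; check it against your own
# conf/nutch-default.xml. write_nutch_site is a hypothetical helper.
set -euo pipefail

# $1 = file to (over)write, $2 = value for http.agent.name
write_nutch_site() {
  local target="$1" agent="$2"
  cat > "$target" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>${agent}</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
EOF
}
```

Usage would be something like write_nutch_site "$NUTCH_RUNTIME_HOME/conf/nutch-site.xml" "MyNutchSpider", where the agent name is a placeholder. Back up any existing nutch-site.xml first.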

If the steps did work, should we run
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb \
   crawl/linkdb crawl/segments/
(this is the last step in "Integrate Solr with Nutch") and then also

bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb \
   crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize \
   -deleteGone
(this is one of the steps of "Using Individual Commands for Whole-Web
Crawling", which in fact is also the section to read if you are only
crawling a single URL)?
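As far as I can tell, bin/nutch solrindex is an older alias for the same indexing job as bin/nutch index, so one index call should be enough. In recent 1.x releases the first positional argument of index is the crawldb, not the Solr URL.

Below is an untested sketch under these assumptions:
- The Solr URL is passed as -Dsolr.server.url=..., a Hadoop generic option, so it has to come before the positional arguments.
- The Solr core is called nutch.
- As far as I know, index accepts -dir crawl/segments to index every segment under that directory. Passing crawl/segments/ where a single segment is expected is a likely cause of the "is not a segment... skipping" message.

The helper nutch_index and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Untested sketch: one Nutch 1.x indexing call covering all segments.
# Assumptions: run from ${NUTCH_RUNTIME_HOME}; the Solr core URL is passed as a
# -D property before the positional args; nutch_index/DRY_RUN are hypothetical.
set -euo pipefail

# $1 = Solr core URL, $2 = crawl directory (default: crawl)
nutch_index() {
  local solr_url="$1" crawl_dir="${2:-crawl}"
  local cmd=(bin/nutch index "-Dsolr.server.url=${solr_url}"
    "${crawl_dir}/crawldb" -linkdb "${crawl_dir}/linkdb"
    -dir "${crawl_dir}/segments"
    -filter -normalize -deleteGone)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

From ${NUTCH_RUNTIME_HOME} this would be called as nutch_index http://localhost:8983/solr/nutch. To index a single segment instead, replace the -dir argument with a path such as crawl/segments/20131108063838/.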

This is what I found by following the tutorial at
https://wiki.apache.org/nutch/NutchTutorial

On 7/9/17, lewis john mcgibbney <lewi...@apache.org> wrote:
> Hi Pau,
>
> On Sat, Jul 8, 2017 at 6:52 AM, <user-digest-h...@nutch.apache.org> wrote:
>
>> From: Pau Paches <sp.exstream.t...@gmail.com>
>> To: user@nutch.apache.org
>> Date: Sat, 8 Jul 2017 15:52:46 +0200
>> Subject: nutch 1.x tutorial with solr 6.6.0
>> Hi,
>> I have run the Nutch 1.x Tutorial with Solr 6.6.0.
>> Many things do not work,
>
>
> What does not work? Can you elaborate?
>
>
>> there is a mismatch between the assumed Solr
>> version and the current Solr version.
>>
>
> We support Solr as an indexing backend in the broadest sense possible. We
> do not aim to support the latest and greatest Solr version available. If
> you are interested in upgrading to a particular version, it would be
> excellent if you could open a JIRA issue and provide a pull request.
>
>
>> I have seen some messages about the same problem for Solr 4.x.
>> Is this the right path to go, or should I move to Nutch 2.x?
>
>
> If you are new to Nutch, I would highly advise that you stick with 1.X
>
>
>> Does it
>> make sense to use Solr 6.6 with Nutch 1.x?
>
>
> Yes... you _may_ have a few configuration options to tweak, but there have
> been no backwards-incompatibility issues, so I see no reason for anything
> to be broken.
>
>
>> If yes, I'm willing to
>> amend the tutorial if someone helps.
>>
>>
> What is broken? Can you elaborate?
>
