Hi,

I am only crawling a single URL, so no whole-web crawling. I followed option 2, and fetching and invertlinks both ran successfully. This is just Nutch 1.x.

Next I wanted to index into Apache Solr, so I went to the section "Setup Solr for search". The first thing that does not work:

    cd ${APACHE_SOLR_HOME}/example
    java -jar start.jar

There is no start.jar at the specified location. That is not a problem, though: you start Solr 6.6.0 with bin/solr start.

Then the tutorial says to back up the original Solr example schema.xml:

    mv ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.org

But in the current Solr, 6.6.0, there is no schema.xml file anywhere in the distribution. What should I do here?

If I go directly to running the Solr index command from ${NUTCH_RUNTIME_HOME}:

    bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/

(which may not make sense, since I have skipped some steps), it crashes:

    The input path at segments is not a segment... skipping
    Indexer: java.lang.RuntimeException: Missing elastic.cluster and elastic.host. At least one of them should be set in nutch-site.xml
    ElasticIndexWriter
        elastic.cluster : elastic prefix cluster
        elastic.host : hostname
        elastic.port : port

Clearly some configuration is missing from nutch-site.xml: apart from http.agent.name (which the tutorial mentions), other fields need to be set. The segments message above is also troubling.

If you follow the steps (assuming they worked), should we run

    bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/

(the last step in "Integrate Solr with Nutch") and then

    bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone

(one of the steps in "Using Individual Commands for Whole-Web Crawling", which in fact is also the section to read if you are only crawling a single URL)?

This is what I found by following the tutorial at https://wiki.apache.org/nutch/NutchTutorial

On 7/9/17, lewis john mcgibbney <lewi...@apache.org> wrote:
> Hi Pau,
>
> On Sat, Jul 8, 2017 at 6:52 AM, <user-digest-h...@nutch.apache.org> wrote:
>
>> From: Pau Paches <sp.exstream.t...@gmail.com>
>> To: user@nutch.apache.org
>> Cc:
>> Bcc:
>> Date: Sat, 8 Jul 2017 15:52:46 +0200
>> Subject: nutch 1.x tutorial with solr 6.6.0
>> Hi,
>> I have run the Nutch 1.x Tutorial with Solr 6.6.0.
>> Many things do not work,
>
>
> What does not work? Can you elaborate?
>
>
>> there is a mismatch between the assumed Solr
>> version and the current Solr version.
>>
>
> We support Solr as an indexing backend in the broadest sense possible. We
> do not aim to support the latest and greatest Solr version available. If
> you are interested in upgrading to a particular version, if you could open
> a JIRA issue and provide a pull request it would be excellent.
>
>
>> I have seen some messages about the same problem for Solr 4.x
>> Is this the right path to go or should I move to Nutch 2.x?
>
>
> If you are new to Nutch, I would highly advise that you stick with 1.X
>
>
>> Does it
>> make sense to use Solr 6.6 with Nutch 1.x?
>
>
> Yes... you _may_ have a few configuration options to tweak but there have
> been no backwards incompatibility issues so I see no reason for anything to
> be broken.
>
>
>> If yes, I'm willing to
>> amend the tutorial if someone helps.
>>
>>
> What is broken? Can you elaborate?
>
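
A note on the "is not a segment... skipping" message reported above: it may come from passing the parent directory crawl/segments/ where the indexer expects an individual segment directory, as in the second command, which names crawl/segments/20131108063838/. A minimal sketch of picking the newest segment before indexing, assuming segments are timestamp-named directories as in the tutorial. The helper names latest_segment and print_index_cmd are hypothetical, and the command is only printed, not run:

```shell
#!/bin/sh
# Print the newest segment directory under a segments root.
# Assumption: segment names are timestamps (e.g. 20131108063838),
# so lexical order matches chronological order.
latest_segment() {
    root=${1%/}                          # drop a trailing slash, if any
    ls -d "$root"/2* 2>/dev/null | sort | tail -n 1
}

# Print (do not run) the index command for the newest segment.
# The argument order copies the tutorial command quoted above; the
# Solr URL is whatever your own setup uses.
print_index_cmd() {
    segment=$(latest_segment "$1")
    if [ -z "$segment" ]; then
        echo "no segment found under $1" >&2
        return 1
    fi
    echo "bin/nutch index $2 crawl/crawldb/ -linkdb crawl/linkdb/ $segment -filter -normalize -deleteGone"
}
```

Calling print_index_cmd crawl/segments http://localhost:8983/solr from ${NUTCH_RUNTIME_HOME} would print the command with the most recent segment filled in. This addresses only the segments message, not the ElasticIndexWriter error.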