I think newer Solr versions start in "schemaless" mode by default. You need to create a config directory containing ALL the necessary configuration files, such as the schema and solrconfig.xml, BEFORE creating the collection, and then run a command that creates the collection from this conf directory. I don't have access to my Nutch setup at the moment, so I can't check, but this was explained in the Solr docs.
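From memory, the steps above look roughly like the sketch below. Treat it as untested against a real install. The assumptions are mine: the configset path server/solr/configsets/basic_configs/conf (Solr 6.x layout), the Nutch schema at conf/schema.xml under the Nutch runtime directory, the collection name "nutch", and the helper name prepare_nutch_conf. As far as I understand, when managed-schema is missing Solr falls back to reading schema.xml, which is why the sketch deletes it. The Nutch schema may still need tweaks for Solr 6.6.

```shell
#!/bin/sh
# Sketch: build a conf directory for a Nutch collection BEFORE creating it.
# Assumed paths (not verified against a live setup):
#   $1 = Solr home   -> $1/server/solr/configsets/basic_configs/conf
#   $2 = Nutch home  -> $2/conf/schema.xml
#   $3 = destination conf directory to create

prepare_nutch_conf() {
    solr_home=$1
    nutch_home=$2
    dest=$3

    src="$solr_home/server/solr/configsets/basic_configs/conf"
    [ -d "$src" ] || { echo "no configset at $src" >&2; return 1; }
    [ -f "$nutch_home/conf/schema.xml" ] || { echo "no Nutch schema.xml" >&2; return 1; }

    # Start from a stock configset so solrconfig.xml and friends are present
    mkdir -p "$dest" || return 1
    cp -R "$src/." "$dest/" || return 1

    # Remove the stock managed-schema and drop in the Nutch schema instead
    rm -f "$dest/managed-schema"
    cp "$nutch_home/conf/schema.xml" "$dest/schema.xml" || return 1

    # Print (do not run) the command that creates the collection from this dir
    echo "$solr_home/bin/solr create -c nutch -d $dest"
}
```

Calling prepare_nutch_conf "$APACHE_SOLR_HOME" "$NUTCH_RUNTIME_HOME" /tmp/nutch-conf prints the bin/solr create command to run once Solr has been started with bin/solr start.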
On Tue, Jul 11, 2017 at 12:58 PM, Yossi Tamari <yossi.tam...@pipl.com> wrote:

> I struggled with this as well. Eventually I moved to ElasticSearch, which
> is much easier.
>
> What I did manage to find out, is that in newer versions of SOLR you need
> to use ZooKeeper to update the conf file. see
> https://stackoverflow.com/a/43351358.
>
> -----Original Message-----
> From: Pau Paches [mailto:sp.exstream.t...@gmail.com]
> Sent: 11 July 2017 13:29
> To: user@nutch.apache.org
> Subject: Re: nutch 1.x tutorial with solr 6.6.0
>
> Hi,
> I just crawl a single URL so no whole web crawling.
> So I do option 2, fetching, invertlinks successfully. This is just Nutch
> 1.x Then I do Indexing into Apache Solr so go to section Setup Solr for
> search.
> First thing that does not work:
> cd ${APACHE_SOLR_HOME}/example
> java -jar start.jar
> No start.jar at the specified location, but no problem you start Solr
> 6.6.0 with bin/solr start.
> Then the tutorial says:
> Backup the original Solr example schema.xml:
> mv ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml
> ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.org
>
> But in current Solr, 6.6.0, there is no schema.xml file. In the whole
> distribution. What should I do here?
> if I go directly to run the Solr Index command from ${NUTCH_RUNTIME_HOME}:
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> crawl/linkdb crawl/segments/ which may not make sense since I have skipped
> some steps, it crashes:
> The input path at segments is not a segment... skipping
> Indexer: java.lang.RuntimeException: Missing elastic.cluster and
> elastic.host. At least one of them should be set in nutch-site.xml
> ElasticIndexWriter
> elastic.cluster : elastic prefix cluster
> elastic.host : hostname
> elastic.port : port
>
> Clearly there is some missing configuration in nutch-site.xml, apart from
> setting http.agent.name in nutch-site.xml (mentioned) other fields need
> to be set up. The segments message above is also troubling.
>
> If you follow the steps (if they worked) should we run bin/nutch solrindex
> http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb
> crawl/segments/ (this is the last step in Integrate Solr with Nutch) and
> then
>
> bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb
> crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone
> (this is one of the steps of Using Individual Commands for Whole-Web
> Crawling, which in fact also is the section to read if you are only
> crawling a URL.
>
> This is what I found by following the tutorial at
> https://wiki.apache.org/nutch/NutchTutorial
>
> On 7/9/17, lewis john mcgibbney <lewi...@apache.org> wrote:
> > Hi Pau,
> >
> > On Sat, Jul 8, 2017 at 6:52 AM, <user-digest-h...@nutch.apache.org> wrote:
> >
> >> From: Pau Paches <sp.exstream.t...@gmail.com>
> >> To: user@nutch.apache.org
> >> Cc:
> >> Bcc:
> >> Date: Sat, 8 Jul 2017 15:52:46 +0200
> >> Subject: nutch 1.x tutorial with solr 6.6.0
> >>
> >> Hi, I have run the Nutch 1.x Tutorial with Solr 6.6.0.
> >> Many things do not work,
> >
> > What does not work? Can you elaborate?
> >
> >> there is a mismatch between the assumed Solr
> >> version and the current Solr version.
> >
> > We support Solr as an indexing backend in the broadest sense possible. We
> > do not aim to support the latest and greatest Solr version available. If
> > you are interested in upgrading to a particular version, if you could open
> > a JIRA issue and provide a pull request it would be excellent.
> >
> >> I have seen some messages about the same problem for Solr 4.x
> >> Is this the right path to go or should I move to Nutch 2.x?
> >
> > If you are new to Nutch, I would highly advise that you stick with 1.X
> >
> >> Does it
> >> make sense to use Solr 6.6 with Nutch 1.x?
> >
> > Yes... you _may_ have a few configuration options to tweak but there have
> > been no backwards incompatibility issues so I see no reason for anything to
> > be broken.
> >
> >> If yes, I'm willing to
> >> amend the tutorial if someone helps.
> >
> > What is broken? Can you elaborate?