Hi Yossi and BlackIce, many thanks for your tips. However, a tutorial needs to be self-contained, or at least link to the documentation/tutorial on how to configure the parts it uses.
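
A rough sketch of the procedure BlackIce describes in the quoted message (build the config directory first, then create the collection from it). It assumes that Solr 6.6.0 ships a basic_configs configset under server/solr/configsets and that Nutch's schema.xml lives in ${NUTCH_RUNTIME_HOME}/conf; neither detail comes from this thread. The function name prepare_configset and the core name nutch are made up for illustration. It only prepares the directory and prints the bin/solr create command instead of running it:

```shell
#!/bin/sh
# Sketch only: the paths, the "nutch" core name, and the basic_configs
# template are assumptions, not something confirmed in this thread.

# prepare_configset SOLR_HOME NUTCH_HOME [NAME]
# Copies a stock configset, swaps in Nutch's schema.xml, and prints the
# command that would create the core. It never starts or contacts Solr.
prepare_configset() {
    solr_home=$1
    nutch_home=$2
    name=${3:-nutch}
    template="$solr_home/server/solr/configsets/basic_configs"
    target="$solr_home/server/solr/configsets/$name"

    # Fail early rather than half-building a config directory.
    [ -d "$template" ] || { echo "no template configset at $template" >&2; return 1; }
    [ -f "$nutch_home/conf/schema.xml" ] || { echo "no schema.xml under $nutch_home/conf" >&2; return 1; }
    [ ! -e "$target" ] || { echo "$target already exists" >&2; return 1; }

    cp -r "$template" "$target" || return 1
    cp "$nutch_home/conf/schema.xml" "$target/conf/schema.xml" || return 1
    # Remove any managed-schema copied from the template so that Solr
    # does not pick it up instead of schema.xml (assumed behaviour).
    rm -f "$target/conf/managed-schema"

    # Print, do not run: the collection is created only after the
    # config directory is complete.
    echo "$solr_home/bin/solr create -c $name -d $target/conf"
}
```

Usage would be something like prepare_configset "${APACHE_SOLR_HOME}" "${NUTCH_RUNTIME_HOME}", followed by running the printed command once Solr is up via bin/solr start.
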

On Tue, Jul 11, 2017 at 1:39 PM BlackIce <blackice...@gmail.com> wrote:

> I think by default the newer SOLR starts in "schemaless" mode. One needs to
> create a config directory with ALL necessary configuration files like
> schema and solrconfig.xml BEFORE creating the collection and then run a
> command to create this collection using this conf directory. I don't have
> access to my nutch set-up at this moment, so I can't check, but this was
> explained in the SOLR docs.
>
> On Tue, Jul 11, 2017 at 12:58 PM, Yossi Tamari <yossi.tam...@pipl.com>
> wrote:
>
> > I struggled with this as well. Eventually I moved to ElasticSearch, which
> > is much easier.
> >
> > What I did manage to find out, is that in newer versions of SOLR you need
> > to use ZooKeeper to update the conf file. See
> > https://stackoverflow.com/a/43351358.
> >
> > -----Original Message-----
> > From: Pau Paches [mailto:sp.exstream.t...@gmail.com]
> > Sent: 11 July 2017 13:29
> > To: user@nutch.apache.org
> > Subject: Re: nutch 1.x tutorial with solr 6.6.0
> >
> > Hi,
> > I just crawl a single URL so no whole web crawling.
> > So I do option 2, fetching, invertlinks successfully. This is just Nutch
> > 1.x. Then I do Indexing into Apache Solr so go to section Setup Solr for
> > search.
> > First thing that does not work:
> > cd ${APACHE_SOLR_HOME}/example
> > java -jar start.jar
> > No start.jar at the specified location, but no problem you start Solr
> > 6.6.0 with bin/solr start.
> > Then the tutorial says:
> > Backup the original Solr example schema.xml:
> > mv ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml
> > ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.org
> >
> > But in current Solr, 6.6.0, there is no schema.xml file. In the whole
> > distribution. What should I do here?
> > If I go directly to run the Solr Index command from
> > ${NUTCH_RUNTIME_HOME}:
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> > crawl/linkdb crawl/segments/
> > which may not make sense since I have skipped some steps, it crashes:
> > The input path at segments is not a segment... skipping
> > Indexer: java.lang.RuntimeException: Missing elastic.cluster and
> > elastic.host. At least one of them should be set in nutch-site.xml
> > ElasticIndexWriter
> > elastic.cluster : elastic prefix cluster
> > elastic.host : hostname
> > elastic.port : port
> >
> > Clearly there is some missing configuration in nutch-site.xml, apart from
> > setting http.agent.name in nutch-site.xml (mentioned) other fields need
> > to be set up. The segments message above is also troubling.
> >
> > If you follow the steps (if they worked) should we run
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> > crawl/linkdb crawl/segments/
> > (this is the last step in Integrate Solr with Nutch) and then
> >
> > bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb
> > crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize
> > -deleteGone
> > (this is one of the steps of Using Individual Commands for Whole-Web
> > Crawling, which in fact also is the section to read if you are only
> > crawling a URL).
> >
> > This is what I found by following the tutorial at
> > https://wiki.apache.org/nutch/NutchTutorial
> >
> > On 7/9/17, lewis john mcgibbney <lewi...@apache.org> wrote:
> > > Hi Pau,
> > >
> > > On Sat, Jul 8, 2017 at 6:52 AM, <user-digest-h...@nutch.apache.org>
> > > wrote:
> > >
> > >> From: Pau Paches <sp.exstream.t...@gmail.com>
> > >> To: user@nutch.apache.org
> > >> Cc:
> > >> Bcc:
> > >> Date: Sat, 8 Jul 2017 15:52:46 +0200
> > >> Subject: nutch 1.x tutorial with solr 6.6.0
> > >>
> > >> Hi, I have run the Nutch 1.x Tutorial with Solr 6.6.0.
> > >> Many things do not work,
> > >
> > >
> > > What does not work? Can you elaborate?
> > >
> > >
> > >> there is a mismatch between the assumed Solr
> > >> version and the current Solr version.
> > >>
> > >
> > > We support Solr as an indexing backend in the broadest sense possible.
> > > We do not aim to support the latest and greatest Solr version
> > > available. If you are interested in upgrading to a particular version,
> > > if you could open a JIRA issue and provide a pull request it would be
> > > excellent.
> > >
> > >
> > >> I have seen some messages about the same problem for Solr 4.x
> > >> Is this the right path to go or should I move to Nutch 2.x?
> > >
> > >
> > > If you are new to Nutch, I would highly advise that you stick with 1.X
> > >
> > >
> > >> Does it
> > >> make sense to use Solr 6.6 with Nutch 1.x?
> > >
> > >
> > > Yes... you _may_ have a few configuration options to tweak but there
> > > have been no backwards incompatibility issues so I see no reason for
> > > anything to be broken.
> > >
> > >
> > >> If yes, I'm willing to
> > >> amend the tutorial if someone helps.
> > >>
> > >>
> > > What is broken? Can you elaborate?