Hi Yossi and BlackIce,
many thanks for your tips. However, a tutorial needs to be self-contained,
or at least link to the documentation/tutorial on how to configure the
parts it uses.
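
For the archive, here is a rough sketch of how I read BlackIce's suggestion below: prepare a conf directory first, then create the collection from it. I have not verified this end to end. The install paths, the collection name "nutch", and the choice of basic_configs as a starting point are my assumptions, not something the tutorial states. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only. The install paths, the configset name "nutch" and the use of
# basic_configs as a starting point are assumptions, not tutorial text.
SOLR_HOME="${APACHE_SOLR_HOME:-/opt/solr-6.6.0}"
NUTCH_HOME="${NUTCH_RUNTIME_HOME:-/opt/nutch/runtime/local}"
CONFIGSET="$SOLR_HOME/server/solr/configsets/nutch"

# Print each command; execute it only when DRY_RUN=0 is set.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# 1. Start from a stock configset shipped with Solr 6.x.
run cp -r "$SOLR_HOME/server/solr/configsets/basic_configs" "$CONFIGSET"
# 2. Drop in the schema.xml that ships with Nutch 1.x.
run cp "$NUTCH_HOME/conf/schema.xml" "$CONFIGSET/conf/schema.xml"
# 3. Remove the managed schema so that, as far as I understand, Solr falls
#    back to reading schema.xml.
run rm -f "$CONFIGSET/conf/managed-schema"
# 4. Start Solr, then create the collection/core from that conf directory.
run "$SOLR_HOME/bin/solr" start
run "$SOLR_HOME/bin/solr" create -c nutch -d "$CONFIGSET/conf"
```

If something like this is right, the tutorial could simply list these steps in place of the old schema.xml backup step.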


On Tue, Jul 11, 2017 at 1:39 PM BlackIce <blackice...@gmail.com> wrote:

> I think the newer SOLR starts in "schemaless" mode by default. One needs to
> create a config directory with ALL the necessary configuration files, such as
> the schema and solrconfig.xml, BEFORE creating the collection, and then run a
> command to create the collection from that conf directory. I don't have
> access to my Nutch set-up at the moment, so I can't check, but this was
> explained in the SOLR docs.
>
> On Tue, Jul 11, 2017 at 12:58 PM, Yossi Tamari <yossi.tam...@pipl.com>
> wrote:
>
> > I struggled with this as well. Eventually I moved to ElasticSearch, which
> > is much easier.
> >
> > What I did manage to find out, is that in newer versions of SOLR you need
> > to use ZooKeeper to update the conf file. See
> > https://stackoverflow.com/a/43351358.
> >
> > -----Original Message-----
> > From: Pau Paches [mailto:sp.exstream.t...@gmail.com]
> > Sent: 11 July 2017 13:29
> > To: user@nutch.apache.org
> > Subject: Re: nutch 1.x tutorial with solr 6.6.0
> >
> > Hi,
> > I am only crawling a single URL, so no whole-web crawling.
> > So I did option 2, fetching and invertlinks, successfully. That part is
> > just Nutch 1.x. Then I moved on to indexing into Apache Solr, i.e. the
> > section "Setup Solr for search".
> > First thing that does not work:
> > cd ${APACHE_SOLR_HOME}/example
> > java -jar start.jar
> > There is no start.jar at the specified location, but that is no problem:
> > you start Solr 6.6.0 with bin/solr start.
> > Then the tutorial says:
> > Backup the original Solr example schema.xml:
> > mv ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml
> > ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.org
> >
> > But in the current Solr, 6.6.0, there is no schema.xml file anywhere in
> > the distribution. What should I do here?
> > If I go directly to running the Solr index command from
> > ${NUTCH_RUNTIME_HOME}:
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> > crawl/linkdb crawl/segments/
> > (which may not make sense, since I have skipped some steps), it crashes:
> > The input path at segments is not a segment... skipping
> > Indexer: java.lang.RuntimeException: Missing elastic.cluster and
> > elastic.host. At least one of them should be set in nutch-site.xml
> > ElasticIndexWriter
> >         elastic.cluster : elastic prefix cluster
> >         elastic.host : hostname
> >         elastic.port : port
> >
> > Clearly some configuration is missing in nutch-site.xml: apart from
> > setting http.agent.name (which is mentioned), other fields need
> > to be set up. The segments message above is also troubling.
> >
> > If you follow the steps (assuming they worked), should we run
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> > crawl/linkdb crawl/segments/
> > (this is the last step in "Integrate Solr with Nutch") and then
> >
> > bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb
> > crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone
> >
> > (this is one of the steps in "Using Individual Commands for Whole-Web
> > Crawling", which is in fact also the section to read if you are only
> > crawling a single URL)?
> >
> > This is what I found by following the tutorial at
> > https://wiki.apache.org/nutch/NutchTutorial
> >
> > On 7/9/17, lewis john mcgibbney <lewi...@apache.org> wrote:
> > > Hi Pau,
> > >
> > > On Sat, Jul 8, 2017 at 6:52 AM, <user-digest-h...@nutch.apache.org> wrote:
> > >
> > >> From: Pau Paches <sp.exstream.t...@gmail.com>
> > >> To: user@nutch.apache.org
> > >> Cc:
> > >> Bcc:
> > >> Date: Sat, 8 Jul 2017 15:52:46 +0200
> > >> Subject: nutch 1.x tutorial with solr 6.6.0
> > >>
> > >> Hi, I have run the Nutch 1.x Tutorial with Solr 6.6.0.
> > >> Many things do not work,
> > >
> > >
> > > What does not work? Can you elaborate?
> > >
> > >
> > >> there is a mismatch between the assumed Solr
> > >> version and the current Solr version.
> > >>
> > >
> > > We support Solr as an indexing backend in the broadest sense possible. We
> > > do not aim to support the latest and greatest Solr version available. If
> > > you are interested in upgrading to a particular version, if you could
> > > open a JIRA issue and provide a pull request it would be excellent.
> > >
> > >
> > >> I have seen some messages about the same problem for Solr 4.x
> > >> Is this the right path to go or should I move to Nutch 2.x?
> > >
> > >
> > > If you are new to Nutch, I would highly advise that you stick with 1.X
> > >
> > >
> > >> Does it
> > >> make sense to use Solr 6.6 with Nutch 1.x?
> > >
> > >
> > > Yes... you _may_ have a few configuration options to tweak but there have
> > > been no backwards incompatibility issues so I see no reason for anything
> > > to be broken.
> > >
> > >
> > >> If yes, I'm willing to
> > >> amend the tutorial if someone helps.
> > >>
> > >>
> > > What is broken? Can you elaborate?
> > >
> >
> >
>
