I think newer Solr versions start in "schemaless" mode by default. One needs
to create a config directory with ALL the necessary configuration files, such
as the schema and solrconfig.xml, BEFORE creating the collection, and then run
a command to create the collection from that conf directory. I don't have
access to my Nutch set-up at the moment, so I can't check, but this is
explained in the Solr docs.
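
From memory, the steps would look roughly like the sketch below. Treat it as
untested against a real install: the collection name "nutch", the fallback
paths, and the helper function name are my own assumptions, and it assumes
the basic_configs configset that ships with Solr 6.x. The schema.xml from
Nutch's conf directory may also need adjusting before Solr 6.6 accepts it.

```shell
#!/bin/sh
# Sketch only. The fallback paths and the collection name "nutch" are assumptions.
SOLR_HOME="${APACHE_SOLR_HOME:-/opt/solr}"
NUTCH_HOME="${NUTCH_RUNTIME_HOME:-/opt/nutch/runtime/local}"
CONF_SET="$SOLR_HOME/server/solr/configsets/nutch"

# Hypothetical helper: build a config directory BEFORE creating the collection.
#   $1 = Solr install dir, $2 = Nutch runtime dir, $3 = target configset dir
prepare_nutch_configset() {
    mkdir -p "$3" &&
    # Start from a stock configset so solrconfig.xml and friends are present.
    cp -r "$1/server/solr/configsets/basic_configs/conf" "$3/" &&
    # Use the schema shipped with Nutch instead of the stock one.
    cp "$2/conf/schema.xml" "$3/conf/schema.xml" &&
    # Drop the stock managed-schema so the Nutch schema.xml is the one picked up.
    rm -f "$3/conf/managed-schema"
}

# Only touch Solr if it is actually installed at the assumed location.
if [ -x "$SOLR_HOME/bin/solr" ]; then
    prepare_nutch_configset "$SOLR_HOME" "$NUTCH_HOME" "$CONF_SET" &&
    # -c names the collection/core, -d points at the prepared conf directory.
    "$SOLR_HOME/bin/solr" create -c nutch -d "$CONF_SET/conf"
fi
```

If that works, the Solr URL passed to Nutch would presumably need to end in
the collection name, e.g. http://localhost:8983/solr/nutch.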

On Tue, Jul 11, 2017 at 12:58 PM, Yossi Tamari <yossi.tam...@pipl.com>
wrote:

> I struggled with this as well. Eventually I moved to ElasticSearch, which
> is much easier.
>
> What I did manage to find out, is that in newer versions of SOLR you need
> to use ZooKeeper to update the conf file. see
> https://stackoverflow.com/a/43351358.
>
> -----Original Message-----
> From: Pau Paches [mailto:sp.exstream.t...@gmail.com]
> Sent: 11 July 2017 13:29
> To: user@nutch.apache.org
> Subject: Re: nutch 1.x tutorial with solr 6.6.0
>
> Hi,
> I am only crawling a single URL, so no whole-web crawling.
> So I do option 2, fetching, invertlinks successfully. This is just Nutch
> 1.x. Then I do Indexing into Apache Solr, so I go to the section Setup Solr
> for search.
> First thing that does not work:
> cd ${APACHE_SOLR_HOME}/example
> java -jar start.jar
> No start.jar at the specified location, but no problem you start Solr
> 6.6.0 with bin/solr start.
> Then the tutorial says:
> Backup the original Solr example schema.xml:
> mv ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml
> ${APACHE_SOLR_HOME}/example/solr/collection1/conf/schema.xml.org
>
> But in current Solr, 6.6.0, there is no schema.xml file. In the whole
> distribution. What should I do here?
> if I go directly to run the Solr Index command from ${NUTCH_RUNTIME_HOME}:
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> crawl/linkdb crawl/segments/ which may not make sense since I have skipped
> some steps, it crashes:
> The input path at segments is not a segment... skipping
> Indexer: java.lang.RuntimeException: Missing elastic.cluster and
> elastic.host. At least one of them should be set in nutch-site.xml
> ElasticIndexWriter
>         elastic.cluster : elastic prefix cluster
>         elastic.host : hostname
>         elastic.port : port
>
> Clearly there is some missing configuration in nutch-site.xml, apart from
> setting http.agent.name in nutch-site.xml (mentioned) other fields need
> to be set up. The segments message above is also troubling.
>
> If you follow the steps (assuming they worked), should we run
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> crawl/linkdb crawl/segments/ (this is the last step in Integrate Solr with
> Nutch) and then
>
> bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb
> crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone
> (this is one of the steps of Using Individual Commands for Whole-Web
> Crawling, which is in fact also the section to read if you are only
> crawling a single URL)?
>
> This is what I found by following the tutorial at
> https://wiki.apache.org/nutch/NutchTutorial
>
> On 7/9/17, lewis john mcgibbney <lewi...@apache.org> wrote:
> > Hi Pau,
> >
> > On Sat, Jul 8, 2017 at 6:52 AM, <user-digest-h...@nutch.apache.org> wrote:
> >
> >> From: Pau Paches <sp.exstream.t...@gmail.com>
> >> To: user@nutch.apache.org
> >> Cc:
> >> Bcc:
> >> Date: Sat, 8 Jul 2017 15:52:46 +0200
> >> Subject: nutch 1.x tutorial with solr 6.6.0
> >>
> >> Hi, I have run the Nutch 1.x Tutorial with Solr 6.6.0.
> >> Many things do not work,
> >
> >
> > What does not work? Can you elaborate?
> >
> >
> >> there is a mismatch between the assumed Solr
> >> version and the current Solr version.
> >>
> >
> > We support Solr as an indexing backend in the broadest sense possible. We
> > do not aim to support the latest and greatest Solr version available. If
> > you are interested in upgrading to a particular version, if you could
> > open a JIRA issue and provide a pull request it would be excellent.
> >
> >
> >> I have seen some messages about the same problem for Solr 4.x
> >> Is this the right path to go or should I move to Nutch 2.x?
> >
> >
> > If you are new to Nutch, I would highly advise that you stick with 1.X
> >
> >
> >> Does it
> >> make sense to use Solr 6.6 with Nutch 1.x?
> >
> >
> > Yes... you _may_ have a few configuration options to tweak but there have
> > been no backwards incompatibility issues so I see no reason for anything
> > to be broken.
> >
> >
> >> If yes, I'm willing to
> >> amend the tutorial if someone helps.
> >>
> >>
> > What is broken? Can you elaborate?
> >
>
>
