Hi All,
Note: reposting my question since it looks like the earlier one got posted on the wrong
thread.
We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that
come with Tika boilerpipe support. I am getting the best results for pages where
there are only outlinks with CanolaExtracto
Hi All,
We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that
come with Tika boilerpipe support. I get the best results with CanolaExtractor
for pages that contain only outlinks, such as this one:
https://support.automationdirect.com/faq/dl205.php
But checking
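For reference, here is a minimal sketch of the nutch-site.xml overrides involved. It assumes the `tika.extractor` and `tika.extractor.boilerpipe.algorithm` property names found in nutch-default.xml, so please verify them against the nutch-default.xml shipped with your own Nutch release. The fragment is embedded in a small Java check, `BoilerpipeConfigCheck`, a hypothetical helper that is not part of Nutch. It reads the values back with the JDK's DOM parser so you can confirm the override file says what you expect.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BoilerpipeConfigCheck {

    // Assumed nutch-site.xml overrides: property names and values should be
    // checked against the nutch-default.xml of the Nutch version in use.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property>\n"
        + "    <name>tika.extractor</name>\n"
        + "    <value>boilerpipe</value>\n"
        + "  </property>\n"
        + "  <property>\n"
        + "    <name>tika.extractor.boilerpipe.algorithm</name>\n"
        + "    <value>CanolaExtractor</value>\n"
        + "  </property>\n"
        + "</configuration>\n";

    // Algorithm names we assume Nutch accepts for the boilerpipe extractor.
    static final List<String> ALGORITHMS =
        Arrays.asList("DefaultExtractor", "ArticleExtractor", "CanolaExtractor");

    // Returns the <value> of the <property> with the given <name>, or null if absent.
    static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (n.equals(name)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse config", e);
        }
    }

    // True when boilerpipe is switched on and the algorithm is one of the expected names.
    static boolean usesBoilerpipe(String xml) {
        String algo = readProperty(xml, "tika.extractor.boilerpipe.algorithm");
        return "boilerpipe".equals(readProperty(xml, "tika.extractor"))
            && algo != null && ALGORITHMS.contains(algo);
    }

    public static void main(String[] args) {
        System.out.println(readProperty(SAMPLE, "tika.extractor.boilerpipe.algorithm"));
        System.out.println(usesBoilerpipe(SAMPLE));
    }
}
```

Running `main` should print `CanolaExtractor` followed by `true`. The check only inspects the XML text. Whether Nutch actually applies boilerpipe to a given page may also depend on which parser plugin ends up handling that content type.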
Hi,
We currently use Nutch 1.10 and Solr 4.x, and we are in the process of upgrading both.
I wanted to find out whether the latest version, Nutch 1.13, is compatible with
Solr 6, and whether there is any documentation I can use to upgrade Nutch so that
it works with Solr 6.
Thanks i
On Thu, Aug 18, 2016 at 7:08 AM, wrote:
>
>>
>> From: "Arora, Madhvi"
>> To: "user@nutch.apache.org"
>> Cc:
>> Date: Wed, 17 Aug 2016 13:30:09 +
>> Subject: Upgrade to Nutch 1.12
>> Hi,
>>
>>
>> I wanted to find out how t
Hi,
I wanted to find out how to fix the issue below and would appreciate any
help.
I am trying to upgrade to Nutch 1.12. I am using Solr 5.3.1. The reasons I am
upgrading are:
1: HTTPS crawling
2: Boilerpipe Canola extraction through Tika
The only problem I have run into so far is an IOExcep
ind of related to what I need.
On 8/5/16, 2:18 PM, "Arora, Madhvi" wrote:
>Thank you very much!
>
>On 8/5/16, 2:13 PM, "Markus Jelsma" wrote:
>
>>I am not sure which version it was added in; you'd have to check CHANGES.txt,
>>
Thank you very much!
On 8/5/16, 2:13 PM, "Markus Jelsma" wrote:
>I am not sure which version it was added in; you'd have to check CHANGES.txt, but
>upgrading is usually a good idea and very simple.
>Markus
>
>-Original message-
>> From: Arora, Madhvi
>> Sent: Friday 5th August 201
Markus, so to crawl HTTPS and HTTP URLs successfully, we just need to switch to a
newer version of Nutch, i.e. one higher than Nutch 1.10?
On 8/5/16, 12:47 PM, "Markus Jelsma" wrote:
>Hello - see inline.
>Markus
>
>-Original message-
>> From: Arora, Madhvi
>> Sent: Friday 5th August 2016
Hi,
We are using Nutch 1.10 and Solr 5. We have around 10 different websites that
are crawled regularly. We are changing the protocol of a few of them from HTTP to
HTTPS, so we will have a mixed bag of HTTP and HTTPS sites.
I checked the Nutch user mailing-list archive and understand that we need to change
pr
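One practical side of a mixed HTTP/HTTPS crawl is keeping the seed list in step as individual hosts migrate. Below is a minimal sketch. It assumes a plain seed file with one bare URL per line and a list of hosts you already know have moved. The class `SeedSchemeRewriter` and the example host names are hypothetical. It swaps `http://` for `https://` only for those hosts and leaves every other line alone. It does not touch any Nutch configuration, so fetching HTTPS still depends on your Nutch version and protocol plugin setup.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SeedSchemeRewriter {

    // Hypothetical helper: switches http:// seeds to https:// for hosts listed in
    // httpsHosts (expected in lower case). Assumes one bare URL per line, with no
    // tab-separated seed metadata.
    static List<String> rewrite(List<String> seeds, Set<String> httpsHosts) {
        List<String> out = new ArrayList<>();
        for (String seed : seeds) {
            String s = seed.trim();
            try {
                URI u = new URI(s);
                String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
                // Only rewrite when no explicit port is given: an http port such as
                // :8080 says nothing about which port serves HTTPS.
                if ("http".equalsIgnoreCase(u.getScheme()) && u.getPort() == -1
                        && httpsHosts.contains(host)) {
                    // Keep everything from "://" onward unchanged; swap only the scheme.
                    out.add("https" + s.substring(s.indexOf(':')));
                } else {
                    out.add(s);
                }
            } catch (URISyntaxException e) {
                out.add(s); // leave lines that do not parse as URIs untouched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> migrated = new HashSet<>(Arrays.asList("www.example.com"));
        List<String> seeds = Arrays.asList(
                "http://www.example.com/docs/",
                "http://www.example.org/",
                "https://www.example.com/already-secure");
        rewrite(seeds, migrated).forEach(System.out::println);
    }
}
```

Running `main` prints the first seed as `https://www.example.com/docs/` and the other two unchanged. Rewriting only hosts you list explicitly avoids guessing which sites actually serve HTTPS.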