Hi all,
I'm running Nutch 2.3.1 over a standalone HBase instance with no yarn or
hdfs. This means that the jobs get run through
org.apache.hadoop.mapred.LocalJobRunner which doesn't support killing
mapred tasks. I've set it up so that all of the nutch threads get run in
the same ThreadGroup a
Hi all,
I'm wondering how Nutch 2.3.1 handles links with the rel="canonical"
attribute.
I found this ticket: https://issues.apache.org/jira/browse/NUTCH-710
which is from version 1.1 and doesn't seem to have ever been resolved.
Are all canonical links still just rejected? Are there any plans
you need to set
db.update.purge.404=true
?
Tom
On 15/05/17 20:35, Ben Vachon wrote:
Hi all,
I'm working with Nutch 2.3.1 and I have a problem that I'm hoping the
community can help me with.
A page is fetched successfully and subsequently indexed during the
initial run of a crawle
Hi all,
I'm working with Nutch 2.3.1 and I have a problem that I'm hoping the
community can help me with.
A page is fetched successfully and subsequently indexed during the
initial run of a crawler, but later, the page no longer exists on the
server (404 not found). When I run the crawler agai
Hi all,
It's a requirement for our platform to use the hbase-client-1.1.2 jar
and we can't have multiple versions of hbase-client so I need to get
nutch-2.3.1 to use hbase-client-1.1.2 rather than 0.98.8-hadoop2.
For these tests, I have been pointing nutch at a standalone hbase
running on
Hi Fabio,
I believe there is a property generate.max.distance in nutch-site.xml in
the newest releases that you can use to configure max depth.
On 04/11/2017 06:20 AM, Fabio Ricci wrote:
Hi Sebastian
thank you for your message. That does not help me really…
Yes I knew the output of ./crawl
<https://issues.apache.org/jira/browse/NUTCH-2292>
HTH
Julien
Thanks very much,
Ben V.
On 04/07/2017 09:48 AM, lsroudi abdel wrote:
hi,
i think you should add it in the ivy/ivy.xml and just run ant runtime
On Thu, Apr 6, 2017 at 9:35 PM, Ben Vachon wrote:
Hi all,
I'
Hi all,
I'm working on a project that gets Nutch 2.3.1 from maven and uses it to
set off crawl jobs which are configurable in our own UI and through our
own search platform's properties. To allow specific configuration of
crawlers, I want to use many of the default plugins that come with a
Nutch