and try to apply those to scoring-similarity.
>
> > Can somebody guide me on this?
>
> There is currently no Nutch committer actively working on 2.x - just
> compare the commit history on
> the master and 2.x branches.
>
> Sebastian
>
>
>
> On 6/28/19 12:46 PM,
Hi all,
I am using Nutch 2.3.1 with HBase-1.2.3 as the storage backend on top of a
Hadoop-2.5.2 cluster in *deploy mode*, with crawled data being indexed to
Solr-6.5.1.
I want to add *focused crawling capabilities to Nutch 2.3.1* similar to
the one provided by the *scoring-similarity plugin for Nutch 1.x*.
Can
once
I find time to delve deeper into this issue.
-Gajanan
On Fri, Oct 19, 2018 at 12:54 AM Lewis John McGibbney
wrote:
> Hi Gahanna,
> Response inline
>
> On 2018/10/12 07:40:50, Gajanan Watkar wrote:
> > Hi all,
> > I am using Nutch 2.3.1 with Hbase-1.2.3 as storag
Hi all,
I am using Nutch 2.3.1 with HBase-1.2.3 as the storage backend on top of a
Hadoop-2.5.2 cluster in *deploy mode*, with crawled data being indexed to
Solr-6.5.1.
I want to use the *webapp* for creating, controlling and monitoring crawl jobs
in deploy mode.
With Hadoop cluster, Hbase and nutchserver st
12:19 AM
> wrote:
>
> >
> >
> > From: Gajanan Watkar
> > To: user@nutch.apache.org
> > Cc:
> > Bcc:
> > Date: Wed, 10 Oct 2018 17:19:24 +0530
> > Subject: Re: Unable to get regex-urlfilter working
> > I am using Nutch 2.x with HBase as backend storage.
> >
> > *-Gajanan*
> >
>
I am using Nutch 2.x with HBase as backend storage.
*-Gajanan*
On Wed, Oct 10, 2018 at 5:17 PM Gajanan Watkar
wrote:
> Hi all,
>
> *1. Want to filter all URLs like:*
>
> http://14538.diarynote.jp/items/music-jp/B5FMG1/
> http://12899diarynote.jp/amp/20150316
Hi all,
*1. Want to filter all URLs like:*
http://14538.diarynote.jp/items/music-jp/B5FMG1/
http://12899diarynote.jp/amp/201503160602121325/
http://15131513marudiarynote.jp/amp/201603181431397340/
http://11621diarynote.jp/amp/20040906174131/
http://14291.diarynote.jp/items/dvd-jp/B00016Z
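
For reference, a minimal sketch of how an exclusion rule for hosts like the ones listed above might behave. It assumes the first-match-wins `+`/`-` rule convention of regex-urlfilter.txt; the pattern, the `RULES` list and the `accepts` helper are illustrative assumptions, not taken from the thread, and Python's `re` stands in for Java's regex engine here.

```python
import re

# Hypothetical rules in the style of regex-urlfilter.txt:
# "-" rejects a matching URL, "+" accepts it, and the first match decides.
RULES = [
    # Assumed pattern: any host that ends in "diarynote.jp".
    ("-", re.compile(r"^https?://[^/]*diarynote\.jp/")),
    # Catch-all: accept everything else.
    ("+", re.compile(r".")),
]


def accepts(url):
    """Return True if the URL passes the rules, False if it is filtered out."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    # No rule matched: treat the URL as rejected.
    return False


if __name__ == "__main__":
    for url in [
        "http://14538.diarynote.jp/items/music-jp/B5FMG1/",
        "http://12899diarynote.jp/amp/201503160602121325/",
        "http://nutch.apache.org/",
    ]:
        print(url, "accepted" if accepts(url) else "rejected")
```

Checking a candidate pattern against sample URLs in this way, before adding it to the filter file, helps confirm that it is neither too narrow nor too broad.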
se correct the
> record.
> BTW, I found the following article written by Elis, to be extremely useful
> https://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
>
> On Wed, Sep 19, 2018 at 3:55 AM wrote:
>
> > From: Gajanan Watkar
> > To: user@nutch.apach
patch for
MalformedURLException.
I am getting uneven region sizes. Can you advise on pre-splitting the
webpage table, i.e. the split points and splitting policy to use, and on the
optimum GC setup for the regionserver for efficient Nutch crawling?
-Gajanan
On Sun, Sep 9, 2018 at 8:34 AM Gajanan Watkar
wrote:
use the 2.x codebase, you should
> use the most recent from SCM e.g. check out master and change to 2.x
> branch.
> Finally, for now at least, you didn't mention the phase at which the crawl
> is failing. Can you provide this?
>
> On Thu, Sep 6, 2018 at 8:58 AM wrote:
>
I am running Nutch-2.3.1 over Hadoop-2.5.2 and HBase-1.2.3 with
integration to Solr-6.5.1. I have crawled over 10 million pages, but
while doing all this I am continuously facing two problems:
1. My NodeManager is crashing repeatedly during different phases of the
crawl. It crashes my Linux session an