On Mon, Jan 14, 2013 at 6:45 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Markus implemented an extension of the AdaptiveFetchSchedule [0] which
> allows you to specify a configuration file [1] containing the mime-types
> and their inc and dec rates based upon your preference.
Please see below
On Sat, Jan 12, 2013 at 8:48 PM, Bayu Widyasanyata
wrote:
>
> That's tomcat port for Solr.
> Should we activate the proxy setting?
>
Is it already activated in nutch-site.xml? No, I do not think it should be
activated unless you have a proxy running.
>
>
>
> But the strange is t
Hi J,
On Sun, Jan 13, 2013 at 2:14 AM, J. Gobel wrote:
>
> At the moment I am testing if this works.
>
Please keep us updated then.
>
> This is not
> desirable as this means that ALL urls will be fetched daily.
Typically if URLs are dynamically changing, you would want to maintain a
webdb of
hi there,
I am trying to figure out what the best method is to recrawl certain sites.
I am crawling news sites and they update their front page quite often, so I
need to crawl their frontpage/index.php etc. often and have Nutch fetch the
new links + content.
I cannot find an answer to my question i
This should be correct, yes.
If you look at the plugin source you can see the patterns it uses to
extract links.
Also you can check what's in your crawldb using the readdb command.
Hth
Lewis
On Saturday, January 12, 2013, Michael Gang wrote:
> Hi,
>
> So if there is a javascript which actually submit
On Sun, Jan 13, 2013 at 5:50 PM, Markus Jelsma
wrote:
> No, you can plug in another FetchSchedule that supports adjusting the
> interval based on whether a record is modified. See the
> AdaptiveFetchSchedule for an example.
>
Hi,
Thanks for pointing me to that subject, since I'm new to Nutch & Solr
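
For readers new to the idea, here is a rough Python sketch of how an adaptive schedule of this kind might adjust the re-fetch interval: shrink it when a page was modified, grow it when it was not, and keep it within bounds. This is not the Nutch source. The function name, parameter names and default values below are assumptions chosen for illustration, so check nutch-default.xml for the real settings.

```python
# Illustrative sketch only; not the actual AdaptiveFetchSchedule code.
# All names and default values here are assumptions for the example.

def next_interval(interval, modified,
                  inc_rate=0.4, dec_rate=0.2,
                  min_interval=60.0, max_interval=31536000.0):
    """Return the next re-fetch interval in seconds.

    interval -- current interval in seconds
    modified -- True if the page changed since the last fetch
    """
    if modified:
        # Page changed: come back sooner next time.
        interval *= (1.0 - dec_rate)
    else:
        # Page unchanged: wait longer before the next fetch.
        interval *= (1.0 + inc_rate)
    # Clamp so the interval never leaves the configured bounds.
    return max(min_interval, min(interval, max_interval))


if __name__ == "__main__":
    interval = 86400.0  # start from one day
    # A front page that keeps changing is revisited more and more often.
    for fetch in range(1, 6):
        interval = next_interval(interval, modified=True)
        print(f"fetch {fetch}: next interval {interval:.0f} s")
```

With a scheme like this, a frequently changing front page drifts towards the minimum interval while static pages drift towards the maximum, so not every URL has to be fetched daily.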
Hi Feng and Lewis,
Thanks for your replies! I tried a few different settings and finally
found out that increasing "http.content.limit" fixed the problem.
Kaz
2013/1/13 Lewis John Mcgibbney :
> Hi Kaz,
>
> On Sat, Jan 12, 2013 at 1:09 AM, k4200 wrote:
>
>>
>> Here are the questions:
>> 1. How t
-Original message-
> From:Bayu Widyasanyata
> Sent: Sun 13-Jan-2013 07:34
> To: user@nutch.apache.org
> Subject: Re: How segments is created?
>
> On Sun, Jan 13, 2013 at 12:47 PM, Tejas Patil wrote:
>
> >
> > Well, if you know that the front page is updated frequently, set
> > "db.