I am using Nutch 1.8 with Solr 4.3 and I want to index two custom meta tags
that we have on our site. I have followed the tutorial at
http://wiki.apache.org/nutch/IndexMetatags but I cannot get it to work. If I
run parsechecker, it shows that the fields are being parsed, but if I run
indexchec
That will work, but use nutch.fetchInterval.fixed in case you use an adaptive
fetch scheduler.
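A minimal sketch of the seed entry this suggests. It assumes a seed directory `urls/` and the placeholder URL `http://www.example.com/` (both hypothetical), and that your Nutch version recognises the `nutch.fetchInterval.fixed` key. Nutch seed files take one URL per line, optionally followed by tab-separated `key=value` metadata. The interval is given in seconds, so 24 hours is 86400.

```shell
#!/usr/bin/env bash
# Hypothetical seed directory and placeholder URL; adjust to your crawl.
SEED_DIR="urls"
SEED_URL="http://www.example.com/"
# Re-fetch interval in seconds: 24 h * 60 min * 60 s = 86400.
INTERVAL=$((24 * 60 * 60))

mkdir -p "$SEED_DIR"
# One URL per line; metadata follows as tab-separated key=value pairs.
# nutch.fetchInterval.fixed is meant to keep an adaptive fetch schedule
# from changing the interval (per the advice in this thread).
printf '%s\tnutch.fetchInterval.fixed=%s\n' "$SEED_URL" "$INTERVAL" > "$SEED_DIR/seed.txt"
```

The directory is then injected as usual, e.g. `bin/nutch inject crawl/crawldb urls` in Nutch 1.x (the crawldb path is a placeholder).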
-Original message-
> From: Julien Nioche
> Sent: Friday 23rd May 2014 12:09
> To: user@nutch.apache.org
> Subject: Re: Re-crawl every 24 hours
>
> Hi
>
> This will work with 1.8 indeed. Wh
Hi Julien,
Could you please guide me on what a re-crawling script should look like? I run
the following steps (even after adding the fetch.interval parameter), but the
crawler goes deeper and deeper.
1) ./nutch inject /url
2) Loop {
  ./nutch generate -topN 2000
  ./nutch fetch [CrawlID]
  ./nutch parse [CrawlID]
./nutch generated
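The loop above can be sketched as a shell function. This assumes Nutch 2.x (where the commands take `-crawlId`) and the `generate`/`fetch`/`parse`/`updatedb` options as we recall them from their usage messages; check each command's usage output for your version before relying on it. `NUTCH`, `recrawl_round`, and the argument values are hypothetical; setting `NUTCH=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Path to the Nutch launcher (hypothetical default; override via env).
NUTCH="${NUTCH:-bin/nutch}"

# recrawl_round: hypothetical helper running one generate/fetch/parse/
# updatedb cycle for a given crawl ID; stops at the first failing step.
recrawl_round() {
  local crawl_id="$1" top_n="$2"
  # Select up to top_n URLs that are due for (re-)fetching.
  "$NUTCH" generate -topN "$top_n" -crawlId "$crawl_id" || return 1
  # Fetch and parse all generated batches.
  "$NUTCH" fetch -all -crawlId "$crawl_id" || return 1
  "$NUTCH" parse -all -crawlId "$crawl_id" || return 1
  # Write fetch status, next fetch times and new outlinks back.
  "$NUTCH" updatedb -crawlId "$crawl_id" || return 1
}
```

For example, after a one-time `bin/nutch inject urls -crawlId mycrawl`, a scheduled job could call `recrawl_round mycrawl 2000`. `generate` only selects URLs whose next fetch time has passed; newly discovered outlinks added by `updatedb` are due immediately, which is why each round can reach deeper pages.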
Hi,
Could anyone point me to documentation on how to pull in (fetch) data
from a database (e.g. a common RDBMS such as MySQL) with Nutch?
The rest of the process would be the usual Nutch steps: parse and index.
Thanks in advance.
--
wassalam,
[bayu]
Hi
This will work with 1.8 indeed. What procedure do you mean? Just add
nutch.fetchInterval to the seeds, that's all.
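A minimal sketch of "adding nutch.fetchInterval to the seeds" for an existing list of plain URLs. The helper name `add_fetch_interval` and the file names in the usage note are hypothetical; the only assumption about Nutch is the seed-file convention of tab-separated `key=value` metadata after each URL, with the interval in seconds.

```shell
#!/usr/bin/env bash
# add_fetch_interval: hypothetical helper that reads plain seed URLs
# (one per line) on stdin and writes them back with a per-URL
# nutch.fetchInterval (in seconds) appended as tab-separated metadata.
add_fetch_interval() {
  local interval="$1"
  local url
  # The extra test also handles a last line without a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and comment lines starting with '#'.
    case "$url" in
      ''|'#'*) continue ;;
    esac
    printf '%s\tnutch.fetchInterval=%s\n' "$url" "$interval"
  done
}
```

Usage, for a 24-hour interval: `add_fetch_interval 86400 < urls/plain.txt > urls/seed.txt`.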
J.
On 23 May 2014 10:13, Ali Nazemian wrote:
> Dear Julien,
> Hi,
> Do you know of any step-by-step guide for this procedure? Is it the same for
> Nutch 1.8?
> Best regards.
>
Dear Julien,
Hi,
Do you know of any step-by-step guide for this procedure? Is it the same for
Nutch 1.8?
Best regards.
On Wed, May 21, 2014 at 6:43 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
>
> db.fetch.interval.default
> 1800
> The default number of seconds between re-fetc
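The quoted `db.fetch.interval.default` value is in seconds, so 1800 means 30 minutes. It sets the default for every URL, whereas seed metadata sets it per URL; overrides normally go in `conf/nutch-site.xml`. A minimal sketch that prints such an override for a given number of hours; the helper name `fetch_interval_property` is hypothetical.

```shell
#!/usr/bin/env bash
# fetch_interval_property: hypothetical helper that prints a
# <property> element overriding db.fetch.interval.default, to be
# pasted inside <configuration> in conf/nutch-site.xml.
fetch_interval_property() {
  local hours="$1"
  # The property is expressed in seconds.
  local seconds=$((hours * 3600))
  cat <<EOF
<property>
  <name>db.fetch.interval.default</name>
  <value>${seconds}</value>
</property>
EOF
}
```

For a daily re-crawl, `fetch_interval_property 24` prints a property with the value 86400.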
Thanks, Sebastian, for taking the trouble!
In http://wiki.apache.org/nutch/Nutch2Crawling, just before the Fetch
procedure, it says:
Cons: Scoring is not used for selection. Domains (hosts) at the start of a
region (mapper input) have the highest chance of being selected.
I guess that the first line i