Hi,
no agenda; see:
http://www.evite.com/app/publicUrl/[EMAIL PROTECTED]/nutch-1
Stefan
As far as I understand, /robots.txt designates which files may and may
not be indexed by Nutch and other crawlers. However, is there a
method by which a site may exclude only sections of a document?
Some methods I've seen include:
If there is no such feature and this is deemed useful, I would
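One approach sometimes seen in the wild is to wrap the parts of a page that should not be indexed in special marker comments (for example the `googleoff`/`googleon` markers some crawlers honor; they are not part of any standard, and Nutch would need a parse plugin to support them). A minimal sketch of stripping such marked sections before indexing, assuming those marker names:

```java
import java.util.regex.Pattern;

public class SectionStripper {
    // Removes everything between <!--googleoff: index--> and
    // <!--googleon: index--> markers, including the markers themselves.
    // The marker names are illustrative, not a Nutch feature.
    private static final Pattern EXCLUDED = Pattern.compile(
        "<!--\\s*googleoff:\\s*index\\s*-->.*?<!--\\s*googleon:\\s*index\\s*-->",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    public static String strip(String html) {
        return EXCLUDED.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "keep <!--googleoff: index-->secret"
                    + "<!--googleon: index--> also keep";
        System.out.println(strip(page));
    }
}
```

A parse-time filter like this would keep the excluded text out of both the index and the hit snippets.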
Is there any way to limit searches to a particular directory structure
in an index using query terms? For instance, let's say that I have a
single index created for my intranet site, a.b.c, and the site contains
these directories:
http://a.b.c/dir1/some/others
http://a.b.c/dir2
Is there
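Nutch can match URL terms at query time via its query-url plugin, but the simplest thing to reason about is a post-filter on the result URLs. A hypothetical sketch (plain Java, not Nutch API; the method names are made up):

```java
import java.util.ArrayList;
import java.util.List;

public class DirFilter {
    // Keeps only hits whose URL falls under the given directory prefix,
    // e.g. prefix = "http://a.b.c/dir1/". Illustrative post-processing only.
    public static List<String> underDir(List<String> hitUrls, String prefix) {
        List<String> kept = new ArrayList<String>();
        for (String url : hitUrls) {
            if (url.startsWith(prefix)) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

Filtering after the search wastes work on large result sets, so a query-time restriction (an indexed URL field) is preferable when available; the sketch just shows the intended semantics.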
Lukas Vlcek wrote:
Hi,
I am using nutch 0.8-dev. I have a small shell script for the
generate/fetch/update cycle. I used the generate command with -topN 500.
After crawling about 2000 pages I changed -topN to 3 (yes, three pages
only) to see which pages are crawled.
I found that the generate/fetch/update cycle
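Conceptually, -topN makes the generate step pick the N highest-scored URLs from the crawl frontier for the next fetch round. A minimal sketch of that selection with a bounded min-heap (not Nutch's actual Generator code; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    // Returns the topN URLs with the highest scores, mimicking what
    // Generator's -topN option does conceptually.
    public static List<String> select(List<String> urls, List<Float> scores,
                                      int topN) {
        // Min-heap of indices keyed by score; evict the lowest-scored
        // entry whenever the heap grows past topN.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(
            (a, b) -> Float.compare(scores.get(a), scores.get(b)));
        for (int i = 0; i < urls.size(); i++) {
            heap.add(i);
            if (heap.size() > topN) {
                heap.poll();
            }
        }
        List<String> picked = new ArrayList<String>();
        for (int i : heap) {
            picked.add(urls.get(i));
        }
        return picked;
    }
}
```

With -topN 3, only the three best-scored unfetched URLs get into each segment, which explains why the crawl narrows so sharply.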
Eugen Kochuev wrote:
Hi guys,
I have a catalogue of sites in which domains are ranked by human
experts. Is it possible to tweak the score of pages belonging to the
domains listed in the catalogue according to their catalogue rank?
So, I'm interested in the ability to change the scores of s
Hi guys,
I have a catalogue of sites in which domains are ranked by human
experts. Is it possible to tweak the score of pages belonging to the
domains listed in the catalogue according to their catalogue rank?
So, I'm interested in the ability to change the scores of some URLs.
--
Best regards,
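One way to think about this: keep a map from domain to a boost factor derived from the human-assigned catalogue rank, and multiply each page's score by its domain's boost. In Nutch such an adjustment would belong in a scoring/indexing plugin; the sketch below is plain Java with made-up names, not Nutch API:

```java
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class CatalogueBoost {
    // Hypothetical sketch: domain -> multiplicative boost taken from
    // the catalogue rank (e.g. rank 1 might map to boost 2.0).
    private final Map<String, Float> domainBoost =
        new HashMap<String, Float>();

    public void setBoost(String domain, float boost) {
        domainBoost.put(domain, boost);
    }

    // Multiplies the base score by the boost for the page's domain;
    // pages from unlisted domains keep their original score.
    public float adjust(String url, float baseScore) throws Exception {
        String host = new URL(url).getHost();
        Float boost = domainBoost.get(host);
        return boost == null ? baseScore : baseScore * boost;
    }
}
```

Applying the boost at index time (rather than query time) keeps searches cheap, at the cost of re-indexing when the catalogue changes.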
Hello,
As far as I understand, /robots.txt designates which files may and may
not be indexed by Nutch and other crawlers. However, is there a
method by which a site may exclude only sections of a document?
The benefit is most evident in the search hit result descriptions
(snippets), which will o