date:20060516

Re: [Nutch-general] RE: new location! nutch user meeting San Francisco

2006-05-16 Thread Stefan Groschupf

Hi, no agenda; see: http://www.evite.com/app/publicUrl/[EMAIL PROTECTED]/nutch-1 Stefan

Re: robot exclusion portional of a document

2006-05-16 Thread Jérôme Charron

As far as I understand, /robots.txt designates which files may and may not be indexed by Nutch and other crawlers. However, is there a method by which a site may exclude only sections of a document? Some methods I've seen include: If there is no such feature and this is deemed useful, I would

query term for searching directories of a site?

2006-05-16 Thread Lance Birtcil

Is there any way to limit searches to a particular directory structure in an index using query terms? For instance, let's say that I have a single index created for my intranet site, a.b.c., and the site contains these directories: http://a.b.c/dir1/some/others http://a.b.c/dir2 Is there

Re: Generalte/Fetch/Update - urgent issue?

2006-05-16 Thread Andrzej Bialecki

Lukas Vlcek wrote: Hi, I am using nutch0.8-dev. I have a small shell script for generate/fetch/update cycle. I used generate command with -topN 500. After crawling about 2000 pages I changed -topN to 3 (yes three pages only) to see what pages are crawled. I found that generate/fetch/update cycl

Re: changing ranking

2006-05-16 Thread Andrzej Bialecki

Eugen Kochuev wrote: Hi guys, I have a catalogue of the sites where domains are ranked by human experts. Is it possible to tweak the score of pages belonging to the domains listed in the catalogue according to their catalogue rank? So, I'm interested in the ability to change scores of s

changing ranking

2006-05-16 Thread Eugen Kochuev

Hi guys, I have a catalogue of the sites where domains are ranked by human experts. Is it possible to tweak the score of pages belonging to the domains listed in the catalogue according to their catalogue rank? So, I'm interested in the ability to change scores of some urls. -- Best reg

robot exclusion portional of a document

2006-05-16 Thread Alexander E Genaud

Hello, As far as I understand, /robots.txt designates which files may and may not be indexed by Nutch and other crawlers. However, is there a method by which a site may exclude only sections of a document? The benefit is most evident in the search hit result description (snippets) which will o

7 matches
