Re[2]: robot exclusion portional of a document

2006-05-18 Thread Eugen Kochuev
Hello juan, Thursday, May 18, 2006, 10:18:36 AM, you wrote: I don't think that such a use of an HTML meta tag is a good idea; it would lead to invalid HTML. The Google AdSense bot uses HTML comments (if present) to determine which content to use for targeting. Nutch could use the same approach.
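A comment-based scheme like the one Eugen describes could be applied as a pre-processing step that drops the marked regions before the page text reaches the indexer. The following is a minimal sketch. It assumes a hypothetical marker pair `<!-- noindex -->` ... `<!-- /noindex -->` (not defined by Nutch or by any standard) and markers that are well-formed and not nested. A real Nutch parse filter would more likely work on the parsed DOM than on raw HTML with a regular expression.

```python
import re

# Hypothetical marker pair: Nutch does not define these, they only
# illustrate the comment-based exclusion approach.
START = "<!-- noindex -->"
END = "<!-- /noindex -->"

# Non-greedy match with DOTALL so several marked sections, possibly
# spanning multiple lines, are each removed independently.
_SECTION = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def strip_excluded(html: str) -> str:
    """Return html with every START...END section (markers included) removed."""
    return _SECTION.sub("", html)


if __name__ == "__main__":
    page = ("<p>Article text</p><!-- noindex --><ul><li>Home</li></ul>"
            "<!-- /noindex --><p>More</p>")
    print(strip_excluded(page))  # -> <p>Article text</p><p>More</p>
```

Because the markers are ordinary comments, browsers ignore them and the page remains valid HTML, which addresses Eugen's objection to a custom tag.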

Re: robot exclusion portional of a document

2006-05-18 Thread juan_barbancho_rsi
Hello, I'd like to propose an idea. You could use a special tag, like meta, in the body. This tag is not shown in an HTML browser and does not need an HTML comment. HELLO HELLO NO INDEX "Nutch N
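juan's proposal amounts to a text extractor that ignores everything inside a special element. Below is a minimal sketch using Python's standard `html.parser`. It assumes a hypothetical, non-standard `<noindex>` element, so (as Eugen points out) the page would no longer validate as HTML; the lenient parser simply reports the unknown element like any other tag.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <noindex>...</noindex>.

    <noindex> is a hypothetical, non-standard element used only to
    illustrate the proposal; it is not part of HTML or Nutch.
    """

    def __init__(self, skip_tag: str = "noindex"):
        super().__init__()
        self.skip_tag = skip_tag
        self.depth = 0                 # nesting depth of open skip_tag elements
        self.chunks: list[str] = []    # text kept for indexing, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase
        if tag == self.skip_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.skip_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only when we are outside every skipped region
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the indexable text of html, joined with single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(extract_text("<p>HELLO</p><noindex>NO INDEX</noindex><p>HELLO</p>"))
    # -> HELLO HELLO
```

The depth counter lets nested `<noindex>` regions close correctly, so text is only collected once every skipped region has ended.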

Re: robot exclusion portional of a document

2006-05-17 Thread Nutch Newbie
On 5/16/06, Alexander E Genaud <[EMAIL PROTECTED]> wrote: Hello, As far as I understand, /robots.txt designates which files may and may not be indexed by Nutch and other crawlers. However, is there a method by which a site may exclude only sections of a document? The benefit is most evident i

Re: robot exclusion portional of a document

2006-05-17 Thread Alexander E Genaud
Thanks for getting back to me, Jérôme. Would you suggest I jump into the Tokenizer? Would we need to differentiate indexing, summaries, and/or anchors (as Google claims to do)? Should I target 0.7.2 or 0.8-dev? As an aside, perhaps we should add the modified-date field (as NutchWax and others do). Ale
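If indexing, summaries, and anchors do need to be treated separately, one possible data model attaches an independent flag for each use to every text segment, so the same page yields different text for each consumer. This is a hypothetical sketch: `Segment` and `text_for` are illustrative names, not part of Nutch's Tokenizer or any Nutch API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A run of page text plus hypothetical per-use exclusion flags."""
    text: str
    index: bool = True     # include in the searchable index
    summary: bool = True   # eligible for hit summaries (snippets)
    anchor: bool = True    # links here may contribute anchor text


USES = ("index", "summary", "anchor")


def text_for(segments: list[Segment], use: str) -> str:
    """Join the text of every segment whose flag for `use` is set."""
    if use not in USES:
        raise ValueError(f"unknown use: {use!r}")
    return " ".join(s.text for s in segments if getattr(s, use))


if __name__ == "__main__":
    page = [
        Segment("Main article."),
        Segment("Site menu", index=False, summary=False),
        Segment("Related links", summary=False),
    ]
    for use in USES:
        print(use, "->", text_for(page, use))
```

Keeping the flags independent means a navigation block can, for example, stay out of snippets while its links still feed anchor text.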

Re: robot exclusion portional of a document

2006-05-16 Thread Jérôme Charron
As far as I understand, /robots.txt designates which files may and may not be indexed by Nutch and other crawlers. However, is there a method by which a site may exclude only sections of a document? Some methods I've seen include: If there is no such feature and this is deemed useful, I would

robot exclusion portional of a document

2006-05-16 Thread Alexander E Genaud
Hello, As far as I understand, /robots.txt designates which files may and may not be indexed by Nutch and other crawlers. However, is there a method by which a site may exclude only sections of a document? The benefit is most evident in the search hit result description (snippets), which will o
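The limitation Alexander describes follows from how robots.txt rules work: each rule matches a URL path, so the finest granularity it can express is a whole document. A small illustration with Python's standard-library `urllib.robotparser`; the site `example.com` and the `/private/` rule are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: rules match URL path prefixes only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


rp = make_parser(ROBOTS_TXT)
print(rp.can_fetch("Nutch", "http://example.com/private/page.html"))  # -> False
print(rp.can_fetch("Nutch", "http://example.com/article.html"))       # -> True
```

The answer is a yes or no per URL; nothing in the format can address a part of a page, which is why the thread looks at in-page markup instead.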