Hi,
What exactly does this plugin do? I haven't seen it mentioned and the
README.txt doesn't really describe it.
Thanks,
Otis
----- Original Message -----
From: [EMAIL PROTECTED]
To: nutch-commits@lucene.apache.org
Sent: Sunday, June 4, 2006 3:44:23 PM
Subject: [Nutch-cvs] svn commit: r411594 -
The idea to have
something like this as a nutch-module (dropping pages or ranking them
very low) might come up :-)
That is still a long way off, though.
I am collecting some thoughts and a list of web-spam-related papers on my
blog.
http://www.find23.net/Web-Site/blog/521BA1CD-14C4-4E84-A072-F98E13CAEFE
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]
Chris A. Mattmann closed NUTCH-258:
---
Won't fix: issue describes intended behavior of system (fetcher component).
> Once Nutch logs a SEVERE log item, Nutch fails forevermore
> --
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]
Chris A. Mattmann resolved NUTCH-258:
-
Resolution: Won't Fix
The use of LOG.severe in the fetcher indicates an unrecoverable error: thus,
this issue is not a bug, and in fact describes
Stefan Groschupf wrote:
>
> an interesting tool:
> http://tool.motoricerca.info/spam-detector/
Do you have good/bad experience with that tool? The idea to have
something like this as a nutch-module (dropping pages or ranking them
very low) might come up :-)
From the FAQ I read that the author i
Hi,
an interesting tool:
http://tool.motoricerca.info/spam-detector/
Stefan
[
http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414653 ]
Stefan Neufeind commented on NUTCH-294:
---
I'm not sure. On a quick run I wasn't able to get the
"clustering-carrot2"-plugin to work - though I thought I simply need to inc
[ http://issues.apache.org/jira/browse/NUTCH-298?page=all ]
Stefan Groschupf updated NUTCH-298:
---
Summary: if a 404 for a robots.txt is returned a NPE is thrown (was: if a
404 for a robots.txt is returned no page is fetched at all from the host)
Sorry
[
http://issues.apache.org/jira/browse/NUTCH-299?page=comments#action_12414648 ]
Hasan Diwan commented on NUTCH-299:
---
Extracts and indexes meta-data. Doesn't follow the URL to the tracker. I would
add that if I have the time, or maybe someone else can.
>
[
http://issues.apache.org/jira/browse/NUTCH-298?page=comments#action_12414647 ]
Stefan Neufeind commented on NUTCH-298:
---
Is the description-line of this bug correct? I've been indexing pages without
robots.txt, and I just checked that those hosts gi
[
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414646 ]
Stefan Neufeind commented on NUTCH-258:
---
Agreed. The root cause of the loop should be identified. So I'd suggest
turning this into a won't-fix bug - and if it occurs agai
[
http://issues.apache.org/jira/browse/NUTCH-299?page=comments#action_12414643 ]
Stefan Neufeind commented on NUTCH-299:
---
Could you briefly explain what it does? Extract meta-data and index the comment
as "content of that page"? Or does it also follow
12 matches