Hi Folks,
Just wondering if someone could add to the svn:ignore property for Nutch
the files:
.classpath
.project
I happen to use eclipse to do Nutch development and always ignore these
files in my other eclipse projects as well.
Cheers,
Chris
__
Gang,
At the risk of incurring cross-posting ire (and based on a suggestion
from Stefan), I'm posting this to nutch-dev as well:
We're now running into "No node available for block "
errors, which are killing our MapReduce-based crawling jobs. I did
some digging through our logs after one of
[ http://issues.apache.org/jira/browse/NUTCH-196?page=all ]
Jerome Charron updated NUTCH-196:
-
Attachment: NUTCH-196.lib-log4j.patch
My two cents with this patch that:
* provides a lib-log4j plugin (based on log4j 1.2.11)
* removes log4j jars from pars
[ http://issues.apache.org/jira/browse/NUTCH-149?page=all ]
Chris A. Mattmann closed NUTCH-149:
---
Closed at request of reporter: not a bug
> outlinks not shown properly in cached.jsp
> -
>
> Key: NUTCH-
[ http://issues.apache.org/jira/browse/NUTCH-149?page=all ]
Chris A. Mattmann resolved NUTCH-149:
-
Resolution: Invalid
Closed at request of the reporter: not a bug.
> outlinks not shown properly in cached.jsp
> -
[
http://issues.apache.org/jira/browse/NUTCH-158?page=comments#action_12365483 ]
raghavendra prabhu commented on NUTCH-158:
--
This is an important thing
We should automatically be able to insert the links parsed out of site map into
webdb
But curr
Hi, Mike,
On Tue, Feb 07, 2006 at 10:18:11AM -0800, Michael Cafarella wrote:
>
> John,
>
> This is a pretty awesome idea. Do you have any performance
> numbers or experience with it you can share?
No numbers yet. Just created it for my immediate use of browsing
and moving around files. It u
Andrew McNabb wrote:
Now that Hadoop is branched off, and since I'm more interested in Hadoop
than in web indexing, I was just curious whether or not there are plans
to branch off a hadoop-dev mailing list. Anything to reduce email. :)
http://lucene.apache.org/hadoop/mailing_lists.html
Doug
unsubscribe
[
http://issues.apache.org/jira/browse/NUTCH-207?page=comments#action_12365462 ]
Rod Taylor commented on NUTCH-207:
--
Code was by Radu Mateescu with additional kibitzing by myself.
> Bandwidth target for fetcher rather than a thread count
>
Bandwidth target for fetcher rather than a thread count
---
Key: NUTCH-207
URL: http://issues.apache.org/jira/browse/NUTCH-207
Project: Nutch
Type: New Feature
Components: fetcher
Versions: 0.8-dev
R
[ http://issues.apache.org/jira/browse/NUTCH-207?page=all ]
Rod Taylor updated NUTCH-207:
-
Attachment: ratelimit.patch
> Bandwidth target for fetcher rather than a thread count
> ---
>
> Key: NUTCH
[
http://issues.apache.org/jira/browse/NUTCH-205?page=comments#action_12365434 ]
Andrzej Bialecki commented on NUTCH-205:
-
This is a design choice, not a bug. The errors you see are due to improper
configuration - some threads cannot access the hos
unsubscribe
search server throws InstantiationException
---
Key: NUTCH-206
URL: http://issues.apache.org/jira/browse/NUTCH-206
Project: Nutch
Type: Bug
Components: searcher
Versions: 0.8-dev
Environment: windows 2003
cygwin
Now that Hadoop is branched off, and since I'm more interested in Hadoop
than in web indexing, I was just curious whether or not there are plans
to branch off a hadoop-dev mailing list. Anything to reduce email. :)
--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1
[
http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365458 ]
Mike Cafarella commented on NUTCH-193:
--
It should be noted that the name "Nutch" also comes from one of Doug's
children.
They seem to have a proud future in advertising
[
http://issues.apache.org/jira/browse/NUTCH-205?page=comments#action_12365446 ]
M.Oliver Scheele commented on NUTCH-205:
Thanks for the comment.
I'm using the standard properties in my configuration (which shouldn't be
improper by default;)):
fetcher.
John,
This is a pretty awesome idea. Do you have any performance
numbers or experience with it you can share?
--Mike
On Thu, 2006-02-02 at 23:19, John X wrote:
> On Sat, Jan 21, 2006 at 09:23:01AM -0800, John X wrote:
> > Hi, Sami,
> >
> > On Sat, Jan 21, 2006 at 05:32:37PM +0200, Sami
Hi Bryan,
On Thu, 2006-02-02 at 12:06, Bryan A. Pendleton wrote:
>
> 1) If you fill up the space of a datanode, it appears to fail with the wrong
> exception and reload. This, combined with the currently simple
> block-allocation method (random), means that one "full" node can cause a big
> dr
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12365450 ]
Doug Cutting commented on NUTCH-192:
Sorry, I misspoke and overstated things too. There are problems, but not with
MapWritable, rather with WritableName: this refers to so
Wrong 'fetch date' for non available pages
--
Key: NUTCH-205
URL: http://issues.apache.org/jira/browse/NUTCH-205
Project: Nutch
Type: Bug
Components: fetcher
Versions: 0.7, 0.7.1
Environment: JDK 1.4.2_09 / Windows
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12365413 ]
Andrzej Bialecki commented on NUTCH-192:
-
I have a different opinion on this (I think MapWritable is a sufficiently
general-purpose data structure that would be useful
[ http://issues.apache.org/jira/browse/NUTCH-81?page=all ]
Michael Nebel updated NUTCH-81:
---
Attachment: fix-faq-url.diff
With the move from sf to apache the old faq isn't accessible any more. This
patch changes the link from http://www.nutch.org/faq.html t