[ http://issues.apache.org/jira/browse/NUTCH-54?page=all ]
Andrzej Bialecki updated NUTCH-54:
---
Attachment: 20050518.patch
new-plugins.zip
Updated patch. Fixed problems with redirection handling, improved support for
JavaScript and for
Sami Siren wrote:
should we introduce a new package for these: NutchConfigurable,
NutchConfigured and the upcoming action classes -
I've added these in util in the mapred branch and will use them as I
rewrite tools to use MapReduce. I'll commit them soon.
Doug
--
Hi,
When running Fetcher sometimes it dies with the following exception:
050517 212854 SEVERE error writing output:java.io.IOException: key out of order: 3420 after 3420
java.io.IOException: key out of order: 3420 after 3420
at org.apache.nutch.io.MapFile$Writer.checkKey(MapFile.java:128)
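For anyone reading this trace later: the message suggests that the writer refuses a key that does not sort strictly after the previous one, so appending the same key (3420) twice is enough to trigger it. The following is only a minimal sketch of that kind of ordering check, not the actual MapFile code; the class name SortedKeyWriterSketch and its append method are hypothetical, and plain int keys stand in for whatever key type the Fetcher really writes.

```java
import java.io.IOException;

// Hypothetical stand-in for a writer that only accepts keys in strictly
// ascending order. This is NOT org.apache.nutch.io.MapFile; it only
// illustrates the kind of check that yields "key out of order".
public class SortedKeyWriterSketch {
    // Last key accepted so far; null until the first append.
    private Integer lastKey = null;

    // Accepts the key only if it sorts strictly after the previous one.
    void append(int key) throws IOException {
        if (lastKey != null && key <= lastKey) {
            throw new IOException("key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
    }

    public static void main(String[] args) {
        SortedKeyWriterSketch writer = new SortedKeyWriterSketch();
        try {
            writer.append(3419);
            writer.append(3420);
            writer.append(3420); // same key again: rejected
        } catch (IOException e) {
            // prints "key out of order: 3420 after 3420"
            System.out.println(e.getMessage());
        }
    }
}
```

If the real check behaves like this sketch, the thing to look for is two entries being written under the same key, rather than entries arriving in descending order.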
Exception in thread "main" java.io.IOException: Could not obtain new output block for file /db/linkstats.txt
at net.nutch.ndfs.NDFSClient$NameNodeCaller.getNewOutputBlock(NDFSClient.java:907)
This file already exists, so perhaps it needs to be deleted first?
Would appreciate any pointers.
Doug Cutting wrote:
Andrzej Bialecki wrote:
You can download the patch from here:
http://www.getopt.org/nutch/20050507.patch
I have not yet had a chance to try this. Following are some quick
comments from reading the patch. Overall I think this is great stuff.
1. Why does an HTMLMetaTags n
Hi,
This is just an observation and a warning for those of you who are
crawling single sites in depth and have encountered frequent "Exceeded
http.max.delays" exceptions.
Assume the following scenario: a user runs the CrawlTool to crawl a
single site. Fetchlists generated by the CrawlTool will conta
Pablo Mayrgundter wrote:
I'm testing a deployment of Nutch at work and am trying to decide what
filesystem to use. I got the NDFS demo working, and am excited to use
it, but it looks pretty new. Should I consider using it for
production? I'm considering storing quite a lot of data, in the
10-100
should we introduce a new package for these: NutchConfigurable,
NutchConfigured and the upcoming action classes -
org.apache.nutch.action ?
--
Sami Siren
Stefan Groschupf wrote:
Hi,
Doug, can you or someone else please commit the classes you suggested? I
think most/all agree and we can start
Andrzej Bialecki wrote:
You can download the patch from here:
http://www.getopt.org/nutch/20050507.patch
I have not yet had a chance to try this. Following are some quick
comments from reading the patch. Overall I think this is great stuff.
1. Why does an HTMLMetaTags need to be passed to P