On 10/6/05, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>
>
> > That would be great. I already looked at the code base in the plug-in
> > directory, and it seems you use this call to get the clustering results:
> >
> > controller.query("lingo-nmf-km-3", "pseudo-query", requestParams);
> > Am I right?
>
[ http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12331764 ]
Fuad Efendi commented on NUTCH-109:
---
By default, Java 1.4 caches DNS-to-IP mappings forever...
java.security.Security.setProperty("networkaddress.cache.ttl", "1");
Try the new Protocol-HTTPClient-Innovation plugin:
http://issues.apache.org/jira/browse/NUTCH-109
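The DNS-caching tip above can also be applied from code at startup. This is a minimal sketch: the class name `DnsCacheTtl` and its `configure` helper are hypothetical; only `java.security.Security.setProperty`/`getProperty` and the `networkaddress.cache.ttl` security property come from the JDK. The property is generally read when the JVM's address cache is first initialized, so the call should happen before the fetcher resolves any host.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical helper class; only the Security API and property name are from the JDK.
public class DnsCacheTtl {

    // Sets how many seconds successful DNS lookups stay cached.
    // Call this before the first InetAddress lookup in the JVM,
    // otherwise the cache policy may already have been initialized.
    public static void configure(int ttlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(ttlSeconds));
    }

    public static void main(String[] args) throws UnknownHostException {
        configure(1);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        // Lookups made from here on should expire from the cache after about one second.
        InetAddress addr = InetAddress.getByName("localhost");
        System.out.println(addr);
    }
}
```

A TTL of 1 second means almost every new connection triggers a fresh DNS lookup; a somewhat larger value may be a better trade-off for a crawler that hits the same hosts repeatedly, but that is a tuning judgment, not something measured in this thread.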
-Original Message-
From: Daniele Menozzi [mailto:[EMAIL PROTECTED]
Sent: Monday, October 10, 2005 5:42 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Re[2]: what contribute to fetch slowing down
On
[ http://issues.apache.org/jira/browse/NUTCH-109?page=all ]
Fuad Efendi updated NUTCH-109:
--
Attachment: protocol-httpclient-innovation-0.1.0.zip
New plugin; you may experiment with commenting out this code in HttpFactory:
static {
CookieModule.
Nutch - Fetcher - HTTP - Performance Testing & Tuning
-
Key: NUTCH-109
URL: http://issues.apache.org/jira/browse/NUTCH-109
Project: Nutch
Type: Improvement
Components: fetcher
Versions: 0.7, 0.6, 0.7.1, 0.8-
http://nagoya.apache.org/jira
- it does not work right now. I am trying to upload a new HTTP plugin, which
seems to be 100 times faster.
1. Each TCP connection costs a lot, not only for Nutch and the end point but also for
intermediary network equipment.
2. The web server creates a client thread and hopes that Nutch
Hi,
I am looking for two things. Is there any support I can "buy" for Nutch?
And are there any commercial Nutch-based solutions that provide things like
video indexing and an administration interface?
Is there any administration interface available...
--
On 09:59:45 03/Oct , Doug Cutting wrote:
> I suspect threads are hanging, probably in the parser,
I tried disabling parsing, but without good results.
If I use 100 threads, I can download pages at 500KB/s for about 5 seconds,
but after that, the download rate falls to 0. If I set 20 threads, I can do
On 03:36:45 03/Oct , Michael wrote:
> 3mbit, 100 threads = 15 pages/sec
> CPU is low during fetch, so it's a bandwidth limit.
Yes, CPU usage is low, and plenty of memory is free. But even with a 10MB in/out
link, I cannot obtain good results (and I do not parse the pages; I simply fetch
them).
If I use 100 threads, I
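Michael's figure of 15 pages/sec on a 3 Mbit link can be sanity-checked with a little arithmetic. Assuming "3mbit" means a saturated 3 Mbit/s link (decimal units, ignoring TCP/HTTP overhead), 3,000,000 / 8 = 375,000 bytes/s, and 375,000 / 15 = 25,000 bytes, i.e. roughly 25 KB per page. The class and method names below are hypothetical.

```java
// Hypothetical helper for back-of-the-envelope fetch bandwidth estimates.
// Assumes decimal units and a fully saturated link; protocol overhead is ignored.
public class FetchBandwidth {

    // Average page size (bytes) implied by a saturated link fetching pagesPerSec.
    public static double impliedBytesPerPage(double linkBitsPerSec, double pagesPerSec) {
        return linkBitsPerSec / 8.0 / pagesPerSec;
    }

    // Upper bound on pages/sec for a link if every page is avgPageBytes long.
    public static double maxPagesPerSec(double linkBitsPerSec, double avgPageBytes) {
        return linkBitsPerSec / 8.0 / avgPageBytes;
    }

    public static void main(String[] args) {
        double perPage = impliedBytesPerPage(3_000_000, 15);
        System.out.printf("%.0f bytes per page%n", perPage); // prints "25000 bytes per page"
    }
}
```

If pages average about 25 KB, a 10 Mbit/s link would top out near 50 pages/sec under the same assumptions, so a fetch rate far below that points at something other than raw bandwidth (for example hanging threads), which is the question this thread is circling.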
Another observation: when the same size fetch list and same number of
threads were used, the fetcher started at different speeds in different runs,
ranging from 200kb/s to 1200kb/s. I'm using DSL at home, so this variation
in download speed could be due to variation in the DSL connection. If using
s
Stefan Groschupf wrote:
Maybe we misunderstand each other: I do not mean tasks that crash, I mean
tasks that are 20 times slower on one machine than the other tasks on the
other machines.
Ah, I call that "speculative re-execution". Nutch does not yet
implement that.
I don't think speculativ
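Speculative re-execution, as Doug names it, starts with spotting stragglers like the ones Stefan describes. The sketch below is a hypothetical illustration, not Nutch or jobtracker code: it assumes each running task reports a progress rate, and flags tasks running more than `slowdownFactor` times slower than the median rate (the median keeps one straggler from dragging the baseline down). A scheduler would then launch a duplicate attempt of each flagged task on another machine and keep whichever attempt finishes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical straggler detector; names and the progress-rate input are assumptions.
public class SpeculativePicker {

    // Returns ids of tasks whose progress rate is more than slowdownFactor
    // times lower than the median rate of all running tasks.
    public static List<String> pickStragglers(Map<String, Double> progressPerSec,
                                              double slowdownFactor) {
        List<String> stragglers = new ArrayList<>();
        if (progressPerSec.isEmpty()) {
            return stragglers;
        }
        // Median progress rate across all running tasks.
        double[] rates = new double[progressPerSec.size()];
        int i = 0;
        for (double r : progressPerSec.values()) {
            rates[i++] = r;
        }
        Arrays.sort(rates);
        int n = rates.length;
        double median = (n % 2 == 1)
                ? rates[n / 2]
                : (rates[n / 2 - 1] + rates[n / 2]) / 2.0;

        // Tasks far below the median are candidates for a duplicate attempt.
        double threshold = median / slowdownFactor;
        for (Map.Entry<String, Double> e : progressPerSec.entrySet()) {
            if (e.getValue() < threshold) {
                stragglers.add(e.getKey());
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        Map<String, Double> rates = new LinkedHashMap<>();
        rates.put("task_0", 1.0);
        rates.put("task_1", 1.0);
        rates.put("task_2", 1.0);
        rates.put("task_3", 0.04); // 25 times slower than the others
        System.out.println(pickStragglers(rates, 10.0)); // prints [task_3]
    }
}
```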
Doug,
I have definitely run into problems several times where tasktrackers were
sending heartbeat messages but were no longer processing the job.
For example, no new pages were fetched, and the pages/sec statistic
became slower and slower.
I personally would think it makes more sense in case the jobtracker
Stefan Groschupf wrote:
Did I miss the section in the jobtracker where this is done, or would
people be interested in my submitting a patch implementing this mechanism?
This is mostly already implemented. The tasktracker fails tasks that do
not update their status within a configurable timeout. Task status
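The timeout mechanism Doug mentions can be sketched as follows. This is a hypothetical illustration, not the actual tasktracker code: the class and method names are invented, and time is passed in explicitly so the logic is deterministic. The point relevant to Stefan's report is that the timer is reset by the task's own status updates, not by the tasktracker's heartbeat, so a tracker that keeps sending heartbeats while its task is stuck still has that task expired.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-task status timeout monitor (names are assumptions).
public class TaskTimeoutMonitor {
    private final long timeoutMillis;
    // Last time each task reported its own status/progress.
    private final Map<String, Long> lastReport = new HashMap<>();

    public TaskTimeoutMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a task updates its status; heartbeats alone do not count.
    public synchronized void reportStatus(String taskId, long nowMillis) {
        lastReport.put(taskId, nowMillis);
    }

    // Returns tasks silent for longer than the timeout and stops tracking them,
    // so the caller can fail them and reschedule them elsewhere.
    public synchronized List<String> expireHungTasks(long nowMillis) {
        List<String> hung = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastReport.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                hung.add(e.getKey());
            }
        }
        for (String id : hung) {
            lastReport.remove(id);
        }
        return hung;
    }
}
```

For example, with a 10-minute timeout, a task that last reported at t=0 is returned by `expireHungTasks(700_000)`, while one that reported again at t=500,000 ms is not.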
Hi,
I tried to understand the jobtracker code.
Hmm, more than 1000 lines of code in just one class. :-( This makes
the code very difficult to understand.
Anyway, I can't find a mechanism to reprocess hanging tasks. Maybe I just
didn't find the code, but I did invest some time looking for it.
As the google pa