Hi Gal,
I'm curious about the memory consumption of the cache and the speed of
retrieval of an item from the cache, when the cache has 100k domains in
it.
Thanks,
Otis
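One way to get rough numbers for a question like this is a small probe. The sketch below is only an illustration: it assumes a plain java.util.HashMap keyed by domain name stands in for the plugin's cache (the real cache may be structured differently), and the class name, method names and synthetic "domainN.example" keys are all hypothetical. The heap figure comes from Runtime and is only approximate, since garbage collection can move it in either direction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the plugin's cache: a plain HashMap keyed by domain name.
class DomainCacheProbe {

    /** Fills a map with n synthetic domain names (domain0.example, domain1.example, ...). */
    static Map<String, Boolean> buildCache(int n) {
        Map<String, Boolean> cache = new HashMap<>();
        for (int i = 0; i < n; i++) {
            cache.put("domain" + i + ".example", Boolean.TRUE);
        }
        return cache;
    }

    /** Looks up every synthetic domain once and returns how many were found. */
    static int countHits(Map<String, Boolean> cache, int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (cache.containsKey("domain" + i + ".example")) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 100_000; // the cache size asked about in the message
        Runtime rt = Runtime.getRuntime();

        long usedBefore = rt.totalMemory() - rt.freeMemory();
        Map<String, Boolean> cache = buildCache(n);
        long usedAfter = rt.totalMemory() - rt.freeMemory();

        long start = System.nanoTime();
        int hits = countHits(cache, n);
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("entries: " + cache.size() + ", hits: " + hits);
        System.out.println("approx. heap growth (bytes): " + (usedAfter - usedBefore));
        // The timing includes building each key string, not just the map lookup.
        System.out.println("avg lookup (ns): " + elapsedNanos / n);
    }
}
```

Running it a few times and discarding the first run gives steadier timings, because the JVM needs some warm-up before the lookup loop is compiled.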
--- Gal Nitzan [EMAIL PROTECTED] wrote:
Hi Michael,
At the moment I have about 3000 domains in my db. I didn't time the
[EMAIL PROTECTED] wrote:
Hi Gal,
I'm curious about the memory consumption of the cache and the speed of
retrieval of an item from the cache, when the cache has 100k domains in
it.
Slightly off-topic, but I hope this is relevant to the original reason
for creating this plugin...
There is a
Hi,
I tried to understand the jobtracker code.
Hmm, more than 1000 lines of code in just one class. :-( This makes
understanding the code very difficult.
Anyway, I'm missing a mechanism to reprocess hanging tasks. Maybe I just
didn't find the code, but I invested some time looking for it.
As the google
Doug,
I have definitely run into problems several times where task trackers were
sending heartbeat messages but were no longer processing the job.
For example, no new pages were fetched, but the pages/sec statistic
became slower and slower.
I personally would think it makes more sense in case the
Stefan Groschupf wrote:
Maybe we misunderstand each other. I do not mean tasks that crash, I mean
tasks that are 20 times slower on one machine than the other tasks on the
other machines.
Ah, I call that speculative re-execution. Nutch does not yet
implement that.
I don't think speculative
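To make the idea in this exchange concrete, here is a minimal sketch of how slow tasks might be singled out as candidates for speculative re-execution. It is not Nutch code (the message above says Nutch does not implement this yet); the class name, the method name and the "median divided by a slowdown factor" rule are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: flags tasks that run far slower than their peers.
class StragglerDetector {

    /**
     * Returns the indices of tasks whose progress rate is below
     * (median rate / slowdownFactor). Rates are in any consistent unit,
     * for example pages per second.
     */
    static List<Integer> findStragglers(double[] rates, double slowdownFactor) {
        List<Integer> stragglers = new ArrayList<>();
        if (rates.length == 0) {
            return stragglers;
        }

        // Median of the observed rates, computed on a sorted copy.
        double[] sorted = rates.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double median = (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;

        double threshold = median / slowdownFactor;
        for (int i = 0; i < rates.length; i++) {
            if (rates[i] < threshold) {
                stragglers.add(i);
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        // Four tasks at roughly 20 pages/sec and one at 1 pages/sec.
        double[] rates = {20.0, 19.0, 21.0, 1.0, 20.5};
        System.out.println(findStragglers(rates, 10.0));
    }
}
```

A scheduler built on this idea could start a second copy of each flagged task on another machine and keep whichever copy finishes first. The median is used rather than the mean so that one very slow task does not drag the reference rate down.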
Another observation: when the same size fetch list and the same number of
threads were used, the fetcher started at different speeds in different runs,
ranging from 200kb/s to 1200kb/s. I'm using DSL at home, so this variation
in download speed could be due to variation in the DSL connection. If using
On 03:36:45 03/Oct , Michael wrote:
3mbit, 100 threads = 15 pages/sec
cpu is low during fetch, so it's a bandwidth limit.
yes, cpu is low, and even memory is quite free. But, with a 10MB in/out
I cannot obtain good results (and I do not parse results, simply fetch
them).
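As a rough sanity check on the figures quoted from Michael (3 Mbit/s, 15 pages/sec), the sketch below works out the average page size those numbers imply if the link is fully used. The class and method names are hypothetical, and decimal units (1 Mbit = 1,000,000 bits) are assumed.

```java
// Back-of-the-envelope check, hypothetical names: what average page size does
// a given fetch rate imply on a saturated link?
class FetchBandwidthEstimate {

    /** Converts a link speed in megabits per second to bytes per second (decimal units). */
    static double bytesPerSecond(double megabitsPerSecond) {
        return megabitsPerSecond * 1_000_000 / 8.0;
    }

    /** Average bytes per page implied by a fully used link at the given fetch rate. */
    static double impliedBytesPerPage(double megabitsPerSecond, double pagesPerSecond) {
        return bytesPerSecond(megabitsPerSecond) / pagesPerSecond;
    }

    public static void main(String[] args) {
        // Figures from the quoted message: 3 Mbit/s link, 15 pages/sec.
        double perPage = impliedBytesPerPage(3.0, 15.0);
        System.out.println("implied average page size (bytes): " + perPage);
    }
}
```

With those inputs the link carries 375,000 bytes per second, which comes to 25,000 bytes per page. That is a plausible size for an HTML page, so the numbers are consistent with the fetch being limited by bandwidth rather than CPU.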
If I use 100 threads, I
On 09:59:45 03/Oct , Doug Cutting wrote:
I suspect threads are hanging, probably in the parser,
I tried to not parse, but without good results.
If I use 100 threads, I can download pages at 500KB/s for about 5 seconds,
but after that, the download rate falls to 0. If I set 20 threads, I can
Nutch - Fetcher - HTTP - Performance Testing Tuning
-
Key: NUTCH-109
URL: http://issues.apache.org/jira/browse/NUTCH-109
Project: Nutch
Type: Improvement
Components: fetcher
Versions: 0.7, 0.6, 0.7.1,
[ http://issues.apache.org/jira/browse/NUTCH-109?page=all ]
Fuad Efendi updated NUTCH-109:
--
Attachment: protocol-httpclient-innovation-0.1.0.zip
New Plugin, you may play with commenting this code in HttpFactory
static {
On 10/6/05, Dawid Weiss [EMAIL PROTECTED] wrote:
That would be great. I have already looked at the code base in the plug-in
directory, and it seems you use this call to get the clustering results:
controller.query("lingo-nmf-km-3", "pseudo-query", requestParams);
Am I right?
anyway, I want