Re: [jira] Updated: (NUTCH-100) New plugin urlfilter-db

2005-10-10 Thread ogjunk-nutch
Hi Gal, I'm curious about the memory consumption of the cache and the speed of retrieval of an item from the cache, when the cache has 100k domains in it. Thanks, Otis --- Gal Nitzan [EMAIL PROTECTED] wrote: Hi Michael, At the moment I have about 3000 domains in my db. I didn't time the

Re: [jira] Updated: (NUTCH-100) New plugin urlfilter-db

2005-10-10 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Hi Gal, I'm curious about the memory consumption of the cache and the speed of retrieval of an item from the cache, when the cache has 100k domains in it. Slightly off-topic, but I hope this is relevant to the original reason for creating this plugin... There is a

reprocessing hanging tasks

2005-10-10 Thread Stefan Groschupf
Hi, I tried to understand the jobtracker code. Hmm, more than 1000 lines of code in just one class. :-( This makes understanding the code very difficult. Anyway, I'm missing a mechanism to reprocess hanging tasks. Maybe I just didn't find the code, but I invested some time looking for it. As the google

Re: reprocessing hanging tasks

2005-10-10 Thread Stefan Groschupf
Doug, I have definitely run into problems several times where task trackers were sending heartbeat messages but had stopped processing the job. For example, no new pages were fetched, and the pages/sec. statistic became slower and slower. I personally would think it makes more sense in case the

Re: reprocessing hanging tasks

2005-10-10 Thread Doug Cutting
Stefan Groschupf wrote: Maybe we misunderstand each other; I do not mean tasks that crash, I mean tasks that are 20 times slower on one machine than the other tasks on the other machines. Ah, I call that speculative re-execution. Nutch does not yet implement that. I don't think speculative
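Speculative re-execution, as Doug names it here, means launching a duplicate attempt of a straggling task on another machine and keeping whichever attempt finishes first. Since the thread notes Nutch does not implement it, the following is only a minimal sketch of the detection half, under stated assumptions: the `TaskStatus` record, the per-task progress "rate" (e.g. pages/sec), and the `slowFactor` threshold are all hypothetical illustrations, not Nutch or JobTracker APIs. It flags tasks running more than `slowFactor` times slower than the median task, matching Stefan's "20 times slower" case. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: selects straggler tasks as candidates for speculative re-execution. */
public class SpeculativeSketch {

    /** Hypothetical view of a running task: its id and observed progress rate (e.g. pages/sec). */
    record TaskStatus(String id, double rate) {}

    /**
     * Returns ids of tasks whose rate is below (median rate / slowFactor).
     * slowFactor is an assumed tuning knob, not an actual Nutch setting.
     */
    static List<String> stragglers(List<TaskStatus> tasks, double slowFactor) {
        List<String> result = new ArrayList<>();
        if (tasks.isEmpty()) {
            return result;
        }
        // Sort the observed rates to find the median progress rate.
        double[] rates = tasks.stream().mapToDouble(TaskStatus::rate).sorted().toArray();
        int n = rates.length;
        double median = (n % 2 == 1) ? rates[n / 2] : (rates[n / 2 - 1] + rates[n / 2]) / 2.0;
        double threshold = median / slowFactor;
        // Any task below the threshold would get a duplicate attempt elsewhere.
        for (TaskStatus t : tasks) {
            if (t.rate() < threshold) {
                result.add(t.id());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TaskStatus> tasks = List.of(
                new TaskStatus("task_0", 40.0),
                new TaskStatus("task_1", 38.0),
                new TaskStatus("task_2", 1.5), // roughly 25x slower than its peers
                new TaskStatus("task_3", 42.0));
        // Median is 39.0, threshold 1.95, so only task_2 is flagged.
        System.out.println(stragglers(tasks, 20.0)); // prints [task_2]
    }
}
```

Comparing against the median rather than the mean keeps a single very slow task from dragging the threshold down with it. A real scheduler would also need to kill the losing attempt and discard its output, which this sketch does not attempt.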

fetch speed issue

2005-10-10 Thread AJ Chen
Another observation: when the same size fetch list and same number of threads were used, the fetcher started at different speeds in different runs, ranging from 200kb/s to 1200kb/s. I'm using DSL at home, so this variation in download speed could be due to variation in the DSL connection. If using

Re: Re[2]: what contibute to fetch slowing down

2005-10-10 Thread Daniele Menozzi
On 03:36:45 03/Oct , Michael wrote: 3mbit, 100 threads = 15 pages/sec; CPU is low during fetch, so it's a bandwidth limit. Yes, CPU is low, and even memory is quite free. But with 10MB in/out I cannot obtain good results (and I do not parse results, simply fetch them). If I use 100 threads, I

Re: what contibute to fetch slowing down

2005-10-10 Thread Daniele Menozzi
On 09:59:45 03/Oct , Doug Cutting wrote: I suspect threads are hanging, probably in the parser. I tried not parsing, but without good results. If I use 100 threads, I can download pages at 500KB/s for about 5 seconds, but after that, the download rate falls to 0. If I set 20 threads, I can

[jira] Created: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing Tuning

2005-10-10 Thread Fuad Efendi (JIRA)
Nutch - Fetcher - HTTP - Performance Testing Tuning - Key: NUTCH-109 URL: http://issues.apache.org/jira/browse/NUTCH-109 Project: Nutch Type: Improvement Components: fetcher Versions: 0.7, 0.6, 0.7.1,

[jira] Updated: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing Tuning

2005-10-10 Thread Fuad Efendi (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-109?page=all ] Fuad Efendi updated NUTCH-109: -- Attachment: protocol-httpclient-innovation-0.1.0.zip New Plugin, you may play with commenting this code in HttpFactory static {

Re: [jira] Created: (NUTCH-103) Vivisimo like treeview and url redirect

2005-10-10 Thread Robert Benea
On 10/6/05, Dawid Weiss [EMAIL PROTECTED] wrote: That would be great. I have already looked at the code base in the plug-in directory, and it seems you use this call to get the clustering results: controller.query(lingo-nmf-km-3, pseudo-query, requestParams); am I right? Anyway, I want