[Nutch-dev] Re: [jira] Created: (NUTCH-103) Vivisimo like treeview and url redirect

2005-10-10 Thread Robert Benea
On 10/6/05, Dawid Weiss <[EMAIL PROTECTED]> wrote:
> > That would be great. I already looked at the code base in the plug-in
> > directory, and it seems you use this call to get the clustering results:
> >
> > controller.query("lingo-nmf-km-3", "pseudo-query", requestParams);
> > Am I right?
>

[Nutch-dev] [jira] Commented: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing & Tuning

2005-10-10 Thread Fuad Efendi (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12331764 ]
Fuad Efendi commented on NUTCH-109:
By default, Java 1.4 caches DNS-to-IP mappings forever... java.security.Security.setProperty("networkaddress.cache.ttl", "1");
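A minimal sketch of how the property quoted above might be applied when a fetcher starts. The class name `DnsCacheTtl` and the `configure` helper are hypothetical and are not part of Nutch or the attached plugin; only `networkaddress.cache.ttl` and `Security.setProperty` come from the comment. The value is in seconds, and the JVM generally reads it when its address cache is first initialized, so the call is assumed to run before the first host lookup.

```java
import java.security.Security;

public class DnsCacheTtl {

    // Security property that controls how long successful lookups are cached, in seconds.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    /** Sets the DNS cache TTL for successful lookups; call before the first host lookup. */
    public static void configure(int ttlSeconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(ttlSeconds));
    }

    public static void main(String[] args) {
        configure(1); // same value as in the comment above
        System.out.println(TTL_PROPERTY + " = " + Security.getProperty(TTL_PROPERTY));
    }
}
```

A very short TTL trades extra DNS traffic for fresher mappings, so the right value for a crawler is a tuning decision rather than something this sketch settles.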

[Nutch-dev] RE: Re[2]: what contibute to fetch slowing down

2005-10-10 Thread Fuad Efendi
Try the new Protocol-HTTPClient-Innovation: http://issues.apache.org/jira/browse/NUTCH-109
-----Original Message-----
From: Daniele Menozzi [mailto:[EMAIL PROTECTED]
Sent: Monday, October 10, 2005 5:42 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Re[2]: what contibute to fetch slowing down
On

[Nutch-dev] [jira] Updated: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing & Tuning

2005-10-10 Thread Fuad Efendi (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-109?page=all ]
Fuad Efendi updated NUTCH-109:
Attachment: protocol-httpclient-innovation-0.1.0.zip
New plugin. You may experiment with commenting out this code in HttpFactory: static { CookieModule.

[Nutch-dev] [jira] Created: (NUTCH-109) Nutch - Fetcher - HTTP - Performance Testing & Tuning

2005-10-10 Thread Fuad Efendi (JIRA)
Nutch - Fetcher - HTTP - Performance Testing & Tuning
Key: NUTCH-109
URL: http://issues.apache.org/jira/browse/NUTCH-109
Project: Nutch
Type: Improvement
Components: fetcher
Versions: 0.7, 0.6, 0.7.1, 0.8-

[Nutch-dev] RE: Re[2]: what contibute to fetch slowing down

2005-10-10 Thread Fuad Efendi
http://nagoya.apache.org/jira does not work right now. I am trying to upload a new Http-Plugin which seems to be 100 times faster.
1. A TCP connection costs a lot, not only for Nutch and the end-point but also for intermediary network equipment.
2. The Web Server creates a Client thread and hopes that Nutch

[Nutch-dev] Nutch management tools

2005-10-10 Thread Abdurrahman Advany
Hi, I am looking for two things. Is there any support I can "buy" for Nutch? And are there any commercial Nutch-based solutions that provide things like video indexing and an administration interface? Is there any administration interface available... --

[Nutch-dev] Re: what contibute to fetch slowing down

2005-10-10 Thread Daniele Menozzi
On 09:59:45 03/Oct, Doug Cutting wrote:
> I suspect threads are hanging, probably in the parser,
I tried not parsing, but without good results. If I use 100 threads, I can download pages at 500KB/s for about 5 seconds, but after that the download rate falls to 0. If I set 20 threads, I can do

[Nutch-dev] Re: Re[2]: what contibute to fetch slowing down

2005-10-10 Thread Daniele Menozzi
On 03:36:45 03/Oct, Michael wrote:
> 3mbit, 100 threads = 15 pages/sec
> cpu is low during fetch, so it's a bandwidth limit.
Yes, CPU is low, and even memory is quite free. But with a 10MB in/out link I cannot get good results (and I do not parse results, I simply fetch them). If I use 100 threads, I

[Nutch-dev] fetch speed issue

2005-10-10 Thread AJ Chen
Another observation: when the same size fetch list and the same number of threads were used, the fetcher started at different speeds in different runs, ranging from 200kb/s to 1200kb/s. I'm using DSL at home, so this variation in download speed could be due to variation in the DSL connection. If using s

[Nutch-dev] Re: reprocessing hanging tasks

2005-10-10 Thread Doug Cutting
Stefan Groschupf wrote:
> Maybe we misunderstand each other. I do not mean tasks that crash, I mean tasks that are 20 times slower on one machine than the other tasks on the other machines.
Ah, I call that "speculative re-execution". Nutch does not yet implement that. I don't think speculativ

[Nutch-dev] Re: reprocessing hanging tasks

2005-10-10 Thread Stefan Groschupf
Doug, I have definitely run into problems several times where task trackers were sending heartbeat messages but were no longer processing the job. For example, no new pages were fetched, and the pages/sec statistic became slower and slower. I personally think it would make more sense in case the jobtracker

[Nutch-dev] Re: reprocessing hanging tasks

2005-10-10 Thread Doug Cutting
Stefan Groschupf wrote:
> Did I miss the section in the jobtracker where this is done, or would people be interested in a patch implementing this mechanism?
This is mostly already implemented. The tasktracker fails tasks that do not update their status within a configurable timeout. Task status
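The timeout mechanism described in this reply can be pictured roughly as follows. This is an illustrative sketch only, not the Nutch tasktracker code: the class `TaskTimeoutTracker`, its method names, and the use of caller-supplied timestamps are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: remembers when each task last reported status
// and fails the ones that have been silent for longer than the timeout.
public class TaskTimeoutTracker {
    private final long timeoutMillis; // the "configurable timeout"
    private final Map<String, Long> lastUpdate = new HashMap<>();

    public TaskTimeoutTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a status update for the given task at the given time. */
    public synchronized void reportStatus(String taskId, long nowMillis) {
        lastUpdate.put(taskId, nowMillis);
    }

    /** Removes and returns the ids of tasks whose last update is older than the timeout. */
    public synchronized List<String> failExpired(long nowMillis) {
        List<String> failed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastUpdate.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
                it.remove(); // a failed task is no longer tracked
            }
        }
        return failed;
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic. A real tracker would more likely read a clock and run the expiry check periodically.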

[Nutch-dev] reprocessing hanging tasks

2005-10-10 Thread Stefan Groschupf
Hi, I tried to understand the jobtracker code. Hmm, more than 1000 lines of code in just one class. :-( This makes understanding the code very difficult. Anyway, I'm missing a mechanism to reprocess hanging tasks. Maybe I just didn't find the code, but I invested some time looking for it. As the google pa