[jira] Commented: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links )

2006-04-20 Thread Christophe Noel (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-173?page=comments#action_12375300 ] Christophe Noel commented on NUTCH-173: --- There are tens of us Nutch users relying on this precious patch. Most Nutch users are not building a whole-web search engine (too much

fetcher.thread.per.host not working ??

2005-11-21 Thread Christophe Noel
Hello, something is wrong with threads per host. Only one thread at a time should fetch from a given host, so why do I get these 2 connect timeouts (15 sec) at 13:15 and 15 seconds?! This is not normal, and as a result I get about 1000 errors when I crawl about 1400 pages... *Here is the

Crawling unpolite problem

2005-11-03 Thread Christophe Noel
and threads.per.host=15 and http.max.delay=1500). For a polite crawler, what are the best parameters with threads.per.host=1? Thank you very much for your answer. Christophe Noel

[jira] Created: (NUTCH-74) French Analyzer Plugin

2005-07-19 Thread Christophe Noel (JIRA)
French Analyzer Plugin -- Key: NUTCH-74 URL: http://issues.apache.org/jira/browse/NUTCH-74 Project: Nutch Type: New Feature Environment: Nutch Reporter: Christophe Noel Attachments: analyze-french.zip This is a DRAFT for a new plugin

[jira] Updated: (NUTCH-71) Search web page doesn't not focus on query input

2005-07-12 Thread Christophe Noel (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-71?page=all ] Christophe Noel updated NUTCH-71: - Attachment: searchQueryFocus.patch Search.html (fr,en) and search.jsp focus patch. Search web page doesn't not focus on query input