[ http://issues.apache.org/jira/browse/NUTCH-173?page=comments#action_12375300 ]
Christophe Noel commented on NUTCH-173:
---
There are dozens of us Nutch users relying on this valuable patch.
Most Nutch users are not building a whole-web search engine (too much
Hello,
Something seems wrong with the thread-per-host handling.
Only one thread should fetch from a given host at a time, so why do I
get these 2 connect timeouts (15 sec) at 13:15 and 15 seconds?
This is not normal, and as a result I get about 1000 errors when I crawl about
1400 pages.
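To make the expectation above concrete (at most N threads fetching from one host at a time), here is a minimal sketch of a per-host gate. It is only an illustration under stated assumptions: the class HostGate and its methods are hypothetical names, and this is not Nutch's actual fetcher code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical illustration only: limits how many threads may fetch
// from the same host at once. Not taken from the Nutch source.
final class HostGate {
    private final int threadsPerHost;
    private final ConcurrentHashMap<String, Semaphore> slots =
            new ConcurrentHashMap<>();

    HostGate(int threadsPerHost) {
        this.threadsPerHost = threadsPerHost;
    }

    // Returns true if the calling thread may fetch from this host now,
    // false if the host already has threadsPerHost active fetches.
    boolean tryEnter(String host) {
        return slots
                .computeIfAbsent(host, h -> new Semaphore(threadsPerHost))
                .tryAcquire();
    }

    // Must be called exactly once after each successful tryEnter,
    // otherwise the permit count for the host drifts.
    void leave(String host) {
        Semaphore s = slots.get(host);
        if (s != null) {
            s.release();
        }
    }
}
```

With a limit of 1, a second thread asking for the same host is refused until the first one leaves; a larger limit allows that many concurrent fetches to a single host.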
*Here is the
and
threads.per.host=15 and http.max.delay=1500).
For a polite crawler, what are the best parameters to use with
threads.per.host=1?
Thank you very much for your answer.
Christophe Noel
French Analyzer Plugin
--
Key: NUTCH-74
URL: http://issues.apache.org/jira/browse/NUTCH-74
Project: Nutch
Type: New Feature
Environment: Nutch
Reporter: Christophe Noel
Attachments: analyze-french.zip
This is a DRAFT for a new plugin
[ http://issues.apache.org/jira/browse/NUTCH-71?page=all ]
Christophe Noel updated NUTCH-71:
-
Attachment: searchQueryFocus.patch
Search.html (fr,en) and search.jsp focus patch.
The search web page does not focus on the query input