Hi Yann,
Not really answering your question, but where did you get this config from?
Some of its elements have long been deprecated (query-*, response-*,
summary-*).
Julien
On 15 June 2014 10:20, Yann Levreau yann.levr...@gmail.com wrote:
Hi everyone!
I'm sorry to disturb you but I need
Julien Nioche created NUTCH-1793:
Summary: HttpRobotRulesParser not configured properly => http.robots.403.allow property is not read
Key: NUTCH-1793
URL: https://issues.apache.org/jira/browse/NUTCH-1793
You're right, I need to clean these config files. I think these plugins
came from Nutch 1.7 (bad copy/paste :) )
I have news about my issue. Actually, there were two issues:
1) Outlinks are not set in the WebPage:
In ParseUtil.java (line 195), we have:
*if
Hi - sites such as nytimes are hard to crawl. The only way to work around the
redirect problem is to identify why it does so and then have Nutch send the
appropriate HTTP headers so it won't. It may be a cookie, or a browser-like
user-agent string. AFAIK Nutch has no facility yet to send
See https://builds.apache.org/job/Nutch-nutchgora/1045/
--
[...truncated 3068 lines...]
init-plugin:
clean-lib:
resolve-default:
[ivy:resolve] :: loading settings :: file =
https://builds.apache.org/job/Nutch-nutchgora/ws/2.x/ivy/ivysettings.xml