Re: nutch elpais.com

2014-06-16 Thread Julien Nioche
Hi Yann,
Not really answering your question, but where did you get this config from? Some of its elements have long been deprecated (query-*, response-*, summary-*).
Julien

On 15 June 2014 10:20, Yann Levreau yann.levr...@gmail.com wrote:
Hi everyone! I'm sorry to disturb you, but I need
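Julien's point can be checked mechanically. The sketch below assumes, as Nutch's plugin loader does, that the `plugin.includes` property is a regular expression matched against whole plugin IDs. It reports which IDs from a candidate list the pattern would enable and that start with one of the deprecated prefixes he names. The class name, the `includes` value and the ID list are hypothetical, chosen for illustration only. Yann's actual config is not shown in the archive.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DeprecatedPluginCheck {

    // Prefixes of plugin IDs described in the reply as long deprecated.
    static final List<String> DEPRECATED_PREFIXES = List.of("query-", "response-", "summary-");

    // Returns the candidate IDs that the includes pattern enables (whole-ID
    // regex match, as assumed above) and that start with a deprecated prefix.
    static List<String> deprecatedMatches(String includesRegex, List<String> candidateIds) {
        Pattern includes = Pattern.compile(includesRegex);
        return candidateIds.stream()
                .filter(id -> includes.matcher(id).matches())
                .filter(id -> DEPRECATED_PREFIXES.stream().anyMatch(id::startsWith))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical plugin.includes value in the style of an old config.
        String includes = "protocol-http|urlfilter-regex|parse-(html|tika)|index-basic"
                + "|query-(basic|site|url)|response-(json|xml)|summary-basic";
        // Hypothetical plugin IDs to test against the pattern.
        List<String> ids = List.of("protocol-http", "parse-html", "index-basic",
                "query-basic", "response-json", "summary-basic");
        // prints [query-basic, response-json, summary-basic]
        System.out.println(deprecatedMatches(includes, ids));
    }
}
```

Any ID this reports can simply be dropped from the pattern when cleaning up the config.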

Re: nutch elpais.com

2014-06-16 Thread Yann Levreau
You're right, I need to clean up these config files. I think these plugins came from Nutch 1.7 (bad copy/paste :) ). I have news about my issue. Actually, there were two issues: 1) Outlinks are not set in the WebPage: in ParseUtil.java (line 195), we have: *if

RE: nutch elpais.com

2014-06-16 Thread Markus Jelsma
arbitrary HTTP headers, certainly not a per-host set of headers.
Markus
-----Original message-----
From: Yann Levreau <yann.levr...@gmail.com>
Sent: Monday 16th June 2014 19:18
To: dev@nutch.apache.org
Subject: Re: nutch elpais.com
You're right, I need to clean these config files. I think these plugins

nutch elpais.com

2014-06-15 Thread Yann Levreau
Hi everyone! I'm sorry to disturb you, but I need some help getting the outlinks of http://elpais.com. I use Nutch 2.2.1. The web page is parsed correctly; in debug mode I can see all the outlinks in the Parse object. I use these basic plugins: