Hi

I just tested both links with the commands you mentioned, and they both worked. So ...

On 01/27/2012 03:53 PM, Xiao Li wrote:
I am using Nutch to parse a university website, and I found that the HTML
parser does not work properly. Here is the problem.

Take these two student web pages:

1. http://kdd.csd.uwo.ca/doku.php/people/yan_luo
2. http://kdd.csd.uwo.ca/doku.php/people/xiao_li

I used the commands

  nutch parsechecker -dumpText http://kdd.csd.uwo.ca/doku.php/people/yan_luo

and

  nutch parsechecker -dumpText http://kdd.csd.uwo.ca/doku.php/people/xiao_li

to extract the text.

However, Nutch extracts text only from the first page and returns nothing
for the second one. I do not understand why, since both pages are generated
from the same template.


--
Kaveh Minooie

www.plutoz.com
