I am using Nutch to parse a university website, and I found that the HTML parser does not work properly on some pages. Here is the problem.
Take these two student pages:
1. http://kdd.csd.uwo.ca/doku.php/people/yan_luo
2. http://kdd.csd.uwo.ca/doku.php/people/xiao_li
I used the commands "nutch parsechecker -dumpText http://kdd.csd.uwo.ca/doku.php/people/yan_luo" and "nutch parsechecker -dumpText http://kdd.csd.uwo.ca/doku.php/people/xiao_li" to extract their text. However, Nutch extracts text only from the first page and returns nothing for the second one. I do not understand why, because both pages are generated from the same template.
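One thing that may be worth comparing between the two pages is the robots meta tag in the served HTML. As far as I know, Nutch's HTML parser honors robots meta directives and skips text extraction for a page marked "noindex", which would produce empty -dumpText output even when the page looks identical in a browser. This is only a hypothesis, not a confirmed cause. The sketch below is a hypothetical helper (the names RobotsMetaParser, robots_directives and fetch_robots_directives are made up for illustration) that uses only the Python standard library to list the robots directives a page declares:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; self-closing
        # <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            # content looks like "index,follow" or "noindex,nofollow"
            content = attrs.get("content") or ""
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the lower-cased robots directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def fetch_robots_directives(url):
    """Download a page and return its robots directives (needs network access)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return robots_directives(resp.read().decode(charset, errors="replace"))
```

Calling fetch_robots_directives on each of the two URLs above and comparing the results would show whether the second page is served with "noindex" while the first is not.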

