Hello,

I am having trouble fetching some URLs that contain GET parameters with Nutch.
For example, Nutch fetches:

http://www.mywebsite.com/studies/formation-offer/Sciences-Technologies-Sante?domaine=1&diplome=TI-DUT&composante=

but will not fetch:
http://www.mywebsite.com/studies/formation-offer/Sciences-Technologies-Sante?domaine=1&diplome=TI-DUT&composante=&mention=FR_RNE_0593559Y_PR_ST-dut-000001&specialite=FR_RNE_0593559Y_PR_formation-DUT-INFO

I updated the crawl-urlfilter:
#-[?*!@=]

+^http://www.mywebsite.com/studies/formation-offer/

and nutch-default.xml:

<property>
  <name>db.max.anchor.length</name>
  <value>300</value>
  <description>The maximum number of characters permitted in an anchor.
  </description>
</property>

but I still get the same result, and I could not find anything in the
configuration files to make it work. Does anybody have an idea?
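
To rule out the filter lines themselves, the rules quoted above can be checked
against both URLs outside Nutch. This is only a minimal sketch: it assumes the
URL filter applies rules in file order, that the first rule whose regex is found
anywhere in the URL decides, and that a URL matching no rule is rejected. That
is my understanding of the regex URL filter, not something verified against the
Nutch source. The names RULES and accepts are hypothetical, and Python's regex
engine stands in for Java's (the two agree for these simple patterns).

```python
import re

# Rules as (sign, pattern) pairs, in file order. The "#-[?*!@=]" line is
# left out because the leading "#" comments it out.
RULES = [
    ("+", r"^http://www.mywebsite.com/studies/formation-offer/"),
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False


SHORT_URL = (
    "http://www.mywebsite.com/studies/formation-offer/"
    "Sciences-Technologies-Sante?domaine=1&diplome=TI-DUT&composante="
)
LONG_URL = (
    SHORT_URL
    + "&mention=FR_RNE_0593559Y_PR_ST-dut-000001"
    + "&specialite=FR_RNE_0593559Y_PR_formation-DUT-INFO"
)

if __name__ == "__main__":
    for url in (SHORT_URL, LONG_URL):
        print(accepts(url), len(url), url)
```

With only these rules, both URLs are accepted by the sketch, so if the
assumptions hold, the filter lines shown above are not what blocks the second
URL. Putting the "-[?*!@=]" rule back in front of the "+" rule makes both URLs
fail, since both contain "?" and "=".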

Best regards,
David
