Hi Radim, 

  Thank you so much for this. I am not familiar with the commit process for
the core.
  Is there someone who can help us get this committed and resolve this
issue?

Thanks for all your help.

Rajesh Ramana

-----Original Message-----
From: Radim Kolar [mailto:[email protected]] 
Sent: Thursday, October 06, 2011 2:18 PM
To: [email protected]
Subject: Re: Nutch not crawling URLs with spanish accented characters ( ñ)

- The REGEX normalizer transforms the special characters, but fails to
  substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’.
- The fetcher is having trouble interpreting links that contain the special
  character ‘ñ’.

I can add this transformation to the basic URL normalizer if somebody is
willing to commit it.
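
For illustration only, the substitution discussed above could look roughly like the sketch below. This is not Nutch code and not the proposed patch: the function name `encode_non_ascii` and the `example.com` URL are made up for the example, and it simply assumes every non-ASCII character should be percent-encoded while ASCII characters (including existing `%XX` escapes) are left alone. It shows where the two forms come from: ‘%C3%B1’ is ‘ñ’ encoded as UTF-8, ‘%F1’ is the same character encoded as ISO-8859-1.

```python
from urllib.parse import quote


def encode_non_ascii(url: str, charset: str = "utf-8") -> str:
    """Percent-encode each non-ASCII character of url in the given charset.

    Hypothetical helper for illustration; ASCII characters are passed
    through unchanged, so existing escapes such as %20 are not re-encoded.
    """
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="", encoding=charset)
        for ch in url
    )


if __name__ == "__main__":
    url = "http://example.com/español"  # made-up URL
    print(encode_non_ascii(url))                # UTF-8 form of the escape
    print(encode_non_ascii(url, "iso-8859-1"))  # Latin-1 form of the escape
```

Which of the two forms a given site expects depends on the charset the server uses to decode its URLs, so a normalizer would probably need that to be configurable.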
