I've run into a small issue with my Nutch deployment. Some of the sites I crawl use characters such as æ, ø, and å in their URLs, and these never seem to be parsed properly. Is there any way around this? I tried adding the Unicode escapes (as '\u00e5' and so on) to regex-normalize.xml, but I suspect the characters are already misparsed when the pages are fetched, so they are never actually seen as, e.g., U+00E5. Any suggestions would be much appreciated.
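
To make the symptom concrete, here is a small Python sketch (not Nutch code) of what I think may be going on. It assumes the sites serve their URLs as UTF-8 and that somewhere along the way the bytes are read as Latin-1; both are guesses on my part. The helper names `percent_encode_path` and `mis_decoded` are made up for this illustration.

```python
from urllib.parse import quote


def percent_encode_path(path: str) -> str:
    """Percent-encode non-ASCII characters in a URL path as UTF-8 bytes."""
    return quote(path, safe="/", encoding="utf-8")


def mis_decoded(text: str) -> str:
    """Simulate UTF-8 bytes being wrongly read as Latin-1 (assumed failure mode)."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    for ch in "æøå":
        # Code point, percent-encoded UTF-8 form, and the mis-decoded form
        print("U+%04X" % ord(ch), percent_encode_path(ch), repr(mis_decoded(ch)))
```

If this guess is right, a rule matching '\u00e5' would never fire, because by the time the normalizer sees the URL the single character å has become two different characters (or a percent-encoded pair such as %C3%A5).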

