Hi all, I just found a weird error. It looks like a JDK bug, but I'm not sure. Whenever I rewrite a URL (URL-A) that contains a number into another URL (URL-B), I get this error: "IndexOutOfBoundsException: No group 1"
In my regex-normalize.xml, I have:

  <regex>
    <pattern>http://google1.com/.+</pattern>
    <substitution>http://google.com/$1</substitution>
  </regex>

and trying:

  echo 'http://google2.com/whatever' | bin/nutch org.apache.nutch.net.URLNormalizerChecker

gives:

  Checking combination of all URLNormalizers available
  Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 1
      at java.util.regex.Matcher.start(Matcher.java:374)
      at java.util.regex.Matcher.appendReplacement(Matcher.java:830)
      at java.util.regex.Matcher.replaceAll(Matcher.java:905)
      at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.regexNormalize(RegexURLNormalizer.java:181)
      at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.normalize(RegexURLNormalizer.java:188)
      at org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:286)
      at org.apache.nutch.net.URLNormalizerChecker.checkAll(URLNormalizerChecker.java:83)
      at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:110)

Have you experienced this before?

Remi
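
Note: the stack trace ends in Matcher.replaceAll, so the same behaviour can probably be checked with plain java.util.regex, outside Nutch. Below is a minimal sketch, not the Nutch code itself. The class NoGroupDemo and the helper normalize are hypothetical names made up for illustration, and the input URL is chosen so that it matches the pattern. It suggests the substitution $1 only works when the pattern contains a capturing group, i.e. (.+) rather than .+, which would point to the pattern rather than to the JDK.

```java
import java.util.regex.Pattern;

public class NoGroupDemo {
    // Hypothetical helper: applies one pattern/substitution pair with
    // Matcher.replaceAll, the JDK call that appears in the stack trace.
    static String normalize(String regex, String substitution, String url) {
        return Pattern.compile(regex).matcher(url).replaceAll(substitution);
    }

    public static void main(String[] args) {
        // Assumption: a URL that the pattern actually matches.
        String url = "http://google1.com/whatever";

        // With parentheses, $1 refers to the text captured by (.+).
        System.out.println(
            normalize("http://google1.com/(.+)", "http://google.com/$1", url));

        // Without parentheses the pattern has no group 1, so the $1
        // reference in the substitution cannot be resolved.
        try {
            normalize("http://google1.com/.+", "http://google.com/$1", url);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Substitution failed: " + e);
        }
    }
}
```

If this holds, changing the pattern in regex-normalize.xml to http://google1.com/(.+) should let the substitution resolve $1.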