[ https://issues.apache.org/jira/browse/NUTCH-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2746.
------------------------------------
    Resolution: Fixed

Merged/committed. Note: by default the old behavior is retained, i.e. IDNs are not normalized and trailing dots in host names are not stripped.

> Basic URL normalizer to normalize Unicode domain names
> ------------------------------------------------------
>
>                 Key: NUTCH-2746
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2746
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin, urlnormalizer
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.17
>
>
> The BasicURLNormalizer (plugin urlnormalizer-basic) lacks the possibility to
> normalize IDNs (Unicode host/domain names).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
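
For readers unfamiliar with the two normalizations mentioned in the resolution note, the following is a minimal sketch of what they mean for a host name. It is not the Nutch implementation (the plugin is written in Java); the function `normalize_host` and its parameters `idn_to_ascii` and `strip_trailing_dot` are hypothetical names chosen for illustration, and the conversion relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules. Both options default to off, mirroring the note that the old behavior is kept unless configured otherwise.

```python
def normalize_host(host, idn_to_ascii=False, strip_trailing_dot=False):
    """Illustrative host normalization (hypothetical helper, not Nutch code).

    Both options are disabled by default, so the host is returned unchanged
    unless the caller opts in.
    """
    if strip_trailing_dot and host.endswith("."):
        # "example.com." and "example.com" name the same host;
        # drop the single trailing root dot.
        host = host[:-1]
    if idn_to_ascii:
        # Convert a Unicode (IDN) host name to its ASCII/Punycode form
        # using the standard-library "idna" codec (IDNA 2003).
        host = host.encode("idna").decode("ascii")
    return host


if __name__ == "__main__":
    # Default: nothing is changed.
    print(normalize_host("bücher.example."))
    # Opt in to both normalizations.
    print(normalize_host("bücher.example.",
                         idn_to_ascii=True,
                         strip_trailing_dot=True))
```

With both options enabled, the Unicode and Punycode spellings of a host, with or without the trailing dot, collapse to one form, which is what lets a crawler treat them as the same host.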