Hi,

We are crawling a site using Nutch 1.6 and indexing into Solr.

However, we need to rewrite the URLs that are indexed.

For instance, Nutch crawls the page http://www.example.com/article=xxx, but
when moving the data to the index we would like to use the URL

http://www.example.com/kb#article=xxx

instead, so that results fetched from Solr link to
http://www.example.com/kb#article=xxx rather than
http://www.example.com/article=xxx.
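
To make the intended mapping concrete, here is a minimal sketch of the string rewrite in plain Java. The class name UrlRewriteSketch and the method rewrite are hypothetical names used only for illustration; they are not part of Nutch. The sketch assumes that only paths starting with /article= should be moved behind /kb#, and it leaves open where the call would live (a URL normalizer, an indexing step, or elsewhere).

```java
import java.net.URI;

// Hypothetical helper, not part of Nutch: illustrates the desired mapping only.
public class UrlRewriteSketch {

    // Assumption: only paths beginning with "/article=" are rewritten.
    private static final String ARTICLE_PREFIX = "/article=";

    /**
     * Maps http://host/article=xxx to http://host/kb#article=xxx.
     * Any other URL is returned unchanged.
     */
    public static String rewrite(String urlString) {
        URI uri = URI.create(urlString);
        String path = uri.getRawPath();
        if (path == null || !path.startsWith(ARTICLE_PREFIX)) {
            return urlString;
        }
        // Drop the leading "/" so the path becomes the fragment after "/kb#".
        String tail = path.substring(1);
        if (uri.getRawQuery() != null) {
            // Keep a query string, if any, as part of the fragment.
            tail = tail + "?" + uri.getRawQuery();
        }
        return uri.getScheme() + "://" + uri.getRawAuthority() + "/kb#" + tail;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("http://www.example.com/article=xxx"));
        System.out.println(rewrite("http://www.example.com/other/page"));
    }
}
```

Keeping the mapping in a small pure function like this would let it be tested on its own, independent of which extension point ends up calling it.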

Is it possible to do this by creating a plugin that implements URLNormalizer,
e.g.

http://nutch.apache.org/apidocs-1.4/org/apache/nutch/net/URLNormalizer.html

Or is it better to add a new indexed field and use that instead?

Best Regards
Niels
