If you are using a Solr version in the 4.x series, you could update the fields [1] once the data is indexed. This is not the Nutch way of doing it, but it is something that came to mind and can work right away.
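A rough sketch of what such an atomic update could look like, in Python. Several things here are assumptions on my part and not taken from your setup: that the document's unique key field is called "id" and holds the crawled URL, that the field to overwrite is called "url", that Solr is reachable at http://localhost:8983/solr/update, and that every article path looks like /article=xxx. The helper names (rewrite_url, build_atomic_update, post_update) are made up for illustration. Note also that atomic updates generally need the update log enabled and the other fields of the document to be stored, otherwise their values are lost when Solr rebuilds the document.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Assumed Solr update endpoint; adjust host, port and core to your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def rewrite_url(url):
    """Turn http://host/article=xxx into http://host/kb#article=xxx.

    URLs whose path does not start with /article= are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.path.startswith("/article="):
        return url
    # Move the old path (without its leading slash) into the fragment.
    fragment = parts.path[1:]
    return urlunsplit((parts.scheme, parts.netloc, "/kb", parts.query, fragment))


def build_atomic_update(doc_id, new_url, field="url"):
    """Build the JSON body for an atomic "set" of one field on one document.

    "id" as the unique key and "url" as the target field are assumptions.
    """
    return json.dumps([{"id": doc_id, field: {"set": new_url}}])


def post_update(payload, solr_url=SOLR_UPDATE_URL):
    """Send the JSON body to Solr (not called here; needs a running Solr)."""
    request = urllib.request.Request(
        solr_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    crawled = "http://www.example.com/article=xxx"
    print(build_atomic_update(crawled, rewrite_url(crawled)))
```

You would run this over the indexed documents after each Nutch indexing pass, calling post_update with the payload for each one. Since it only overwrites the one field, the rest of the document is left as Nutch indexed it.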
[1] http://wiki.apache.org/solr/UpdateJSON#Atomic_Updates

On Mon, Apr 22, 2013 at 9:56 AM, Niels Boldt <nielsbo...@gmail.com> wrote:

> Hi,
>
> We are crawling a site using nutch 1.6 and indexing into solr.
>
> However, we need to rewrite the urls that are indexed in the following way
>
> For instance, nutch crawls a page http://www.example.com/article=xxx but
> when moving data to the index we would like to use the url
>
> http://www.example.com/kb#article=xxx <http://www.example.com/article=xxx>
>
> Instead. So when we get data from solr it will show links to
> http://www.example.com/kb#article=xxx
> <http://www.example.com/article=xxx> instead
> of http://www.example.com/article=xxx
>
> Is that possible to do by creating a plugin that extends the UrlNormalizer,
> eg
>
> http://nutch.apache.org/apidocs-1.4/org/apache/nutch/net/URLNormalizer.html
>
> Or is it better to add a new indexed property that we use.
>
> Best Regards
> Niels
>

--
Kiran Chitturi
<http://www.linkedin.com/in/kiranchitturi>