Hi - did you run the invertlinks program over your segments before indexing? 
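
If not, that would explain the null: as far as I know the inlinks argument is filled from the linkdb, which is built with something like `bin/nutch invertlinks <linkdb> -dir <segments_dir>` (both paths are placeholders for your own crawl layout), and the linkdb then has to be passed to the indexing job. Once it is populated, the anchors can be collected with a null guard. Below is a rough, self-contained sketch of that logic only. `Link` is a hypothetical stand-in for Nutch's `Inlink` so the example runs without Nutch on the classpath, and `collectAnchors` is a made-up helper name, not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AnchorSketch {

    // Hypothetical stand-in for org.apache.nutch.crawl.Inlink (not the real class).
    static final class Link {
        private final String fromUrl;
        private final String anchor;

        Link(String fromUrl, String anchor) {
            this.fromUrl = fromUrl;
            this.anchor = anchor;
        }

        String getFromUrl() { return fromUrl; }
        String getAnchor() { return anchor; }
    }

    // Returns the distinct, non-empty anchor texts in first-seen order.
    // A null argument (no linkdb entry for the URL) yields an empty list.
    static List<String> collectAnchors(Iterable<Link> inlinks) {
        Set<String> anchors = new LinkedHashSet<>();
        if (inlinks != null) {
            for (Link link : inlinks) {
                String anchor = link.getAnchor();
                if (anchor != null && !anchor.trim().isEmpty()) {
                    anchors.add(anchor.trim());
                }
            }
        }
        return new ArrayList<>(anchors);
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
            new Link("http://a.example/", "Nutch home"),
            new Link("http://b.example/", " Nutch home "),  // duplicate after trimming
            new Link("http://c.example/", ""),              // empty anchor, skipped
            new Link("http://d.example/", "Apache Nutch"));

        System.out.println(collectAnchors(links)); // prints [Nutch home, Apache Nutch]
        System.out.println(collectAnchors(null));  // prints []
    }
}
```

Inside a real filter the same loop would run over the `Inlinks` argument, reading each entry's anchor with `getAnchor()`. Several pages can link to the same document, so expect a list of anchors rather than a single one.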
 
-----Original message-----
> From:chethan <chethan.p...@gmail.com>
> Sent: Mon 08-Oct-2012 04:28
> To: user@nutch.apache.org
> Subject: Anchor text of current URL
> 
> Hi,
> 
> In an indexing filter, is there a way to find the anchor text of the link
> through which the current URL/document was reached? I tried the inlinks,
> but that seems to be null.
> 
> public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>     CrawlDatum datum, Inlinks inlinks) throws IndexingException {
> 
>     // Need to know, at this point, the anchor text of the link
>     // that led to the current document
> 
>     return doc;
> }
> 
> Thanks
> Chethan
> 
