Re: Anchor text of current URL

chethan Mon, 08 Oct 2012 08:51:13 -0700

It turns out that db.ignore.internal.links and
db.ignore.external.links were both set to true, so the linkdb
was not populated at all. That's why inlinks was null in the
indexing filter. Thanks anyway.
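
For anyone who hits the same problem, here is a minimal sketch of the kind of override involved. It assumes the properties are overridden in conf/nutch-site.xml and that same-host links are the ones whose anchor text is wanted. Whether to relax the internal setting, the external setting, or both depends on the crawl, so treat the values as an illustration rather than a recommendation.

```xml
<!-- Assumed location: conf/nutch-site.xml (overrides nutch-default.xml). -->
<configuration>
  <property>
    <name>db.ignore.internal.links</name>
    <!-- false: keep links between pages on the same host when the linkdb is built -->
    <value>false</value>
  </property>
  <property>
    <name>db.ignore.external.links</name>
    <!-- left at true here as an assumption; links to other hosts stay ignored -->
    <value>true</value>
  </property>
</configuration>
```

After a change like this, the linkdb has to be rebuilt (invertlinks) and the index regenerated before the indexing filter sees any inlinks.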


- Chethan

On Mon, Oct 8, 2012 at 3:27 PM, chethan <chethan.p...@gmail.com> wrote:

> Well, it was a crawl command first and then a solrindex command. The crawl
> command runs invertlinks as well, right?
>
> Thanks
> Chethan
>
>
> On Mon, Oct 8, 2012 at 3:16 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
>
>> Hi - did you run the invertlinks program over your segments before
>> indexing?
>>
>> -----Original message-----
>> > From: chethan <chethan.p...@gmail.com>
>> > Sent: Mon 08-Oct-2012 04:28
>> > To: user@nutch.apache.org
>> > Subject: Anchor text of current URL
>> >
>> > Hi,
>> >
>> > In an indexing filter, is there a way to find the anchor text of the
>> > link from which the current URL/document originated? I tried inlinks,
>> > but it seems to be null.
>> >
>> > public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>> >     CrawlDatum datum, Inlinks inlinks) throws IndexingException {
>> >
>> >     // Need the anchor text of the link that led to the current
>> >     // document at this point
>> >
>> >     return doc;
>> > }
>> >
>> > Thanks
>> > Chethan
>> >
>>
>
>
