Re: Regarding Internal Links

Yash Thenuan Thenuan Mon, 05 Mar 2018 04:00:02 -0800

Please help me out regarding this.
It's urgent.

On 5 Mar 2018 15:41, "Yash Thenuan Thenuan" <rit2014...@iiita.ac.in> wrote:


> How can I achieve this in Nutch 1.x?
>
> On 1 Mar 2018 22:30, "Sebastian Nagel" <wastl.na...@googlemail.com> wrote:
>
>> Hi,
>>
>> Yes, that's possible but only for Nutch 1.x:
>> a ParseResult [1] may contain multiple ParseData objects
>> each accessible by a separate URL.
>> This feature is not available for 2.x [2].
>>
>> It's used by the feed parser plugin to add a single
>> entry for every feed item.  Afaik, that's not supported
>> out of the box for sections of a page (e.g., split by
>> anchors or h1/h2/h3). You would need to write a
>> parse-filter plugin to achieve this.
>>
>> I've once used it to index parts of a page identified
>> by XPath expressions.
>>
>> Best,
>> Sebastian
>>
>> [1] https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/parse/ParseResult.html
>> [2] https://nutch.apache.org/apidocs/apidocs-2.3.1/org/apache/nutch/parse/Parse.html
>>
>>
>> On 03/01/2018 08:02 AM, Yash Thenuan Thenuan wrote:
>> > Hi there,
>> > For example, we have a URL
>> > https://wiki.apache.org/nutch/NutchTutorial#Table_of_Contents
>> > where #Table_of_Contents is an internal link.
>> > I want to split the contents of the page on the basis of internal
>> > links.
>> > Is this possible in Nutch?
>> > I want to index the contents of each internal link separately.
>> >
>>
>>
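
To illustrate the approach Sebastian describes (one entry per section of a
page, each reachable under its own URL), here is a minimal sketch of the
splitting step only. It is not taken from the thread and does not use the
Nutch API: the class name SectionSplitter, the method split, and the
example.org URL are hypothetical, and the regex-based HTML handling is a
simplification. In a real Nutch 1.x parse-filter plugin (for example, an
HtmlParseFilter implementation), each resulting URL/text pair would
presumably be added to the ParseResult as its own entry; that wiring is
omitted here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionSplitter {

    // Matches <h1>/<h2>/<h3> tags that carry an id attribute,
    // e.g. <h2 id="Table_of_Contents">. Assumption: anchors are heading ids.
    private static final Pattern HEADING =
        Pattern.compile("<h[1-3][^>]*\\bid=\"([^\"]+)\"[^>]*>", Pattern.CASE_INSENSITIVE);

    // Crude tag stripper, adequate for a sketch only (not a real HTML parser).
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /**
     * Returns a map from "baseUrl#anchor" to the plain text of that section.
     * Text that appears before the first anchored heading is dropped.
     */
    public static Map<String, String> split(String baseUrl, String html) {
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = HEADING.matcher(html);
        String currentAnchor = null;
        int sectionStart = 0;
        while (m.find()) {
            if (currentAnchor != null) {
                // Close the previous section at the start of this heading.
                sections.put(baseUrl + "#" + currentAnchor,
                             toText(html.substring(sectionStart, m.start())));
            }
            currentAnchor = m.group(1);
            sectionStart = m.start();
        }
        if (currentAnchor != null) {
            // The last section runs to the end of the document.
            sections.put(baseUrl + "#" + currentAnchor,
                         toText(html.substring(sectionStart)));
        }
        return sections;
    }

    // Replaces tags with spaces and collapses runs of whitespace.
    private static String toText(String fragment) {
        return TAG.matcher(fragment).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<h1 id=\"Intro\">Intro</h1><p>Welcome.</p>"
                    + "<h2 id=\"Setup\">Setup</h2><p>Install Java.</p>";
        split("https://example.org/page", html)
            .forEach((url, text) -> System.out.println(url + " -> " + text));
    }
}
```

Keying each section by the page URL plus its fragment mirrors the idea of
making every section "accessible by a separate URL"; whether the rest of the
crawl keeps or strips URL fragments during normalization would need to be
checked against the actual Nutch configuration.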
