Please help me out with this. It's urgent.

On 5 Mar 2018 15:41, "Yash Thenuan Thenuan" <rit2014...@iiita.ac.in> wrote:
> How can I achieve this in Nutch 1.x?
>
> On 1 Mar 2018 22:30, "Sebastian Nagel" <wastl.na...@googlemail.com> wrote:
>
>> Hi,
>>
>> Yes, that's possible, but only for Nutch 1.x:
>> a ParseResult [1] may contain multiple ParseData objects,
>> each accessible by a separate URL.
>> This feature is not available for 2.x [2].
>>
>> It's used by the feed parser plugin to add a single
>> entry for every feed item. AFAIK, that's not supported
>> out of the box for sections of a page (e.g., split by
>> anchors or h1/h2/h3). You would need to write a
>> parse-filter plugin to achieve this.
>>
>> I once used it to index parts of a page identified
>> by XPath expressions.
>>
>> Best,
>> Sebastian
>>
>> [1] https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/parse/ParseResult.html
>> [2] https://nutch.apache.org/apidocs/apidocs-2.3.1/org/apache/nutch/parse/Parse.html
>>
>> On 03/01/2018 08:02 AM, Yash Thenuan Thenuan wrote:
>> > Hi there,
>> > For example, we have a URL
>> > https://wiki.apache.org/nutch/NutchTutorial#Table_of_Contents
>> > where #Table_of_Contents is an internal link.
>> > I want to separate the contents of the page on the basis of internal links.
>> > Is this possible in Nutch?
>> > I want to index the contents of each internal link separately.
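For illustration only: a minimal, stdlib-only Java sketch of the splitting step such a parse-filter plugin would need. It assumes each section starts at an h1/h2/h3 heading carrying an id attribute (the target of a link like #Table_of_Contents) and keys every section by "pageUrl#id", mirroring how a Nutch 1.x ParseResult holds one entry per URL. The class AnchorSectionSplitter and its split method are hypothetical and not part of Nutch; inside a real HtmlParseFilter, each map entry would be turned into its own ParseText/ParseData pair in the ParseResult.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Nutch API): splits an HTML page into sections
 * keyed by "pageUrl#anchorId", one section per h1/h2/h3 heading with an id.
 * Regex-based for brevity only.
 */
public class AnchorSectionSplitter {

    // Matches an opening <h1>, <h2> or <h3> tag that has an id="..." attribute.
    private static final Pattern HEADING_WITH_ID = Pattern.compile(
            "<h[1-3]\\b[^>]*\\sid\\s*=\\s*\"([^\"]+)\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Matches any tag; used to reduce a fragment to plain text.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static Map<String, String> split(String pageUrl, String html) {
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = HEADING_WITH_ID.matcher(html);
        String currentKey = pageUrl; // text before the first heading stays under the page URL
        int sectionStart = 0;
        while (m.find()) {
            addSection(sections, currentKey, html.substring(sectionStart, m.start()));
            currentKey = pageUrl + "#" + m.group(1);
            sectionStart = m.start(); // the heading itself opens the new section
        }
        addSection(sections, currentKey, html.substring(sectionStart));
        return sections;
    }

    // Strips tags, collapses whitespace and stores non-empty sections only.
    private static void addSection(Map<String, String> sections, String key, String fragment) {
        String text = TAG.matcher(fragment).replaceAll(" ").replaceAll("\\s+", " ").trim();
        if (!text.isEmpty()) {
            sections.put(key, text);
        }
    }

    public static void main(String[] args) {
        String url = "https://wiki.apache.org/nutch/NutchTutorial";
        String html = "<html><body><p>Intro</p>"
                + "<h2 id=\"Table_of_Contents\">Table of Contents</h2><p>Steps</p>"
                + "<h2 id=\"Install\">Install</h2><p>Download Nutch</p>"
                + "</body></html>";
        split(url, html).forEach((key, text) -> System.out.println(key + " -> " + text));
    }
}
```

Running main prints three entries: the page URL with "Intro", then the #Table_of_Contents and #Install section URLs with their text. Regexes keep the sketch short but break on nested or malformed markup; in an actual plugin it is probably more robust to walk the DOM fragment the parse filter receives, or to select sections with XPath as Sebastian describes.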