Hello Users,

Could anybody please respond to this? It would be highly appreciated.

On Fri, Apr 3, 2020 at 2:28 PM ritika jain <ritikajain5...@gmail.com> wrote:

> Hi All,
>
> I am using ManifoldCF 2.14 to crawl data from a website, with the Web
> repository connector and Elasticsearch as the output connector.
> I would like to understand the crawling framework/hierarchy used by the
> web crawler.
> As far as I understand, URLs are crawled in a tree-structured manner.
>
> I want to know whether ManifoldCF currently supports any functionality
> for storing the parent URL of a document.
> For example, say the seed URL is www.example.com, and the 80th document
> in the document queue has the identifier
> www.example.com/education/univeristy/234.html.
>
> Is there any way ManifoldCF stores the back-traced URLs, that is, the
> chain of links that was followed to reach the 80th document?
> For example, storing the 79th, 78th, and 77th documents in the chain
> that led from the seed document to the 80th document.
>
> Is this crawling hierarchy (even if only the level) stored anywhere in
> ManifoldCF yet? If yes, is this framework code shipped in the form of a
> jar? If not in a jar, any clue as to which Java file implements this
> logic would be really helpful.
>
> Any kind of clue or help will be really appreciated.
>
> Many Thanks
> Ritika