Re: Extraction of related links

2020-02-12 Thread Karl Wright
This is not functionality that ManifoldCF supports out of the box.  The
extracted links are used for crawling, not as metadata.

I don't see a general use-case for this either, so I think you're on your
own modifying the web connector code to do what you want.  The
RepositoryDocument structure has arbitrary multi-valued fields; just put
what you want into one such field and you should see it in Elasticsearch.
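
A minimal sketch of that idea, under stated assumptions: the helper class
RelatedLinks, its method collectSubLinks, and the field name "related_links"
are hypothetical and not part of ManifoldCF. It assumes the connector has
already produced absolute URLs (with a scheme) for the links it discovered,
and it keeps only those on the same host and under the parent document's
path. The call to RepositoryDocument.addField(String, String[]) appears only
in a comment and should be checked against the 2.12 source before relying on
it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of the ManifoldCF code base. */
final class RelatedLinks {
    private RelatedLinks() {}

    /**
     * Returns the discovered links that share the parent's host and whose
     * path lies strictly below the parent's path, without duplicates and
     * in the order they were first seen.
     */
    public static String[] collectSubLinks(String parentUrl, List<String> discoveredLinks) {
        URI parent = URI.create(parentUrl);
        String parentHost = parent.getHost();
        String prefix = asDirectoryPath(parent.getPath());
        Set<String> kept = new LinkedHashSet<>();
        for (String link : discoveredLinks) {
            URI candidate;
            try {
                candidate = new URI(link.trim());
            } catch (URISyntaxException e) {
                continue; // skip links that are not valid URIs
            }
            String host = candidate.getHost();
            String path = candidate.getPath();
            if (parentHost == null || host == null || path == null) {
                continue; // e.g. mailto: links have neither host nor path
            }
            if (!host.equalsIgnoreCase(parentHost)) {
                continue; // different site
            }
            // Keep the link only if it is below the parent, not the parent itself.
            if (path.startsWith(prefix) && path.length() > prefix.length()) {
                kept.add(candidate.toString());
            }
        }
        return kept.toArray(new String[0]);
    }

    /** Treats an empty path as "/" and makes sure the prefix ends with "/". */
    private static String asDirectoryPath(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        return path.endsWith("/") ? path : path + "/";
    }

    // Assumed usage inside a modified web connector, where rd is the
    // RepositoryDocument being built for documentIdentifier:
    //   rd.addField("related_links", collectSubLinks(documentIdentifier, links));
}
```

For the example in the question, calling collectSubLinks with
"http://www.xyz.com" and a list containing http://www.xyz.com/abc and
http://www.xyz.com/pqr would keep both links, while links to other hosts
would be dropped.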

Karl


On Thu, Feb 13, 2020 at 1:57 AM ritika jain wrote:

> Hi All,
>
> I am using ManifoldCF 2.12 with the Web connector as the repository and
> Elasticsearch as the output. I now have a requirement to save all related
> sub-links of a particular document identifier at once. For example, for
> the document ID www.xyz.com, I would like to extract all related sub-links
> (www.xyz.com/abc, www.xyz.com/pqr, etc.), store them in a variable, and
> then pass them to Elasticsearch.
>
> I went through the Web repository connector code and thought that the
> function extractLinks
> (protected boolean extractLinks(String documentIdentifier,
> IProcessActivity activities, DocumentURLFilter filter)) could do this.
> Can the existing functionality of ManifoldCF handle this extraction, or do
> we have to customize it? Any help would be appreciated.
>
>
> Thanks
> Ritika
>


Extraction of related links

2020-02-12 Thread ritika jain
Hi All,

I am using ManifoldCF 2.12 with the Web connector as the repository and
Elasticsearch as the output. I now have a requirement to save all related
sub-links of a particular document identifier at once. For example, for
the document ID www.xyz.com, I would like to extract all related sub-links
(www.xyz.com/abc, www.xyz.com/pqr, etc.), store them in a variable, and
then pass them to Elasticsearch.

I went through the Web repository connector code and thought that the
function extractLinks
(protected boolean extractLinks(String documentIdentifier,
IProcessActivity activities, DocumentURLFilter filter)) could do this.
Can the existing functionality of ManifoldCF handle this extraction, or do
we have to customize it? Any help would be appreciated.


Thanks
Ritika