Hi Ritika,

There is no separate deletion process.  Deletion takes place whenever a job
is run in a mode where deletion is possible (there are some modes where it
is not).  How it takes place depends on the kind of repository connector,
i.e. which model the connector declares itself to use.

For the most common kinds of connectors, the job sequence involves scanning
all documents described by the job.  If a document is gone, it is deleted
right away.  If a document simply wasn't reached during the crawl, it is no
longer referenced, and all such documents are removed at the end of the job.
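To make the two cases concrete, here is a minimal Python sketch of that
behaviour.  It is only an illustration under stated assumptions, not
ManifoldCF's actual (Java) implementation: `run_job`, `fetch` and
`extract_links` are hypothetical names, the index is modelled as a plain
dict, `fetch` is assumed to return None for a gone document (e.g. a 404),
and job modes where deletion is not possible are not modelled.

```python
from collections import deque
from typing import Callable, Optional


def run_job(
    index: dict[str, str],
    seeds: list[str],
    fetch: Callable[[str], Optional[str]],
    extract_links: Callable[[str], list[str]],
) -> list[str]:
    """Hypothetical crawl: updates `index` in place, returns deleted URLs."""
    deleted: list[str] = []
    seen: set[str] = set()
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        content = fetch(url)  # assumption: None means the document is gone
        if content is None:
            # Case 1: the document is gone, so delete it right away.
            if index.pop(url, None) is not None:
                deleted.append(url)
            continue
        index[url] = content
        queue.extend(extract_links(content))
    # Case 2: anything indexed earlier but not reached on this crawl is no
    # longer referenced, so it is removed at the end of the job.
    for url in [u for u in index if u not in seen]:
        del index[url]
        deleted.append(url)
    return deleted
```

For example, with a previously indexed page "c" that now returns 404 and a
page "d" that nothing links to any more, both end up deleted: "c" during
the crawl and "d" in the end-of-job sweep.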

Karl


On Tue, Mar 30, 2021 at 9:03 AM ritika jain <ritikajain5...@gmail.com>
wrote:

> Hi All,
>
> I want to understand the ManifoldCF deletion process, i.e. in which cases
> the deletion process (as seen in the Simple History) executes.
> One case, as far as I know, is whenever the seed URL of a particular job
> is changed.
> What are all the cases in which the deletion process runs?
>
> My requirement is to research whether ManifoldCF can handle the following
> scenario: a URL exists and has been ingested into the Elasticsearch index
> (say, www.abc.com).
>
> The next time the job is run, the URL www.abc.com no longer exists and
> returns a 404.  Is ManifoldCF capable of handling this 404 URL (by default)
> and deleting it from the database and from the Elasticsearch index into
> which it was already ingested?
>
> Any help would be appreciated.
> Thanks
> Ritika
>
