Because ManifoldCF is not just a crawler but a synchronizer, a job keeps
a record of the documents it has indexed.  Deleting the job therefore
also requires deleting those documents from the target index.  That is
part of the basic model.

So if you tear down your target output instance and then try to tear down
the job, it won't work.  ManifoldCF won't just throw away the memory of
those documents and act as if nothing happened.
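
To make the model concrete, here is a toy sketch in Python.  It is not
ManifoldCF code and uses none of its APIs; the names ToyOutput, ToyJob,
and OutputUnavailable are invented for illustration.  It only shows why
a job that remembers its indexed documents cannot finish deleting while
the target is gone, and why a reset (forgetting the documents) unblocks
it.

```python
class OutputUnavailable(Exception):
    """Raised when the (hypothetical) target index cannot be reached."""


class ToyOutput:
    """Stand-in for a target index such as a Solr collection."""

    def __init__(self):
        self.available = True
        self.docs = set()

    def add(self, doc_id):
        if not self.available:
            raise OutputUnavailable(doc_id)
        self.docs.add(doc_id)

    def remove(self, doc_id):
        if not self.available:
            raise OutputUnavailable(doc_id)
        self.docs.discard(doc_id)


class ToyJob:
    """Stand-in for a job that synchronizes documents into an output."""

    def __init__(self, output):
        self.output = output
        self.indexed = set()  # the job's memory of what it has indexed
        self.state = "active"

    def crawl(self, doc_ids):
        for doc_id in doc_ids:
            self.output.add(doc_id)
            self.indexed.add(doc_id)

    def delete(self):
        # Deleting the job means removing every remembered document first.
        self.state = "deleting"
        for doc_id in sorted(self.indexed):
            self.output.remove(doc_id)  # raises if the target is gone
            self.indexed.discard(doc_id)
        self.state = "deleted"

    def reset(self):
        # Forget the indexed documents without touching the output.
        self.indexed.clear()


output = ToyOutput()
job = ToyJob(output)
job.crawl(["a", "b"])

output.available = False  # the target instance is torn down
try:
    job.delete()
except OutputUnavailable:
    pass
stuck_state = job.state   # "deleting": the job cannot finish cleanup

job.reset()               # forget the documents, as in the reset steps
job.delete()
final_state = job.state   # "deleted"
```

In this sketch the reset is what lets the delete complete, because
there is nothing left to remove from the missing target.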

So if you're only using ManifoldCF as a crawler, the procedure you
describe is about as good as it gets.

You can get into similar trouble if, for example, you reinstall ManifoldCF
but forget to include a connector class that was there before.  Carnage
ensues.

Karl


On Mon, Jun 13, 2022 at 1:39 AM Ricardo Ruiz <ricrui3s...@gmail.com> wrote:

> Hi all
> My team uses mcf to crawl documents and index into solr instances, but for
> reasons beyond our control, sometimes the instances or collections are
> deleted.
> When we try to delete a job and the solr instance or collection doesn't
> exist anymore, the job reaches the "End notification" status and gets stuck
> there. No other job can be aborted or deleted until the initial error is
> fixed.
>
> We are able to clear the error with the following steps:
>
> 1.  Reconfigure the output connector to an existing Solr instance and
> collection
> 2.  Reset the output connection, so it forgets any indexed documents.
> 3.  Reset the job, so it forgets any indexed documents.
> 4.  Restart the ManifoldCF server.
>
> Is there any other way we can solve this error? Is there any way we can
> force delete the job if we don't care about the job's documents anymore?
>
> Thanks in advance.
> Ricardo.
>
