Hi, everyone!

This is another problem I run into from time to time. We are using
ManifoldCF 2.22.1 with multiple nodes in production. The MCF job pipeline
is created through API calls from our service: we create jobs,
repositories and output repositories. The crawler extracts documents,
which are then pushed to Solr. The pipeline itself works OK.

The problem is with deleting a job. Sometimes the job gets stuck in the
`Cleaning up` status (in the DB it has status `e`, which corresponds to
`STATUS_DELETING`). This time I used the MCF Web Admin to delete the job
(I pressed the delete button on the job list page).

I have checked the sources and debugged a bit. The method
`deleteJobsReadyForDelete()`
(`org.apache.manifoldcf.crawler.jobs.JobManager.deleteJobsReadyForDelete()`)
works OK. It is unable to delete the job because it still finds some
documents in the document queue table. The following SQL is executed
within this method:

```sql
select id from jobqueue where jobid = '1658215015582' and (status = 'E' or
status = 'D') limit 1;
```

where status `E` stands for `STATUS_ELIGIBLEFORDELETE` and status `D`
stands for `STATUS_BEINGDELETED`. If at least one such document is found
in the queue, the method does nothing. At the moment I have a lot of
documents in `jobqueue` with these statuses (actually all of them have
status `D`).
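
To make the condition concrete, here is a minimal sketch of the check as I
understand it from the query above. This is not the actual ManifoldCF code:
the class and method names are hypothetical, and the queue is modelled as a
plain list of status characters instead of the `jobqueue` table.

```java
import java.util.List;

// Simplified model of the delete-readiness check; NOT the real MCF code.
// Class and method names here are made up for illustration.
public class DeleteReadinessSketch {
    // Single-character status codes as stored in the jobqueue table.
    static final char STATUS_ELIGIBLEFORDELETE = 'E';
    static final char STATUS_BEINGDELETED = 'D';

    // Mirrors the SQL above: true if at least one row is still 'E' or 'D'.
    static boolean hasPendingDeletes(List<Character> queueStatuses) {
        for (char status : queueStatuses) {
            if (status == STATUS_ELIGIBLEFORDELETE || status == STATUS_BEINGDELETED) {
                return true;
            }
        }
        return false;
    }

    // The job itself can be removed only when nothing is pending.
    static boolean jobReadyForDelete(List<Character> queueStatuses) {
        return !hasPendingDeletes(queueStatuses);
    }

    public static void main(String[] args) {
        // My current situation: every remaining document has status 'D'.
        System.out.println(jobReadyForDelete(List.of('D', 'D', 'D'))); // prints false
        // An empty queue would let the job be deleted.
        System.out.println(jobReadyForDelete(List.of())); // prints true
    }
}
```

So as long as the rows stay in `D`, the job can never leave `Cleaning up`,
which is why I am looking for the code that removes those rows.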

I see that the `Documents delete stuffer thread` is running, and it sets
the status `STATUS_BEINGDELETED` on the documents via the
`getNextDeletableDocuments()` method
(`org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(String,
int, long)`). But I can't find any logic that actually deletes the
documents. I've searched through the sources, but `STATUS_BEINGDELETED` is
mentioned mostly in `NOT EXISTS ...` queries. Searching backwards from
`JobQueue` (`org.apache.manifoldcf.crawler.jobs.JobQueue`) also didn't
give me any results. I would appreciate it if someone could point me to
where to look, so I can debug and check what conditions are preventing the
documents from being removed.

Thank you!

With respect,
Artem Abeleshev
