I need both ingestion and deletion.
Karl

On Tue, Oct 29, 2019 at 8:09 AM Priya Arora <pr...@smartshore.nl> wrote:

> History is shown as below as it does not indicates any error.
> [image: 12.JPG]
>
> Thanks
> Priya
>
> On Tue, Oct 29, 2019 at 5:02 PM Karl Wright <daddy...@gmail.com> wrote:
>
>> What does the history say about these documents?
>> Karl
>>
>> On Tue, Oct 29, 2019 at 6:53 AM Priya Arora <pr...@smartshore.nl> wrote:
>>
>>>
>>> it may be that (a) they weren't found, or (b) that the document
>>> specification in the job changed and they are no longer included in the job.
>>>
>>> URL's that were deleted are valid URL's (as that does not result in 404
>>> or page not found error), and it is not being mentioned in Exclusion tab of
>>> job configuration.
>>> And the URL's were getting indexed earlier and except for index name in
>>> Elasticsearch nothing is changed in Job specification and in other
>>> connectors.
>>>
>>> Thanks
>>> Priya
>>>
>>> On Tue, Oct 29, 2019 at 3:40 PM Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> ManifoldCF is an incremental crawler, which means that on every
>>>> (non-continuous) job run it sees which documents it can find and removes
>>>> the ones it can't. The history for the documents being deleted should tell
>>>> you why they are being deleted -- it may be that (a) they weren't found, or
>>>> (b) that the document specification in the job changed and they are no
>>>> longer included in the job.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Oct 29, 2019 at 5:30 AM Priya Arora <pr...@smartshore.nl>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have a query regarding ManifoldCF Job process. I have a job to crawl
>>>>> intranet site
>>>>> Repository Type:- Web
>>>>> Output Connector Type:- Elastic search.
>>>>>
>>>>> Job have to crawl around 4-5 lakhs of total records. I have discarded
>>>>> the previous index and created a new index (in Elasticsearch) with proper
>>>>> mappings and settings and started the job again after cleaning Database
>>>>> even (Database used a PostgreSQL).
>>>>> But while the job continues its ingests the records properly but just
>>>>> before finishing (some times in between also), it initiates the process of
>>>>> Deletions and also it does not index the deleted documents again in index.
>>>>>
>>>>> Can you please something if I am doing anything wrong? or is this a
>>>>> process of manifoldcf if yes , why its not getting ingested again.
>>>>>
>>>>> Thanks and regards
>>>>> Priya
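
The incremental behaviour described in the quoted thread (each job run keeps the documents it can still find and removes the ones it cannot) can be pictured roughly as a set difference. This is only an illustrative sketch, not ManifoldCF's actual implementation; the function name `reconcile` and the example URLs are hypothetical.

```python
from typing import Set, Tuple


def reconcile(previously_indexed: Set[str], found_this_run: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_ingest, to_delete) for one incremental job run.

    Hypothetical helper: a real crawler also tracks document versions
    and would re-send only new or changed documents.
    """
    # Documents indexed earlier that this run could not find are removed.
    to_delete = previously_indexed - found_this_run
    # Everything found in this run is a candidate for (re-)ingestion.
    to_ingest = set(found_this_run)
    return to_ingest, to_delete


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    before = {"http://intranet.example/a", "http://intranet.example/b"}
    now = {"http://intranet.example/b", "http://intranet.example/c"}
    ingest, delete = reconcile(before, now)
    print("ingest:", sorted(ingest))
    print("delete:", sorted(delete))
```

Under this picture, a URL lands in the delete set whenever a run fails to reach it or the job specification no longer includes it, which is why the per-document history is the place to look.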