What does the history say about these documents?

Karl

On Tue, Oct 29, 2019 at 6:53 AM Priya Arora <pr...@smartshore.nl> wrote:
> > it may be that (a) they weren't found, or (b) that the document
> > specification in the job changed and they are no longer included in the job.
>
> The URLs that were deleted are valid URLs (they do not result in a 404 or
> page-not-found error), and they are not listed in the Exclusion tab of the
> job configuration.
> The URLs were getting indexed earlier, and except for the index name in
> Elasticsearch nothing has changed in the job specification or in the other
> connectors.
>
> Thanks
> Priya
>
> On Tue, Oct 29, 2019 at 3:40 PM Karl Wright <daddy...@gmail.com> wrote:
>
>> ManifoldCF is an incremental crawler, which means that on every
>> (non-continuous) job run it sees which documents it can find and removes
>> the ones it can't. The history for the documents being deleted should tell
>> you why they are being deleted -- it may be that (a) they weren't found, or
>> (b) that the document specification in the job changed and they are no
>> longer included in the job.
>>
>> Karl
>>
>>
>> On Tue, Oct 29, 2019 at 5:30 AM Priya Arora <pr...@smartshore.nl> wrote:
>>
>>> Hi All,
>>>
>>> I have a query regarding the ManifoldCF job process. I have a job to
>>> crawl an intranet site.
>>> Repository Type:- Web
>>> Output Connector Type:- Elasticsearch
>>>
>>> The job has to crawl around 4-5 lakhs of total records. I discarded the
>>> previous index and created a new index (in Elasticsearch) with proper
>>> mappings and settings, and started the job again after also cleaning the
>>> database (the database used is PostgreSQL).
>>> While the job runs it ingests the records properly, but just before
>>> finishing (sometimes in between also) it initiates the process of
>>> deletions, and it does not index the deleted documents again.
>>>
>>> Can you please suggest whether I am doing anything wrong? Or is this
>>> part of the ManifoldCF process, and if so, why are the documents not
>>> getting ingested again?
>>>
>>> Thanks and regards
>>> Priya
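
For readers of this thread, the incremental behaviour Karl describes can be pictured with a toy model. This is only a sketch under stated assumptions, not ManifoldCF code: the function `plan_run`, its parameters, and the example paths are all hypothetical. It treats a job run as a set comparison: anything previously indexed that is either not reachable in this run (case a) or no longer matches the job's document specification (case b) is scheduled for deletion.

```python
def plan_run(previously_indexed, reachable, in_spec):
    """Toy model of one non-continuous incremental job run.

    previously_indexed: URLs indexed by earlier runs (hypothetical input)
    reachable:          URLs the crawler managed to find in this run
    in_spec:            predicate standing in for the job's document specification
    Returns (to_delete, to_add) as sorted lists.
    """
    previous = set(previously_indexed)
    # A document survives only if it was found AND is still included in the job.
    kept = {url for url in reachable if in_spec(url)}
    to_delete = sorted(previous - kept)   # case (a) not found, or case (b) excluded
    to_add = sorted(kept - previous)      # newly discovered documents
    return to_delete, to_add


if __name__ == "__main__":
    previous = ["/a", "/b", "/c"]
    reachable = ["/a", "/c", "/d"]         # "/b" was not found this run
    spec = lambda url: url != "/c"         # "/c" no longer matches the spec
    print(plan_run(previous, reachable, spec))
```

In this model a perfectly valid URL still ends up in `to_delete` if the crawl never reaches it in the current run (for example because a page linking to it failed to fetch), which is why the per-document history is the place to look for the actual reason.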