Thanks a lot, Karl.

In the “Simple History” report in ManifoldCF I see the following every day for every document, even if it has 
not been modified:

26/05/23, 08:47:47         document ingest (SolrShare)     
26/05/23, 08:47:46         extract [TikaTrasform]          
26/05/23, 08:47:45         access                          

In Solr, I run a query to search for the document and I see (omitting the 
extended result):

        "resourcename":"...Avanzato 2014.pptx",

Is this what you meant when you mentioned the “activity log”?

I can see that document in Solr, so I suppose that it is indexed.

What else could I investigate?
Thanks a lot


From: Karl Wright <>
Sent: Friday, 26 May 2023 07:20
Subject: Re: Long Job on Windows Share

The jcifs connector does not include much information in the version string 
for a file: basically, just the length and the modified date.  So I would not 
expect there to be a lot of actual work involved if there are no changes to a file.

The "access" activity does imply that the system believes the document 
needs to be reindexed.  It clearly reads the document properly.  I would 
check to be sure it actually indexes the document.  I suspect that your job may 
be reading the file, determining that it is not suitable for indexing, and then 
repeating that every day.  You can check this by looking up the document in the 
activity log to see what ManifoldCF decided to do with it.
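To make the change-detection idea above concrete, here is a minimal Python sketch, assuming the version string is nothing more than the file length plus the modified date, as Karl describes. `FileInfo`, `version_string` and `needs_reindex` are hypothetical names for illustration only; this is not the jcifs connector's code or any ManifoldCF API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    length: int       # file size in bytes
    modified_ms: int  # last-modified time, epoch milliseconds


def version_string(info: FileInfo) -> str:
    # Hypothetical encoding: only length and modified date go into the version.
    return f"{info.length}:{info.modified_ms}"


def needs_reindex(previous_version: Optional[str], info: FileInfo) -> bool:
    # No stored version: the document was never recorded as indexed,
    # so under this model it is fetched again on every run.
    if previous_version is None:
        return True
    # Otherwise re-fetch only if length or modified date changed.
    return previous_version != version_string(info)


doc = FileInfo(length=1_048_576, modified_ms=1685083665000)
stored = version_string(doc)
print(needs_reindex(stored, doc))  # unchanged file: False
print(needs_reindex(stored, FileInfo(1_048_576, 1685170065000)))  # newer date: True
print(needs_reindex(None, doc))  # nothing stored: True
```

Under this simplified model, an unchanged file is skipped cheaply, while a file whose previous run never stored a version (for example, because it was rejected rather than indexed) is read again every day, which is the pattern Karl suspects.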


On Thu, May 25, 2023 at 6:03 AM Bisonti Mario 
<<>> wrote:
I would like to understand how recrawling works.

My job, which uses the “Windows shares” connection type, runs for nearly 18 hours.
I have a little over 1 million documents.

If I check the documents scanned by ManifoldCF, I see, for example:

It seems that it reprocesses every document every day, even if it hasn’t been modified.
So, is this correct, or did I choose the wrong type of job to crawl the documents?

Thanks a lot
