Hi Shigeki,

What database is ManifoldCF configured to use in this case? Do you see any indication of slow queries in the ManifoldCF log?

Karl

On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi <[email protected]> wrote:
> Hello
>
>
> I would like some advice to improve crawling time of new/updated files using
> Windows share connection.
>
> I crawl file in Windows server and index them into Solr.
>
> Currently, the second crawling of two hundred thousands files takes over 5
> hours, even though any files are not updated, created, deleted.
>
> I assume MCF does the following processes (let me know if I am wrong)
>
> - obtain updated time of a file
> - compare the updated time with the one MCF obtained last time crawling(
> probably stored in DB)
> - if they are different MCF recognizes the file is to be indexed.
>
> If the above processes are done for two thousands files, what part of the
> processes could take time the most? obtaining updated time? reading data
> from DB? what could be done to increase the crawling time do you think?
>
> Please give me some advice.
>
>
> Regards,
>
> Shigeki
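
For reference, the incremental check assumed in the quoted message can be sketched roughly as below. This is only an illustration of that assumption, not ManifoldCF's actual implementation: the function name `find_changed_files` and the in-memory `last_seen` dictionary (standing in for whatever the crawler keeps in its database) are hypothetical.

```python
import os
from typing import Dict, Iterable, List


def find_changed_files(paths: Iterable[str], last_seen: Dict[str, float]) -> List[str]:
    """Return the paths whose modification time differs from the stored one.

    `last_seen` is a hypothetical stand-in for the per-file information
    a crawler would persist between runs (assumption, not MCF's schema).
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime      # one metadata lookup per file
        if last_seen.get(path) != mtime:    # one stored-value lookup per file
            changed.append(path)            # file would be handed to the indexer
            last_seen[path] = mtime         # remember it for the next crawl
    return changed
```

In this sketch an unchanged file still costs one metadata lookup and one stored-value lookup on every crawl, so with a large number of files those two per-file steps are the places to measure first.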
