See: https://manifoldcf.apache.org/release/release-1.10/en_US/how-to-build-and-deploy.html#file+properties
Look at the table "Advanced properties.xml properties"

Karl

On Mon, Feb 11, 2019 at 4:16 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
daniel.li...@developpement-durable.gouv.fr> wrote:

> Hello,
>
> 1/ The database we use is PostgreSQL version 9.6.
>
> 2/ I will look at what is happening with the queries in the logs.
>
> 3/ We run a VACUUM FULL ANALYZE every 24 hours, and for each table we set
> the reindex value to 5000000 (in properties.xml) with the line:
> <property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink"
> value="5000000" />
>
> Is there a setting that disables the reindex requested by ManifoldCF?
>
> Thanks,
>
> Daniel
>
>
> On 08/02/2019 at 16:00, Karl Wright (via Internet, sent by
> user-return-5674-daniel.lirot=developpement-durable.gouv...@manifoldcf.apache.org)
> wrote:
>
> Hello,
>
> (1) What database are you using for this? Some databases require
> periodic maintenance or have other heavy usage constraints.
> (2) Every time a query takes more than a minute to execute, it is logged,
> along with the query plan. You need to look at the ManifoldCF log to see
> which queries are problematic before concluding anything.
> (3) For every database table, you can individually configure approximately
> how many table operations occur before MCF re-analyzes the table.
> However, it's likely that you have the opposite problem: a bad query plan
> for the query that queues documents for processing. Preventing that may
> require more frequent analysis. But we cannot tell until we understand
> which queries are taking a long time.
>
> Thanks,
> Karl
>
>
>
> On Fri, Feb 8, 2019 at 8:07 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
> daniel.li...@developpement-durable.gouv.fr> wrote:
>
>> Hello,
>>
>> We use ManifoldCF v2.10 with PostgreSQL (9.6) to crawl our websites.
>> This represents approximately 1.2 million documents.
>> We split the crawl into 4 jobs that distribute their results across 3 Solr
>> collections.
>> The crawl performs well up to 500000 documents (25000 to 30000 docs / hour),
>> then performance drops sharply as the crawl progresses; we observe very
>> long freezes, to the point that the crawl appears to have stopped.
>> We suspect a reindexing, notably of the intrinsiclink table, which is
>> very large (85 million rows).
>> Is it possible to disable the re-indexing controlled by ManifoldCF?
>> Any other ideas?
>>
>> Best regards,
>> LIROT Daniel
>> --
>>
>
>
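
For readers following the thread, the per-table tuning discussed above might look roughly like the properties.xml fragment below. This is only a sketch: the reindex line is the one quoted in the thread, while the analyze line is an assumption that the per-table analyze property listed in the "Advanced properties.xml properties" table applies to the jobqueue table, and its value of 1000 is purely illustrative. Check the property names and defaults against that table before using any of this.

```xml
<!-- Sketch of a properties.xml fragment; values are illustrative, not recommendations. -->
<configuration>
  <!-- From the thread: reindex intrinsiclink only after about 5000000 table operations. -->
  <property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink" value="5000000" />
  <!-- Assumption: per-table analyze threshold for jobqueue; confirm the name and default in the documentation. -->
  <property name="org.apache.manifoldcf.db.postgres.analyze.jobqueue" value="1000" />
</configuration>
```

Raising a reindex threshold makes reindexing rarer, while lowering an analyze threshold makes re-analysis more frequent, which is the direction Karl suggests may help if the slow queries turn out to have bad plans.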