Hi,

We have looked at the table "Advanced properties.xml properties". We already use it to set <property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink" value="5000000" /> for the intrinsiclink table, and we do the same for the other tables. But is there a value that disables the reindex and the analyze entirely, for example "-1" or "0"? I did not find one in the documentation.

Thank you


On 11/02/2019 at 12:26, Karl Wright (via Internet, sender user-return-5690-daniel.lirot=developpement-durable.gouv...@manifoldcf.apache.org) wrote:
See: https://manifoldcf.apache.org/release/release-1.10/en_US/how-to-build-and-deploy.html#file+properties

Look at the table "Advanced properties.xml properties"

Karl


On Mon, Feb 11, 2019 at 4:16 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <daniel.li...@developpement-durable.gouv.fr <mailto:daniel.li...@developpement-durable.gouv.fr>> wrote:

    Hello,

    1/ The database we use is PostgreSQL version 9.6.

    2/ I will look at what is happening with the queries in the logs.

    3/ We run a VACUUM FULL ANALYZE every 24 hours. For each table we
    set the reindex threshold to 5000000 (in properties.xml) with a
    line such as:
     <property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink" value="5000000" />

    Is there a setting that disables the reindex requested by
    ManifoldCF?

    Thanks

    Daniel


    On 08/02/2019 at 16:00, Karl Wright (via Internet, sender
    user-return-5674-daniel.lirot=developpement-durable.gouv...@manifoldcf.apache.org)
    wrote:
    Hello,

    (1) What database are you using for this?  Some databases require
    maintenance periodically or have other heavy usage constraints.
    (2) Every time a query takes more than a minute to execute, it
    is logged, along with the query plan.  You need to look at the
    manifoldcf log to see which queries are problematic before
    concluding anything.
    (3) For every database table, you can individually configure
    approximately how many table operations occur before MCF
    re-analyzes the table.  However, it's likely that you have the
    opposite problem: a bad query plan for the query that queues
    documents for processing.  Preventing that may require more
    frequent analysis.  But we cannot tell until we understand which
    queries are taking a long time.
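
    As a rough sketch of point (3), the per-table thresholds go in
    properties.xml, one property per table.  The reindex line is the
    one quoted elsewhere in this thread; the analyze line assumes the
    parallel property name listed in the "Advanced properties.xml
    properties" table, and its value is purely illustrative, not a
    recommendation:

    ```xml
    <!-- Table changes allowed before MCF reindexes intrinsiclink
         (name and value as quoted in this thread) -->
    <property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink" value="5000000" />
    <!-- Assumed counterpart for re-analysis of the same table; check the
         name against the documentation, and treat the value as a placeholder -->
    <property name="org.apache.manifoldcf.db.postgres.analyze.intrinsiclink" value="10000" />
    ```

    A lower analyze value means more frequent re-analysis, which is
    the direction suggested above if the query plan turns out to be
    the problem.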

    Thanks,
    Karl



    On Fri, Feb 8, 2019 at 8:07 AM LIROT Daniel -
    SG/SPSSI/CPII/DOSO/ET <daniel.li...@developpement-durable.gouv.fr
    <mailto:daniel.li...@developpement-durable.gouv.fr>> wrote:

        Hello,

        We use ManifoldCF v2.10 with PostgreSQL 9.6 to crawl our
        websites, which represent approximately 1.2 million documents.
        We split the crawl into 4 jobs that send their results to
        3 Solr collections.
        The crawl performs well up to about 500,000 documents (25,000
        to 30,000 docs/hour), then performance drops sharply as it
        progresses. We observe very long freezes, to the point that
        the crawl appears to have stopped.
        We suspect a reindexing, notably of the intrinsiclink table,
        which is very large (85 million rows).
        Is it possible to prevent the reindexing triggered by ManifoldCF?
        Any other ideas?

        Best regards,
        LIROT Daniel