[jira] [Created] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-12 Thread Mingchun Zhao (Jira)
Mingchun Zhao created CONNECTORS-1747:
-

 Summary: Add a property to disable logging hop count to database
 Key: CONNECTORS-1747
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1747
 Project: ManifoldCF
  Issue Type: Improvement
Reporter: Mingchun Zhao


If the “Hop Filters” feature is not required, we should consider disabling the 
logging of hopcount-related records to the database, that is, writes to the 
"intrinsiclink" and "hopcount" tables. This can increase throughput and reduce 
the rate of growth of the database.
I will try to create a patch for this.
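
A minimal sketch of how such a switch could gate the hopcount writes. This is not ManifoldCF code: the property name "example.crawler.hopcount.disabled", the class, and the counter standing in for the table insert are all hypothetical, since the issue does not name the property or show the patch.

```java
import java.util.Properties;

/** Illustrative only: gates hop-count bookkeeping behind a boolean property. */
final class HopCountLoggingSketch {
    // Hypothetical property name; the issue does not specify one.
    static final String DISABLE_PROPERTY = "example.crawler.hopcount.disabled";

    private final boolean hopCountDisabled;
    private int linkRowsWritten = 0;

    HopCountLoggingSketch(Properties config) {
        // Defaults to false, so hop-count rows are written unless explicitly disabled.
        this.hopCountDisabled =
            Boolean.parseBoolean(config.getProperty(DISABLE_PROPERTY, "false"));
    }

    /** Records a parent-to-child link unless hop-count tracking is switched off. */
    boolean recordLink(String parentId, String childId) {
        if (hopCountDisabled) {
            return false; // skip the database write entirely
        }
        linkRowsWritten++; // stand-in for an INSERT into a link table
        return true;
    }

    int rowsWritten() {
        return linkRowsWritten;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(DISABLE_PROPERTY, "true");
        HopCountLoggingSketch sketch = new HopCountLoggingSketch(config);
        sketch.recordLink("doc-1", "doc-2");
        System.out.println("rows written: " + sketch.rowsWritten());
    }
}
```

Defaulting the switch to false keeps the current behavior for installations that do use hop filters.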



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: ManifoldCF » ManifoldCF-ant #74

2023-05-12 Thread Apache Jenkins Server
See 


Changes:

[Karl Wright] CONNECTORS-1746: Add support for smarter analysis


--
[...truncated 483.06 KB...]
AUsite/src/documentation/skins/common/xslt/html/document-to-html.xsl
AUsite/src/documentation/skins/common/xslt/html/renderlogo.xsl
AUsite/src/documentation/skins/common/xslt/html/strip_namespaces.xsl
AUsite/src/documentation/skins/common/xslt/html/tabutils.xsl
AUsite/src/documentation/skins/common/xslt/html/tab-to-menu.xsl
A site/src/documentation/skins/common/xslt/fo
AUsite/src/documentation/skins/common/xslt/fo/footerinfo.xsl
AUsite/src/documentation/skins/common/xslt/fo/document-to-fo.xsl
AUsite/src/documentation/skins/common/xslt/fo/pdfoutline.xsl
A site/src/documentation/skins/common/xslt/svg
AUsite/src/documentation/skins/common/xslt/svg/document-to-svg.xsl
AUsite/src/documentation/skins/common/skinconf.xsl
A site/src/documentation/skins/common/translations
AUsite/src/documentation/skins/common/translations/CommonMessages_es.xml
AUsite/src/documentation/skins/common/translations/CommonMessages_fr.xml
AUsite/src/documentation/skins/common/translations/CommonMessages_de.xml
AUsite/src/documentation/skins/common/translations/CommonMessages_en_US.xml
A site/src/documentation/skins/common/images
AUsite/src/documentation/skins/common/images/poddoc.svg.xslt
AUsite/src/documentation/skins/common/images/corner-imports.svg.xslt
AUsite/src/documentation/skins/common/images/rc.svg.xslt
AUsite/src/documentation/skins/common/images/README.txt
AUsite/src/documentation/skins/common/images/txtdoc.svg.xslt
AUsite/src/documentation/skins/common/images/dc.svg.xslt
AUsite/src/documentation/skins/common/images/instruction_arrow.png
A site/src/documentation/skins/common/scripts
AUsite/src/documentation/skins/common/scripts/menu.js
AUsite/src/documentation/skins/common/scripts/getMenu.js
AUsite/src/documentation/skins/common/scripts/prototype.js
AUsite/src/documentation/skins/common/scripts/getBlank.js
AUsite/src/documentation/skins/common/scripts/breadcrumbs.js
AUsite/src/documentation/skins/common/scripts/breadcrumbs-optimized.js
AUsite/src/documentation/skins/common/scripts/fontsize.js
A site/src/documentation/skins/common/css
AUsite/src/documentation/skins/common/css/forrest.css.xslt
A site/src/documentation/skins/lucene
A site/src/documentation/skins/lucene/xslt
A site/src/documentation/skins/lucene/xslt/html
AUsite/src/documentation/skins/lucene/xslt/html/site-to-xhtml.xsl
AUsite/src/documentation/skins/lucene/xslt/html/book-to-menu.xsl
AUsite/src/documentation/skins/lucene/xslt/html/document-to-html.xsl
AUsite/src/documentation/skins/lucene/xslt/html/tab-to-menu.xsl
A site/src/documentation/skins/lucene/xslt/fo
AUsite/src/documentation/skins/lucene/xslt/fo/document-to-fo.xsl
AUsite/src/documentation/skins/lucene/skinconf.xsl
AUsite/src/documentation/skins/lucene/note.txt
A site/src/documentation/skins/lucene/images
AUsite/src/documentation/skins/lucene/images/page.gif
AUsite/src/documentation/skins/lucene/images/current.gif
AUsite/src/documentation/skins/lucene/images/chapter.gif
AUsite/src/documentation/skins/lucene/images/instruction_arrow.png
A site/src/documentation/skins/lucene/css
AUsite/src/documentation/skins/lucene/css/profile.css.xslt
AUsite/src/documentation/skins/lucene/css/print.css
AUsite/src/documentation/skins/lucene/css/screen.css
AUsite/src/documentation/skins/lucene/css/basic.css
A site/src/documentation/conf
AUsite/src/documentation/conf/cli.xconf
AUsite/src/documentation/README.txt
A site/src/documentation/classes
AUsite/src/documentation/classes/CatalogManager.properties
AUsite/build.xml
A site/pdf
AUsite/pdf/config.xml
AUsite/pdf/helper-footerinfo.xsl
AUsite/pdf/document-to-fo.xsl
AUsite/pdf/output.xmap
AUsite/README.txt
AUsite/.htaccess
AUsite/forrest.properties.xml
AUsite/forrest.properties
A dist-license
AUdist-license/LICENSE.txt
AUdist-license/NOTICE.txt
AUdist-license/DEPENDENCIES.txt
AUdist-license/README.txt
A lib-license
AUlib-license/LICENSE.txt
AUlib-license/NOTICE.txt
AUlib-license/README.txt
A .gitignore
AULICENSE.txt
AUDEPENDENCIES.txt
AUREADME.txt
AUKEYS
A src
A src/main
A src/main/assembly
AUsrc/main/assembly/src.xml
AU

Jenkins build is back to normal : ManifoldCF » ManifoldCF-mvn #81

2023-05-12 Thread Apache Jenkins Server
See 




Build failed in Jenkins: ManifoldCF » ManifoldCF-Artifacts-Ant-JDK11 #77

2023-05-12 Thread Apache Jenkins Server
See 


Changes:

[Karl Wright] CONNECTORS-1746: Add support for smarter analysis


--
Started by an SCM change
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on builds40 (ubuntu) in workspace 

Updating https://svn.apache.org/repos/asf/manifoldcf/trunk at revision 
'2023-05-13T01:34:11.264 +'
U framework/core/src/main/java/org/apache/manifoldcf/core/database/DBInterfacePostgreSQL.java
At revision 1909784

[ManifoldCF-Artifacts-Ant-JDK11] $ ant  clean-core-deps make-core-deps clean
Buildfile: 

Trying to override old definition of task javac

clean-core-deps:
   [delete] Deleting directory 


download-jakarta-apis:
[mkdir] Created dir: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/jakarta/activation/jakarta.activation-api/1.2.1/jakarta.activation-api-1.2.1.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/jakarta/ws/rs/jakarta.ws.rs-api/2.1.6/jakarta.ws.rs-api-2.1.6.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/jakarta/xml/bind/jakarta.xml.bind-api/2.3.2/jakarta.xml.bind-api-2.3.2.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/jakarta/xml/ws/jakarta.xml.ws-api/2.3.2/jakarta.xml.ws-api-2.3.2.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/jakarta/xml/soap/jakarta.xml.soap-api/1.4.1/jakarta.xml.soap-api-1.4.1.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/jakarta/annotation/jakarta.annotation-api/1.3.4/jakarta.annotation-api-1.3.4.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/jakarta/jws/jakarta.jws-api/1.1.1/jakarta.jws-api-1.1.1.jar
  [get] To: 


download-jakarta-runtimes:

setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/com/sun/activation/jakarta.activation/1.2.2/jakarta.activation-1.2.2.jar
  [get] To: 


download-protobuf:

setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/com/google/protobuf/protobuf-java/3.19.4/protobuf-java-3.19.4.jar
  [get] To: 


download-less-compiler:

setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/com/github/sommeri/less4j/1.17.2/less4j-1.17.2.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/org/antlr/antlr-runtime/3.5.2/antlr-runtime-3.5.2.jar
  [get] To: 


setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.9.4/commons-beanutils-1.9.4.jar
  [get] To: 


download-forbidden-checks:

setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/de/thetaphi/forbiddenapis/3.4/forbiddenapis-3.4.jar
  [get] To: 

[jira] [Comment Edited] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722207#comment-17722207
 ] 

Mingchun Zhao edited comment on CONNECTORS-1746 at 5/12/23 10:34 PM:
-

Hello,

Here is a patch that adds options for PostgreSQL’s “ANALYZE” command.
I’ve tried adding two properties to control the 'ANALYZE' command, as below.

1. "org.apache.manifoldcf.db.postgres.analyzeatstart"
If this property is set to true, a table specified by the property 
"org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed at the 
start of a job. Defaults to false ("ANALYZE" is not run at the start).

2. "org.apache.manifoldcf.db.postgres.analyzeratethreshold"
If this property is set to a positive integer, a table specified by the 
property "org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed 
only when the number of events per second drops below the threshold. Defaults 
to 0 (the event rate is not checked).

I tested with the attached patch and confirmed that the “ANALYZE” command was 
executed correctly in both situations. In particular, when MCF's throughput 
(events per second) dropped because of a bad PostgreSQL query plan, an 
“ANALYZE” command was executed and MCF's performance recovered.

[^DBInterfacePostgreSQL.java.patch]
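
The decision logic the two properties describe can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: the class and method names are hypothetical, and the event rate is assumed to be computed from an event count over an elapsed interval.

```java
/** Illustrative only: decides when an ANALYZE would be triggered. */
final class AnalyzeTriggerSketch {
    private final boolean analyzeAtStart; // mirrors "analyzeatstart", default false
    private final int rateThreshold;      // mirrors "analyzeratethreshold", 0 = no check

    AnalyzeTriggerSketch(boolean analyzeAtStart, int rateThreshold) {
        this.analyzeAtStart = analyzeAtStart;
        this.rateThreshold = rateThreshold;
    }

    /** True when ANALYZE should run as the job starts. */
    boolean shouldAnalyzeAtJobStart() {
        return analyzeAtStart;
    }

    /** True when the observed events per second fall below the threshold. */
    boolean shouldAnalyzeForRate(long eventCount, long elapsedMillis) {
        if (rateThreshold <= 0 || elapsedMillis <= 0) {
            return false; // rate check disabled, or no interval to measure
        }
        double eventsPerSecond = eventCount * 1000.0 / elapsedMillis;
        return eventsPerSecond < rateThreshold;
    }

    public static void main(String[] args) {
        AnalyzeTriggerSketch policy = new AnalyzeTriggerSketch(true, 5);
        System.out.println("analyze at start: " + policy.shouldAnalyzeAtJobStart());
        System.out.println("3 events in 1s:   " + policy.shouldAnalyzeForRate(3, 1000));
        System.out.println("50 events in 1s:  " + policy.shouldAnalyzeForRate(50, 1000));
    }
}
```

With the default threshold of 0 the rate check never fires, which matches the "not to check event rate" default described above.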


was (Author: mingchun.zhao):
Hello,

Here is a patch that adds options for PostgreSQL’s “ANALYZE” command.
I’ve tried adding two properties to control the 'ANALYZE' command, as below.

1. "org.apache.manifoldcf.db.postgres.analyzeatstart"
If this property is set to true, a table specified by the property 
"org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed at the 
start of a job. Defaults to false ("ANALYZE" is not run at the start).

2. "org.apache.manifoldcf.db.postgres.analyzeratethreshold"
If this property is set to a positive integer, a table specified by the 
property "org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed 
only when the number of events per second drops below the threshold. Defaults 
to 1 (1 event processed per second).

I tested with the attached patch and confirmed that the “ANALYZE” command was 
executed correctly in both situations. In particular, when MCF's throughput 
(events per second) dropped because of a bad PostgreSQL query plan, an 
“ANALYZE” command was executed and MCF's performance recovered.

[^DBInterfacePostgreSQL.java.patch]

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Assignee: Karl Wright
>Priority: Major
> Attachments: DBInterfacePostgreSQL.java.patch
>
>
> Sometimes the crawl does not process any documents for a while, and nothing 
> is logged about long-running queries. Performance can be restored by running 
> the 'ANALYZE' command manually. It seems that a bad query plan caused this 
> performance problem.
> Therefore, in addition to the current configuration parameter 
> 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it seems necessary 
> to execute 'ANALYZE' in the following situations as well.
> 1. When, after the job starts, the number of records in the table exceeds the 
> number required for creating an execution plan.
> 2. When crawling performance slows down, for example when the document 
> processing rate drops below a specified threshold.





[jira] [Commented] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722261#comment-17722261
 ] 

Karl Wright commented on CONNECTORS-1746:
-

Patch committed: r1909780




[jira] [Comment Edited] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722207#comment-17722207
 ] 

Mingchun Zhao edited comment on CONNECTORS-1746 at 5/12/23 3:25 PM:


Hello,

Here is a patch that adds options for PostgreSQL’s “ANALYZE” command.
I’ve tried adding two properties to control the 'ANALYZE' command, as below.

1. "org.apache.manifoldcf.db.postgres.analyzeatstart"
If this property is set to true, a table specified by the property 
"org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed at the 
start of a job. Defaults to false ("ANALYZE" is not run at the start).

2. "org.apache.manifoldcf.db.postgres.analyzeratethreshold"
If this property is set to a positive integer, a table specified by the 
property "org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed 
only when the number of events per second drops below the threshold. Defaults 
to 1 (1 event processed per second).

I tested with the attached patch and confirmed that the “ANALYZE” command was 
executed correctly in both situations. In particular, when MCF's throughput 
(events per second) dropped because of a bad PostgreSQL query plan, an 
“ANALYZE” command was executed and MCF's performance recovered.

[^DBInterfacePostgreSQL.java.patch]


was (Author: mingchun.zhao):
Hello,

Here is a patch that adds options for PostgreSQL’s “ANALYZE” command.
I’ve tried adding two properties to control the 'ANALYZE' command, as below.
 # "org.apache.manifoldcf.db.postgres.analyzeatstart"
If this property is set to true, a table specified by the property 
"org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed at the 
start of a job. Defaults to false ("ANALYZE" is not run at the start).

 # "org.apache.manifoldcf.db.postgres.analyzeratethreshold"
If this property is set to a positive integer, a table specified by the 
property "org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed 
only when the number of events per second drops below the threshold. Defaults 
to 1 (1 event processed per second).

I tested with the attached patch and confirmed that the “ANALYZE” command was 
executed correctly in both situations. In particular, when MCF's throughput 
(events per second) dropped because of a bad PostgreSQL query plan, an 
“ANALYZE” command was executed and MCF's performance recovered.

[^DBInterfacePostgreSQL.java.patch]

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
> Attachments: DBInterfacePostgreSQL.java.patch
>
>
> Sometimes the crawl does not process any documents for a while, and nothing 
> is logged about long-running queries. Performance can be restored by running 
> the 'ANALYZE' command manually. It seems that a bad query plan caused this 
> performance problem.
> Therefore, in addition to the current configuration parameter 
> 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it seems necessary 
> to execute 'ANALYZE' in the following situations as well.
> 1. When, after the job starts, the number of records in the table exceeds the 
> number required for creating an execution plan.
> 2. When crawling performance slows down, for example when the document 
> processing rate drops below a specified threshold.





[jira] [Commented] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722207#comment-17722207
 ] 

Mingchun Zhao commented on CONNECTORS-1746:
---

Hello,

Here is a patch that adds options for PostgreSQL’s “ANALYZE” command.
I’ve tried adding two properties to control the 'ANALYZE' command, as below.
 # "org.apache.manifoldcf.db.postgres.analyzeatstart"
If this property is set to true, a table specified by the property 
"org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed at the 
start of a job. Defaults to false ("ANALYZE" is not run at the start).

 # "org.apache.manifoldcf.db.postgres.analyzeratethreshold"
If this property is set to a positive integer, a table specified by the 
property "org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed 
only when the number of events per second drops below the threshold. Defaults 
to 1 (1 event processed per second).

I tested with the attached patch and confirmed that the “ANALYZE” command was 
executed correctly in both situations. In particular, when MCF's throughput 
(events per second) dropped because of a bad PostgreSQL query plan, an 
“ANALYZE” command was executed and MCF's performance recovered.

[^DBInterfacePostgreSQL.java.patch]



[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Attachment: DBInterfacePostgreSQL.java.patch



[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Description: 
Sometimes the crawl does not process any documents for a while, and nothing is 
logged about long-running queries. Performance can be restored by running the 
'ANALYZE' command manually. It seems that a bad query plan caused this 
performance problem.

Therefore, in addition to the current configuration parameter 
'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it seems necessary 
to execute 'ANALYZE' in the following situations as well.
1. When, after the job starts, the number of records in the table exceeds the 
number required for creating an execution plan.
2. When crawling performance slows down, for example when the document 
processing rate drops below a specified threshold.

  was:
Sometimes the crawl does not process any documents for a while, and nothing is 
logged about long-running queries. Performance can be restored by running the 
'ANALYZE' command manually. It seems that a bad query plan caused this 
performance problem.

Therefore, in addition to the current configuration parameter 
'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it seems necessary 
to execute 'ANALYZE' in the following situations as well.
1. When, after the job starts, the number of records in the table exceeds the 
number required for creating an execution plan.
2. When crawling performance slows down, for example when the document 
processing rate drops below a specified threshold.

So, how about adding two parameters to control the timing of 'ANALYZE' 
execution, as below?
1. 'org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumrowcount'
Specifies how many records should be inserted before the first 'ANALYZE' is 
carried out on the specified table. Defaults to 100.
2. 'org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumprocessrate'
Specifies the minimum number of documents processed per minute. If the document 
processing rate drops below this threshold, 'ANALYZE' will be executed. 
Defaults to 1.
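
The row-count condition in this earlier proposal can be sketched as a one-shot trigger. This is only an illustration of the idea: the class and method names are hypothetical, the counter stands in for whatever insert tracking the database layer does, and the proposal was later replaced by the two properties in the committed patch.

```java
/** Illustrative only: fires once when enough rows exist for a first ANALYZE. */
final class FirstAnalyzeSketch {
    private final long minimumRowCount; // mirrors "minimumrowcount", proposed default 100
    private long rowsInserted = 0;
    private boolean firstAnalyzeDone = false;

    FirstAnalyzeSketch(long minimumRowCount) {
        this.minimumRowCount = minimumRowCount;
    }

    /** Notes inserted rows; returns true exactly once, when the first ANALYZE is due. */
    boolean noteInserts(long count) {
        rowsInserted += count;
        if (!firstAnalyzeDone && rowsInserted >= minimumRowCount) {
            firstAnalyzeDone = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FirstAnalyzeSketch trigger = new FirstAnalyzeSketch(100);
        System.out.println("after 60 rows:  " + trigger.noteInserts(60));
        System.out.println("after 100 rows: " + trigger.noteInserts(40));
        System.out.println("after 600 rows: " + trigger.noteInserts(500));
    }
}
```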


> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
>
> Sometimes the crawl does not process any documents for a while, and nothing 
> is logged about long-running queries. Performance can be restored by running 
> the 'ANALYZE' command manually. It seems that a bad query plan caused this 
> performance problem.
> Therefore, in addition to the current configuration parameter 
> 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it seems necessary 
> to execute 'ANALYZE' in the following situations as well.
> 1. When, after the job starts, the number of records in the table exceeds the 
> number required for creating an execution plan.
> 2. When crawling performance slows down, for example when the document 
> processing rate drops below a specified threshold.


