[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-1746:
--
    Attachment: DBInterfacePostgreSQL.java.patch

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
> Reporter: Mingchun Zhao
> Priority: Major
> Attachments: DBInterfacePostgreSQL.java.patch
>
> Sometimes, the crawling does not process any documents for a while and there
> is nothing logged about long-running queries. The performance can be restored
> by firing the 'ANALYZE' command manually. It seems that a bad query plan
> caused this performance problem.
> Therefore, in addition to the current configuration parameter
> 'org.apache.manifoldcf.db.postgres.analyze.', it is considered
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for
>    creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing
>    rate of documents drops below a specified threshold.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-1746:
--
    Description:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.
Therefore, in addition to the current configuration parameter 'org.apache.manifoldcf.db.postgres.analyze.', it is considered necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing rate of documents drops below a specified threshold.

  was:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.
Therefore, in addition to the current configuration parameter 'org.apache.manifoldcf.db.postgres.analyze.', it is considered necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution as below?
1. 'org.apache.manifoldcf.db.postgres.analyze..minimumrowcount'
Specify how many records should be inserted before carrying out an 'ANALYZE' on the specified table for the first time. Defaults to 100.
2. 'org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate'
Specify the minimum number of documents processed per minute. If the processing rate of documents drops below this threshold, the 'ANALYZE' will be executed. Defaults to 1.

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
> Reporter: Mingchun Zhao
> Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there
> is nothing logged about long-running queries. The performance can be restored
> by firing the 'ANALYZE' command manually. It seems that a bad query plan
> caused this performance problem.
> Therefore, in addition to the current configuration parameter
> 'org.apache.manifoldcf.db.postgres.analyze.', it is considered
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for
>    creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing
>    rate of documents drops below a specified threshold.
[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-1746:
--
    Description:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.
Therefore, in addition to the current configuration parameter 'org.apache.manifoldcf.db.postgres.analyze.', it is considered necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution as below?
1. 'org.apache.manifoldcf.db.postgres.analyze..minimumrowcount'
Specify how many records should be inserted before carrying out an 'ANALYZE' on the specified table for the first time. Defaults to 100.
2. 'org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate'
Specify the minimum number of documents processed per minute. If the processing rate of documents drops below this threshold, the 'ANALYZE' will be executed. Defaults to 1.

  was:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.
Therefore, in addition to the current configuration parameter org.apache.manifoldcf.db.postgres.analyze. , it is considered necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution as below?
1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
Specify how many records should be inserted before carrying out an 'ANALYZE' on the specified table for the first time. Defaults to 100.
2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
Specify the number of documents processed in the last minute. If the actual processing rate falls below this, the 'ANALYZE' will be executed. Defaults to 1.

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
> Reporter: Mingchun Zhao
> Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there
> is nothing logged about long-running queries. The performance can be restored
> by firing the 'ANALYZE' command manually. It seems that a bad query plan
> caused this performance problem.
> Therefore, in addition to the current configuration parameter
> 'org.apache.manifoldcf.db.postgres.analyze.', it is considered
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for
>    creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing
>    rate of documents drops below a specified threshold.
> So, how about adding two parameters to handle the timing of 'ANALYZE'
> execution as below?
> 1. 'org.apache.manifoldcf.db.postgres.analyze..minimumrowcount'
> Specify how many records should be inserted before carrying out an 'ANALYZE'
> on the specified table for the first time. Defaults to 100.
> 2. 'org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate'
> Specify the minimum number of documents processed per minute. If the
> processing rate of documents drops below this threshold, the 'ANALYZE' will
> be executed. Defaults to 1.
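
The two triggers proposed in the description above could be sketched roughly as follows. This is only an illustration of the idea, not the content of the attached DBInterfacePostgreSQL.java.patch: the class name `AnalyzeTrigger` and all of its methods are hypothetical, and the sketch assumes the caller already knows how many rows were inserted and how many documents were processed in the last minute.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper illustrating the two proposed ANALYZE triggers.
 * It is not part of ManifoldCF and is not taken from the attached patch.
 */
final class AnalyzeTrigger {
    // Rows that must be inserted before the first ANALYZE (proposed default: 100).
    private final long minimumRowCount;
    // Minimum number of documents processed per minute (proposed default: 1).
    private final long minimumProcessRate;

    // Running total of rows inserted since this trigger was created.
    private final AtomicLong insertedRows = new AtomicLong();
    // Ensures the row-count trigger fires only once.
    private final AtomicBoolean firstAnalyzeDone = new AtomicBoolean(false);

    AnalyzeTrigger(long minimumRowCount, long minimumProcessRate) {
        this.minimumRowCount = minimumRowCount;
        this.minimumProcessRate = minimumProcessRate;
    }

    /** Uses the default values suggested in the issue description. */
    static AnalyzeTrigger withProposedDefaults() {
        return new AnalyzeTrigger(100L, 1L);
    }

    /**
     * Records newly inserted rows. Returns true exactly once: on the call
     * where the running total first reaches minimumRowCount.
     */
    boolean recordInserts(long rows) {
        long total = insertedRows.addAndGet(rows);
        return total >= minimumRowCount
            && firstAnalyzeDone.compareAndSet(false, true);
    }

    /**
     * Returns true when the number of documents processed in the last
     * minute is below minimumProcessRate.
     */
    boolean isRateTooLow(long documentsProcessedLastMinute) {
        return documentsProcessedLastMinute < minimumProcessRate;
    }
}
```

A caller would run 'ANALYZE' on the table whenever either method returns true. A real implementation would probably also need to limit how often the rate-based trigger can fire during a prolonged slowdown; that is left out of this sketch.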
[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-1746:
--
    Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.  (was: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.)

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
> Reporter: Mingchun Zhao
> Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there
> is nothing logged about long-running queries. The performance can be restored
> by firing the 'ANALYZE' command manually. It seems that a bad query plan
> caused this performance problem.
> Therefore, in addition to the current configuration parameter
> org.apache.manifoldcf.db.postgres.analyze. , it is considered
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for
>    creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing
>    rate of documents drops below a specified threshold.
> So, how about adding two parameters to handle the timing of 'ANALYZE'
> execution as below?
> 1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
> Specify how many records should be inserted before carrying out an 'ANALYZE'
> on the specified table for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
> Specify the number of documents processed in the last minute. If the actual
> processing rate falls below this, the 'ANALYZE' will be executed. Defaults to
> 1.
[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-1746:
--
    Description:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.
Therefore, in addition to the current configuration parameter org.apache.manifoldcf.db.postgres.analyze. , it is considered necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution as below?
1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
Specify how many records should be inserted before carrying out an 'ANALYZE' on the specified table for the first time. Defaults to 100.
2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
Specify the number of documents processed in the last minute. If the actual processing rate falls below this, the 'ANALYZE' will be executed. Defaults to 1.

  was:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.
Therefore, in addition to the current configuration parameter org.apache.manifoldcf.db.postgres.analyze. , it is considered necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating a query plan after the job starts.
2. When the crawling performance slows down. For example, if the document processing rate drops below a specified threshold.

How about adding two parameters to handle the timing of 'ANALYZE' execution as below?
1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
Specify how many records should be accumulated before carrying out an 'ANALYZE' on the specified table for the first time. Defaults to 100.
2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
Specify the number of documents processed in the last minute. If the actual processing rate falls below this, the 'ANALYZE' will be carried out. Defaults to 1.

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Environment: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
> Reporter: Mingchun Zhao
> Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there
> is nothing logged about long-running queries. The performance can be restored
> by firing the 'ANALYZE' command manually. It seems that a bad query plan
> caused this performance problem.
> Therefore, in addition to the current configuration parameter
> org.apache.manifoldcf.db.postgres.analyze. , it is considered
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for
>    creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing
>    rate of documents drops below a specified threshold.
> So, how about adding two parameters to handle the timing of 'ANALYZE'
> execution as below?
> 1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
> Specify how many records should be inserted before carrying out an 'ANALYZE'
> on the specified table for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
> Specify the number of documents processed in the last minute. If the actual
> processing rate falls below this, the 'ANALYZE' will be executed. Defaults to
> 1.
[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-1746:
--
    Summary: Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.  (was: Adding execution conditions of PostgreSQL's ANALYZE command to avoid crawling become extremely slow.)

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Environment: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
> Reporter: Mingchun Zhao
> Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there
> is nothing logged about long-running queries. The performance can be restored
> by firing the 'ANALYZE' command manually. It seems that a bad query plan
> caused this performance problem.
> Therefore, in addition to the current configuration parameter
> org.apache.manifoldcf.db.postgres.analyze. , it is considered
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for
>    creating a query plan after the job starts.
> 2. When the crawling performance slows down. For example, if the document
>    processing rate drops below a specified threshold.
> How about adding two parameters to handle the timing of 'ANALYZE' execution
> as below?
> 1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
> Specify how many records should be accumulated before carrying out an
> 'ANALYZE' on the specified table for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
> Specify the number of documents processed in the last minute. If the actual
> processing rate falls below this, the 'ANALYZE' will be carried out.
> Defaults to 1.