[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Attachment: DBInterfacePostgreSQL.java.patch

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
> Attachments: DBInterfacePostgreSQL.java.patch
>
>
> Sometimes, the crawling does not process any documents for a while and there 
> is nothing logged about long-running queries. The performance can be restored 
> by firing the 'ANALYZE' command manually. It seems that a bad query plan 
> caused this performance problem.
> Therefore, in addition to the current configuration parameter 
> 'org.apache.manifoldcf.db.postgres.analyze.', it is considered 
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for 
> creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing 
> rate of documents drops below a specified threshold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-12 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Description: 
Sometimes, the crawling does not process any documents for a while and there is 
nothing logged about long-running queries. The performance can be restored by 
firing the 'ANALYZE' command manually. It seems that a bad query plan caused 
this performance problem.

Therefore, in addition to the current configuration parameter 
'org.apache.manifoldcf.db.postgres.analyze.', it is considered 
necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for 
creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing 
rate of documents drops below a specified threshold.

  was:
Sometimes, the crawling does not process any documents for a while and there is 
nothing logged about long-running queries. The performance can be restored by 
firing the 'ANALYZE' command manually. It seems that a bad query plan caused 
this performance problem.

Therefore, in addition to the current configuration parameter 
'org.apache.manifoldcf.db.postgres.analyze.', it is considered 
necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for 
creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing 
rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution 
as below?
1. 'org.apache.manifoldcf.db.postgres.analyze..minimumrowcount'
Specify how many records should be inserted before carrying out an 'ANALYZE' on 
the specified table for the first time. Defaults to 100.
2. 'org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate'
Specify the minimum number of documents processed per minute. If the processing 
rate of documents drops below this threshold, the 'ANALYZE' will be executed. 
Defaults to 1.


> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there 
> is nothing logged about long-running queries. The performance can be restored 
> by firing the 'ANALYZE' command manually. It seems that a bad query plan 
> caused this performance problem.
> Therefore, in addition to the current configuration parameter 
> 'org.apache.manifoldcf.db.postgres.analyze.', it is considered 
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for 
> creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing 
> rate of documents drops below a specified threshold.





[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-07 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Description: 
Sometimes, the crawling does not process any documents for a while and there is 
nothing logged about long-running queries. The performance can be restored by 
firing the 'ANALYZE' command manually. It seems that a bad query plan caused 
this performance problem.

Therefore, in addition to the current configuration parameter 
'org.apache.manifoldcf.db.postgres.analyze.', it is considered 
necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for 
creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing 
rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution 
as below?
1. 'org.apache.manifoldcf.db.postgres.analyze..minimumrowcount'
Specify how many records should be inserted before carrying out an 'ANALYZE' on 
the specified table for the first time. Defaults to 100.
2. 'org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate'
Specify the minimum number of documents processed per minute. If the processing 
rate of documents drops below this threshold, the 'ANALYZE' will be executed. 
Defaults to 1.

  was:
Sometimes, the crawling does not process any documents for a while and there is 
nothing logged about long-running queries. The performance can be restored by 
firing the 'ANALYZE' command manually. It seems that a bad query plan caused 
this performance problem.

Therefore, in addition to the current configuration parameter 
org.apache.manifoldcf.db.postgres.analyze. , it is considered 
necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for 
creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing 
rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution 
as below?
1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
Specify how many records should be inserted before carrying out an 'ANALYZE' on 
the specified table for the first time. Defaults to 100.
2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
Specify the number of documents processed in the last minute. If the actual 
processing rate falls below this, the 'ANALYZE' will be executed. Defaults to 1.


> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there 
> is nothing logged about long-running queries. The performance can be restored 
> by firing the 'ANALYZE' command manually. It seems that a bad query plan 
> caused this performance problem.
> Therefore, in addition to the current configuration parameter 
> 'org.apache.manifoldcf.db.postgres.analyze.', it is considered 
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for 
> creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing 
> rate of documents drops below a specified threshold.
> So, how about adding two parameters to handle the timing of 'ANALYZE' 
> execution as below?
> 1. 'org.apache.manifoldcf.db.postgres.analyze..minimumrowcount'
> Specify how many records should be inserted before carrying out an 'ANALYZE' 
> on the specified table for the first time. Defaults to 100.
> 2. 'org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate'
> Specify the minimum number of documents processed per minute. If the 
> processing rate of documents drops below this threshold, the 'ANALYZE' will 
> be executed. Defaults to 1.
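
For discussion, the two proposed conditions could be combined roughly as in the Java sketch below. This is not the attached patch and not existing ManifoldCF code: the class name AnalyzeTrigger, its methods, and the fixed one-minute window are assumptions made only for illustration, and the thresholds are passed in directly instead of being read from the proposed properties. Treating "enough rows" as inserted rows >= minimumrowcount is also an assumption.

```java
// Hypothetical helper: decides when an extra ANALYZE might be worth running.
final class AnalyzeTrigger {
    private static final long WINDOW_MILLIS = 60_000L; // "per minute" window (assumed)

    private final long minimumRowCount;    // proposed default: 100
    private final long minimumProcessRate; // documents per minute, proposed default: 1

    private long insertedRows;             // rows inserted since the job started
    private boolean firstAnalyzeDone;      // condition 1 fires only once
    private long windowStartMillis;        // start of the current measuring window
    private long processedInWindow;        // documents processed in that window

    AnalyzeTrigger(long minimumRowCount, long minimumProcessRate, long startMillis) {
        this.minimumRowCount = minimumRowCount;
        this.minimumProcessRate = minimumProcessRate;
        this.windowStartMillis = startMillis;
    }

    void recordInserted(long rows) {
        insertedRows += rows;
    }

    void recordProcessed(long documents) {
        processedInWindow += documents;
    }

    /** Returns true when either proposed condition says ANALYZE should run now. */
    boolean shouldAnalyze(long nowMillis) {
        boolean due = false;

        // Condition 1: first ANALYZE once enough rows have been inserted.
        if (!firstAnalyzeDone && insertedRows >= minimumRowCount) {
            firstAnalyzeDone = true;
            due = true;
        }

        // Condition 2: at the end of each full window, compare the number of
        // processed documents with the minimum rate, then start a new window.
        if (nowMillis - windowStartMillis >= WINDOW_MILLIS) {
            if (processedInWindow < minimumProcessRate) {
                due = true;
            }
            windowStartMillis = nowMillis;
            processedInWindow = 0;
        }
        return due;
    }
}
```

A caller would invoke shouldAnalyze(System.currentTimeMillis()) periodically and, when it returns true, issue ANALYZE on the table in question. A real implementation would probably also need to ignore idle periods in which no job is running, so that an empty queue is not mistaken for a slow crawl.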





[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-06 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.   
(was: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the database. )

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there 
> is nothing logged about long-running queries. The performance can be restored 
> by firing the 'ANALYZE' command manually. It seems that a bad query plan 
> caused this performance problem.
> Therefore, in addition to the current configuration parameter 
> org.apache.manifoldcf.db.postgres.analyze. , it is considered 
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for 
> creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing 
> rate of documents drops below a specified threshold.
> So, how about adding two parameters to handle the timing of 'ANALYZE' 
> execution as below?
> 1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
> Specify how many records should be inserted before carrying out an 'ANALYZE' 
> on the specified table for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
> Specify the number of documents processed in the last minute. If the actual 
> processing rate falls below this, the 'ANALYZE' will be executed. Defaults to 
> 1.





[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-06 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Description: 
Sometimes, the crawling does not process any documents for a while and there is 
nothing logged about long-running queries. The performance can be restored by 
firing the 'ANALYZE' command manually. It seems that a bad query plan caused 
this performance problem.

Therefore, in addition to the current configuration parameter 
org.apache.manifoldcf.db.postgres.analyze. , it is considered 
necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for 
creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing 
rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution 
as below?
1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
Specify how many records should be inserted before carrying out an 'ANALYZE' on 
the specified table for the first time. Defaults to 100.
2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
Specify the number of documents processed in the last minute. If the actual 
processing rate falls below this, the 'ANALYZE' will be executed. Defaults to 1.

  was:
Sometimes, the crawling does not process any documents for a while and there is 
nothing logged about long-running queries. The performance can be restored by 
firing the 'ANALYZE' command manually. It seems that a bad query plan caused 
this performance problem.

Therefore, in addition to the current configuration parameter 
org.apache.manifoldcf.db.postgres.analyze. , it is considered 
necessary to execute the 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for 
creating a query plan after the job starts.
2. When the crawling performance slows down. For example, if the document 
processing rate drops below a specified threshold. 

How about adding two parameters to handle the timing of 'ANALYZE' execution as 
below?
1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
Specify how many records should be accumulated before carrying out an 'ANALYZE' 
on the specified table for the first time. Defaults to 100.
2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
Specify the number of documents processed in the last minute. If the actual 
processing rate falls below this, the 'ANALYZE' will be carried out. Defaults 
to 1.


> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there 
> is nothing logged about long-running queries. The performance can be restored 
> by firing the 'ANALYZE' command manually. It seems that a bad query plan 
> caused this performance problem.
> Therefore, in addition to the current configuration parameter 
> org.apache.manifoldcf.db.postgres.analyze. , it is considered 
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for 
> creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing 
> rate of documents drops below a specified threshold.
> So, how about adding two parameters to handle the timing of 'ANALYZE' 
> execution as below?
> 1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
> Specify how many records should be inserted before carrying out an 'ANALYZE' 
> on the specified table for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
> Specify the number of documents processed in the last minute. If the actual 
> processing rate falls below this, the 'ANALYZE' will be executed. Defaults to 
> 1.





[jira] [Updated] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

2023-05-06 Thread Mingchun Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao updated CONNECTORS-1746:
--
Summary: Adding conditions to execute PostgreSQL's ANALYZE command to avoid 
crawling become extremely slow.  (was: Adding execution conditions of 
PostgreSQL's ANALYZE command to avoid crawling become extremely slow.)

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --
>
> Key: CONNECTORS-1746
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
> Environment: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>Reporter: Mingchun Zhao
>Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there 
> is nothing logged about long-running queries. The performance can be restored 
> by firing the 'ANALYZE' command manually. It seems that a bad query plan 
> caused this performance problem.
> Therefore, in addition to the current configuration parameter 
> org.apache.manifoldcf.db.postgres.analyze. , it is considered 
> necessary to execute the 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for 
> creating a query plan after the job starts.
> 2. When the crawling performance slows down. For example, if the document 
> processing rate drops below a specified threshold. 
> How about adding two parameters to handle the timing of 'ANALYZE' execution 
> as below?
> 1. `org.apache.manifoldcf.db.postgres.analyze..minimumrowcount`
> Specify how many records should be accumulated before carrying out an 
> 'ANALYZE' on the specified table for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze..minimumprocessrate`
> Specify the number of documents processed in the last minute. If the actual 
> processing rate falls below this, the 'ANALYZE' will be carried out. 
> Defaults to 1.


