[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-27 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726850#comment-17726850
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

[~kwri...@metacarta.com] Thanks, that's so helpful!

> Add a property to disable logging hop count to database
> ---
>
> Key: CONNECTORS-1747
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1747
> Project: ManifoldCF
> Issue Type: Improvement
> Reporter: Mingchun Zhao
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.25
>
> Attachments: CONNECTORS-1747.patch
>
>
> If we do not require the “Hop Filters” feature, we should consider disabling 
> the logging of hopcount-related records to database tables such as 
> "intrinsiclink" and "hopcount". This can increase throughput and reduce the 
> rate of growth of the database.
> I will try to create a patch for this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-27 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726821#comment-17726821
 ] 

Karl Wright commented on CONNECTORS-1747:
-

I am away from fast internet until Monday.  I will put up a release candidate 
then and call a vote.




[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-27 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726777#comment-17726777
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

[~kwri...@metacarta.com], if you need help with the release work, I'll do 
whatever I can, so please feel free to ask. I am sorry that I have not been 
able to participate in ManifoldCF community activities until now, but I will 
be actively involved from now on. I will also try to contact committers in 
Japan so that we can liven up ManifoldCF together. Thank you.



[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-24 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725989#comment-17725989
 ] 

Karl Wright commented on CONNECTORS-1747:
-

I can put up a release candidate easily enough; however, it may be hard to get 
a voting quorum.  That's the issue these days in this project.




[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-24 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725971#comment-17725971
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

Hi [~kwri...@metacarta.com],

Thank you for reviewing the patch.
By the way, is there a plan for the next ManifoldCF release? The 
project I'm participating in now uses ManifoldCF+PostgreSQL and will be 
released at the end of next month. It would be very helpful if the latest 
version of MCF could be used. If there is anything I can do, I will actively 
participate in the MCF release work.

Kind regards,
Mingchun



[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-24 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725921#comment-17725921
 ] 

Karl Wright commented on CONNECTORS-1747:
-

This looks good.  I'll try to commit it tonight.




[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-23 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725478#comment-17725478
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

Hi [~kwri...@metacarta.com],

I've created a patch as you described above. Could you please review the 
attached [^CONNECTORS-1747.patch]?
In my testing with the attached patch, when I set the additional property as 
below,
``
I confirmed:
(1) The hopcount handling was completely disabled.
(2) No records were inserted into the `intrinsiclink` or `hopcount` tables.
(3) The hopcount tab did not appear in the UI for any job. 

Regards,
Mingchun



[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-21 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724663#comment-17724663
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

[~kwri...@metacarta.com] Thank you for your review, it was very helpful. I 
understand, and I will revise the patch as you described above.



[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-21 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724654#comment-17724654
 ] 

Karl Wright commented on CONNECTORS-1747:
-

Hi - so just to be clear, what you need to do here is:
(1) Introduce a property, as you have done, that disables support for hopcount 
handling completely.  It obviously should be a global cluster property, not a 
local one.
(2) When that property is set, the HopCount.java class should never record 
anything in the intrinsicLinks or HopCount tables at all.
(3) When that property is set, the Hopcount tab should not appear in the UI for 
any job.
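
The three requirements above can be sketched roughly as follows. This is a minimal illustration in Java, not the actual patch: the class `HopcountGate`, its methods, and the tab names other than "Hop Filters" are hypothetical, and an in-memory list stands in for the intrinsiclink and hopcount tables.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the hopcount bookkeeping; not ManifoldCF code. */
final class HopcountGate {
    // One cluster-wide flag, read once, rather than a per-node setting.
    private final boolean hopcountDisabled;
    // Stands in for rows written to the link and hopcount tables.
    private final List<String> recordedRows = new ArrayList<>();

    HopcountGate(boolean hopcountDisabled) {
        this.hopcountDisabled = hopcountDisabled;
    }

    /** Records a parent-to-child link unless hopcount handling is disabled. */
    void recordLink(String parent, String child) {
        if (hopcountDisabled) {
            return; // nothing is written when the flag is set
        }
        recordedRows.add(parent + " -> " + child);
    }

    int recordedRows() {
        return recordedRows.size();
    }

    /** Returns the job tabs to show; the hopcount tab is hidden when disabled. */
    List<String> jobTabs() {
        // "Name" and "Scheduling" are placeholder tab names for illustration.
        List<String> tabs = new ArrayList<>(List.of("Name", "Scheduling"));
        if (!hopcountDisabled) {
            tabs.add("Hop Filters");
        }
        return tabs;
    }
}
```

With the flag set, `recordLink` becomes a no-op and `jobTabs()` omits the hopcount tab, so both the write path and the UI are driven by the same single setting.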




[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-21 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724652#comment-17724652
 ] 

Karl Wright commented on CONNECTORS-1747:
-

[~mingchun.zhao], it will be necessary to also disable the hopcount tab for all 
jobs entirely if you set this flag, since essentially the installation no 
longer can track hopcount at all.  Please include that in your commit, thanks.






[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-16 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723328#comment-17723328
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

Hello,

I changed the spec for the new property as described below. Could you please 
review the attached new patch [^JobManager.java.patch2]?



You can use this property to disable logging hopcount to the database, but 
only for jobs whose hopcount mode is set to "keep unreachable documents, 
forever" in the "Hop Filters" tab.
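
As a rough sketch of the rule described in this comment (not the code in the attached patch), the decision could look like the following Java. The class, method, and enum names are hypothetical, and the two modes other than "keep unreachable documents, forever" are assumed for illustration.

```java
/** Hypothetical decision helper; names are illustrative, not ManifoldCF's API. */
final class HopcountPolicy {
    /** Assumed hopcount modes; only KEEP_UNREACHABLE_FOREVER is named in the comment. */
    enum HopcountMode {
        DELETE_UNREACHABLE,
        KEEP_UNREACHABLE_FOR_NOW,
        KEEP_UNREACHABLE_FOREVER
    }

    private HopcountPolicy() {}

    /**
     * Returns whether hopcount records should be written for a job.
     * Setting the property to false only takes effect for jobs that keep
     * unreachable documents forever; all other jobs still log hopcount.
     */
    static boolean shouldStoreHopcount(boolean storeHopcountProperty, HopcountMode mode) {
        if (storeHopcountProperty) {
            return true;
        }
        return mode != HopcountMode.KEEP_UNREACHABLE_FOREVER;
    }
}
```

For example, `shouldStoreHopcount(false, HopcountMode.KEEP_UNREACHABLE_FOREVER)` is the only combination that skips the writes.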



[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-15 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722975#comment-17722975
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

Hello, 
If there are no objections to the above patch, would it be okay to commit it in 
a couple of days?



[jira] [Commented] (CONNECTORS-1747) Add a property to disable logging hop count to database

2023-05-15 Thread Mingchun Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722789#comment-17722789
 ] 

Mingchun Zhao commented on CONNECTORS-1747:
---

Hello,

I’ve created a patch that adds the following property to disable logging 
hopcount to the database.

"org.apache.manifoldcf.db.postgres.crewler.jobs.store_hopcount"

If you do not need hopcount information, setting this property to false 
disables logging hopcount to the related database tables. This can increase 
throughput and reduce the rate of growth of the database. It defaults to true 
(hopcount is logged to the database).
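
A minimal sketch of the default behavior, assuming the value is available through a plain `java.util.Properties` object (ManifoldCF's real configuration lookup may differ). The helper class `HopcountSetting` is hypothetical; the property name is copied as written above.

```java
import java.util.Properties;

/** Hypothetical helper; not ManifoldCF's actual configuration API. */
final class HopcountSetting {
    // Property name exactly as quoted in this comment.
    static final String STORE_HOPCOUNT =
        "org.apache.manifoldcf.db.postgres.crewler.jobs.store_hopcount";

    private HopcountSetting() {}

    /**
     * Returns true (keep logging hopcount) when the property is absent,
     * and false only when it is explicitly set to "false" in any letter case.
     */
    static boolean storeHopcount(Properties props) {
        String raw = props.getProperty(STORE_HOPCOUNT);
        if (raw == null) {
            return true; // default: hopcount is logged to the database
        }
        return !"false".equalsIgnoreCase(raw.trim());
    }
}
```

Treating every value other than "false" as true keeps existing installations unchanged unless they opt out explicitly.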

In my testing with the attached patch, I ran the same job with the property 
“store_hopcount” set to true and then to false and compared the execution 
times. With the property set to false, throughput doubled and the rate of 
growth of the database was cut by more than half, while the number of crawled 
documents remained the same.

[^JobManager.java.patch]
