Postgres db maintenance

2019-02-08 Thread Bisonti Mario
Hello. I noticed that my Postgres database is 28 GB. Is there a way to clean out old data, or do I need to keep all the data in my database? Thanks a lot, Mario

Re: Postgres db maintenance

2019-02-08 Thread Karl Wright
The only "old data" kept by MCF is the history information. By default it's expunged after 30 days. You can shorten the retention time, though, by setting a properties.xml parameter (refer to the "how-to-build-and-deploy" page for details). Karl On Fri, Feb 8, 2019 at
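
As a rough illustration of the retention setting mentioned above, the sketch below converts a retention period in days to milliseconds and renders a properties.xml-style entry. The property name and the millisecond unit are assumptions that the thread does not confirm; check the "how-to-build-and-deploy" page before using either.

```python
from datetime import timedelta

# ASSUMPTION: the thread does not name the parameter. This name and the
# millisecond unit must be verified against the ManifoldCF documentation.
ASSUMED_PROPERTY = "org.apache.manifoldcf.crawler.historycleanupinterval"


def retention_millis(days: int) -> int:
    """Convert a retention period in days to milliseconds."""
    return int(timedelta(days=days).total_seconds() * 1000)


def property_line(days: int) -> str:
    """Render a properties.xml-style entry for the given retention period."""
    return f'<property name="{ASSUMED_PROPERTY}" value="{retention_millis(days)}"/>'


if __name__ == "__main__":
    # Example: keep history for one week instead of 30 days.
    print(property_line(7))
```

A shorter retention period only limits how much history is kept; it does not by itself return disk space that the database has already allocated.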

ManifoldCF + Postgresql - long freeze on job

2019-02-08 Thread LIROT Daniel - SG/SPSSI/CPII/DOSO/ET
Hello, We use ManifoldCF v2.10 with PostgreSQL (9.6) to crawl our websites. This represents approximately 1.2 million documents. We split the crawl into 4 jobs that distribute their results across 3 Solr collections. The crawl performs well up to 50 documents (25000 to 3 docs / hour) then

Re: ManifoldCF + Postgresql - long freeze on job

2019-02-08 Thread Karl Wright
Hello, (1) What database are you using for this? Some databases require periodic maintenance or have other heavy-usage constraints. (2) Every time a query takes more than a minute to execute, it is logged, along with the query plan. You need to look at the ManifoldCF log to see which

Sharepoint Job - Incremental Crawling

2019-02-08 Thread Gaurav G
Hi All, We're trying to crawl a Sharepoint repo with about 3 docs. Ideally we would like to be able to synchronize changes with the repo within 30 minutes. We are scheduling incremental crawling on this. Our observation is that a full crawl takes about 60-75 minutes. So if we schedule the

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Karl Wright
Hi Gaurav, The right way to do this is to schedule "minimal" crawls every 15 minutes (which will process only the minimum needed to deal with adds and updates), and periodically perform "full" crawls (which will also include deletions). Thanks, Karl On Fri, Feb 8, 2019 at 10:11 AM Gaurav G
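
The pattern described above (frequent "minimal" crawls with an occasional "full" crawl) can be sketched as a simple timetable. This is only an illustration and not ManifoldCF's scheduler or API: `build_schedule` is a hypothetical helper, the 24-hour full-crawl interval is an assumed value, and the sketch ignores how long each crawl actually takes to run.

```python
from datetime import datetime, timedelta


def build_schedule(start, end,
                   minimal_every=timedelta(minutes=15),
                   full_every=timedelta(hours=24)):
    """Return (start_time, kind) slots between start and end.

    Every slot is a "minimal" crawl except when a "full" crawl is due,
    in which case the full crawl takes that slot.
    """
    schedule = []
    slot = start
    next_full = start  # ASSUMPTION: the first slot is a full crawl
    while slot < end:
        if slot >= next_full:
            schedule.append((slot, "full"))
            next_full = slot + full_every
        else:
            schedule.append((slot, "minimal"))
        slot += minimal_every
    return schedule


if __name__ == "__main__":
    begin = datetime(2019, 2, 8, 0, 0)
    for when, kind in build_schedule(begin, begin + timedelta(hours=1)):
        print(when.strftime("%H:%M"), kind)
```

In practice the interval between minimal crawls cannot be shorter than the time a minimal crawl needs to finish, so the 15-minute spacing is a target rather than a guarantee.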

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Gaurav G
Hi Karl, Thanks for the response. We tried scheduling a minimal crawl for 15 minutes. At the end of the fifteen minutes it stops with about 3000 docs in the processing state and takes about 20-25 minutes to stop. Then the question becomes when to schedule the next crawl. And also in those 15 minutes would it

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Karl Wright
It does the minimum necessary. That means it can't do the job in less time. If this is a business requirement, then you should be angry with whoever made this requirement. SharePoint doesn't give you the ability to grab all changed or added documents up front. You have to crawl to discover them. That

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Gaurav G
Got it. Is there any way we can increase the speed of the minimal crawl? Currently we are running one VM for ManifoldCF with 8 cores and 32 GB RAM. Postgres runs on another machine with a similar configuration. We have tuned the Postgres and ManifoldCF parameters as per the recommendations. We run a

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Karl Wright
The problem is not the speed of Manifold, but rather the work it has to do and the performance of SharePoint. All the speed in the world in the crawler will not fix the bottleneck that is SharePoint. Karl On Fri, Feb 8, 2019 at 4:06 PM Gaurav G wrote: > Got it. > Is there any way we can

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Gaurav G
Hi Karl, Thanks for your insights. So I'm thinking of exploring the following options to get the best performance. Your thoughts? Is the first option the one that might give the most bang for the buck? 1) Ask the SharePoint application team to dedicate a web and app server specifically