Hi. I just saw this thread.
I believe Microsoft recommends a dedicated document source instance for larger
corpora.
I know in my SharePoint days we often frustrated users by making SharePoint
very slow while we were crawling. That was mostly solved by having a dedicated
source node.
S
On Sat, Feb 9, 2019, 2:10
Hi Gaurav,
The number of connections you permit should depend on the resources of the
SharePoint instance you're crawling. ManifoldCF will limit the number of
connections to that instance to the number you select. Making it larger
might help if there are plenty of resources on the SharePoint side.
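
For reference, one rough way to pick that number is to probe the SharePoint
instance directly and watch where latency starts to climb. Below is a minimal
sketch in plain Python; it is hypothetical and not part of ManifoldCF, the URL
is a placeholder, and the Windows/NTLM authentication a real site would
require is omitted:

# Hypothetical capacity probe (not part of ManifoldCF): measure how response
# times degrade as concurrency rises, to pick a sane connection limit.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SITE_URL = "https://sharepoint.example.com/sites/corpus"  # placeholder site URL

def timed_get(url):
    # Fetch the page and return the elapsed wall-clock time in seconds.
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
    return time.monotonic() - start

for concurrency in (2, 4, 8, 16, 32):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_get, [SITE_URL] * (concurrency * 4)))
    print(f"{concurrency:3d} concurrent: avg {sum(latencies)/len(latencies):.2f}s, "
          f"max {max(latencies):.2f}s")

When average latency climbs sharply at a given concurrency, that is roughly
the ceiling worth configuring as the connection limit.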
Hi Karl,
Thanks for your insights. I'm thinking of exploring the following options to
get the best performance. Your thoughts? Is the first option the one that
might give the most bang for the buck?
1) Ask the SharePoint application team to dedicate a web and app server
specifically for crawl traffic
The problem is not the speed of Manifold, but rather the work it has to do
and the performance of SharePoint. All the speed in the world in the
crawler will not fix the bottleneck that is SharePoint.
Karl
On Fri, Feb 8, 2019 at 4:06 PM Gaurav G wrote:
Got it.
Is there any way we can increase the speed of the minimal crawl? Currently
we are running one VM for ManifoldCF with 8 cores and 32 GB RAM. Postgres
runs on another machine with a similar configuration. We have tuned the
Postgres and ManifoldCF parameters as per the recommendations. We run a
It does the minimum necessary, which means it can't be done in less time. If
this is a business requirement, then you should be angry with whoever made
this requirement.
SharePoint doesn't give you the ability to grab all changes or added
documents up front. You have to crawl to discover them. That
Hi Karl,
Thanks for the response. We tried scheduling the minimal crawl on a 15-minute
window. At the end of fifteen minutes it stops, with about 3000 docs still in
a processing state, and it takes about 20-25 minutes to fully stop. Then the
question becomes when to schedule the next crawl. And also, in those 15
minutes, would it
Hi Gaurav,
The right way to do this is to schedule "minimal" crawls every 15 minutes
(which will process only the minimum needed to deal with adds and updates),
and periodically perform "full" crawls (which will also include deletions).
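
One way to avoid overlapping runs when a crawl's tail exceeds the interval is
to drive the schedule externally: poll the job status and only kick off the
next minimal crawl once the previous one has wound down. Here is a rough
sketch against the ManifoldCF REST API; the startminimal and jobstatuses
paths follow the ManifoldCF API documentation, but the port, job id, response
field names, and status strings below are assumptions to verify against your
install:

# Sketch: fire a minimal crawl only when the previous run has finished.
# Assumptions: ManifoldCF API service at its default quick-start address,
# and an existing job whose id you have looked up beforehand.
import json
import time
import urllib.request

BASE = "http://localhost:8345/mcf-api-service/json"  # assumed API base URL
JOB_ID = "1234567890"  # placeholder: your ManifoldCF job id

def job_status():
    # Read the job's current status string from the API.
    with urllib.request.urlopen(f"{BASE}/jobstatuses/{JOB_ID}") as resp:
        return json.load(resp)["jobstatus"]["status"]

def start_minimal():
    # Request a minimal (adds/updates only) crawl of the job.
    req = urllib.request.Request(f"{BASE}/startminimal/{JOB_ID}", method="PUT")
    urllib.request.urlopen(req)

while True:
    if job_status() not in ("running", "starting up", "aborting"):
        start_minimal()
    time.sleep(15 * 60)  # re-check every 15 minutes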
Thanks,
Karl
On Fri, Feb 8, 2019 at 10:11 AM Gaurav G wrote:
Hi All,
We're trying to crawl a SharePoint repo with about 3 docs. Ideally we
would like to be able to synchronize changes with the repo within 30
minutes. We are scheduling incremental crawling on this. Our observation is
that a full crawl takes about 60-75 minutes. So if we schedule the