Re: Job Content Length issue

2021-02-17 Thread Karl Wright
The internal Tika is not memory-bounded; some transformations stream, but others load everything into memory. You can try using the external Tika, with a Tika instance you run separately, and that would likely help. But you may need to give it lots of memory too. Karl On Wed, Feb 17, 2021 at

Re: Job Content Length issue

2021-02-17 Thread ritika jain
Hi Karl, I am using Elasticsearch as the output connector and, yes, the internal Tika extractor; I am not using a Solr output connection. Also, the Elasticsearch server is hosted on a different server with a huge memory allocation. On Tue, Feb 16, 2021 at 7:29 PM Karl Wright wrote: > Hi, do you mean

Re: Job Content Length issue

2021-02-16 Thread Karl Wright
Hi, do you mean content limiter length of 100? I assume you are using the internal Tika transformer? Are you combining this with a Solr output connection that is not using the extract handler? By "manifold crashes" I assume you actually mean it runs out of memory. The "long running query"

Job Content Length issue

2021-02-16 Thread ritika jain
Hi users, I am using the ManifoldCF 2.14 Fileshare connector to crawl files from an SMB server that has millions, if not billions, of records to process and crawl. Total system memory is 64 GB, of which 32 GB is allocated to ManifoldCF in its start options file. We have some larger files to crawl around