Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
If you are already on PostgreSQL, then the memory usage is likely due to the Tika Extractor. How much memory Tika uses for any given document is not well determined; we try never to load documents into memory, but in some situations Tika uses a ton of memory nonetheless. The more worker threads

Re: Manifold Crawler Crashes

2019-06-20 Thread Priya Arora
> I would highly recommend moving to Postgresql if you have any really sizable crawl.
Yes, we are already using PostgreSQL 9.6.10 for it. Below are the settings in the postgresql.conf file of our Postgres server:
max_connections = 100
shared_buffers = 128MB
#temp_buffers = 8MB
#max_prepared_transactions

Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
If you are running single-process on top of HSQLDB, all database tables are kept in memory so you need a lot of memory. I would highly recommend moving to Postgresql if you have any really sizable crawl. Alternatively you could just hand the manifoldCF process more memory. Your choice.

Re: Manifold Crawler Crashes

2019-06-20 Thread Priya Arora
Hi Karl,
1) It's a single-process deployment.
2) Not able to access it through bash (while the crash is happening).
3) Server configuration: Crawler server - 16 GB RAM and 8-Core Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz; Elasticsearch server - 48GB and 1-Core Intel(R) Xeon(R) CPU E5-2660

Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
Hi Priya, Being unable to reach the web interface sounds like either a network issue or a problem with the app server. Can you describe the configuration you are running in? Is this a multiprocess deployment or a single-process deployment? When your docker container dies, can you still reach

Re: Manifold Crawler Crashes

2019-06-20 Thread Priya Arora
Hi Karl, By "crash" I mean that a "the site could not be reached" kind of HTML page appears when accessing http://localhost:3000/mcf-crawler-ui/index.jsp. Explanation: When running a certain job on the ManifoldCF server (2.13), after some time in a successfully running state, the browser suddenly gives me "the site

Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
Please describe what you mean by "crash". What actually happens?
Karl
On Thu, Jun 20, 2019, 2:04 AM Priya Arora wrote:
> Hi,
> I am running multiple jobs(2,3) simultaneously on Manifold server and the configuration is
> 1) For Crawler server - 16 GB RAM and 8-Core Intel(R) Xeon(R)