Hi everyone! I've been struggling with a performance problem for a couple of weeks now. We have two environments:
- `dev` with 2 nodes of ManifoldCF agent + 1 node of Zookeeper
- `prod` with 4 nodes of ManifoldCF agent + 3 nodes of Zookeeper

The ManifoldCF agent settings are identical at the moment. We have explicitly set the following:

- 200 db handles (`org.apache.manifoldcf.database.maxhandles`)
- 100 worker threads (`org.apache.manifoldcf.crawler.threads`)
- 10 expire threads (`org.apache.manifoldcf.crawler.expirethreads`)
- 10 cleanup threads (`org.apache.manifoldcf.crawler.cleanupthreads`)
- 10 document delete threads (`org.apache.manifoldcf.crawler.deletethreads`)

(I have tried prod with various configs, but the result is the same.)

In the Postgres config, we currently have `max_connections` set to `840` and `shared_buffers` set to `244559`.

I have a job that runs really slowly on the production environment compared to the development environment. I have monitored the JVM using VisualVM and noticed that all worker threads spend almost all of their time in the `WAITING` or `TIMED_WAITING` state. I grabbed a lot of thread dumps, and almost every time I found the worker threads waiting on `LockGate`. What could be causing this, and what can I do about it?

Another thing that makes threads sleep for a long time is concurrent modification failures raised by PostgreSQL due to the use of the `SERIALIZABLE` isolation level. After a failure, the thread is put to sleep for a random time of up to `60000` milliseconds. This is by design, but is there a way to reduce the number of these failures?

I will be grateful for any hints or ideas. Thank you!

With respect,
Abeleshev Artem