I am observing some weird behavior in how Solr uses memory. We run Solr and ZooKeeper on the same node. We tested memory settings on a SolrCloud setup with 1 shard and a 146GB index, and on a 2-shard Solr setup with a 44GB index. Both run on similarly beefy machines.
After running the setup for 3-4 days, I see that a lot of memory is inactive on all the nodes:

total memory:    99052952
used memory:     98606256
active memory:   19143796
inactive memory: 75063504

The inactive memory is never reclaimed by the OS, and once total memory is exhausted, latency and disk IO shoot up. We observed this behavior in both the 1-shard SolrCloud setup and the 2-shard Solr setup.

For the SolrCloud setup, we run a cron job with the following command to clear out the inactive memory, and it works as expected: even though the Cloud index is 146GB, used memory stays below 55GB, our response times are better, and no errors or exceptions are thrown. (The same command causes issues in the 2-shard setup.)

echo 3 > /proc/sys/vm/drop_caches

We have disabled the query, document, and Solr caches in our setup. ZooKeeper uses around 10GB of memory, and no other processes run on this system.

Has anyone faced this issue before?
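For anyone trying to reproduce the observation, the numbers above come straight from the kernel's memory accounting. A minimal sketch of the check we run (field names are the standard Linux /proc/meminfo keys; the formatting is our own, not from any Solr tooling) is:

```shell
#!/bin/sh
# Print the total/active/inactive memory split that we monitor.
# Matches only the top-level MemTotal:, Active:, and Inactive: lines,
# not the Active(anon)/Active(file) breakdowns.
awk '/^(MemTotal|Active|Inactive):/ { printf "%-10s %10d kB\n", $1, $2 }' /proc/meminfo

# The cache drop itself must run as root; this is the command from the
# cron job above, shown here only as a comment because it is disruptive:
#   echo 3 > /proc/sys/vm/drop_caches
```

Note that "inactive" here is mostly page cache holding the memory-mapped index files, which is why dropping the caches frees it.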