Hi Shawn,

Here is my top screenshot:
https://www.dropbox.com/s/jaw10mkmipz943y/topscreen.jpg?dl=0

It was captured while the system was running normally. I have also reduced
the memory size from 64GB down to 48GB.

We have two hardware clusters, each made up of 3 machines. On one cluster we
deploy 3 different SolrCloud application clusters. The top screenshot above is
from the machine that crashed at 4:30 PM yesterday.

For convenience, here is a top screenshot from a machine in the other cluster:
https://www.dropbox.com/s/p3j3bpcl8l2i1nt/another64GBnodeTop.jpg?dl=0

On this machine, the biggest SolrCloud node, whose JVM memory size is 64GB,
holds a 730GB index. The machine hung for a long time in the middle of last
night.

We also captured iotop output while it was hung:
https://www.dropbox.com/s/keqqjabmon9f1ea/anthoer64GBnodeIotop.jpg?dl=0

As the iotop output shows, the jbd2 process is writing heavily. I think it
will be helpful.

Best Regards

2016-03-17 7:35 GMT+08:00 Shawn Heisey <apa...@elyograg.org>:

> On 3/16/2016 8:59 AM, Patrick Plaatje wrote:
> > From the sar output you supplied, it looks like you might have a memory
> > issue on your hosts. The memory usage just before your crash seems to be
> > *very* close to 100%. Even the slightest increase (Solr itself, or possibly
> > by a system service) could have caused the system crash. What are the
> > specifications of your hosts and how much memory are you allocating?
>
> It's completely normal for a machine, especially a machine running Solr
> with a very large index, to run at nearly 100% memory usage. The
> "Average" line from the sar output indicates 97.45 percent usage, but it
> also shows 81GB of memory in the "kbcached" column -- this is memory
> that can be instantly claimed by any program that asks for it. If we
> discount this 81GB, since it is instantly available, the "true" memory
> usage is closer to 70 percent than 100.
>
> https://en.wikipedia.org/wiki/Page_cache
>
> If YouPeng can run top and sort it by memory usage (press shift-M), then
> grab a screenshot, that will be helpful for more insight. Here's an
> example of this from one of my servers, shared on dropbox:
>
> https://www.dropbox.com/s/qfuxhw20q0y1ckx/linux-8gb-heap.png?dl=0
>
> This is a server with 64GB of RAM and 110GB of index data. About 48GB
> of my memory is used by the disk cache. I've got slightly less than
> half my index data in the cache.
>
> Thanks,
> Shawn
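
The cache-discounting arithmetic described in the quoted reply can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function names are made up, the argument names follow the
columns of `sar -r`, and it assumes the older sysstat convention in which
kbmemused includes buffers and page cache (newer sysstat releases compute
kbmemused differently). The sample values are hypothetical, loosely modeled
on the 64GB server with about 48GB of disk cache mentioned above; the free
and buffer figures are invented.

```python
KIB_PER_GIB = 1024 * 1024


def raw_mem_used_pct(kbmemfree, kbmemused):
    """Percent of RAM reported as used, page cache and buffers included."""
    total = kbmemfree + kbmemused
    if total <= 0:
        raise ValueError("total memory must be positive")
    return 100.0 * kbmemused / total


def effective_mem_used_pct(kbmemfree, kbmemused, kbbuffers, kbcached):
    """Percent of RAM in use after discounting reclaimable buffers and cache.

    Assumes kbmemused already includes kbbuffers and kbcached
    (older sysstat behaviour); this is an assumption, not a given.
    """
    total = kbmemfree + kbmemused
    if total <= 0:
        raise ValueError("total memory must be positive")
    return 100.0 * (kbmemused - kbbuffers - kbcached) / total


# Hypothetical figures for a 64 GiB host, expressed in KiB like sar does.
sample = {
    "kbmemfree": 2 * KIB_PER_GIB,   # invented
    "kbmemused": 62 * KIB_PER_GIB,  # invented
    "kbbuffers": 1 * KIB_PER_GIB,   # invented
    "kbcached": 48 * KIB_PER_GIB,   # roughly the disk cache size quoted above
}

raw = raw_mem_used_pct(sample["kbmemfree"], sample["kbmemused"])
effective = effective_mem_used_pct(**sample)

print(f"raw usage:       {raw:.2f}%")
print(f"effective usage: {effective:.2f}%")
```

With these made-up numbers the raw figure is close to 100 percent while the
effective figure is far lower, which is the point of the reply: a high
"used" percentage alone does not show that a Solr host is short on memory.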