Has anyone besides Shawn and me been able to reproduce this problem? Shawn contacted Oracle off-list, but the response was useless at best (attach JConsole, watch the heap, etc.).
Is this a real problem, or just a reporting issue between the JVM and Linux?

Thanks,
Markus

-----Original message-----
> From:Markus Jelsma <markus.jel...@openindex.io>
> Sent: Thursday 24th August 2017 17:20
> To: solr-user@lucene.apache.org
> Subject: RE: Solr uses lots of shared memory!
>
> Hello Bernd,
>
> According to the man page, I should get a list of what is in shared memory
> if I invoke jmap with just a PID. That shows a list of libraries that
> together account for about 25 MB of shared memory. According to ps and top,
> the JVM uses 2800 MB of shared memory (not virtual), which leaves 2775 MB
> unaccounted for. Any ideas? Can anyone else reproduce it on a freshly
> restarted node?
>
> Thanks,
> Markus
>
>   PID USER   PR NI VIRT    RES    SHR    S %CPU  %MEM TIME+    COMMAND
> 18901 markus 20 0  14,778g 4,965g 2,987g S 891,1 31,7 20:21.63 java
>
> 0x000055b9a17f1000 6K /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
> 0x00007fdf1d314000 182K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libsunec.so
> 0x00007fdf1e548000 38K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libmanagement.so
> 0x00007fdf1e78e000 94K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
> 0x00007fdf1e9a6000 75K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnio.so
> 0x00007fdf5cd6e000 34K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libzip.so
> 0x00007fdf5cf77000 46K /lib/x86_64-linux-gnu/libnss_files-2.24.so
> 0x00007fdf5d189000 46K /lib/x86_64-linux-gnu/libnss_nis-2.24.so
> 0x00007fdf5d395000 90K /lib/x86_64-linux-gnu/libnsl-2.24.so
> 0x00007fdf5d5ae000 34K /lib/x86_64-linux-gnu/libnss_compat-2.24.so
> 0x00007fdf5d7b7000 187K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjava.so
> 0x00007fdf5d9e6000 70K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libverify.so
> 0x00007fdf5dbf8000 30K /lib/x86_64-linux-gnu/librt-2.24.so
> 0x00007fdf5de00000 90K /lib/x86_64-linux-gnu/libgcc_s.so.1
> 0x00007fdf5e017000 1063K /lib/x86_64-linux-gnu/libm-2.24.so
> 0x00007fdf5e320000 1553K /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
> 0x00007fdf5e6a8000 15936K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> 0x00007fdf5f5ed000 139K /lib/x86_64-linux-gnu/libpthread-2.24.so
> 0x00007fdf5f80b000 14K /lib/x86_64-linux-gnu/libdl-2.24.so
> 0x00007fdf5fa0f000 110K /lib/x86_64-linux-gnu/libz.so.1.2.11
> 0x00007fdf5fc2b000 1813K /lib/x86_64-linux-gnu/libc-2.24.so
> 0x00007fdf5fff2000 58K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/jli/libjli.so
> 0x00007fdf60201000 158K /lib/x86_64-linux-gnu/ld-2.24.so
>
> -----Original message-----
> > From:Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
> > Sent: Thursday 24th August 2017 15:39
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr uses lots of shared memory!
> >
> > Just an idea, how about taking a dump with jmap and using
> > MemoryAnalyzerTool to see what is going on?
> >
> > Regards
> > Bernd
> >
> >
> > On 24.08.2017 at 11:49, Markus Jelsma wrote:
> > > Hello Shalin,
> > >
> > > Yes, the main search index has DocValues on just a few fields; they are
> > > used for faceting and function queries. We started using DocValues when
> > > 6.0 was released. Most fields are content fields for many languages. I
> > > don't think DocValues are the cause, because the maximum shared memory
> > > consumption is reduced by searching on fields of fewer languages and by
> > > disabling highlighting, neither of which uses DocValues.
> > >
> > > But I tried the option regardless, because I didn't know about it.
> > > It appears the option does exactly nothing.
> > > First is without any configuration for preload, second is with
> > > preload=true, third is with preload=false:
> > >
> > > 14220 markus 20 0 14,675g 1,508g 62800 S 1,0 9,6 0:36.98 java
> > > 14803 markus 20 0 14,674g 1,537g 63248 S 0,0 9,8 0:34.50 java
> > > 15324 markus 20 0 14,674g 1,409g 63152 S 0,0 9,0 0:35.50 java
> > >
> > > Please correct my config if I am wrong:
> > >
> > > <directoryFactory name="DirectoryFactory"
> > >     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
> > >   <bool name="preload">false</bool>
> > > </directoryFactory>
> > >
> > > NRTCachingDirectoryFactory implies MMapDirectory, right?
> > >
> > > Thanks,
> > > Markus
> > >
> > > -----Original message-----
> > >> From:Shalin Shekhar Mangar <shalinman...@gmail.com>
> > >> Sent: Thursday 24th August 2017 5:51
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: Solr uses lots of shared memory!
> > >>
> > >> Very interesting. Do you have many DocValue fields? Have you always
> > >> had them, i.e. did you see this problem before you turned on DocValues?
> > >> The DocValue fields are in a separate file and they will be memory
> > >> mapped on demand. One thing you can experiment with is to use the
> > >> preload=true option on the MMapDirectoryFactory, which will mmap all
> > >> index files on startup [1]. Once you do this, if you still notice
> > >> shared memory leakage, then it may be a genuine memory leak that we
> > >> should investigate.
> > >>
> > >> [1] -
> > >> http://lucene.apache.org/solr/guide/6_6/datadir-and-directoryfactory-in-solrconfig.html#DataDirandDirectoryFactoryinSolrConfig-SpecifyingtheDirectoryFactoryForYourIndex
> > >>
> > >> On Wed, Aug 23, 2017 at 7:02 PM, Markus Jelsma
> > >> <markus.jel...@openindex.io> wrote:
> > >>> After watching top following a restart of some Solr instances, I do
> > >>> not think it is a reporting problem: shared memory dropped back to
> > >>> `normal`, around 350 MB, which I think is high too, but anyway.
> > >>>
> > >>> Two hours later, the restarted nodes are slowly increasing shared
> > >>> memory consumption, to about 1500 MB now. I don't understand why shared
> > >>> memory usage should/would increase slowly over time. It makes little
> > >>> sense to me, and I cannot remember Solr doing this in the past ten years.
> > >>>
> > >>> But it seems to correlate with index size on disk. These main text
> > >>> search nodes have an index of around 16 GB and use up to 3 GB of shared
> > >>> memory after a few days. Log nodes have up to 800 MB of index and 320 MB
> > >>> of shared memory. The low latency nodes have four different cores that
> > >>> make up just over 100 MB of index; their shared memory consumption is
> > >>> just 22 MB, which seems more reasonable for shared memory.
> > >>>
> > >>> I can also force Solr to 'leak' shared memory just by sending queries
> > >>> to it. My freshly restarted local node used 68 MB of shared memory at
> > >>> startup. Two minutes and 25,000 queries later it was already 2748 MB!
> > >>> At first there is a very sharp increase to 2000 MB, then it takes almost
> > >>> two minutes more to increase to 2748 MB. I can decrease the maximum
> > >>> shared memory usage to 1200 MB if I query (via edismax) only on fields
> > >>> of one language instead of 25 or so. I can decrease it even further if I
> > >>> disable highlighting (HUH?) but still query on all fields.
> > >>>
> > >>> * We have tried patching Java's ByteBuffer [1] because it seemed to fit
> > >>>   the problem; it does not fix it.
> > >>> * We have also removed all our custom plugins, so it has become a
> > >>>   vanilla Solr 6.6 with just our stripped-down schema and solrconfig;
> > >>>   that does not fix it either.
> > >>>
> > >>> Why does it slowly increase over time?
> > >>> Why does it appear to correlate with index size?
> > >>> Is anyone else seeing this on their 6.6 cloud production or local
> > >>> machines?
> > >>>
> > >>> Thanks,
> > >>> Markus
> > >>>
> > >>> [1]: http://www.evanjones.ca/java-bytebuffer-leak.html
> > >>>
> > >>> -----Original message-----
> > >>>> From:Shawn Heisey <apa...@elyograg.org>
> > >>>> Sent: Tuesday 22nd August 2017 17:32
> > >>>> To: solr-user@lucene.apache.org
> > >>>> Subject: Re: Solr uses lots of shared memory!
> > >>>>
> > >>>> On 8/22/2017 7:24 AM, Markus Jelsma wrote:
> > >>>>> I have never seen this before: for one of our collections, all nodes
> > >>>>> are eating tons of shared memory!
> > >>>>>
> > >>>>> Here's one of the nodes:
> > >>>>> 10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java
> > >>>>>
> > >>>>> RSS is roughly equal to heap size + usual off-heap space + shared
> > >>>>> memory. Virtual is equal to RSS plus index size on disk. For two other
> > >>>>> collections, the nodes use shared memory as expected, in the MB range.
> > >>>>>
> > >>>>> How can Solr, for this collection, use so much shared memory? Why?
> > >>>>
> > >>>> I've seen this on my own servers at work, and when I add up a subset of
> > >>>> the memory numbers I can see from the system, it ends up being more
> > >>>> memory than I even have in the server.
> > >>>>
> > >>>> I suspect there is something odd going on in how Java reports memory
> > >>>> usage to the OS, or maybe a glitch in how Linux interprets Java's
> > >>>> memory usage. At some point in the past, numbers were reported
> > >>>> correctly. I do not know if the change came about because of a Solr
> > >>>> upgrade, because of a Java upgrade, or because of an OS kernel upgrade.
> > >>>> All three were upgraded between when I know the numbers looked right
> > >>>> and when I noticed they were wrong.
> > >>>>
> > >>>> https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0
> > >>>>
> > >>>> This screenshot shows that Solr is using 17GB of memory, 41.45GB of
> > >>>> memory is being used by the OS disk cache, and 10.23GB of memory is
> > >>>> free.
> > >>>> Add those up, and it comes to 68.68GB ... but the machine only
> > >>>> has 64GB of memory, and that total doesn't include the memory usage of
> > >>>> the other processes seen in the screenshot. This impossible situation
> > >>>> means that something is being misreported somewhere. If I deduct that
> > >>>> 11GB of SHR from the RES value, then all the numbers work.
> > >>>>
> > >>>> The screenshot was almost 3 years ago, so I do not know what machine it
> > >>>> came from, and therefore I can't be sure what the actual heap size was.
> > >>>> I think it was about 6GB -- the difference between RES and SHR. I have
> > >>>> used a 6GB heap on some of my production servers in the past. The
> > >>>> server where I got this screenshot was not having any noticeable
> > >>>> performance or memory problems, so I think that I can trust that the
> > >>>> main numbers above the process list (which only come from the OS) are
> > >>>> correct.
> > >>>>
> > >>>> Thanks,
> > >>>> Shawn
> > >>>>
> > >>
> > >> --
> > >> Regards,
> > >> Shalin Shekhar Mangar.
> > >>
> > >
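
For anyone who wants to check where the SHR figure comes from on their own node, here is a minimal sketch, not a definitive diagnosis. It totals the resident size (the `Rss` field) of every mapping listed in /proc/<pid>/smaps, grouped by mapped path, so that memory-mapped index files can be told apart from libraries and anonymous memory. It rests on an assumption that should be verified against the kernel in use: that top's SHR roughly corresponds to resident file-backed pages. The function names and the sample paths are hypothetical and only serve the illustration.

```python
import re
from collections import defaultdict

# Header line of one mapping in /proc/<pid>/smaps, for example:
# "7fdf5e6a8000-7fdf5f5ed000 r-xp 00000000 08:01 1234 /path/to/libjvm.so"
_HEADER = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)
# Resident size of the mapping described by the preceding header line.
_RSS = re.compile(r"^Rss:\s+(\d+) kB$")


def resident_kb_by_path(smaps_text):
    """Sum the Rss field (in kB) of all mappings, grouped by mapped path."""
    totals = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        header = _HEADER.match(line)
        if header:
            # Mappings without a path are anonymous memory (heap, etc.).
            current = header.group(1).strip() or "[anonymous]"
            continue
        rss = _RSS.match(line)
        if rss and current is not None:
            totals[current] += int(rss.group(1))
    return dict(totals)


def file_backed_kb(totals):
    """Total resident kB of mappings backed by a file on disk.

    Assumption: this is the quantity that top's SHR approximates.
    """
    return sum(kb for path, kb in totals.items() if path.startswith("/"))


# Hypothetical sample in smaps layout; the paths and sizes are made up.
SAMPLE = """\
7fdf5e6a8000-7fdf5f5ed000 r-xp 00000000 08:01 1234 /usr/lib/jvm/libjvm.so
Size:              15636 kB
Rss:                9000 kB
7fdf00000000-7fdf40000000 r--s 00000000 08:01 5678 /var/solr/data/index/_0.fdt
Size:            1048576 kB
Rss:              204800 kB
7fde00000000-7fde10000000 rw-p 00000000 00:00 0
Size:             262144 kB
Rss:              100000 kB
"""
```

On a live node the input would be the contents of /proc/<pid>/smaps for the Solr process, read as the user that owns it. If most of the file-backed total sits under the index directory, that would be consistent with the correlation between shared memory and index size described above; if not, the question in this thread remains open.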