I haven't looked at reproducing this locally, but since it seems like there
haven't been any new ideas, I decided to share this in case it helps:

I noticed that Travis CI [1] added the environment variable MALLOC_ARENA_MAX=2
to their build images, so I googled what that configuration does. To my
surprise, I came across a Stack Overflow post [2] about how glibc could
actually be the cause and report memory differently. I then found a Hadoop
issue, HADOOP-7154 [3], about setting it as well to reduce virtual memory
usage. I found some more cases where this has helped too [4], [5], [6]; a
sketch of how it could be tried on a Solr node follows the links below.

[1] https://docs.travis-ci.com/user/build-environment-updates/2017-09-06/#Added
[2] https://stackoverflow.com/questions/10575342/what-would-cause-a-java-process-to-greatly-exceed-the-xmx-or-xss-limit
[3] https://issues.apache.org/jira/browse/HADOOP-7154?focusedCommentId=14505792
[4] https://github.com/cloudfoundry/java-buildpack/issues/320
[5] https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
[6] https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
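
A minimal sketch of how it could be tried, assuming Solr is started via
bin/solr so that solr.in.sh is sourced (any other way of getting the variable
into the JVM's environment works just as well; 2 is the value Travis uses [1]):

  # in solr.in.sh (e.g. /etc/default/solr.in.sh on a standard install)
  MALLOC_ARENA_MAX=2
  export MALLOC_ARENA_MAX

Then restart the node and watch SHR/RES in top again.
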
Kevin Risden


On Thu, Aug 24, 2017 at 10:19 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> Hello Bernd,
>
> According to the man page, I should get a list of stuff in shared memory if I
> invoke it with just a PID. That shows a list of libraries that together
> account for about 25 MB of shared memory usage. According to ps and top, the
> JVM uses 2800 MB of shared memory (not virtual), which leaves 2775 MB
> unaccounted for. Any ideas? Can anyone else reproduce it on a freshly
> restarted node?
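>
> To see where the rest lives, something like this should list the largest
> mappings by resident size (assuming procps' pmap, where the third column of
> the -x output is the RSS in KB):
>
>   pmap -x 18901 | sort -n -k3 | tail -n 20
>
> Large anonymous or index-file mappings at the bottom of that list would
> account for what the library list does not.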
>
> Thanks,
> Markus
>
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 18901 markus    20   0 14,778g 4,965g 2,987g S 891,1 31,7  20:21.63 java
>
> 0x000055b9a17f1000      6K      /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
> 0x00007fdf1d314000      182K    /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libsunec.so
> 0x00007fdf1e548000      38K     /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libmanagement.so
> 0x00007fdf1e78e000      94K     /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
> 0x00007fdf1e9a6000      75K     /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnio.so
> 0x00007fdf5cd6e000      34K     /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libzip.so
> 0x00007fdf5cf77000      46K     /lib/x86_64-linux-gnu/libnss_files-2.24.so
> 0x00007fdf5d189000      46K     /lib/x86_64-linux-gnu/libnss_nis-2.24.so
> 0x00007fdf5d395000      90K     /lib/x86_64-linux-gnu/libnsl-2.24.so
> 0x00007fdf5d5ae000      34K     /lib/x86_64-linux-gnu/libnss_compat-2.24.so
> 0x00007fdf5d7b7000      187K    /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjava.so
> 0x00007fdf5d9e6000      70K     /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libverify.so
> 0x00007fdf5dbf8000      30K     /lib/x86_64-linux-gnu/librt-2.24.so
> 0x00007fdf5de00000      90K     /lib/x86_64-linux-gnu/libgcc_s.so.1
> 0x00007fdf5e017000      1063K   /lib/x86_64-linux-gnu/libm-2.24.so
> 0x00007fdf5e320000      1553K   /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
> 0x00007fdf5e6a8000      15936K  /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> 0x00007fdf5f5ed000      139K    /lib/x86_64-linux-gnu/libpthread-2.24.so
> 0x00007fdf5f80b000      14K     /lib/x86_64-linux-gnu/libdl-2.24.so
> 0x00007fdf5fa0f000      110K    /lib/x86_64-linux-gnu/libz.so.1.2.11
> 0x00007fdf5fc2b000      1813K   /lib/x86_64-linux-gnu/libc-2.24.so
> 0x00007fdf5fff2000      58K     /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/jli/libjli.so
> 0x00007fdf60201000      158K    /lib/x86_64-linux-gnu/ld-2.24.so
>
> -----Original message-----
>> From:Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
>> Sent: Thursday 24th August 2017 15:39
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr uses lots of shared memory!
>>
>> Just an idea, how about taking a dump with jmap and using
>> MemoryAnalyzerTool to see what is going on?
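>>
>> Something along these lines should produce a dump that MAT can open
>> (assuming the JDK's jmap is on the PATH and it is run as the same user as
>> the Solr process):
>>
>>   jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <pid>
>>
>> A heap dump only covers the Java heap, of course, so it will not directly
>> explain shared/native mappings, but it would at least rule the heap out.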
>>
>> Regards
>> Bernd
>>
>>
>> Am 24.08.2017 um 11:49 schrieb Markus Jelsma:
>> > Hello Shalin,
>> >
>> > Yes, the main search index has DocValues on just a few fields; they are
>> > used for faceting and function queries, and we started using DocValues
>> > when 6.0 was released. Most fields are content fields for many languages.
>> > I don't think it is going to be DocValues, because the maximum shared
>> > memory consumption is reduced by searching on fields for fewer languages,
>> > and by disabling highlighting, neither of which uses DocValues.
>> >
>> > But I tried the option regardless, also because I didn't know about it.
>> > It appears the option does exactly nothing. The first line below is
>> > without any configuration for preload, the second is with preload=true,
>> > the third is preload=false:
>> >
>> > 14220 markus    20   0 14,675g 1,508g  62800 S   1,0  9,6   0:36.98 java
>> > 14803 markus    20   0 14,674g 1,537g  63248 S   0,0  9,8   0:34.50 java
>> > 15324 markus    20   0 14,674g 1,409g  63152 S   0,0  9,0   0:35.50 java
>> >
>> > Please correct my config if I am wrong:
>> >
>> >   <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
>> >      <bool name="preload">false</bool>
>> >   </directoryFactory>
>> >
>> > NRTCachingDirectoryFactory implies MMapDirectory right?
>> >
>> > Thanks,
>> > Markus
>> >
>> > -----Original message-----
>> >> From:Shalin Shekhar Mangar <shalinman...@gmail.com>
>> >> Sent: Thursday 24th August 2017 5:51
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Solr uses lots of shared memory!
>> >>
>> >> Very interesting. Do you have many DocValue fields? Have you always
>> >> had them i.e. did you see this problem before you turned on DocValues?
>> >> The DocValue fields are in a separate file and they will be memory
>> >> mapped on demand. One thing you can experiment with is to use
>> >> preload=true option on the MMapDirectoryFactory which will mmap all
>> >> index files on startup [1]. Once you do this, and if you still notice
>> >> shared memory leakage then it may be a genuine memory leak that we
>> >> should investigate.
>> >>
>> >> [1] - http://lucene.apache.org/solr/guide/6_6/datadir-and-directoryfactory-in-solrconfig.html#DataDirandDirectoryFactoryinSolrConfig-SpecifyingtheDirectoryFactoryForYourIndex
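>> >>
>> >> A minimal sketch of what I mean, assuming the stock MMapDirectoryFactory
>> >> (the preload flag is handed through to Lucene's MMapDirectory):
>> >>
>> >>   <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory">
>> >>     <bool name="preload">true</bool>
>> >>   </directoryFactory>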
>> >>
>> >> On Wed, Aug 23, 2017 at 7:02 PM, Markus Jelsma
>> >> <markus.jel...@openindex.io> wrote:
>> >>> I do not think it is a problem of reporting: after watching top after a
>> >>> restart of some Solr instances, it dropped back to `normal`, around 350
>> >>> MB, which I think is still high, but anyway.
>> >>>
>> >>> Two hours later, the restarted nodes have slowly increased their shared
>> >>> memory consumption to about 1500 MB. I don't understand why shared
>> >>> memory usage should or would increase slowly over time; it makes little
>> >>> sense to me, and I cannot remember Solr doing this in the past ten years.
>> >>>
>> >>> But it seems to correlate with index size on disk: these main text
>> >>> search nodes have an index of around 16 GB and use up to 3 GB of shared
>> >>> memory after a few days. The log nodes have up to 800 MB of index and
>> >>> 320 MB of shared memory, and the low latency nodes have four different
>> >>> cores that together make up just over 100 MB of index; their shared
>> >>> memory consumption is just 22 MB, which seems more reasonable for shared
>> >>> memory.
>> >>>
>> >>> I can also force Solr to 'leak' shared memory just by sending queries to
>> >>> it. My freshly restarted local node used 68 MB of shared memory at
>> >>> startup. Two minutes and 25.000 queries later it was already at 2748 MB!
>> >>> At first there is a very sharp increase to 2000, then it takes almost
>> >>> two minutes more to increase to 2748. I can decrease the maximum shared
>> >>> memory usage to 1200 if I query (via edismax) only on fields of one
>> >>> language instead of 25 or so. I can decrease it even further if I
>> >>> disable highlighting (HUH?) but still query on all fields.
>> >>>
>> >>> * We have tried patching Java's ByteBuffer [1] because it seemed to fit
>> >>> the problem, but it does not fix it (see also the JVM flag noted below).
>> >>> * We have also removed all our custom plugins, so it has become a
>> >>> vanilla Solr 6.6 with just our stripped-down schema and solrconfig; that
>> >>> does not fix it either.
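>> >>>
>> >>> For completeness: if I understand the article in [1] correctly, the same
>> >>> cap the patch implements can also be set on newer JDKs (added for Java 9
>> >>> and, I believe, backported to later 8 updates) with a plain JVM option,
>> >>> so it can be tried without a patched ByteBuffer:
>> >>>
>> >>>   -Djdk.nio.maxCachedBufferSize=262144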
>> >>>
>> >>> Why does it slowly increase over time?
>> >>> Why does it appear to correlate to index size?
>> >>> Is anyone else seeing this on their 6.6 cloud production or local 
>> >>> machines?
>> >>>
>> >>> Thanks,
>> >>> Markus
>> >>>
>> >>> [1]: http://www.evanjones.ca/java-bytebuffer-leak.html
>> >>>
>> >>> -----Original message-----
>> >>>> From:Shawn Heisey <apa...@elyograg.org>
>> >>>> Sent: Tuesday 22nd August 2017 17:32
>> >>>> To: solr-user@lucene.apache.org
>> >>>> Subject: Re: Solr uses lots of shared memory!
>> >>>>
>> >>>> On 8/22/2017 7:24 AM, Markus Jelsma wrote:
>> >>>>> I have never seen this before, one of our collections, all nodes 
>> >>>>> eating tons of shared memory!
>> >>>>>
>> >>>>> Here's one of the nodes:
>> >>>>> 10497 solr      20   0 19.439g 4.505g 3.139g S   1.0 57.8   2511:46 java
>> >>>>>
>> >>>>> RSS is roughly equal to heap size + usual off-heap space + shared
>> >>>>> memory. Virtual is roughly equal to RSS plus index size on disk. For
>> >>>>> two other collections, the nodes use shared memory as expected, in the
>> >>>>> MB range.
>> >>>>>
>> >>>>> How can Solr, this collection, use so much shared memory? Why?
>> >>>>
>> >>>> I've seen this on my own servers at work, and when I add up a subset of
>> >>>> the memory numbers I can see from the system, it ends up being more
>> >>>> memory than I even have in the server.
>> >>>>
>> >>>> I suspect there is something odd going on in how Java reports memory
>> >>>> usage to the OS, or maybe a glitch in how Linux interprets Java's memory
>> >>>> usage.  At some point in the past, numbers were reported correctly.  I
>> >>>> do not know if the change came about because of a Solr upgrade, because
>> >>>> of a Java upgrade, or because of an OS kernel upgrade.  All three were
>> >>>> upgraded between when I know the numbers looked right and when I noticed
>> >>>> they were wrong.
>> >>>>
>> >>>> https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0
>> >>>>
>> >>>> This screenshot shows that Solr is using 17GB of memory, 41.45GB of
>> >>>> memory is being used by the OS disk cache, and 10.23GB of memory is
>> >>>> free.  Add those up, and it comes to 68.68GB ... but the machine only
>> >>>> has 64GB of memory, and that total doesn't include the memory usage of
>> >>>> the other processes seen in the screenshot.  This impossible situation
>> >>>> means that something is being misreported somewhere.  If I deduct that
>> >>>> 11GB of SHR from the RES value, then all the numbers work.
>> >>>>
>> >>>> The screenshot was almost 3 years ago, so I do not know what machine it
>> >>>> came from, and therefore I can't be sure what the actual heap size was.
>> >>>> I think it was about 6GB -- the difference between RES and SHR.  I have
>> >>>> used a 6GB heap on some of my production servers in the past.  The
>> >>>> server where I got this screenshot was not having any noticeable
>> >>>> performance or memory problems, so I think that I can trust that the
>> >>>> main numbers above the process list (which only come from the OS) are
>> >>>> correct.
>> >>>>
>> >>>> Thanks,
>> >>>> Shawn
>> >>>>
>> >>>>
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>> >>
>>
