Right, this is the region I'm still using, and the disk store is set up like
this:
Cache cache = new CacheFactory()
    .set("locators", LOCATORS.get())
    .set("start-locator", LOCATOR_IP.get() + "[" + LOCATOR_PORT.get() + "]")
    .set("bind-address", LOCATOR_IP.get())
    .create();

cache.createDiskStoreFactory()
    .setMaxOplogSize(500)
    .setDiskDirsAndSizes(new File[] { new File("/opt/ccio/geode/store") }, new int[] { 18000 })
    .setCompactionThreshold(95)
    .create("-ccio-store");

RegionFactory<String, byte[]> regionFactory = cache.createRegionFactory();
Region<String, byte[]> region = regionFactory
    .setDiskStoreName("-ccio-store")
    .setDataPolicy(DataPolicy.PERSISTENT_PARTITION)
    .setOffHeap(false)
    .setMulticastEnabled(false)
    .setCacheLoader(new AwsS3CacheLoader())
    .create("ccio-images");
I thought that, since I have the disk store specified, overflow is set up as well.
Please correct me if I'm wrong.
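If eviction has to be enabled explicitly, is something like this the missing
piece? This is only a guess from the eviction docs and is untested; the 1000
entry limit is just a placeholder (EvictionAttributes and EvictionAction being
the ones from com.gemstone.gemfire.cache):

Region<String, byte[]> region = regionFactory
    .setDiskStoreName("-ccio-store")
    .setDataPolicy(DataPolicy.PERSISTENT_PARTITION)
    // keep at most 1000 entries in memory, overflow the rest to the disk store
    .setEvictionAttributes(
        EvictionAttributes.createLRUEntryAttributes(1000, EvictionAction.OVERFLOW_TO_DISK))
    .setOffHeap(false)
    .setMulticastEnabled(false)
    .setCacheLoader(new AwsS3CacheLoader())
    .create("ccio-images");

Or would the heap-LRU variant (createLRUHeapAttributes) be a better fit here?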
Thank you,
Eugene
On Tue, Apr 26, 2016 at 3:40 PM, Udo Kohlmeyer <[email protected]>
wrote:
> Hi there Eugene,
>
> Geode will try to keep as much data in memory as it can, depending on the LRU
> eviction strategy. Once data is overflowed to disk, the memory for the
> "value" is freed up once GC has run.
>
> Is this still the correct region configuration you are using?
>
> Region<String, byte[]> region = regionFactory
> .setDiskStoreName("-ccio-store")
> .setDataPolicy(DataPolicy.PERSISTENT_PARTITION)
> .setOffHeap(false)
> .setMulticastEnabled(false)
> .setCacheLoader(new AwsS3CacheLoader())
> .create("ccio-images");
>
> If not, could you please provide the current config you are testing with? As it
> stands, this config does not enable overflow.
>
>
> <http://geode.docs.pivotal.io/docs/reference/topics/memory_requirements_guidelines_and_calc.html#topic_ac4_mtz_j4>
> --Udo
>
>
> On 27/04/2016 4:51 am, Eugene Strokin wrote:
>
> Right, I do have 1432 objects in my cache. But I thought only the keys would
> be in memory, while the actual data would stay on disk and be retrieved from
> storage when a client asks for it.
> I'm expecting to keep millions of records in the cache, but I don't have the
> memory to keep all of them in there, so I set up overflow to disk, assuming
> that memory would be freed up as more and more data comes in.
> Is my assumption wrong? Or do I need to have RAM for all the data?
>
> Thanks,
> Eugene
>
>
> On Tue, Apr 26, 2016 at 2:04 PM, Barry Oglesby <[email protected]>
> wrote:
>
>> The VersionedThinDiskRegionEntryHeapObjectKey instances are your region
>> entries (your data). When you restart your server, it recovers that data from
>> disk and stores it in those region entries. Do you not mean to persist your
>> data?
>>
>> If I run a quick test with 1432 objects of ~120k data size each and
>> non-primitive keys, a histogram shows output like below. I deleted most of
>> the lines that are not relevant. You can see there are 1432
>> VersionedThinDiskRegionEntryHeapObjectKeys, TradeKeys (my key) and
>> VMCachedDeserializables (these are wrappers on the value). You should see
>> something similar. The byte arrays and character arrays are most of my data.
>>
>> If you configure your regions to not be persistent, you won't see any of
>> this upon recovery.
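>> For comparison, a rough sketch of a non-persistent version of your region
>> (only the data policy changes; the rest of your config is kept as-is and no
>> disk store is needed) would be something like:
>>
>> Region<String, byte[]> region = regionFactory
>>     .setDataPolicy(DataPolicy.PARTITION) // in-memory only; nothing is recovered from disk on restart
>>     .setOffHeap(false)
>>     .setMulticastEnabled(false)
>>     .setCacheLoader(new AwsS3CacheLoader())
>>     .create("ccio-images");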
>>
>>  num     #instances         #bytes  class name
>> ----------------------------------------------
>>    1:          3229      172532264  [B
>>    2:         37058        3199464  [C
>>   27:          1432          80192  com.gemstone.gemfire.internal.cache.VersionedThinDiskRegionEntryHeapObjectKey
>>   41:          1432          34368  TradeKey
>>   42:          1432          34368  com.gemstone.gemfire.internal.cache.VMCachedDeserializable
>> Total        256685      184447072
>>
>>
>> Thanks,
>> Barry Oglesby
>>
>>
>> On Tue, Apr 26, 2016 at 10:09 AM, Eugene Strokin <[email protected]> wrote:
>>
>>> Digging more into the problem, I've found that 91% of heap is taken by:
>>>
>>> 1,432 instances of
>>> "com.gemstone.gemfire.internal.cache.VersionedThinDiskRegionEntryHeapObjectKey",
>>> loaded by "sun.misc.Launcher$AppClassLoader @ 0xef589a90", occupy 121,257,480
>>> (91.26%) bytes. These instances are referenced from one instance of
>>> "com.gemstone.gemfire.internal.cache.ProxyBucketRegion[]", loaded by
>>> "sun.misc.Launcher$AppClassLoader @ 0xef589a90".
>>>
>>> Keywords
>>>
>>> sun.misc.Launcher$AppClassLoader @ 0xef589a90
>>>
>>>
>>> com.gemstone.gemfire.internal.cache.VersionedThinDiskRegionEntryHeapObjectKey
>>>
>>> com.gemstone.gemfire.internal.cache.ProxyBucketRegion[]
>>>
>>>
>>> 1,432 instances doesn't sound like a lot, but it looks like those are big
>>> instances, about 121k each. Maybe something is wrong with my configuration,
>>> and I can limit the creation of such instances?
>>>
>>> Thanks,
>>> Eugene
>>>
>>> On Mon, Apr 25, 2016 at 4:19 PM, Jens Deppe <[email protected]> wrote:
>>>
>>>> I think you're looking at the wrong info in ps.
>>>>
>>>> What you're showing is the Virtual size (vsz) of memory. This is how
>>>> much the process has requested, but that does not mean it is actually using
>>>> it. In fact, your output says that Java has reserved 3Gb of memory, not
>>>> 300Mb! You should instead look at the Resident Set Size (rss option) as
>>>> that will give you a much more accurate picture of what is actually using
>>>> real memory.
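>>>> For example, swapping vsz for rss in the command you used (something like
>>>> ps axo pid,rss,comm= | sort -n -k 2) should show the resident sizes instead.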
>>>>
>>>> Also, remember that the JVM needs memory for loaded code (jars and
>>>> classes), JITed code, thread stacks, etc., so when setting your heap size
>>>> you should take that into account too.
>>>>
>>>> Finally, especially on virtualized hardware and doubly so on small
>>>> configs, make sure you *never, ever* end up swapping because that will
>>>> really kill your performance.
>>>>
>>>> --Jens
>>>>
>>>> On Mon, Apr 25, 2016 at 12:32 PM, Anilkumar Gingade <[email protected]> wrote:
>>>>
>>>>> >> It joined the cluster, and loaded data from overflow files.
>>>>> Not sure if this makes the OS file system (disk buffer/cache) consume
>>>>> memory...
>>>>> When you say overflow, I am assuming you are initializing the data/regions
>>>>> from persistence files; if so, can you try without persistence...
>>>>>
>>>>> -Anil.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 25, 2016 at 12:18 PM, Eugene Strokin <[email protected]> wrote:
>>>>>
>>>>>> And when I check memory usage per process, it looks normal; java took
>>>>>> only 300 MB as it is supposed to, but free -m still shows no free memory:
>>>>>>
>>>>>> # ps axo pid,vsz,comm=|sort -n -k 2
>>>>>> PID VSZ
>>>>>> 465 26396 systemd-logind
>>>>>> 444 26724 dbus-daemon
>>>>>> 454 27984 avahi-daemon
>>>>>> 443 28108 avahi-daemon
>>>>>> 344 32720 systemd-journal
>>>>>> 1 41212 systemd
>>>>>> 364 43132 systemd-udevd
>>>>>> 27138 52688 sftp-server
>>>>>> 511 53056 wpa_supplicant
>>>>>> 769 82548 sshd
>>>>>> 30734 83972 sshd
>>>>>> 1068 91128 master
>>>>>> 28534 91232 pickup
>>>>>> 1073 91300 qmgr
>>>>>> 519 110032 agetty
>>>>>> 27029 115380 bash
>>>>>> 27145 115380 bash
>>>>>> 30736 116440 sort
>>>>>> 385 116720 auditd
>>>>>> 489 126332 crond
>>>>>> 30733 139624 sshd
>>>>>> 27027 140840 sshd
>>>>>> 27136 140840 sshd
>>>>>> 27143 140840 sshd
>>>>>> 30735 148904 ps
>>>>>> 438 242360 rsyslogd
>>>>>> 466 447932 NetworkManager
>>>>>> 510 527448 polkitd
>>>>>> 770 553060 tuned
>>>>>> 30074 2922460 java
>>>>>>
>>>>>> # free -m
>>>>>>               total        used        free      shared  buff/cache   available
>>>>>> Mem:            489         424           5           0          58          41
>>>>>> Swap:           255          57         198
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 25, 2016 at 2:52 PM, Eugene Strokin <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for your help, but I'm still struggling with the system OOM killer
>>>>>>> issue. I've been doing more digging and still couldn't find the problem.
>>>>>>> All settings are normal: overcommit_memory=0, overcommit_ratio=50.
>>>>>>> free -m before the process starts:
>>>>>>>
>>>>>>> # free -m
>>>>>>>               total        used        free      shared  buff/cache   available
>>>>>>> Mem:            489          25         399           1          63         440
>>>>>>> Swap:           255          57         198
>>>>>>>
>>>>>>> I start my process like this:
>>>>>>>
>>>>>>> java -server -Xmx300m -Xms300m -XX:+HeapDumpOnOutOfMemoryError
>>>>>>> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=55 -jar
>>>>>>> /opt/ccio-image.jar
>>>>>>>
>>>>>>> So, I should still have about 99Mb of free memory, but:
>>>>>>>
>>>>>>> # free -m
>>>>>>>               total        used        free      shared  buff/cache   available
>>>>>>> Mem:            489         409           6           1          73          55
>>>>>>> Swap:           255          54         201
>>>>>>>
>>>>>>> And I haven't even made a single call to the process yet. It joined the
>>>>>>> cluster and loaded data from the overflow files, and all my free memory is
>>>>>>> gone, even though I've set a 300 MB max for Java.
>>>>>>> As I mentioned before, I've set the off-heap setting to false:
>>>>>>>
>>>>>>> Cache cache = new CacheFactory()
>>>>>>> .set("locators", LOCATORS.get())
>>>>>>> .set("start-locator", LOCATOR_IP.get()+"["+LOCATOR_PORT.get()+"]")
>>>>>>> .set("bind-address", LOCATOR_IP.get())
>>>>>>> .create();
>>>>>>>
>>>>>>> cache.createDiskStoreFactory()
>>>>>>> .setMaxOplogSize(500)
>>>>>>> .setDiskDirsAndSizes(new File[] { new File("/opt/ccio/geode/store") }, new int[] { 18000 })
>>>>>>> .setCompactionThreshold(95)
>>>>>>> .create("-ccio-store");
>>>>>>>
>>>>>>> RegionFactory<String, byte[]> regionFactory = cache.createRegionFactory();
>>>>>>>
>>>>>>> Region<String, byte[]> region = regionFactory
>>>>>>> .setDiskStoreName("-ccio-store")
>>>>>>> .setDataPolicy(DataPolicy.PERSISTENT_PARTITION)
>>>>>>> .setOffHeap(false)
>>>>>>> .setMulticastEnabled(false)
>>>>>>> .setCacheLoader(new AwsS3CacheLoader())
>>>>>>> .create("ccio-images");
>>>>>>>
>>>>>>> I don't understand how the memory is getting overcommitted.
>>>>>>>
>>>>>>> Eugene
>>>>>>>
>>>>>>> On Fri, Apr 22, 2016 at 8:03 PM, Barry Oglesby <[email protected]> wrote:
>>>>>>>
>>>>>>>> The OOM killer uses the overcommit_memory and overcommit_ratio
>>>>>>>> parameters to determine if / when to kill a process.
>>>>>>>>
>>>>>>>> What are the settings for these parameters in your environment?
>>>>>>>>
>>>>>>>> The defaults are 0 and 50.
>>>>>>>>
>>>>>>>> cat /proc/sys/vm/overcommit_memory
>>>>>>>> 0
>>>>>>>>
>>>>>>>> cat /proc/sys/vm/overcommit_ratio
>>>>>>>> 50
>>>>>>>>
>>>>>>>> How much free memory is available before you start the JVM?
>>>>>>>>
>>>>>>>> How much free memory is available when your process is killed?
>>>>>>>>
>>>>>>>> You can monitor free memory using either free or vmstat before and
>>>>>>>> during your test.
>>>>>>>>
>>>>>>>> Run free -m in a loop to monitor free memory like:
>>>>>>>>
>>>>>>>> free -ms2
>>>>>>>>              total       used       free     shared    buffers     cached
>>>>>>>> Mem:        290639      35021     255617          0       9215      21396
>>>>>>>> -/+ buffers/cache:       4408     286230
>>>>>>>> Swap:        20473          0      20473
>>>>>>>>
>>>>>>>> Run vmstat in a loop to monitor memory like:
>>>>>>>>
>>>>>>>> vmstat -SM 2
>>>>>>>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>>>>>>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>>>>>>  0  0      0 255619   9215  21396    0    0     0    23    0    0  2  0 98  0  0
>>>>>>>>  0  0      0 255619   9215  21396    0    0     0     0  121  198  0  0 100  0  0
>>>>>>>>  0  0      0 255619   9215  21396    0    0     0     0  102  189  0  0 100  0  0
>>>>>>>>  0  0      0 255619   9215  21396    0    0     0     0  110  195  0  0 100  0  0
>>>>>>>>  0  0      0 255619   9215  21396    0    0     0     0  117  205  0  0 100  0  0
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Barry Oglesby
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Apr 22, 2016 at 4:44 PM, Dan Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> The java metaspace will also take up memory. Maybe try setting
>>>>>>>>> -XX:MaxMetaspaceSize
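>>>>>>>>> For example, something along these lines (the 64m value is only an
>>>>>>>>> illustration; tune it for your app):
>>>>>>>>>
>>>>>>>>> java -server -Xmx300m -Xms300m -XX:MaxMetaspaceSize=64m \
>>>>>>>>>   -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \
>>>>>>>>>   -XX:CMSInitiatingOccupancyFraction=55 -jar /opt/ccio-image.jar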
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------- Original message --------
>>>>>>>>> From: Eugene Strokin <[email protected]>
>>>>>>>>> Date: 4/22/2016 4:34 PM (GMT-08:00)
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: Re: System Out of Memory
>>>>>>>>>
>>>>>>>>> The machine is small; it has only 512 MB of RAM plus 256 MB of swap,
>>>>>>>>> but Java's max heap size is set to 400 MB. I've tried less, no help.
>>>>>>>>> The most interesting part is that I don't see Java OOM exceptions at
>>>>>>>>> all. I even included code with a memory leak, and in that case I did
>>>>>>>>> see Java OOM exceptions before the java process got killed.
>>>>>>>>> I've browsed the internet, and some people have noticed the same
>>>>>>>>> problem with other frameworks, not Geode. So I suspect this might not
>>>>>>>>> be Geode, but Geode was the first suspect because it has an off-heap
>>>>>>>>> storage feature. They say there was a memory leak, but for some reason
>>>>>>>>> the OS was killing the process even before Java hit OOM.
>>>>>>>>> I'll connect with JProbe and monitor the system with the console. Will
>>>>>>>>> let you know if I find something interesting.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Eugene
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Apr 22, 2016 at 5:55 PM, Dan Smith <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> What's your -Xmx for your JVM set to, and how much memory does
>>>>>>>>>> your
>>>>>>>>>> droplet have? Does it have any swap space? My guess is you need to
>>>>>>>>>> reduce the heap size of your JVM and the OS is killing your
>>>>>>>>>> process
>>>>>>>>>> because there is not enough memory left.
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 22, 2016 at 1:55 PM, Darrel Schneider <[email protected]> wrote:
>>>>>>>>>> > I don't know why your OS would be killing your process, which seems
>>>>>>>>>> > to be your main problem.
>>>>>>>>>> >
>>>>>>>>>> > But I did want you to know that if you don't have any regions with
>>>>>>>>>> > off-heap=true, then you have no reason to set off-heap-memory-size
>>>>>>>>>> > to anything other than 0.
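>>>>>>>>>> > For example, if you are setting it programmatically, something like
>>>>>>>>>> > this sketch would zero it out:
>>>>>>>>>> >
>>>>>>>>>> > Cache cache = new CacheFactory()
>>>>>>>>>> >     .set("off-heap-memory-size", "0")
>>>>>>>>>> >     // ... your other properties ...
>>>>>>>>>> >     .create();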
>>>>>>>>>> >
>>>>>>>>>> > On Fri, Apr 22, 2016 at 12:48 PM, Eugene Strokin <[email protected]> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> I'm running load tests on the Geode cluster I've built.
>>>>>>>>>> >> The OS is killing my process occasionally, complaining that
>>>>>>>>>> the process
>>>>>>>>>> >> takes too much memory:
>>>>>>>>>> >>
>>>>>>>>>> >> # dmesg
>>>>>>>>>> >> [ 2544.932226] Out of memory: Kill process 5382 (java) score 780 or
>>>>>>>>>> >> sacrifice child
>>>>>>>>>> >> [ 2544.933591] Killed process 5382 (java) total-vm:3102804kB,
>>>>>>>>>> >> anon-rss:335780kB, file-rss:0kB
>>>>>>>>>> >>
>>>>>>>>>> >> Java doesn't have any problems; I don't see an OOM exception.
>>>>>>>>>> >> It looks like Geode is using off-heap memory, but I set offHeap to
>>>>>>>>>> >> false for my region, and I do have only one region:
>>>>>>>>>> >>
>>>>>>>>>> >> RegionFactory<String, byte[]> regionFactory = cache.createRegionFactory();
>>>>>>>>>> >> regionFactory
>>>>>>>>>> >> .setDiskStoreName("-ccio-store")
>>>>>>>>>> >> .setDataPolicy(DataPolicy.PERSISTENT_PARTITION)
>>>>>>>>>> >> .setOffHeap(false)
>>>>>>>>>> >> .setCacheLoader(new AwsS3CacheLoader());
>>>>>>>>>> >>
>>>>>>>>>> >> Also, I've played with the off-heap-memory-size setting, setting it
>>>>>>>>>> >> to a small number like 20M to prevent Geode from taking too much
>>>>>>>>>> >> off-heap memory, but the result is the same.
>>>>>>>>>> >>
>>>>>>>>>> >> Do you have any other ideas about what I could do here? I'm stuck
>>>>>>>>>> >> at this point.
>>>>>>>>>> >>
>>>>>>>>>> >> Thank you,
>>>>>>>>>> >> Eugene
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>