Pavel,

I have run into another instance of an out of memory error in a data
region, in a different context from the one I wrote the reproducer for.
In this case, there is an activity which queues items for processing at
a point in the future and which does use a continuous query; however,
there is also significant vanilla put/get activity against a range of
other caches.

This data region was permitted to grow to 1 GB and has persistence
enabled. We are now using Ignite 2.8.
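
For reference, the region is configured along these lines from the C#
client (a minimal sketch only: the region name and sizes match the error
below, while the surrounding configuration is illustrative):

using Apache.Ignite.Core;
using Apache.Ignite.Core.Configuration;

// Minimal sketch of the 1 GB persistent data region (illustrative only;
// the real configuration declares further regions and caches).
var cfg = new IgniteConfiguration
{
    DataStorageConfiguration = new DataStorageConfiguration
    {
        DataRegionConfigurations = new[]
        {
            new DataRegionConfiguration
            {
                Name = "Default-Immutable",
                InitialSize = 128L * 1024 * 1024, // 128 MiB
                MaxSize = 1024L * 1024 * 1024,    // 1 GiB
                PersistenceEnabled = true
            }
        }
    }
};

using (var ignite = Ignition.Start(cfg))
{
    // Caches assigned to "Default-Immutable" are backed by native persistence.
}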

I would like to understand if this is a possible failure mode given that
the data region has persistence enabled. The underlying cause appears to
be 'Failed to find a page for eviction'. Should this be expected on data
regions with persistence?

I have included the error below.

This is the initial error reported by Ignite:

2020-06-11 12:53:35,082 [98] ERR [ImmutableCacheComputeServer] JVM will be
halted immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
Failed to find a page for eviction [segmentCapacity=13612, loaded=5417,
maxDirtyPages=4063, dirtyPages=5417, cpPages=0, pinnedInSegment=0,
failedToPrepare=5417]
Out of memory in data region [name=Default-Immutable, initSize=128.0 MiB,
maxSize=1.0 GiB, persistenceEnabled=true] Try the following:
  ^-- Increase maximum off-heap memory size
(DataRegionConfiguration.maxSize)
  ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
  ^-- Enable eviction or expiration policies]]

Following this error is a lock dump, in which this is the only thread
holding a lock (I am assuming the structureId member with the value
'Spatial-SubGridSegment-Mutable-602' refers to a remote actor holding a
lock against an item in the local node):

Thread=[name=sys-stripe-11-#12%TRex-Immutable%, id=26], state=RUNNABLE
Locked pages = [284060547022916[0001025a00000044](r=0|w=1)]
Locked pages log: name=sys-stripe-11-#12%TRex-Immutable%
time=(1591836815071, 2020-06-11 12:53:35.071)
L=1 -> Write lock pageId=284060547022916,
structureId=Spatial-SubGridSegment-Mutable-602 [pageIdHex=0001025a00000044,
partId=602, pageIdx=68, flags=00000001]

Following the lock dump is this final error before the Ignite node stops:

2020-06-11 12:53:35,082 [98] ERR [ImmutableCacheComputeServer] JVM will be
halted immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
Failed to find a page for eviction [segmentCapacity=13612, loaded=5417,
maxDirtyPages=4063, dirtyPages=5417, cpPages=0, pinnedInSegment=0,
failedToPrepare=5417]
Out of memory in data region [name=Default-Immutable, initSize=128.0 MiB,
maxSize=1.0 GiB, persistenceEnabled=true] Try the following:
  ^-- Increase maximum off-heap memory size
(DataRegionConfiguration.maxSize)
  ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
  ^-- Enable eviction or expiration policies]]




On Wed, May 13, 2020 at 2:15 AM Raymond Wilson <raymond_wil...@trimble.com>
wrote:

> Hi Pavel,
>
> The reproducer is not the actual use case which is too big to use - it's a
> small example using the same mechanisms. I have not used a data streamer
> before, I'll read up on it.
>
> I'll try running the reproducer again against 2.8 (I used 2.7.6 for the
> reproducer).
>
> Thanks,
> Raymond.
>
>
> On Tue, May 12, 2020 at 11:18 PM Pavel Tupitsyn <ptupit...@apache.org>
> wrote:
>
>> Hi Raymond,
>>
>> First, I could not reproduce the issue. The attached program runs to
>> completion on my machine.
>>
>> Second, I see a few issues with the attached code:
>> - Cache.PutIfAbsent is used instead of DataStreamer
>> - ICacheEntryEventFilter is used to remove cache entries, and is called
>> twice - on add and on remove
>>
>> My recommendation is to use a "classic" combination of Data Streamer,
>> Continuous Query, and Expiry Policy.
>> Set expiry policy to a few seconds, and you won't keep much data in
>> memory. Ignite will handle the removal for you.
>> Let me know if I should prepare an example.
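>>
>> Roughly, that combination could look like the sketch below (the "Buffer"
>> cache name, the int/byte[] types and the 5-second expiry are placeholders,
>> not your actual settings):
>>
>> using System;
>> using System.Collections.Generic;
>> using Apache.Ignite.Core;
>> using Apache.Ignite.Core.Cache.Configuration;
>> using Apache.Ignite.Core.Cache.Event;
>> using Apache.Ignite.Core.Cache.Expiry;
>> using Apache.Ignite.Core.Cache.Query.Continuous;
>> using Apache.Ignite.Core.Common;
>>
>> // Entries expire a few seconds after creation, so little data stays in memory.
>> class FiveSecondExpiry : IFactory<IExpiryPolicy>
>> {
>>     public IExpiryPolicy CreateInstance() =>
>>         new ExpiryPolicy(TimeSpan.FromSeconds(5), null, null);
>> }
>>
>> // Local listener invoked for each new entry; processing happens here.
>> class Drainer : ICacheEntryEventListener<int, byte[]>
>> {
>>     public void OnEvent(IEnumerable<ICacheEntryEvent<int, byte[]>> evts)
>>     {
>>         foreach (var e in evts)
>>         {
>>             // process e.Value ...
>>         }
>>     }
>> }
>>
>> static class BufferExample
>> {
>>     public static void Run(IIgnite ignite)
>>     {
>>         var cache = ignite.GetOrCreateCache<int, byte[]>(new CacheConfiguration
>>         {
>>             Name = "Buffer",
>>             ExpiryPolicyFactory = new FiveSecondExpiry()
>>         });
>>
>>         // The continuous query drains entries as they arrive.
>>         using (cache.QueryContinuous(new ContinuousQuery<int, byte[]>(new Drainer())))
>>         // The streamer replaces PutIfAbsent; AllowOverwrite = true is set so
>>         // that continuous query listeners see the streamed entries.
>>         using (var streamer = ignite.GetDataStreamer<int, byte[]>(cache.Name))
>>         {
>>             streamer.AllowOverwrite = true;
>>             for (var i = 0; i < 1000; i++)
>>                 streamer.AddData(i, new byte[30 * 1024]);
>>         }
>>     }
>> }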
>>
>> Also, it is not clear why persistence is needed for such a "buffer" cache -
>> items are removed almost immediately, so it would be much more efficient
>> to disable persistence.
>>
>> Thanks,
>> Pavel
>>
>> On Tue, May 12, 2020 at 12:23 PM Raymond Wilson <
>> raymond_wil...@trimble.com> wrote:
>>
>>> Well, it appears I was wrong. It reappeared. :(
>>>
>>> I thought I had sent a reply to this thread but cannot find it, so I am
>>> resending it now.
>>>
>>> Attached is a C# reproducer that throws Ignite out of memory errors in
>>> the situation I outlined above, where cache operations are performed
>>> against a small cache with persistence enabled.
>>>
>>> Let me know if you're able to reproduce it on your local systems.
>>>
>>> Thanks,
>>> Raymond.
>>>
>>>
>>> On Tue, Mar 3, 2020 at 1:31 PM Raymond Wilson <
>>> raymond_wil...@trimble.com> wrote:
>>>
>>>> It's possible this is user (me) error.
>>>>
>>>> I discovered I had set the cache size to 64 MB in the server, but
>>>> 65 MB (typo!) in the client. Making these two values consistent appeared
>>>> to prevent the error.
>>>>
>>>> Raymond.
>>>>
>>>>
>>>> On Tue, Mar 3, 2020 at 12:58 PM Raymond Wilson <
>>>> raymond_wil...@trimble.com> wrote:
>>>>
>>>>> I'm using Ignite v2.7.5 with C# client.
>>>>>
>>>>> I have an error where Ignite throws an out of memory exception, like
>>>>> this:
>>>>>
>>>>> 2020-03-03 12:02:58,036 [287] ERR [MutableCacheComputeServer] JVM will
>>>>> be halted immediately due to the failure: [failureCtx=FailureContext
>>>>> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
>>>>> Out of memory in data region [name=TAGFileBufferQueue, initSize=64.0 MiB,
>>>>> maxSize=64.0 MiB, persistenceEnabled=true] Try the following:
>>>>>   ^-- Increase maximum off-heap memory size
>>>>> (DataRegionConfiguration.maxSize)
>>>>>   ^-- Enable Ignite persistence
>>>>> (DataRegionConfiguration.persistenceEnabled)
>>>>>   ^-- Enable eviction or expiration policies]]
>>>>>
>>>>> I don't have an eviction policy set (is this even a valid
>>>>> recommendation when using persistence?)
>>>>>
>>>>> Increasing the off-heap memory size for the data region does prevent
>>>>> this error, but I want to minimise the in-memory size for this buffer
>>>>> as it is essentially just a queue.
>>>>>
>>>>> The suggestion of enabling data persistence is strange as this data
>>>>> region already has persistence enabled.
>>>>>
>>>>> My assumption is that Ignite manages the memory in this cache by
>>>>> saving and loading values as required.
>>>>>
>>>>> The test workflow in this failure is one where ~14,500 objects
>>>>> totalling ~440 MB in size (average object size = ~30 KB) are added to
>>>>> the cache, and are then drained by a processor using a continuous query.
>>>>> Elements are removed from the cache as the processor completes them.
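>>>>>
>>>>> The drain step is essentially the following pattern (a sketch with
>>>>> hypothetical names; the real listener processes the queued objects
>>>>> rather than raw byte arrays):
>>>>>
>>>>> using System;
>>>>> using System.Collections.Generic;
>>>>> using Apache.Ignite.Core.Cache;
>>>>> using Apache.Ignite.Core.Cache.Event;
>>>>>
>>>>> // Sketch of the drain-and-remove pattern described above.
>>>>> class QueueDrainer : ICacheEntryEventListener<Guid, byte[]>
>>>>> {
>>>>>     private readonly ICache<Guid, byte[]> _cache;
>>>>>
>>>>>     public QueueDrainer(ICache<Guid, byte[]> cache)
>>>>>     {
>>>>>         _cache = cache;
>>>>>     }
>>>>>
>>>>>     public void OnEvent(IEnumerable<ICacheEntryEvent<Guid, byte[]>> evts)
>>>>>     {
>>>>>         foreach (var evt in evts)
>>>>>         {
>>>>>             if (evt.EventType != CacheEntryEventType.Created)
>>>>>                 continue; // ignore the events raised by the removals below
>>>>>
>>>>>             Process(evt.Value);     // ~30 KB payload per entry
>>>>>             _cache.Remove(evt.Key); // entry is deleted once processed
>>>>>         }
>>>>>     }
>>>>>
>>>>>     private static void Process(byte[] payload) { /* ... */ }
>>>>> }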
>>>>>
>>>>> Is this kind of out of memory error supposed to be possible when using
>>>>> persistent data regions?
>>>>>
>>>>> Thanks,
>>>>> Raymond.
>>>>>
>>>>>
>>>>>
>>>
>>
>


-- 
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
+64-21-2013317 Mobile
raymond_wil...@trimble.com

<https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
