Pavel,

I have run into a different instance of an out-of-memory error in a data region, in a different context from the one I wrote the reproducer for. In this case there is an activity which queues items for processing at a point in the future and which does use a continuous query; however, there is also significant vanilla put/get activity against a range of other caches.
This data region was permitted to grow to 1 GiB and has persistence enabled. We are now using Ignite 2.8.

I would like to understand whether this is a possible failure mode given that the data region has persistence enabled. The underlying cause appears to be 'Failed to find a page for eviction'. Should this be expected on data regions with persistence? I have included the error below.

This is the initial error reported by Ignite:

2020-06-11 12:53:35,082 [98] ERR [ImmutableCacheComputeServer] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=13612, loaded=5417, maxDirtyPages=4063, dirtyPages=5417, cpPages=0, pinnedInSegment=0, failedToPrepare=5417]
Out of memory in data region [name=Default-Immutable, initSize=128.0 MiB, maxSize=1.0 GiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]

Following this error is a lock dump, in which this is the only thread holding a lock (I am assuming the structureId member with the value 'Spatial-SubGridSegment-Mutable-602' refers to a remote actor holding a lock against an item in the local node):

Thread=[name=sys-stripe-11-#12%TRex-Immutable%, id=26], state=RUNNABLE
Locked pages = [284060547022916[0001025a00000044](r=0|w=1)]
Locked pages log: name=sys-stripe-11-#12%TRex-Immutable% time=(1591836815071, 2020-06-11 12:53:35.071)
L=1 -> Write lock pageId=284060547022916, structureId=Spatial-SubGridSegment-Mutable-602 [pageIdHex=0001025a00000044, partId=602, pageIdx=68, flags=00000001]

Following the lock dump is this final error before the Ignite node stops:

2020-06-11 12:53:35,082 [98] ERR [ImmutableCacheComputeServer] JVM will be halted immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=13612, loaded=5417, maxDirtyPages=4063, dirtyPages=5417, cpPages=0, pinnedInSegment=0, failedToPrepare=5417]
Out of memory in data region [name=Default-Immutable, initSize=128.0 MiB, maxSize=1.0 GiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]

On Wed, May 13, 2020 at 2:15 AM Raymond Wilson <raymond_wil...@trimble.com> wrote:
> Hi Pavel,
>
> The reproducer is not the actual use case, which is too big to use; it's a
> small example using the same mechanisms. I have not used a data streamer
> before; I'll read up on it.
>
> I'll try running the reproducer again against 2.8 (I used 2.7.6 for the
> reproducer).
>
> Thanks,
> Raymond.
>
>
> On Tue, May 12, 2020 at 11:18 PM Pavel Tupitsyn <ptupit...@apache.org>
> wrote:
>
>> Hi Raymond,
>>
>> First, I could not reproduce the issue. The attached program runs to
>> completion on my machine.
>>
>> Second, I see a few issues with the attached code:
>> - Cache.PutIfAbsent is used instead of DataStreamer
>> - ICacheEntryEventFilter is used to remove cache entries, and is called
>> twice: on add and on remove
>>
>> My recommendation is to use a "classic" combination of Data Streamer,
>> Continuous Query, and Expiry Policy.
>> Set the expiry policy to a few seconds, and you won't keep much data in
>> memory. Ignite will handle the removal for you.
>> Let me know if I should prepare an example.
>>
>> Also, it is not clear why persistence is needed for such a "buffer" cache:
>> items are removed almost immediately, so it would be much more efficient
>> to disable persistence.
>>
>> Thanks,
>> Pavel
>>
>> On Tue, May 12, 2020 at 12:23 PM Raymond Wilson <
>> raymond_wil...@trimble.com> wrote:
>>
>>> Well, it appears I was wrong.
>>> It reappeared. :(
>>>
>>> I thought I had sent a reply to this thread but cannot find it, so I am
>>> resending it now.
>>>
>>> Attached is a C# reproducer that throws Ignite out-of-memory errors in
>>> the situation I outlined above, where cache operations are performed
>>> against a small cache with persistence enabled.
>>>
>>> Let me know if you're able to reproduce it on your local systems.
>>>
>>> Thanks,
>>> Raymond.
>>>
>>>
>>> On Tue, Mar 3, 2020 at 1:31 PM Raymond Wilson <
>>> raymond_wil...@trimble.com> wrote:
>>>
>>>> It's possible this is user (me) error.
>>>>
>>>> I discovered I had set the cache size to be 64 MB in the server, but
>>>> 65 MB (typo!) in the client. Making these two values consistent appeared
>>>> to prevent the error.
>>>>
>>>> Raymond.
>>>>
>>>>
>>>> On Tue, Mar 3, 2020 at 12:58 PM Raymond Wilson <
>>>> raymond_wil...@trimble.com> wrote:
>>>>
>>>>> I'm using Ignite v2.7.5 with the C# client.
>>>>>
>>>>> I have an error where Ignite throws an out-of-memory exception, like
>>>>> this:
>>>>>
>>>>> 2020-03-03 12:02:58,036 [287] ERR [MutableCacheComputeServer] JVM will
>>>>> be halted immediately due to the failure: [failureCtx=FailureContext
>>>>> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
>>>>> Out of memory in data region [name=TAGFileBufferQueue, initSize=64.0 MiB,
>>>>> maxSize=64.0 MiB, persistenceEnabled=true] Try the following:
>>>>> ^-- Increase maximum off-heap memory size
>>>>> (DataRegionConfiguration.maxSize)
>>>>> ^-- Enable Ignite persistence
>>>>> (DataRegionConfiguration.persistenceEnabled)
>>>>> ^-- Enable eviction or expiration policies]]
>>>>>
>>>>> I don't have an eviction policy set (is this even a valid
>>>>> recommendation when using persistence?)
>>>>>
>>>>> Increasing the off-heap memory size for the data region does prevent
>>>>> this error, but I want to minimise the in-memory size for this buffer
>>>>> as it is essentially just a queue.
>>>>>
>>>>> The suggestion of enabling data persistence is strange, as this data
>>>>> region already has persistence enabled.
>>>>>
>>>>> My assumption is that Ignite manages the memory in this cache by
>>>>> saving and loading values as required.
>>>>>
>>>>> The test workflow in this failure is one where ~14,500 objects
>>>>> totalling ~440 MB in size (average object size = ~30 KB) are added to
>>>>> the cache, and are then drained by a processor using a continuous query.
>>>>> Elements are removed from the cache as the processor completes them.
>>>>>
>>>>> Is this kind of out-of-memory error supposed to be possible when using
>>>>> persistent data regions?
>>>>>
>>>>> Thanks,
>>>>> Raymond.
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>> 11 Birmingham Drive | Christchurch, New Zealand
>>> +64-21-2013317 Mobile
>>> raymond_wil...@trimble.com
>>>
>>>
>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
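For reference, some rough arithmetic on the figures quoted in this thread. This is a reading of the reported numbers only, not a diagnosis, and it treats the quoted ~30 KB average as 30 KiB. Over the March test, the total volume added is roughly 6.6 times the region's 64 MiB maximum, so unless the continuous-query processor keeps pace with the inserts, the region has to rely on writing pages to disk and replacing them in memory rather than holding the whole data set. In the June error, the counters report as many dirty pages as loaded pages (5417), about a third more than the reported maxDirtyPages, with cpPages=0 and failedToPrepare also at 5417:

```latex
\begin{aligned}
\frac{14\,500 \times 30\ \text{KiB}}{64\ \text{MiB}}
  &= \frac{435\,000\ \text{KiB}}{65\,536\ \text{KiB}} \approx 6.6 \\[4pt]
\text{dirtyPages} = \text{loaded} = \text{failedToPrepare} &= 5417 \\
\text{dirtyPages} - \text{maxDirtyPages} &= 5417 - 4063 = 1354 \\
\frac{\text{dirtyPages}}{\text{maxDirtyPages}} &= \frac{5417}{4063} \approx 1.33
\end{aligned}
```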