Just a correction to the context of the data region running out of memory: this one does not have a queue of items or a continuous query operating on a cache within it.

Thanks,
Raymond.

On Thu, Jun 11, 2020 at 4:12 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:

> Pavel,
>
> I have run into a different instance of an out of memory error in a data
> region, in a different context from the one I wrote the reproducer for. In
> this case, there is an activity which queues items for processing at a
> point in the future and which does use a continuous query; however, there is
> also significant vanilla put/get activity against a range of other caches.
>
> This data region was permitted to grow to 1 GB and has persistence enabled.
> We are now using Ignite 2.8.
>
> I would like to understand if this is a possible failure mode given that
> the data region has persistence enabled. The underlying cause appears to be
> 'Failed to find a page for eviction'. Should this be expected on data
> regions with persistence?
>
> I have included the error below.
>
> This is the initial error reported by Ignite:
>
> 2020-06-11 12:53:35,082 [98] ERR [ImmutableCacheComputeServer] JVM will be
> halted immediately due to the failure: [failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
> Failed to find a page for eviction [segmentCapacity=13612, loaded=5417,
> maxDirtyPages=4063, dirtyPages=5417, cpPages=0, pinnedInSegment=0,
> failedToPrepare=5417]
> Out of memory in data region [name=Default-Immutable, initSize=128.0 MiB,
> maxSize=1.0 GiB, persistenceEnabled=true] Try the following:
>   ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
>   ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
>   ^-- Enable eviction or expiration policies]]
>
> Following this error is a lock dump, where this is the only thread with a
> lock (I am assuming the structureId member with the value
> 'Spatial-SubGridSegment-Mutable-602' refers to a remote actor holding a
> lock against an item in the local node):
>
> Thread=[name=sys-stripe-11-#12%TRex-Immutable%, id=26], state=RUNNABLE
> Locked pages = [284060547022916[0001025a00000044](r=0|w=1)]
> Locked pages log: name=sys-stripe-11-#12%TRex-Immutable%
> time=(1591836815071, 2020-06-11 12:53:35.071)
> L=1 -> Write lock pageId=284060547022916,
> structureId=Spatial-SubGridSegment-Mutable-602 [pageIdHex=0001025a00000044,
> partId=602, pageIdx=68, flags=00000001]
>
> Following the lock dump is this final error before the Ignite node stops:
>
> 2020-06-11 12:53:35,082 [98] ERR [ImmutableCacheComputeServer] JVM will be
> halted immediately due to the failure: [failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
> Failed to find a page for eviction [segmentCapacity=13612, loaded=5417,
> maxDirtyPages=4063, dirtyPages=5417, cpPages=0, pinnedInSegment=0,
> failedToPrepare=5417]
> Out of memory in data region [name=Default-Immutable, initSize=128.0 MiB,
> maxSize=1.0 GiB, persistenceEnabled=true] Try the following:
>   ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
>   ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
>   ^-- Enable eviction or expiration policies]]
>
> On Wed, May 13, 2020 at 2:15 AM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>
>> Hi Pavel,
>>
>> The reproducer is not the actual use case, which is too big to use; it's
>> a small example using the same mechanisms. I have not used a data streamer
>> before, so I'll read up on it.
>>
>> I'll try running the reproducer again against 2.8 (I used 2.7.6 for the
>> reproducer).
>>
>> Thanks,
>> Raymond.
>>
>> On Tue, May 12, 2020 at 11:18 PM Pavel Tupitsyn <ptupit...@apache.org> wrote:
>>
>>> Hi Raymond,
>>>
>>> First, I could not reproduce the issue. The attached program runs to
>>> completion on my machine.
>>>
>>> Second, I see a few issues with the attached code:
>>> - Cache.PutIfAbsent is used instead of DataStreamer
>>> - ICacheEntryEventFilter is used to remove cache entries, and is called
>>>   twice - on add and on remove
>>>
>>> My recommendation is to use a "classic" combination of Data Streamer,
>>> Continuous Query, and Expiry Policy.
>>> Set the expiry policy to a few seconds, and you won't keep much data in
>>> memory. Ignite will handle the removal for you.
>>> Let me know if I should prepare an example.
>>>
>>> Also, it is not clear why persistence is needed for such a "buffer" cache:
>>> items are removed almost immediately, so it would be much more efficient
>>> to disable persistence.
>>>
>>> Thanks,
>>> Pavel
>>>
>>> On Tue, May 12, 2020 at 12:23 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>>>
>>>> Well, it appears I was wrong. It reappeared. :(
>>>>
>>>> I thought I had sent a reply to this thread but cannot find it, so I am
>>>> resending it now.
>>>>
>>>> Attached is a C# reproducer that throws Ignite out of memory errors in
>>>> the situation I outlined above, where cache operations are performed
>>>> against a small cache with persistence enabled.
>>>>
>>>> Let me know if you're able to reproduce it on your local systems.
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>> On Tue, Mar 3, 2020 at 1:31 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>>>>
>>>>> It's possible this is user (me) error.
>>>>>
>>>>> I discovered I had set the cache size to be 64 MB in the server, but
>>>>> 65 MB (typo!) in the client. Making these two values consistent appeared
>>>>> to prevent the error.
>>>>>
>>>>> Raymond.
>>>>>
>>>>> On Tue, Mar 3, 2020 at 12:58 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>>>>>
>>>>>> I'm using Ignite v2.7.5 with the C# client.
>>>>>>
>>>>>> I have an error where Ignite throws an out of memory exception, like
>>>>>> this:
>>>>>>
>>>>>> 2020-03-03 12:02:58,036 [287] ERR [MutableCacheComputeServer] JVM
>>>>>> will be halted immediately due to the failure: [failureCtx=FailureContext
>>>>>> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
>>>>>> Out of memory in data region [name=TAGFileBufferQueue, initSize=64.0 MiB,
>>>>>> maxSize=64.0 MiB, persistenceEnabled=true] Try the following:
>>>>>>   ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
>>>>>>   ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
>>>>>>   ^-- Enable eviction or expiration policies]]
>>>>>>
>>>>>> I don't have an eviction policy set (is this even a valid
>>>>>> recommendation when using persistence?).
>>>>>>
>>>>>> Increasing the off-heap memory size for the data region does prevent
>>>>>> this error, but I want to minimise the in-memory size for this buffer as
>>>>>> it is essentially just a queue.
>>>>>>
>>>>>> The suggestion of enabling data persistence is strange, as this data
>>>>>> region already has persistence enabled.
>>>>>>
>>>>>> My assumption is that Ignite manages the memory in this cache by
>>>>>> saving and loading values as required.
>>>>>>
>>>>>> The test workflow in this failure is one where ~14,500 objects
>>>>>> totalling ~440 MB in size (average object size = ~30 KB) are added to the
>>>>>> cache, and are then drained by a processor using a continuous query.
>>>>>> Elements are removed from the cache as the processor completes them.
>>>>>>
>>>>>> Is this kind of out of memory error supposed to be possible when
>>>>>> using persistent data regions?
>>>>>>
>>>>>> Thanks,
>>>>>> Raymond.

--
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
+64-21-2013317 Mobile
raymond_wil...@trimble.com

<https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
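
For anyone wanting to sanity-check the numbers quoted in this thread, here is a small Python sketch (plain arithmetic, not Ignite code). It parses the counters from the 'Failed to find a page for eviction' message and roughly estimates how many of the ~30 KB objects from the original report would fit in a 64 MiB region. The function names are made up for illustration, and the reading of the counters (for example, treating `dirtyPages == loaded` as "every loaded page in the segment is dirty") is an assumption based only on the field names in the log, not on Ignite documentation. The sizing estimate ignores page and index overhead and treats MB and MiB as equal.

```python
import re

# Counter list copied from the IgniteOutOfMemoryException message in this thread.
EVICTION_MSG = (
    "Failed to find a page for eviction [segmentCapacity=13612, loaded=5417, "
    "maxDirtyPages=4063, dirtyPages=5417, cpPages=0, pinnedInSegment=0, "
    "failedToPrepare=5417]"
)


def parse_counters(message):
    """Extract the name=value integer pairs from the bracketed counter list."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", message)}


def summarize(counters):
    """Derive a few simple comparisons (interpretation assumed from field names)."""
    loaded = counters["loaded"]
    dirty = counters["dirtyPages"]
    return {
        # Fraction of the segment's page slots that were in use.
        "segment_fill": loaded / counters["segmentCapacity"],
        # True when no loaded page is clean.
        "all_loaded_pages_dirty": dirty == loaded,
        # How far the dirty page count is above the reported limit.
        "dirty_over_limit": dirty - counters["maxDirtyPages"],
        # True when every loaded page was rejected as an eviction candidate.
        "every_candidate_failed": counters["failedToPrepare"] == loaded,
    }


def objects_that_fit(region_size_mb, total_size_mb, object_count):
    """Rough count of average-sized objects that fit in the region (no overhead)."""
    average_object_mb = total_size_mb / object_count
    return int(region_size_mb / average_object_mb)


if __name__ == "__main__":
    print(summarize(parse_counters(EVICTION_MSG)))
    # Workload from the original report: ~14,500 objects, ~440 MB, 64 MiB region.
    print(objects_that_fit(64, 440, 14500))
```

Under those assumptions, the segment was only about 40% full when the failure occurred, yet every loaded page was dirty and the dirty count was above `maxDirtyPages`, which is consistent with the "failed to find a page for eviction" wording. In the original 64 MiB case, only around a seventh of the queued objects could be resident at once even before any overhead is counted.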