Re: Data loss in an Ignite application

Aleksej Avrutin Thu, 22 Feb 2024 13:14:08 -0800

Jeremy,

Thank you for the response. I reviewed the cache properties in GG Control
Center and found nothing to suggest that an expiry policy or TTL is
configured for the cache. None was set at the operation level, either.


I decided to delete the cache entirely and re-create it. Tomorrow I'll
check if it helps.

My best,
Alex Avrutin


On Thu, Feb 22, 2024 at 3:56 AM Jeremy McMillan <
[email protected]> wrote:

> First, logging should be configured to at least WARN level if not INFO.
>
> Ignite manages data internally at the page level. If you see errors about
> pages, those are the lowest-level Ignite problems. The next level up is
> partitions; errors involving partitions are mid-to-low-level problems.
> The next level up is caches; errors at the cache level are mid-to-high-level
> problems. The next level is cache records; errors in cache record
> handling are at a high level of abstraction. The level above that is client
> application operations.
>
> The lower the level of abstraction at which errors appear, the less likely
> operations in general are to succeed. Since the cache appears to operate
> mostly as expected, and there are no obvious errors in the Ignite logs,
> most likely there is some client-side logic which is deleting records, and
> Ignite does not consider this behavior to be an error.
>
> I would recommend fine-tuning log coverage of the cache delete methods.
> First identify whether the deletion is happening on a client connection
> thread pool or on a thread for server-initiated operations.
>
> My guess is that a client is connecting, getting a cache object, and then
> setting expiration on that cache connection so that all cache adds under
> that cache connection will have expiration applied to them.
>
>
> https://ignite.apache.org/docs/2.14.0/configuring-caches/expiry-policies#configuration
>
> "You can also change or set Expiry Policy for individual cache operations.
> This policy is used for each operation invoked on the returned cache
> instance."
>
>
> https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Client.Cache.ICacheClient-2.html?q=withExpiryPolicy#Apache_Ignite_Core_Client_Cache_ICacheClient_2_WithExpiryPolicy_Apache_Ignite_Core_Cache_Expiry_IExpiryPolicy_
>
> On Wed, Feb 21, 2024, 19:17 Aleksej Avrutin <[email protected]> wrote:
>
>> Hello,
>>
>> A couple of days ago I encountered a strange phenomenon in our
>> application based on Apache Ignite .Net 2.14 with persistence (3 nodes, 1
>> backup per cache).
>> Data in a cache started disappearing for seemingly no reason and the
>> number of records could be halved (220K to 108K) overnight. I spent a
>> couple of days trying to find a problem in the application, crunched
>> hundreds of megabytes of application logs but didn't manage to find a
>> reason to blame the application. Retention/TTL is not set for the cache. Apache
>> Ignite logs with the option -DIGNITE_QUIET=false also don't reveal any
>> anomalies (or I don't know what to look for). The data shares are expected
>> to be durable (based on Azure Disk) and we never had any issues with them.
>> RAM utilisation is normal and there's plenty of available RAM.
>> The Ignite cluster is hosted in a 3 node Kubernetes cluster on Azure.
>>
>> The question is: how would you recommend investigating issues like this?
>> What metrics and logs can I check? Is it possible to log and track
>> individual Remove() operations as well as SQL queries at Ignite engine
>> level?
>>
>> The application has been working on Ignite for years already and we
>> didn't encounter data loss on such a scale before. It's possible that the
>> app wasn't used as extensively before as it is now and the problem went
>> unnoticed.
>>
>> My best,
>> Alex Avrutin
>>
>
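
The per-operation expiry behavior that Jeremy suspects can be illustrated with a toy model. This is only a sketch and does not use the Ignite API: ToyCache, with_expiry, FakeClock and demo are hypothetical names invented for the illustration, and the 60-second TTL is an arbitrary assumption. It shows how a "view" returned from a cache can attach a TTL to every write made through it, so that entries later vanish without any explicit remove call and without any cache-level TTL being configured.

```python
import time


class ToyCache:
    """Toy in-memory stand-in for a cache. Hypothetical; NOT the Ignite API."""

    def __init__(self, store=None, ttl_seconds=None, clock=time.monotonic):
        # All views share one store, so writes through any view land in the same cache.
        self._store = {} if store is None else store
        self._ttl = ttl_seconds
        self._clock = clock

    def with_expiry(self, ttl_seconds):
        """Return a new view over the same data; only writes made through it get a TTL."""
        return ToyCache(self._store, ttl_seconds, self._clock)

    def put(self, key, value):
        # A view without a TTL stores entries with no deadline (they never expire).
        deadline = None if self._ttl is None else self._clock() + self._ttl
        self._store[key] = (value, deadline)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and self._clock() >= deadline:
            del self._store[key]  # expired: dropped silently, no remove() was ever called
            return None
        return value

    def size(self):
        # Count only entries that have not passed their deadline.
        now = self._clock()
        return sum(1 for _, d in self._store.values() if d is None or now < d)


class FakeClock:
    """Manually advanced clock so the demo does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def demo():
    clock = FakeClock()
    cache = ToyCache(clock=clock)                      # cache itself has no TTL configured
    expiring_view = cache.with_expiry(ttl_seconds=60)  # assumed TTL, for illustration only

    cache.put("a", 1)          # written through the plain cache: never expires
    expiring_view.put("b", 2)  # written through the expiring view: expires after 60 s

    before = cache.size()
    clock.advance(61)
    after = cache.size()
    return before, after, cache.get("a"), cache.get("b")


if __name__ == "__main__":
    print(demo())
```

In this model the cache-level configuration never shows a TTL, which would be consistent with finding nothing in the cache properties while records still disappear; whether that is what happens in the real application is not established by the thread.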
