Hi Folks,
We are running into strange issues when running queries in Ignite. Here is
our current setup:
- 8-node Ignite cluster on 128 GB VMs, deployed on Azure Kubernetes
- Persistence enabled, with a 30 GB data region
Each node uses the following configuration:
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="metricsEnabled" value="true"/>
        <property name="pageSize" value="#{8 * 1024}"/>
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
                <property name="maxSize" value="#{30L * 1024 * 1024 * 1024}"/>
                <property name="pageReplacementMode" value="SEGMENTED_LRU"/>
                <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
                <property name="metricsEnabled" value="true"/>
            </bean>
        </property>
        <property name="walSegmentSize" value="#{128L * 1024 * 1024}"/>
        <property name="walPath" value="/ignite/wal"/>
        <property name="walArchivePath" value="/ignite/walarchive"/>
        <property name="walMode" value="FSYNC"/>
    </bean>
</property>
<property name="failureHandler">
    <bean class="org.apache.ignite.failure.RestartProcessFailureHandler"/>
</property>
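For reference, the short Java program below evaluates the same arithmetic as the Spring `#{...}` expressions above, so the actual byte counts are visible. The class and constant names are our own (hypothetical, not Ignite API). It assumes `maxSize` applies per node, so the cluster-wide figure is simply eight times the per-node region; the page count is an upper bound that ignores Ignite's internal page overhead.

```java
// Evaluates the same arithmetic as the Spring #{...} expressions in the config above.
// Names are hypothetical; only the numbers come from the configuration.
final class RegionSizing {
    static final long PAGE_SIZE = 8 * 1024;                       // pageSize
    static final long REGION_MAX_SIZE = 30L * 1024 * 1024 * 1024; // maxSize (assumed per node)
    static final long WAL_SEGMENT_SIZE = 128L * 1024 * 1024;      // walSegmentSize
    static final int NODES = 8;                                    // from the setup list
    static final long GIB = 1024L * 1024 * 1024;

    // Upper bound on pages the region can hold; ignores Ignite's internal overhead.
    static long pagesPerRegion() {
        return REGION_MAX_SIZE / PAGE_SIZE;
    }

    // Total default-region memory across the cluster, assuming maxSize is per node.
    static long clusterRegionGiB() {
        return NODES * REGION_MAX_SIZE / GIB;
    }

    public static void main(String[] args) {
        System.out.println("pageSize (bytes):       " + PAGE_SIZE);
        System.out.println("maxSize (bytes):        " + REGION_MAX_SIZE);
        System.out.println("pages per region (max): " + pagesPerRegion());
        System.out.println("walSegmentSize (bytes): " + WAL_SEGMENT_SIZE);
        System.out.println("cluster region (GiB):   " + clusterRegionGiB());
    }
}
```

Running it prints 8192, 32212254720, 3932160, 134217728 and 240 for the five lines.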
When the query exceptions start, we get multiple waiting errors like this:
Thread [name="main", id=1, state=WAITING, blockCnt=5, waitCnt=2636]
    Lock [object=java.util.concurrent.CountDownLatch$Sync@b027ad0, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at o.a.i.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:398)

[14:25:07,980][SEVERE][disco-event-worker-#67][FailureProcessor] Ignite node is in invalid state due to a critical failure.
Then all nodes crash.
Please let us know if there is any configuration value we can change to
terminate long-running queries.
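To illustrate what "terminating a long-running query" means in general terms, here is a minimal JDK-only sketch: it runs a stand-in task with a deadline and cancels it once the deadline passes. This is a generic client-side pattern, not Ignite's API; `QueryDeadline`, `runWithTimeout` and `fakeQuery` are hypothetical names. Ignite has its own per-query timeout (for example `SqlFieldsQuery#setTimeout`), which should be checked against the documentation for the Ignite version in use. Note that cancellation here works only because the stand-in task responds to interruption; a real remote query needs server-side support to actually stop.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: waits at most timeoutMs for a task, then cancels it.
final class QueryDeadline {
    static String runWithTimeout(Callable<String> query, long timeoutMs)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(query);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupts the worker thread running the task
                return "TIMED_OUT";
            }
        } finally {
            executor.shutdownNow(); // release the worker thread in every case
        }
    }

    // Hypothetical stand-in for a query that takes durationMs to finish.
    static Callable<String> fakeQuery(long durationMs) {
        return () -> {
            Thread.sleep(durationMs); // throws InterruptedException when cancelled
            return "ROWS";
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(fakeQuery(10), 1_000)); // fast task completes
        System.out.println(runWithTimeout(fakeQuery(5_000), 100)); // slow task is cancelled
    }
}
```

The deadline is enforced by the caller rather than the task, so the same wrapper works for any blocking call that honors interruption.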
Thanks