Hello,

It means that the node with id "5423e6b5-c9be-4eb8-8f68-e643357ec2b3" has
outdated data (possibly due to a restart) and has started to rebalance the
missed updates from a node with up-to-date data (the one where you see the
exception) using the WAL.
WAL rebalance is used when the number of entries in some partition exceeds
a threshold controlled by the system property
IGNITE_PDS_WAL_REBALANCE_THRESHOLD, whose default value is 500k entries.
WAL rebalance is very efficient when a node holds a lot of data and was
down for only a short period.
Unfortunately, this mechanism is currently unstable and may lead to errors
like the one you noticed. Very few users have that much data in
persistence in a single partition. There are a couple of tickets [1], [2], [3]
that should be fixed in the 2.8 release and will make it more robust.

To avoid this problem, set the JVM system property
IGNITE_PDS_WAL_REBALANCE_THRESHOLD to some very high value (e.g.
2,000,000) on all Ignite instances and perform a rolling restart. In this
case the default full rebalance will be used. It is slower but more reliable.
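
As a rough illustration of the workaround, here is a minimal Java sketch.
It only uses the JDK: the property name and the 500k default come from this
message, while the class name, the helper methods and the 600,000-entry
partition are hypothetical and are not Ignite's actual code. On the command
line the same setting would normally be passed to the JVM as
-DIGNITE_PDS_WAL_REBALANCE_THRESHOLD=2000000.

```java
class WalRebalanceThreshold {
    // Property name as given in this message.
    static final String PROP = "IGNITE_PDS_WAL_REBALANCE_THRESHOLD";

    // Default quoted in this message (500k entries).
    static final long DEFAULT_THRESHOLD = 500_000L;

    /** Reads the threshold from the JVM system properties, falling back to the default. */
    static long threshold() {
        return Long.getLong(PROP, DEFAULT_THRESHOLD);
    }

    /**
     * Hypothetical illustration of the rule described above: a partition is a
     * candidate for WAL rebalance only if it holds more entries than the threshold.
     */
    static boolean walRebalanceCandidate(long partitionEntries) {
        return partitionEntries > threshold();
    }

    public static void main(String[] args) {
        // Same effect as -DIGNITE_PDS_WAL_REBALANCE_THRESHOLD=2000000.
        // It has to be set before the Ignite node is started in this JVM.
        System.setProperty(PROP, "2000000");

        // A 600,000-entry partition no longer exceeds the threshold.
        System.out.println(walRebalanceCandidate(600_000L)); // prints "false"

        // In a real application the node would be started here.
    }
}
```

With the default threshold the same 600,000-entry partition would qualify
for WAL rebalance; with the raised value it does not, so full rebalance is
used instead.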

[1] https://issues.apache.org/jira/browse/IGNITE-8459
[2] https://issues.apache.org/jira/browse/IGNITE-8391
[3] https://issues.apache.org/jira/browse/IGNITE-10078

On Wed, Dec 26, 2018 at 11:19, aMark <feku.fa...@gmail.com> wrote:

> Hi,
>
> We are using Ignite 2.6 as a persistent store in Partitioned Mode, with 12
> server nodes running in the cluster, each node on a different machine.
>
> There are also around 48 client JVMs that connect to the cluster to fetch
> the data.
>
> Recently we have started getting the following exception on server nodes
> (though clients are still able to read/write data):
>
> 2018-12-25 02:59:48,423 ERROR [sys-#22846%a738c793-6e94-48cc-b6cf-d53ccab5f0fe%] {} org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier - Failed to send partition supply message to node: 5423e6b5-c9be-4eb8-8f68-e643357ec2b3
> class org.apache.ignite.IgniteCheckedException: Could not find start pointer for partition [part=9, partCntrSince=484857]
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.historicalIterator(GridCacheOffheapManager.java:792)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.historicalIterator(GridCacheOffheapManager.java:90)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator(IgniteCacheOffheapManagerImpl.java:893)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:283)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:364)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:379)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:364)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:99)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1603)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:125)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2752)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1516)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:125)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1485)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
>
> Does anyone have any idea about this exception and a possible resolution?
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
