Misconfigured IPv6 address on node causes cluster outage on node join

2017-09-07 Thread Kristian Rosenvold
I've just spent the better part of a week diagnosing our production cluster. It turned out someone (tm) had turned on IPv6 router advertisement without actually enabling IPv6 routing, hence our servers in one datacenter started picking up IPv6 addresses that were not known or resolvable in the other
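
A workaround of the kind usually suggested for this class of problem (not spelled out in the thread) is to keep the JVM off the IPv6 stack and pin each node to a known address; a minimal sketch, where the address is a placeholder:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

// Sketch only: java.net.preferIPv4Stack is normally passed as a JVM flag
// (-Djava.net.preferIPv4Stack=true); setting it first thing in main() works
// as long as no networking classes have been initialized yet.
System.setProperty("java.net.preferIPv4Stack", "true");

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setLocalHost("10.0.1.15");   // placeholder: the node's known IPv4 address
Ignite ignite = Ignition.start(cfg);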

"Found long running transaction" with Arjuna and rollback

2017-06-14 Thread Kristian Rosenvold
We are running Ignite 2.0.0 with the Arjuna transaction manager, configuration snippet follows: igniteConfiguration.getTransactionConfiguration().setTxManagerFactory((Factory) () -> jtaTransactionManager); igniteConfiguration.getTransactionConfiguration().setUseJtaSynchronization(true);
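
The same snippet in slightly fuller form, as a sketch; obtaining the Arjuna transaction manager via com.arjuna.ats.jta.TransactionManager.transactionManager() is an assumption, not something shown in the thread:

import javax.cache.configuration.Factory;
import javax.transaction.TransactionManager;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;

// Assumed way of getting hold of the Arjuna/Narayana transaction manager.
TransactionManager jtaTransactionManager =
    com.arjuna.ats.jta.TransactionManager.transactionManager();

IgniteConfiguration igniteConfiguration = new IgniteConfiguration();
TransactionConfiguration txCfg = igniteConfiguration.getTransactionConfiguration();
txCfg.setTxManagerFactory((Factory<TransactionManager>) () -> jtaTransactionManager);
txCfg.setUseJtaSynchronization(true);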

Re: Does read-through work with replicated caches ?

2016-10-27 Thread Kristian Rosenvold
number of backups equivalent to number of nodes. > > On Thu, Oct 27, 2016 at 2:13 PM, Kristian Rosenvold <krosenv...@apache.org > > wrote: > >> Does this configuration actually do anything with a replicated cache, if >> so what does it do ? >> >> Kristian >> >> > > > -- > Vladislav Pyatkov >

Does read-through work with replicated caches ?

2016-10-27 Thread Kristian Rosenvold
Does this configuration actually do anything with a replicated cache, if so what does it do ? Kristian
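
For context, a read-through cache of the kind being asked about could be configured like this sketch; Person and PersonStore are hypothetical classes, not from the thread:

import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("persons");
cfg.setCacheMode(CacheMode.REPLICATED);
cfg.setReadThrough(true);   // cache misses are loaded from the store
cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(PersonStore.class));   // hypothetical CacheStore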

Properly Immutable Keys/values with Binary objects ?

2016-07-05 Thread Kristian Rosenvold
Some of our keys/values are properly immutable Java objects (with final), and it appears that the java Serialization is the only way these can be serialized. Looking at BinaryClassDescriptor#newInstance it appears a no-args constructor is the only supported method, and the BinaryConfiguration does
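
An example of the kind of class being described, for illustration only: every field is final and there is no no-args constructor for the binary marshaller to call.

// Illustrative immutable key class (not from the thread).
public final class AccountKey {
    private final String customerId;
    private final int accountNo;

    public AccountKey(String customerId, int accountNo) {
        this.customerId = customerId;
        this.accountNo = accountNo;
    }

    public String customerId() { return customerId; }
    public int accountNo()     { return accountNo; }
}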

Re: Adding a third node to REPLICATED cluster fails to get correct number of elements

2016-06-23 Thread Kristian Rosenvold
consistent equals/hashCode > implementation? Probably we will be able to detect your case internally > somehow and print a warning. > > — > Denis > >> On Jun 17, 2016, at 10:27 PM, Kristian Rosenvold <krosenv...@apache.org> >> wrote: >> >> This whole issue was ca
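
The equals/hashCode point, illustrated with a hypothetical key class: both methods must be derived from the same fields so that every node agrees on key identity.

import java.util.Objects;

// Hypothetical key class with equals() and hashCode() defined over the same fields.
public final class OrderKey {
    private final long orderId;
    private final String region;

    public OrderKey(long orderId, String region) {
        this.orderId = orderId;
        this.region = region;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderKey)) return false;
        OrderKey other = (OrderKey) o;
        return orderId == other.orderId && Objects.equals(region, other.region);
    }

    @Override public int hashCode() {
        return Objects.hash(orderId, region);
    }
}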

Re: How to call loadCache before node is started ?

2016-06-22 Thread Kristian Rosenvold
, should be reproducible in 5 minutes or less :) Kristian 2016-06-22 9:03 GMT+02:00 Kristian Rosenvold <krosenv...@apache.org>: > I have created a testcase that quite consistently reproduces this > problem on my mac; > https://gist.github.com/krosenvold/fa20521ad121a0cbb4c6ed6be914

Re: How to call loadCache before node is started ?

2016-06-22 Thread Kristian Rosenvold
never achieve any kind of consistent cache. This just does not make any sense with a cache that is replicated ?!? Kristian 2016-06-21 9:28 GMT+02:00 Kristian Rosenvold <krosenv...@apache.org>: > 2016-06-20 10:27 GMT+02:00 Alexei Scherbakov <alexey.scherbak...@gmail.com>: >>

Re: How to call loadCache before node is started ?

2016-06-21 Thread Kristian Rosenvold
2016-06-20 10:27 GMT+02:00 Alexei Scherbakov : > Hi, > > You should not rely on cache.size method for checking data consistency. > cache.size skips keys which are not currently loaded by read-through > behavior. Sorry for not mentioning that this is

Re: How to call loadCache before node is started ?

2016-06-18 Thread Kristian Rosenvold
2016-06-18 13:02 GMT+02:00 Alexei Scherbakov : > You should be safe calling loadCache just after getOrCreate. I am testing various disaster recovery scenarios here, and I'm not entirely convinced this is the case. Our system starts 6 replicated caches, the script

How to call loadCache before node is started ?

2016-06-18 Thread Kristian Rosenvold
Our current node startup logic includes a simple heuristic like this: final IgniteCache cache = ignite.getOrCreateCache(configuration); if (cache.localSize(CachePeekMode.ALL) == 0) { LOGGER.info("Empty cache or No-one else around in the ignite cloud, loading cache {} from database",
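
A hedged sketch of that heuristic in full; the cache types, the 'configuration' object and the null loadCache filter are assumptions, since the original snippet is cut off:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePeekMode;

// Only trigger a database load when this node sees an empty cache locally.
IgniteCache<Long, Person> cache = ignite.getOrCreateCache(configuration);
if (cache.localSize(CachePeekMode.ALL) == 0) {
    // loadCache(null) runs the CacheStore's loadCache on the cluster with no filter.
    cache.loadCache(null);
}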

Re: Replicated cache leaks entries on 1.6 and 1.7-SNAPSHOT

2016-06-17 Thread Kristian Rosenvold
an environment problem, rather than Ignite problem. > Can you create a simple reproducer that starts 2 nodes in the same JVM and > proves that data is not replicated? If the problem is in Ignite, we will fix > it asap. > > On Thu, Jun 16, 2016 at 10:58 PM, Krist

Re: Adding a third node to REPLICATED cluster fails to get correct number of elements

2016-06-17 Thread Kristian Rosenvold
Probably advanced logging will let us pinpoint the issue that > happens in Kristian’s environment. > > — > Denis > > On Jun 17, 2016, at 10:02 AM, Kristian Rosenvold <krosenv...@apache.org> > wrote: > > For Ignite 1.5, 1.6 and 1.7-SNAPSHOT, I see the same behaviour. S

Adding a third node to REPLICATED cluster fails to get correct number of elements

2016-06-17 Thread Kristian Rosenvold
For Ignite 1.5, 1.6 and 1.7-SNAPSHOT, I see the same behaviour. Since REPLICATED caches seem to be broken on 1.6 and beyond, I am testing this on 1.5: I can reliably start two nodes and get consistent correct results, let's say each node has 1.5 million elements in a given cache. Once I start a

Replicated cache leaks entries on 1.6 and 1.7-SNAPSHOT

2016-06-16 Thread Kristian Rosenvold
We're using a cache with CacheMode.REPLICATED. Using 2 nodes, I start each node sequentially and they both get the same number of elements in their caches (as expected so far). Almost immediately, the caches start to drift out of sync, all of the elements are simply not getting replicated. There is

Rapidly starting and stopping ignite for integration tests

2016-06-14 Thread Kristian Rosenvold
After adding Ignite to some of our project-specific integration tests, I was not overly happy with the performance hit this gave us. I tweaked around a bit, and was able to find a couple of things: - Switch to TcpDiscoverySharedFsIpFinder to avoid network service discovery. - Reduce initial size
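
A sketch of what those two tweaks might look like; the shared directory path and the startSize value are made up, and startSize as the "initial size" knob is an assumption:

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.sharedfs.TcpDiscoverySharedFsIpFinder;

IgniteConfiguration cfg = new IgniteConfiguration();

// Discover peers through a shared directory instead of scanning the network.
TcpDiscoverySharedFsIpFinder ipFinder = new TcpDiscoverySharedFsIpFinder();
ipFinder.setPath("/tmp/ignite-it-discovery");   // hypothetical path
TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
discoverySpi.setIpFinder(ipFinder);
cfg.setDiscoverySpi(discoverySpi);

// Keep the per-cache startup allocation small for short-lived test clusters.
CacheConfiguration<?, ?> cacheCfg = new CacheConfiguration<>("it-cache");
cacheCfg.setStartSize(1024);
cfg.setCacheConfiguration(cacheCfg);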

Re: State of initially started cache with CacheRebalanceMode.SYNC ?

2016-06-13 Thread Kristian Rosenvold
t; all ok, the fix will be merged to master shortly. > > [1] https://issues.apache.org/jira/browse/IGNITE-3305 > > --AG > > 2016-06-13 11:38 GMT-07:00 Kristian Rosenvold <krosenv...@apache.org>: >> >> Alexey, >> >> we were discussing what was happening

Re: State of initially started cache with CacheRebalanceMode.SYNC ?

2016-06-13 Thread Kristian Rosenvold
Alexey, we were discussing what was happening in the 10-20 seconds while the cache was being replicated, to find out if any inconsistencies could occur in this window. So I started a first node with a known number of elements, say 1 million. The testcase I showed in the first code was then

Re: State of initially started cache with CacheRebalanceMode.SYNC ?

2016-06-13 Thread Kristian Rosenvold
documentation or it needs to be added. Thanks a lot, Kristian 2016-06-13 10:02 GMT+02:00 Denis Magda <dma...@gridgain.com>: > > — > Denis > >> On Jun 13, 2016, at 10:59 AM, Kristian Rosenvold <krosenv...@apache.org> >> wrote: >> >> This is a r

Re: State of initially started cache with CacheRebalanceMode.SYNC ?

2016-06-13 Thread Kristian Rosenvold
2016-06-13 9:14 GMT+02:00 Denis Magda : > This property means that a node that is being started and where a part of > cache data is being rebalanced won’t be considered for any cache related > operations until the rebalancing has finished. > > In my understanding such a node

State of initially started cache with CacheRebalanceMode.SYNC ?

2016-06-09 Thread Kristian Rosenvold
The javadoc on CacheRebalanceMode.SYNC seems to indicate that the cache should block until rebalancing is complete. When I run the code below, the assert statement fails unless I add the explicit call to cache.rebalance().get(). Am I doing something wrong ? Kristian CacheConfiguration config =
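
A hedged reconstruction of the pattern described (cache name, types and expectedSize are assumptions): with SYNC rebalance mode configured, the assertion only held once the explicit rebalance().get() call was added.

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;

CacheConfiguration<Long, String> config = new CacheConfiguration<>("sync-cache");
config.setCacheMode(CacheMode.REPLICATED);
config.setRebalanceMode(CacheRebalanceMode.SYNC);

IgniteCache<Long, String> cache = ignite.getOrCreateCache(config);

// Without this explicit wait the size assertion below failed in the test
// described above, even though the rebalance mode is SYNC.
cache.rebalance().get();

assert cache.localSize() == expectedSize;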

Re: ignite cache get performance issue when Key, Value not exist

2016-06-08 Thread Kristian Rosenvold
I have a related question which I have been trying to find the answer to in the docs: Given a REPLICATED cache with CacheRebalanceMode.SYNC, is there any meaningful use case for a read-through handler ? (Is there any situation where the read-through database can actually produce a result that

Re: One failing node stalling the whole cluster

2016-06-06 Thread Kristian Rosenvold
We're also seeing this total hang of our replicated cache cluster when a single node goes totally lethargic due to too heavy memory load. The culprit node typically does not respond to "jstack" due to either excessive memory load or missing safepoints. Sometimes we need to do kill -9 to get the

Re: Tcp discovery with clustered docker nodes

2016-05-12 Thread Kristian Rosenvold
two properties (localAddress and localPort), which > allows you to set local address and port for Discovery SPI to bind. > Does it solve your problem? > > > . > > > 2016-05-12 10:56 GMT+03:00 Kristian Rosenvold <krosenv...@apache.org>: > >> We have been usin
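
The two Discovery SPI properties mentioned in the reply, sketched out; the address and port values are placeholders for whatever is reachable from outside the container:

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
// Bind discovery to an address/port that nodes outside the container can reach.
discoverySpi.setLocalAddress("192.168.1.50");   // placeholder address
discoverySpi.setLocalPort(47500);               // Ignite's default discovery port
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setDiscoverySpi(discoverySpi);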

Tcp discovery with clustered docker nodes

2016-05-12 Thread Kristian Rosenvold
We have been using JDBC-based discovery successfully, but have been trying to get this to work with Docker. The problem is of course that all the internal IPs inside the Docker container are useless, so we'd really need some way to override the discovery mechanism. Now you could of course say

Peculiar loopback address on Mac seems to break cluster of Linux and Mac....

2016-04-14 Thread Kristian Rosenvold
I was seeing quite substantial instabilities in my newly configured 1.5.0 cluster, where messages like this would pop up, resulting in the termination of the node: java.net.UnknownHostException: no such interface lo at java.net.Inet6Address.initstr(Inet6Address.java:487) ~[na:1.8.0_60] at
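
One common way to sidestep this kind of interface resolution problem (not necessarily the fix arrived at in the thread) is to bind the node to an explicit address instead of letting Ignite enumerate interfaces; a sketch:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setLocalHost("127.0.0.1");   // placeholder: or the machine's LAN address
Ignite ignite = Ignition.start(cfg);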