Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-28 Thread Ilya Kasnacheev
Hello! 2-4 kilobytes is not big. Still, you may want to check your logs for long-running transactions, etc. Regards, -- Ilya Kasnacheev On Fri, 25 Oct 2019 at 18:21, ihalilaltun wrote: > Hi Ilya, > > It is almost impossible for us to get thread dumps since this is production > environment we cannot

Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-25 Thread ihalilaltun
Hi Ilya, It is almost impossible for us to get thread dumps: since this is a production environment, we cannot use a profiler :( Our biggest objects range from 2 to 4 kilobytes. We are planning to shrink the sizes, but the timing for this has not been decided yet. regards. - İbrahim Halil Altun Senior Softwa

Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-21 Thread Ilya Kasnacheev
Hello! Ignite operations will use built-in locks even if you don't explicitly use any. If you have uncommitted transactions or something like that, checkpoint can't start (and other operations are waiting for it too). How big are we talking about? I recommend capturing several thread dumps after
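A thread dump does not require a profiler. As a minimal sketch of the "several thread dumps" advice, the class below uses only the standard JDK `ThreadMXBean` API to capture a series of dumps from inside the running JVM. The class name `ThreadDumpSketch`, the method names, and the count and interval are illustrative assumptions, not part of Ignite or of this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: captures thread dumps with the standard JDK management API. */
public class ThreadDumpSketch {

    /** Returns a text dump of all live threads with their full stack traces. */
    public static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Request monitor and synchronizer details only where the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" state=").append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Captures {@code count} dumps, pausing {@code intervalMillis} between consecutive dumps. */
    public static List<String> captureSeries(int count, long intervalMillis) {
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            dumps.add(capture());
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return dumps;
    }
}
```

For example, `ThreadDumpSketch.captureSeries(3, 5_000L)` would return three dumps taken a few seconds apart; a thread that sits in the same stack frame or waits on the same lock in every dump is a candidate for the long-running operation. If code changes are not an option, running the JDK's `jstack <pid>` tool against the node process several times produces comparable output.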

Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-18 Thread ihalilaltun
Hi Ilya, Sorry for the late response. We don't use any lock mechanism in our environment. We have a lot of put and get operations; as far as I remember, these operations do not hold locks. In addition to these operations, in many update/put operations we use CacheEntryProcessor which also does not h

Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-14 Thread Ilya Kasnacheev
Hello! Then this error likely means that you have very long operations preventing the checkpoint from starting in time. Make sure your code does not cause Ignite to hold locks for a prolonged time. It may work OK without persistence, but if it interferes with the checkpoint it becomes a problem. Regards,

Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-11 Thread ihalilaltun
Hi Ilya, Yes, we have persistence enabled. The OS is not swapping out Ignite memory, since we have more than enough resources on the server. The disks used for persistence are SSDs with 96MB/s read and write speed. Is there any easy way to check if we are r

Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-11 Thread Ilya Kasnacheev
Hello! Unfortunately it's hard to say what happens here, because the oldest log already starts with error messages and any root causes are clobbered. Do you have persistence in this cluster? If the answer is yes, are you sure that OS is not swapping out Ignite memory to disk, and that your disk i

Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-09 Thread ihalilaltun
Hi There, We had an unresponsive cluster today after the following error: [2019-10-09T07:08:13,623][ERROR][sys-stripe-94-#95][GridCacheDatabaseSharedManager] Checkpoint read lock acquisition has been timed out. org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$