> But on client reconnect, doesn't it mean it will still block until the cluster is active, even if I get a new IgniteCache instance?
No, the client will get an exception when it tries to obtain an IgniteCache instance.

- Denis

On Fri, Aug 14, 2020 at 4:14 PM John Smith <java.dev....@gmail.com> wrote:

Yeah, I could maybe use the Vert.x event bus or something to do this... But then I have to tie the Ignite instance to the IgniteCache repository I wrote.

But on client reconnect, doesn't it mean it will still block until the cluster is active, even if I get a new IgniteCache instance?

On Fri, 14 Aug 2020 at 18:22, Denis Magda <dma...@apache.org> wrote:

@Evgenii Zhuravlev <ezhurav...@gridgain.com>, @Ilya Kasnacheev <ilya.kasnach...@gmail.com>, any thoughts on this?

As a dirty workaround, you can update your cache references on client reconnect events. Calling ignite.cache(cacheName) will throw an exception while the cluster is not activated yet. Does this work for you?

- Denis

On Fri, Aug 14, 2020 at 3:12 PM John Smith <java.dev....@gmail.com> wrote:

Is there any workaround? I can't have an HTTP server block on all requests.

1- I need to figure out why I lose server nodes every few weeks; rebooting those nodes leaves the cluster in the inactive state until they are back...

2- Implement some kind of logic on the client side so the HTTP part doesn't block...

Can an IgniteCache instance be notified of disconnect events, so I can tell the repository class to set a flag and skip the operation?

On Fri., Aug. 14, 2020, 5:17 p.m. Denis Magda, <dma...@apache.org> wrote:

My guess is that it's standard behavior for all operations (SQL, key-value, compute, etc.). But I'll let the maintainers of those modules clarify.

- Denis

On Fri, Aug 14, 2020 at 1:44 PM John Smith <java.dev....@gmail.com> wrote:

Hi Denis, so to clarify: is it all operations or just the query?

On Fri., Aug. 14, 2020, 12:53 p.m.
Denis Magda, <dma...@apache.org> wrote:

John,

OK, we nailed it. That's the current expected behavior. Generally, I agree with you that the platform should support an option that makes operations fail when the cluster is deactivated. Could you propose the change by starting a discussion on the dev list? You can refer to this user-list discussion. Let me know if you need help with this.

- Denis

On Thu, Aug 13, 2020 at 5:55 PM John Smith <java.dev....@gmail.com> wrote:

No, I reuse the instance. The cache instance is created once at startup of the application, and I pass it to my "repository" class:

    import io.vertx.core.Future;
    import io.vertx.core.Promise;
    import io.vertx.core.Vertx;
    import io.vertx.core.json.JsonArray;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.query.QueryCursor;
    import org.apache.ignite.cache.query.SqlFieldsQuery;

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public abstract class AbstractIgniteRepository<K, V> implements CacheRepository<K, V> {
        public final long DEFAULT_OPERATION_TIMEOUT = 2000;

        private Vertx vertx;
        private IgniteCache<K, V> cache;

        AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
            this.vertx = vertx;
            this.cache = cache;
        }

        ...

        Future<List<JsonArray>> query(final String sql, final long timeoutMs, final Object... args) {
            final Promise<List<JsonArray>> promise = Promise.promise();

            // This fires if the blocking code below doesn't complete in time.
            vertx.setTimer(timeoutMs, l -> {
                promise.tryFail(new TimeoutException("Cache operation did not complete within: " + timeoutMs + " Ms."));
            });

            // Note: this two-argument executeBlocking overload is ordered, so blocking
            // calls submitted from the same context run one after another.
            vertx.<List<JsonArray>>executeBlocking(code -> {
                SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
                query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);

                try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
                    List<JsonArray> rows = new ArrayList<>();
                    Iterator<List<?>> iterator = cursor.iterator();

                    while (iterator.hasNext()) {
                        List<?> currentRow = iterator.next();
                        JsonArray row = new JsonArray();

                        currentRow.forEach(o -> row.add(o));

                        rows.add(row);
                    }

                    code.complete(rows);
                } catch (Exception ex) {
                    code.fail(ex);
                }
            }, result -> {
                if (result.succeeded()) {
                    promise.tryComplete(result.result());
                } else {
                    promise.tryFail(result.cause());
                }
            });

            return promise.future();
        }

        @SuppressWarnings("unchecked")
        public <T> T cache() {
            return (T) cache;
        }
    }

On Thu, 13 Aug 2020 at 16:29, Denis Magda <dma...@apache.org> wrote:

I've created a simple test, and I always get the exception below when trying to get a reference to an IgniteCache instance while the cluster is not activated:

    Exception in thread "main" class org.apache.ignite.IgniteException: Can not perform the operation because the cluster is inactive. Note, that the cluster is considered inactive by default if Ignite Persistent Store is used to let all the nodes join the cluster. To activate the cluster call Ignite.active(true)

Are you trying to get a new IgniteCache reference whenever the client successfully reconnects to the cluster? My gut feeling is that, currently, Ignite verifies the activation status and throws the exception above whenever you get a reference to an IgniteCache or IgniteCompute. But once you have those references and try to run some operations, those operations get stuck if the cluster is not activated.
- Denis

On Thu, Aug 13, 2020 at 6:37 AM John Smith <java.dev....@gmail.com> wrote:

The cache.query() call starts to block when the Ignite server nodes are being restarted and there's no baseline topology yet. The server nodes do not block; it's the client that blocks.

The dump files are from the server nodes. The screenshot is from the client app, taken with the YourKit profiler on the client side; the threads are marked red in YourKit.

The app is simple: on each HTTP request it runs a cache SQL query on Ignite and, if that succeeds, does a put back to Ignite.

The ClientDisconnectedException only happens when all server nodes in the cluster are down. The blocking only happens while the cluster is trying to establish the baseline topology.

On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda, <dma...@apache.org> wrote:

John,

I don't see any traits of an application-caused deadlock in the thread dumps. Please elaborate on the following:

> 7- Restart 1st node, run operation, operation fails with ClientDisconnectedException but application still able to complete its request.

What's the IP address of the server node the client app uses to join the cluster? If it's not the address of the 1st node, which has already been restarted, then the client couldn't join the cluster, and it's expected to fail with the ClientDisconnectedException.

> 8- Start 2nd node, run operation, from here on all operations just block.
Are the operations unblocked and completed successfully when the third node joins the cluster and the cluster gets activated automatically?

- Denis

On Wed, Aug 12, 2020 at 11:08 AM John Smith <java.dev....@gmail.com> wrote:

OK Denis, here they are...

Dumps from the 3 nodes, plus a YourKit screenshot of what it thinks are deadlocks in the client app.

https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0

On Wed, 12 Aug 2020 at 11:07, John Smith <java.dev....@gmail.com> wrote:

Hi Denis. I will ASAP, but I think you were right: it is the query that blocks.

My application first runs a select on the cache and then does a put to the cache.

On Tue, 11 Aug 2020 at 19:22, Denis Magda <dma...@apache.org> wrote:

John,

It sounds like a deadlock caused by the application logic. Is there any chance that the operation you run in step 8 accesses several keys in one order while other operations work with the same keys in a different order? Deadlocks are possible when you use the Ignite Transaction API or simply execute bulk operations such as cache.getAll(..) or cache.putAll(..).

Please take thread dumps from all the cluster nodes and attach them for analysis, in case we need to dig deeper.
- Denis

On Mon, Aug 10, 2020 at 6:23 PM John Smith <java.dev....@gmail.com> wrote:

Hi Denis, I think you are right. It's the query that blocks; the other key-value operations are OK.

Any thoughts on this?

On Mon, 10 Aug 2020 at 15:28, John Smith <java.dev....@gmail.com> wrote:

I tried with 2.8.1, same issue. Operations block indefinitely...

1- Start a 3-node cluster.
2- Start the client application (client = true) with Ignition.start().
3- Run some cache operations, everything OK...
4- Shut down one node, run an operation, still OK.
5- Shut down the 2nd node, run an operation, still OK.
6- Shut down the 3rd node, run an operation, still OK... Operations start failing with ClientDisconnectedException...
7- Restart the 1st node, run an operation; the operation fails with ClientDisconnectedException, but the application is still able to complete its request.
8- Start the 2nd node, run an operation; from here on, all operations just block.

Basically, the client application is an HTTP server that performs cache operations on each HTTP request.

On Fri, 7 Aug 2020 at 19:46, John Smith <java.dev....@gmail.com> wrote:

No, everything blocks... Also, I'm using 2.7.0, just in case.

The only time I get an exception is when the cluster is completely off; then I get ClientDisconnectedException...
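Denis's workaround above (refresh cache references on client reconnect events, and expect ignite.cache(cacheName) to throw while the cluster is inactive) can be sketched without Ignite on the classpath. In this minimal sketch, `CacheRefHolder` is a hypothetical class, the type parameter `C` stands in for `IgniteCache<K, V>`, the `Supplier` stands in for `() -> ignite.cache(cacheName)`, and `onDisconnected`/`onReconnected` are assumed to be called from the application's own client disconnect/reconnect listeners (for example, listeners for `EventType.EVT_CLIENT_NODE_DISCONNECTED` and `EventType.EVT_CLIENT_NODE_RECONNECTED`; check that those events are enabled in your configuration). An empty result doubles as the "skip the operation" flag John asked about.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical holder for a cache reference. C stands in for IgniteCache<K, V>;
 * acquire stands in for () -> ignite.cache(cacheName), which is expected to
 * throw while the cluster is inactive.
 */
final class CacheRefHolder<C> {
    private final Supplier<C> acquire;
    private final AtomicReference<C> ref = new AtomicReference<>();
    private final AtomicBoolean connected = new AtomicBoolean(true);

    CacheRefHolder(Supplier<C> acquire) {
        this.acquire = acquire;
    }

    /** Call from a client-disconnected listener: drop the stale reference. */
    void onDisconnected() {
        connected.set(false);
        ref.set(null);
    }

    /** Call from a client-reconnected listener: re-acquire lazily on next use. */
    void onReconnected() {
        ref.set(null);
        connected.set(true);
    }

    /**
     * Returns the cached reference, acquiring it if needed. Empty means
     * "skip the operation": the client is disconnected, or acquiring the
     * reference threw (for example because the cluster is not active yet).
     */
    Optional<C> get() {
        if (!connected.get()) {
            return Optional.empty();
        }
        C current = ref.get();
        if (current != null) {
            return Optional.of(current);
        }
        try {
            ref.compareAndSet(null, acquire.get());
            return Optional.ofNullable(ref.get());
        } catch (RuntimeException clusterInactive) {
            return Optional.empty(); // acquisition is retried on the next call
        }
    }
}
```

If Denis's hypothesis is right, this only protects the moment the reference is acquired: an operation on a reference obtained while the cluster was active can still hang if the cluster later becomes inactive, so it is worth combining with a caller-side timeout. The bookkeeping here is only loosely synchronized and is meant as a starting point.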
On Fri, 7 Aug 2020 at 18:52, Denis Magda <dma...@apache.org> wrote:

If I'm not mistaken, key-value operations (cache.get/put) and compute calls fail with an exception if the cluster is deactivated. Do those fail on your end?

As for the async and SQL operations, let's see what other community members say.

- Denis

On Fri, Aug 7, 2020 at 1:06 PM John Smith <java.dev....@gmail.com> wrote:

Hi, any thoughts on this?

On Thu, 6 Aug 2020 at 23:33, John Smith <java.dev....@gmail.com> wrote:

Here is another example where it blocks.

    SqlFieldsQuery query = new SqlFieldsQuery("select * from my_table")
        .setArgs(providerId, carrierCode);
    query.setTimeout(1000, TimeUnit.MILLISECONDS);

    try (QueryCursor<List<?>> cursor = cache.query(query)) {
        // ...
    }

cache.query() just blocks, even with the timeout set.

Is there a way to time out and at least have the application continue and respond with an appropriate message?
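On John's question: since, as he reports, `query.setTimeout(...)` does not take effect while the cluster is inactive, one option is to enforce the deadline on the caller's side and keep the stuck threads contained. The following is a minimal JDK-only sketch (Java 9+ for `orTimeout`, `copy` and `failedFuture`); `BlockingCallGuard` is a hypothetical name, and the `Supplier` would wrap something like `cache.query(query)` plus the cursor iteration. It does not cancel or unblock the underlying Ignite call: the worker thread stays stuck until Ignite returns, which is why the number of in-flight calls is capped and extra requests are rejected immediately instead of being queued.

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs a possibly blocking call on a dedicated pool and
 * gives the caller either the result or a TimeoutException. A timed-out call
 * keeps its thread (and its permit) until the underlying call really returns.
 */
final class BlockingCallGuard implements AutoCloseable {
    private final ExecutorService pool;
    private final Semaphore inFlight;

    BlockingCallGuard(int maxInFlight) {
        this.pool = Executors.newFixedThreadPool(maxInFlight);
        this.inFlight = new Semaphore(maxInFlight);
    }

    <T> CompletableFuture<T> call(Supplier<T> blockingCall, long timeoutMs) {
        if (!inFlight.tryAcquire()) {
            // Every worker is busy (possibly stuck): fail fast instead of queueing.
            return CompletableFuture.failedFuture(
                    new RejectedExecutionException("too many blocked cache calls"));
        }
        CompletableFuture<T> work = CompletableFuture.supplyAsync(blockingCall, pool);
        // Release the permit only when the underlying call actually finishes.
        work.whenComplete((result, error) -> inFlight.release());
        // The timeout completes a copy, so it cannot release the permit early.
        return work.copy().orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        pool.shutdownNow(); // interrupts stuck workers; Ignite may or may not react
    }
}
```

Applying `orTimeout` to `work` directly would also complete `work` exceptionally and release the permit while the thread is still stuck, letting blocked threads pile up beyond the cap. This plays a similar role to the Vert.x timer in John's repository class above, but additionally bounds how many worker threads can be stuck at once.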
On Thu, 6 Aug 2020 at 23:06, John Smith <java.dev....@gmail.com> wrote:

Hi, I'm running 2.7.0.

When I reboot a node and it begins to rejoin the cluster, or the cluster is not yet activated with a baseline topology, operations seem to block forever, including operations that are supposed to return an IgniteFuture, e.g. putAsync, getAsync, etc. They just block until the cluster resolves its state.
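Along the lines of the fail-fast option Denis suggested raising on the dev list, an application can check activation before submitting work and hand back an already-failed future instead of blocking. In this minimal sketch, `ActivationGate` and `ClusterInactiveException` are hypothetical names, and the `BooleanSupplier` stands in for an activation check such as the `Ignite.active()` mentioned in the exception text quoted in this thread; whether that check can itself block during topology changes is not established here, so verify it for your Ignite version.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical exception returned instead of blocking on an inactive cluster. */
final class ClusterInactiveException extends RuntimeException {
    ClusterInactiveException() {
        super("cluster is not active; operation skipped");
    }
}

/**
 * Hypothetical fail-fast gate. isActive stands in for an activation check;
 * op stands in for an async cache call such as putAsync/getAsync adapted to
 * CompletableFuture.
 */
final class ActivationGate {
    private final BooleanSupplier isActive;

    ActivationGate(BooleanSupplier isActive) {
        this.isActive = isActive;
    }

    <T> CompletableFuture<T> run(Supplier<CompletableFuture<T>> op) {
        if (!isActive.getAsBoolean()) {
            // Do not even submit the operation; fail immediately instead.
            return CompletableFuture.failedFuture(new ClusterInactiveException());
        }
        return op.get();
    }
}
```

The check is inherently racy: the cluster can become inactive right after it returns `true`, so the gate narrows the window in which an operation can hang rather than closing it.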