Is there any workaround? I can't have the HTTP server block on all requests.
1- I need to figure out why I lose a server node every few weeks; rebooting the nodes then leaves the cluster in the inactive state until they are all back.
2- Implement some kind of logic on the client side so the HTTP part doesn't block.

Can an IgniteCache instance be notified of disconnect events, so I can tell the repository class to set a flag and skip the operation?

On Fri., Aug. 14, 2020, 5:17 p.m. Denis Magda, <dma...@apache.org> wrote:

> My guess is that it's standard behavior for all operations (SQL, key-value,
> compute, etc.). But I'll let the maintainers of those modules clarify.
>
> -
> Denis
>
>
> On Fri, Aug 14, 2020 at 1:44 PM John Smith <java.dev....@gmail.com> wrote:
>
>> Hi Denis, so to understand, is it all operations or just the query?
>>
>> On Fri., Aug. 14, 2020, 12:53 p.m. Denis Magda, <dma...@apache.org>
>> wrote:
>>
>>> John,
>>>
>>> Ok, we nailed it. That's the current expected behavior. Generally, I
>>> agree with you that the platform should support an option where
>>> operations fail if the cluster is deactivated. Could you propose the
>>> change by starting a discussion on the dev list? You can refer to this
>>> user list discussion for reference. Let me know if you need help with
>>> this.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Thu, Aug 13, 2020 at 5:55 PM John Smith <java.dev....@gmail.com>
>>> wrote:
>>>
>>>> No, I reuse the instance. The cache instance is created once at startup
>>>> of the application and I pass it to my "repository" class.
>>>>
>>>> public abstract class AbstractIgniteRepository<K, V> implements CacheRepository<K, V> {
>>>>     public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>>>>
>>>>     private Vertx vertx;
>>>>     private IgniteCache<K, V> cache;
>>>>
>>>>     AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>>>>         this.vertx = vertx;
>>>>         this.cache = cache;
>>>>     }
>>>>
>>>>     ...
>>>>
>>>>     Future<List<JsonArray>> query(final String sql, final long timeoutMs, final Object... args) {
>>>>         final Promise<List<JsonArray>> promise = Promise.promise();
>>>>
>>>>         // THIS FIRES IF THE BLOCK DOESN'T COMPLETE IN TIME.
>>>>         vertx.setTimer(timeoutMs, l -> {
>>>>             promise.tryFail(new TimeoutException(
>>>>                 "Cache operation did not complete within: " + timeoutMs + " Ms."));
>>>>         });
>>>>
>>>>         vertx.<List<JsonArray>>executeBlocking(code -> {
>>>>             SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>>>             query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>>>
>>>>             try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
>>>>                 List<JsonArray> rows = new ArrayList<>();
>>>>                 Iterator<List<?>> iterator = cursor.iterator();
>>>>
>>>>                 while (iterator.hasNext()) {
>>>>                     List<?> currentRow = iterator.next();
>>>>                     JsonArray row = new JsonArray();
>>>>
>>>>                     currentRow.forEach(o -> row.add(o));
>>>>
>>>>                     rows.add(row);
>>>>                 }
>>>>
>>>>                 code.complete(rows);
>>>>             } catch (Exception ex) {
>>>>                 code.fail(ex);
>>>>             }
>>>>         }, result -> {
>>>>             if (result.succeeded()) {
>>>>                 promise.tryComplete(result.result());
>>>>             } else {
>>>>                 promise.tryFail(result.cause());
>>>>             }
>>>>         });
>>>>
>>>>         return promise.future();
>>>>     }
>>>>
>>>>     public <T> T cache() {
>>>>         return (T) cache;
>>>>     }
>>>> }
>>>>
>>>>
>>>> On Thu, 13 Aug 2020 at 16:29, Denis Magda <dma...@apache.org> wrote:
>>>>
>>>>> I've created a simple test and I always get the exception below on
>>>>> an attempt to get a reference to an IgniteCache instance when the
>>>>> cluster is not activated:
>>>>>
>>>>> *Exception in thread "main" class org.apache.ignite.IgniteException:
>>>>> Can not perform the operation because the cluster is inactive. Note, that
>>>>> the cluster is considered inactive by default if Ignite Persistent Store
>>>>> is used to let all the nodes join the cluster. To activate the cluster
>>>>> call Ignite.active(true)*
>>>>>
>>>>> Are you trying to get a new IgniteCache reference whenever the client
>>>>> reconnects successfully to the cluster? My gut feeling is that
>>>>> currently Ignite verifies the activation status and generates the
>>>>> exception above whenever you're getting a reference to an IgniteCache
>>>>> or IgniteCompute. But once you have those references and try to run
>>>>> some operations, those get stuck if the cluster is not activated.
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Thu, Aug 13, 2020 at 6:37 AM John Smith <java.dev....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The cache.query() starts to block when the Ignite server nodes are
>>>>>> being restarted and there's no baseline topology yet. The server
>>>>>> nodes do not block. It's the client that blocks.
>>>>>>
>>>>>> The dump files are from the server nodes. The screenshot is from the
>>>>>> client app, taken with the YourKit profiler on the client side; the
>>>>>> blocked threads are marked red in YourKit.
>>>>>>
>>>>>> The app is simple: on an HTTP request, it runs a cache SQL query on
>>>>>> Ignite and, if it succeeds, does a put back to Ignite.
>>>>>>
>>>>>> The client disconnected exception only happens when all server nodes
>>>>>> in the cluster are down. The blockage only happens when the cluster is
>>>>>> trying to establish the baseline topology.
>>>>>>
>>>>>> On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda, <dma...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> John,
>>>>>>>
>>>>>>> I don't see any signs of an application-caused deadlock in the
>>>>>>> thread dumps. Please elaborate on the following:
>>>>>>>
>>>>>>>> 7- Restart 1st node, run operation, operation fails with
>>>>>>>> ClientDisconnectedException but application still able to complete
>>>>>>>> its request.
>>>>>>>
>>>>>>> What's the IP address of the server node the client app uses to join
>>>>>>> the cluster? If that's not the address of the 1st node, which is
>>>>>>> already restarted, then the client couldn't join the cluster and it's
>>>>>>> expected that it fails with the ClientDisconnectedException.
>>>>>>>
>>>>>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>>>>>> block.
>>>>>>>
>>>>>>> Are the operations unblocked and completed successfully when the
>>>>>>> third node joins the cluster and the cluster gets activated
>>>>>>> automatically?
>>>>>>>
>>>>>>> -
>>>>>>> Denis
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 12, 2020 at 11:08 AM John Smith <java.dev....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ok Denis, here they are...
>>>>>>>>
>>>>>>>> 3 nodes, and I captured a YourKit screenshot of what it thinks are
>>>>>>>> deadlocks on the client app.
>>>>>>>>
>>>>>>>> https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0
>>>>>>>>
>>>>>>>> On Wed, 12 Aug 2020 at 11:07, John Smith <java.dev....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Denis. I will ASAP, but I think you were right: it is the
>>>>>>>>> query that blocks.
>>>>>>>>>
>>>>>>>>> My application first runs a select on the cache and then does a
>>>>>>>>> put to the cache.
>>>>>>>>>
>>>>>>>>> On Tue, 11 Aug 2020 at 19:22, Denis Magda <dma...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> John,
>>>>>>>>>>
>>>>>>>>>> It sounds like a deadlock caused by the application logic. Is
>>>>>>>>>> there any chance that the operation you run on step 8 accesses
>>>>>>>>>> several keys in one order while the other operations work with
>>>>>>>>>> the same keys but in a different order? Deadlocks are possible
>>>>>>>>>> when you use the Ignite Transaction API or simply execute bulk
>>>>>>>>>> operations such as cache.getAll(..) or cache.putAll(..).
>>>>>>>>>>
>>>>>>>>>> Please take and attach thread dumps from all the cluster nodes
>>>>>>>>>> for analysis if we need to dig deeper.
>>>>>>>>>>
>>>>>>>>>> -
>>>>>>>>>> Denis
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 10, 2020 at 6:23 PM John Smith
>>>>>>>>>> <java.dev....@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Denis, I think you are right. It's the query that blocks;
>>>>>>>>>>> the other k/v operations are OK.
>>>>>>>>>>>
>>>>>>>>>>> Any thoughts on this?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 10 Aug 2020 at 15:28, John Smith <java.dev....@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>>>>>>>>>>
>>>>>>>>>>>> 1- Start 3 node cluster
>>>>>>>>>>>> 2- Start client application client = true with Ignition.start()
>>>>>>>>>>>> 3- Run some cache operations, everything ok...
>>>>>>>>>>>> 4- Shut down one node, run operation, still ok
>>>>>>>>>>>> 5- Shut down 2nd node, run operation, still ok
>>>>>>>>>>>> 6- Shut down 3rd node, run operation, still ok...
>>>>>>>>>>>> Operations start failing with ClientDisconnectedException...
>>>>>>>>>>>> 7- Restart 1st node, run operation, operation fails
>>>>>>>>>>>> with ClientDisconnectedException but application still able to
>>>>>>>>>>>> complete its request.
>>>>>>>>>>>> 8- Start 2nd node, run operation, from here on all operations
>>>>>>>>>>>> just block.
>>>>>>>>>>>>
>>>>>>>>>>>> Basically the client application is an HTTP server that on each
>>>>>>>>>>>> HTTP request does a cache operation.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 7 Aug 2020 at 19:46, John Smith <java.dev....@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The only time I get an exception is if the cluster is
>>>>>>>>>>>>> completely off; then I get ClientDisconnectedException...
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda <dma...@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> If I'm not mistaken, key-value operations (cache.get/put) and
>>>>>>>>>>>>>> compute calls fail with an exception if the cluster is
>>>>>>>>>>>>>> deactivated. Do those fail on your end?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for the async and SQL operations, let's see what other
>>>>>>>>>>>>>> community members say.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> Denis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith
>>>>>>>>>>>>>> <java.dev....@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, any thoughts on this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith
>>>>>>>>>>>>>>> <java.dev....@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is another example where it blocks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>>>>>>>>>>>     "select * from my_table")
>>>>>>>>>>>>>>>>     .setArgs(providerId, carrierCode);
>>>>>>>>>>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there a way to time out and at least have the application
>>>>>>>>>>>>>>>> continue and respond with an appropriate message?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith
>>>>>>>>>>>>>>>> <java.dev....@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi, running 2.7.0.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When I reboot a node and it begins to rejoin the cluster,
>>>>>>>>>>>>>>>>> or the cluster is not yet activated with a baseline
>>>>>>>>>>>>>>>>> topology, operations seem to block forever, even operations
>>>>>>>>>>>>>>>>> that are supposed to return an IgniteFuture, i.e. putAsync,
>>>>>>>>>>>>>>>>> getAsync, etc. They just block until the cluster resolves
>>>>>>>>>>>>>>>>> its state.
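
The blocking described in this thread (operations hanging until the cluster resolves its state) could be contained on the client with a fail-fast gate that the repository checks before touching the cache. What follows is only a sketch under assumptions, not something confirmed by the Ignite maintainers: `CacheGate`, `trip`, `reset` and `coolDownMs` are hypothetical names and are not part of Ignite or Vert.x. The idea is that the first request that times out trips the gate, and later requests are rejected immediately for a cool-down period instead of each occupying a worker thread.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not an Ignite class): rejects cache calls for a
 * cool-down period after a timeout or disconnect has been observed.
 */
final class CacheGate {
    private final long coolDownMs;
    // Timestamp (ms) before which calls are rejected; 0 means the gate is open.
    private final AtomicLong closedUntilMs = new AtomicLong(0L);

    CacheGate(long coolDownMs) {
        this.coolDownMs = coolDownMs;
    }

    /** Close the gate, e.g. from the timeout timer or a disconnect listener. */
    void trip(long nowMs) {
        closedUntilMs.set(nowMs + coolDownMs);
    }

    /** Re-open the gate, e.g. after a successful call or a reconnect event. */
    void reset() {
        closedUntilMs.set(0L);
    }

    boolean isOpen(long nowMs) {
        return nowMs >= closedUntilMs.get();
    }

    /** Runs op only while the gate is open; otherwise fails immediately. */
    <T> T call(long nowMs, Supplier<T> op) {
        if (!isOpen(nowMs)) {
            throw new IllegalStateException("Cache unavailable, skipping operation");
        }
        return op.get();
    }
}
```

In the repository class quoted earlier, the timer callback could call `gate.trip(System.currentTimeMillis())` and `query(...)` could check `gate.isOpen(...)` before calling `executeBlocking`. The gate could presumably also be driven by a listener registered through `ignite.events().localListen(...)` for `EventType.EVT_CLIENT_NODE_DISCONNECTED` and `EventType.EVT_CLIENT_NODE_RECONNECTED`, but per this thread the disconnect exception only appears when all server nodes are down, so for the inactive-cluster case the timeout is likely the more useful signal.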