John,

It sounds like a deadlock caused by the application logic. Is there any
chance that the operation you run in step 8 accesses several keys in one
order while the other operations work with the same keys in a different
order? Such deadlocks are possible when you use the Ignite Transaction API
or simply execute bulk operations such as cache.getAll() or
cache.putAll(..).
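
To illustrate the key-ordering point: as far as I recall, a bulk write
acquires keys in the iteration order of the map it is given, so two callers
passing the same keys in different orders can end up waiting on each other.
A minimal sketch of one way to avoid that, assuming your keys are
Comparable, is to copy each batch into a TreeMap first. The class and
method names below are made up for illustration, and the snippet uses only
the JDK so it runs without a cluster:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Ignite.
final class OrderedBatch {
    /** Copies a batch into a TreeMap so that keys iterate in natural order. */
    static <K extends Comparable<K>, V> SortedMap<K, V> sorted(Map<K, V> batch) {
        return new TreeMap<>(batch);
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so two callers may differ.
        Map<String, Integer> batch = new HashMap<>();
        batch.put("key-b", 2);
        batch.put("key-a", 1);
        batch.put("key-c", 3);

        SortedMap<String, Integer> ordered = sorted(batch);
        System.out.println(ordered.keySet()); // prints [key-a, key-b, key-c]

        // Assumption: with Ignite on the classpath, the sorted map would be
        // handed to the bulk call, e.g. cache.putAll(ordered).
    }
}
```

If every code path that touches those keys sorts them the same way, the
lock-order inversion cannot happen, whatever else turns out to be wrong.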

Please take and attach thread dumps from all the cluster nodes for analysis
if we need to dig deeper.
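
The usual way to capture a dump is jstack <pid> on each node. If that is
inconvenient for the client application, the JDK can also produce one from
inside the process. This is only a rough sketch with a made-up class name.
Note that ThreadInfo.toString() truncates long stack traces, and that
findDeadlockedThreads() only sees JVM-level lock cycles, so it would not
report a deadlock between cache keys held on different nodes:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Ignite.
final class ThreadDumpSketch {
    /** Returns a textual dump of all live threads in this JVM. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request lock details only if this JVM supports them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append(info); // thread name, state, held locks, top frames
        }

        // Returns null when no JVM-level deadlock is found.
        long[] deadlocked = mx.findDeadlockedThreads();
        sb.append("JVM-level deadlocked threads: ")
          .append(deadlocked == null ? 0 : deadlocked.length)
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```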

-
Denis


On Mon, Aug 10, 2020 at 6:23 PM John Smith <java.dev....@gmail.com> wrote:

> Hi Denis, I think you are right. It's the query that blocks; the other
> k/v operations are OK.
>
> Any thoughts on this?
>
> On Mon, 10 Aug 2020 at 15:28, John Smith <java.dev....@gmail.com> wrote:
>
>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>
>> 1- Start 3 node cluster
>> 2- Start client application (client = true) with Ignition.start()
>> 3- Run some cache operations, everything ok...
>> 4- Shut down one node, run operation, still ok
>> 5- Shut down 2nd node, run operation, still ok
>> 6- Shut down 3rd node, run operation, still ok... Operations start
>> failing with ClientDisconnectedException...
>> 7- Restart 1st node, run operation, operation fails
>> with ClientDisconnectedException but application still able to complete
>> its request.
>> 8- Start 2nd node, run operation, from here on all operations just block.
>>
>> Basically, the client application is an HTTP server that performs a
>> cache operation on each HTTP request.
>>
>>
>>
>>
>>
>>
>> On Fri, 7 Aug 2020 at 19:46, John Smith <java.dev....@gmail.com> wrote:
>>
>>> No, everything blocks... Also using 2.7.0 just in case.
>>>
>>> The only time I get an exception is if the cluster is completely off;
>>> then I get ClientDisconnectedException...
>>>
>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda <dma...@apache.org> wrote:
>>>
>>>> If I'm not mistaken, key-value operations (cache.get/put) and compute
>>>> calls fail with an exception if the cluster is deactivated. Do those fail
>>>> on your end?
>>>>
>>>> As for the async and SQL operations, let's see what other community
>>>> members say.
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith <java.dev....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, any thoughts on this?
>>>>>
>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith <java.dev....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Here is another example where it blocks.
>>>>>>
>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>         "select * from my_table")
>>>>>>         .setArgs(providerId, carrierCode);
>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>
>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>
>>>>>> cache.query just blocks even with the timeout set.
>>>>>>
>>>>>> Is there a way to timeout and at least have the application continue
>>>>>> and respond with an appropriate message?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith <java.dev....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi running 2.7.0
>>>>>>>
>>>>>>> When I reboot a node and it begins to rejoin the cluster, or the
>>>>>>> cluster is not yet activated with baseline topology, operations seem
>>>>>>> to block forever, even operations that are supposed to return
>>>>>>> IgniteFuture, e.g. putAsync, getAsync, etc. They just block until the
>>>>>>> cluster resolves its state.
>>>>>>>
>>>>>>>
>>>>>>>
