Re: Operation block on Cluster recovery/rebalance.

John Smith Wed, 12 Aug 2020 08:08:27 -0700

Hi Denis. I will send the thread dumps asap, but I think you were right: it is the query that
blocks.


My application first runs a select on the cache and then does a put
to the cache.

On Tue, 11 Aug 2020 at 19:22, Denis Magda <dma...@apache.org> wrote:

> John,
>
> It sounds like a deadlock caused by the application logic. Is there any
> chance that the operation you run in step 8 accesses several keys in one
> order while the other operations work with the same keys in a different
> order? Deadlocks are possible when you use the Ignite Transaction API or
> simply execute bulk operations such as cache.getAll(..) or
> cache.putAll(..).
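The key-ordering point above can be illustrated without a cluster. The following is a minimal plain-Java sketch, not Ignite code: hypothetical per-key ReentrantLocks (in a hypothetical class LockOrderingSketch) stand in for the entry locks a transactional cache acquires. Copying each batch into a TreeMap makes every caller lock keys in the same sorted order, so two batches listing the same keys in opposite orders cannot wait on each other in a cycle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderingSketch {
    // Hypothetical stand-in for the per-entry locks a transactional cache would take.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Locks every key of the batch, applies the batch to the store, then unlocks.
    // The TreeMap copy fixes the lock order (ascending key order) for all callers.
    // Returns the order in which locks were acquired, for inspection.
    List<String> updateAll(Map<String, Integer> batch, Map<String, Integer> store) {
        TreeMap<String, Integer> sorted = new TreeMap<>(batch);
        List<String> acquired = new ArrayList<>();
        try {
            for (String key : sorted.keySet()) {
                lockFor(key).lock();
                acquired.add(key);
            }
            store.putAll(sorted);
            return new ArrayList<>(acquired);
        } finally {
            // Release in reverse acquisition order.
            for (int i = acquired.size() - 1; i >= 0; i--) {
                lockFor(acquired.get(i)).unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderingSketch sketch = new LockOrderingSketch();
        Map<String, Integer> store = new ConcurrentHashMap<>();

        // Same keys, opposite iteration order (LinkedHashMap keeps insertion order).
        Map<String, Integer> a = new LinkedHashMap<>();
        a.put("k2", 1);
        a.put("k1", 1);
        Map<String, Integer> b = new LinkedHashMap<>();
        b.put("k1", 2);
        b.put("k2", 2);

        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) sketch.updateAll(a, store); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) sketch.updateAll(b, store); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(sketch.updateAll(a, store)); // prints [k1, k2]
    }
}
```

With Ignite itself, the commonly given advice for putAll/removeAll inside transactions follows the same idea: pass a sorted collection such as a TreeMap or TreeSet rather than a HashMap, so every transaction touches keys in one global order.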
>
> Please take thread dumps from all the cluster nodes and attach them for
> analysis if we need to dig deeper.
>
> -
> Denis
>
>
> On Mon, Aug 10, 2020 at 6:23 PM John Smith <java.dev....@gmail.com> wrote:
>
>> Hi Denis, I think you are right. It's the query that blocks; the other k/v
>> operations are OK.
>>
>> Any thoughts on this?
>>
>> On Mon, 10 Aug 2020 at 15:28, John Smith <java.dev....@gmail.com> wrote:
>>
>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>
>>> 1- Start a 3-node cluster.
>>> 2- Start the client application (client = true) with Ignition.start().
>>> 3- Run some cache operations, everything OK...
>>> 4- Shut down one node, run an operation, still OK.
>>> 5- Shut down the 2nd node, run an operation, still OK.
>>> 6- Shut down the 3rd node, run an operation, still OK... Operations start
>>> failing with IgniteClientDisconnectedException...
>>> 7- Restart the 1st node, run an operation: the operation fails
>>> with IgniteClientDisconnectedException, but the application is still able to complete its
>>> request.
>>> 8- Start the 2nd node, run an operation: from here on, all operations just block.
>>>
>>> Basically, the client application is an HTTP server that performs cache
>>> operations on each HTTP request.
>>>
>>> On Fri, 7 Aug 2020 at 19:46, John Smith <java.dev....@gmail.com> wrote:
>>>
>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>
>>>> The only time I get an exception is when the cluster is completely off; then I
>>>> get IgniteClientDisconnectedException...
>>>>
>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda <dma...@apache.org> wrote:
>>>>
>>>>> If I'm not mistaken, key-value operations (cache.get/put) and compute
>>>>> calls fail with an exception if the cluster is deactivated. Do those fail
>>>>> on your end?
>>>>>
>>>>> As for the async and SQL operations, let's see what other community
>>>>> members say.
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith <java.dev....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, any thoughts on this?
>>>>>>
>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith <java.dev....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is another example where it blocks.
>>>>>>>
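>>>>>>> // setArgs() binds values to "?" placeholders, so the real SQL
>>>>>>> // text is expected to contain one "?" per argument.
>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>>         "select * from my_table")
>>>>>>>         .setArgs(providerId, carrierCode);
>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>
>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query)) {
>>>>>>>     for (List<?> row : cursor) {
>>>>>>>         // handle each row
>>>>>>>     }
>>>>>>> }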
>>>>>>>
>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>
>>>>>>> Is there a way to time out and at least have the application continue
>>>>>>> and respond with an appropriate message?
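Because the query-level timeout did not take effect here, one application-level workaround is to bound the wait in the HTTP handler itself. The following is a minimal plain-Java sketch under that assumption: BoundedCall, callWithTimeout and the fallback message are hypothetical names, and the Callable stands in for the cache.query(...) call. It uses only java.util.concurrent, not any Ignite API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    // Runs a possibly blocking call on a worker thread and waits at most timeoutMs.
    // On timeout it returns the fallback so the caller (e.g. an HTTP handler) can
    // still respond. The worker is interrupted, but a call that ignores interrupts
    // may keep that worker thread busy.
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> call,
                                 long timeoutMs, T fallback) throws Exception {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            return fallback;
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the call's own exception.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) throw (Exception) cause;
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Simulated query that does not return in time.
            String slow = callWithTimeout(pool, () -> { Thread.sleep(10_000); return "rows"; },
                    200, "503: cluster unavailable, try again later");
            // Simulated query that completes quickly.
            String fast = callWithTimeout(pool, () -> "rows", 200, "503");
            System.out.println(slow); // prints 503: cluster unavailable, try again later
            System.out.println(fast); // prints rows
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The trade-off: this frees the request thread and lets the server answer with an error, but Future.cancel(true) only interrupts the worker. If the underlying call ignores interrupts, that worker stays blocked, so a bounded, monitored pool is preferable to the cached pool used in the demo.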
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith <java.dev....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi, running 2.7.0.
>>>>>>>>
>>>>>>>> When I reboot a node and it begins to rejoin the cluster, or when the
>>>>>>>> cluster is not yet activated with a baseline topology, operations seem to
>>>>>>>> block forever, including operations that are supposed to return an
>>>>>>>> IgniteFuture, e.g. putAsync, getAsync, etc. They just block until the
>>>>>>>> cluster resolves its state.
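The report above notes that calls meant to return an IgniteFuture block before returning it, so a timed wait on the returned future (IgniteFuture does offer get(timeout, unit)) never gets a chance to apply. A minimal plain-Java sketch of guarding both phases, assuming Java 9+ for CompletableFuture.orTimeout: AsyncCallGuard and blockingPutAsync are hypothetical stand-ins, not Ignite API, and adapting a real IgniteFuture into a CompletableFuture is not shown.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class AsyncCallGuard {
    // Runs the "async" call on a worker, flattens the future it returns, and applies
    // one deadline to both phases: the (possibly blocking) invocation and completion.
    static <T> CompletableFuture<T> guarded(Supplier<CompletableFuture<T>> asyncCall,
                                            ExecutorService pool, long timeoutMs) {
        return CompletableFuture.supplyAsync(asyncCall, pool)
                .thenCompose(f -> f)
                .orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Hypothetical stand-in for an async method that blocks before returning its future.
    static CompletableFuture<String> blockingPutAsync() {
        try {
            Thread.sleep(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return CompletableFuture.completedFuture("done");
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            String result = guarded(AsyncCallGuard::blockingPutAsync, pool, 200)
                    .exceptionally(e -> {
                        // Unwrap in case the failure arrives wrapped in a CompletionException.
                        Throwable cause = (e instanceof CompletionException && e.getCause() != null)
                                ? e.getCause() : e;
                        return cause instanceof TimeoutException ? "timed out" : "failed: " + cause;
                    })
                    .join();
            System.out.println(result); // prints timed out
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Submitting the invocation itself to a worker is the design choice that matters here: a deadline attached only to the returned future cannot fire while the calling thread is still stuck inside the call.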
>>>>>>>>
