Re: Graceful shutdown and request draining of Ignite servers

Ilya Kasnacheev Thu, 18 Feb 2021 05:05:36 -0800

Hello!

This sounds like too detailed and specific a scenario for Ignite itself; it
should be handled at the application level, as you already do.
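
For illustration, a minimal sketch of such application-level draining might
look like the following. It is plain Java with no Ignite calls; DrainGate,
tryBegin, end and drain are hypothetical names, and how the availability
state is published to requestors (so they can exclude the node from their
projections) is deliberately left out.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical application-level gate; not part of the Ignite API. */
final class DrainGate {
    private final Object lock = new Object();
    private boolean accepting = true; // false once draining has started
    private int inFlight = 0;         // requests this node is still involved in

    /** Call when a request arrives; returns false once draining has begun. */
    boolean tryBegin() {
        synchronized (lock) {
            if (!accepting) {
                return false;
            }
            inFlight++;
            return true;
        }
    }

    /** Call when the asynchronous response has been sent or abandoned. */
    void end() {
        synchronized (lock) {
            if (inFlight > 0) {
                inFlight--;
            }
            if (inFlight == 0) {
                lock.notifyAll(); // wake a thread waiting in drain()
            }
        }
    }

    /**
     * Stops accepting new requests, then waits up to the timeout for the
     * in-flight requests to finish. Returns true if none remain.
     */
    boolean drain(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            accepting = false;
            while (inFlight > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // timed out with work still in flight
                }
                try {
                    TimeUnit.NANOSECONDS.timedWait(lock, remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

A node scheduled for removal would call drain(...) first and stop its Ignite
instance only once that returns true, or once the caller decides the timeout
is acceptable.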


Regards,
-- 
Ilya Kasnacheev


On Wed, 17 Feb 2021 at 23:50, Raymond Wilson <[email protected]> wrote:

> Hi Ilya,
>
> Sorry, that was a response to another problem!
>
> In this case, we have a more asynchronous mode of query-response where the
> processing node can asynchronously send back a response to a query. The
> reasons for this are: (1) Some responses are effectively streams of data
> and we can't structure them as a single response, and (2) we can have
> thousands of concurrent requests per node, which causes thread pool
> exhaustion and response starvation due to the synchronous nature of the
> IComputeFunc.Invoke() method.
>
> For example, we may have a request sequence like this, where A, B and C are
> nodes in the grid:
>
> Request: A -> B -> C
> Response: C -> B -> A
>
> If node B goes away unexpectedly, requests executing on 'C' can't send
> their response and the request fails.
>
> From the perspective of A, it may attempt a retry after failing to receive
> the response from B, but that's unsatisfactory for other reasons.
>
> I have built a POC that permits nodes to emit an application-level
> availability state, which requestors can use to exclude certain nodes from
> their request topology projections. This means a node being removed due to
> auto-scale down or container scheduling can gracefully exit the grid after
> ensuring the active requests it is involved in can complete normally. In
> the case above, node B would be a client node providing services through a
> web API gateway (A) and requesting results from co-located processing on
> node C.
>
> Thanks,
> Raymond.
>
>
> On Thu, Feb 18, 2021 at 9:15 AM Raymond Wilson <[email protected]>
> wrote:
>
>> Hi Ilya,
>>
>> That is the current method we use to stop the grid.
>>
>> However, this can leave uncheckpointed changes in the in-memory stores
>> (present only in the WAL), so when we restart the grid it goes into cache
>> recovery mode, which is very slow.
>>
>> Raymond.
>>
>> On Thu, Feb 18, 2021 at 3:34 AM Ilya Kasnacheev <
>> [email protected]> wrote:
>>
>>> Hello!
>>>
>>> Why can't you just use Ignition.stop(instanceName, false)?
>>>
>>> Just make sure your projections contain more than one node, and the tasks
>>> will fail over to the remaining nodes.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> On Tue, 9 Feb 2021 at 06:41, Raymond Wilson <[email protected]> wrote:
>>>
>>>> All,
>>>>
>>>> We have a requirement very similar to the one described in this issue:
>>>> https://issues.apache.org/jira/browse/IGNITE-10872
>>>>
>>>> Namely, when removing a node from an Ignite grid, we want to do two
>>>> things:
>>>>
>>>> 1. Prevent new requests from reaching it
>>>> 2. Allow all running requests the node is involved in to complete
>>>> before it terminates.
>>>>
>>>> The solution outlined in IGNITE-10872 partially meets these requirements
>>>> within our architecture, in that it allows Ignite to pause shutdown of the
>>>> node until all requests are completed (and, I assume, prevents new requests
>>>> from reaching the node being shut down).
>>>>
>>>> In our architecture the phrase 'requests the node is involved in' may
>>>> be opaque to Ignite because of an asynchronous calling model we use to
>>>> permit very large numbers of concurrent requests to execute without
>>>> saturating the Ignite thread pools. This means that a node that is a
>>>> candidate for shutdown may be waiting for a response from another node
>>>> on the grid in a way that Ignite can't see, so Ignite would determine
>>>> that the node is safe to shut down when it is not.
>>>>
>>>> A good example of this in our system is an Apply-style Ignite call
>>>> where the request is sent to one of a set of nodes. That set of nodes may
>>>> scale in or out with request demand. On a scale-in operation, the node to
>>>> be removed needs to be excluded from the topology projection constructed
>>>> to perform the Apply() against. Once we are satisfied the node is
>>>> involved in no further requests (e.g. after a simple timeout), we would
>>>> proceed with the actual shutdown of that node.
>>>>
>>>> I have not seen any capability in Ignite today where a node can be
>>>> 'un-blessed'; does one exist? Or should we construct this facility within
>>>> our application logic layer?
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>>
>>>> --
>>>> <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>> [email protected]
>>>>
>>>>
>>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>>
>>>
>>
>>
>
>
>
