Re: Backup filter in ignite [Multi AZ deployment]

Surinder Mehra Sat, 05 Nov 2022 19:57:43 -0700

Yeah, I think there is a misunderstanding. Although I have figured out my
answers from our discussion, I will make one final attempt to clarify my point
about node3 needing 2X space.


Node setup:
Node1 and node 2 placed in AZ1
Node 3 placed in AZ2

As I mentioned in my first message, I am using the AZ as the backup filter.
Backups of node1 cannot be placed on node2, and backups of node2 cannot be
placed on node1, because they are in the same AZ. This simply means their
backups would go to node3, which is in another AZ. Hence node3 space =
(node3 primary partitions + node1 backup partitions + node2 backup partitions).

Wouldn't this mean node3 needs 2X the space of node1 and node2? Assuming the
backup partitions of node3 would be equally distributed between the other two
nodes, node1 and node2 would need almost the same space as each other.
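To make the 2+1 layout concrete, here is a rough Java simulation under stated assumptions. It is not Ignite's RendezvousAffinityFunction: primaries are assigned round-robin (perfectly even), and each backup goes to the least-loaded node whose AZ holds no copy of that partition yet, which is our reading of how an AZ attribute backup filter behaves. The class and method names are hypothetical, and 1200 partitions is picked only so the numbers divide evenly (Ignite's rendezvous function defaults to 1024).

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical placement simulation, NOT Ignite's affinity function.
 * Rule mimicked: a backup may not share an AZ with the primary or another backup.
 */
public class AzPlacementSketch {

    /** Returns how many partition copies (primary + backups) each node holds. */
    static int[] copiesPerNode(String[] nodeAz, int partitions, int backups) {
        int n = nodeAz.length;
        int[] copies = new int[n];
        int[] backupLoad = new int[n];
        for (int p = 0; p < partitions; p++) {
            // Assumption: primaries are spread round-robin, i.e. perfectly evenly.
            int primary = p % n;
            copies[primary]++;
            Set<String> usedAz = new HashSet<>();
            usedAz.add(nodeAz[primary]);
            for (int b = 0; b < backups; b++) {
                int best = -1;
                for (int i = 0; i < n; i++) {
                    // A node in an AZ that already holds a copy is ineligible
                    // (this also excludes the primary itself).
                    if (usedAz.contains(nodeAz[i])) {
                        continue;
                    }
                    // Prefer the eligible node with the fewest backups so far.
                    if (best == -1 || backupLoad[i] < backupLoad[best]) {
                        best = i;
                    }
                }
                if (best == -1) {
                    break; // No unused AZ left: this backup is silently skipped.
                }
                usedAz.add(nodeAz[best]);
                backupLoad[best]++;
                copies[best]++;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        String[] nodeAz = {"AZ1", "AZ1", "AZ2"}; // node1, node2, node3
        int[] copies = copiesPerNode(nodeAz, 1200, 2);
        for (int i = 0; i < copies.length; i++) {
            System.out.println("node" + (i + 1) + " [" + nodeAz[i] + "]: " + copies[i] + " copies");
        }
    }
}
```

Under these assumptions node1 and node2 each end up with 600 partition copies and node3 with 1200, so node3 needs roughly twice their capacity. With backups set to 2, each partition still gets only one backup, because both AZs are used after the first. Real rendezvous hashing would only approximate these counts.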


On Tue, 1 Nov 2022, 23:30 Jeremy McMillan, <jeremy.mcmil...@gridgain.com>
wrote:

>
>
> On Tue, Nov 1, 2022 at 10:02 AM Surinder Mehra <redni...@gmail.com> wrote:
>
>> Even if we keep only 2 copies of the data, with the primary and backup
>> copies stored in different AZs, my question remains valid.
>>
>
> I think additional backup copies in the same AZ are superfluous if we
> start with the assumption that multiple concurrent failures are most likely
> to affect resources in the same AZ. A second node failure, if that's
> your failure budget, is likely to corrupt all the backup copies in the
> second AZ.
>
> If you only have two AZs available in some data centers/deployments, but
> you need 3-way redundancy on certain caches/tables, then using the AZ node
> attribute for backup filtering is too coarse-grained. Using AZ is a general
> best practice which gives your cluster the best chance of surviving
> multiple hardware failures in AWS, because AWS pools hardware resources in
> AZs. Maybe you just need three AZs? Maybe AZ isn't the correct failure
> domain for your use case?
>
>
>> Do we have to ensure that nodes in two AZs are always present, or does
>> Ignite have a way to indicate that it couldn't create backups? Silently
>> skipping backups is not a desirable state.
>>
>
> Do you use synchronous or asynchronous backups?
>
> https://ignite.apache.org/docs/2.11.1/configuring-caches/configuring-backups#synchronous-and-asynchronous-backups
>
> You can periodically poll the caches' configurations, or hook a cluster
> state event, and re-compare each cache's backup configuration against the
> enumerated available AZs. Then raise an exception, log a message, or
> whatever suits you, so you detect the issue as soon as the AZ count drops
> below the minimum. This approach might also serve as a fuzzy early-warning
> detection point for proactive infrastructure operations. If you count all
> of the nodes in each AZ, you can detect and track AZ load imbalance as the
> ratio between the smallest AZ node count and the average AZ node count.
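The check described above can be sketched as a pure Java helper that takes a node-to-AZ map, counts nodes per AZ, computes the smallest-to-average ratio, and fails when there are fewer distinct AZs than copies (backups + 1). The class and method names are hypothetical. In a live cluster the map might be built from `ClusterNode.attribute(...)` over `ignite.cluster().nodes()`, and the backup count read from `CacheConfiguration.getBackups()`; the attribute name depends on how your nodes are configured.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical AZ health check; the topology map would come from the live cluster. */
public class AzHealthCheck {

    /** Counts nodes per AZ from a nodeId -> AZ map. */
    static Map<String, Integer> nodesPerAz(Map<String, String> nodeToAz) {
        Map<String, Integer> counts = new HashMap<>();
        for (String az : nodeToAz.values()) {
            counts.merge(az, 1, Integer::sum);
        }
        return counts;
    }

    /** Smallest AZ node count divided by the average AZ node count (1.0 = balanced). */
    static double imbalanceRatio(Map<String, Integer> counts) {
        if (counts.isEmpty()) {
            return 0.0;
        }
        int min = Integer.MAX_VALUE;
        int total = 0;
        for (int c : counts.values()) {
            min = Math.min(min, c);
            total += c;
        }
        double average = (double) total / counts.size();
        return min / average;
    }

    /** Throws if the primary and every backup cannot each land in a distinct AZ. */
    static void requireEnoughAzs(Map<String, Integer> counts, int backups) {
        int copies = backups + 1;
        if (counts.size() < copies) {
            throw new IllegalStateException("Only " + counts.size() + " AZ(s) available, but "
                    + copies + " copies (1 primary + " + backups + " backups) need distinct AZs");
        }
    }

    public static void main(String[] args) {
        Map<String, String> topology = new HashMap<>();
        topology.put("node1", "AZ1");
        topology.put("node2", "AZ1");
        topology.put("node3", "AZ2");

        Map<String, Integer> counts = nodesPerAz(topology);
        System.out.println("nodes per AZ: " + counts);
        System.out.println("smallest/average AZ ratio: " + imbalanceRatio(counts));

        requireEnoughAzs(counts, 1); // 2 copies, 2 AZs: passes
        try {
            requireEnoughAzs(counts, 2); // 3 copies, 2 AZs: fails
        } catch (IllegalStateException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

Calling a check like `requireEnoughAzs` at startup or from a topology-change listener is one way to turn silently skipped backups into an explicit, visible failure.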
>
>
>> 2. In my original message, with 2 nodes (node1 and node2) in AZ1 and a
>> 3rd node in the second AZ, the backups of node1 and node2 would be placed
>> on node3 in AZ2. That would mean node3 needs 2X space to store the backups.
>> I'm just trying to make sure my understanding is correct.
>>
>
> If you have three nodes, you divide your total footprint by three to get
> the minimum node capacity.
>
> If you have 2 backups, that is one primary copy plus two more backup
> copies, so you multiply your total footprint by 3.
>
> If you multiply, say, 32GB by three for redundancy, that would be 96GB
> total space needed for the sum of all nodes' footprints.
>
> If you divide the 96GB storage commitment among three nodes, then each
> node must have a minimum of 32GB. That's what we started with as a nominal
> data footprint, so 1x, not 2x. Node 1 will need to accommodate backups from
> node 2 and node 3. Node 2 will need to accommodate backups from node 1 and
> node 3. For each cache with two backups, each node holds one copy of every
> partition: the primary for roughly a third of them and a backup for the rest.
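The sizing arithmetic above, as a tiny Java helper. Like the paragraph, it assumes all copies are spread evenly across nodes; the class and method names are hypothetical.

```java
/** Back-of-the-envelope cluster sizing, assuming copies are spread evenly over nodes. */
public class CapacitySketch {

    /** Total storage across all nodes: nominal footprint times (1 primary + backups). */
    static double totalGb(double footprintGb, int backups) {
        return footprintGb * (backups + 1);
    }

    /** Minimum per-node capacity if every node carries an equal share of all copies. */
    static double perNodeGb(double footprintGb, int backups, int nodes) {
        return totalGb(footprintGb, backups) / nodes;
    }

    public static void main(String[] args) {
        System.out.println("total = " + totalGb(32, 2) + " GB");         // total = 96.0 GB
        System.out.println("per node = " + perNodeGb(32, 2, 3) + " GB"); // per node = 32.0 GB
    }
}
```

The even-share assumption is exactly what a backup filter can break: if the filter forces the backups of several nodes onto a single node, that node's share grows accordingly.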
>
>
>> I hope my queries are clear to you now.
>>
>
> I still don't understand your operational goals, so I feel like we may be
> dancing around a misunderstanding.
>
>
>> On Tue, 1 Nov 2022, 20:19 Surinder Mehra, <redni...@gmail.com> wrote:
>>
>>> Thanks for your reply. Let me try to answer your 2 questions below.
>>> 1. I understand that it sacrifices the backups in case it can't place
>>> them appropriately. The question is: is it possible to fail the deployment
>>> rather than risk having only a single copy of the data in the cluster? If
>>> that only copy goes down, we will have downtime because the data won't be
>>> present in the cluster. We would rather throw an error if not enough
>>> hardware is present than risk data unavailability during business activity.
>>>
>>> 2. Why we want 3 copies of the data: it's a design choice. We want to
>>> ensure that even if 2 nodes go down, we still have a 3rd present to serve
>>> the data.
>>>
>>> I hope I answered your questions.
>>>
>>> On Tue, 1 Nov 2022, 19:40 Jeremy McMillan, <jeremy.mcmil...@gridgain.com>
>>> wrote:
>>>
>>>> This question is a design question.
>>>>
>>>> What kinds of fault states do you expect to tolerate? What is your
>>>> failure budget?
>>>>
>>>> Why are you trying to make more than 2 copies of the data distributed
>>>> across only two failure domains?
>>>>
>>>> Also "fail fast" means discover your implementation defects faster than
>>>> your release cycle, not how fast you can cause data loss.
>>>>
>>>> On Tue, Nov 1, 2022, 09:01 Surinder Mehra <redni...@gmail.com> wrote:
>>>>
>>>>> Gentle reminder.
>>>>> One additional question: we have observed that if there are fewer
>>>>> available AZs than the backup count, Ignite skips creating backups. Is
>>>>> this understanding correct? If so, how can we fail fast when backups
>>>>> cannot be placed due to the AZ limitation?
>>>>>
>>>>> On Mon, Oct 31, 2022 at 6:30 PM Surinder Mehra <redni...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> As per the attached link, to ensure primary and backup partitions are
>>>>>> not stored on the same node, we used the AWS AZ as the backup filter. Now
>>>>>> I can see that if I start two Ignite nodes on the same machine, primary
>>>>>> partitions are evenly distributed but backups are always zero, which is
>>>>>> expected.
>>>>>>
>>>>>>
>>>>>> https://www.gridgain.com/docs/latest/installation-guide/aws/multiple-availability-zone-aws
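For reference, a configuration fragment in the spirit of the linked page, using Ignite's `ClusterNodeAttributeAffinityBackupFilter` with `RendezvousAffinityFunction` (Ignite 2.8+, ignite-core on the classpath). The attribute name `AVAILABILITY_ZONE`, the cache name `myCache`, and the helper class are assumptions for illustration; each node would pass its own AZ value. As we read the filter's Javadoc, it rejects a candidate backup whose attribute value matches the primary's or any already-assigned backup's, which would explain backups being skipped once every AZ already holds a copy.

```java
import java.util.Collections;

import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

/** Hypothetical helper building a node config tagged with its AZ. */
public class AzBackupFilterConfig {

    static IgniteConfiguration nodeConfig(String availabilityZone) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Assumed attribute name; every node must set it for the filter to work.
        cfg.setUserAttributes(Collections.singletonMap("AVAILABILITY_ZONE", availabilityZone));

        RendezvousAffinityFunction affinity = new RendezvousAffinityFunction();
        // Reject backups whose AVAILABILITY_ZONE equals the primary's or another backup's.
        affinity.setAffinityBackupFilter(new ClusterNodeAttributeAffinityBackupFilter("AVAILABILITY_ZONE"));

        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setBackups(2);
        cacheCfg.setAffinity(affinity);

        cfg.setCacheConfiguration(cacheCfg);
        return cfg;
    }
}
```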
>>>>>>
>>>>>> My question is: what would happen if AZ-1 has 2 machines, AZ-2 has
>>>>>> 1 machine, and the Ignite cluster has only 3 nodes, one Ignite node per
>>>>>> machine?
>>>>>>
>>>>>> Node1 [AZ1] - keys 1-100
>>>>>> Node2 [AZ1] - keys 101-200
>>>>>> Node3 [AZ2] - keys 201-300
>>>>>>
>>>>>> In the above scenario, if the backup count is 2, how would the backup
>>>>>> partitions be distributed?
>>>>>>
>>>>>> 1. Would it mean node3 will have 2 backup copies of the primary
>>>>>> partitions of node1 and node2?
>>>>>> 2. If we have a 4-node cluster with 2 nodes in each AZ, would backup
>>>>>> copies also be placed on different nodes? In other words, does the backup
>>>>>> filter also apply to how backup copies are placed on nodes?
>>>>>>
>>>>>>
>>>>>>
