Hi,

Thank you for your reply. I had already done all of that, but I didn't understand everything, and I ran into several problems while doing the fail-over and then the fail-back. I am writing this mail in the hope of clarifying those things. I will try to express myself correctly and give as much detail as I can.

*The LAB:*
My lab contains two single-host oVirt-HCI platforms, one acting as the *primary site (the source)* and the second as the *disaster-recovery site (the target)*. Each HCI site contains one data domain; the domain consists of a Gluster volume backed by one brick. The volumes (source and target) have the same size and were created during the HCI deployment. *At the end of the deployment, I detached then deleted the Gluster data domain on the target site, but I didn't delete the target volume.*

My goal is to test the disaster recovery process (active-passive DR, to be precise) on an HCI implementation, and to test the fail-over and fail-back processes entirely.

*Documentation*
I used RHHI 1.7, Maintaining_Red_Hat_Hyperconverged_Infrastructure_for_Virtualization-en-US, then started my implementation and prepared all the Ansible playbooks.

*The Test procedure:*

*Fail-over*
1 - Create a Windows 10 VM on the source volume.
2 - Replicate to the DR site.
3 - Execute the fail-over procedure and test whether the VM is usable on the target platform.
4 - Detach and delete the data domain on the target platform without touching the target volume.
5 - Make changes to the Win10 VM on the source volume (creating files and installing software).
6 - Replicate again to the DR site, then execute another fail-over and see whether the modifications were synced.

*Fail-back*
1 - Make changes to the Win10 VM on the target volume (deleting files) *and especially create a snapshot*.
2 - Detach and delete the data domain on the source platform without touching the source volume.
3 - Replicate to the source site.
4 - Execute the clean-up playbook.
5 - Execute the fail-back and check that the VM is usable on the source platform and that the modifications were synced, especially the snapshot.

*Things I need to confirm:*
1 - When creating the geo-replication from the primary site to the target site, we reach the step "*Scheduling regular backups using geo-replication*". From my understanding, it works like a cron job that starts the geo-replication at a specific time of day. From my testing, the geo-replication starts syncing at that precise time, and when its "*CRAWL STATUS*" reaches "Changelog Crawl" it stops the synchronization. In other words, it stops once the geo-replication has caught up to the checkpoint (the scheduled time). The smallest interval the configuration window offers is 24 hours, which means that in the event of a disaster you can at best recover the data from the day before. *Is this correct?*

*Problems encountered during the test:*

*Fail-over*
1 - When executing the fail-over the first time (ansible-playbook dr-rhv-failover.yml --tags "fail_over"), the import of the target data domain failed with the error:

*An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.*

I tried to import the domain manually from oVirt's admin console and got the same error, so I did the following:
- I deleted the target volume, the brick and the sub-directory of the brick.
- I recreated the volume from scratch.
- I redid the geo-replication synchronization from the source.
- I executed the fail-over, and this time the target data domain was imported correctly and the Win10 VM started correctly.
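
Regarding the "storage path is not empty" error in item 1, a check like the following could show what an import would find on the volume before retrying. This is only a minimal sketch under two assumptions: the Gluster volume is mounted on a local path, and a file-based storage domain appears as a top-level directory containing dom_md/metadata. The function names are made up for illustration, and the demo runs against a temporary directory rather than a real volume.

```python
import tempfile
from pathlib import Path


def find_storage_domains(mount_point):
    """List top-level directories that look like file-based storage domains.

    Assumption: a storage domain is a directory that contains dom_md/metadata.
    """
    root = Path(mount_point)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "dom_md" / "metadata").is_file()
    )


def is_empty(mount_point):
    """Return True when the path has no entries at all."""
    return not any(Path(mount_point).iterdir())


if __name__ == "__main__":
    # A throwaway directory stands in for the mounted volume (hypothetical layout).
    with tempfile.TemporaryDirectory() as tmp:
        print("empty before:", is_empty(tmp))
        dom_md = Path(tmp) / "example-domain-uuid" / "dom_md"
        dom_md.mkdir(parents=True)
        (dom_md / "metadata").write_text("placeholder")
        print("empty after:", is_empty(tmp))
        print("domains found:", find_storage_domains(tmp))
```

On a real setup the temporary directory would be replaced by the path where the volume is mounted. The sketch only reports what is present; it does not say why the engine rejects the path.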
2 - I detached then deleted the target data domain without touching the target volume, then I made changes to the Win10 VM on the source site, created a new geo-replication schedule, and after the replication I executed another fail-over.
- The Win10 VM started successfully and the changes were synced.

*Fail-back*
1 - The documentation doesn't explain the fail-back procedure thoroughly. It doesn't explain what dr-cleanup.yml does.
2 - When launching the fail-back playbook, at some point I get this message:

*TASK [oVirt.disaster-recovery : Failback Replication Sync pause] ********
[oVirt.disaster-recovery : Failback Replication Sync pause]
[Failback Replication Sync] Please press ENTER once the destination storage domains are ready to be used for the destination setup:*

What does this mean?
3 - I made some changes to the Win10 VM and created a snapshot of that VM.
4.a - To replicate the data from the target site to the primary site, I created a new geo-replication from the target volume to the source volume. I got a warning that the source volume was not empty, so I forced the geo-replication creation, then:
- I detached and deleted the source data domain without touching the source volume.
- I started the geo-replication manually (without a schedule), and when it reached the "Changelog Crawl" state I stopped it.
- I executed the clean-up playbook, then the fail-back playbook.
- The import of the source data domain failed with the error:

*An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain).
Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.*

4.b - So I redid the test, but:
- I deleted the source volume and its brick, then created them again.
- I started the geo-replication manually (without a schedule), and when it reached the "Changelog Crawl" state I stopped it.
- I executed the clean-up playbook, then the fail-back playbook.
- The import of the source data domain failed with the error:

*An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.*

4.c - I redid the test, but:
- I deleted the source volume and its brick, then created them again.
- I started the geo-replication using a schedule this time.
- I executed the clean-up playbook, then the fail-back playbook.
- *This time the source data domain was imported correctly, the Win10 VM started and the modifications were synced.*
- The snapshot was imported, but there was another snapshot with it called "Win10-TMPDR".

Regards.

On Thu, 2 Apr 2020 at 08:42, Eyal Shenitzky <eshen...@redhat.com> wrote:

> If your intention is to use an active-passive disaster recovery solution, you
> can have a look at the following guide:
>
> https://ovirt.org/documentation/disaster-recovery-guide/active_passive_overview.html
>
> On Wed, 1 Apr 2020 at 16:42, wodel youchi <wodel.you...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to configure and test disaster recovery on oVirt HCI
>> and to understand how it works.
>> What is the minimum RPO and its relationship with the checkpoint?
>> And what are the steps to fail back?
>>
>> Regards
>>
>> On Wed, 1 Apr 2020 at 14:16, Eyal Shenitzky <eshen...@redhat.com> wrote:
>>
>>> Hi Wodel,
>>>
>>> Can you please explain what you are trying to do?
>>> I am not sure I understand it from your question.
>>>
>>> On Wed, 1 Apr 2020 at 12:55, wodel youchi <wodel.you...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I redid the test and it seems that the minimum RPO is one day. If
>>>> someone could confirm that, it would be great.
>>>>
>>>> As for the snapshot, this time it was synced.
>>>>
>>>> Then I tried to test the fail-back and I found that the documentation
>>>> is not clear:
>>>> - it is not clear what the purpose of the dr-clear playbook is
>>>> - it is not clear what "put the target volume in read-write
>>>> mode and the source volume in read-only mode" means
>>>> - Do we have to sync back using a new geo-replication link from the DR
>>>> volume to the source volume?
>>>> I tried to do so. In my first trial I forced the creation of the reverse
>>>> geo-replication without deleting the content of the source volume, then I
>>>> started the replication manually (I didn't use the checkpoint) and
>>>> stopped it once it reached the changelog state, but I couldn't
>>>> import the source volume. I got the error: volume is not empty.
>>>>
>>>> In my second trial I deleted and recreated the source volume from
>>>> scratch and then I started the replication back manually. At the end I got
>>>> the same error.
>>>>
>>>> In my third trial I deleted the source volume and recreated it from
>>>> scratch, but I replicated back using the checkpoint method, and this time
>>>> the fail-back worked.
>>>>
>>>> Could someone shed some light on this?
>>>>
>>>> Thank you
>>>> Regards.
>>>>
>>>> On Sun, 29 Mar 2020 at 19:19, wodel youchi <wodel.you...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I need to understand some things about DR on oVirt-HCI:
>>>>>
>>>>> - What does "Scheduling regular backups using geo-replication"
>>>>>   (point 3.3.4 of the RHHI 1.7 doc, Maintaining RHHI) mean?
>>>>>   - Does this mean creating a checkpoint?
>>>>>   - If yes, does this mean that the geo-replication process will
>>>>>     sync data up to that checkpoint and then stop the
>>>>>     synchronization, then repeat the same cycle the day after?
>>>>>     Does this mean that the minimum RPO is one day?
>>>>> - I created a snapshot of a VM on the source Manager, I synced the
>>>>>   volume, then I executed a DR. The VM was started on the target
>>>>>   Manager, but the VM didn't have its snapshot. Any idea?
>>>>>
>>>>> Regards, be safe.
>>>>>
>>>> _______________________________________________
>>>> Users mailing list -- users@ovirt.org
>>>> To unsubscribe send an email to users-le...@ovirt.org
>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>> oVirt Code of Conduct:
>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>> List Archives:
>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/N2MSZUYT2GE33IVUKGVYHLAO33ZFMJ7N/
>>>>
>>>
>>> --
>>> Regards,
>>> Eyal Shenitzky
>>>
>>
>
> --
> Regards,
> Eyal Shenitzky
>
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/4BDPPF2KWU5PDQTHNDTC6JBWD57UMFAE/