Re: [ovirt-users] Workflow after restoring engine from backup

Yedidyah Bar David Sat, 24 Mar 2018 22:46:36 -0700

On Fri, Mar 23, 2018 at 10:35 AM, Sven Achtelik <sven.achte...@eps.aero> wrote:
> It looks like I won't get a chance to shut down the HA VMs. I checked the
> restore log and it did mention that it changed the HA VM entries. Just to make
> sure, I looked at the DB, and for the VMs in question it looks like this:
>
> engine=# select vm_guid,status,vm_host,exit_status,exit_reason from vm_dynamic
>          Where vm_guid IN (SELECT vm_guid FROM vm_static
>                            WHERE auto_startup='t' AND lease_sd_id is NULL);
>                 vm_guid                 | status | vm_host  | exit_status | exit_reason
> ---------------------------------------+--------+----------+-------------+-------------
>  8733d4a6-0844-xxxx-804f-6b919e93e076  |      0 | DXXXX    |           2 |          -1
>  4eeaa622-17f9-xxxx-b99a-cddb3ad942de  |      0 | xxxxAPP  |           2 |          -1
>  fbbdc0a0-23a4-4d32-xxxx-a35c59eb790d  |      0 | xxxxDB0  |           2 |          -1
>  45a4e7ce-19a9-4db9-xxxxx-66bd1b9d83af |      0 | xxxxxWOR |           2 |          -1
> (4 rows)
>
> Should that be enough for a safe start of the engine, without any HA
> action kicking in?


Looks ok, but check also run_on_vds and migrating_to_vds. See also bz 1446055.
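
A minimal sketch of that extra check, not an official procedure. It reuses
the SELECT quoted above and adds the two columns, assuming both live in
vm_dynamic and that a safe state means status 0 with both columns NULL (an
assumption, the thread does not spell it out). The SQLite tables, the sample
rows, and the helper names unsafe_rows and demo_db are made up so the example
runs standalone; against the real engine DB only the QUERY text would be
pasted into psql.

```python
import sqlite3

# Same filter as the query quoted above, plus run_on_vds and migrating_to_vds.
QUERY = """
SELECT vm_guid, status, exit_status, exit_reason, run_on_vds, migrating_to_vds
FROM vm_dynamic
WHERE vm_guid IN (SELECT vm_guid FROM vm_static
                  WHERE auto_startup = 't' AND lease_sd_id IS NULL)
"""


def unsafe_rows(conn):
    """Return HA VM rows that still point at a host or are not status 0.

    Hypothetical helper: the 'safe' criteria here are an assumption.
    """
    rows = conn.execute(QUERY).fetchall()
    return [r for r in rows
            if r[1] != 0 or r[4] is not None or r[5] is not None]


def demo_db():
    """Build an in-memory stand-in for the two engine tables (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm_static "
                 "(vm_guid TEXT, auto_startup TEXT, lease_sd_id TEXT)")
    conn.execute("CREATE TABLE vm_dynamic "
                 "(vm_guid TEXT, status INTEGER, exit_status INTEGER, "
                 "exit_reason INTEGER, run_on_vds TEXT, migrating_to_vds TEXT)")
    # auto_startup is stored as the text 't'/'f' here so the quoted
    # comparison auto_startup = 't' also works in SQLite.
    conn.executemany("INSERT INTO vm_static VALUES (?, ?, ?)", [
        ("vm-a", "t", None),   # HA, no lease
        ("vm-b", "t", None),   # HA, no lease
        ("vm-c", "f", None),   # not HA, ignored by the query
    ])
    conn.executemany("INSERT INTO vm_dynamic VALUES (?, ?, ?, ?, ?, ?)", [
        ("vm-a", 0, 2, -1, None, None),       # fully reset
        ("vm-b", 0, 2, -1, "host-1", None),   # run_on_vds still set
        ("vm-c", 1, 0, -1, "host-2", None),
    ])
    return conn


if __name__ == "__main__":
    for row in unsafe_rows(demo_db()):
        print(row)
```

An empty result from unsafe_rows would correspond to the state described as
ok above; any row it returns is one to look at before starting the engine.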

Best regards,

>
>
> -----Original Message-----
> From: Yedidyah Bar David [mailto:d...@redhat.com]
> Sent: Monday, 19 March 2018 10:18
> To: Sven Achtelik
> Cc: users@ovirt.org
> Subject: Re: [ovirt-users] Workflow after restoring engine from backup
>
> On Mon, Mar 19, 2018 at 11:03 AM, Sven Achtelik <sven.achte...@eps.aero> 
> wrote:
>> Hi Didi,
>>
>> my backups were taken with the engine-backup utility. I have 3 data
>> centers, two of them with just one host and the third one with 3 hosts
>> running the engine. The backup is three days old, was taken on engine
>> version 4.1 (4.1.7), and the restored engine is running 4.1.9.
>
> Since the bug I mentioned was fixed in 4.1.3, you should be covered.
>
>> I have three HA VMs that would
>> be affected. All others are just normal VMs. Sounds like it would be
>> safest to shut down the HA VMs to make sure that nothing happens?
>
> If you can have downtime, I agree it sounds safer to shut down the VMs.
>
>> Or can I
>> disable the HA action in the DB for now ?
>
> No need to. If you restored with 4.1.9 engine-backup, it should have done 
> this for you. If you still have the restore log, you can verify this by 
> checking it. It should contain 'Resetting HA VM status', and then the result 
> of the sql that it ran.
>
> Best regards,
>
>>
>> Thank you,
>>
>> Sven
>>
>>
>>
>> Sent from my Samsung Galaxy smartphone.
>>
>>
>> -------- Original Message --------
>> From: Yedidyah Bar David <d...@redhat.com>
>> Date: 19.03.18 07:33 (GMT+01:00)
>> To: Sven Achtelik <sven.achte...@eps.aero>
>> Cc: users@ovirt.org
>> Subject: Re: [ovirt-users] Workflow after restoring engine from backup
>>
>> On Sun, Mar 18, 2018 at 11:45 PM, Sven Achtelik
>> <sven.achte...@eps.aero>
>> wrote:
>>> Hi All,
>>>
>>>
>>>
>>> I had an issue with the storage that hosted my engine VM. The disk got
>>> corrupted and I needed to restore the engine from a backup.
>>
>> How did you backup, and how did you restore?
>>
>> Which version was used for each?
>>
>>> That worked as
>>> expected, I just didn’t start the engine yet.
>>
>> OK.
>>
>>> I know that after the backup
>>> was taken some machines were migrated around before the engine disks
>>> failed.
>>
>> Are these machines HA?
>>
>>> My question is what will happen once I start the engine service which
>>> has the restored backup on it ? Will it query the hosts for the
>>> running VMs
>>
>> It will, but HA machines are handled differently.
>>
>> See also:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1441322
>> https://bugzilla.redhat.com/show_bug.cgi?id=1446055
>>
>>> or will it assume that the VMs are still on the hosts as they resided
>>> at the point of backup ?
>>
>> It does, initially, but then updates status according to what it gets
>> from hosts.
>>
>> But polling the hosts takes time, especially if you have many, and HA
>> policy might require faster handling. So if it polls first a host that
>> had a machine on it during backup, and sees that it's gone, and didn't
>> yet poll the new host, HA handling starts immediately, which
>> eventually might lead to starting the VM on another host.
>>
>> To prevent that, the fixes to above bugs make the restore process mark
>> HA VMs that do not have leases on the storage as "dead".
>>
>>> Would I need to change the DB manually to let the engine know where VMs
>>> are up at this point?
>>
>> You might need to, if you have HA VMs and a too-old version of restore.
>>
>>> What will happen to HA VMs?
>>> I feel that it might try to start them a second time. My biggest
>>> issue is that I can't get a service window to shut down all VMs and
>>> then let the engine restart them.
>>>
>>>
>>>
>>> Is there a known workflow for that ?
>>
>> I am not aware of a tested procedure for handling above if you have a
>> too-old version, but you can check the patches linked from above bugs
>> and manually run the SQL command(s) they include. They are essentially
>> comment 4 of the first bug.
>>
>> Good luck and best regards,
>> --
>> Didi
>
>
>
> --
> Didi



-- 
Didi
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
