On Wed, Feb 3, 2021 at 4:52 PM Roderick Mooi <roder...@sanren.ac.za> wrote:
>
> Thanks,
>
> > I didn't check, but am pretty certain that it's not related to the
> > engine db. Do you see such duplicates there as well (using the web ui
> > or sql against it)? If so, fix these first. If no other means, put the
> > host to maintenance and reinstall with the correct name.
>
> Not seeing duplicates in the web UI, only in the --vm-status. Can you please
> assist me with the sql commands or reference to the database schema + where
> to check? I'd like to check that first before doing anything too drastic.

/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c 'select * from vds'

>
> Note: it only duplicated the hostname after I changed the host_id, before
> that it had the correct hostname but duplicate host_id.
>
> PS I have a recent backup of the database (just before which I could restore
> if you think that'll do the trick without breaking anything?
>
>
> On 2021/02/03 16:33, Yedidyah Bar David wrote:
> > On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi <roder...@sanren.ac.za> wrote:
> >>
> >> Hi,
> >>
> >>> Any idea how this happened?
> >>
> >> Somehow related to the power being "pulled" at the wrong time?
> >>
> >>> Perhaps this is a backup done by emacs?
> >>
> >> Not sure what does it but I'm glad it did ;)
> >>
> >>> Please compare it to your other hosts. It should be (mostly?)
> >>> identical, but make sure that host_id= is unique per host. It should
> >>> match the spm host id for this host in the engine database.
> >>
> >> I had to restore one of my hosts (host 1) manually due a cleanup during my
> >> re-deploy attempts. I managed to do this successfully by copying the
> >> missing files from another host (host 2) but the first time the host ID
> >> matched one of the other hosts (which made at least hosted-engine
> >> --vm-status unhappy) [I hadn't seen your email yet :(]. I subsequently
> >> corrected the host_id and rebooted the guilty host. Things mostly seem to
> >> be working now except that in hosted-engine --vm-status my first two hosts
> >> (the one I copied the .conf from as well as the one I copied it to
> >> [without changing the ID :O]) now have the same hostname :-/ I'm assuming
> >> there's a mismatch in the engine database - where/how do I fix that?
> >>
> >
> > I didn't check, but am pretty certain that it's not related to the
> > engine db. Do you see such duplicates there as well (using the web ui
> > or sql against it)? If so, fix these first. If no other means, put the
> > host to maintenance and reinstall with the correct name.
> >
> > If it's just the shared storage, you can try the following. Carefully.
> > Didn't try myself. Try on a test system first.
> >
> > 1. Set global maintenance
> >
> > 2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd
> >
> > 3. hosted-engine --clean_metadata --host-id=1
> >
> > - Perhaps even pass --force-cleanup, not sure when it's needed
> >
> > - Repeat for other IDs as needed
> >
> > 4. Start ovirt-ha-agent (I think this should start all the others, but
> > make sure)
> >
> > 5. Wait a bit. I am pretty certain that they should recreate their
> > entries in the shared storage and eventually --vm-status should look
> > ok.
> >
> > 6. Exit global maintenance
> >
> > Good luck,
> >
> >> Appreciated! (and happy cos our cluster is almost back to normal :) )
> >>
> >> On 2021/02/03 11:30, Yedidyah Bar David wrote:
> >>> On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi <roder...@sanren.ac.za>
> >>> wrote:
> >>>>
> >>>> Hello and thanks for assisting!
> >>>>
> >>>> I think I may have found the problem :)
> >>>>
> >>>> /etc/ovirt-hosted-engine/hosted-engine.conf
> >>>>
> >>>> is blank.
> >>>>
> >>>> But I do have hosted-engine.conf~
> >>>
> >>> Any idea how this happened?
> >>>
> >>> Perhaps this is a backup done by emacs?
> >>>
> >>>>
> >>>> Can I cp this to restore the original?
> >>>
> >>> Please compare it to your other hosts. It should be (mostly?)
> >>> identical, but make sure that host_id= is unique per host. It should
> >>> match the spm host id for this host in the engine database.
> >>>
> >>>>
> >>>> Anything else I need to do?
> >>>
> >>> Not sure, but better find the root cause to make sure no other damage was
> >>> done.
> >>>
> >>> Good luck,
> >>>
> >>>>
> >>>> Appreciated
> >>>>
> >>>>
> >>>> On 2021/02/02 11:37, Strahil Nikolov wrote:
> >>>>> Usually,
> >>>>>
> >>>>> I would start with checking the output of the
> >>>>> /var/log/ovirt-hosted-engine-ha/{broker,agent}.log
> >>>>>
> >>>>> I'm typing it on my phone, so the path could have a typo.
> >>>>>
> >>>>> Check if the following services (also typed by memory, might have to
> >>>>> remove the 'd') are running:
> >>>>> - sanlock
> >>>>> - supervdsmd
> >>>>> - vdsmd
> >>>>>
> >>>>>
> >>>>> Sometimes, some of my VGs (gluster) are not activated, so if you run
> >>>>> hyperconverged -> you can 'vgchange -ay'.
> >>>>>
> >>>>> Best Regards,
> >>>>> Strahil Nikolov
> >>>>>
> >>>>>
> >>>>> On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
> >>>>> <roder...@sanren.ac.za> wrote:
> >>>>> Hi!
> >>>>>
> >>>>> We had a power outage and all our servers (oVirt hosts) went
> >>>>> down. When they started up neither the hosted-engine nor VMs were
> >>>>> started.
> >>>>>
> >>>>> hosted-engine --vm-status
> >>>>> says:
> >>>>> You must run deploy first
> >>>>>
> >>>>> I tried running deploy with various options but ultimately get
> >>>>> stuck at:
> >>>>>
> >>>>> The Host ID is already known. Is this a re-deployment on an
> >>>>> additional host that was previously set up (Yes, No)[Yes]?
> >>>>> ...
> >>>>> [ ERROR ] Failed to execute stage 'Closing up': <urlopen error
> >>>>> [Errno 113] No route to host>
> >>>>>
> >>>>> OR
> >>>>>
> >>>>> The specified storage location already contains a data domain. Is
> >>>>> this an additional host setup (Yes, No)[Yes]? No
> >>>>> [ ERROR ] Re-deploying the engine VM over a previously
> >>>>> (partially) deployed system is not supported. Please clean up the
> >>>>> storage device or select a different one and retry.
> >>>>>
> >>>>> NOTES:
> >>>>> 1. This is oVirt v3.6 (legacy install, I know...)
> >>>>> 2. We do have daily engine backups (.bak files) [till the day the
> >>>>> power failed]
> >>>>>
> >>>>> Any advice/assistance appreciated.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Roderick
> >>>>> _______________________________________________
> >>>>> Users mailing list -- users@ovirt.org
> >>>>> To unsubscribe send an email to users-le...@ovirt.org
> >>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>>>> oVirt Code of Conduct:
> >>>>> https://www.ovirt.org/community/about/community-guidelines/
> >>>>> List Archives:
> >>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/
> >>>>>
> >>>> _______________________________________________
> >>>> Users mailing list -- users@ovirt.org
> >>>> To unsubscribe send an email to users-le...@ovirt.org
> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>>> oVirt Code of Conduct:
> >>>> https://www.ovirt.org/community/about/community-guidelines/
> >>>> List Archives:
> >>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/HTWNERBX42JNOMONSCG6BL2MCIQZDW7C/
> >>>
> >>>
> >>>
> >>
> >
> >
>

--
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EYNRI3DYZEPXXN4DGFOF4CURCY6XTU3O/
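
The thread's advice that host_id= must be unique per host can be checked mechanically. Below is a minimal shell sketch, assuming each host's /etc/ovirt-hosted-engine/hosted-engine.conf has been copied to a local file and that the file contains a plain host_id=<number> line. The function name find_duplicate_host_ids and the file names in the usage note are hypothetical and not part of oVirt.

```shell
#!/bin/sh
# Hypothetical helper (not an oVirt tool): print every host_id value that
# appears in more than one of the given hosted-engine.conf copies.
# Assumption: each file holds a line of the form "host_id=<number>".
find_duplicate_host_ids() {
    for conf in "$@"; do
        # Extract the value of the last host_id= line in this file
        id=$(sed -n 's/^host_id=//p' "$conf" | tail -n 1)
        if [ -n "$id" ]; then
            printf '%s\n' "$id"
        fi
    done | sort | uniq -d   # uniq -d keeps only values seen more than once
}
```

For example, after copying the files from each host to host1.conf, host2.conf and host3.conf, running `find_duplicate_host_ids host1.conf host2.conf host3.conf` prints nothing when all IDs differ and prints the clashing ID otherwise. It only compares the config files; it does not look at the engine database or the shared storage metadata.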