On Thu, Sep 17, 2020 at 11:29 AM Adam Xu <adam...@adagene.com.cn> wrote:
On 2020/9/17 15:07, Yedidyah Bar David wrote:
On Thu, Sep 17, 2020 at 8:16 AM Adam Xu <adam...@adagene.com.cn> wrote:
On 2020/9/16 15:53, Yedidyah Bar David wrote:
On Wed, Sep 16, 2020 at 10:46 AM Adam Xu <adam...@adagene.com.cn> wrote:
On 2020/9/16 15:12, Yedidyah Bar David wrote:
On Wed, Sep 16, 2020 at 6:10 AM Adam Xu <adam...@adagene.com.cn> wrote:
Hi ovirt
I just tried to upgrade a self-hosted engine from 4.3.10 to 4.4.1.4. I followed
the steps in the document:
https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3
The old 4.3 env has FC storage as the engine storage domain, and I have created
a new FC storage vv for the new storage domain to be used in the next steps.
I backed up the old 4.3 env and prepared a completely new host to restore the env.
In chapter 4.4, step 8, it says:
"During the deployment you need to provide a new storage domain. The deployment
script renames the 4.3 storage domain and retains its data."
It does rename the old storage domain, but it didn't let me choose a new
storage domain during the deployment. So the new engine was just deployed on the
new host's local storage and cannot be moved to the FC storage domain.
Can anyone tell me what the problem is?
What do you mean by "deployed in the new host's local storage"?
Did the deploy finish successfully?
I think it was not finished yet.
You did 'hosted-engine --deploy --restore-from-file=something', right?
Did this finish?
Not finished yet.
What are the last few lines of the output?
[ INFO ] You can now connect to
https://ovirt6.ntbaobei.com:6900/ovirt-engine/ and check the status of
this host and eventually remediate it, please continue only when the
host is listed as 'up'
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Create temporary lock file]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Pause execution until
/tmp/ansible.g2opa_y6_he_setup_lock is removed, delete it once ready to
proceed]
Great. This means that you replied 'Yes' to 'Pause the execution
after adding this host to the engine?', and it's now waiting.
But the status of the new host that runs the self-hosted engine is
"NonOperational", and it will never become "up".
You seem to imply that you expected it to become "up" by itself,
and you claim that this will never happen, in which you are
correct.
But that's not the intention. The message you got is:
You will be able to iteratively connect to the restored engine in
order to manually review and remediate its configuration before
proceeding with the deployment:
please ensure that all the datacenter hosts and storage domain are
listed as up or in maintenance mode before proceeding.
This is normally not required when restoring an up to date and
coherent backup.
This means that it's up to you to handle this nonoperational host,
and that you are requested to continue (by removing that file) only
then.
So now, let's try to understand why the host is nonoperational, and
try to fix that. Ok?
You should be able to find the current (private/local) IP address of
the engine vm by searching the hosted-engine setup logs for 'local_vm_ip'.
You can ssh (and scp etc.) there from the host, using user 'root' and
the password you supplied.
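A minimal sketch of that log search, in case it helps. The log directory
/var/log/ovirt-hosted-engine-setup and the idea that the address appears on
the same line as 'local_vm_ip' are assumptions about a typical setup, and the
function names are made up for illustration. Check the matching lines by hand
if the result looks wrong.

```python
import re
from pathlib import Path

# Assumption: hosted-engine setup logs are stored here; adjust if yours differ.
LOG_DIR = Path("/var/log/ovirt-hosted-engine-setup")

# Loose IPv4 pattern; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_local_vm_ips(text):
    """Return IPv4 addresses seen on lines that mention 'local_vm_ip'.

    Order of first appearance is kept and duplicates are dropped.
    """
    found = []
    for line in text.splitlines():
        if "local_vm_ip" not in line:
            continue
        for ip in IPV4.findall(line):
            if ip not in found:
                found.append(ip)
    return found


def scan_logs(log_dir=LOG_DIR):
    """Run find_local_vm_ips over every *.log file in log_dir."""
    results = []
    for path in sorted(log_dir.glob("*.log")):
        for ip in find_local_vm_ips(path.read_text(errors="replace")):
            if ip not in results:
                results.append(ip)
    return results


if __name__ == "__main__":
    if LOG_DIR.is_dir():
        for ip in scan_logs():
            print(ip)
    else:
        print(f"{LOG_DIR} not found; pass the right directory to scan_logs()")
```

If several addresses are printed, the one that answers ssh from the host as
'root' is the one to use.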
Please check/share all of /var/log/ovirt-engine on the engine vm.
In particular, please check host-deploy/* logs there. The last lines
show a summary, like:
HOSTNAME : ok=97 changed=34 unreachable=0 failed=0
skipped=46 rescued=0 ignored=1
My log here is:
2020-09-17 12:19:40 CST - TASK [Executing post tasks defined by user]
************************************
2020-09-17 12:19:40 CST - PLAY RECAP
*********************************************************************
ovirt2.ntbaobei.com : ok=99 changed=45 unreachable=0
failed=0 skipped=45 rescued=0 ignored=1
Good.
Is 'failed' higher than 0? If so, please find the failed task and
check/share the relevant error (or just the entire file).
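If you want to check the summary mechanically, here is a small sketch that
parses such a summary line and reports whether 'failed' is above zero. The
function names are hypothetical, and it only assumes the "HOST : key=value ..."
layout shown above (wrapped over several lines or not).

```python
import re

# Each counter in the summary looks like "ok=97" or "failed=0".
RECAP_PAIR = re.compile(r"(\w+)=(\d+)")


def parse_recap(summary):
    """Split 'HOST : ok=97 changed=34 ...' into (host, {counter: value})."""
    host, _, rest = summary.partition(":")
    counters = {key: int(value) for key, value in RECAP_PAIR.findall(rest)}
    return host.strip(), counters


def has_failures(counters):
    """True when the 'failed' counter is higher than 0."""
    return counters.get("failed", 0) > 0


if __name__ == "__main__":
    line = ("HOSTNAME : ok=97 changed=34 unreachable=0 failed=0 "
            "skipped=46 rescued=0 ignored=1")
    host, counters = parse_recap(line)
    print(host, "failed tasks:", counters["failed"],
          "-> look for the failed task" if has_failures(counters) else "-> ok")
```

Only the summary text should be passed in, without the timestamp prefix,
because the host name is taken as everything before the first colon.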
Also, please check engine.log there for any ' ERROR '.
I collected some error logs from engine.log:
Only those below?
2020-09-17 12:14:35,084+08 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83)
[4a6cf221] Command 'UploadStreamVDSCommand(HostName =
ovirt6.ntbaobei.com,
UploadStreamVDSCommandParameters:{hostId='784eada4-49e3-4d6c-95cd-f7c81337c2f7'})'
execution failed: java.net.SocketException: Connection reset
This, and similar ones, are expected - the engine is still on the
private network, so it can't access the other hosts.
...
2020-09-17 12:14:35,085+08 ERROR
[org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83)
[4a6cf221] Command
'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' failed:
EngineException:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
java.net.SocketException: Connection reset (Failed with error
VDS_NETWORK_ERROR and code 5022)
...
2020-09-17 12:14:40,322+08 ERROR
[org.ovirt.engine.core.bll.pm.FenceProxyLocator]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53)
[8b0987a] Can not run fence action on host 'ovirt2.ntbaobei.com', no suitable
proxy host was found.
Not sure why it would want to fence ovirt2, but I think it can be ignored
for now as well.
...
2020-09-17 12:14:48,861+08 ERROR
[org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-2)
[4a6cf221] Ending command
'org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand'
with failure.
Same - it can't access the storage, so updating ovfstore fails. OK.
2020-09-17 12:14:52,630+08 ERROR
[org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
[56d6bb10] Failed to update OVF_STORE content
2020-09-17 12:14:52,630+08 ERROR
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
[56d6bb10] Command 'ProcessOvfUpdateForStorageDomain' id:
'8e6e1fa1-1fdf-4928-9153-4fe2ae9b77b0' with children
[1c4d99f8-2d05-4b0a-938b-8733157778e1,
62caf674-5567-461c-8e86-4ed7b03306af] failed when attempting to perform
the next operation, marking as 'ACTIVE'
2020-09-17 12:14:52,630+08 ERROR
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
[56d6bb10] null: java.lang.RuntimeException
Same.
Are these the only errors?
In particular, try to search for 'ovirt2' (your host's name), try to
find when it became nonoperational, and check errors around this.
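A rough sketch of such a search, assuming each engine.log entry starts with a
timestamp like "2020-09-17 12:14:35,084+08" followed by the level, as in the
excerpts above, and that lines without a timestamp (stack traces) belong to
the previous entry. The function names are made up for illustration.

```python
import re

# An entry starts with "YYYY-MM-DD HH:MM:SS,mmm<tz> LEVEL".
ENTRY_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\S*\s+(\w+)\b"
)


def iter_entries(lines):
    """Group raw lines into (level, text) entries.

    A line without a leading timestamp is treated as a continuation
    of the entry before it.
    """
    level, buf = None, []
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            if buf:
                yield level, "\n".join(buf)
            level, buf = match.group(1), [line.rstrip("\n")]
        elif buf:
            buf.append(line.rstrip("\n"))
    if buf:
        yield level, "\n".join(buf)


def find_entries(lines, level="ERROR", keyword=None):
    """Return entries of the given level, optionally only those with keyword."""
    return [
        text
        for entry_level, text in iter_entries(lines)
        if entry_level == level and (keyword is None or keyword in text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python find_errors.py /var/log/ovirt-engine/engine.log ovirt2
    path = sys.argv[1]
    keyword = sys.argv[2] if len(sys.argv) > 2 else None
    with open(path, errors="replace") as handle:
        for entry in find_entries(handle, keyword=keyword):
            print(entry)
            print("-" * 40)
```

Running it once with the host name as keyword and once without gives both the
errors that mention the host and everything else logged around the same time.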