Greetings,

After a few days of trial, error, and madness, I *think* I found the source of my problem. Or at least I can now replicate it reliably. These are the basics of my speed-run-to-test-failures setup.

Fresh minimal install of Scientific Linux 7.4 on a physical host for my engine. Add the 4.2 repo and run engine-setup, just blasting through the defaults. Configure it with the default DC and cluster.

Fresh minimal install of Scientific Linux 7.4 on node1, configuring only the primary network card. Add the ovirt repo. Add the host to the cluster. Provisions just fine. Life is good.

Now here is where things split.

Scenario 1: Build node2 the same as node1, configuring only the primary network card, and add it as a host. Provisions just fine. Life is good.

Scenario 2: Configure a second network. In my case, a BMC/IPMI network. It doesn't matter whether it is required or not; both cause failures, although the errors are slightly more evident with required. Make sure the network is assigned to node1, has an IP properly assigned, and is in the up state. Now build node2 the same as before, with only the primary network configured, and add it as a host. Failure, followed by an infinite loop of setting it to Non-Operational!

The pop-up gives you some crap about "Host has no default route." but that is 100% a red herring. Dig a little deeper and you get a message like this:

"node2 does not comply with the cluster Default networks, the following networks are missing on host: 'ovirtmgmt'"

Ah. That's a bit more relevant, but why can't it configure it? Or at least get to the point where it asks me "Hey, networking is a bit off - do you want to configure that now?" That would be nice...

Fortunately the troubleshooting guide has something about that!
https://www.ovirt.org/documentation/how-to/troubleshooting/troubleshooting/

Unfortunately, it doesn't do anything to help. Even after doing these steps, the loop just keeps going... nothing changes.
https://www.ovirt.org/develop/developer-guide/vdsm/installing-vdsm-from-rpm/

Scratch it all and completely rebuild AGAIN for...

Scenario 3: Configure a second network (BMC) and assign it to node1 just like before. Build out node2 the same as node1, but this time add in the EXACT SAME NETWORK CONFIGURATION THAT IS WORKING ON NODE1 - ALL of the ifcfg-* files (but update the IP address to the correct host, obviously). Now add it as a host. Doh! Same error. :-/

OK fine. Let's really get into it.

First off, the networking page for the host is blank. It never pulls back the network cards, so you can't actually make changes via the web page. Nor can you assign networks. So the web interface doesn't help at all. Let's look at the engine log instead.

2018-04-17 14:33:00,336-05 INFO [org.ovirt.engine.core.bll.VdsEventListener] (EE-ManagedThreadFactory-engine-Thread-1091) [] ResourceManager::vdsNotResponding entered for Host 'f0a3d515-8ba2-490e-8d65-54edbb52cefc', '192.168.1.4'
2018-04-17 14:33:00,360-05 INFO [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (EE-ManagedThreadFactory-engine-Thread-1091) [5291eee5] Lock Acquired to object 'EngineLock:{exclusiveLocks='[f0a3d515-8ba2-490e-8d65-54edbb52cefc=VDS_FENCE]', sharedLocks=''}'
2018-04-17 14:33:00,388-05 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [2b853e43] Host 'node2' is set to Non-Operational, it is missing the following networks: 'ovirtmgmt'
2018-04-17 14:33:00,403-05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [2b853e43] EVENT_ID: VDS_SET_NONOPERATIONAL_NETWORK(519), Host node2 does not comply with the cluster Default networks, the following networks are missing on host: 'ovirtmgmt'
2018-04-17 14:33:00,407-05 INFO [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (EE-ManagedThreadFactory-engine-Thread-1091) [5291eee5] Running command: VdsNotRespondingTreatmentCommand internal: true. Entities affected : ID: f0a3d515-8ba2-490e-8d65-54edbb52cefc Type: VDS

There's the message from before. Good. On the right track.
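
For anyone who wants to sift an engine.log the same way, here is a minimal Python sketch that splits lines like the ones above into timestamp, level, logger, thread, correlation ID, and message, and keeps only the WARN/ERROR entries. The regex is inferred purely from the lines quoted in this mail, so treat the layout as an assumption; parse_line and problems are hypothetical helper names, not part of oVirt.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    thread: str
    correlation: str
    message: str


# Assumed layout, inferred only from the quoted lines:
#   <date> <time>,<ms><tz> <LEVEL> [<logger>] (<thread>) [<correlation>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2,4})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry for a line in the assumed layout, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["ts"], m["level"], m["logger"], m["thread"], m["corr"], m["msg"])


def problems(lines: Iterable[str]) -> List[Entry]:
    """Keep only WARN and ERROR entries; unparseable lines are skipped."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.level in ("WARN", "ERROR")]
```

Feeding it the open log file, e.g. problems(open(path)), gives back just the WARN/ERROR entries; grouping them by the correlation field makes it easier to see which messages belong to the same command.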

Not sure why it thinks the host is unreachable, because the host is just fine.

2018-04-17 14:33:01,978-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-31) [] Command 'GetAllVmStatsVDSCommand(HostName = node2, VdsIdVDSCommandParametersBase:{hostId='f0a3d515-8ba2-490e-8d65-54edbb52cefc'})' execution failed: java.net.NoRouteToHostException: No route to host

Huh. Again with the no route to host. But THERE IS! The network is functioning perfectly. IPs all work. DNS all works. Routing is fine. I have no idea what it is complaining about.

2018-04-17 14:33:03,873-05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [4f72afaa] START, SetVdsStatusVDSCommand(HostName = node2, SetVdsStatusVDSCommandParameters:{hostId='f0a3d515-8ba2-490e-8d65-54edbb52cefc', status='NonOperational', nonOperationalReason='NETWORK_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 7459a748

Which network is unreachable? Because every single one of them is fine!

Ugh! I am completely stumped as to why it works perfectly pre-additional-networks but fails every time after a network is configured.

A couple of questions.

1. I assume people have added hosts _after_ they've configured multiple networks. So what am I doing wrong? Why am I unable to add a host? Again, if I don't configure that second network, it will happily add all my hosts. But what happens when I want to add a host in the future?

2. How do I break that infuriating infinite non-operational loop? I can't put it into maintenance mode, I can't delete the host, or anything else. The options are greyed out. The only solution I've found is to yank the power, and after it freaks out for about 30 minutes because it can't find the host, it will stop trying. But I still can't seem to remove the bad host. There has to be a way via command-line to say "stop timing out, knock that off, and delete this host!" but I'm not finding it in my searching.

3. I feel like I go through periods with oVirt where everything is running exactly the way I want, then something happens (like me trying to add a host! Or thinking I can just change a host IP without the whole thing dying on me!) and it all just falls apart. I feel like I am just stumbling through most of it. I've previously gotten a lot out of the Red Hat classes, and work has offered to send me to a training of my choice this year. I am really considering taking the 318 Virtualization class. I'm curious though, how close is that to what I would be working with in oVirt? I'm guessing that since 4.2 recently came out, there is probably minimal chance the class will cover 4.2, but maybe it is close enough?

I would love to hear feedback.

Thanks!
~Stack~
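
P.S. A minimal Python sketch of a check that could be run from the engine box to see what a plain TCP connect to node2 reports, as a way of narrowing down the NoRouteToHostException in the log. It assumes VDSM is listening on its default port 54321; probe and classify are hypothetical helper names. On Linux, a firewall REJECT rule using icmp-host-prohibited also surfaces as "No route to host", so that error alone does not prove the routing table is at fault.

```python
import errno
import socket
from typing import Optional

VDSM_PORT = 54321  # assumption: VDSM's default management port


def classify(err: Optional[BaseException]) -> str:
    """Map the outcome of a TCP connect attempt to a coarse label."""
    if err is None:
        return "ok"
    if isinstance(err, socket.timeout):
        # Packets silently dropped, or the host is down.
        return "timeout"
    code = getattr(err, "errno", None)
    if code == errno.EHOSTUNREACH:
        # "No route to host": a real routing problem, or a firewall
        # REJECT that answers with an ICMP host-prohibited message.
        return "no-route-or-rejected"
    if code == errno.ECONNREFUSED:
        # The host answered, but nothing is listening on that port.
        return "refused"
    return "other"


def probe(host: str, port: int = VDSM_PORT, timeout: float = 3.0) -> str:
    """Try a plain TCP connect and classify what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return classify(None)
    except OSError as exc:
        return classify(exc)
```

Calling probe("node2") from the engine and comparing it with the same call against node1 would show whether the two hosts answer differently on that port.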
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users