BTW, once you think you have fixed all your network configuration issues, destroy all system VMs (CPVM, SSVM) and restart all networks with "cleanup", so that new VMs are created.
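For reference, a rough sketch of doing that with CloudMonkey (cmk) - the UUIDs below are placeholders, so list and note yours first:

# list the current system VMs and note their IDs
cmk list systemvms
# destroy the SSVM and CPVM - CloudStack will recreate them
cmk destroy systemvm id=<ssvm-uuid>
cmk destroy systemvm id=<cpvm-uuid>
# restart a network with cleanup, so its VR is rebuilt as well
cmk restart network id=<network-uuid> cleanup=true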
Inside the SSVM, run the following script, which should give you results similar to the output below, confirming that your SSVM is healthy:

root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh
================================================
First DNS server is 192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server
================================================
Good: DNS resolves cloudstack.apache.org
================================================
nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point
================================================
Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250
================================================
Good: Java process is running
================================================
Tests Complete. Look for ERROR or WARNING above.

On Thu, 17 Jun 2021 at 23:55, Andrija Panic <[email protected]> wrote:

> Since you really bothered to provide such detailed inputs and to help us
> help you (vs what some other people tend to do), I think you deserve a
> decent answer (and some explanation).
>
> The last question first: even though you don't specify/have dedicated
> Storage traffic, there will be an additional interface inside the SSVM
> connected to the same Management network (not to the old Storage network
> - if you still see the old storage network, restart your mgmt server and
> destroy the SSVM; a new one should be created, with proper interfaces
> inside it).
>
> Bond naming issues:
> - Rename your "bond-services" to something industry-standard like
> "bond0" or similar. CloudStack extracts "child" interfaces from cloudbr1
> IF you specify a VLAN for a network that ACS should create - so your
> "bond-services", while fancy (and unclear to me WHY you named it in that
> weird way - smiley here), is NOT something CloudStack will recognize,
> and this is the reason it fails (it even says so in that error message).
> See the sketch below these bullets.
> - There is no reason NOT to have that dedicated storage network - feel
> free to bring it back. It has the same issue as the public traffic:
> rename "bond-storage" to e.g. "bond1" and you will be good to go. Since
> you are NOT using tagging, ACS will just plug the vNIC of the VM into
> cloudbr2 (or whatever bridge name you use for it).
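> As a quick sketch with iproute2 (runtime only - persist it in your
> distro's network config; NIC names are taken from your diagram further
> down, and the bond mode is just an example, keep whatever you use today):
>
> # create the renamed bond and enslave the two NICs (they must be down)
> ip link add bond0 type bond mode 802.3ad miimon 100
> ip link set enp65s0f0 down && ip link set enp65s0f0 master bond0
> ip link set enp65s0f1 down && ip link set enp65s0f1 master bond0
> # attach the bond to the existing bridge and bring everything up
> ip link set bond0 master cloudbr1
> ip link set enp65s0f0 up && ip link set enp65s0f1 up && ip link set bond0 up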
> Now some explanation (even though your deduction capabilities certainly
> made you draw some conclusions from what I wrote above ^^^):
>
> - When you specify a VLAN id for some network in CloudStack, CloudStack
> will look for the device name that is specified as the "Traffic label"
> for that traffic (and you have none??? for your Public traffic - while
> it should be set to the name of the bridge device, "cloudbr1") - and
> then it will provision a VLAN interface and create a new bridge. I.e.
> for the Public network with VLAN id 48, it will extract "bond0" from
> "cloudbr1", create a bond0.48 VLAN interface, AND it will create a brand
> new bridge (with a funny name) holding this bond0.48 interface, and plug
> the Public vNICs into this new bridge (see the sketch below these two
> bullets).
> - When you do NOT specify a VLAN id for some network in CloudStack (i.e.
> your storage network doesn't use a VLAN ID in CloudStack; your switch
> ports are in access vlan 96), you need to have a bridge (i.e. cloudbr2)
> with the bondYYY child interface (instead of that "bond-storage" fancy
> but unrecognized child interface name) - and then ACS will NOT extract a
> child interface (nor do everything I explained in the previous
> paragraph/bullet point); it will just bluntly "stick" all the vNICs into
> that cloudbr2 and hope you have a proper physical/child interface also
> added to cloudbr2 that will carry the traffic down the line. (Purely
> FYI: you could also e.g. use trunking on Linux if you want to, and have
> e.g. a "bondXXX.96" VLAN interface manually configured and added to the
> bridge, while still NOT defining any VLAN in CloudStack for that Storage
> network - and ACS will just stick the vNIC into this bridge.)
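> Roughly, what ACS does for the Public network with VLAN 48 is the
> equivalent of this (the new bridge's name here is illustrative - the
> agent picks its own):
>
> # extract the child interface from cloudbr1 and create the VLAN device
> ip link add link bond0 name bond0.48 type vlan id 48
> # create the new bridge and plug the VLAN device into it
> ip link add brbond0-48 type bridge
> ip link set bond0.48 master brbond0-48
> ip link set bond0.48 up
> ip link set brbond0-48 up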
> The Public traffic/network is the network that all system VMs (SSVM,
> CPVM and all VRs) are connected to. This network is "public" as in
> "external" to the other CloudStack internal or Guest networks - it is
> the network the "north" interface is connected to - but it does NOT
> have to be non-RFC 1918. It can be any private IP range from your
> company's internal network: one that eventually routes traffic to the
> internet IF you want your ACS to be able to download stuff/templates
> from the internet - otherwise it does NOT have to route to the
> internet, if you are running a private cloud and do NOT want external
> access to your ACS, i.e. to the external ("public") interfaces/IPs of
> the SSVM, CPVM and VRs. But if you are running a public cloud, then you
> want to provide non-RFC 1918, i.e. really publicly routable, IP
> addresses/ranges for the Public network. ACS will assign 1 IP to the
> SSVM, 1 IP to the CPVM, and many IPs to the many VRs you create.
>
> A thing I briefly touched on somewhere upstairs ^^^: for each traffic
> type you have defined, you need to define a traffic label. My deduction
> capabilities make me believe you are using KVM, so you need to set your
> KVM traffic label for each of your traffic types (the traffic label, in
> your case = the exact name of the bridge as visible in Linux). I recall
> there are some new UI issues when it comes to labels, so go to
> <MGMT-IP>:8080/client/legacy and check your traffic labels there - and
> set them there; the UI in 4.15.0.0 doesn't allow you to update/set them
> after the zone is created, but the old UI will allow you to do it.
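> You can also check/set the labels via the API - a rough sketch with
> CloudMonkey (the IDs are placeholders, look them up first):
>
> cmk list traffictypes physicalnetworkid=<physical-network-uuid>
> cmk update traffictype id=<traffic-type-uuid> kvmnetworklabel=cloudbr1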
> Not sure why I spent 30 minutes of my life, but there you go - hope you
> got everything from my email. Let me know if anything is unclear!
>
> Cheers,
>
> On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer <[email protected]>
> wrote:
>
>> So Suresh's advice has pushed me in the right direction. The VM was
>> up, but the agent state was down. I was able to connect to the VM in
>> order to continue investigating, and the VM is having network issues
>> connecting to both my load balancer and my secondary storage server. I
>> don't think I'm understanding how the public network portion is
>> supposed to work in my zone and could use some clarification. First
>> let me explain my network setup. On my compute nodes, ideally, I want
>> to use 3 NICs:
>>
>> 1. A management NIC for management traffic. I was using cloudbr0 for
>> this. cloudbr0 is a bridge I created that is connected to an access
>> port on my switch. No VLAN tagging is required to use this network (it
>> uses VLAN 20).
>> 2. A cloud NIC for both public and guest traffic. I was using cloudbr1
>> for this. cloudbr1 is a bridge I created that is connected to a trunk
>> port on my switch. Public traffic uses VLAN 48 and guest traffic
>> should use VLANs 400 - 656. As the port is trunked, I have to use VLAN
>> tagging for any traffic over this NIC.
>> 3. A storage NIC for storage traffic. I use a bond called
>> "bond-storage" for this. bond-storage is connected to an access port
>> on my switch. No VLAN tagging is required to use this network (it uses
>> VLAN 96).
>>
>> For now I've removed the storage NIC from the configuration to
>> simplify my troubleshooting, so I should only be working with cloudbr0
>> and cloudbr1. To me the public network is a *non-RFC 1918* address
>> that should be assigned to tenant VMs for external internet access.
>> Why do system VMs need/get a public IP address? Can't they access all
>> the internal CloudStack servers using the pod's management network?
>>
>> So the first problem I'm seeing is that whenever I tell CloudStack to
>> tag VLAN 48 for public traffic, it uses the underlying bond under
>> cloudbr1 and not the bridge. I don't know where it is even getting
>> this name, as I never provided it to CloudStack.
>>
>> Here is how I have it configured:
>> https://drive.google.com/file/d/10PxLdp6e46_GW7oPFJwB3sQQxnvwUhvH/view?usp=sharing
>>
>> Here is the message in the management logs:
>>
>> 2021-06-16 16:00:40,454 INFO [c.c.v.VirtualMachineManagerImpl]
>> (Work-Job-Executor-13:ctx-0f39d8e2 job-4/job-68 ctx-a4f832c5)
>> (logid:eb82035c) Unable to start VM on Host[-2-Routing] due to Failed
>> to create vnet 48: Error: argument "bond-services.48" is wrong: "name"
>> not a valid ifnameCannot find device "bond-services.48"Failed to
>> create vlan 48 on pif: bond-services.
>>
>> This ultimately results in an error and the system VM never even
>> starts.
>>
>> If I remove the VLAN tag from the configuration (
>> https://drive.google.com/file/d/11tF6YIHm9xDZvQkvphi1xvHCX_X9rDz1/view?usp=sharing)
>> then the VM starts and gets a public IP, but without a tagged NIC it
>> can't actually connect to the network. This is from inside the system
>> VM:
>>
>> root@s-9-VM:~# ip --brief addr
>> lo    UNKNOWN  127.0.0.1/8
>> eth0  UP       169.254.91.216/16
>> eth1  UP       10.2.21.72/22
>> eth2  UP       192.41.41.162/25
>> eth3  UP       10.2.99.15/22
>> root@s-9-VM:~# ping 192.41.41.129
>> PING 192.41.41.129 (192.41.41.129): 56 data bytes
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> ^C--- 192.41.41.129 ping statistics ---
>> 5 packets transmitted, 0 packets received, 100% packet loss
>>
>> Obviously if the network isn't functioning, then it can't connect to
>> my storage server and the agent never starts. How do I set up my
>> public network so that it tags the packets going over cloudbr1? Also,
>> can I not have a public IP address for system VMs, or is this
>> required?
>>
>> I have some other issues as well, like the fact that it is creating a
>> storage NIC on the system VMs even though I deleted my storage network
>> from the zone, but I can tackle one problem at a time. If anyone is
>> curious, or if it helps to visualize my network, here is a little
>> ASCII diagram of how I have the compute node's networking set up.
>> Hopefully it comes across the mailing list correctly and not all
>> mangled:
>>
>> enp3s0f0 (eth)  enp3s0f1 (eth)  enp65s0f0 (eth)  enp65s0f1 (eth)  enp71s0 (eth)  enp72s0 (eth)
>>     |               |               |                |               |              |
>>     |               |               +--------+-------+               +------+-------+
>>     |               |                        |                              |
>>     |               |             bond-services (bond)                      |
>>     |               |                        |                              |
>> cloudbr0 (bridge)  N/A             cloudbr1 (bridge)              bond-storage (bond)
>> VLAN 20 (access)               VLAN 48, 400 - 656 (trunk)            VLAN 96 (access)
>>
>> On 6/16/21 9:38 AM, Andrija Panic wrote:
>> > "There is no secondary storage VM for downloading template to image
>> > store LXC_SEC_STOR1"
>> >
>> > So the next step is to investigate why there is no SSVM (can hosts
>> > access the secondary storage NFS, can they access the Primary
>> > Storage, etc. - those tests you can do manually) - and, as Suresh
>> > advised, once it's up, check that it is all green (Connected / Up
>> > state).
>> >
>> > Best,
>>
>> I appreciate everyone's help.
>>
>> --
>> Thanks,
>> Joshua Schaeffer
>
> --
>
> Andrija Panić

--
Andrija Panić
