You're most welcome! (and apologies about the naming convention jokes - I also would name things in a meaningful way instead of bond0/1 etc - the same way I'm switching back from those "predictable interface names "ensp0p1" and similar to old-fashioned eth0, eth1 etc - not sure what kind of drugs did the engineers take when they came with those "predictable" interface names...)
Cheers, On Fri, 18 Jun 2021 at 07:16, <[email protected]> wrote: > Andrija, > > Thanks so much for all the details. I'm out of the office for the next > couple of days so will update my cloud with your suggestions when I get > back. > > As far as the "fancy" naming, I just never found names like bondX useful > when Linux allows naming the network device something else. It has just > become a convention of mine. I can easily distinguish which bond carries > cloud traffic and which carries storage traffic by looking at the bond > name, but it is just a personal thing and can easily switch back to > using the standard bond names. > > I was aware of the traffic labels but forgot to mention that I had set > those up in my previous email. There were still some details that you > provided that helped me further understand how they work though, thanks. > > Again, thanks for you help. > > On 2021-06-17 22:04, Andrija Panic wrote: > > BTW, once you thing you have fixed all your network configuration > > issues - > > destroy all system VM (CPVM, SSVM and restart all networks with > > "cleanup" - > > so that new VMs are created_ > > Inside SSVM, run the the following script, which should give you > > results > > similar as below - confirming that your SSVM is healthy > > > > > > > > root@s-2536-VM:/usr/local/cloud/systemvm# > > /usr/local/cloud/systemvm/ssvm- > > check.sh > > ================================================ > > First DNS server is 192.168.169.254 > > PING 192.168.169.254 (192.168.169.254): 56 data bytes > > 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms > > 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms > > --- 192.168.169.254 ping statistics --- > > 2 packets transmitted, 2 packets received, 0% packet loss > > round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms > > Good: Can ping DNS server > > ================================================ > > Good: DNS resolves cloudstack.apache.org > > ================================================ > > nfs is currently mounted > > Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2 > > Good: Can write to mount point > > ================================================ > > Management server is 192.168.169.13. Checking connectivity. > > Good: Can connect to management server 192.168.169.13 port 8250 > > ================================================ > > Good: Java process is running > > ================================================ > > Tests Complete. Look for ERROR or WARNING above. > > > > On Thu, 17 Jun 2021 at 23:55, Andrija Panic <[email protected]> > > wrote: > > > >> Since you really bothered to provide so very detailed inputs and help > >> us > >> help you (vs what some other people tend to do) - I think you really > >> deserved a decent answer (and some explanation). > >> > >> The last question first -even though you don't specify/have dedicated > >> Storage traffic, there will be an additional interface inside the SSVM > >> connected to the same Management network (not to the old Storage > >> network - > >> if you see the old storage network, restart your mgmt server and > >> destroy > >> the SSVM - a new one should be created, with proper interfaces inside > >> it) > >> > >> bond naming issues: > >> - rename your "bond-services" to something industry-standard like > >> "bond0" > >> or similar - cloudstack extracts "child" interfaces from cloudbr1 IF > >> you > >> specify a VLAN for a network that ACS should create - so your > >> "bond-services", while fancy (and unclear to me WHY you named it in > >> that > >> weird way - smiley here) - is NOT something CloudStack will recognize > >> and > >> this is the reason it fails (it even says so in that error message) > >> - no reason to NOT have that dedicated storage network - feel free to > >> bring it back - the same issue you have as for the public traffic - > >> rename > >> "bond-storage" to e.g. "bond1" and you will be good to go - since you > >> are > >> NOT using tagging, ACS will just plug vNIC of the VM into the cloudbr2 > >> (or > >> whatever bridge name you use for it). > >> > >> Now some explanation (even though your deduction capabilities > >> certainly > >> made you draw some conclusions from what I wrote above ^^^) > >> > >> - When you specify a VLAN id for some network in CLoudStack - > >> CloudStack > >> will look for the device name that is specified as the "Traffic label" > >> for > >> that traffic (and you have none??? for your Public traffic - while it > >> should be set to the name of the bridge device "cloudbr1") - and then > >> it > >> will provision a VLAN interface and create a new bridge - (i.e. for > >> Public > >> network with VLAN id 48, it will extract "bond0" from the "cloudbr1", > >> and > >> create bond0.48 VLAN interface - AND it will create a brand new bridge > >> with > >> this bond0.48 interface (bridge with funny name), and plug Public > >> vNICs > >> into this new bridge.... > >> - When you do NOT specify a VLAN id for some network in CloudStack > >> (i.e. > >> your storage network doesn't use VLAN ID in CloudStack, your switch > >> ports > >> are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) > >> with the > >> bondYYY child interface (instead of that "bond-storage" fancy but > >> unrecognized child interface name) - and then ACS will NOT extract > >> child > >> interface (nor do everything I explained in the previous > >> paraghraph/bullet > >> point) - it will just bluntly "stick" all the vNICs into that cloudbr2 > >> - > >> and hope you have a proper physical/child interface also added to the > >> cloudbr2 that will carry the traffic down the line... (purely FYI - > >> you > >> could also e.g. use trunking on Linux if you want to, and have e.g. > >> "bondXXX.96" VLAN interface manually configured and add it to the > >> bridge, > >> while still NOT defining any VLAN in the CloudStack for that Storage > >> network - and ACS will just stick vNIC to this bridge) > >> > >> Public traffic/network - is the network that all systemVMs (SSVM, CPVM > >> and > >> all VRs) are connected to - this network is "public" like "external" > >> to > >> other CloudStack internal or Guest network - this is the network to > >> which > >> the "north" interface is connected - but does NOT have to be " non-RFC > >> 1918 > >> " - it can be any private IP range from your company internal network > >> (that > >> will eventually route traffic to internet - IF you want your ACS to be > >> able > >> to download stuff/templates from Internet - otherwise it does NOT have > >> to > >> route to internet - if you are using private cloud and do NOT want > >> external > >> access to your ACS, well to SSVM and CPVM and VRs external ("public") > >> interfaces/IPs - but if you are running a public cloud - then you want > >> to > >> provide a non-RFC 1918 i.e. a really Publicly routable IP > >> addresses/range > >> for the Public network - ACS will assign 1IP for SSVM, 1 IP for CPVM, > >> and > >> many IPs to your many VRs you create. > >> > >> A thing that I briefly touched somewhere upstairs ^^^ - for each > >> traffic > >> type you have defined - you need to define a traffic label - my > >> deduction > >> capabilities make me believe you are using KVM, so you need to set > >> your KVM > >> traffic label for all your network traffic (traffic label, in you case > >> = > >> exact name of the bridge as visible in Linux) - I recall there are > >> some new > >> UI issues when it comes to tags, so go to your > >> <MGMT-IP>:8080/client/legacy > >> - and check your traffic label there - and set it there, UI in > >> 4.15.0.0 > >> doesn't allow you to update/set it after the zone is created - but old > >> UI > >> will allow you to do it. > >> > >> Not sure why I spent 30 minutes of my life, but there you go - hope > >> you > >> got everything from my email - let me know if anything is unclear! > >> > >> Cheers, > >> > >> On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer > >> <[email protected]> > >> wrote: > >> > >>> So Suresh's advise has pushed me in the right direction. The VM was > >>> up > >>> but the agent state was down. I was able to connect to the VM in > >>> order to > >>> continue investigating and the VM is having network issues connecting > >>> to > >>> both my load balancer and my secondary storage server. I don't think > >>> I'm > >>> understanding how the public network portion is supposed to work in > >>> my zone > >>> and could use some clarification. First let me explain my network > >>> setup. On > >>> my compute nodes, ideally, I want to use 3 NIC's: > >>> > >>> 1. A management NIC for management traffic. I was using cloudbr0 for > >>> this. cloudbr0 is a bridge I created that is connected to an access > >>> port on > >>> my switch. No vlan tagging is required to use this network (it uses > >>> VLAN 20) > >>> 2. A cloud NIC for both public and guest traffic. I was using > >>> cloudbr1 > >>> for this. cloudbr1 is a bridge I created that is connected to a trunk > >>> port > >>> on my switch. Public traffic uses VLAN 48 and guest traffic should > >>> use > >>> VLANs 400 - 656. As the port is trunked I have to use vlan tagging > >>> for any > >>> traffic over this NIC. > >>> 3. A storage NIC for storage traffic. I use a bond called > >>> "bond-storage" > >>> for this. bond-storage is connected to an access port on my switch. > >>> No vlan > >>> tagging is required to use this network (it uses VLAN 96) > >>> > >>> For now I've removed the storage NIC from the configuration to > >>> simplify > >>> my troubleshooting, so I should only be working with cloudbr0 and > >>> cloudbr1. > >>> To me the public network is a *non-RFC 1918* address that should be > >>> assigned to tenant VM's for external internet access. Why do system > >>> VM's > >>> need/get a public IP address? Can't they access all the internal > >>> CloudStack > >>> servers using the pod's management network? > >>> > >>> So the first problem I'm seeing is whenever I tell CloudStack to tag > >>> VLAN > >>> 48 for public traffic it uses the underlying bond under cloudbr1 and > >>> not > >>> the bridge. I don't know where it is even getting this name as I > >>> never > >>> provided it to CloudStack > >>> > >>> Here is how I have it configured: > >>> > https://drive.google.com/file/d/10PxLdp6e46_GW7oPFJwB3sQQxnvwUhvH/view?usp=sharing > >>> > >>> Here is the message in the management logs: > >>> > >>> 2021-06-16 16:00:40,454 INFO [c.c.v.VirtualMachineManagerImpl] > >>> (Work-Job-Executor-13:ctx-0f39d8e2 job-4/job-68 ctx-a4f832c5) > >>> (logid:eb82035c) Unable to start VM on Host[-2-Routing] due to Failed > >>> to > >>> create vnet 48: Error: argument "bond-services.48" is wrong: "name" > >>> not a > >>> valid ifnameCannot find device "bond-services.48"Failed to create > >>> vlan 48 > >>> on pif: bond-services. > >>> > >>> This ultimately results in an error and the system VM never even > >>> starts. > >>> > >>> If I remove the vlan tag from the configuration ( > >>> > https://drive.google.com/file/d/11tF6YIHm9xDZvQkvphi1xvHCX_X9rDz1/view?usp=sharing > ) > >>> then the VM starts and gets a public IP but without a tagged NIC it > >>> can't > >>> actually connect to the network. This is from inside the system VM: > >>> > >>> root@s-9-VM:~# ip --brief addr > >>> lo UNKNOWN 127.0.0.1/8 > >>> eth0 UP 169.254.91.216/16 > >>> eth1 UP 10.2.21.72/22 > >>> eth2 UP 192.41.41.162/25 > >>> eth3 UP 10.2.99.15/22 > >>> root@s-9-VM:~# ping 192.41.41.129 > >>> PING 192.41.41.129 (192.41.41.129): 56 data bytes > >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable > >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable > >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable > >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable > >>> ^C--- 192.41.41.129 ping statistics --- > >>> 5 packets transmitted, 0 packets received, 100% packet loss > >>> > >>> Obviously if the network isn't functioning then it can't connect to > >>> my > >>> storage server and the agent never starts. How do I setup my public > >>> network > >>> so that it tags the packets going over cloudbr1? Also, can I not have > >>> a > >>> public IP address for system VM's or is this required? > >>> > >>> I have some other issues as well like the fact that it is creating a > >>> storage NIC on the system VM's even though I deleted my storage > >>> network > >>> from the zone, but I can tackle one problem at a time. If anyone is > >>> curious > >>> or it helps visualize my network, here is is a little ASCII diagram > >>> of how > >>> I have the compute node's networking setup. Hopefully it comes across > >>> the > >>> mailing list correctly and not all mangled: > >>> > >>> > >>> > +=============================================================================================================== > >>> | > >>> | enp3s0f0 (eth) enp3s0f1 (eth) enp65s0f0 (eth) > >>> enp65s0f1 > >>> (eth) enp71s0 (eth) enp72s0 (eth) > >>> | | | | > >>> | | | > >>> | | | > >>> +--------+---------+ +--------+---------+ > >>> | | | > >>> | | > >>> | | | bond-services > >>> (bond) | > >>> | | | > >>> | | > >>> | | | > >>> | | > >>> | | | > >>> | | > >>> | cloudbr0 (bridge) N/A cloudbr1 > >>> (bridge) bond-storage (bond) > >>> | VLAN 20 (access) VLAN 48, 400 - 656 > >>> (trunk) VLAN 96 (access) > >>> > >>> On 6/16/21 9:38 AM, Andrija Panic wrote: > >>> > " There is no secondary storage VM for downloading template to image > >>> store > >>> > LXC_SEC_STOR1 " > >>> > > >>> > So next step to investigate why there is no SSVM (can hosts access > the > >>> > secondary storage NFS, can they access the Primary Storage, etc - > those > >>> > tests you can do manually) - and as Suresh advised - one it's up, is > it > >>> all > >>> > green (COnnected / Up state). > >>> > > >>> > Best, > >>> > > >>> > >>> I appreciate everyone's help. > >>> > >>> -- > >>> Thanks, > >>> Joshua Schaeffer > >>> > >>> > >> > >> -- > >> > >> Andrija Panić > >> > -- Andrija Panić
