BTW, once you think you have fixed all your network configuration issues,
destroy all system VMs (CPVM, SSVM) and restart all networks with "cleanup",
so that new VMs are created.
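If you drive this from CloudMonkey, the destroy + cleanup-restart looks roughly like the sketch below - the `<...>` UUIDs are placeholders you would look up first, not real values:

```shell
# Sketch only - look up the real UUIDs first; <...> values are placeholders.
cmk list systemvms                        # note the SSVM and CPVM ids
cmk destroy systemvm id=<ssvm-uuid>
cmk destroy systemvm id=<cpvm-uuid>

cmk list networks                         # note each network id
cmk restart network id=<network-uuid> cleanup=true
```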
Inside the SSVM, run the following script, which should give you results
similar to those below, confirming that your SSVM is healthy:



  root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh
================================================
First DNS server is  192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server
================================================
Good: DNS resolves cloudstack.apache.org
================================================
nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point
================================================
Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250
================================================
Good: Java process is running
================================================
Tests Complete. Look for ERROR or WARNING above.

On Thu, 17 Jun 2021 at 23:55, Andrija Panic <[email protected]> wrote:

> Since you really bothered to provide such detailed inputs and help us
> help you (vs what some other people tend to do) - I think you really
> deserved a decent answer (and some explanation).
>
> The last question first - even though you don't specify/have dedicated
> Storage traffic, there will be an additional interface inside the SSVM,
> connected to the same Management network (not to the old Storage network -
> if you see the old storage network, restart your mgmt server and destroy
> the SSVM - a new one should be created, with proper interfaces inside it).
>
> bond naming issues:
> - rename your "bond-services" to something industry-standard like "bond0"
> or similar - CloudStack extracts "child" interfaces from cloudbr1 IF you
> specify a VLAN for a network that ACS should create - so your
> "bond-services", while fancy (and unclear to me WHY you named it in that
> weird way - smiley here), is NOT something CloudStack will recognize, and
> this is the reason it fails (it even says so in that error message)
> - no reason NOT to have that dedicated storage network - feel free to
> bring it back - you have the same issue as for the public traffic - rename
> "bond-storage" to e.g. "bond1" and you will be good to go - since you are
> NOT using tagging, ACS will just plug the VM's vNIC into cloudbr2 (or
> whatever bridge name you use for it).
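As a sketch of that rename, assuming an ifupdown-style config (adjust for netplan or NetworkManager as appropriate) - "bond1", the slave NICs, and the address below are illustrative, not taken from the thread:

```
# /etc/network/interfaces fragment - hypothetical values
auto bond1
iface bond1 inet manual
    bond-slaves enp71s0 enp72s0
    bond-mode 802.3ad

auto cloudbr2
iface cloudbr2 inet static
    address 10.2.99.10/22      # storage-network IP, example only
    bridge_ports bond1         # plain "bondN" name that ACS recognizes
    bridge_stp off
```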
>
> Now some explanation (even though your deduction capabilities certainly
> made you draw some conclusions from what I wrote above ^^^)
>
> - When you specify a VLAN id for some network in CloudStack - CloudStack
> will look for the device name that is specified as the "Traffic label" for
> that traffic (and you have none??? for your Public traffic - while it
> should be set to the name of the bridge device "cloudbr1") - and then it
> will provision a VLAN interface and create a new bridge (i.e. for a Public
> network with VLAN id 48, it will extract "bond0" from "cloudbr1",
> create a bond0.48 VLAN interface, AND create a brand new bridge with
> this bond0.48 interface (a bridge with a funny name), then plug Public
> vNICs into this new bridge)...
> - When you do NOT specify a VLAN id for some network in CloudStack (i.e.
> your storage network doesn't use a VLAN ID in CloudStack, and your switch
> ports are in access vlan 96) - you need to have a bridge (i.e. cloudbr2)
> with a bondYYY child interface (instead of that "bond-storage" fancy but
> unrecognized child interface name) - and then ACS will NOT extract the
> child interface (nor do anything I explained in the previous
> paragraph/bullet point) - it will just bluntly "stick" all the vNICs into
> that cloudbr2 - and hope you have a proper physical/child interface also
> added to cloudbr2 that will carry the traffic down the line... (purely
> FYI - you could also e.g. use trunking on Linux if you want to, and have
> e.g. a "bondXXX.96" VLAN interface manually configured and added to the
> bridge, while still NOT defining any VLAN in CloudStack for that Storage
> network - and ACS will just stick the vNIC into this bridge).
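Purely as an illustration of the tagged case above - NOT CloudStack's actual code, and "bond0"/"brbond0-48" are just the usual naming pattern - what ACS provisions boils down to roughly these ip(8) commands, run as root:

```shell
# Rough sketch of what ACS provisions for a Public network with VLAN id 48
# whose traffic label points at cloudbr1 (bond0 being cloudbr1's child iface):
ip link add link bond0 name bond0.48 type vlan id 48   # child VLAN interface
ip link add brbond0-48 type bridge                     # the new "funny name" bridge
ip link set bond0.48 master brbond0-48
ip link set bond0.48 up
ip link set brbond0-48 up
# Public vNICs are then plugged into brbond0-48, not into cloudbr1 itself.
```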
>
> Public traffic/network - this is the network that all systemVMs (SSVM, CPVM
> and all VRs) are connected to - this network is "public" in the sense of
> "external" to the other CloudStack internal or Guest networks - it is the
> network the "north" interface is connected to - but it does NOT have to be
> non-RFC 1918. It can be any private IP range from your company's internal
> network (one that will eventually route traffic to the internet, IF you
> want your ACS to be able to download stuff/templates from the internet -
> otherwise it does NOT have to route to the internet, if you are running a
> private cloud and do NOT want external access to your ACS - well, to the
> SSVM's, CPVM's and VRs' external ("public") interfaces/IPs). But if you
> are running a public cloud, then you want to provide non-RFC 1918, i.e.
> really publicly routable, IP addresses/ranges for the Public network. ACS
> will assign 1 IP for the SSVM, 1 IP for the CPVM, and many IPs to the many
> VRs you create.
>
> A thing that I briefly touched on somewhere upstairs ^^^ - for each traffic
> type you have defined, you need to define a traffic label - my deduction
> capabilities make me believe you are using KVM, so you need to set your KVM
> traffic label for all your network traffic (traffic label, in your case =
> the exact name of the bridge as visible in Linux) - I recall there are some
> new UI issues when it comes to labels, so go to <MGMT-IP>:8080/client/legacy
> and check your traffic labels there - and set them there; the UI in 4.15.0.0
> doesn't allow you to update/set them after the zone is created, but the old
> UI will.
>
> Not sure why I spent 30 minutes of my life, but there you go - hope you
> got everything from my email - let me know if anything is unclear!
>
> Cheers,
>
> On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer <[email protected]>
> wrote:
>
>> So Suresh's advice has pushed me in the right direction. The VM was up
>> but the agent state was down. I was able to connect to the VM in order to
>> continue investigating, and the VM is having network issues connecting to
>> both my load balancer and my secondary storage server. I don't think I'm
>> understanding how the public network portion is supposed to work in my zone
>> and could use some clarification. First let me explain my network setup. On
>> my compute nodes, ideally, I want to use 3 NICs:
>>
>> 1. A management NIC for management traffic. I was using cloudbr0 for
>> this. cloudbr0 is a bridge I created that is connected to an access port on
>> my switch. No vlan tagging is required to use this network (it uses VLAN 20).
>> 2. A cloud NIC for both public and guest traffic. I was using cloudbr1
>> for this. cloudbr1 is a bridge I created that is connected to a trunk port
>> on my switch. Public traffic uses VLAN 48 and guest traffic should use
>> VLANs 400 - 656. As the port is trunked, I have to use vlan tagging for any
>> traffic over this NIC.
>> 3. A storage NIC for storage traffic. I use a bond called "bond-storage"
>> for this. bond-storage is connected to an access port on my switch. No vlan
>> tagging is required to use this network (it uses VLAN 96).
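In netplan form, that three-NIC layout is roughly the sketch below - the file name, bond parameters, and address are illustrative, not the actual configuration from the thread:

```
# Hypothetical /etc/netplan/01-cloud.yaml - illustrative values only
network:
  version: 2
  ethernets:
    enp3s0f0: {}
    enp65s0f0: {}
    enp65s0f1: {}
  bonds:
    bond-services:              # the bond under cloudbr1
      interfaces: [enp65s0f0, enp65s0f1]
      parameters: {mode: 802.3ad}
  bridges:
    cloudbr0:                   # management, access VLAN 20 on the switch
      interfaces: [enp3s0f0]
      addresses: [10.2.21.10/22]   # example address
    cloudbr1:                   # public + guest, trunk (VLANs 48, 400-656)
      interfaces: [bond-services]
```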
>>
>> For now I've removed the storage NIC from the configuration to simplify
>> my troubleshooting, so I should only be working with cloudbr0 and cloudbr1.
>> To me the public network is a *non-RFC 1918* address that should be
>> assigned to tenant VMs for external internet access. Why do system VMs
>> need/get a public IP address? Can't they access all the internal CloudStack
>> servers using the pod's management network?
>>
>> So the first problem I'm seeing is that whenever I tell CloudStack to tag
>> VLAN 48 for public traffic, it uses the underlying bond under cloudbr1 and
>> not the bridge. I don't know where it is even getting this name, as I never
>> provided it to CloudStack.
>>
>> Here is how I have it configured:
>> https://drive.google.com/file/d/10PxLdp6e46_GW7oPFJwB3sQQxnvwUhvH/view?usp=sharing
>>
>> Here is the message in the management logs:
>>
>> 2021-06-16 16:00:40,454 INFO  [c.c.v.VirtualMachineManagerImpl]
>> (Work-Job-Executor-13:ctx-0f39d8e2 job-4/job-68 ctx-a4f832c5)
>> (logid:eb82035c) Unable to start VM on Host[-2-Routing] due to Failed to
>> create vnet 48: Error: argument "bond-services.48" is wrong: "name" not a
>> valid ifnameCannot find device "bond-services.48"Failed to create vlan 48
>> on pif: bond-services.
>>
>> This ultimately results in an error and the system VM never even starts.
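One note on the "not a valid ifname" part of that error: Linux caps interface names at 15 characters (IFNAMSIZ is 16 including the trailing NUL), and "bond-services.48" is 16 characters, so the kernel rejects the derived VLAN name outright. A quick check:

```shell
# Interface names must fit IFNAMSIZ (16 bytes incl. NUL) -> 15 visible chars max.
printf '%s' "bond-services.48" | wc -c   # 16 -> too long, the kernel rejects it
printf '%s' "bond0.48" | wc -c           # 8  -> fine
```

This is one more reason a short, conventional bond name like "bond0" avoids the problem even before CloudStack's label parsing comes into play.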
>>
>> If I remove the vlan tag from the configuration (
>> https://drive.google.com/file/d/11tF6YIHm9xDZvQkvphi1xvHCX_X9rDz1/view?usp=sharing)
>> then the VM starts and gets a public IP but without a tagged NIC it can't
>> actually connect to the network. This is from inside the system VM:
>>
>> root@s-9-VM:~# ip --brief addr
>> lo               UNKNOWN        127.0.0.1/8
>> eth0             UP             169.254.91.216/16
>> eth1             UP             10.2.21.72/22
>> eth2             UP             192.41.41.162/25
>> eth3             UP             10.2.99.15/22
>> root@s-9-VM:~# ping 192.41.41.129
>> PING 192.41.41.129 (192.41.41.129): 56 data bytes
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
>> ^C--- 192.41.41.129 ping statistics ---
>> 5 packets transmitted, 0 packets received, 100% packet loss
>>
>> Obviously if the network isn't functioning then it can't connect to my
>> storage server and the agent never starts. How do I set up my public
>> network so that it tags the packets going over cloudbr1? Also, can I avoid
>> having a public IP address on system VMs, or is this required?
>>
>> I have some other issues as well, like the fact that it is creating a
>> storage NIC on the system VMs even though I deleted my storage network
>> from the zone, but I can tackle one problem at a time. If anyone is
>> curious, or it helps to visualize my network, here is a little ASCII
>> diagram of how I have the compute node's networking set up. Hopefully it
>> comes across the mailing list correctly and not all mangled:
>>
>>
>> +=================================================================================================
>> |
>> |  enp3s0f0 (eth)   enp3s0f1 (eth)   enp65s0f0 (eth)   enp65s0f1 (eth)   enp71s0 (eth)   enp72s0 (eth)
>> |       |                |                  |                 |                |               |
>> |       |                |                  +--------+--------+                +-------+-------+
>> |       |                |                           |                                 |
>> |       |                |                 bond-services (bond)                        |
>> |       |                |                           |                                 |
>> |  cloudbr0 (bridge)    N/A                  cloudbr1 (bridge)                bond-storage (bond)
>> |  VLAN 20 (access)                    VLAN 48, 400 - 656 (trunk)             VLAN 96 (access)
>>
>> On 6/16/21 9:38 AM, Andrija Panic wrote:
>> > "There is no secondary storage VM for downloading template to image
>> > store LXC_SEC_STOR1"
>> >
>> > So next step is to investigate why there is no SSVM (can hosts access
>> > the secondary storage NFS, can they access the Primary Storage, etc -
>> > those tests you can do manually) - and as Suresh advised - once it's up,
>> > is it all green (Connected / Up state).
>> >
>> > Best,
>> >
>>
>> I appreciate everyone's help.
>>
>> --
>> Thanks,
>> Joshua Schaeffer
>>
>>
>
> --
>
> Andrija Panić
>


-- 

Andrija Panić
