You're most welcome!

(and apologies for the naming convention jokes - I would also name things
in a meaningful way instead of bond0/1 etc. - the same way I'm switching
back from those "predictable" interface names ("ensp0p1" and similar) to
old-fashioned eth0, eth1, etc. - not sure what kind of drugs the engineers
were taking when they came up with those "predictable" interface names...)

Cheers,

On Fri, 18 Jun 2021 at 07:16, <[email protected]> wrote:

> Andrija,
>
> Thanks so much for all the details. I'm out of the office for the next
> couple of days so will update my cloud with your suggestions when I get
> back.
>
> As far as the "fancy" naming, I just never found names like bondX useful
> when Linux allows naming the network device something else. It has just
> become a convention of mine. I can easily distinguish which bond carries
> cloud traffic and which carries storage traffic by looking at the bond
> name, but it is just a personal thing and can easily switch back to
> using the standard bond names.
>
> I was aware of the traffic labels but forgot to mention that I had set
> those up in my previous email. There were still some details that you
> provided that helped me further understand how they work though, thanks.
>
> Again, thanks for you help.
>
> On 2021-06-17 22:04, Andrija Panic wrote:
> > BTW, once you think you have fixed all your network configuration
> > issues - destroy all system VMs (CPVM, SSVM) and restart all networks
> > with "cleanup", so that new VMs are created.
> > Inside the SSVM, run the following script, which should give you results
> > similar to those below - confirming that your SSVM is healthy:
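> >
> > (The restart with cleanup can also be done via CloudMonkey - a sketch
> > with a hypothetical network UUID, using the standard restartNetwork call:
> >
> >   cloudmonkey restart network id=<network-uuid> cleanup=true
> > )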
> >
> >
> >
> >   root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh
> > ================================================
> > First DNS server is  192.168.169.254
> > PING 192.168.169.254 (192.168.169.254): 56 data bytes
> > 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
> > 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
> > --- 192.168.169.254 ping statistics ---
> > 2 packets transmitted, 2 packets received, 0% packet loss
> > round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
> > Good: Can ping DNS server
> > ================================================
> > Good: DNS resolves cloudstack.apache.org
> > ================================================
> > nfs is currently mounted
> > Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
> > Good: Can write to mount point
> > ================================================
> > Management server is 192.168.169.13. Checking connectivity.
> > Good: Can connect to management server 192.168.169.13 port 8250
> > ================================================
> > Good: Java process is running
> > ================================================
> > Tests Complete. Look for ERROR or WARNING above.
> >
> > On Thu, 17 Jun 2021 at 23:55, Andrija Panic <[email protected]>
> > wrote:
> >
> >> Since you really bothered to provide such detailed input and help us
> >> help you (versus what some other people tend to do) - I think you
> >> deserve a decent answer (and some explanation).
> >>
> >> The last question first - even though you don't specify/have dedicated
> >> Storage traffic, there will be an additional interface inside the SSVM,
> >> connected to the same Management network (not to the old Storage
> >> network - if you still see the old storage network, restart your mgmt
> >> server and destroy the SSVM - a new one should be created, with the
> >> proper interfaces inside it).
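> >>
> >> (Destroying the SSVM can be done from the UI, or via CloudMonkey - a
> >> sketch with a hypothetical UUID, using the standard listSystemVms /
> >> destroySystemVm calls; CloudStack will recreate the SSVM on its own:
> >>
> >>   cloudmonkey list systemvms systemvmtype=secondarystoragevm
> >>   cloudmonkey destroy systemvm id=<ssvm-uuid>
> >> )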
> >>
> >> bond naming issues:
> >> - rename your "bond-services" to something industry-standard like
> >> "bond0" or similar - CloudStack extracts the "child" interface from
> >> cloudbr1 IF you specify a VLAN for a network that ACS should create - so
> >> your "bond-services", while fancy (and unclear to me WHY you named it in
> >> that weird way - smiley here) - is NOT something CloudStack will
> >> recognize, and this is the reason it fails (it even says so in that
> >> error message - see the note after this list)
> >> - no reason NOT to have that dedicated storage network - feel free to
> >> bring it back - you have the same issue there as with the public
> >> traffic - rename "bond-storage" to e.g. "bond1" and you will be good to
> >> go - since you are NOT using tagging, ACS will just plug the vNIC of the
> >> VM into cloudbr2 (or whatever bridge name you use for it).
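> >>
> >> (A side note on why exactly that name fails - this is my reading of the
> >> error, easy to verify by hand: Linux caps interface names at 15
> >> characters, and "bond-services.48" is 16, so the VLAN interface can
> >> never be created no matter what CloudStack does:
> >>
> >>   # 8 characters - fine (assuming a bond named "bond0" exists)
> >>   ip link add link bond0 name bond0.48 type vlan id 48
> >>   # 16 characters - fails with the exact error from your log:
> >>   # Error: argument "bond-services.48" is wrong: "name" not a valid ifname
> >>   ip link add link bond-services name bond-services.48 type vlan id 48
> >> )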
> >>
> >> Now some explanation (even though your deduction capabilities certainly
> >> made you draw some conclusions from what I wrote above ^^^):
> >>
> >> - When you specify a VLAN id for some network in CloudStack - CloudStack
> >> will look for the device name that is specified as the "Traffic label"
> >> for that traffic (and you have none??? for your Public traffic - while
> >> it should be set to the name of the bridge device, "cloudbr1") - and
> >> then it will provision a VLAN interface and create a new bridge. I.e.
> >> for a Public network with VLAN id 48, it will extract "bond0" from
> >> "cloudbr1", create a bond0.48 VLAN interface, AND create a brand new
> >> bridge (with a funny name) containing this bond0.48 interface - and plug
> >> the Public vNICs into this new bridge...
> >> - When you do NOT specify a VLAN id for some network in CloudStack (i.e.
> >> your storage network doesn't use a VLAN ID in CloudStack, and your
> >> switch ports are in access vlan 96) - you need to have a bridge (i.e.
> >> cloudbr2) with the bondYYY child interface (instead of that
> >> "bond-storage" fancy but unrecognized child interface name) - and then
> >> ACS will NOT extract a child interface (nor do everything I explained in
> >> the previous paragraph/bullet point) - it will just bluntly "stick" all
> >> the vNICs into that cloudbr2 - and hope you have a proper physical/child
> >> interface also added to cloudbr2 that will carry the traffic down the
> >> line... (purely FYI - you could also e.g. use trunking on Linux if you
> >> want to, and have e.g. a "bondXXX.96" VLAN interface manually configured
> >> and added to the bridge, while still NOT defining any VLAN in CloudStack
> >> for that Storage network - and ACS will just stick the vNIC into this
> >> bridge)
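> >>
> >> (Roughly what the first case amounts to at the Linux level - a sketch,
> >> not CloudStack's literal code, and the generated bridge name here is my
> >> assumption based on the usual "br<device>-<vlan>" pattern:
> >>
> >>   ip link add link bond0 name bond0.48 type vlan id 48
> >>   ip link add name brbond0-48 type bridge
> >>   ip link set bond0.48 master brbond0-48
> >>   ip link set bond0.48 up; ip link set brbond0-48 up
> >>
> >> And the manual-trunking FYI from the second bullet would look like:
> >>
> >>   ip link add link bond1 name bond1.96 type vlan id 96
> >>   ip link set bond1.96 master cloudbr2
> >>   ip link set bond1.96 up
> >> )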
> >>
> >> Public traffic/network - this is the network that all system VMs (SSVM,
> >> CPVM and all VRs) are connected to - it is "public" in the sense of
> >> "external" to the other CloudStack internal or Guest networks - it is
> >> the network the "north" interface is connected to. It does NOT have to
> >> be non-RFC 1918 - it can be any private IP range from your company's
> >> internal network (one that will eventually route traffic to the
> >> internet, IF you want your ACS to be able to download stuff/templates
> >> from the internet - otherwise it does NOT have to route to the internet,
> >> if you are running a private cloud and do NOT want external access to
> >> your ACS - well, to the external ("public") interfaces/IPs of the SSVM,
> >> CPVM and VRs). But if you are running a public cloud - then you want to
> >> provide non-RFC 1918, i.e. really publicly routable, IP addresses/ranges
> >> for the Public network. ACS will assign 1 IP to the SSVM, 1 IP to the
> >> CPVM, and many IPs to the many VRs you create.
> >>
> >> A thing that I briefly touched on somewhere upstairs ^^^ - for each
> >> traffic type you have defined, you need to define a traffic label - my
> >> deduction capabilities make me believe you are using KVM, so you need to
> >> set your KVM traffic label for all your network traffic (the traffic
> >> label, in your case = the exact name of the bridge as visible in Linux).
> >> I recall there are some new UI issues when it comes to labels, so go to
> >> <MGMT-IP>:8080/client/legacy and check your traffic labels there - and
> >> set them there; the UI in 4.15.0.0 doesn't allow you to update/set a
> >> label after the zone is created, but the old UI will allow you to do it.
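> >>
> >> (If you prefer the API to the legacy UI, something like this via
> >> CloudMonkey should also work - a sketch with hypothetical UUIDs, using
> >> the standard listTrafficTypes / updateTrafficType calls:
> >>
> >>   cloudmonkey list traffictypes physicalnetworkid=<physical-network-uuid>
> >>   cloudmonkey update traffictype id=<traffic-type-uuid> kvmnetworklabel=cloudbr1
> >> )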
> >>
> >> Not sure why I spent 30 minutes of my life, but there you go - hope
> >> you
> >> got everything from my email - let me know if anything is unclear!
> >>
> >> Cheers,
> >>
> >> On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer
> >> <[email protected]>
> >> wrote:
> >>
> >>> So Suresh's advice has pushed me in the right direction. The VM was up
> >>> but the agent state was down. I was able to connect to the VM in order
> >>> to continue investigating, and the VM is having network issues
> >>> connecting to both my load balancer and my secondary storage server. I
> >>> don't think I'm understanding how the public network portion is
> >>> supposed to work in my zone and could use some clarification. First let
> >>> me explain my network setup. On my compute nodes, ideally, I want to
> >>> use 3 NICs:
> >>>
> >>> 1. A management NIC for management traffic. I was using cloudbr0 for
> >>> this. cloudbr0 is a bridge I created that is connected to an access
> >>> port on my switch. No vlan tagging is required to use this network (it
> >>> uses VLAN 20).
> >>> 2. A cloud NIC for both public and guest traffic. I was using cloudbr1
> >>> for this. cloudbr1 is a bridge I created that is connected to a trunk
> >>> port on my switch. Public traffic uses VLAN 48 and guest traffic should
> >>> use VLANs 400 - 656. As the port is trunked I have to use vlan tagging
> >>> for any traffic over this NIC.
> >>> 3. A storage NIC for storage traffic. I use a bond called
> >>> "bond-storage" for this. bond-storage is connected to an access port on
> >>> my switch. No vlan tagging is required to use this network (it uses
> >>> VLAN 96).
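> >>>
> >>> (For reference, cloudbr1 sits on top of bond-services roughly like this
> >>> - a trimmed ifupdown-style sketch assuming ifenslave and bridge-utils,
> >>> with addressing and bond options elided:
> >>>
> >>>   auto bond-services
> >>>   iface bond-services inet manual
> >>>       bond-slaves enp65s0f0 enp65s0f1
> >>>
> >>>   auto cloudbr1
> >>>   iface cloudbr1 inet manual
> >>>       bridge_ports bond-services
> >>>       bridge_stp off
> >>> )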
> >>>
> >>> For now I've removed the storage NIC from the configuration to simplify
> >>> my troubleshooting, so I should only be working with cloudbr0 and
> >>> cloudbr1. To me the public network is a *non-RFC 1918* address that
> >>> should be assigned to tenant VMs for external internet access. Why do
> >>> system VMs need/get a public IP address? Can't they access all the
> >>> internal CloudStack servers using the pod's management network?
> >>>
> >>> So the first problem I'm seeing is that whenever I tell CloudStack to
> >>> tag VLAN 48 for public traffic, it uses the underlying bond under
> >>> cloudbr1 and not the bridge. I don't know where it is even getting this
> >>> name, as I never provided it to CloudStack.
> >>>
> >>> Here is how I have it configured:
> >>>
> >>> https://drive.google.com/file/d/10PxLdp6e46_GW7oPFJwB3sQQxnvwUhvH/view?usp=sharing
> >>>
> >>> Here is the message in the management logs:
> >>>
> >>> 2021-06-16 16:00:40,454 INFO  [c.c.v.VirtualMachineManagerImpl]
> >>> (Work-Job-Executor-13:ctx-0f39d8e2 job-4/job-68 ctx-a4f832c5)
> >>> (logid:eb82035c) Unable to start VM on Host[-2-Routing] due to Failed to
> >>> create vnet 48: Error: argument "bond-services.48" is wrong: "name" not
> >>> a valid ifnameCannot find device "bond-services.48"Failed to create vlan
> >>> 48 on pif: bond-services.
> >>>
> >>> This ultimately results in an error and the system VM never even
> >>> starts.
> >>>
> >>> If I remove the vlan tag from the configuration
> >>> (https://drive.google.com/file/d/11tF6YIHm9xDZvQkvphi1xvHCX_X9rDz1/view?usp=sharing)
> >>> then the VM starts and gets a public IP, but without a tagged NIC it
> >>> can't actually connect to the network. This is from inside the system VM:
> >>>
> >>> root@s-9-VM:~# ip --brief addr
> >>> lo               UNKNOWN        127.0.0.1/8
> >>> eth0             UP             169.254.91.216/16
> >>> eth1             UP             10.2.21.72/22
> >>> eth2             UP             192.41.41.162/25
> >>> eth3             UP             10.2.99.15/22
> >>> root@s-9-VM:~# ping 192.41.41.129
> >>> PING 192.41.41.129 (192.41.41.129): 56 data bytes
> >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
> >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
> >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
> >>> 92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
> >>> ^C--- 192.41.41.129 ping statistics ---
> >>> 5 packets transmitted, 0 packets received, 100% packet loss
> >>>
> >>> Obviously if the network isn't functioning then it can't connect to my
> >>> storage server and the agent never starts. How do I set up my public
> >>> network so that it tags the packets going over cloudbr1? Also, can I
> >>> not have a public IP address for system VMs, or is this required?
> >>>
> >>> I have some other issues as well, like the fact that it is creating a
> >>> storage NIC on the system VMs even though I deleted my storage network
> >>> from the zone, but I can tackle one problem at a time. If anyone is
> >>> curious or it helps visualize my network, here is a little ASCII
> >>> diagram of how I have the compute node's networking set up. Hopefully
> >>> it comes across the mailing list correctly and not all mangled:
> >>>
> >>>
> >>>
> >>> +================================================================================================
> >>> | enp3s0f0 (eth)  enp3s0f1 (eth)  enp65s0f0 (eth)  enp65s0f1 (eth)  enp71s0 (eth)  enp72s0 (eth)
> >>> |      |               |                +--------+--------+             +-------+-------+
> >>> |      |               |                         |                             |
> >>> |      |               |               bond-services (bond)                    |
> >>> |      |               |                         |                             |
> >>> | cloudbr0 (bridge)   N/A              cloudbr1 (bridge)            bond-storage (bond)
> >>> | VLAN 20 (access)               VLAN 48, 400 - 656 (trunk)           VLAN 96 (access)
> >>>
> >>> On 6/16/21 9:38 AM, Andrija Panic wrote:
> >>> > " There is no secondary storage VM for downloading template to image
> >>> store
> >>> > LXC_SEC_STOR1 "
> >>> >
> >>> > So next step to investigate why there is no SSVM (can hosts access
> the
> >>> > secondary storage NFS, can they access the Primary Storage, etc -
> those
> >>> > tests you can do manually) - and as Suresh advised - one it's up, is
> it
> >>> all
> >>> > green (COnnected / Up state).
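> >>> >
> >>> > (Those manual tests can be as simple as this, from one of the hosts -
> >>> > a sketch with a hypothetical NFS server address and export path:
> >>> >
> >>> >   showmount -e <nfs-server-ip>
> >>> >   mount -t nfs <nfs-server-ip>:/export/secondary /mnt/test
> >>> >   touch /mnt/test/ok && umount /mnt/test
> >>> > )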
> >>> >
> >>> > Best,
> >>> >
> >>>
> >>> I appreciate everyone's help.
> >>>
> >>> --
> >>> Thanks,
> >>> Joshua Schaeffer
> >>>
> >>>
> >>
> >> --
> >>
> >> Andrija Panić
> >>
>


-- 

Andrija Panić
