Re: BGP EVPN with CloudStack

Wido den Hollander Wed, 22 May 2024 01:56:48 -0700



On 21/05/2024 at 04:26, Hanis Irfan wrote:

Hi Wido,

I'm currently running Rocky Linux 9 for the HV.


That should be fine

Why are you setting anything on cloudbr0? There is no need to create cloudbr0 
with VXLAN.


cloudbr0 is just a naming choice on my end. Is it okay for me to use something 
like NetworkManager to create the bridge?


You don't need this bridge at all with VXLAN.


So, no need to create a VXLAN interface as the bridge slave? Then how would 
the bridge communicate through the VXLAN tunnels?

No, you just need to have a /32 (v4) and /128 (v6) attached to 'lo' and announce it. The v4 address will be the VTEP source.

Make sure all HVs can ping each other on their loopback address. Make sure this underlay works!


root@hv-138-a13-37:~# ip addr show dev lo

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.255.255.108/32 brd 10.255.255.108 scope global lo
       valid_lft forever preferred_lft forever
    inet6 2a05:xxx:xxx:2::108/128 scope global
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
root@hv-138-a13-37:~#

You should be able to ping with -s 8192; this verifies that jumbo frames work.
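To make the sizes behind that check concrete, here is a rough arithmetic sketch in Python. It only uses standard header lengths (no IP options); the function names and the MTU values 9000 and 1500 are illustrative assumptions, not something taken from the hypervisors in this thread.

```python
# Standard header sizes in bytes (no IP options assumed).
ICMP_HEADER = 8        # ICMP / ICMPv6 echo header
IPV4_HEADER = 20
IPV6_HEADER = 40
INNER_ETHERNET = 14    # inner Ethernet header carried inside VXLAN
VXLAN_HEADER = 8
UDP_HEADER = 8


def ip_packet_size(payload: int, ipv6: bool = False) -> int:
    """Size of the IP packet that `ping -s <payload>` puts on the wire."""
    return payload + ICMP_HEADER + (IPV6_HEADER if ipv6 else IPV4_HEADER)


def fits_unfragmented(payload: int, mtu: int, ipv6: bool = False) -> bool:
    """True if the echo request fits in one packet on a link with this MTU."""
    return ip_packet_size(payload, ipv6) <= mtu


def vxlan_underlay_mtu(inner_mtu: int, ipv6: bool = False) -> int:
    """Minimum underlay MTU needed to carry guest frames with this inner MTU."""
    outer_ip = IPV6_HEADER if ipv6 else IPV4_HEADER
    return inner_mtu + INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + outer_ip


print(ip_packet_size(8192))           # 8220
print(fits_unfragmented(8192, 9000))  # True  (assumed jumbo MTU)
print(fits_unfragmented(8192, 1500))  # False (standard MTU)
print(vxlan_underlay_mtu(1500))       # 1550
```

Note that Linux ping may fragment an oversized probe instead of failing; with iputils ping, adding `-M do` prohibits fragmentation, so the probe only succeeds if the whole path really carries the 8220-byte packet.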


Assuming we're using VNI 10027.

lo (VTEP) <---- vxlan10027 (slave of bridge) <---- cloudbr0/cloudbr1


No, not exactly.

vxlan10027 --> brvx-10027 -> vnet89

Both the VXLAN device and Bridge are created on the fly by the 'modifyvxlan.sh' script (use my modified version! [0]) when needed.

Shouldn't it be the same for the guest VNIs later on? E.g:

lo <---- vxlan1000 <---- vxlan1000br <---- vnet1 <---- Guest VM
                        ^
                        |---------------------vnet2 <---- Guest VM


Yes, so that's about how it works. Quick snippet from a hypervisor:

root@hv-138-a13-37:~# bridge link show

9: vxlan505: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-505 state forwarding priority 32 cost 100
22: vxlan519: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-519 state forwarding priority 32 cost 100
24: vnet9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-519 state forwarding priority 32 cost 100

root@hv-138-a13-37:~#
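As a small illustration of how to read that output, the following Python sketch groups bridge ports by their master bridge. The sample text is the three ports from the snippet above written one per line; the regular expression and the function name `ports_by_bridge` are my own assumptions and only cover the fields shown here.

```python
import re
from collections import defaultdict

# The three ports from the hypervisor snippet, one per line.
SAMPLE = """\
9: vxlan505: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-505 state forwarding priority 32 cost 100
22: vxlan519: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-519 state forwarding priority 32 cost 100
24: vnet9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brvx-519 state forwarding priority 32 cost 100
"""

# "<index>: <port>[@peer]: ... master <bridge> ..."
LINE_RE = re.compile(
    r"^\d+:\s+(?P<port>[^:@\s]+)(?:@\S+)?:.*?\bmaster\s+(?P<master>\S+)"
)


def ports_by_bridge(output: str) -> dict:
    """Map each master bridge to the list of ports attached to it."""
    bridges = defaultdict(list)
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            bridges[match.group("master")].append(match.group("port"))
    return dict(bridges)


print(ports_by_bridge(SAMPLE))
# {'brvx-505': ['vxlan505'], 'brvx-519': ['vxlan519', 'vnet9']}
```

Here brvx-519 holds both its VXLAN device and one guest interface (vnet9), while brvx-505 currently has only its VXLAN device attached.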

Hope this helps!

Wido

[0]: https://gist.github.com/wido/51cb9880d86f08f73766634d7f6df3f4

This /20 IPv4 is used for everything within CloudStack's communication.

When talking about subnet size, the /20 is used for a couple of pods (rows of 
racks) and not just a single pod, correct?
How do you calculate the range to allocate for system VMs in a pod, considering 
a maximum of 16 hosts per cluster?
If, let's say, in the future we run out of the /20 space (more than 4k hosts in a 
zone), we can connect new /20 subnets via a router, right?
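For reference on the "4k" figure, a short Python sketch of the /20 arithmetic. It takes the cloudbr1 address and gateway from the systemd-networkd example quoted further down in this thread; it only shows how large the subnet is, and is not a statement about how the range is actually divided between pods or system VMs.

```python
import ipaddress

# Address and gateway as they appear in the cloudbr1 example in this thread.
iface = ipaddress.ip_interface("10.100.2.108/20")
gateway = ipaddress.ip_address("10.100.1.1")

pod_net = iface.network

print(pod_net)                     # 10.100.0.0/20
print(pod_net.num_addresses)       # 4096 addresses in total
print(pod_net.num_addresses - 2)   # 4094 usable (minus network and broadcast)
print(pod_net[0], pod_net[-1])     # 10.100.0.0 10.100.15.255
print(gateway in pod_net)          # True: the gateway sits in the same /20
```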

In *agent.properties* we have only set 'private.network.device=cloudbr1'

I actually ran into an issue when trying to add a new KVM host before. I got the 
error "resource not found". I was able to add the host only if we restarted the 
host OS while adding the host in the web UI.
I didn't really configure anything in the *agent.properties* file before adding 
the host via the web UI. Did I do it wrong? Is there any deployment procedure 
that you can share with me?

You first need to make sure that the HV can ping all the other loopback 
addresses of the Leaf and Spine switches and all HVs can connect to each other 
via their loopback addresses.

I can assure you that the spine switches, leaf switches and HV can all ping 
each other via their loopback addresses. Just a note, all the L3 routing 
between the switches and the HV is configured with BGP unnumbered and eBGP.

Thank you and sorry if there are unrelated questions in the thread.

Best Regards,

Hanis Irfan

-----Original Message-----
From: Wido den Hollander <[email protected]>
Sent: Tuesday, 21 May, 2024 04:06
To: Hanis Irfan <[email protected]>; [email protected]
Subject: Re: BGP EVPN with CloudStack

Hi Hanis,

See my reply inline.

On 17/05/2024 at 12:38, Hanis Irfan wrote:

I think this is more about BGP EVPN than CloudStack, but I would
appreciate help from anyone who can offer it. Basically, I’ve tried
Advanced Networking with VLAN isolation for my POC and now want to
migrate to VXLAN.

I have little to no knowledge of VXLAN in particular, or of BGP EVPN.
The reason I’m not using multicast instead is that our spine and leaf
switches have already been configured with basic EVPN (though not
tested yet).

We have 2 spine switches (Mellanox SN2700) and 2 leaf switches
(Mellanox SN2410) running BGP unnumbered for the underlay between them.
The underlay between the hypervisor, which is running FRR, and the leaf
switches is also configured with BGP unnumbered.

All the switches and the hypervisor are each assigned an individual
4-byte private ASN. Here is the FRR config on the hypervisor, all basic
config:

```
ip forwarding
ipv6 forwarding

interface ens3f0np0
      no ipv6 nd suppress-ra
exit

interface ens3f1np1
      no ipv6 nd suppress-ra
exit

router bgp 4200100005
      bgp router-id 10.XXX.118.1
      no bgp ebgp-requires-policy
      neighbor uplink peer-group
      neighbor uplink remote-as external
      neighbor ens3f0np0 interface peer-group uplink
      neighbor ens3f1np1 interface peer-group uplink

      address-family ipv4 unicast
          network 10.XXX.118.1/32
      exit-address-family

      address-family ipv6 unicast
          network 2407:XXXX:0:1::1/128
          neighbor uplink activate
          neighbor uplink soft-reconfiguration inbound
      exit-address-family

      address-family l2vpn evpn
          neighbor uplink activate
          neighbor uplink attribute-unchanged next-hop
          advertise-all-vni
          advertise-svi-ip
      exit-address-family
```

Now I want to configure cloudbr0 bridge as the management interface
for ACS. I’ve done it like so:

```
nmcli connection add type bridge con-name cloudbr0 ifname cloudbr0 \
    ipv4.method manual ipv4.addresses 10.XXX.113.11/24 ipv4.gateway 10.XXX.113.1 \
    ipv4.dns 1.1.1.1,8.8.8.8 \
    ipv6.method manual ipv6.addresses 2407:XXXX:200:c002::11/64 ipv6.gateway 2407:XXXX:200:c002::1 \
    ipv6.dns 2606:4700:4700::1111,2001:4860:4860::8888 \
    bridge.stp no ethernet.mtu 9216

nmcli connection add type vxlan slave-type bridge con-name vxlan10027 ifname vxlan10027 \
    id 10027 destination-port 4789 local 2407:XXXX:0:1::1 vxlan.learning no \
    master cloudbr0 ethernet.mtu 9216 dev lo

nmcli connection up cloudbr0
nmcli connection up vxlan10027
```


Why are you setting anything on cloudbr0? There is no need to create
cloudbr0 with VXLAN.

We have only created cloudbr1 (using systemd-networkd) for the POD 
communication, but that's all:

*cloudbr1.network*
[Match]
Name=cloudbr1

[Network]
LinkLocalAddressing=no

[Address]
Address=10.100.2.108/20

[Route]
Gateway=10.100.1.1

[Link]
MTUBytes=1500

*cloudbr1.netdev*
[NetDev]
Name=cloudbr1
Kind=bridge


This /20 IPv4 is used for everything within CloudStack's communication.

In *agent.properties* we have only set 'private.network.device=cloudbr1'

I can see that the EVPN route with MAC address of cloudbr0 can be seen
on both leaf switches. However, I can’t ping from the hypervisor to
its gateway (.1) which is a firewall running somewhere that’s
connected to a switchport tagged with VLAN 27.


You first need to make sure that the HV can ping all the other loopback 
addresses of the Leaf and Spine switches and all HVs can connect to each other 
via their loopback addresses.

Can you check that? That's not EVPN nor VXLAN, just /32 (IPv4) routing with BGP.
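One way to script that check is sketched below in Python, as a rough example only. It assumes Linux iputils ping (`-c`, `-W`, `-s` and `-I` are its options); the helper names and the peer address 10.255.255.109 are hypothetical, while 10.255.255.108 is the loopback shown earlier in this thread.

```python
import subprocess


def ping_command(address, source=None, payload=None, count=3):
    """Build an iputils ping command line for one loopback address."""
    cmd = ["ping", "-c", str(count), "-W", "1"]
    if source is not None:
        cmd += ["-I", source]        # send from our own loopback (VTEP) address
    if payload is not None:
        cmd += ["-s", str(payload)]  # e.g. 8192 to exercise jumbo frames
    cmd.append(address)
    return cmd


def reachable(address, source=None, payload=None):
    """Run ping and report whether at least one reply came back."""
    result = subprocess.run(
        ping_command(address, source, payload),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_underlay(addresses, probe=reachable):
    """Return {address: True/False} for every loopback in the list."""
    return {address: probe(address) for address in addresses}


# Example usage (addresses are placeholders for your own leaf, spine and HV
# loopbacks):
#   results = check_underlay(["10.255.255.108", "10.255.255.109"])
#   for address, ok in results.items():
#       print(address, "ok" if ok else "UNREACHABLE")
```

If any loopback is unreachable here, the problem is in the plain BGP underlay and should be fixed before looking at EVPN or VXLAN.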

Wido
