[Bug 1737428] Re: VRF support to solve routing problems associated with multi-homing

Dmitrii Shcherbakov Thu, 21 Dec 2017 07:31:03 -0800

Andres,

I'm not going to be at the sprint, but the problems described need a
proper solution in MAAS and Juju, at least from the end host
perspective. Just as VLANs are supported natively in MAAS & Juju, L3
virtualization technologies like VRF should be as well. I hope the
information below is enough to understand the use-cases and the past
experience in this field.


The concept is very similar to VLANs, but at L3, which is probably less
familiar. A virtual L3 network spans many hosts and routers/L3 switches
within a single organization instead of being tied to a given switch
fabric. On a host, either the same process or a group of processes needs
to (1) receive & respond and (2) send data using different L3
topologies. Instead of virtual broadcast domains you get virtual paths,
because each virtual L3 network has its own routing topology. Good L2
analogies are Multiple Spanning Tree Protocol (MSTP) or PVST+, which
were created so that switchports are blocked per logical L2 topology
(per VLAN or group of VLANs) rather than for all traffic (this is
hidden at L2 though - no end host modifications are required).

The use-cases I am talking about are not new; they were simply not used
as much in data center networks until a certain point. They have been
used in service provider networks for multi-site L3 VPNs for many years
(https://tools.ietf.org/html/rfc4364). There are still many deployments
which rely on large L2 domains where those problems do not occur as
much, because routing is trivial: hosts use directly connected routes
and ARP broadcasts (in most cases there is never a routed hop between a
source and a destination host).

I may be wrong, but it seems to me that Network Spaces were originally
designed with multi-homing in mind but with only limited support for
multi-L2 and routed topologies (I don't judge, VRFs are fairly new to
the Linux kernel). They are not that far from supporting that though,
thanks to the recent upstream kernel work.

With leaf-spine you are building a complex L3 network with different
virtual topologies for different purposes and different SLAs for various
kinds of traffic (IOW, a multi-tenant network). This is a typical
service provider scenario with different customers on a shared
infrastructure. Ideally you would build many parallel dedicated
communication lines, but since the infrastructure is shared this is not
physically possible. You still need to do load-sharing across links, use
distinct paths for different kinds of traffic and apply other
optimizations to make sure your physical links are utilized and clients
get a certain quality of service and are separated from each other. In
this case L3 VPNs are built not for clients (companies "x" and "y") but
for different purposes: general purpose data, storage access or
replication, management, public API traffic (originally, this was done
for voice/video/data, see the first two paragraphs in the "background"
section of https://www.google.ch/patents/US8457117).

I can describe this in many ways, i.e. we need:

* multi-point L3VPN between racks to simulate L3 virtual circuits/pseudowires 
for different types of traffic;
* virtual routing domains (VRFs);
* traffic and routing separation for multi-L2 segment networks;
* L3 network multi-tenancy.

This is definitely not new, although the service provider concepts may
be less familiar:

1) Static routes + VLSM - DIY routing - doesn't scale and is difficult to
manage when a deployment grows beyond the original VLSM design;
2) VRF-lite (VRF without MPLS) - separate address spaces and routing tables for
different traffic on routers and, potentially, hosts; a VRF is selected based
on the interface of a given network device;
3) MPLS - this is like VXLAN for virtual L3 networks. In a service provider 
network two MPLS labels are used: one for VRF identification and another one 
for next-hop router identification (in a data center network think of an 
internal or public API label, storage access label, storage replication label 
etc.).
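
To make option 2 more concrete from the end host perspective, here is a
minimal sketch of what VRF-lite setup on a Linux host could look like.
It only builds iproute2 command strings and does not run them. The VRF
name, table ID, interface name and gateway address are hypothetical
values chosen for illustration, and this is not how MAAS or Juju
configure anything today.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VrfSpec:
    """One virtual routing domain on an end host (all values hypothetical)."""
    name: str              # VRF device name, e.g. "vrf-storage"
    table: int             # kernel routing table ID dedicated to this VRF
    interfaces: List[str]  # host interfaces to attach to the VRF
    gateway: str           # default next hop inside this VRF


def vrf_commands(spec: VrfSpec) -> List[str]:
    """Return iproute2 commands that would set up one VRF-lite domain.

    The commands are returned as strings and are not executed here.
    The default route assumes the attached interface already has an
    address in the gateway's subnet.
    """
    cmds = [
        # Create the VRF device and bind it to its own routing table.
        f"ip link add {spec.name} type vrf table {spec.table}",
        f"ip link set dev {spec.name} up",
    ]
    for iface in spec.interfaces:
        # Interface-based VRF selection: traffic on iface uses this VRF.
        cmds.append(f"ip link set dev {iface} master {spec.name}")
    # Per-VRF default route, independent of the main routing table.
    cmds.append(f"ip route add table {spec.table} default via {spec.gateway}")
    return cmds


if __name__ == "__main__":
    storage = VrfSpec("vrf-storage", 10, ["eth1.100"], "10.10.0.1")
    for line in vrf_commands(storage):
        print(line)
```

Each traffic purpose (storage access, replication, public API and so on)
would get its own VrfSpec, so that every purpose has a separate default
route instead of competing for the single one in the main table.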

This has been used for years to separate out traffic of different
customers or, for example, general purpose data, voice and video for a
single customer. Containers with a separate network namespace do not
solve this problem, because the same process or a group of processes
needs to use a different routing table "per-purpose".
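
As an illustration of "a different routing table per-purpose", the toy
model below resolves the same destination through two routing domains
and gets two different next hops. It is only a sketch: the domain names,
prefixes and next hops are made up, and a real host would do this lookup
in the kernel, not in Python.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# (prefix, next hop) pairs; one list per routing domain.
Route = Tuple[ipaddress.IPv4Network, str]

# Hypothetical per-purpose tables. Both have their own default route,
# which a single shared routing table could not express.
TABLES: Dict[str, List[Route]] = {
    "storage": [
        (ipaddress.ip_network("0.0.0.0/0"), "10.10.0.1"),
        (ipaddress.ip_network("10.20.0.0/16"), "10.10.0.2"),
    ],
    "public-api": [
        (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),
    ],
}


def lookup(vrf: str, destination: str) -> Optional[str]:
    """Longest-prefix match restricted to the table of one routing domain."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in TABLES[vrf] if addr in net]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    for vrf in TABLES:
        print(vrf, "->", lookup(vrf, "10.20.5.7"))
```

The point of the sketch is that the routing decision depends on the
domain the traffic belongs to, not only on the destination address.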

What I am asking for is not that difficult because we are only concerned
with end hosts (unless MAAS resides on a ToR or a leaf and we control
the switch OS). I need building blocks to use either VRF-lite or full
VRFs with MPLS in a sane way while keeping routing complexity (BGP, MPLS
etc.) in a data center provider network managed by other people.

Terminology-wise, I think changes are needed as well:
https://github.com/CanonicalLtd/maas-docs/issues/737 - Routing Domain,
L3VPN or VRF are the common names for what we refer to as a Network
Space, which is actually a virtual L3 network with its own complete
address space, its own copies of routing tables and dedicated
host/router physical or logical interfaces.

Examples:

* https://routingnull0.com/2015/12/14/mpls-l3vpns-part-2/ case 4 here maps MPLS 
& L3VPN concepts to leaf-spine
* http://packetlife.net/blog/2014/apr/15/deploying-datacenter-mpls-vpn-junos/ - 
leaf-spine + MPLS

Analogies (not related to computer networking):
https://paste.ubuntu.com/26227512/

** Bug watch added: github.com/CanonicalLtd/maas-docs/issues #737
   https://github.com/CanonicalLtd/maas-docs/issues/737

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1737428

Title:
  VRF support to solve routing problems associated with multi-homing

To manage notifications about this bug go to:
https://bugs.launchpad.net/juju/+bug/1737428/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
