First, about me -- I've been administering Linux systems since 1995. No, that's
not a typo -- that's 22 years. I've also worked for a firewall manufacturer in
the past, where I designed the layer 2 VLAN support for their product, so I
know VLANs and such. I already run a fairly complex production network with
multiple VLANs, multiple networks, etc., and I speak fluent Cisco CLI. In
short, I'm not an amateur at this networking stuff, but figuring out how
CloudStack wanted my CentOS 7 networking to be configured, and doing all the
gymnastics to make it happen, consumed nearly a week because the documentation
simply isn't up to date, thorough, or accurate, at least for CentOS 7.

So anyhow, my configuration:

Cloudstack 4.9.2.0 from the RPM repository at cloudstack.apt-get.eu

CentOS 7 servers with:

Two 10 Gbit Ethernet ports -> bond0

A handful of VLANs:

100 -- trunked from my top-of-rack switch to my core backbone switch, layer 3
routed to my local network as 10.100.x.x, and through the NAT border firewall
and router to the Internet. Management.
101 -- same, but for 10.101.x.x -- public.
102 -- same, but for 10.102.x.x -- guest public (see below).
192 -- a video surveillance camera network, delivered via a drop from the core
video surveillance PoE switch to an access-mode port on my top-of-rack switch.
Not routed anywhere.
200 -- a 10 gig drop over to my production racks, to the storage network there,
for accessing legacy storage. Not routed. (Legacy storage is not used for
CloudStack instance or secondary storage, but can be accessed by virtual
machines being migrated to this rack.)
1000-2000 -- VLANs that exist in my top-of-rack switch on the CloudStack rack
and are assigned to my trunk ports to the cloud servers, but are routed
nowhere else, for VPCs and such.

I stuck with VLANs rather than one of the SDN modules like VXNET because a)
it's the oldest and most likely to be stable, b) it's compatible with my
already-existing network hardware and networks (I wouldn't have to somehow map
a VLAN to an SDN virtual network to reach 192 or 200, or create a public 102),
and c) it's the least complex to set up and configure given my existing
top-of-rack switch, which does VLANs just fine.

Okay, here's how I had to configure CentOS 7 to make it work:

enp4s[01] -> bond0 -> bond0.100 -> br100 -- I had to create two interface
files, enslave them to the bond0 bond, then create a bond0.100 VLAN interface,
then a br100 bridge, for my management network. In
/etc/sysconfig/network-scripts:

# ls ifcfg-*
ifcfg-bond0 ifcfg-bond0.100 ifcfg-br100 ifcfg-enp4s0 ifcfg-enp4s1

(where 4s0 and 4s1 are my 10 gigabit Ethernets).
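
Roughly, the files end up looking something like this (the bonding mode, IP
address, and gateway below are placeholders rather than my exact values, and
ifcfg-enp4s1 is identical to ifcfg-enp4s0 apart from the DEVICE line):

# cat ifcfg-enp4s0
DEVICE=enp4s0
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

# cat ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100"
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no

# cat ifcfg-bond0.100
DEVICE=bond0.100
VLAN=yes
BOOTPROTO=none
ONBOOT=yes
BRIDGE=br100
NM_CONTROLLED=no

# cat ifcfg-br100
DEVICE=br100
TYPE=Bridge
DELAY=0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.100.1.10
NETMASK=255.255.0.0
GATEWAY=10.100.0.1
NM_CONTROLLED=no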

Don't create anything else. You'll just confuse CloudStack. Any other
configuration of the network simply fails to work. In particular, creating
br101 etc. fails, because CloudStack wants to create its own VLAN interfaces
and bridges, and if you set the traffic label to br101 it'll try to create a
VLAN interface br101.101 (which doesn't work, duh). Yes, I know this
contradicts every single piece of advice I've seen on this list. All I know is
that this is what works, while every other piece of advice I've seen for
labeling the public and private guest networking fails.

When creating the networks in the GUI under Advanced networking, set bond0 as
your physical network and br100 as the KVM traffic label for the Management
and Storage networks, and give them addresses on VLAN 100 (assuming you're
using the same network for both management and storage, which is what makes
sense with my single 10 Gbit pipe). But do *not* set a traffic label for the
Guest or Public networks. You will confuse the agent greatly. Let it use the
default labels. It'll work. It'll set up its own bond0.<tag> VLAN interface
and brbond0-<tag> bridge as needed. This violates every other piece of advice
I've seen for labeling, but this is what actually works with this version of
CloudStack and this version of CentOS when you're sending everything through a
VLAN-tagged bond0.
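
If you want to sanity-check what the agent built after it brings up a guest or
public network, a quick look from the shell does it (the names follow the
bond0.<tag> / brbond0-<tag> pattern mentioned above):

# list the VLAN interfaces the agent created on top of bond0
ip -d link show type vlan
# list the bridges; you should see brbond0-<tag> with bond0.<tag> enslaved
brctl show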

A very important configuration option *not* documented in the installation 
documents:

secstorage.allowed.internal.sites=10.100.0.0/16

(for my particular network). 

Otherwise I couldn't upload ISO files into CloudStack from my nginx server,
which points at the NFS directory full of ISO files.
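
You can set it in the GUI under Global Settings, or from the command line with
cloudmonkey if you have that configured -- something along these lines (I'm
not 100% sure a management-server restart is required for this particular
setting, but it doesn't hurt):

cloudmonkey update configuration name=secstorage.allowed.internal.sites value=10.100.0.0/16
systemctl restart cloudstack-management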

---

Very important guest VM image prep *NOT* in the docs:

Be sure to install / enable / run acpid on Linux guests, otherwise "clean"
shutdowns can't happen. It turns out CloudStack on KVM uses the ACPI shutdown
functionality of qemu-kvm. It probably does that on other hypervisors too.
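
On a CentOS 7 guest that's all of the following (other distros differ in
package manager and maybe the package name):

# install acpid and make sure it runs now and at every boot
yum install -y acpid
systemctl enable acpid
systemctl start acpid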

---

Now on to that mysterious VLAN 102:

I created a "public" shared network on VLAN 102 for stuff I don't care about
being out in the open. This is a QA lab environment, not a public cloud. So I
assigned a subnet and a VLAN, ran a VLAN drop over to my main backbone layer 3
switch (and bopped up to my border firewall and told it about the new subnet
too, so that we could get out to the Internet as needed), and let it go
public. Gotta be a reason why we paid Cisco big bucks for all that hardware,
right?
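
For the curious, the switch side of that is nothing fancy -- on a
Catalyst-style layer 3 backbone switch it's roughly the following (the gateway
address and the trunk interface name here are placeholders for whatever yours
actually are):

vlan 102
 name cloud-guest-public
!
interface Vlan102
 description CloudStack guest public shared network
 ip address 10.102.0.1 255.255.0.0
 no shutdown
!
interface TenGigabitEthernet1/0/48
 description trunk toward the cloud top-of-rack switch
 switchport trunk allowed vlan add 102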

Plus it's very convenient to delegate a subdomain to the virtual router for
that subnet, so people can access their instances as
"my-instance.cloud.mycompany.com", where "my-instance" is the name of their
instance in the GUI. It's not documented anywhere that I can find that you can
do this (delegate a subdomain to the virtual router for a guest subnet), but
it works, and it's very convenient for my QA people.
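
The delegation itself is plain old DNS in the parent zone: an A record for the
virtual router's address on that shared network, and an NS record handing the
subdomain to it. Something like this in the mycompany.com zone file (the
10.102.0.2 address is a placeholder for wherever your virtual router actually
lands):

cloud-vr   IN  A    10.102.0.2              ; the virtual router on VLAN 102
cloud      IN  NS   cloud-vr.mycompany.com. ; delegate cloud.mycompany.com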

I've played with the VPC stuff. It looks quite powerful. If I were doing a 
customer-facing cloud, that's how I'd do it. It's just not what our engineers 
need for testing our software.

---

Final thoughts:

1) The GUI is definitely in need of help. Maybe I'm just too accustomed to
modern responsive RESTful UIs, but this GUI is the opposite of responsive in
most places. You do something, and the display never updates with the changes.
Because it's not RESTful, you can't just hit the refresh button either --
that'll take you all the way back to the login screen.
2) The documentation is clearly in need of help. I have 22 years of experience
with Linux and advanced networking, an already-existing complex network of
multiple VLANs with multiple virtualization offerings, a top-of-rack switch
already configured, VLANs and subnets to the core backbone switch and Internet
boundary router already configured, and working networking with NFS etc.
already in place on the CentOS 7 servers. If it takes someone like me a week
of trial and error to get a working installation that turns out to be
ridiculously simple once you know the tricks, then clearly the tricks need to
be documented. It appears that most of the documentation is oriented around
XenServer, and there's nothing specific to CentOS 7 either, though the CentOS
6 documents are *almost* correct for CentOS 7.
3) Failures were mysterious. Error messages said '[Null] failed' way too
often. '[Null]' what?! So then I had to examine the system itself via
journalctl, ip addr, the agent logs, etc. to see what clues it had left behind
-- attempts to configure network ports and so on -- and make guesses as to
what may have gone wrong. A simple "Could not create network bridge for public
network because the NIC is in use by another bridge" would have saved hours'
worth of time all by itself.

That said, I looked at OpenStack -- a mess of incompatible technologies
stitched together with hacks -- and waved it off as overkill for anything
smaller than a Fortune 500 company or Rackspace.com, someone with the budget
to have a team of consultants come in and hack it to their needs. Eucalyptus
isn't flexible enough to do what I need to do with networks: we have a
surveillance network with around 100 cameras that feeds data to the QA / R&D
infrastructure, and I could find no way in Eucalyptus to give that network to
the virtual machines I wanted to have it. OpenNebula ate a friend's cloud
multiple times. Not going to talk about oVirt. Nope, won't. And CloudStack
does everything I need it to do.

Then again, my needs are almost fulfilled by vSphere / vCenter. It's quite
clear why VMware continues to exist despite the limitations of their solution.
There is something to be said for bullet-proof and easy to install and manage.
It's clunky and limited, but bullet-proof. As in, the only time my ESXi
servers have ever gone down, *ever*, is for power failures. As in, they run
for years at a time without any attention at all. And it didn't take much time
to install and configure either, certainly none of the trial and error
involved with CloudStack. That's hard to beat... but the hardware requirements
are exacting and would have required me to invest more in hardware than I did
here, the software licenses are expensive too, and I just couldn't justify
that for a QA playground.

So consider me slightly annoyed but appreciative. It appears CloudStack is
going to meet my needs here. We'll see.

