GitHub user dstoy53 edited a discussion: disabling security groups cleanly

In 4.20.1 (edit: KVM), is there an established non-disruptive way to disable 
security groups in an advanced zone for a shared network? 

If there's one obvious answer please don't bother with the mess below, I got a 
little carried away. 

There are a few things that don't play nicely with SGs (CAPI, floating IPs), so 
I'm exploring how to disable the feature in place without rebuilding the network 
or zone. I can just rely on other security boundaries instead. 

Here's what I've tried in a lab:

1. disable security group provider - no helpful effect, causes failures later
2. restarting the agent while the provider is disabled - no effect
3. stopping and starting the vm - fails to start because the security group 
provider was disabled in step 1
4. update network's network offering via api - fails, only allowed to change 
the offering for isolated networks
5. enable security group provider, uncheck all SGs on VM - VM launches with 
default sg 

I also remember reading a solution to just empty out security_group.py which 
would work until an agent update. 

So those are the UX options I could imagine, and here's the blind DB surgery 
solution that worked:

1. Change the offering id in the `networks` table - this breaks the network 
rendering in the UI but then...
2. Delete the row in `ntwk_service_map` that maps the network <-> service - 
this fixed the UI rendering
3. Power on the VM - this succeeds, and no rules are applied (for either 
ebtables or iptables)
4. Disable SG provider, power off VM, power on - this succeeds
5. security_group.py continues to fire for power on/off and mostly complains 
about having nothing to do 
6. power off the VM, uncheck Default SG that was automatically applied during 
my earlier efforts (was probably causing the script to fire), this hides the 
Security Group section in the VM UI
7. power on the VM - no security group is applied, and security_group.py does 
not fire anymore
8. management logs show the console proxy fails to deploy because the Zone is 
SG enabled and none of the networks use an offering with SG - "Can not found 
security enabled network in SG Zone" 
9. In the `data_center` table, for the zone's row, set 
`is_security_group_enabled` to 0. If I wanted to provide both SG and non-SG 
networks in the zone I probably wouldn't need this step. 
10. Now the only remaining problem is that my lab scenario is incomplete in an 
obvious way so I can ignore the error and call it done. The network gurus 
refuse to design because this is the second zone, not dedicated, and the Public 
network gets allocated to `[ ROOT ] system` instead of `System Pool` which is 
already handled in Zone 1. 
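
For anyone who wants to repeat the DB steps above in a lab, here is a rough sketch of steps 1, 2 and 9 as parameterized statements. It only builds the statements and does not run them. The table names and `is_security_group_enabled` come from the steps above; the column names `network_offering_id`, `network_id` and `service`, the service value `SecurityGroup`, the function name and the example IDs are all assumptions on my part, so check them against your own schema before touching anything.

```python
# Sketch only: builds (sql, params) pairs, executes nothing.
# Column names marked "assumed" are not confirmed in this thread;
# verify them against your schema (e.g. with DESCRIBE) and take a backup first.

def build_sg_removal_plan(network_id, new_offering_id, zone_id):
    """Return statements mirroring DB steps 1, 2 and 9 (hypothetical helper)."""
    return [
        # Step 1: point the network at an offering without the SG service.
        # Column `network_offering_id` is assumed.
        ("UPDATE networks SET network_offering_id = %s WHERE id = %s",
         (new_offering_id, network_id)),
        # Step 2: drop the network <-> service mapping for security groups.
        # Columns `network_id`, `service` and the value "SecurityGroup" are assumed.
        ("DELETE FROM ntwk_service_map WHERE network_id = %s AND service = %s",
         (network_id, "SecurityGroup")),
        # Step 9: clear the SG flag on the zone's row.
        ("UPDATE data_center SET is_security_group_enabled = 0 WHERE id = %s",
         (zone_id,)),
    ]


if __name__ == "__main__":
    # Example internal IDs are made up for illustration.
    for sql, params in build_sg_removal_plan(network_id=204, new_offering_id=17, zone_id=2):
        print(sql, params)
```

The `%s` placeholders follow the style used by common Python MySQL drivers, so the pairs could be handed to a cursor inside a single transaction and rolled back if anything looks off.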

The only reason I kept playing with powering on/off is to simulate a real world 
live migration to an empty host since that's one of the triggers to apply the 
SG rules. 

Unless I completely missed the beaten path, I think the updateNetwork API 
should be allowed to change the network offering, and also update the network 
service map. Then it would be up to the user to remove the security groups from 
the VMs. Disabling the SG Provider should also clear the SG enabled flag on the 
zone so system VMs can get deployed. Of course that's just one service, maybe 
doing this breaks netscalers catastrophically for some unexpected reason. 

I'm also guessing that the requirement to power off a VM before modifying its 
security groups exists because a power cycle is an easy trigger to fire the 
script. So if I remove the relevant row in `security_group_vm_map` and live 
migrate to another host after disabling everything, things will probably work 
correctly and avoid downtime. 

So if I butcher the DB and live migrate a VM to an empty host, everything 
should work out. 
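
The untested `security_group_vm_map` idea could look roughly like this: check which groups are still mapped to the VM, then remove the mapping before the live migration. Again this only builds the statements. The column names `instance_id` and `security_group_id` and the helper name are assumptions, not something I have confirmed against the schema.

```python
# Sketch only: builds (sql, params) pairs, executes nothing.
# Columns `instance_id` and `security_group_id` are assumed; verify before use.

def build_vm_sg_unmap(instance_id):
    """Return a check query and a delete for one VM's SG mappings (hypothetical helper)."""
    return [
        # Look at what is still mapped before deleting anything.
        ("SELECT security_group_id FROM security_group_vm_map WHERE instance_id = %s",
         (instance_id,)),
        # Remove the VM's security group mappings.
        ("DELETE FROM security_group_vm_map WHERE instance_id = %s",
         (instance_id,)),
    ]


if __name__ == "__main__":
    # 42 is a made-up internal VM id.
    for sql, params in build_vm_sg_unmap(42):
        print(sql, params)
```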

GitHub link: https://github.com/apache/cloudstack/discussions/11864
