Re: Problems after management server reboot & workaround

Chip Childers Wed, 08 May 2013 12:16:30 -0700

On Wed, May 08, 2013 at 09:11:43PM +0200, Jori Liesenborgs wrote:
> 
> Hi everyone,
> 
> On our CloudStack setup (4.0.2), I noticed that after a reboot of
> the management server, I was no longer able to start new instances.
> A secondary problem was that the management-server.log file filled
> up extremely fast (gigabytes in a few hours), with messages like
> these:
> 
> 2013-05-08 05:26:10,627 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-4:null) Seq 7-1033568320: Forwarding Seq
> 7-1033568320:  { Cmd , MgmtId: 38424150221294, via: 7, Ver: v1,
> Flags: 100111,
> [{"StopCommand":{"isProxy":false,"vmName":"i-2-6-VM","wait":0}}] }
> to 130450099353672
> 
> This turned out to contain an important clue: when looking at the
> 'mshost' table in the 'cloud' database, instead of seeing one entry
> for the management server ID, there now were two:
> 
> | id | msid            | runid         | name          | ...
> |  1 | 130450099353672 | 1367919381740 | cloud-manager | ...
> |  2 |  38424150221294 | 1367950608087 | cloud-manager | ...
> 
> These two IDs were the ones mentioned in the logfile. In fact, after
> every reboot a new entry appeared in the 'mshost' table, and that
> new ID was being inserted into the 'host' entries for the system
> VMs 'v-2-VM' and 's-1-VM'.
> 
> Browsing through the code, it appears that in the
> ManagementServerNode.java file, the function getManagementServerId()
> returns a static value created by the MacAddress class. Now, on a
> Linux platform (we are using Ubuntu), this address is obtained from
> the first entry that the command "/sbin/ifconfig -a" shows as
> output. And this turned out to be the address of the cloud0 bridge
> interface, which changed after a reboot (or after deleting the
> bridge using brctl and restarting CloudStack entirely).
> 
> To avoid having to modify and recompile CloudStack, I created a fake
> ifconfig: a simple Python script that most of the time just runs
> the real ifconfig (which I renamed to ifconfig-bin), but when called
> as "/sbin/ifconfig -a", it rearranges the output so that eth0 is
> shown first (and not cloud0). This way, the management server id is
> basically the MAC address of eth0, which stays the same after a
> reboot.
> 
> I haven't had the time to create a long running test yet (I only
> figured it out this afternoon), but after several reboots, the
> management server id now stays the same, and I am still able to
> start new instances.
> 
> Hope someone finds this useful.
> 
> Cheers,
> Jori
> 
>


Jori,

This is really interesting.  Would you mind opening a bug about it with
your findings?  And if you're interested in submitting a patch, we'd
love that too!

-chip
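
For reference, the wrapper described in the quoted message could look
roughly like the sketch below. This is not Jori's actual script, only
an illustration under stated assumptions: the real binary has been
renamed to /sbin/ifconfig-bin (as in the message), ifconfig separates
interface blocks with an empty line, and eth0 is the interface whose
MAC address should come first. The names reorder() and main() are
made up for this example.

```python
import subprocess
import sys

REAL_IFCONFIG = "/sbin/ifconfig-bin"  # renamed original binary, per the message
PREFERRED = "eth0"                    # assumption: interface with a stable MAC


def reorder(output, preferred=PREFERRED):
    """Return ifconfig output with the block for `preferred` moved first.

    Assumes each interface block starts with the interface name and that
    blocks are separated by an empty line.
    """
    blocks = [b for b in output.split("\n\n") if b.strip()]
    if not blocks:
        return output
    # The first token of a block is the interface name; newer ifconfig
    # versions append a colon ("eth0:"), older ones do not ("eth0").
    first = [b for b in blocks if b.split(None, 1)[0].rstrip(":") == preferred]
    rest = [b for b in blocks if b not in first]
    return "\n\n".join(first + rest) + "\n\n"


def main(argv):
    """Run the real ifconfig; reorder its output only for 'ifconfig -a'."""
    args = argv[1:]
    result = subprocess.run([REAL_IFCONFIG] + args,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    out = result.stdout
    if args == ["-a"]:
        out = reorder(out)
    sys.stdout.write(out)
    return result.returncode
```

Installed as /sbin/ifconfig, such a script would end with a call like
sys.exit(main(sys.argv)). Every invocation other than "ifconfig -a"
passes through unchanged, so other tools that call ifconfig should see
the same output as before.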
