Then this is slightly different from the management-server-and-KVM-in-one-box issue that I'm aware of.
When a management server is restarted with a new ID, it appears to the
cluster as a new management server instance; we have some code logic to
handle that. From what you described, the logic to make it appear fully
as a new management node may be broken in this case. In any case, a
stable management server ID acquisition process is much preferred and
needed.

Kelven

On 5/8/13 1:04 PM, "Jori Liesenborgs" <jori.liesenbo...@gmail.com> wrote:

>Hi Kelven,
>
>We have never used KVM in our cloud setup, and the management server is
>a separate machine, not a VM. I'm not sure what the code logic is
>supposed to do, but in our case the problem did prevent the management
>server from functioning: no new instances could be started.
>
>Cheers,
>Jori
>
>> This is a known issue when you are running the management server
>> together with a KVM host. After the KVM host is added to the running
>> management server, it creates a bridge that can cause the management
>> server ID to change after a reboot, but only once.
>>
>> A similar issue can happen when you run the management server in a VM
>> and later clone the VM.
>>
>> We have code logic to handle these cases; apart from some annoying
>> messages in the log, it should not prevent the management server from
>> functioning normally. But it would be really nice to see a fix that
>> gives us a stable management server ID acquisition process.
>>
>> Kelven
>>
>> On 5/8/13 12:15 PM, "Chip Childers" <chip.child...@sungard.com> wrote:
>>
>>> On Wed, May 08, 2013 at 09:11:43PM +0200, Jori Liesenborgs wrote:
>>>> Hi everyone,
>>>>
>>>> On our CloudStack setup (4.0.2), I noticed that after a reboot of
>>>> the management server, I was no longer able to start new instances.
>>>>
>>>> A secondary problem was that the management-server.log file filled
>>>> up extremely fast (gigabytes in a few hours), with messages like
>>>> these:
>>>>
>>>> 2013-05-08 05:26:10,627 DEBUG [agent.manager.ClusteredAgentAttache]
>>>> (AgentManager-Handler-4:null) Seq 7-1033568320: Forwarding Seq
>>>> 7-1033568320: { Cmd , MgmtId: 38424150221294, via: 7, Ver: v1,
>>>> Flags: 100111,
>>>> [{"StopCommand":{"isProxy":false,"vmName":"i-2-6-VM","wait":0}}] }
>>>> to 130450099353672
>>>>
>>>> This turned out to contain an important clue: when looking at the
>>>> 'mshost' table in the 'cloud' database, instead of seeing one entry
>>>> for the management server ID, there now were two:
>>>>
>>>> | id | msid            | runid         | name          | ...
>>>> | 1  | 130450099353672 | 1367919381740 | cloud-manager | ...
>>>> | 2  | 38424150221294  | 1367950608087 | cloud-manager | ...
>>>>
>>>> And these two IDs were the ones mentioned in the logfile. In fact,
>>>> with every reboot a new entry appeared in the 'mshost' table, and
>>>> that new ID was being inserted into the 'host' entries for the
>>>> system VMs 'v-2-VM' and 's-1-VM'.
>>>>
>>>> Browsing through the code, it appears that in the
>>>> ManagementServerNode.java file, the function getManagementServerId()
>>>> returns a static value created by the MacAddress class. On a Linux
>>>> platform (we are using Ubuntu), this address is obtained from the
>>>> first entry that the command "/sbin/ifconfig -a" shows in its
>>>> output. This turned out to be the address of the cloud0 bridge
>>>> interface, which changed after a reboot (or after deleting the
>>>> bridge using brctl and restarting all of CloudStack).
>>>>
>>>> To avoid having to modify and recompile CloudStack, I created a fake
>>>> ifconfig: a simple Python script that most of the time just runs
>>>> the real ifconfig (which I renamed to ifconfig-bin), but when called
>>>> as "/sbin/ifconfig -a", it rearranges the output so that eth0 is
>>>> shown first (and not cloud0).
>>>> This way, the management server ID is basically the MAC address of
>>>> eth0, which stays the same after a reboot.
>>>>
>>>> I haven't had the time to create a long-running test yet (I only
>>>> figured this out this afternoon), but after several reboots the
>>>> management server ID now stays the same, and I am still able to
>>>> start new instances.
>>>>
>>>> Hope someone finds this useful.
>>>>
>>>> Cheers,
>>>> Jori
>>>>
>>>>
>>> Jori,
>>>
>>> This is really interesting. Would you mind opening a bug about it
>>> with your findings? And if you're interested in submitting a patch,
>>> we'd love that too!
>>>
>>> -chip
>
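[For reference, the wrapper Jori describes could look roughly like the sketch below. This is a reconstruction from the description in the thread, not Jori's actual script: the path /sbin/ifconfig-bin, the interface names, and the block-splitting heuristic (classic ifconfig separates interfaces with blank lines) are all assumptions.]

```python
#!/usr/bin/env python
# Sketch of a "fake ifconfig" wrapper: run the real binary (assumed to
# have been renamed to /sbin/ifconfig-bin), and for "ifconfig -a"
# reorder the output so the eth0 block comes before cloud0, giving the
# MacAddress class a stable interface to read first.
import subprocess
import sys

REAL_IFCONFIG = "/sbin/ifconfig-bin"  # renamed real binary (assumption)


def reorder(output, first="eth0"):
    """Move the block for interface `first` to the top.

    Classic ifconfig output separates per-interface blocks with a
    blank line, and each block starts with the interface name.
    """
    blocks = [b for b in output.strip().split("\n\n") if b.strip()]
    preferred = [b for b in blocks if b.startswith(first)]
    rest = [b for b in blocks if not b.startswith(first)]
    return "\n\n".join(preferred + rest) + "\n"


def main(argv):
    """Run the real ifconfig, rewriting output only for 'ifconfig -a'."""
    proc = subprocess.Popen([REAL_IFCONFIG] + argv,
                            stdout=subprocess.PIPE)
    out = proc.communicate()[0].decode()
    if argv == ["-a"]:
        out = reorder(out)
    sys.stdout.write(out)
    return proc.returncode

# Installed as /sbin/ifconfig, it would be invoked as:
#   sys.exit(main(sys.argv[1:]))
```

Note that this only papers over the symptom; as Kelven says, the real fix is a stable management server ID acquisition process that does not depend on interface ordering at all.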