Nitin, you are right about it being a hypothesis. It is being tested
in a live cloud at Schuberg Philis as we 'speak'; it runs a patched
4.2.1 version. I committed my change to 4.3-forward.

We decided to let you scale to twice your initial number of CPUs, no
matter how many there were to begin with. If you need to go beyond
that you can restart the VM with a new offering. We can also implement
both behaviours behind a setting if need be.
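To illustrate, a rough sketch of that capping logic (not the exact committed patch; the constant name and helper are hypothetical, and 16 is the limit from the XenServer configuration-limits documents linked in the issue):

```java
// Sketch: cap VCPUsMax at twice the offering's CPU count, never exceeding
// the hypervisor's supported maximum. Names here are illustrative only.
public class VcpuMaxSketch {
    // Assumed constant; the real value comes from the XenServer limits docs.
    static final long XENSERVER_VCPU_LIMIT = 16L;

    static long vcpusMax(boolean isWindows, int offeringCpus) {
        if (isWindows) {
            // Windows guests stay at the offering's CPU count
            return offeringCpus;
        }
        // Non-Windows: allow scaling to twice the initial CPUs, capped
        return Math.min(2L * offeringCpus, XENSERVER_VCPU_LIMIT);
    }

    public static void main(String[] args) {
        System.out.println(vcpusMax(false, 4));   // 8
        System.out.println(vcpusMax(false, 12));  // capped at 16
        System.out.println(vcpusMax(true, 4));    // 4
    }
}
```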

I think this hasn't been caught before because of the two rather
specific setups Schuberg Philis is using. That's why we often come up
with blocker bugs in older versions during RC votes as well. And this
is also why we want a stable master tree from which we can take
snapshots to run in our production environments.

Not there yet but still full of aspiration,
Daan

On Wed, Feb 5, 2014 at 5:04 PM, Nitin Mehta (JIRA) <j...@apache.org> wrote:
>
>     [ 
> https://issues.apache.org/jira/browse/CLOUDSTACK-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892260#comment-13892260
>  ]
>
> Nitin Mehta commented on CLOUDSTACK-6023:
> -----------------------------------------
>
> Guys - I still see this as a hypothesis since we haven't tested it. I don't 
> think we should cherry-pick unless we are sure this works.
> If possible I would recommend testing it through 4.3 than 4.2.1 as well.
> I am also wondering why this issue hasnt been caught yet.
>
> We should definitely have this setting to 16 (or other) when dynamic scaling 
> is enabled.
>
>> Non windows instances are created on XenServer with a vcpu-max above 
>> supported xenserver limits
>> -----------------------------------------------------------------------------------------------
>>
>>                 Key: CLOUDSTACK-6023
>>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6023
>>             Project: CloudStack
>>          Issue Type: Bug
>>      Security Level: Public(Anyone can view this level - this is the 
>> default.)
>>          Components: XenServer
>>    Affects Versions: Future, 4.2.1, 4.3.0
>>            Reporter: Joris van Lieshout
>>            Priority: Blocker
>>         Attachments: xentop.png
>>
>>
>> CitrixResourceBase.java contains a hardcoded value for vcpusmax for non 
>> windows instances:
>> if (guestOsTypeName.toLowerCase().contains("windows")) {
>>             vmr.VCPUsMax = (long) vmSpec.getCpus();
>>         } else {
>>             vmr.VCPUsMax = 32L;
>>         }
>> For all currently available versions of XenServer the limit is 16vcpus:
>> http://support.citrix.com/servlet/KbServlet/download/28909-102-664115/XenServer-6.0-Configuration-Limits.pdf
>> http://support.citrix.com/servlet/KbServlet/download/32312-102-704653/CTX134789%20-%20XenServer%206.1.0_Configuration%20Limits.pdf
>> http://support.citrix.com/servlet/KbServlet/download/34966-102-706122/CTX137837_XenServer%206_2_0_Configuration%20Limits.pdf
>> In addition there seems to be a limit to the total amount of assigned vpcus 
>> on a XenServer.
>> The impact of this bug is that xapi becomes unstable and keeps losing it's 
>> master_connection because the POST to the /remote_db_access is bigger then 
>> it's limit of 200K. This basically renders a pool slave unmanageable.
>> If you would look at the running instances using xentop you will see hosts 
>> reporting with 32 vcpus
>> Below the relevant portion of the xensource.log that shows the effect of the 
>> bug:
>> [20140204T13:52:17.264Z|debug|xenserverhost1|144 inet-RPC|host.call_plugin 
>> R:e58e985539ab|master_connection] stunnel: Using commandline: 
>> /usr/sbin/stunnel -fd f3b8bb12-4e03-b47a-0dc5-85ad5aef79e6
>> [20140204T13:52:17.269Z|debug|xenserverhost1|144 inet-RPC|host.call_plugin 
>> R:e58e985539ab|master_connection] stunnel: stunnel has pidty: (FEFork 
>> (43,30540))
>> [20140204T13:52:17.269Z|debug|xenserverhost1|144 inet-RPC|host.call_plugin 
>> R:e58e985539ab|master_connection] stunnel: stunnel start
>> [20140204T13:52:17.269Z| info|xenserverhost1|144 inet-RPC|host.call_plugin 
>> R:e58e985539ab|master_connection] stunnel connected pid=30540 fd=40
>> [20140204T13:52:17.346Z|error|xenserverhost1|144 inet-RPC|host.call_plugin 
>> R:e58e985539ab|master_connection] Received HTTP error 500 ({ method = POST; 
>> uri = /remote_db_access; query = [  ]; content_length = [ 315932 ]; transfer 
>> encoding = ; version = 1.1; cookie = [ 
>> pool_secret=386bbf39-8710-4d2d-f452-9725d79c2393/aa7bcda9-8ebb-0cef-bb77-c6b496c5d859/1f928d82-7a20-9117-dd30-f96c7349b16e
>>  ]; task = ; subtask_of = ; content-type = ; user_agent = xapi/1.9 }) from 
>> master. This suggests our master address is wrong. Sleeping for 60s and then 
>> restarting.
>> [20140204T13:53:18.620Z|error|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] Caught Master_connection.Goto_handler
>> [20140204T13:53:18.620Z|debug|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] Connection to master died. I will continue 
>> to retry indefinitely (supressing future logging of this message).
>> [20140204T13:53:18.620Z|error|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] Connection to master died. I will continue 
>> to retry indefinitely (supressing future logging of this message).
>> [20140204T13:53:18.620Z|debug|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] Sleeping 2.000000 seconds before retrying 
>> master connection...
>> [20140204T13:53:20.627Z|debug|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] stunnel: Using commandline: 
>> /usr/sbin/stunnel -fd 3c8aed8e-1fce-be7c-09f8-b45cdc40a1f5
>> [20140204T13:53:20.632Z|debug|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] stunnel: stunnel has pidty: (FEFork 
>> (23,31207))
>> [20140204T13:53:20.632Z|debug|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] stunnel: stunnel start
>> [20140204T13:53:20.632Z| info|xenserverhost1|10|dom0 networking update 
>> D:5c5376f0da6c|master_connection] stunnel connected pid=31207 fd=20
>> [20140204T13:53:28.874Z|error|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> Caught Master_connection.Goto_handler
>> [20140204T13:53:28.874Z|debug|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> Connection to master died. I will continue to retry indefinitely (supressing 
>> future logging of this message).
>> [20140204T13:53:28.874Z|error|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> Connection to master died. I will continue to retry indefinitely (supressing 
>> future logging of this message).
>> [20140204T13:53:28.875Z|debug|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> Sleeping 2.000000 seconds before retrying master connection...
>> [20140204T13:53:30.887Z|debug|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> stunnel: Using commandline: /usr/sbin/stunnel -fd 
>> 665b8c15-8119-78a7-1888-cde60b2108dc
>> [20140204T13:53:30.892Z|debug|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> stunnel: stunnel has pidty: (FEFork (25,31514))
>> [20140204T13:53:30.892Z|debug|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> stunnel: stunnel start
>> [20140204T13:53:30.892Z| info|xenserverhost1|4 
>> unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] 
>> stunnel connected pid=31514 fd=22
>> [20140204T13:54:31.472Z|error|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
>> [20140204T13:54:31.472Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Connection to master died. I will continue 
>> to retry indefinitely (supressing future logging of this message).
>> [20140204T13:54:31.477Z|error|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Connection to master died. I will continue 
>> to retry indefinitely (supressing future logging of this message).
>> [20140204T13:54:31.477Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Sleeping 2.000000 seconds before retrying 
>> master connection...
>> [20140204T13:54:33.488Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: Using commandline: 
>> /usr/sbin/stunnel -fd f5df840d-8ac0-39fd-050f-bfa23a96c148
>> [20140204T13:54:33.493Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork 
>> (28,2788))
>> [20140204T13:54:33.493Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel start
>> [20140204T13:54:33.493Z| info|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel connected pid=2788 fd=24
>> [20140204T13:54:33.572Z|error|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
>> [20140204T13:54:33.572Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Sleeping 4.000000 seconds before retrying 
>> master connection...
>> [20140204T13:54:37.578Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: Using commandline: 
>> /usr/sbin/stunnel -fd bcc34b6e-20cd-933c-7375-941d53884184
>> [20140204T13:54:37.583Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork 
>> (31,2808))
>> [20140204T13:54:37.584Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel start
>> [20140204T13:54:37.584Z| info|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel connected pid=2808 fd=26
>> [20140204T13:54:37.667Z|error|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
>> [20140204T13:54:37.667Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Sleeping 8.000000 seconds before retrying 
>> master connection...
>> [20140204T13:54:45.679Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: Using commandline: 
>> /usr/sbin/stunnel -fd 83e7a6c7-3482-8bb9-3275-b537fc695bd6
>> [20140204T13:54:45.683Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork 
>> (30,2919))
>> [20140204T13:54:45.683Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel start
>> [20140204T13:54:45.683Z| info|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel connected pid=2919 fd=25
>> [20140204T13:54:45.768Z|error|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
>> [20140204T13:54:45.768Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Sleeping 16.000000 seconds before retrying 
>> master connection...
>> [20140204T13:55:01.789Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: Using commandline: 
>> /usr/sbin/stunnel -fd abe83182-4ce5-0681-2c68-827dbbd95e94
>> [20140204T13:55:01.794Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork 
>> (32,3022))
>> [20140204T13:55:01.794Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel start
>> [20140204T13:55:01.794Z| info|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel connected pid=3022 fd=28
>> [20140204T13:55:02.143Z|error|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
>> [20140204T13:55:02.143Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Sleeping 32.000000 seconds before retrying 
>> master connection...
>> [20140204T13:55:34.179Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: Using commandline: 
>> /usr/sbin/stunnel -fd 00895b5f-b30c-0c3a-32ae-758993dcd791
>> [20140204T13:55:34.184Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork 
>> (37,3387))
>> [20140204T13:55:34.184Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel: stunnel start
>> [20140204T13:55:34.184Z| info|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] stunnel connected pid=3387 fd=33
>> [20140204T13:55:34.266Z|error|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
>> [20140204T13:55:34.267Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin 
>> R:4d3007755c69|master_connection] Sleeping 64.000000 seconds before retrying 
>> master connection...
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)



-- 
Daan
