RE: High CPU utilization on KVM hosts while doing RBD snapshot - was Re: snapshot caused host disconnected

Suresh Sadhu Mon, 07 Oct 2013 23:29:34 -0700

Indra,

Are you seeing high CPU utilization only through CloudStack, or is it the 
same when you take the snapshot directly with Ceph? Please try the commands 
below and share your results after performing a snapshot from Ceph. This will 
help us isolate whether the problem really happens at the time of the 
snapshot or at the time of the file conversion running in the background.


--> Create a volume, write some data, and take a snapshot with either of 
the following equivalent commands:

rbd --pool rbd snap create --snap snapname foo
rbd snap create rbd/foo@snapname
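
A rough sketch of how the two phases could be timed separately, in case it 
helps. The pool, image and snapshot names are the example names from the 
commands above. The helper names (timed, snapshot_only, convert_only) are 
made up for illustration, and the qemu-img step is only an assumption about 
what the background conversion looks like; it also assumes qemu-img on the 
host was built with RBD support.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the qemu-img step is an
# assumed stand-in for the background file conversion.

# Run a command, then report its exit status, wall-clock duration and the
# 1-minute load average read right after it finishes.
timed() {
    label=$1; shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    load=$(cut -d' ' -f1 /proc/loadavg 2>/dev/null)
    echo "$label: exit=$status elapsed=$((end - start))s load=$load"
    return $status
}

# Phase 1: take the snapshot directly with Ceph.
# Arguments: pool, image, snapshot name.
snapshot_only() {
    timed "snapshot" rbd snap create "$1/$2@$3"
}

# Phase 2: copy the snapshot out to a local qcow2 file (assumption).
# Arguments: pool, image, snapshot name, destination file.
convert_only() {
    timed "convert" qemu-img convert -f raw -O qcow2 "rbd:$1/$2@$3" "$4"
}
```

Example use, with a snapshot name that does not exist yet and a destination 
path of your choice:

    snapshot_only rbd foo snapname
    convert_only rbd foo snapname /tmp/foo-snap.qcow2

If the load only climbs during the second step, the snapshot itself is 
probably not the cause.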

regards
sadhu



-----Original Message-----
From: Indra Pramana [mailto:in...@sg.or.id] 
Sent: 08 October 2013 08:29
To: d...@cloudstack.apache.org; users@cloudstack.apache.org
Cc: Wido den Hollander
Subject: High CPU utilization on KVM hosts while doing RBD snapshot - was Re: 
snapshot caused host disconnected

Dear Wido and all,

I performed some further tests last night:

(1) CPU utilization of the KVM host still shoots up while an RBD snapshot is 
running, even after I set the global setting 
concurrent.snapshots.threshold.perhost to 2.

(2) Most of the concurrent snapshot processes fail, either stuck in the 
"Creating" state or with a "CreatedOnPrimary" error message.

(3) I have also adjusted some other related global settings, such as 
backup.snapshot.wait and job.expire.minutes, without any luck.

Any advice on what is causing the high CPU utilization would be greatly 
appreciated.

Looking forward to your reply, thank you.

Cheers.


On Mon, Oct 7, 2013 at 11:03 PM, Indra Pramana <in...@sg.or.id> wrote:

> Dear all,
>
> I also found that while the RBD snapshot is running, CPU utilisation on 
> the KVM host shoots up very high, which might explain why the host 
> becomes disconnected.
>
> top - 22:49:32 up 3 days, 19:31,  1 user,  load average: 7.85, 4.97, 3.47
> Tasks: 297 total,   3 running, 294 sleeping,   0 stopped,   0 zombie
> Cpu(s):  4.5%us,  1.2%sy,  0.0%ni, 94.1%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  264125244k total, 77203460k used, 186921784k free,   154888k buffers
> Swap:   545788k total,        0k used,   545788k free, 60677092k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 18161 root      20   0 3871m  31m 8444 S  101  0.0 301:58.09 kvm
>  2790 root      20   0 43.5g 1.6g  19m S   97  0.7  45:52.42 jsvc
> 24544 root      20   0 4583m  31m 8364 S   97  0.0 425:29.48 kvm
>  6537 root      20   0     0    0    0 R   71  0.0   0:17.49 kworker/3:2
> 22546 root      20   0 6143m 2.0g 8452 S   26  0.8  55:14.07 kvm
>  4219 root      20   0 7671m 4.0g 8524 S    6  1.6 106:12.26 kvm
>  5989 root      20   0 43.2g 1.6g  232 D    6  0.6   0:08.13 jsvc
>  5993 root      20   0 43.3g 1.6g  224 D    6  0.6   0:08.36 jsvc
>
> Is it normal for the host's CPU utilisation to be higher than usual 
> while a snapshot is being taken of a VM running on that host? How can I 
> limit the CPU resources used by the snapshot?
>
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
>
> On Mon, Oct 7, 2013 at 7:18 PM, Indra Pramana <in...@sg.or.id> wrote:
>
>> Dear all,
>>
>> I did some tests on snapshots, since they are now supported for my Ceph 
>> RBD primary storage in CloudStack 4.2. When I ran a snapshot for a 
>> particular VM instance earlier, I noticed that it caused the host 
>> (where the VM is running) to become disconnected.
>>
>> Here's the excerpt from the agent.log:
>>
>> http://pastebin.com/dxVV7stu
>>
>> The management-server.log doesn't show much other than that the host 
>> was detected as down and HA is being activated:
>>
>> http://pastebin.com/UeLiSm9K
>>
>> Can anyone advise what is causing the problem? So far only one user is 
>> taking snapshots and it has already caused issues on the host. I can't 
>> imagine what would happen if multiple users took snapshots at the same 
>> time.
>>
>> I read about snapshot job throttling, which is described in the manual:
>>
>>
>> http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/working-with-snapshots.html
>>
>> But I am not sure whether this will help to resolve the problem, since 
>> we already encounter it with only one user trying to take a snapshot.
>>
>> Can anyone advise how I can troubleshoot further and find a solution 
>> to the problem?
>>
>> Looking forward to your reply, thank you.
>>
>> Cheers.
>>
>
>
