Hi Yujun,
You can find a small update in the meeting log, but that’s it. We need to check which approach is effective and acceptable upstream.

http://meetbot.opnfv.org/meetings/opnfv-doctor/2016/opnfv-doctor.2016-10-04-13.04.html

BR,
Ryota

From: Yujun Zhang [mailto:zhangyujun+...@gmail.com]
Sent: Friday, November 04, 2016 6:15 PM
To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; Mibu Ryota(壬生 亮太) <r-m...@cq.jp.nec.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi, doctors

Is there any update on this topic? It seems parallel execution would boost performance on a large-scale system, but there was a difference of opinion on whether the current workflow is reasonable. Should the notification be sent from the inspector directly, or should we set the VM state to error and leave it to Nova to send the notification?

BTW: the Doctor demo from the OpenStack Summit is fabulous [1]. Don't miss it.

[1] https://www.openstack.org/videos/video/demo-openstack-and-opnfv-keeping-your-mobile-phone-calls-connected

--
Yujun

On Tue, Oct 4, 2016 at 8:54 PM Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com> wrote:

Hi,

Not a DB issue, since just sending the notifications took the same time as also changing the DB at the same time.

Br,
Tomi

From: Ryota Mibu [mailto:r-m...@cq.jp.nec.com]
Sent: Tuesday, October 04, 2016 3:48 PM
To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; Yujun Zhang <zhangyujun+...@gmail.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: RE: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Tomi,

So, it seems to be a DB bottleneck issue. Would a bulk API that resets servers with a query be a solution? Anyhow, we can talk in the meeting soon.
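[Editor's note] Ryota's bulk-reset idea can be illustrated with a toy sketch: one UPDATE with a host predicate instead of one per-instance save() round trip per VM. Everything below is hypothetical — the table name, columns, and `bulk_reset_state` helper are made up for illustration and do not reflect Nova's real schema or API:

```python
import sqlite3

# Hypothetical instances table; Nova's real schema differs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE instances (uuid TEXT, host TEXT, vm_state TEXT, task_state TEXT)")
rows = [(f"uuid-{i}", "compute-1" if i < 10 else "compute-2", "active", None)
        for i in range(20)]
db.executemany("INSERT INTO instances VALUES (?, ?, ?, ?)", rows)

def bulk_reset_state(conn, host):
    """Reset every instance on the failed host with ONE statement,
    instead of N separate per-instance save() round trips."""
    cur = conn.execute(
        "UPDATE instances SET vm_state = 'error', task_state = NULL "
        "WHERE host = ?", (host,))
    conn.commit()
    return cur.rowcount

changed = bulk_reset_state(db, "compute-1")
print(changed)  # number of instances reset in a single statement
```

The point of the sketch is that the per-call fixed cost (HTTP round trip, token validation, notification emission) is paid once rather than once per VM, which is what the measurements below suggest dominates.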
BR,
Ryota

From: Juvonen, Tomi (Nokia - FI/Espoo) [mailto:tomi.juvo...@nokia.com]
Sent: Tuesday, October 04, 2016 9:08 PM
To: Mibu Ryota(壬生 亮太) <r-m...@cq.jp.nec.com>; Yujun Zhang <zhangyujun+...@gmail.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi,

I further modified the test so that I do not do the reset server state at all, but instead just send the “reset server state to error” notification for each instance when the force-down API is called:

    for instance in instances:
        notifications.send_update_with_states(
            context, instance,
            instance.vm_state, vm_states.ERROR,
            instance.task_state, None,
            service="compute", host=host,
            verify_states=False)

This had the same result as going through instance.save(), which also changes the DB. So it didn’t make things any better.

Br,
Tomi

From: opnfv-tech-discuss-boun...@lists.opnfv.org [mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of Juvonen, Tomi (Nokia - FI/Espoo)
Sent: Tuesday, October 04, 2016 12:30 PM
To: Ryota Mibu <r-m...@cq.jp.nec.com>; Yujun Zhang <zhangyujun+...@gmail.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi,

1. Tried token_cache_time=300; it is also the default, so no difference.

2. Then I modified the force-down API so that it internally resets the server state for all instances on the host (so it is the only API called from the inspector), and there was no difference:

    With 10 VMs where 5 VMs on failing host: 1000ms
    With 10 VMs where 5 VMs on failing host: 1040ms
    With 20 VMs where 10 VMs on failing host: 1540ms to 1780ms

3.
Then I added debug prints around this code in the modified force-down API, which gets the servers and resets the state for each. Running the Doctor test case with 20 VMs where 10 VMs are on the failing host took 1540ms. In the Nova code, getting the instances:

    instances = self.host_api.instance_get_all_by_host(context, host)

took 32ms. Looping over the 10 instances to reset their server state to error:

    for instance in instances:
        instance.vm_state = vm_states.ERROR
        instance.task_state = None
        instance.save(admin_state_reset=True)

took 1250ms. The log also shows the total time the API took:

2016-10-04 09:05:46.075 5029 INFO nova.osapi_compute.wsgi.server [req-368d7fa5-dad6-4805-b9ed-535bf05fff06 b175813579a14b5d9eafe759a1d3e392 1dedc52c8caa42b8aea83b913035f5d9 - - -] 192.0.2.6 "PUT /v2.1/1dedc52c8caa42b8aea83b913035f5d9/os-services/force-down HTTP/1.1" status: 200 len: 354 time: 1.4085381

So using reset server state is currently not feasible (and, as indicated before, shouldn’t even be used).

Br,
Tomi

From: Ryota Mibu [mailto:r-m...@cq.jp.nec.com]
Sent: Saturday, October 01, 2016 8:54 AM
To: Yujun Zhang <zhangyujun+...@gmail.com>; Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: RE: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi,

That’s an interesting evaluation! Yes, this should be an important issue.

I suspect that token validation in keystone might take a major part of the processing time. If so, we should consider using keystone trusts, which can skip the validation, or using token caches. Tomi, can you try the same evaluation with token caches enabled in the client (by --os-cache) and in nova ([keystone_authtoken] token_cache_time=300)? Or maybe we can check how many HTTP messages and DB queries happen per VM reset?
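[Editor's note] The cache settings Ryota refers to would look roughly like this in nova.conf — a sketch only, since exact option behavior depends on the OpenStack release, and per Tomi's follow-up 300 seconds was already the default in this deployment:

```ini
# nova.conf — cache validated keystone tokens so each API call
# does not trigger a fresh validation round trip (seconds)
[keystone_authtoken]
token_cache_time = 300
```

On the client side, caching is enabled by passing --os-cache to the nova client.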
Thanks,
Ryota

From: opnfv-tech-discuss-boun...@lists.opnfv.org [mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of Yujun Zhang
Sent: Friday, September 30, 2016 5:53 PM
To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

The time is almost linear in the number of VMs, since the requests are sent one by one. I think we should raise the priority of this issue.

But I wonder how it would perform if the requests were sent simultaneously with async calls. How would nova deal with that?

On Fri, Sep 30, 2016 at 4:25 PM Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com> wrote:

Hi,

I ran the Doctor test case in a Nokia POD with the APEX installer and state-of-the-art Airframe HW. I modified the Doctor test case so that I can run several VMs and the consumer can receive alarms for all of them. This measures whether it is possible to stay within the Doctor requirement of under 1 second from recognizing the fault to the consumer receiving the alarm, and shows how much overhead comes from having more VMs on the failing host (the overhead comes from calling the reset server state API for each VM on the failing host).

Here is how many milliseconds it took to get through the scenario:

    With 1 VM on failing host: 180ms
    With 10 VMs where 5 VMs on failing host: 800ms to 1040ms
    With 20 VMs where 12 VMs on failing host: 2410ms
    With 20 VMs where 13 VMs on failing host: 2010ms
    With 20 VMs where 11 VMs on failing host: 2380ms
    With 50 VMs where 27 VMs on failing host: 5060ms
    With 100 VMs where 49 VMs on failing host: 8180ms

Conclusion: even in an ideal environment, one can run only about 5 VMs on a host and still fulfill the Doctor requirement. So this needs to be enhanced.
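[Editor's note] Yujun's question about sending the requests simultaneously can be sketched with a thread pool. The `reset_state` function below is a stub standing in for a per-VM Nova API call, simulated with a fixed 100 ms delay (roughly the per-VM cost the measurements above suggest); with the real novaclient the same fan-out pattern applies, though Nova's API workers and DB would then become the contention point:

```python
import time
from concurrent.futures import ThreadPoolExecutor

PER_CALL_SECONDS = 0.1  # stand-in for ~100 ms per-VM reset cost seen above

def reset_state(vm_uuid):
    """Stub for a per-VM reset-state API call (simulated round trip)."""
    time.sleep(PER_CALL_SECONDS)
    return vm_uuid

vms = [f"uuid-{i}" for i in range(10)]

# Sequential, as Doctor currently does: wall time ~ N * per-call cost.
start = time.perf_counter()
for vm in vms:
    reset_state(vm)
sequential = time.perf_counter() - start

# Parallel: the calls overlap, so wall time approaches a single call.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    done = list(pool.map(reset_state, vms))
parallel = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, parallel: {parallel:.2f}s")
```

This only hides the per-call latency; it does not reduce the total work Nova performs, which is why a server-side bulk reset (or skipping reset-state entirely) was also being discussed in this thread.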
_______________________________________________
opnfv-tech-discuss mailing list
opnfv-tech-discuss@lists.opnfv.org
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss