Hi Yujun,
You can find a small update in the meeting log, but that’s it. We need to check which approach is effective and acceptable upstream.

http://meetbot.opnfv.org/meetings/opnfv-doctor/2016/opnfv-doctor.2016-10-04-13.04.html

BR,
Ryota

From: Yujun Zhang [mailto:zhangyujun+...@gmail.com]
Sent: Friday, November 04, 2016 6:15 PM
To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; Mibu Ryota(壬生 亮太) <r-m...@cq.jp.nec.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi, doctors

Is there any update on this topic? It seems parallel execution would boost performance on a large-scale system, but there was a difference of opinion on whether the current workflow is reasonable. Should the notification be sent from the inspector directly, or should we set the VM state to error and leave it to Nova to send the notification?

BTW: the Doctor demo from the OpenStack Summit is fabulous [1]. Don't miss it.

[1] https://www.openstack.org/videos/video/demo-openstack-and-opnfv-keeping-your-mobile-phone-calls-connected

--
Yujun

On Tue, Oct 4, 2016 at 8:54 PM Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com> wrote:

Hi,

Not a DB issue, since just sending the notifications took the same time as also changing the DB at the same time.

Br,
Tomi

From: Ryota Mibu [mailto:r-m...@cq.jp.nec.com]
Sent: Tuesday, October 04, 2016 3:48 PM
To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; Yujun Zhang <zhangyujun+...@gmail.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: RE: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Tomi,

So, it seems to be a DB bottleneck issue. Would a bulk API that resets servers with a query be a solution? Anyhow, we can talk in the meeting soon.
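[Editor's note] Ryota's bulk-reset idea can be illustrated with a toy sketch: one UPDATE with a host predicate instead of one per-instance save() round trip per VM. Everything below is hypothetical — the table name, columns, and `bulk_reset_state` helper are made up for illustration and do not reflect Nova's real schema or API:

```python
import sqlite3

# Hypothetical instances table; Nova's real schema differs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE instances (uuid TEXT, host TEXT, vm_state TEXT, task_state TEXT)")
rows = [(f"uuid-{i}", "compute-1" if i < 10 else "compute-2", "active", None)
        for i in range(20)]
db.executemany("INSERT INTO instances VALUES (?, ?, ?, ?)", rows)

def bulk_reset_state(conn, host):
    """Reset every instance on the failed host with ONE statement,
    instead of N separate per-instance save() round trips."""
    cur = conn.execute(
        "UPDATE instances SET vm_state = 'error', task_state = NULL "
        "WHERE host = ?", (host,))
    conn.commit()
    return cur.rowcount

changed = bulk_reset_state(db, "compute-1")
print(changed)  # number of instances reset in a single statement
```

The point of the sketch is that the per-call fixed cost (HTTP round trip, token validation, notification emission) is paid once rather than once per VM, which is what the measurements below suggest dominates.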
BR,
Ryota

From: Juvonen, Tomi (Nokia - FI/Espoo) [mailto:tomi.juvo...@nokia.com]
Sent: Tuesday, October 04, 2016 9:08 PM
To: Mibu Ryota(壬生 亮太) <r-m...@cq.jp.nec.com>; Yujun Zhang <zhangyujun+...@gmail.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi,

I further modified the test so that I do not do the reset server state at all, but instead just send the “reset server state to error” notification for each instance when the force-down API is called:

    for instance in instances:
        notifications.send_update_with_states(
            context, instance,
            instance.vm_state, vm_states.ERROR,
            instance.task_state, None,
            service="compute", host=host,
            verify_states=False)

This had the same result as going through instance.save(), which also changes the DB. So it didn’t make things any better.

Br,
Tomi

From: opnfv-tech-discuss-boun...@lists.opnfv.org [mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of Juvonen, Tomi (Nokia - FI/Espoo)
Sent: Tuesday, October 04, 2016 12:30 PM
To: Ryota Mibu <r-m...@cq.jp.nec.com>; Yujun Zhang <zhangyujun+...@gmail.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi,

1. Tried token_cache_time=300; it is also the default, so no difference.

2. Then I modified the force-down API so that it internally resets the server state for all instances on the host (so it is the only API called from the inspector), and there was no difference:

    With 10 VMs where 5 VMs on failing host: 1000ms
    With 10 VMs where 5 VMs on failing host: 1040ms
    With 20 VMs where 10 VMs on failing host: 1540ms to 1780ms

3.
Then I added debug prints around this code in the modified force-down API, which gets the servers and resets the state for each. Running the Doctor test case with 20 VMs where 10 VMs are on the failing host took 1540ms. In the Nova code, getting the instances:

    instances = self.host_api.instance_get_all_by_host(context, host)

took 32ms. Looping over the 10 instances to reset their server state to error:

    for instance in instances:
        instance.vm_state = vm_states.ERROR
        instance.task_state = None
        instance.save(admin_state_reset=True)

took 1250ms. The log also shows the total time the API took:

2016-10-04 09:05:46.075 5029 INFO nova.osapi_compute.wsgi.server [req-368d7fa5-dad6-4805-b9ed-535bf05fff06 b175813579a14b5d9eafe759a1d3e392 1dedc52c8caa42b8aea83b913035f5d9 - - -] 192.0.2.6 "PUT /v2.1/1dedc52c8caa42b8aea83b913035f5d9/os-services/force-down HTTP/1.1" status: 200 len: 354 time: 1.4085381

So using reset server state is currently not feasible (and, as indicated before, shouldn’t even be used).

Br,
Tomi

From: Ryota Mibu [mailto:r-m...@cq.jp.nec.com]
Sent: Saturday, October 01, 2016 8:54 AM
To: Yujun Zhang <zhangyujun+...@gmail.com>; Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: RE: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi,

That’s an interesting evaluation! Yes, this should be an important issue.

I suspect that token validation in keystone might take a major part of the processing time. If so, we should consider using keystone trusts, which can skip the validation, or using token caches. Tomi, can you try the same evaluation with token caches enabled in the client (by --os-cache) and in nova ([keystone_authtoken] token_cache_time=300)? Or maybe we can check how many HTTP messages and DB queries happen per VM reset?
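[Editor's note] The cache settings Ryota refers to would look roughly like this in nova.conf — a sketch only, since exact option behavior depends on the OpenStack release, and per Tomi's follow-up 300 seconds was already the default in this deployment:

```ini
# nova.conf — cache validated keystone tokens so each API call
# does not trigger a fresh validation round trip (seconds)
[keystone_authtoken]
token_cache_time = 300
```

On the client side, caching is enabled by passing --os-cache to the nova client.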
Thanks,
Ryota

From: opnfv-tech-discuss-boun...@lists.opnfv.org [mailto:opnfv-tech-discuss-boun...@lists.opnfv.org] On Behalf Of Yujun Zhang
Sent: Friday, September 30, 2016 5:53 PM
To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

The time is almost linear in the number of VMs, since the requests are sent one by one. I think we should raise the priority of this issue.

But I wonder how it would perform if the requests were sent simultaneously with async calls. How would nova deal with that?

On Fri, Sep 30, 2016 at 4:25 PM Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com> wrote:

Hi,

I ran the Doctor test case in a Nokia POD with the APEX installer and state-of-the-art Airframe HW. I modified the Doctor test case so that I can run several VMs and the consumer can receive alarms for all of them. This measures whether it is possible to stay within the Doctor requirement of under 1 second from recognizing the fault to the consumer receiving the alarm, and shows how much overhead comes from having more VMs on the failing host (the overhead comes from calling the reset server state API for each VM on the failing host).

Here is how many milliseconds it took to get through the scenario:

    With 1 VM on failing host: 180ms
    With 10 VMs where 5 VMs on failing host: 800ms to 1040ms
    With 20 VMs where 12 VMs on failing host: 2410ms
    With 20 VMs where 13 VMs on failing host: 2010ms
    With 20 VMs where 11 VMs on failing host: 2380ms
    With 50 VMs where 27 VMs on failing host: 5060ms
    With 100 VMs where 49 VMs on failing host: 8180ms

Conclusion: even in an ideal environment, one can run only about 5 VMs on a host and still fulfill the Doctor requirement. So this needs to be enhanced.
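[Editor's note] Yujun's question about sending the requests simultaneously can be sketched with a thread pool. The `reset_state` function below is a stub standing in for a per-VM Nova API call, simulated with a fixed 100 ms delay (roughly the per-VM cost the measurements above suggest); with the real novaclient the same fan-out pattern applies, though Nova's API workers and DB would then become the contention point:

```python
import time
from concurrent.futures import ThreadPoolExecutor

PER_CALL_SECONDS = 0.1  # stand-in for ~100 ms per-VM reset cost seen above

def reset_state(vm_uuid):
    """Stub for a per-VM reset-state API call (simulated round trip)."""
    time.sleep(PER_CALL_SECONDS)
    return vm_uuid

vms = [f"uuid-{i}" for i in range(10)]

# Sequential, as Doctor currently does: wall time ~ N * per-call cost.
start = time.perf_counter()
for vm in vms:
    reset_state(vm)
sequential = time.perf_counter() - start

# Parallel: the calls overlap, so wall time approaches a single call.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    done = list(pool.map(reset_state, vms))
parallel = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, parallel: {parallel:.2f}s")
```

This only hides the per-call latency; it does not reduce the total work Nova performs, which is why a server-side bulk reset (or skipping reset-state entirely) was also being discussed in this thread.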
_______________________________________________
opnfv-tech-discuss mailing list
opnfv-tech-discuss@lists.opnfv.org
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss