Thanks Maryam, really helpful inputs. I think I need to change the strategy and consider these points one by one in detail.
Over the past week I tried different things; here are some points and queries.

1. What should the ‘metrics/parameters’ be for comparing the runtime behavior of these agents? Are the values from /proc/<pid>/stat sufficient? Here <pid> is the process ID of the agent whose runtime behavior we are analyzing. Currently, I’m only considering the values in /proc/<pid>/stat.
2. Do the metrics you mention below refer to what the ‘agent’ should be run to monitor?
   * Currently, I was only running CPU, Processes, Disk, Memory, Libvirt, IPMI, and network interfaces.
   * I’ll add more and rerun.
3. Any preference on the initial set of agents for the runtime study?
   * I have started with collectd, telegraf and snap.
4. For the workload, I’m using stress-ng and running different kinds of stress for 5 minutes. Is that OK?
5. I had not considered the ‘application’ aspect at all. I need to decide what app to run.

Regards,
Sridhar K. N. Rao (Ph. D)
Solutions Architect
+91-9900088064

From: Tahhan, Maryam [mailto:maryam.tah...@intel.com]
Sent: Tuesday, May 23, 2017 8:49 PM
To: Rao, Sridhar <sridhar....@spirent.com>; 'MORTON, ALFRED C (AL)' <acmor...@att.com>; Aaron Smith <aasm...@redhat.com>
Cc: Mcmahon, Tony B <tony.b.mcma...@intel.com>; Power, Damien <damien.po...@intel.com>; TECH-DISCUSS OPNFV <opnfv-tech-discuss@lists.opnfv.org>
Subject: [barometer] Runtime analysis of the monitoring agents

Hi Sridhar

To make the comparison fair I would really advise the following:

* Publishing mode: the agent should really be writing somewhere off the system, ideally to some sort of time series DB. You want to minimize the impact of noise on the system.
* Isolate and pin cores appropriately.
* Footprint measurement process:
  * Measure idle system resource usage.
  * Run the plugin or combination of plugins and measure system resource usage.
  * Repeat the tests on a busy system, or one running a workload.
  * Report results.
* Repeat with a busy system.
* Metrics to collect:
  * Sysstat metrics:
    * CPU: %user %nice %system %iowait %steal
    * Memory usage: kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree
    * Cache thrashing, if any
    * IO:
      * tps: transactions per second (this includes both read and write)
      * rtps: read transactions per second
      * wtps: write transactions per second
      * bread/s: blocks read per second
      * bwrtn/s: blocks written per second
  * collectd or other collector-specific process stats, if possible.
  * Application stats for the application you are running, to determine the impact of collectd/other collectors on the workload.
  * You might pick a use case with some network traffic, to see the impact on it, if any.
* Intervals: you might want to try 1 second, 10 seconds and 60 seconds. If possible, you might drop below a second.

I hope this helps, please let me know if you have any questions/comments.

BR
Maryam

From: Rao, Sridhar [mailto:sridhar....@spirent.com]
Sent: Friday, May 12, 2017 2:38 PM
To: Mcmahon, Tony B <tony.b.mcma...@intel.com>; Tahhan, Maryam <maryam.tah...@intel.com>; Power, Damien <damien.po...@intel.com>
Subject: RE: [barometer] Weekly Call

No Tony. Here you go:
--------------
This is the template I’ll be using for runtime analysis of the monitoring agents. By runtime analysis I mean comparing how much CPU and memory these agents consume while they are ‘monitoring’. Please let me know if I have missed anything, or if there is anything else I should consider to make the comparison more meaningful.

Metrics monitored by the agents: CPU, Processes, Memory, Interfaces, Libvirt, IPMI, Disk Status
Publishing mode: Writing to the file
Frequency of reading the values: 1 sec.
Other metrics that may apply only to a few agents: OVS, DPDK, PCM, RAS (MceLog)
System configuration:
  1. Intel Xeon server with at least 3 Ethernet interfaces.
  2. KVM/Qemu
  3. At least 2 VMs running.

Q. Why these metrics?
A.
The choice of metrics started with this link: https://wiki.opnfv.org/display/fastpath/Collectd+Metrics+and+Events . From this list, these metrics were chosen because they are supported by all the agents.

Q. Why publishing mode as ‘writing to the file’?
A. This mode is, again, supported by all (well, almost) of the agents, and it makes the comparison fairer!

Q. Why a frequency of 1 sec?
A. Frankly, I wasn’t sure. This is based on the input I received during the previous Barometer weekly call.

Q. What about the other metrics from the link?
A. These will be considered and studied only for the agents that support them, as they are relevant to NFV.

Regards,
Sridhar K. N. Rao (Ph. D)
Solutions Architect
+91-9900088064
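For reference, here is a minimal sketch of how the /proc/<pid>/stat values asked about in this thread could be sampled for one agent process and turned into a CPU percentage and a resident memory size. It is an illustration only, not the method used by anyone on the thread. The function names (parse_stat_line, cpu_percent, measure) are hypothetical, and the field positions (utime = field 14, stime = field 15, vsize = field 23, rss = field 24) are taken from the proc(5) man page and assume a Linux host.

```python
import os
import time


def parse_stat_line(line):
    """Extract a few fields from the single line in /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' in the line.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    rest = tail.split()  # rest[0] is field 3 (state), so field n is rest[n - 3]
    return {
        "pid": int(pid_text),
        "comm": comm,
        "utime_ticks": int(rest[11]),  # field 14: user-mode CPU time, clock ticks
        "stime_ticks": int(rest[12]),  # field 15: kernel-mode CPU time, clock ticks
        "vsize_bytes": int(rest[20]),  # field 23: virtual memory size, bytes
        "rss_pages": int(rest[21]),    # field 24: resident set size, pages
    }


def cpu_percent(before, after, elapsed_s, ticks_per_s):
    """CPU usage of the process between two samples, as a % of one core."""
    used_ticks = (after["utime_ticks"] + after["stime_ticks"]
                  - before["utime_ticks"] - before["stime_ticks"])
    return 100.0 * used_ticks / (ticks_per_s * elapsed_s)


def read_stat(pid):
    """Read and parse /proc/<pid>/stat for a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_stat_line(stat_file.read())


def measure(pid, interval_s=1.0):
    """Take two samples interval_s apart and summarize the agent's footprint."""
    before = read_stat(pid)
    time.sleep(interval_s)
    after = read_stat(pid)
    return {
        "comm": after["comm"],
        "cpu_percent": cpu_percent(before, after, interval_s,
                                   os.sysconf("SC_CLK_TCK")),
        "rss_bytes": after["rss_pages"] * os.sysconf("SC_PAGE_SIZE"),
    }
```

Calling measure() with the agent's PID once while the system is idle and again under the stress-ng workload would give two figures to compare. Note that these are per-process values only; the system-wide sysstat metrics suggested in the reply are not covered by /proc/<pid>/stat.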