Hi Lydia,

I used sar (sar -u -n DEV -p -d -r 1) to monitor each node and plotted the 
averages across multiple nodes.

1) For each node, collect the sar output, which looks for example like this:

Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr)     2016-01-27      _x86_64_        (16 CPU)
12:54:09        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:54:10        all      4.63      0.00      3.25      0.13      0.00     91.99
12:54:09    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact
12:54:10    129538812   2525308      1.91      1292     85876   3662636      2.69   2111652     55132
12:54:09          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:54:10          sda     28.71   2708.91     87.13     97.38      0.03      1.10      0.97      2.77
12:54:09        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
12:54:10         eth0    632.67    587.13   3173.60     58.47      0.00      0.00      0.00

2) Average the samples across your nodes (the node clocks need to be in sync 
so that timestamps line up) to obtain a final output that you can feed to 
your plot scripts:

LINE      DATE      FILENAME                 CPU_user  CPU_SYS   KBMEMFREE KBMEMUSED MEMUSED   DISK_UTIL DISK_RKBs DISK_WKBs _IO_RSTs  _IO_WSTs
1         12:54:10  res1Avg                  6.12      1.25      129554704 2509412   1.90      6.00      4253.63   87.04     3944.00   88.00
2         12:54:11  res1Avg                  3.41      0.28      129523432 2540690   1.92      4.00      2335.82   51.62     2692.00   0.00
3         12:54:12  res1Avg                  0.06      0.03      129522000 2542120   1.92      1.60      0.16      0.59      2048.00   32.00
4         12:54:13  res1Avg                  0.09      0.06      129520936 2543182   1.92      0.60      0.19      0.59      2048.00   0.00
5         12:54:14  res1Avg                  0.06      0.06      129518448 2545670   1.93      6.80      4.31      169.47    4044.00   16.00
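
The two steps above could be sketched roughly as follows in Python. This is only a minimal illustration, not the script that produced the table: it handles just the CPU rows (%user and %system), assumes the 24-hour timestamp format shown in the sar output above, and the function names parse_cpu and average_nodes are made up for this example.

```python
from collections import defaultdict


def parse_cpu(sar_text):
    """Return {timestamp: (user, system)} from the CPU rows of sar text output.

    Assumes rows shaped like the sample above:
    12:54:10  all  %user  %nice  %system  %iowait  %steal  %idle
    """
    samples = {}
    for line in sar_text.splitlines():
        fields = line.split()
        # Only the aggregated CPU row has "all" as its second field;
        # header, memory, disk and network rows are skipped.
        if len(fields) == 8 and fields[1] == "all":
            samples[fields[0]] = (float(fields[2]), float(fields[4]))
    return samples


def average_nodes(per_node):
    """Average (user, system) per timestamp over a list of per-node dicts.

    Timestamps are matched as strings, so node clocks must be in sync.
    """
    grouped = defaultdict(list)
    for samples in per_node:
        for timestamp, values in samples.items():
            grouped[timestamp].append(values)
    return {
        timestamp: tuple(sum(column) / len(column) for column in zip(*rows))
        for timestamp, rows in sorted(grouped.items())
    }
```

You would call parse_cpu once per node on that node's sar output and pass the list of results to average_nodes. The memory, disk and network rows could be handled the same way by matching on their own column layout.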

For other metrics specific to Flink's execution, you may need to rely on the 
metrics that Flink itself currently exposes.

Best,
Ovidiu

> On 21 Dec 2016, at 19:55, Lydia Ickler <ickle...@googlemail.com> wrote:
> 
> Hi all,
> 
> I have a question regarding the Monitoring REST API;
> 
> I want to analyze the behavior of my program with regards to I/O MiB/s, 
> Network MiB/s and CPU % as the authors of this paper did. 
> (https://hal.inria.fr/hal-01347638v2/document)
> From the JSON file at http://master:8081/jobs/jobid/ I get a summary including 
> the information of read/write records and read/write bytes.
> Unfortunately the entries of Network or CPU are either (unknown) or 0.0. I am 
> running my program on a cluster with up to 32 nodes.
> 
> Where can I find the values for e.g. CPU or Network?
> 
> Thanks in advance!
> Lydia
> 
