OK, everybody ready for this explanation?

This boneheaded programmer finally realized today that the timestamps in my
metrics didn't quite look the same as the timestamps in the various
examples I've looked at or in other services.

I'm using a python program to push the metrics to the collector, and using
the time.time() function to get the current time. This time, of course, is
in seconds since the epoch.

AMS is expecting metrics from HadoopMetricsSink-derived Java classes, which
use Java's System.currentTimeMillis() method, which, of course, is in
milliseconds since the epoch.

So, I changed my Python program to use int(time.time()*1000.0) instead of
just int(time.time()) and everything magically started working.
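
A minimal sketch of that change, for anyone hitting the same thing. The
helper names (epoch_millis, build_metric) are made up for illustration, and
the payload fields simply mirror what the collector returned in the query
quoted below; treat the exact payload shape as an assumption rather than
the documented AMS format.

```python
import time


def epoch_millis(now=None):
    """Return a time as integer milliseconds since the epoch.

    time.time() returns float seconds; AMS expects milliseconds, matching
    Java's System.currentTimeMillis().
    """
    if now is None:
        now = time.time()
    return int(now * 1000.0)


def build_metric(name, value, appid="gpfs", hostname="dn01-dat.ibm.com", now=None):
    """Build one metric entry (hypothetical helper).

    Field names are copied from the collector response in this thread;
    every time-valued field uses milliseconds, not seconds.
    """
    ts = epoch_millis(now)
    return {
        "metricname": name,
        "appid": appid,  # lower case, consistent with timelineAppid
        "hostname": hostname,
        "timestamp": ts,
        "starttime": ts,
        "metrics": {str(ts): value},
    }


if __name__ == "__main__":
    print(build_metric("gpfs.disk_used", 1437696.0))
```

Note that in the collector output quoted below, "starttime" and the key
inside "metrics" were 10-digit values (seconds) while "timestamp" was 13
digits (milliseconds), which is how the mismatch showed up.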

For completeness' sake, I should add that I also changed the service's
"timelineAppid" in metainfo.xml to use lower case "gpfs" instead of "GPFS"
to be consistent with the lower case appId used everywhere else. And I also
added "gpfs" to timeline.metrics.service.cluster.aggregator.appIds.

I don't know if either of these changes was necessary to get
component-level metrics working properly.

Nate Falk
[email protected]



From:   Nathan Falk/Poughkeepsie/IBM@IBMUS
To:     [email protected]
Date:   11/10/2015 11:37 AM
Subject:        metrics visible from host_component but not component



I have a custom Ambari service, with a metrics.json and widgets.json
defined.

The widgets display on the service dashboard summary page, but instead of
the graph or data, I see "n/a".

When I use the REST API to query the ambari server, I see the metrics for
the host_component, but not when I query the component.

In metrics.json, I've added some of the basic ams host metrics, plus some
service-specific metrics. All metrics are defined in both "Component" and
"HostComponent". As an example:
      {
        "GPFS_MASTER": {
          "Component": [
            {
              "type": "ganglia",
              "metrics": {
                "default": {
                  "metrics/cpu/cpu_idle":{
                    "metric":"cpu_idle",
                    "pointInTime":true,
                    "temporal":true,
                    "amsHostMetric":true
                  },
                  ...
                  "metrics/gpfs/disk_used": {
                    "metric": "gpfs.disk_used",
                    "pointInTime": true,
                    "temporal": true
                  },
                  ...
                }
              }
            }
          ],
          "HostComponent": [
            {
              "type": "ganglia",
              "metrics": {
                "default": {
                  "metrics/cpu/cpu_idle":{
                    "metric":"cpu_idle",
                    "pointInTime":true,
                    "temporal":true,
                    "amsHostMetric":true
                  },
                  ...
                  "metrics/gpfs/disk_used": {
                    "metric": "gpfs.disk_used",
                    "pointInTime": true,
                    "temporal": true
                  },
                  ...


      I query the AMS Collector, and it seems that the metrics are there:
            [root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&hostname=dn01-dat.ibm.com"
            {"metrics":[{"timestamp":1447084964323,"metricname":"gpfs.disk_used","appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447084964,"metrics":{"1447084964":1437696.0}}]}



      I query Ambari, and whether I see the metric or not depends on how I
      do the query. If I query the GPFS_MASTER service component, I do NOT
      see the metric:
            [root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/gpfs/disk_used"
            {
              "href" : "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/gpfs/disk_used",
              "ServiceComponentInfo" : {
                "cluster_name" : "nate",
                "component_name" : "GPFS_MASTER",
                "service_name" : "GPFS"
              }
            }

      If I query the GPFS_MASTER host component on dn01, then I do see the
      metric:
            [root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/gpfs/disk_used"
            {
              "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/gpfs/disk_used",
              "HostRoles" : {
                "cluster_name" : "nate",
                "component_name" : "GPFS_MASTER",
                "host_name" : "dn01-dat.ibm.com"
              },
              "host" : {
                "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com"
              },
              "metrics" : {
                "gpfs" : {
                  "disk_used" : 1437696.0
                }
              }
            }

      By comparison, if I query the "cpu_idle" metric, also defined in the
      GPFS metrics.json file, I see the metric in both queries:
            [root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/cpu/cpu_idle"
            {
              "href" : "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/cpu/cpu_idle",
              "ServiceComponentInfo" : {
                "cluster_name" : "nate",
                "component_name" : "GPFS_MASTER",
                "service_name" : "GPFS"
              },
              "metrics" : {
                "cpu" : {
                  "cpu_idle" : 0.6248046875
                }
              }
            }
            [root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/cpu/cpu_idle"
            {
              "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/cpu/cpu_idle",
              "HostRoles" : {
                "cluster_name" : "nate",
                "component_name" : "GPFS_MASTER",
                "host_name" : "dn01-dat.ibm.com"
              },
              "host" : {
                "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com"
              },
              "metrics" : {
                "cpu" : {
                  "cpu_idle" : 0.624375
                }
              }
            }

      I feel like getting back "n/a" on the widgets is related to not
      seeing the metrics when I query the component rather than the
      host_component, but I'm not 100% sure about that either.

      My problems don't seem to end there, either. When I create new
      widgets using the gpfs metrics, I start seeing some wildly
      inconsistent behavior. Sometimes I'll get the right metric data,
      sometimes as I add and remove widgets they'll go back to displaying
      n/a or even displaying old values for the metric data.

      I must be missing something really simple, but I think I'm going to
      need help to figure out what that might be.

      Does anyone out there have any suggestions for how to investigate
      this further or what I might be missing with regard to defining or
      posting these metrics?

      Thanks,

      Nate Falk
      [email protected]
