Sid, Dmitry,
Thanks for the tips.
I had already been specifying the timelineAppid for the service component
in metainfo.xml:
<services>
<service>
<name>GPFS</name>
<displayName>Spectrum Scale</displayName>
<comment>High-performance, scalable storage manages yottabytes of
unstructured data (formerly known as General Parallel File System, or
GPFS)</comment>
<version>4.1.1</version>
<components>
<component>
<name>GPFS_MASTER</name>
<displayName>GPFS Master</displayName>
<category>MASTER</category>
<cardinality>1</cardinality>
<timelineAppid>GPFS</timelineAppid>
<commandScript>
<script>scripts/master.py</script>
<scriptType>PYTHON</scriptType>
<timeout>600</timeout>
</commandScript>
I am still unable to query AMS by appId, though; only by host.
[root@dn01-dat ~]# curl -X GET "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&hostname=dn01-dat.ibm.com"
{"metrics":[{"timestamp":1447177544905,"metricname":"gpfs.disk_used","appid":"gpfs_master","hostname":"dn01-dat.ibm.com","starttime":1447177544,"metrics":{"1447177544":1439744.0}},{"timestamp":1447175401845,"metricname":"gpfs.disk_used","appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447175401,"metrics":{"1447175401":1439744.0}},{"timestamp":1447175289376,"metricname":"gpfs.disk_used","appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447175289,"metrics":{"1447175289":1439744.0}},{"timestamp":1447171202994,"metricname":"gpfs.disk_used","appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447171202,"metrics":{"1447171202":1439744.0}}]}
[root@dn01-dat ~]# curl -X GET "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&appId=gpfs"
{"metrics":[]}
[root@dn01-dat ~]# curl -X GET "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&appId=gpfs_master"
{"metrics":[]}
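One way to see which appids the collector has actually recorded for this metric is to tally the appid field of the hostname-scoped response. A minimal Python sketch follows; the function name appids_in_response is made up for illustration, and the sample is a trimmed copy of the response above (only the fields needed here).

```python
import json
from collections import Counter


def appids_in_response(body):
    """Count the returned series per appid in an AMS timeline/metrics response body."""
    payload = json.loads(body)
    return Counter(entry["appid"] for entry in payload.get("metrics", []))


# Trimmed copy of the hostname-scoped response shown above.
sample = (
    '{"metrics":['
    '{"timestamp":1447177544905,"metricname":"gpfs.disk_used","appid":"gpfs_master"},'
    '{"timestamp":1447175401845,"metricname":"gpfs.disk_used","appid":"gpfs"},'
    '{"timestamp":1447175289376,"metricname":"gpfs.disk_used","appid":"gpfs"},'
    '{"timestamp":1447171202994,"metricname":"gpfs.disk_used","appid":"gpfs"}'
    ']}'
)

if __name__ == "__main__":
    for appid, count in appids_in_response(sample).items():
        print(appid, count)
```

For the response above this shows the same metric stored under two different appids, gpfs_master and gpfs, which is worth keeping in mind when comparing against the appId-scoped queries.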
I have also experimented with setting the
timeline.metrics.service.cluster.aggregator.appIds property in ams-site to
include the "gpfs" appId (followed by a restart of AMS services). That did
not seem to change anything.
Thanks,
Nate Falk
[email protected]
From: Siddharth Wagle <[email protected]>
To: "[email protected]" <[email protected]>
Date: 11/10/2015 01:06 PM
Subject: Re: metrics visible from host_component but not component
Hi,
There is a field called timelineAppId; for an example, see
common-services/ACCUMULO/1.6.1.2.2.0/metainfo.xml
This overrides the appId Ambari uses in the call to AMS, so you can set it
to gpfs.
- Sid
From: Dmitry Sen <[email protected]>
Sent: Tuesday, November 10, 2015 9:21 AM
To: [email protected]
Subject: Re: metrics visible from host_component but not component
Hi,
AMS doesn't support custom service metrics yet, but here are some
points to check.
Try calling:
curl -X GET -u admin:admin "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&appId=gpfs"
If you don't get the metrics back, then something needs to be fixed on the AMS side.
Otherwise:
When you run
curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/gpfs/disk_used"
Ambari calls
curl -X GET -u admin:admin "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&appId=gpfs_master"
But as I can see in your previous message, the reported appId is "gpfs":
{"metrics":[{"timestamp":1447084964323,"metricname":"gpfs.disk_used","appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447084964,"metrics":{"1447084964":1437696.0}}]}
Your custom application should report its appId as gpfs_master, not gpfs.
Another option is to rename the GPFS_MASTER component to GPFS in metainfo.xml.
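The naming rule described in this thread can be sketched in Python as follows. This is an illustration under assumptions, not Ambari's actual code: it assumes the appId is the timelineAppid value when one is set, otherwise the component name, lowercased in both cases. The function names expected_app_id and ams_query_url are made up for this sketch.

```python
from urllib.parse import urlencode


def expected_app_id(component_name, timeline_appid=None):
    """Guess the appId Ambari would send to AMS for a component.

    Assumption taken from this thread only: a timelineAppid override wins,
    otherwise the component name is used, lowercased either way.
    """
    return (timeline_appid or component_name).lower()


def ams_query_url(collector, metric_name, app_id):
    """Build the AMS collector query URL for one metric and appId."""
    query = urlencode({"metricNames": metric_name, "appId": app_id})
    return f"http://{collector}/ws/v1/timeline/metrics?{query}"


if __name__ == "__main__":
    # Without an override, the component name determines the appId.
    print(ams_query_url("dn01:6188", "gpfs.disk_used", expected_app_id("GPFS_MASTER")))
    # With timelineAppid set to GPFS, the query would use gpfs instead.
    print(ams_query_url("dn01:6188", "gpfs.disk_used", expected_app_id("GPFS_MASTER", "GPFS")))
```

Whichever value this produces has to match the appid that the custom application reports to the collector.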
BR,
Dmytro Sen
From: Nathan Falk <[email protected]>
Sent: Tuesday, November 10, 2015 6:37 PM
To: [email protected]
Subject: metrics visible from host_component but not component
I have a custom Ambari service, with a metrics.json and widgets.json
defined.
The widgets display on the service dashboard summary page, but instead of
the graph or data, I see "n/a".
When I use the REST API to query the ambari server, I see the metrics for
the host_component, but not when I query the component.
In metrics.json, I've added some of the basic ams host metrics, plus some
service-specific metrics. All metrics are defined in both "Component" and
"HostComponent". As an example:
{
"GPFS_MASTER": {
"Component": [
{
"type": "ganglia",
"metrics": {
"default": {
"metrics/cpu/cpu_idle":{
"metric":"cpu_idle",
"pointInTime":true,
"temporal":true,
"amsHostMetric":true
},
...
"metrics/gpfs/disk_used": {
"metric": "gpfs.disk_used",
"pointInTime": true,
"temporal": true
},
...
}
}
}
],
"HostComponent": [
{
"type": "ganglia",
"metrics": {
"default": {
"metrics/cpu/cpu_idle":{
"metric":"cpu_idle",
"pointInTime":true,
"temporal":true,
"amsHostMetric":true
},
...
"metrics/gpfs/disk_used": {
"metric": "gpfs.disk_used",
"pointInTime": true,
"temporal": true
},
...
I query the AMS Collector, and it seems that the metrics are there:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&hostname=dn01-dat.ibm.com"
{"metrics":[{"timestamp":1447084964323,"metricname":"gpfs.disk_used","appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447084964,"metrics":{"1447084964":1437696.0}}]}
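For reference, a record of this shape can be assembled in Python as sketched below. The field names and units are copied from the response above (timestamp in milliseconds, starttime and the data point key in seconds); that the collector accepts the same shape when metrics are posted is an assumption, and build_metric_payload is a made-up helper name. The sketch only builds the payload and does not send it.

```python
import json


def build_metric_payload(metric_name, app_id, hostname, value, timestamp_ms):
    """Build one metric record mirroring the collector response shown above."""
    start_s = timestamp_ms // 1000  # the sample shows starttime in seconds
    return {
        "metrics": [
            {
                "timestamp": timestamp_ms,
                "metricname": metric_name,
                "appid": app_id,  # the value that appId-scoped queries must match
                "hostname": hostname,
                "starttime": start_s,
                "metrics": {str(start_s): float(value)},
            }
        ]
    }


if __name__ == "__main__":
    payload = build_metric_payload(
        "gpfs.disk_used", "gpfs", "dn01-dat.ibm.com", 1437696.0, 1447084964323
    )
    print(json.dumps(payload))
```

The appid argument is the part that matters for this problem, since it is chosen by whatever posts the metric.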
I query Ambari, and whether I see the metric or not depends on how I
do the query. If I query the GPFS_MASTER service component, I do NOT
see the metric:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/gpfs/disk_used"
{
"href" : "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/gpfs/disk_used",
"ServiceComponentInfo" : {
"cluster_name" : "nate",
"component_name" : "GPFS_MASTER",
"service_name" : "GPFS"
}
}
If I query the GPFS_MASTER host component on dn01, then I do see the
metric:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/gpfs/disk_used"
{
"href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/gpfs/disk_used",
"HostRoles" : {
"cluster_name" : "nate",
"component_name" : "GPFS_MASTER",
"host_name" : "dn01-dat.ibm.com"
},
"host" : {
"href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com"
},
"metrics" : {
"gpfs" : {
"disk_used" : 1437696.0
}
}
}
By comparison, if I query the "cpu_idle" metric, also defined in the
GPFS metrics.json file, I see the metric in both queries:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/cpu/cpu_idle"
{
"href" : "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/cpu/cpu_idle",
"ServiceComponentInfo" : {
"cluster_name" : "nate",
"component_name" : "GPFS_MASTER",
"service_name" : "GPFS"
},
"metrics" : {
"cpu" : {
"cpu_idle" : 0.6248046875
}
}
}
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/cpu/cpu_idle"
{
"href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/cpu/cpu_idle",
"HostRoles" : {
"cluster_name" : "nate",
"component_name" : "GPFS_MASTER",
"host_name" : "dn01-dat.ibm.com"
},
"host" : {
"href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com"
},
"metrics" : {
"cpu" : {
"cpu_idle" : 0.624375
}
}
}
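The comparison above can be automated with a small Python helper that walks the requested field path through an Ambari response and reports whether the metric came back. This is only a sketch: metric_from_response is a made-up name, and the two bodies are trimmed copies of the gpfs/disk_used responses shown earlier.

```python
import json


def metric_from_response(body, field_path):
    """Return the value at field_path (e.g. 'metrics/gpfs/disk_used'), or None if absent."""
    node = json.loads(body)
    for key in field_path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Trimmed copy of the service component response (no "metrics" block).
component_body = (
    '{"ServiceComponentInfo": {"cluster_name": "nate", '
    '"component_name": "GPFS_MASTER", "service_name": "GPFS"}}'
)

# Trimmed copy of the host component response (metric present).
host_component_body = (
    '{"HostRoles": {"cluster_name": "nate", "component_name": "GPFS_MASTER", '
    '"host_name": "dn01-dat.ibm.com"}, '
    '"metrics": {"gpfs": {"disk_used": 1437696.0}}}'
)

if __name__ == "__main__":
    path = "metrics/gpfs/disk_used"
    print("component:", metric_from_response(component_body, path))
    print("host_component:", metric_from_response(host_component_body, path))
```

Running the same check for each metric defined in metrics.json would show which ones are missing only at the component level.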
I feel like getting back "n/a" on the widgets is related to not
seeing the metrics when I query the component rather than the
host_component, but I'm not 100% sure about that either.
My problems don't seem to end there, either. When I create new
widgets using the gpfs metrics, I start seeing some wildly
inconsistent behavior. Sometimes I'll get the right metric data,
sometimes as I add and remove widgets they'll go back to displaying
n/a or even displaying old values for the metric data.
I must be missing something really simple, but I think I'm going to
need help to figure out what that might be.
Does anyone out there have any suggestions for how to investigate
this further or what I might be missing with regard to defining or
posting these metrics?
Thanks,
Nate Falk
[email protected]