RE: Some Geode management metrics returning 0s after OS upgrade

Vahram Aharonyan Mon, 18 Feb 2019 04:58:32 -0800

Hi All,

Could someone confirm that I’m looking at the correct charts in the .gfs files for 
org.apache.geode.management.MemberMXBean#getFunctionExecutionRate, 
org.apache.geode.management.MemberMXBean#getPutsRate and 
org.apache.geode.management.MemberMXBean#getGetsRate.


I do see in the gfs files that the CachePerfStats “gets” and “puts” stats and 
“FunctionExecution.functionExecutionCalls” have non-zero values. Or could it be 
that these stats are not related to the MXBean methods I’m referring to?

Thanks,
Vahram.

From: Vahram Aharonyan <[email protected]>
Sent: Tuesday, February 12, 2019 9:20 PM
To: [email protected]
Subject: RE: Some Geode management metrics returning 0s after OS upgrade

Hi Darrel,

Thanks for the feedback.
Actually, we continuously do region puts/gets and function executions. If I’m not 
mistaken, we have seen some non-zero values in the CachePerfStats “gets” and “puts” 
metrics. Are these the ones that are propagated to 
org.apache.geode.management.MemberMXBean#getPutsRate and 
org.apache.geode.management.MemberMXBean#getGetsRate, which I mentioned before?

And could it be that for 
“org.apache.geode.management.MemberMXBean#getFunctionExecutionRate” I should 
refer to “FunctionExecution.functionExecutionCalls” in gfs?

Thanks,
Vahram.

From: Darrel Schneider <[email protected]>
Sent: Tuesday, February 12, 2019 8:08 PM
To: [email protected]
Subject: Re: Some Geode management metrics returning 0s after OS upgrade

You need to actually be executing functions and doing region puts/gets for 
these stats to be non-zero.
The gfs files record the gets/puts in the CachePerfStats. One CachePerfStats 
represents a combination of all the regions. Other CachePerfStats represent 
just one region (those have that region name on them). You want to look at the 
first one as it represents the entire cache.
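
To compare the counters seen in the gfs files with the rate attributes, a rate 
can be worked out by hand from two samples of a counter. The sketch below is 
only illustrative arithmetic: it assumes the rate attributes are derived from 
these counters (which this thread does not confirm), it is not Geode's own 
implementation, and the class name, method name and sample values are made up.

```java
// Hypothetical helper: derives a per-second rate from two samples of a
// monotonically increasing counter (for example the "puts" stat).
class CounterRate {

    /** Returns (laterCount - earlierCount) per second of elapsed time. */
    static double perSecond(long earlierCount, long laterCount,
                            long earlierMillis, long laterMillis) {
        long elapsedMillis = laterMillis - earlierMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be in time order");
        }
        return (laterCount - earlierCount) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Made-up samples: the counter went from 1200 to 1500 in 60 seconds.
        System.out.println(perSecond(1200, 1500, 0L, 60_000L)); // prints 5.0
    }
}
```

If the counter in the cache-wide CachePerfStats grows between samples while the 
corresponding MXBean rate stays at 0, the problem is more likely in how the 
rate is derived or exposed than in the underlying statistics.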


On Tue, Feb 12, 2019 at 6:43 AM Vahram Aharonyan 
<[email protected]> wrote:
Hi All,

Various experiments and long-term monitoring showed that the only real problem 
remaining is with these 3 metrics:

org.apache.geode.management.MemberMXBean#getFunctionExecutionRate
org.apache.geode.management.MemberMXBean#getPutsRate
org.apache.geode.management.MemberMXBean#getGetsRate

All the others, related to either network or disk, have non-zero values, 
but these three are constantly 0. These seem to be Geode-internal 
metrics and should not be related to the system, right? Could it be that there is 
some info on these metrics in the *.gfs files, so we can see whether they have 
actual values or not?

Thanks,
Vahram.

From: Vahram Aharonyan <[email protected]>
Sent: Thursday, February 7, 2019 5:19 PM
To: [email protected]
Subject: RE: Some Geode management metrics returning 0s after OS upgrade

Hi Kirk,

We were not able to find any error messages from the StatSampler in our log 
files.
Is running these tests straightforward? Is there a doc describing the 
process? What requirements must be met to be able to run them?

Hi Barry,

Yes, we see values for other MBean attributes reported.

You were right, thread is there:
INFO   | jvm 1    | 2019/02/07 12:15:54 | "Thread-10 StatSampler" #59 daemon 
prio=10 os_prio=0 tid=0x00007f1fc8951800 nid=0x2d0 in Object.wait() 
[0x00007f1fb14e3000]
INFO   | jvm 1    | 2019/02/07 12:15:54 |    java.lang.Thread.State: 
TIMED_WAITING (on object monitor)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at java.lang.Object.wait(Native 
Method)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at 
org.apache.geode.internal.statistics.HostStatSampler.delay(HostStatSampler.java:520)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       - locked <0x0000000651581a68> 
(a org.apache.geode.internal.statistics.GemFireStatSampler)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at 
org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:208)
INFO   | jvm 1    | 2019/02/07 12:15:54 |       at 
java.lang.Thread.run(Thread.java:748)

Could this be caused by missing privileges to access system 
resources? Or is there some way to check whether this information is available in 
the *.gfs stat files from the locator or server? I was looking into these files but 
was not able to find anything linking them to the metrics mentioned below.

Thanks,
Vahram.

From: Barry Oglesby <[email protected]>
Sent: Wednesday, February 6, 2019 11:21 PM
To: [email protected]
Subject: Re: Some Geode management metrics returning 0s after OS upgrade

Do you see values for other MBean attributes?

If you do a thread dump in your server JVM(s), you should see a thread like 
this running:

"StatSampler" #39 daemon prio=10 os_prio=31 tid=0x00007fdcbf004000 nid=0x7003 
in Object.wait() [0x000070000c50a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                at 
org.apache.geode.internal.statistics.HostStatSampler.delay(HostStatSampler.java:519)
                - locked <0x00000007a8911160> (a 
org.apache.geode.internal.statistics.GemFireStatSampler)
                at 
org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:219)
                at java.lang.Thread.run(Thread.java:745)
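
The same check can be done without a full thread dump. The sketch below lists 
live threads whose name contains "StatSampler", using only the standard 
java.lang.Thread API. It has to run inside the server JVM to be meaningful, 
and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds live threads in this JVM by a fragment of their name.
class StatSamplerCheck {

    /** Returns "name [state]" for each live thread whose name contains the fragment. */
    static List<String> findThreads(String nameFragment) {
        List<String> matches = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.getName().contains(nameFragment)) {
                matches.add(thread.getName() + " [" + thread.getState() + "]");
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> samplers = findThreads("StatSampler");
        if (samplers.isEmpty()) {
            System.out.println("No StatSampler thread found in this JVM");
        } else {
            samplers.forEach(System.out::println);
        }
    }
}
```

A thread that is present and in TIMED_WAITING, as in the dump above, only shows 
that the sampler is alive. It does not show that every statistic it samples is 
being updated.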



On Wed, Feb 6, 2019 at 9:40 AM Kirk Lund 
<[email protected]> wrote:
Phantom OS might have caused the StatSampler to fail or even crash. That's the 
only explanation I can think of that might result in the non-OS related stats 
remaining zero. You might want to look through the log to see if the 
StatSampler logged any problems. Other than that, you could try running every 
statistic related test/integrationTest/distributedTest in Geode on Phantom OS 
to see how the tests behave.

On Wed, Feb 6, 2019 at 7:49 AM Anthony Baker 
<[email protected]> wrote:
I wouldn’t be surprised if other OS-related things are broken on Phantom OS as 
well. We use JNA for most native calls. Look at `git grep Native.register` to 
see what POSIX-like things might be affected.

Anthony


On Feb 6, 2019, at 7:28 AM, Jacob Barrett 
<[email protected]> wrote:

We don’t have any hooks into the stats for this OS.

On Feb 6, 2019, at 7:16 AM, Jens Deppe 
<[email protected]> wrote:
From SLES 11 to Phantom OS

(I had already asked, but my CC got scrambled :( )

On Wed, Feb 6, 2019 at 7:10 AM Anthony Baker 
<[email protected]> wrote:
Which OS did you upgrade to?

Anthony

On Feb 6, 2019, at 1:25 AM, Vahram Aharonyan 
<[email protected]> wrote:

Hi All,

For our troubleshooting purposes we have been collecting some data from Geode 
cluster members using the following APIs:

org.apache.geode.management.MemberMXBean#getFunctionExecutionRate
org.apache.geode.management.MemberMXBean#getPutsRate
org.apache.geode.management.MemberMXBean#getGetsRate

org.apache.geode.management.NetworkMetrics#getBytesReceivedRate
org.apache.geode.management.NetworkMetrics#getBytesSentRate

org.apache.geode.management.DiskMetrics#getDiskFlushAvgLatency
org.apache.geode.management.DiskMetrics#getDiskReadsRate
org.apache.geode.management.DiskMetrics#getDiskWritesRate
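
For reference, the three MemberMXBean rates can also be read through plain JMX, 
which makes it easy to log them next to other MBean attributes. The sketch 
below uses only the standard javax.management API. It assumes that it runs 
inside the member JVM, that member MBeans are registered in the platform 
MBeanServer under names matching "GemFire:type=Member,*", and that the 
attribute names follow the usual getter convention (getPutsRate becomes 
"PutsRate"). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper: reads rate attributes from member MBeans via JMX.
class MemberRateProbe {

    // Assumed attribute names, derived from the MemberMXBean getter names.
    static final String[] RATE_ATTRIBUTES = {"PutsRate", "GetsRate", "FunctionExecutionRate"};

    /** Reads the rate attributes of every MBean whose name matches the pattern. */
    static Map<ObjectName, Map<String, Object>> readRates(
            MBeanServerConnection server, String pattern) {
        Map<ObjectName, Map<String, Object>> result = new LinkedHashMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                Map<String, Object> values = new LinkedHashMap<>();
                for (String attribute : RATE_ATTRIBUTES) {
                    values.put(attribute, server.getAttribute(name, attribute));
                }
                result.put(name, values);
            }
        } catch (JMException | IOException e) {
            throw new IllegalStateException("Could not read " + pattern, e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the member MBeans live in this JVM's platform MBeanServer.
        readRates(ManagementFactory.getPlatformMBeanServer(), "GemFire:type=Member,*")
            .forEach((name, values) -> System.out.println(name + " -> " + values));
    }
}
```

In a JVM with no matching MBeans, the returned map is simply empty and nothing 
is printed.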

Recently we replaced our base OS, and all the values reported back by Geode 
from these calls became 0s.
Could someone help us understand how these metrics are collected by 
Geode? Could it be that Geode uses some system utilities or system calls that 
existed in our previous appliance and were removed in the newer version of our 
system, causing Geode to return only 0s?

Thanks,
Vahram.
