Jeremiah Jordan created CASSANDRA-9124:
------------------------------------------

             Summary: GCInspector logs very different times after CASSANDRA-7638
                 Key: CASSANDRA-9124
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9124
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Jeremiah Jordan
            Priority: Minor
             Fix For: 2.1.5


After the GCInspector rewrite in CASSANDRA-7638 the times reported for CMS are 
the full time (including all the concurrent time), not just the stop the world 
pause time.  In previous versions we reported just the stop the world pause 
time.  This change is kind of scary for someone used to the old logs, and is 
also not as useful.  You can't get "how long were things really stopped" from 
the log message any more.

For example, this is a CMS that got logged in C* 2.1:

{noformat}
INFO  [Service Thread] 2015-04-03 23:58:37,583  GCInspector.java:142 - 
ConcurrentMarkSweep GC in 12926ms.  CMS Old Gen: 5305346280 -> 1106799064; Par 
Eden Space: 223080 -> 158423560; Par Survivor Space: 42081744 -> 51339584
{noformat}

And here is the corresponding information for that CMS from the gc log.

{noformat}
2015-04-03T23:58:24.656+0000: 8064.780: [GC [1 CMS-initial-mark: 
5181002K(6901760K)] 5222315K(7639040K), 0.0316710 secs] [Times: user=0.03 
sys=0.00, real=0.03 secs] 
2015-04-03T23:58:24.688+0000: 8064.812: Total time for which application 
threads were stopped: 0.0324490 seconds
2015-04-03T23:58:24.688+0000: 8064.812: [CMS-concurrent-mark-start]
2015-04-03T23:58:26.939+0000: 8067.062: [CMS-concurrent-mark: 2.176/2.250 secs] 
[Times: user=12.94 sys=1.73, real=2.25 secs] 
2015-04-03T23:58:26.939+0000: 8067.063: [CMS-concurrent-preclean-start]
2015-04-03T23:58:27.209+0000: 8067.333: [CMS-concurrent-preclean: 0.187/0.270 
secs] [Times: user=1.53 sys=0.15, real=0.28 secs] 
2015-04-03T23:58:27.210+0000: 8067.333: 
[CMS-concurrent-abortable-preclean-start]
2015-04-03T23:58:27.988+0000: 8068.112: [CMS-concurrent-abortable-preclean: 
0.759/0.779 secs] [Times: user=4.07 sys=0.74, real=0.77 secs] 
2015-04-03T23:58:27.989+0000: 8068.113: [GC[YG occupancy: 488441 K (737280 
K)]2015-04-03T23:58:27.989+0000: 8068.113: [Rescan (parallel) , 0.3688960 
secs]2015-04-03T23:58:28.358+0000: 8068.482: [weak refs processing, 0.0009620 
secs]2015-04-03T23:58:28.359+0000: 8068.483: [class unloading, 0.0060870 
secs]2015-04-03T23:58:28.365+0000: 8068.489: [scrub symbol table, 0.0146010 
secs]2015-04-03T23:58:28.380+0000: 8068.504: [scrub string table, 0.0031270 
secs] [1 CMS-remark: 5231445K(6901760K)] 5719886K(7639040K), 0.3953770 secs] 
[Times: user=2.96 sys=0.00, real=0.39 secs] 
2015-04-03T23:58:28.385+0000: 8068.508: Total time for which application 
threads were stopped: 0.3962470 seconds
2015-04-03T23:58:28.385+0000: 8068.509: [CMS-concurrent-sweep-start]
2015-04-03T23:58:37.582+0000: 8077.706: [CMS-concurrent-sweep: 8.661/9.197 
secs] [Times: user=44.80 sys=9.58, real=9.20 secs] 
2015-04-03T23:58:37.589+0000: 8077.713: [CMS-concurrent-reset-start]
2015-04-03T23:58:37.633+0000: 8077.757: [CMS-concurrent-reset: 0.044/0.044 
secs] [Times: user=0.19 sys=0.10, real=0.04 secs] 
{noformat}

The entire CMS took the 12 seconds reported in the GCIspector log message.  
Previously we would have only reported the 0.39 seconds that were spent in STW 
pauses.

At the least we need to change the log message so that people don't think we 
are still just reporting STW time.  But it would be more helpful if we could 
get the STW time and put that into the log message like we had previously.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to