From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

This is the v2 of the fix to change the default behaviour of cede_offline.
The previous version can be found here: https://lkml.org/lkml/2019/9/12/222

The main change from v1 is that the patch2 to create a sysfs file to
report and control the value of cede_offline_enabled has been dropped.

Problem Description:
====================
Currently on Pseries Linux Guests, the offlined CPU can be put to one
of the following two states:
   - Long term processor cede (also called extended cede)
   - Returned to the Hypervisor via RTAS "stop-self" call.

This is controlled by the kernel boot parameter "cede_offline=on/off".

By default the offlined CPUs enter extended cede. The PHYP hypervisor
considers CPUs in extended cede to be "active" since the CPUs are
still under the control fo the Linux Guests. Hence, when we change the
SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs
will continue to count the values for offlined CPUs in extended cede
as if they are online.

One of the expectations with PURR is that the for an interval of time,
the sum of the PURR increments across the online CPUs of a core should
equal the number of timebase ticks for that interval.

This is currently not the case.

In the following data (Generated using
https://github.com/gautshen/misc/blob/master/purr_tb.py):

SD-PURR = Sum of PURR increments on online CPUs of that core in 1 second
      
SMT=off
===========================================
Core            SD-PURR         SD-PURR
                (expected)      (observed)
===========================================
core00 [  0]    512000000       69883784        
core01 [  8]    512000000       88782536        
core02 [ 16]    512000000       94296824        
core03 [ 24]    512000000       80951968        

SMT=2
===========================================
Core            SD-PURR         SD-PURR
                (expected)      (observed)
===========================================
core00 [  0,1]  512000000       136147792       
core01 [  8,9]  512000000       128636784       
core02 [ 16,17] 512000000       135426488       
core03 [ 24,25] 512000000       153027520       

SMT=4
===================================================
Core                    SD-PURR         SD-PURR
                        (expected)      (observed)
===================================================
core00 [  0,1,2,3]      512000000       258331616       
core01 [  8,9,10,11]    512000000       274220072       
core02 [ 16,17,18,19]   512000000       260013736       
core03 [ 24,25,26,27]   512000000       260079672       

SMT=on
===================================================================
Core                                    SD-PURR         SD-PURR
                                        (expected)      (observed)
===================================================================
core00 [  0,1,2,3,4,5,6,7]              512000000       512941248       
core01 [  8,9,10,11,12,13,14,15]        512000000       512936544       
core02 [ 16,17,18,19,20,21,22,23]       512000000       512931544       
core03 [ 24,25,26,27,28,29,30,31]       512000000       512923800

This patchset addresses this issue by ensuring that by default, the
offlined CPUs are returned to the Hypervisor via RTAS "stop-self" call
by changing the default value of "cede_offline_enabled" to false.

With the patches, we see that the observed value of the sum of the
PURR increments across the the online threads of a core in 1-second
matches the number of tb-ticks in 1-second.

SMT=off
===========================================
Core            SD-PURR         SD-PURR
                (expected)      (observed)
===========================================
core00 [  0]    512000000        512527568      
core01 [  8]    512000000        512556128      
core02 [ 16]    512000000        512590016      
core03 [ 24]    512000000        512589440      

SMT=2
===========================================
Core            SD-PURR         SD-PURR
                (expected)      (observed)
===========================================
core00 [  0,1]  512000000       512635328
core01 [  8,9]  512000000       512610416       
core02 [ 16,17] 512000000       512639360       
core03 [ 24,25] 512000000       512638720       

SMT=4
===================================================
Core                    SD-PURR         SD-PURR
                        (expected)      (observed)
===================================================
core00 [  0,1,2,3]      512000000       512757328       
core01 [  8,9,10,11]    512000000       512727920       
core02 [ 16,17,18,19]   512000000       512754712       
core03 [ 24,25,26,27]   512000000       512739040       

SMT=on
==============================================================
Core                               SD-PURR         SD-PURR
                                   (expected)      (observed)
==============================================================
core00 [  0,1,2,3,4,5,6,7]         512000000       512920936    
core01 [  8,9,10,11,12,13,14,15]   512000000       512878728    
core02 [ 16,17,18,19,20,21,22,23]  512000000       512921192    
core03 [ 24,25,26,27,28,29,30,31]  512000000       512924816    

Further, the patch
   gives an improvement of 5% in offlining of a core on POWER8,
   gives an improvement of 18% in offlining of a core on POWER9,
   causes a regression of 2.5% in onlining of a core on POWER8,
   causes a regression of 4.5% in onlining of a core on POWER9.
                 
POWER8
======================================================================
| Operation | Patch status |#Samples|Min |Max  |Median|Avg    |Stddev|
|           |              |        |(ms)|(ms) | (ms) |(ms)   |      |
======================================================================
| Offline   | Without Patch| 20     | 822| 1232| 972  | 986.8 |112.58|
| Offline   | With Patch   | 20     | 831| 1152| 941  | 938.6 | 80.33|
| --------- | -------------|--------|----| ----|------|-------|------|
| Online    | Without Patch| 20     |1460| 1760| 1620 |1591.2 | 82.72|
| Online    | With Patch   | 20     |1489| 1839| 1629 |1629.6 | 94.90|
======================================================================

POWER9
======================================================================
| Operation | Patch status |#Samples|Min |Max  |Median|Avg    |Stddev|
|           |              |        |(ms)|(ms) | (ms) |(ms)   |      |
======================================================================
| Offline   | Without Patch| 20     |1120|1653 | 1394 |1392.9 |133.63|
| Offline   | With Patch   | 20     | 930|1316 | 1161 |1130.8 |117.76|
| --------- | -------------|--------|----| ----|------|-------|------|
| Online    | Without Patch| 20     |1652|2108 | 1903 |1891.6 |130.74|
| Online    | With Patch   | 20     |1824|2222 | 1960 |1976.1 | 93.98|
======================================================================

Gautham R. Shenoy (1):
  pseries/hotplug-cpu: Change default behaviour of cede_offline to "off"

 Documentation/core-api/cpu_hotplug.rst       |  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 12 +++++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

-- 
1.9.4

Reply via email to