Intel Memory Bandwidth Monitoring (MBM) counters may report system
memory bandwidth incorrectly on some Intel processors. The errata are
reported in erratum SKX99 [1], erratum BDF102 [2] and RDT reference
manual [3].

To work around the errata, MBM total and local readings are corrected
using a correction factor table.

Since the correction factor table is not publicly documented anywhere,
the table and the errata are documented in Documentation/x86/resctrl.rst
for future reference. The resctrl.rst file is renamed from
Documentation/x86/resctrl_ui.rst because the file won't contain user
interface only anymore.

1. Erratum SKX99 in Intel Xeon Processor Scalable Family Specification
   Update:
http://web.archive.org/web/20200716124958/https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
2. Erratum BDF102 in Intel Xeon E5-2600 v4 Processor Product Family
   Specification Update:
http://web.archive.org/web/20191125200531/https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v4-spec-update.pdf
3. The errata in Intel Resource Director Technology (Intel RDT) on 2nd
   Generation Intel Xeon Scalable Processors Reference Manual:
https://software.intel.com/content/www/us/en/develop/articles/intel-resource-director-technology-rdt-reference-manual.html

Suggested-by: Borislav Petkov <b...@alien8.de>
Signed-off-by: Fenghua Yu <fenghua...@intel.com>
Reviewed-by: Tony Luck <tony.l...@intel.com>
---
Change Log:
v3:
- Remove unnecessary conf.py change in patch 1 (Randy).

v2:
- Document the correction factor table and errata in resctrl.rst (Boris).
- Change the documentation URLs to stable archive.org (Tony).

 Documentation/x86/index.rst                   |  2 +-
 .../x86/{resctrl_ui.rst => resctrl.rst}       | 82 +++++++++++++++++++
 2 files changed, 83 insertions(+), 1 deletion(-)
 rename Documentation/x86/{resctrl_ui.rst => resctrl.rst} (92%)

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..49d2fd9f0e5b 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -25,7 +25,7 @@ x86-specific Documentation
    pti
    mds
    microcode
-   resctrl_ui
+   resctrl
    tsx_async_abort
    usb-legacy-support
    i386/index
diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl.rst
similarity index 92%
rename from Documentation/x86/resctrl_ui.rst
rename to Documentation/x86/resctrl.rst
index e59b7b93a9b4..8b8ca6de5e1f 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl.rst
@@ -1209,3 +1209,85 @@ View the llc occupancy snapshot::
 
   # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
   11234000
+
+Intel RDT Errata
+================
+
+Intel MBM Counters May Report System Memory Bandwidth Incorrectly
+-----------------------------------------------------------------
+
+Errata SKX99 for Skylake server and BDF102 for Broadwell server.
+
+Problem: Intel Memory Bandwidth Monitoring (MBM) counters track metrics
+according to the assigned Resource Monitor ID (RMID) for that logical core.
+The IA32_QM_CTR register(MSR 0xC8E), used to report these metrics, may
+report incorrect system bandwidth for certain RMID values.
+
+Implication: Due to the errata, system memory bandwidth may not match
+what is reported.
+
+Workaround: The kernel works around the errata.
+
+MBM total and local readings are corrected by the following correction
+factor table for the errata:
+
++---------------+---------------+---------------+-----------------+
+|core count    |rmid count     |rmid threshold |correction factor|
++---------------+---------------+---------------+-----------------+
+|1             |8              |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|2             |16             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|3             |24             |15             |0.969650         |
++---------------+---------------+---------------+-----------------+
+|4             |32             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|6             |48             |31             |0.969650         |
++---------------+---------------+---------------+-----------------+
+|7             |56             |47             |1.142857         |
++---------------+---------------+---------------+-----------------+
+|8             |64             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|9             |72             |63             |1.185115         |
++---------------+---------------+---------------+-----------------+
+|10            |80             |63             |1.066553         |
++---------------+---------------+---------------+-----------------+
+|11            |88             |79             |1.454545         |
++---------------+---------------+---------------+-----------------+
+|12            |96             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|13            |104            |95             |1.230769         |
++---------------+---------------+---------------+-----------------+
+|14            |112            |95             |1.142857         |
++---------------+---------------+---------------+-----------------+
+|15            |120            |95             |1.066667         |
++---------------+---------------+---------------+-----------------+
+|16            |128            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|17            |136            |127            |1.254863         |
++---------------+---------------+---------------+-----------------+
+|18            |144            |127            |1.185255         |
++---------------+---------------+---------------+-----------------+
+|19            |152            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|20            |160            |127            |1.066667         |
++---------------+---------------+---------------+-----------------+
+|21            |168            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|22            |176            |159            |1.454334         |
++---------------+---------------+---------------+-----------------+
+|23            |184            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|24            |192            |127            |0.969744         |
++---------------+---------------+---------------+-----------------+
+|25            |200            |191            |1.280246         |
++---------------+---------------+---------------+-----------------+
+|26            |208            |191            |1.230921         |
++---------------+---------------+---------------+-----------------+
+|27            |216            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|28            |224            |191            |1.143118         |
++---------------+---------------+---------------+-----------------+
+
+If rmid > rmid threshold, MBM total and local values should be multiplied
+by the correction factor.
-- 
2.28.0

Reply via email to