BBlack has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/362438 )

Change subject: [WIP] numa_networking: new state "isolate"
......................................................................

[WIP] numa_networking: new state "isolate"

This causes kernel cmdline params to be added (taking effect on
future reboots) which isolate the CPUs in the primary interface's
NUMA node, reserving them exclusively for tasks which are explicitly
configured there (e.g. via cset, taskset, and/or numactl).
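
As a rough illustration of how the isolcpus value is derived, here is a
minimal Ruby sketch that mirrors the join(sort(flatten(...)), ',')
expression used in this change. The shape of the numa fact (a list of
hyperthread-sibling lists per device), the example CPU numbers, and the
helper name isolcpus_value are assumptions made for illustration and are
not taken from real fact output. Unlike the Puppet expression, the sketch
also converts entries to integers and drops duplicates.

```ruby
# Build the comma-separated CPU list for the isolcpus= boot parameter.
# ASSUMPTION: facts['numa']['device_to_htset'][iface] is a nested array of
# CPU ids (one inner array per set of hyperthread siblings).
def isolcpus_value(facts)
  iface = facts.fetch('interface_primary')
  htsets = facts.fetch('numa').fetch('device_to_htset').fetch(iface)
  # Flatten the sibling sets, normalise to integers, then sort numerically.
  htsets.flatten.map { |cpu| Integer(cpu) }.uniq.sort.join(',')
end

# Hypothetical fact data for a host whose primary NIC sits on one NUMA node.
example_facts = {
  'interface_primary' => 'eth0',
  'numa' => {
    'device_to_htset' => {
      'eth0' => [[0, 24], [2, 26], [4, 28]],
    },
  },
}

puts "isolcpus=#{isolcpus_value(example_facts)}"
```

On the example data above this prints isolcpus=0,2,4,24,26,28.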

TODOs:
  1. Sort out the writeback cpumask handling (moving disk writeback off
     the isolated node)
  2. Make it safer: ideally, via facts, we should recognize the case
     where "isolate" doesn't make sense because it would isolate every
     CPU in the system.  Right now the only protection is "user should
     not set this parameter on hardware it doesn't make sense on".
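
The safety check described in TODO item 2 could look roughly like the
following Ruby sketch. It is not part of this change: the helper name
isolate_safe? and the explicit all_cpus argument are hypothetical, and
how the full CPU list would be obtained from facts is left open.

```ruby
require 'set'

# Returns true only when isolating the given CPUs would still leave at
# least one CPU available for ordinary scheduling.
# isolated_cpus: (possibly nested) array of CPU ids to isolate.
# all_cpus: flat array of every CPU id on the host (ASSUMED to be available).
def isolate_safe?(isolated_cpus, all_cpus)
  isolated = isolated_cpus.flatten.to_set
  everything = all_cpus.to_set
  # Reject an empty set (nothing to isolate) and a set covering every CPU.
  !isolated.empty? && isolated.proper_subset?(everything)
end

all_cpus = (0...8).to_a
puts isolate_safe?([[0, 4], [1, 5]], all_cpus) # NIC attached to one of two nodes
puts isolate_safe?([all_cpus], all_cpus)       # single-node host: would isolate everything
```

With a guard of this kind, "isolate" could fall back to plain NUMA-aware
behaviour (or fail the catalog) on hardware where every CPU belongs to the
interface's node.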

Change-Id: I11027be1b9bcb66bf82dba0cf69c9c034a1d114e
---
M hieradata/hosts/cp4021.yaml
M manifests/realm.pp
M modules/interface/manifests/rps/modparams.pp
M modules/interface/templates/interface-rps-config.erb
M modules/profile/manifests/base.pp
M modules/tlsproxy/manifests/instance.pp
6 files changed, 26 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/38/362438/1

diff --git a/hieradata/hosts/cp4021.yaml b/hieradata/hosts/cp4021.yaml
index 84968de..4c422e3 100644
--- a/hieradata/hosts/cp4021.yaml
+++ b/hieradata/hosts/cp4021.yaml
@@ -1,2 +1,2 @@
 bbr_congestion_control: true
-numa_networking: true
+numa_networking: isolate
diff --git a/manifests/realm.pp b/manifests/realm.pp
index 900ca27..383a960 100644
--- a/manifests/realm.pp
+++ b/manifests/realm.pp
@@ -60,7 +60,15 @@
 }
 
 # Hiera->Global to configure various classes for NUMA-aware networking
-$numa_networking = hiera('numa_networking', false)
+# 3 possible values:
+# off: default, no NUMA awareness
+# on: try to confine network stuff to the NUMA node of the adapter
+# isolate: also exclude all other tasks from the NUMA node of the adapter
+#   Note that "isolate" will probably be dysfunctional on nodes which do not
+#   have true multi-node NUMA hardware with the relevant interface(s) attached
+#   to less than all nodes!  Therefore setting 'isolate' should only be done in
+#   cases with known-compatible hardware.
+$numa_networking = hiera('numa_networking', 'off')
 
 # TODO: create hash of all LVS service IPs
 
diff --git a/modules/interface/manifests/rps/modparams.pp b/modules/interface/manifests/rps/modparams.pp
index a5fc7d0..b020730 100644
--- a/modules/interface/manifests/rps/modparams.pp
+++ b/modules/interface/manifests/rps/modparams.pp
@@ -1,7 +1,7 @@
 class interface::rps::modparams {
     include initramfs
 
-    if $::numa_networking {
+    if $::numa_networking != 'off' {
         # note this assumes if bnx2x queue counts matter at all, that the
         # primary interface is bnx2x.  This is true for current cases, but may
         # need to evolve later for hosts with multiple interfaces with distinct
diff --git a/modules/interface/templates/interface-rps-config.erb b/modules/interface/templates/interface-rps-config.erb
index a1c9e80..3c6892e 100644
--- a/modules/interface/templates/interface-rps-config.erb
+++ b/modules/interface/templates/interface-rps-config.erb
@@ -1,4 +1,4 @@
 [Options]
 <% if @rss_pattern != '' %>rss_pattern = <%= @rss_pattern %><% end %>
 <% if @qdisc != '' %>qdisc = <%= @qdisc %><% end %>
-<%- if @numa_networking %>numa_filter = yes<% end -%>
+<%- if @numa_networking != 'off' %>numa_filter = yes<% end -%>
diff --git a/modules/profile/manifests/base.pp b/modules/profile/manifests/base.pp
index 280f629..4e4591d 100644
--- a/modules/profile/manifests/base.pp
+++ b/modules/profile/manifests/base.pp
@@ -106,4 +106,17 @@
             source => 'puppet:///modules/base/logrotate/upstart',
         }
     }
+
+    if $::numa_networking == 'isolate' {
+        grub::bootparam { 'isolcpus':
+            value => join(sort(flatten($facts['numa']['device_to_htset'][$facts['interface_primary']])), ',')
+        }
+        # XXX TODO: move disk writeback off the isolated node, needs inverted (or opposite-node) cpumask...
+        # sysfs::parameters { 'cache_numa_isolate':
+        #     values => {
+        #         'bus/workqueue/devices/writeback/numa'    => 0,
+        #         'bus/workqueue/devices/writeback/cpumask' => XXX,
+        #     }
+        # }
+    }
 }
diff --git a/modules/tlsproxy/manifests/instance.pp b/modules/tlsproxy/manifests/instance.pp
index 959255d..57a7e11 100644
--- a/modules/tlsproxy/manifests/instance.pp
+++ b/modules/tlsproxy/manifests/instance.pp
@@ -27,7 +27,7 @@
     # otherwise use 'lo' for this purpose.  Assumes NUMA data has "lo" interface
     # mapped to all cpu cores in the non-NUMA case.  The numa_iface variable is
     # in turn consumed by the systemd unit and config templates.
-    if $::numa_networking {
+    if $::numa_networking != 'off' {
         $numa_iface = $facts['interface_primary']
     } else {
         $numa_iface = 'lo'

-- 
To view, visit https://gerrit.wikimedia.org/r/362438
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I11027be1b9bcb66bf82dba0cf69c9c034a1d114e
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: BBlack <bbl...@wikimedia.org>
