** Description changed:

- When thermald updates
- /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
- the kernel is emitting the following message:
+ [SRU Justification][Trusty][Wily]
+ 
+ thermald is triggering the kernel to SPAM the kernel log with frequent
+ "package locked by BIOS, monitoring only" messages.
+  
+ [Fix]
+ This issue is fixed with the following upstream commits:
+ 
+ f1a77c5f3b936ba8a7a63d587a803641974f8e62 ("thd_cdev_rapl: stop writing
+ to sysfs if the write fails (LP: #1543046)")
+ 
+ 833245725494eb26a1c61ca6f1a9db90599ae71b ("Initialize bios_locked to
+ false")
+ 
+ These two fixes have been shown to work on Xenial and apply cleanly to
+ Trusty and Wily versions of thermald.  The risk of regression is low
+ since these fixes add extra sanity checking to the code rather than
+ completely new functionality, and they are upstream commits that have
+ been available in Xenial for some time now.
+ 
+ [Testcase]
+ Run thermald on a system that exposes
+ /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
+ and where the BIOS has this feature locked. Without the fix, the kernel
+ emits this message every time thermald accesses this /sys file.
+ 
+ With the fix, this message only appears once, and no more spamming
+ occurs thereafter.
+ 
+ [Regression Potential]
+ Minimal. The fixes are upstream and have been tested in Xenial for quite a
+ while.  The fixes patch cleanly to Trusty and Wily and result in the same
+ upstream code, so the code paths are identical to that of Xenial's thermald.
+ 
+ ----------------------------------------
+ 
+ 
+ When thermald updates
+ /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
+ the kernel is emitting the following message:
  
  [38458.753468] powercap intel-rapl:0: package locked by BIOS, monitoring only
  [38637.993447] powercap intel-rapl:0: package locked by BIOS, monitoring only
  [38674.154336] powercap intel-rapl:0: package locked by BIOS, monitoring only
  [38691.500619] powercap intel-rapl:0: package locked by BIOS, monitoring only
  
  This message comes from set_power_limit() in
  drivers/powercap/intel_rapl.c because the domain is locked by the BIOS.
  Writing to this interface fails with an error:
  
  
  open("/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw", O_WRONLY) = 3
  write(3, "35000000", 8)                 = -1 ENODATA (No data available)
  
  so in theory thermald should be seeing this failed write and handling it
  appropriately.
  
  cthd_sysfs_cdev_rapl::set_curr_state() and
  cthd_sysfs_cdev_rapl::set_curr_state_raw() in src/thd_cdev_rapl.cpp
  perform the update, and they do check whether the sysfs write fails:
  
-          if (cdev_sysfs.write(tc_state_dev.str(), state_str.str()) < 0)
-                 curr_state = (state == 0) ? 0 : max_state;
+          if (cdev_sysfs.write(tc_state_dev.str(), state_str.str()) < 0)
+                 curr_state = (state == 0) ? 0 : max_state;
  
  However, I believe they should check errno for the failed write and
  disable the RAPL interface if we get ENODATA on this interface, to avoid
  repeated failures and hence repeated spamming of kernel messages.

** Description changed:

  [SRU Justification][Trusty][Wily]
  
- thermald is triggering the kernel to SPAM the kernel log with frequent 
- "package locked by BIOS, monitoring only" messages. 
-  
+ thermald is triggering the kernel to SPAM the kernel log with frequent
+ "package locked by BIOS, monitoring only" messages.
+ 
  [Fix]
  This issue is fixed with the following upstream commits:
  
  f1a77c5f3b936ba8a7a63d587a803641974f8e62 ("thd_cdev_rapl: stop writing
  to sysfs if the write fails (LP: #1543046)")
  
  833245725494eb26a1c61ca6f1a9db90599ae71b ("Initialize bios_locked to
  false")
  
  These two fixes have been shown to work on Xenial and apply cleanly to
  Trusty and Wily versions of thermald.  The risk of regression is low
  since these fixes add extra sanity checking to the code rather than
  completely new functionality, and they are upstream commits that have
  been available in Xenial for some time now.
  
  [Testcase]
  Run thermald on a system that exposes
  /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
  and where the BIOS has this feature locked. Without the fix, the kernel
  emits this message every time thermald accesses this /sys file.
  
  With the fix, this message only appears once, and no more spamming
  occurs thereafter.
  
  [Regression Potential]
  Minimal. The fixes are upstream and have been tested in Xenial for quite a
  while.  The fixes patch cleanly to Trusty and Wily and result in the same
  upstream code, so the code paths are identical to that of Xenial's thermald.
  
  ----------------------------------------
  
- 
- When thermald updates 
- /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
-  the kernel is emitting the following message:
+ When thermald updates
+ /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
+ the kernel is emitting the following message:
  
  [38458.753468] powercap intel-rapl:0: package locked by BIOS, monitoring only
  [38637.993447] powercap intel-rapl:0: package locked by BIOS, monitoring only
  [38674.154336] powercap intel-rapl:0: package locked by BIOS, monitoring only
  [38691.500619] powercap intel-rapl:0: package locked by BIOS, monitoring only
  
  This message comes from set_power_limit() in
  drivers/powercap/intel_rapl.c because the domain is locked by the BIOS.
  Writing to this interface fails with an error:
  
  
  open("/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw", O_WRONLY) = 3
  write(3, "35000000", 8)                 = -1 ENODATA (No data available)
  
  so in theory thermald should be seeing this failed write and handling it
  appropriately.
  
  cthd_sysfs_cdev_rapl::set_curr_state() and
  cthd_sysfs_cdev_rapl::set_curr_state_raw() in src/thd_cdev_rapl.cpp
  perform the update, and they do check whether the sysfs write fails:
  
           if (cdev_sysfs.write(tc_state_dev.str(), state_str.str()) < 0)
                  curr_state = (state == 0) ? 0 : max_state;
  
  However, I believe they should check errno for the failed write and
  disable the RAPL interface if we get ENODATA on this interface, to avoid
  repeated failures and hence repeated spamming of kernel messages.
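
  The suggestion above could be sketched roughly as follows. This is a
  minimal illustration, not the upstream thermald change: RaplLimitWriter
  and SysfsWriter are hypothetical names, and the writer callback is
  assumed to return 0 on success or a negative errno value such as
  -ENODATA on failure. Once ENODATA is seen, the helper remembers that the
  domain is locked and skips further writes, so the kernel has no reason
  to log the message again.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a sysfs write helper (assumption): returns 0 on
// success or a negative errno value (for example -ENODATA) on failure.
using SysfsWriter =
    std::function<int(const std::string &path, const std::string &value)>;

// Hypothetical helper, not thermald's actual class.
class RaplLimitWriter {
public:
    RaplLimitWriter(std::string path, SysfsWriter writer)
        : path_(std::move(path)), writer_(std::move(writer)) {
        assert(static_cast<bool>(writer_));  // a writer callback is required
    }

    // Returns true if the limit was written. Returns false if the write
    // failed or was skipped because the domain is known to be locked.
    bool set_power_limit_uw(unsigned long long limit_uw) {
        if (bios_locked_)
            return false;  // skip the write so the kernel does not log again

        const int ret = writer_(path_, std::to_string(limit_uw));
        if (ret == -ENODATA)
            bios_locked_ = true;  // remember the lock; stop writing from now on
        return ret >= 0;
    }

    bool bios_locked() const { return bios_locked_; }

private:
    std::string path_;
    SysfsWriter writer_;
    bool bios_locked_ = false;  // explicitly initialized to false
};
```

  With a writer that always fails with -ENODATA, only the first call to
  set_power_limit_uw() reaches the writer; later calls return early.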

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1543046

Title:
  thermald spamming kernel log when updating powercap RAPL  powerlimit

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/1543046/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
