** Description changed:

  [Impact]
  Some CPU sensors are not enumerated, which can make thermald deviate from the correct behavior of the CPU TDP.
  
  [Fix]
  Traverse all sensors under the hwmon sysfs directory to make sure everything is enumerated.
  
  [Test]
  Check the output of thermald. Once the fix is in place, thermal zones that were previously omitted now show up:
  [INFO]Zone 1: AMBF, Active:1 Bind:1 Sensor_cnt:1
  To do so
  0. get a large machine which will have more thermal zones
  1. stop the potentially auto-running service
-    $ systemctl stop thermald
+    $ systemctl stop thermald
  2. run the daemon in foreground with loglevel to see what is going on.
-    On many modern systems (=the large ones) it might not know the CPUid,
-    to bypass that for the test you can ask it to ignore the check
-    $ sudo thermald --no-daemon --loglevel=info --ignore-cpuid-check
+    On many modern systems (=the large ones) it might not know the CPUid,
+    to bypass that for the test you can ask it to ignore the check
+    $ sudo thermald --no-daemon --loglevel=info --ignore-cpuid-check
  3. check the output
-    On init the system will be probed and that will show something like:
+    On init the system will be probed and that will show something like:
  
  ...
-  ZONE DUMP BEGIN
+  ZONE DUMP BEGIN
  [1718954645][INFO]Zone 2: cpu, Active:1 Bind:0 Sensor_cnt:1
  ...
  [1718954645][INFO]Zone 3: cpu, Active:1 Bind:0 Sensor_cnt:1
  ...
-  ZONE DUMP END
+  ZONE DUMP END
  
  On systems with many thermal zones, one would see only a few zones here
  before the fix, and more zones with the fix.
  
- 
  [Where problems could occur]
  Since the new logic traverses the whole hwmon sysfs directory, startup can take slightly longer.
+ 
+ [racb] Existing users' systems may have bad or otherwise irrelevant or
+ out of scope sensors that may not have been causing misbehaviour due to
+ being skipped, but after the fix, they would face a regression. I'm not
+ sure that we can realistically identify such cases though, and it seems
+ reasonable to favour correct systems over misbehaving ones.
+ 
+ [racb] We may pick up additional sensor data that we shouldn't do due to
+ inadequate filtering, causing incorrect behaviour.

** Description changed:

  [Impact]
  Some CPU sensors are not enumerated, which can make thermald deviate from the correct behavior of the CPU TDP.
  
  [Fix]
  Traverse all sensors under the hwmon sysfs directory to make sure everything is enumerated.
  
  [Test]
  Check the output of thermald. Once the fix is in place, thermal zones that were previously omitted now show up:
  [INFO]Zone 1: AMBF, Active:1 Bind:1 Sensor_cnt:1
  To do so
  0. get a large machine which will have more thermal zones
  1. stop the potentially auto-running service
     $ systemctl stop thermald
  2. run the daemon in foreground with loglevel to see what is going on.
     On many modern systems (=the large ones) it might not know the CPUid,
     to bypass that for the test you can ask it to ignore the check
     $ sudo thermald --no-daemon --loglevel=info --ignore-cpuid-check
  3. check the output
     On init the system will be probed and that will show something like:
  
  ...
   ZONE DUMP BEGIN
  [1718954645][INFO]Zone 2: cpu, Active:1 Bind:0 Sensor_cnt:1
  ...
  [1718954645][INFO]Zone 3: cpu, Active:1 Bind:0 Sensor_cnt:1
  ...
   ZONE DUMP END
  
  On systems with many thermal zones, one would see only a few zones here
  before the fix, and more zones with the fix.
  
  [Where problems could occur]
  Since the new logic traverses the whole hwmon sysfs directory, startup can take slightly longer.
  
  [racb] Existing users' systems may have bad or otherwise irrelevant or
  out of scope sensors that may not have been causing misbehaviour due to
  being skipped, but after the fix, they would face a regression. I'm not
  sure that we can realistically identify such cases though, and it seems
  reasonable to favour correct systems over misbehaving ones.
  
- [racb] We may pick up additional sensor data that we shouldn't do due to
- inadequate filtering, causing incorrect behaviour.
+ [racb] Similar to my previous point, we may pick up additional sensor
+ data that we shouldn't do due to inadequate filtering, causing incorrect
+ behaviour but this time it would be a bug in our filtering rather than
+ misbehaving existing systems.

** Description changed:

  [Impact]
  Some CPU sensors are not enumerated, which can make thermald deviate from the correct behavior of the CPU TDP.
  
  [Fix]
  Traverse all sensors under the hwmon sysfs directory to make sure everything is enumerated.
  
  [Test]
  Check the output of thermald. Once the fix is in place, thermal zones that were previously omitted now show up:
  [INFO]Zone 1: AMBF, Active:1 Bind:1 Sensor_cnt:1
  To do so
  0. get a large machine which will have more thermal zones
  1. stop the potentially auto-running service
     $ systemctl stop thermald
  2. run the daemon in foreground with loglevel to see what is going on.
     On many modern systems (=the large ones) it might not know the CPUid,
     to bypass that for the test you can ask it to ignore the check
     $ sudo thermald --no-daemon --loglevel=info --ignore-cpuid-check
  3. check the output
     On init the system will be probed and that will show something like:
  
  ...
   ZONE DUMP BEGIN
  [1718954645][INFO]Zone 2: cpu, Active:1 Bind:0 Sensor_cnt:1
  ...
  [1718954645][INFO]Zone 3: cpu, Active:1 Bind:0 Sensor_cnt:1
  ...
   ZONE DUMP END
  
  On systems with many thermal zones, one would see only a few zones here
  before the fix, and more zones with the fix.
  
  [Where problems could occur]
  Since the new logic traverses the whole hwmon sysfs directory, startup can take slightly longer.
  
  [racb] Existing users' systems may have bad or otherwise irrelevant or
  out of scope sensors that may not have been causing misbehaviour due to
  being skipped, but after the fix, they would face a regression. I'm not
  sure that we can realistically identify such cases though, and it seems
  reasonable to favour correct systems over misbehaving ones.
  
  [racb] Similar to my previous point, we may pick up additional sensor
  data that we shouldn't do due to inadequate filtering, causing incorrect
  behaviour but this time it would be a bug in our filtering rather than
- misbehaving existing systems.
+ misbehaving existing systems. In mitigation, I see that the fixed
+ version has been released in Kinetic, so has had some real world
+ testing, and I see no indication upstream or in Launchpad that this was
+ a problem in practice.
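
The comparison in step 3 of [Test] (fewer zones before the fix, more after) could be scripted roughly as below. This is only a sketch: it assumes the zone lines look exactly like the excerpt in the description, and the names parse_zone_dump, newly_visible_zones and SAMPLE_LOG are made up for illustration and are not part of thermald.

```python
import re

# Pattern derived from the log excerpt in the bug description, e.g.
# "[1718954645][INFO]Zone 2: cpu, Active:1 Bind:0 Sensor_cnt:1".
# Assumption: other thermald versions may format these lines differently.
ZONE_RE = re.compile(r"Zone (\d+): ([^,]+), Active:\d+ Bind:\d+ Sensor_cnt:(\d+)")

# Sample taken from the description; elided parts are kept as "...".
SAMPLE_LOG = """\
...
 ZONE DUMP BEGIN
[1718954645][INFO]Zone 2: cpu, Active:1 Bind:0 Sensor_cnt:1
...
[1718954645][INFO]Zone 3: cpu, Active:1 Bind:0 Sensor_cnt:1
...
 ZONE DUMP END
"""


def parse_zone_dump(log_text):
    """Return (zone_id, zone_name, sensor_count) tuples for every zone line
    found between 'ZONE DUMP BEGIN' and 'ZONE DUMP END'."""
    zones = []
    in_dump = False
    for line in log_text.splitlines():
        if "ZONE DUMP BEGIN" in line:
            in_dump = True
        elif "ZONE DUMP END" in line:
            in_dump = False
        elif in_dump:
            match = ZONE_RE.search(line)
            if match:
                zones.append(
                    (int(match.group(1)), match.group(2).strip(), int(match.group(3)))
                )
    return zones


def newly_visible_zones(before, after):
    """Return zones present in 'after' whose (id, name) pair is absent from 'before'."""
    seen = {(zone_id, name) for zone_id, name, _ in before}
    return [zone for zone in after if (zone[0], zone[1]) not in seen]
```

To use it, one would save the foreground thermald output from the unfixed and the fixed package to two files, run parse_zone_dump on each, and pass both results to newly_visible_zones; a non-empty result would suggest that previously omitted zones are now enumerated.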

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2054391

Title:
  Fix  CPU thermal sensors enumeration

To manage notifications about this bug go to:
https://bugs.launchpad.net/hwe-next/+bug/2054391/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
