More data for the problem:
Compaq V4000 (laptop) - Centrino 1.73 GHz.  Max CPU temp 100 C (per Intel).
Using Ubuntu Dapper.  Under high load the system claims the critical
temperature has been reached and halts.  dmesg shows the CPU reached 102 C.
Using powersave and kpowersave.

For me this problem started out of the blue - not following any kernel
change or particular apt-get updates.

I use "watch -n 1 acpitool -tfc" to keep an eye on the system at the
moment.  Noteworthy entries are:

  Fan            : <not available>
  Throttling control     : no
  Limit interface        : no
  critical (S5):           100 C
  passive:                 95 C: tc1=2 tc2=5 tsp=300 devices=0xdffea660

The first three seem wrong for a Centrino -- but I guess that is a
problem with the ACPI interface to the BIOS here.

The fan (there is one) responds autonomously -- probably BIOS
controlled?  So does the above really matter?

When doing something like a kernel compile I see the CPU temp hovering
between 80 and 100 C.  The passive trip kicks in every now and again.

polling_interval was set to 2.  I changed this to 30 and observed that
the CPU temp sometimes spiked to 102, 105 or 107 C, but for no more than
1 second before dropping back below 100.  No instability, so it could
be a glitch?

Even with the 30-second interval, Linux will sometimes read 100+ at a
poll and halt.

My conclusions:

The polling is far too rigid.  Perhaps it should average over a longer
interval, or require a sustained critical temperature before halting
the system (and make this user configurable under /proc/acpi/, as the
rest is).  I like the idea of polling_interval being 2, but my system
would be fine if the kernel only acted on the critical temperature when
the CPU was at 100+ for more than 3 of these intervals.
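
To make the "sustained critical temperature" idea concrete, here is a
rough sketch of the logic I have in mind.  It is only an illustration
and not how the kernel's ACPI thermal code works: sustained_critical,
crit and needed are names I made up, and it simply reads one
temperature (whole degrees C) per line, one line per poll.

```shell
# Sketch only: report "halt" when the reading stays at or above the
# critical threshold for several consecutive polls, otherwise "ok".
# sustained_critical, crit and needed are hypothetical names.
sustained_critical() {
    crit="${1:-100}"     # critical threshold in degrees C (assumed default)
    needed="${2:-3}"     # consecutive hot polls required (assumed default)
    streak=0
    while read -r temp; do
        if [ "$temp" -ge "$crit" ]; then
            streak=$((streak + 1))
        else
            streak=0     # one cooler reading resets the count
        fi
        if [ "$streak" -ge "$needed" ]; then
            echo "halt"
            return 0
        fi
    done
    echo "ok"
}

# A one-poll spike to 102 is ignored:
printf '%s\n' 98 102 97 99 | sustained_critical 100 3   # prints "ok"
```

With readings of 99, 101, 103, 100 the same call prints "halt",
because the last three polls are all at or above 100.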

The passive trip could be wrong, but that depends on the interpretation
of the 100+ spikes.

I currently avoid the problem by changing things to:

  echo 5 > /proc/acpi/thermal_zone/THR0/polling_frequency
  echo 5 > /proc/acpi/thermal_zone/THR1/polling_frequency
  echo "110:102:90:60:50:40" > /proc/acpi/thermal_zone/THR0/trip_points

The 110 attempts to offset the spike (which is rare); the 90 sets the
passive kick-in, which takes the CPU speed down to 1.3 GHz while in the
passive region.
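
The polling_frequency part of the workaround could be applied to every
thermal zone in one go with something like the sketch below.  The
set_polling helper is hypothetical; the base directory defaults to the
path on my machine and may differ elsewhere.  It has to run as root,
and the setting does not survive a reboot.

```shell
# Sketch only: write the same polling value into each thermal zone.
# set_polling is a hypothetical helper, not an existing tool.
set_polling() {
    seconds="$1"                            # value to write, e.g. 5
    base="${2:-/proc/acpi/thermal_zone}"    # assumed location of the zones
    for f in "$base"/*/polling_frequency; do
        [ -w "$f" ] || continue             # skip missing or read-only files
        echo "$seconds" > "$f"
    done
}

# Example (as root):  set_polling 5
```

The trip_points line is left out on purpose, since the right values
depend on the zone.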

Powersave and co. (tried a few) seemed to be doing their job.  (Note:
Klaptop is the only thing that can successfully suspend to RAM for me.)

I believe my hardware (1+ year old, always working) is showing some
minor cracks with the 100+ temp spikes.  But I also think the kernel
could be more forgiving of them.

http://www.columbia.edu/~ariel/acpi/acpi_howto.html was a very good
read.

-- 
laptop overheats when performing CPU intensive tasks.
https://launchpad.net/bugs/22336

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
