On Mon, Jul 03, 2006 at 11:48:08PM +0200, Pavel Machek wrote:
> Important info is "did kernel get control and crashed, or did it lock
> up in firmware". Beeping patch can be used to debug that, as can be
> Linus's RTC hack present in very new kernels.

I tried out netconsole, and it works very well for getting output during
the suspend/resume process.  So I've been trying out 64-bit s2ram while
watching the netconsole, and have some interesting results.

One: 64-bit s2ram actually works on my hardware, at least a little bit.
The normal series of events goes like this:

1. s2ram (with VBE_POST, and the NO64 thing from my previous patch
   disabled).
2. The machine always suspends normally, and the power LED starts blinking.
3. Resume. The HD LED blinks, as does the wireless LED. After a few
   seconds, the HD LED stays on solid. About 30 seconds after that, the
   keyboard capslock LED starts blinking on and off.

   But what happens next varies:
   a. Sometimes the backlight stays off.
   b. Sometimes the backlight is on, but the screen stays black.
   c. ONCE, the machine resumed totally and I saw my normal X desktop.
      The s2ram invocation had produced output while doing VBE_POST,
      similar to the 32-bit output I posted earlier. I was able to type
      a few characters into an open xterm. Then, the machine locked up
      and the capslock LED started blinking.

The output on netconsole during steps 1 - 2 - 3b above looks like this:

Stopping tasks: 
===============================================================================|
pnp: Device 00:0b disabled.
ACPI: PCI interrupt for device 0000:01:04.1 disabled
ACPI: PCI interrupt for device 0000:01:04.0 disabled
bcm43xx: Suspending...
bcm43xx: Radio turned off
bcm43xx: DMA 0x0200 (RX) max used slots: 1/64
bcm43xx: DMA 0x0260 (TX) max used slots: 0/512
bcm43xx: DMA 0x0240 (TX) max used slots: 0/512
bcm43xx: DMA 0x0220 (TX) max used slots: 4/512
bcm43xx: DMA 0x0200 (TX) max used slots: 0/512
ACPI: PCI interrupt for device 0000:01:02.0 disabled
bcm43xx: Device suspended.
bcm43xx: Resuming...
PCI: Enabling device 0000:01:02.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:01:02.0[A] -> Link [LNK3] -> GSI 17 (level, low) -> IRQ 21
PM: Writing back config space on device 0000:01:02.0 at offset f (was 100, writing 10b)
PM: Writing back config space on device 0000:01:02.0 at offset 4 (was 0, writing e0104000)
PM: Writing back config space on device 0000:01:02.0 at offset 3 (was 0, writing 4000)
PM: Writing back config space on device 0000:01:02.0 at offset 1 (was 2, writing 106)
bcm43xx: PHY connected
bcm43xx: Radio turned on
bcm43xx: Chip initialized
bcm43xx: DMA initialized
bcm43xx: 80211 cores initialized
bcm43xx: Keys cleared
bcm43xx: Device resumed.
PM: Writing back config space on device 0000:01:04.0 at offset f (was 34001ff, writing 5c0010b)
PM: Writing back config space on device 0000:01:04.0 at offset e (was 0, writing 34fc)
PM: Writing back config space on device 0000:01:04.0 at offset d (was 0, writing 3400)
PM: Writing back config space on device 0000:01:04.0 at offset c (was 0, writing 30fc)
PM: Writing back config space on device 0000:01:04.0 at offset b (was 0, writing 3000)
PM: Writing back config space on device 0000:01:04.0 at offset a (was 0, writing e07ff000)
PM: Writing back config space on device 0000:01:04.0 at offset 8 (was 0, writing 31fff000)
PM: Writing back config space on device 0000:01:04.0 at offset 6 (was 40000000, writing b0050201)
PM: Writing back config space on device 0000:01:04.0 at offset 3 (was 824008, writing 82a810)
PM: Writing back config space on device 0000:01:04.0 at offset 1 (was 2100107, writing 2100007)
ACPI: PCI Interrupt 0000:01:04.0[A] -> Link [LNK1] -> GSI 19 (level, low) -> IRQ 16
PM: Writing back config space on device 0000:01:04.1 at offset f (was 34002ff, writing 5c0020a)
PM: Writing back config space on device 0000:01:04.1 at offset e (was 0, writing 3cfc)
PM: Writing back config space on device 0000:01:04.1 at offset d (was 0, writing 3c00)
PM: Writing back config space on device 0000:01:04.1 at offset c (was 0, writing 38fc)
PM: Writing back config space on device 0000:01:04.1 at offset b (was 0, writing 3800)
PM: Writing back config space on device 0000:01:04.1 at offset a (was 0, writing e0fff000)
PM: Writing back config space on device 0000:01:04.1 at offset 8 (was 0, writing 33fff000)
PM: Writing back config space on device 0000:01:04.1 at offset 7 (was e1000000, writing 32000000)
PM: Writing back config space on device 0000:01:04.1 at offset 6 (was 40000000, writing b0090601)
PM: Writing back config space on device 0000:01:04.1 at offset 3 (was 824008, writing 82a810)
PM: Writing back config space on device 0000:01:04.1 at offset 1 (was 2100103, writing 2100007)
ACPI: PCI Interrupt 0000:01:04.1[B] -> Link [LNK2] -> GSI 18 (level, low) -> IRQ 17
PM: Writing back config space on device 0000:01:04.2 at offset 4 (was 1, writing 7401)
PM: Writing back config space on device 0000:01:04.2 at offset 3 (was 0, writing 4010)
PM: Writing back config space on device 0000:01:04.2 at offset 1 (was 2100000, writing 2100107)
PM: Writing back config space on device 0000:0a:00.0 at offset f (was 1050100, writing 105010b)
PM: Writing back config space on device 0000:0a:00.0 at offset 6 (was 8, writing f8000008)
PM: Writing back config space on device 0000:0a:00.0 at offset 5 (was 8, writing f0000008)
PM: Writing back config space on device 0000:0a:00.0 at offset 4 (was 0, writing e2000000)
PM: Writing back config space on device 0000:0a:00.0 at offset 3 (was 0, writing 4000)
PM: Writing back config space on device 0000:0a:00.0 at offset 1 (was 2b00000, writing 2b00007)
pnp: Res cnt 3
pnp: res cnt 3
pnp: Encode io
pnp: Encode io
pnp: Encode irq
pnp: Failed to activate device 00:08.
pnp: Res cnt 1
pnp: res cnt 1
pnp: Encode irq
pnp: Failed to activate device 00:09.
pnp: Res cnt 4
pnp: res cnt 4
pnp: Encode io
pnp: Encode io
pnp: Encode irq
pnp: Encode dma
pnp: Device 00:0b activated.
SoftMAC: Open Authentication completed with 00:12:17:3a:e2:c7
Restarting tasks...<6>bcm43xx: set security called enabled = 0 encrypt = 0
bcm43xx: set security called enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
 done
bcm43xx: set security called active_key = 0 level = 4 enabled = 1 encrypt = 1
bcm43xx: set security called enabled = 1 encrypt = 1
hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error

HARDWARE ERROR
CPU 0: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 136e4622cf
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check
 <4>atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.


After the machine check, those atkbd.c messages appear about once every
second.
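
For anyone who wants to compare the "PM: Writing back config space"
lines between runs, here is a rough Python sketch that pulls them out of
a saved netconsole log. The regular expression is written against the
message text quoted above, not taken from the kernel source, and the
name parse_pm_writes is made up for illustration. It assumes the
"offset" field counts 32-bit dwords, so offset f would be byte 0x3c
(the interrupt line/pin register) and offset 1 would be byte 0x04
(command/status).

```python
import re

# Pattern written against the log text quoted in this mail (an assumption,
# not copied from kernel source).
PM_RE = re.compile(
    r"PM: Writing back config space on device (\S+) at offset ([0-9a-f]+) "
    r"\(was ([0-9a-f]+), writing ([0-9a-f]+)\)"
)

def parse_pm_writes(log_text):
    """Return one dict per config-space write-back found in log_text."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = re.sub(r"\s+", " ", log_text)
    writes = []
    for dev, off, old, new in PM_RE.findall(flat):
        writes.append({
            "device": dev,
            # Assumption: the offset is a dword index, so multiply by 4.
            "byte_offset": int(off, 16) * 4,
            "was": int(old, 16),
            "now": int(new, 16),
        })
    return writes

# Two messages copied from the log above, including the line wrap.
sample = """PM: Writing back config space on device 0000:01:02.0 at offset f (was 100,
writing 10b)
PM: Writing back config space on device 0000:01:02.0 at offset 1 (was 2,
writing 106)"""

for w in parse_pm_writes(sample):
    print(f"{w['device']} reg 0x{w['byte_offset']:02x}: "
          f"0x{w['was']:08x} -> 0x{w['now']:08x}")
```

Diffing this output from a good resume (case c) against a bad one might
show whether the restored registers differ at all.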

Running that mce through 'mcelog --ascii' as the message suggests gives
this:

$ mcelog --ascii < mce
HARDWARE ERROR
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 136e4622cf
  Northbridge Watchdog error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'generic participation, request timed out
      generic error mem transaction
      generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check
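
As a cross-check on the mcelog output, the flag bits can be read
straight out of the STATUS value. This is a minimal Python sketch,
assuming the standard x86 MCi_STATUS layout: bits 63..57 are VAL, OVER,
UC, EN, MISCV, ADDRV and PCC, bits 15..0 hold the MCA error code, and
bits 31..16 are model specific. The name decode_status is made up for
illustration, and the sketch does not try to interpret the
model-specific field.

```python
# Architectural flag bits in the top of an x86 MCi_STATUS register
# (assumed standard layout; see the lead-in).
STATUS_FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (error uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = [name
             for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15..0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31..16
    }

# STATUS value from the machine check above.
decoded = decode_status(0xb200000000070f0f)
for name in decoded["flags"]:
    print(name)
print(f"MCA error code:      0x{decoded['mca_error_code']:04x}")
print(f"model-specific code: 0x{decoded['model_specific_code']:04x}")
```

The UC and PCC bits it reports are the same bit61 and bit57 that mcelog
prints.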

Anyone got a clue how to proceed from here?

Jason

_______________________________________________
Suspend-devel mailing list
Suspend-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/suspend-devel