On Mon, Jul 03, 2006 at 11:48:08PM +0200, Pavel Machek wrote:
> Important info is "did kernel get control and crashed, or did it lock
> up in firmware". Beeping patch can be used to debug that, as can be
> Linus's RTC hack present in very new kernels.

I tried out netconsole, and it works very well for getting output
during the suspend/resume process. So I've been trying out 64-bit
s2ram while watching the netconsole, and have some interesting results.

One: 64-bit s2ram actually works on my hardware, at least a little bit.

The normal series of events goes like this:

1. s2ram (with VBE_POST, and the NO64 thing from my previous patch
   disabled).
2. machine always suspends normally, power LED starts blinking.
3. resume. HD LED blinks, as well as wireless. After a few seconds,
   the HD LED stays on solid. About 30 seconds after that, the
   keyboard capslock LED starts blinking on and off. But what happens
   next seems to be variable.
   a. sometimes, the backlight stays off.
   b. sometimes the backlight is on, but the screen stays black.
   c. ONCE, the machine resumed totally and I saw my normal X desktop.
      The s2ram invocation had produced output while doing VBE_POST,
      similar to the 32-bit output I posted earlier. I was able to
      type a few characters into an open xterm. Then, the machine
      locked up and the capslock LED started blinking.

The output on netconsole during steps 1 - 2 - 3b above looks like this:

Stopping tasks: ===============================================================================|
pnp: Device 00:0b disabled.
ACPI: PCI interrupt for device 0000:01:04.1 disabled
ACPI: PCI interrupt for device 0000:01:04.0 disabled
bcm43xx: Suspending...
bcm43xx: Radio turned off
bcm43xx: DMA 0x0200 (RX) max used slots: 1/64
bcm43xx: DMA 0x0260 (TX) max used slots: 0/512
bcm43xx: DMA 0x0240 (TX) max used slots: 0/512
bcm43xx: DMA 0x0220 (TX) max used slots: 4/512
bcm43xx: DMA 0x0200 (TX) max used slots: 0/512
ACPI: PCI interrupt for device 0000:01:02.0 disabled
bcm43xx: Device suspended.
bcm43xx: Resuming...
PCI: Enabling device 0000:01:02.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:01:02.0[A] -> Link [LNK3] -> GSI 17 (level, low) -> IRQ 21
PM: Writing back config space on device 0000:01:02.0 at offset f (was 100, writing 10b)
PM: Writing back config space on device 0000:01:02.0 at offset 4 (was 0, writing e0104000)
PM: Writing back config space on device 0000:01:02.0 at offset 3 (was 0, writing 4000)
PM: Writing back config space on device 0000:01:02.0 at offset 1 (was 2, writing 106)
bcm43xx: PHY connected
bcm43xx: Radio turned on
bcm43xx: Chip initialized
bcm43xx: DMA initialized
bcm43xx: 80211 cores initialized
bcm43xx: Keys cleared
bcm43xx: Device resumed.
PM: Writing back config space on device 0000:01:04.0 at offset f (was 34001ff, writing 5c0010b)
PM: Writing back config space on device 0000:01:04.0 at offset e (was 0, writing 34fc)
PM: Writing back config space on device 0000:01:04.0 at offset d (was 0, writing 3400)
PM: Writing back config space on device 0000:01:04.0 at offset c (was 0, writing 30fc)
PM: Writing back config space on device 0000:01:04.0 at offset b (was 0, writing 3000)
PM: Writing back config space on device 0000:01:04.0 at offset a (was 0, writing e07ff000)
PM: Writing back config space on device 0000:01:04.0 at offset 8 (was 0, writing 31fff000)
PM: Writing back config space on device 0000:01:04.0 at offset 6 (was 40000000, writing b0050201)
PM: Writing back config space on device 0000:01:04.0 at offset 3 (was 824008, writing 82a810)
PM: Writing back config space on device 0000:01:04.0 at offset 1 (was 2100107, writing 2100007)
ACPI: PCI Interrupt 0000:01:04.0[A] -> Link [LNK1] -> GSI 19 (level, low) -> IRQ 16
PM: Writing back config space on device 0000:01:04.1 at offset f (was 34002ff, writing 5c0020a)
PM: Writing back config space on device 0000:01:04.1 at offset e (was 0, writing 3cfc)
PM: Writing back config space on device 0000:01:04.1 at offset d (was 0, writing 3c00)
PM: Writing back config space on device 0000:01:04.1 at offset c (was 0, writing 38fc)
PM: Writing back config space on device 0000:01:04.1 at offset b (was 0, writing 3800)
PM: Writing back config space on device 0000:01:04.1 at offset a (was 0, writing e0fff000)
PM: Writing back config space on device 0000:01:04.1 at offset 8 (was 0, writing 33fff000)
PM: Writing back config space on device 0000:01:04.1 at offset 7 (was e1000000, writing 32000000)
PM: Writing back config space on device 0000:01:04.1 at offset 6 (was 40000000, writing b0090601)
PM: Writing back config space on device 0000:01:04.1 at offset 3 (was 824008, writing 82a810)
PM: Writing back config space on device 0000:01:04.1 at offset 1 (was 2100103, writing 2100007)
ACPI: PCI Interrupt 0000:01:04.1[B] -> Link [LNK2] -> GSI 18 (level, low) -> IRQ 17
PM: Writing back config space on device 0000:01:04.2 at offset 4 (was 1, writing 7401)
PM: Writing back config space on device 0000:01:04.2 at offset 3 (was 0, writing 4010)
PM: Writing back config space on device 0000:01:04.2 at offset 1 (was 2100000, writing 2100107)
PM: Writing back config space on device 0000:0a:00.0 at offset f (was 1050100, writing 105010b)
PM: Writing back config space on device 0000:0a:00.0 at offset 6 (was 8, writing f8000008)
PM: Writing back config space on device 0000:0a:00.0 at offset 5 (was 8, writing f0000008)
PM: Writing back config space on device 0000:0a:00.0 at offset 4 (was 0, writing e2000000)
PM: Writing back config space on device 0000:0a:00.0 at offset 3 (was 0, writing 4000)
PM: Writing back config space on device 0000:0a:00.0 at offset 1 (was 2b00000, writing 2b00007)
pnp: Res cnt 3
pnp: res cnt 3
pnp: Encode io
pnp: Encode io
pnp: Encode irq
pnp: Failed to activate device 00:08.
pnp: Res cnt 1
pnp: res cnt 1
pnp: Encode irq
pnp: Failed to activate device 00:09.
pnp: Res cnt 4
pnp: res cnt 4
pnp: Encode io
pnp: Encode io
pnp: Encode irq
pnp: Encode dma
pnp: Device 00:0b activated.
SoftMAC: Open Authentication completed with 00:12:17:3a:e2:c7
Restarting tasks...<6>bcm43xx: set security called enabled = 0 encrypt = 0
bcm43xx: set security called enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
bcm43xx: set security called level = 0 enabled = 0 encrypt = 0
 done
bcm43xx: set security called active_key = 0 level = 4 enabled = 1 encrypt = 1
bcm43xx: set security called enabled = 1 encrypt = 1
hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
HARDWARE ERROR
CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
TSC 136e4622cf
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check
<4>atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.

After the machine check, those atkbd.c messages appear about once
every second. Running that mce through 'mcelog --ascii' as the
message suggests gives this:

$ mcelog --ascii < mce
HARDWARE ERROR
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 136e4622cf
  Northbridge Watchdog error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'generic participation, request timed out
      generic error mem transaction
      generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check

Anyone got a clue how to proceed from here?

Jason

_______________________________________________
Suspend-devel mailing list
Suspend-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/suspend-devel
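
For anyone who wants to cross-check the mcelog decode by hand, here is a
minimal Python sketch that picks apart the STATUS word from the panic
above. The flag names for bits 63..57 follow the generic machine-check
status register layout; treating bits 19:16 as the northbridge extended
error code is my assumption about the K8 layout, and the helper name
decode_mce_status is made up for illustration. It is not a replacement
for mcelog.

```python
# Generic flag bits in the top byte of a machine-check status word.
STATUS_FLAGS = {
    63: "valid",
    62: "overflow",
    61: "error uncorrected",
    60: "error reporting enabled",
    59: "misc register valid",
    58: "address register valid",
    57: "processor context corrupt",
}


def decode_mce_status(status):
    """Split a 64-bit MCE status word into flag names and raw code fields."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        # Low 16 bits: the error code (the 'bus error' part in mcelog output).
        "error_code": status & 0xFFFF,
        # Bits 19:16: extended error code (assumed K8 northbridge layout).
        "extended_code": (status >> 16) & 0xF,
    }


# STATUS value copied from the netconsole output above.
info = decode_mce_status(0xB200000000070F0F)
for flag in info["flags"]:
    print(flag)
print("error code: %#06x" % info["error_code"])
print("extended code: %d" % info["extended_code"])
```

The set bits come out as 63, 61, 60 and 57, which agrees with the
"bit57 = processor context corrupt" and "bit61 = error uncorrected"
lines mcelog printed.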