TRACE_RESUMEs around this line indicate that it gets as far as radeon_resume, but not as far as radeon_pm_resume:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.35.y.git;a=blob;f=drivers/gpu/drm/radeon/radeon_device.c;h=a7184636dcb494b36d470af30c2b513382d38019;hb=1506707a6c740db316e422239a53ae5df1727591#l798

Adding additional TRACE_RESUMEs earlier in radeon_resume_kms (and using them) seems to make failure less frequent, as if there is a timing issue involved.

Possibly interesting messages on startup:

[drm] initializing kernel modesetting (RV515 0x1002:0x7145).
ATOM BIOS: M54CSP/M52CSP
[drm] Loading R500 Microcode
[drm] Initialized radeon 2.5.0 20080528 for 0000:01:00.0 on minor 0

and during successful resume:

radeon 0000:01:00.0: power state changed by ACPI to D0
radeon 0000:01:00.0: power state changed by ACPI to D0
radeon 0000:01:00.0: setting latency timer to 64
radeon 0000:01:00.0: ffff8800bfba5e00 unpin not necessary
[drm] radeon: 1 quad pipes, 1 z pipes initialized.
[drm] PCIE GART of 512M enabled (table at 0x00040000).
[drm] radeon: ring at 0x0000000008000000
[drm] ring test succeeded in 2 usecs
[drm] ib test succeeded in 0 usecs
ata1: EH complete
[drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 1sec aborting
[drm:atom_execute_table_locked] *ERROR* atombios stuck executing EC38 (len 86, WS 4, PS 0) @ 0xEC6B
PM: resume of devices complete after 1660.086 msecs

http://bugs.freedesktop.org/show_bug.cgi?id=27744

I have radeon.dynclks=1 on the command line. I had intermittent resume failures even without this, though I haven't confirmed that the failure is at the same point.
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility X1400 (prog-if 00 [VGA controller])
    Subsystem: Lenovo Thinkpad T60 model 2007
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 45
    Region 0: Memory at d8000000 (32-bit, prefetchable) [size=128M]
    Region 1: I/O ports at 2000 [size=256]
    Region 2: Memory at ee100000 (32-bit, non-prefetchable) [size=64K]
    [virtual] Expansion ROM at ee120000 [disabled] [size=128K]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v1) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0100c  Data: 4189
    Kernel driver in use: radeon
does this patch help? http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9bd7ef5f5a5ab6088029ad95a435f03e1314275d
With that patch, I'm still seeing the same "stuck in loop for more than 5sec" messages on successful resume, with the same instruction and address. I've only resumed once (successfully), so don't know yet whether failure is less frequent.
The patch in comment 2 hasn't helped. More TRACE_RESUMEs indicate that resume gets as far as radeon_surface_init() from rv515_resume().
The first radeon_bo_get_surface_reg fails to complete. i = 0 at http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.35.y.git;a=blob;f=drivers/gpu/drm/radeon/radeon_device.c;h=a7184636dcb494b36d470af30c2b513382d38019;hb=1506707a6c740db316e422239a53ae5df1727591#l98 I'll try disabling ColorTiling in the server. I had a week of successful resumes while I had no dock or external monitor, so I wonder whether this might be related, though it's difficult to verify given the intermittent nature. I thought I once had a similar resume failure without dock or external monitor, but that would have been after suspending with dock and monitor attached.
Yes, seems that switching off ColorTiling works around the issue.
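For anyone wanting to try the same workaround, a minimal sketch of the configuration follows. It assumes the xf86-video-ati ("radeon") X driver and a Device section in xorg.conf; the Identifier string is a placeholder, not taken from this report.

```
Section "Device"
    # Identifier is a hypothetical name; use whatever your existing config has
    Identifier "Radeon X1400"
    Driver     "radeon"
    # Turn off tiled framebuffer layout in the X driver
    Option     "ColorTiling" "off"
EndSection
```

The X server needs to be restarted for the option to take effect.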
FWIW, I'm still seeing the same "stuck in loop for more than 5sec" messages, so that seems unrelated.
If this is still seen with a modern kernel, please update/re-open. Thanks.