Bug 5671

Summary: pci=routeirq required for successful resume from swsusp
Product: ACPI
Reporter: Andrew Duggan (cmkrnl)
Component: Power-Sleep-Wake
Assignee: acpi_power-sleep-wake
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: pavel, tiwai
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.14.3
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: console screenshot softlockup resuming from runlevel 3 (no fb)
console screenshot softlockup resuming from rl 5 no fb
Config Options
lspci -vvv
dmesg from boot before the suspend
screenshot with the ACPI patches from bz #3469
a dmesg from a successful resume using pci=routeirq
output of lsmod
Screenshot hung resume "CONFIG_DETECT_SOFTLOCKUP is not set"

Description Andrew Duggan 2005-11-28 07:50:47 UTC
Most recent kernel where this bug did not occur: 2.6.12
Distribution: Fedora Core 4
Hardware Environment: Gateway 450 Laptop
Software Environment: Runlevel 3 or 5
Problem Description:
Softlockup and system hang occurs on resume from swsusp if pci=routeirq is
omitted from the kernel command line.
Steps to reproduce:
1) boot kernel ro root=LABEL=/ panic=10 audit=1 3
2) login as root
3) remove the usb modules
4) echo "platform" >/sys/power/disk
5) echo "disk" >/sys/power/state
...

power on the system

BUG: softlockup detected on CPU#0.
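
Steps 4 and 5 above can be scripted so that every test cycle is identical. The following is only a minimal Python sketch and not part of the original report: the function name suspend_to_disk and its sysfs_root and dry_run parameters are made up for illustration. It assumes it is run as root after the USB modules have already been removed.

```python
from pathlib import Path


def suspend_to_disk(sysfs_root="/sys", dry_run=True):
    """Return the (path, value) writes for steps 4 and 5.

    The writes are only performed when dry_run is False.
    """
    power = Path(sysfs_root) / "power"
    writes = [
        (power / "disk", "platform"),  # step 4: echo "platform" >/sys/power/disk
        (power / "state", "disk"),     # step 5: echo "disk" >/sys/power/state
    ]
    if not dry_run:
        for path, value in writes:
            # echo appends a newline, so do the same here
            path.write_text(value + "\n")
    return [(str(path), value) for path, value in writes]
```

Calling suspend_to_disk() with the defaults only reports what would be written; passing dry_run=False on a real system starts the suspend. The sysfs_root parameter exists so the helper can be pointed at a scratch directory instead of /sys.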

-----------------
I am attaching the kernel config, output of lspci -vvv and a dmesg of the boot
prior to the swsusp, and a dmidecode FWIW.

I have a screen shot of the softlockup, which I will upload.

If I boot with pci=routeirq it resumes from swsusp successfully every time.
Without pci=routeirq it hangs every time.
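
Since the outcome depends entirely on the boot option, it helps to record which case a given boot was in. This is a hedged Python sketch rather than anything from the report: has_pci_option is a hypothetical helper name, and it assumes the pci= argument takes a comma-separated list of options.

```python
def has_pci_option(cmdline, option="routeirq"):
    """Return True if a pci= argument on the kernel command line lists the option."""
    for arg in cmdline.split():
        if arg.startswith("pci="):
            # pci= is assumed to carry comma-separated options, e.g. pci=a,b
            if option in arg[len("pci="):].split(","):
                return True
    return False
```

On a running system the command line can be read from /proc/cmdline, for example has_pci_option(open("/proc/cmdline").read()).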

The kernel in question is the Fedora kernel, but without the ACPI patches that it
normally carries.  The behaviour is the same regardless.

I am also going to attach a slightly blurry screenshot of the softlockup when the
ACPI patches from bug #3469 were applied to a 2.6.14.2 kernel.  Those were
http://bugzilla.kernel.org/attachment.cgi?id=6403
http://bugzilla.kernel.org/attachment.cgi?id=6404

This, I think, started happening in 2.6.13, but it was mitigated by not having
PREEMPT_VOLUNTARY set. That was probably dumb luck, though; in 2.6.14 any preempt
option would hang at least 50% of the time.

I am not sure this is the right section, but since the pci=routeirq option
fixes the problem...   It is interesting, since AFAICT pci=routeirq makes only
one tiny change in behaviour, which is this loop in arch/i386/pci/acpi.c:

while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
	acpi_pci_irq_enable(dev);

FWIW, although the screenshot does not demonstrate this, most of the
softlockups I have seen have had some sort of console (video) output in the
trace, e.g. fb_flashcursor or console_callback.

I suppose it could be any device... 

Thanks
Comment 1 Andrew Duggan 2005-11-28 07:54:01 UTC
Created attachment 6698 [details]
console screenshot softlockup resuming from runlevel 3 (no fb)
Comment 2 Andrew Duggan 2005-11-28 07:56:22 UTC
Created attachment 6699 [details]
console screenshot softlockup resuming from rl 5 no fb
Comment 3 Andrew Duggan 2005-11-28 07:59:38 UTC
Created attachment 6700 [details]
Config Options
Comment 4 Andrew Duggan 2005-11-28 08:00:15 UTC
Created attachment 6701 [details]
lspci -vvv
Comment 5 Andrew Duggan 2005-11-28 08:01:40 UTC
Created attachment 6702 [details]
dmesg from boot before the suspend
Comment 6 Andrew Duggan 2005-11-28 08:07:48 UTC
Created attachment 6703 [details]
screenshot with the ACPI patches from bz #3469

Sorry about the blur.
I thought it might be helpful to see that, with those patches,
agp_intel_probe seems to be in the mix of the lockup, pointing to something in
the video I/O path as the culprit.

It also shows that the lockup happens with or without radeonfb loaded, and with
or without radeon DRI loaded.
Comment 7 Andrew Duggan 2005-11-28 08:09:14 UTC
Created attachment 6704 [details]
a dmesg from a successful resume using pci=routeirq

Not likely to help, but for completeness...
Comment 8 Pavel Machek 2006-01-19 07:02:38 UTC
Just turn off softlockup detection...
Comment 9 Andrew Duggan 2006-01-19 10:56:58 UTC
OK, running a build with

# CONFIG_DETECT_SOFTLOCKUP is not set

I'll test with that and report on the outcome.  I had been assuming that the
softlockup might shed light on the accompanying hang/hard lockup (no response to
SYSRQ sequences).  Softlockup detection is something I CAN live without. Thanks.
Comment 10 Andrew Duggan 2006-01-20 08:25:39 UTC
I rebuilt the same kernel (2.6.14.3) without CONFIG_DETECT_SOFTLOCKUP. Now when
I boot without pci=routeirq, the only change is I don't get any output at all
when the system hangs during resume.  Everything else is the same.  No response
to any keyboard sequences even alt-sysrq-b.  

I had originally turned softlockup detection *on* to see if I could find out
anything about why the system was hanging during resume.  FC4 kernels don't have
softlockup detection turned on by default. (They don't have SWSUSP enabled
either, but IMNSHO SWSUSP is a requirement for a laptop.)

I guess I didn't make it clear before that this is not just a softlockup being
shown, but a system hang.   FWIW, the temperature reported after a hard power-off
is directly related to the length of time the system is allowed to remain in this
hung state, exceeding active/passive trip points.

Also FWIW, a 2.6.15 kernel (I've been running/tracking 2.6.15 since rc5) behaves
much better, and in the testing I've done it only hangs about 1 in 5 resumes from
suspend to disk.  I initially tried 2.6.15 without pci=routeirq and it only
hung on the 5th or 6th resume cycle. Once it did (and the softlockup output was
similar), I just put the option back on the command line.  I chalked the
improvement up to the ACPI changes that went into 2.6.15, but that is only my
guess.

I guess I should also mention that without softlockup detection, the console
screen restores the traceback message

  Debug: sleeping function called from invalid context at mm/slab.c:2486
   in_atomic():0, irqs_disabled():1
   [<c0145c93>] kmem_cache_alloc+0x40/0x52 
   ...

that shows up during the suspend phase (or at least that's where I guess it's
from), and then nothing else until I physically power the system off.

Prior to filing the bug, I had thought (based on the info from the softlockup
output) that the problem might be somewhere in the interaction between the ACPI &
Intel845 AGP driver(s).  Later, I thought that couldn't be so, since the
softlockup detector was able to print its traceback, and what the traceback might
actually be showing was a demonstration of the Heisenberg Uncertainty Principle,
but then I could be wrong about that too.

Comment 11 Shaohua 2006-01-22 22:32:30 UTC
Please try unloading some drivers and see if it helps. Maybe some drivers are buggy.
Comment 12 Andrew Duggan 2006-01-23 04:32:15 UTC
FWIW,

I am already unloading the following modules
uhci_hcd 
button 
ieee1394 
ohci1394 
pcmcia_core 
rsrc_nonstatic 
yenta_socket

I am attaching the output of an lsmod.  (lsmod.20060123)

Plus I have already reproduced this without radeonfb, so I'll do some more
testing and make sure that nothing is loaded: I'll rebuild my initrd from a blank
modprobe.conf, boot into single-user mode, and see what I get.

Thanks
Comment 13 Andrew Duggan 2006-01-23 04:34:26 UTC
Created attachment 7099 [details]
output of lsmod
Comment 14 Andrew Duggan 2006-01-23 05:25:27 UTC
OK - removing snd-maestro3 seems to be the magic bullet.  With only the following:
nls_utf8                2241  0
video                  16325  0
battery                 9541  0
ac                      4933  0
hw_random               5589  0
ext3                  130249  2
jbd                    57941  1 ext3

resume worked fine without pci=routeirq. I added one driver at a time and
tested the suspend/resume. It broke with snd-maestro3.  So I went back to the
"minimal" list above and tested with and without just snd-maestro3 and its
dependencies. It fails with it loaded and works without it. (I tried it twice both
ways.)  The dependencies don't seem to break it, as I can leave those loaded and
suspend/resume works OK. Now I can find out what the deal is with that driver
between 2.6.12 and now.  Odd, though, that pci=routeirq makes it OK. Thanks.
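
The add-one-driver-at-a-time procedure described above can be written down as a small search loop. This is a sketch only, under stated assumptions: first_breaking_module and the resume_ok callback are hypothetical names, and resume_ok stands in for a manual suspend/resume cycle with the given modules loaded (it is not something that can be automated once the machine hangs).

```python
def first_breaking_module(baseline, candidates, resume_ok):
    """Add candidates one at a time on top of baseline.

    resume_ok(loaded) should return True when a suspend/resume cycle
    succeeds with exactly the modules in `loaded`. Returns the first
    candidate whose addition makes resume fail, or None if none does.
    """
    loaded = list(baseline)
    for module in candidates:
        loaded.append(module)
        if not resume_ok(loaded):
            return module
    return None
```

As in the comment above, a suspect found this way is worth confirming by testing the minimal list with and without only that module, since a linear search cannot by itself tell a single bad driver from a bad combination.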
Comment 15 Shaohua 2006-01-23 19:32:01 UTC
The snd driver might not call pci_enable_device.
Comment 16 Takashi Iwai 2006-01-24 06:52:33 UTC
It definitely calls pci_enable_device() in resume.
Comment 17 Shaohua 2006-02-14 01:19:03 UTC
Can you give a screenshot with CONFIG_DETECT_SOFTLOCKUP off? I'd like to know
at which step the resume hangs. Thanks.
Comment 18 Andrew Duggan 2006-02-20 11:54:34 UTC
Created attachment 7418 [details]
Screenshot hung resume "CONFIG_DETECT_SOFTLOCKUP is not set"
Comment 19 Andrew Duggan 2006-02-22 17:47:42 UTC
I've been testing with 2.6.16-rc4, which looks like it is running a new ACPI
(20060127), and I've taken it through 9 swsusp/resume cycles (seven from runlevel
1 and two from runlevel 5) without pci=routeirq and without a problem.
Comment 20 Shaohua 2006-02-22 18:32:21 UTC
It's magically solved :).