Most recent kernel where this bug did not occur: 2.6.12
Distribution: Fedora Core 4
Hardware Environment: Gateway 450 Laptop
Software Environment: Runlevel 3 or 5
Problem Description:
Softlockup and system hang occur on resume from swsusp if pci=routeirq is omitted from the kernel command line.

Steps to reproduce:
1) boot kernel ro root=LABEL=/ panic=10 audit=1 3
2) log in as root
3) remove the USB modules
4) echo "platform" >/sys/power/disk
5) echo "disk" >/sys/power/state
...
power on the system

BUG: softlockup detected on CPU#0.
-----------------

I am attaching the kernel config, the output of lspci -vvv, a dmesg of the boot prior to the swsusp, and a dmidecode FWIW. I have a screenshot of the softlockup, which I will upload.

If I boot with pci=routeirq it resumes from swsusp successfully every time. Without routeirq it hangs every time. The kernel in question is the Fedora kernel, but without the ACPI patches it normally carries. The behaviour is the same regardless. I am also going to attach a slightly blurry screenshot of the softlockup when the ACPI patches from bug #3469 were applied to 2.6.14.2. Those were:
http://bugzilla.kernel.org/attachment.cgi?id=6403
http://bugzilla.kernel.org/attachment.cgi?id=6404

I think this started happening in 2.6.13, but there it was mitigated by not having PREEMPT_VOLUNTARY set. That was probably dumb luck; in 2.6.14, with any preempt option, it would hang at least 50% of the time.

I am not sure this is the right section, but since the pci=routeirq option fixes the problem.... It is interesting, since AFAICT pci=routeirq makes only one tiny change in behaviour, in arch/i386/pci/acpi.c:

    while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
            acpi_pci_irq_enable(dev);

FWIW, although the screenshot does not demonstrate this, most of the softlockups I have seen have had some sort of console (video) output in the trace, e.g. fb_flashcursor or console_callback. I suppose it could be any device... Thanks
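Steps 4 and 5 of the reproduction recipe are just two writes into sysfs. For scripted testing they can also be done from C. The following is a minimal sketch under that assumption. The helper names write_sysfs and suspend_to_disk_platform are hypothetical and are not part of any kernel or distribution tool. Calling suspend_to_disk_platform() as root really does start a suspend to disk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one value to a sysfs attribute, like
 * `echo "value" > path`. Returns 0 on success, -1 on failure. */
int write_sysfs(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    /* A sysfs attribute takes a single value; refuse embedded newlines. */
    if (strchr(value, '\n') != NULL)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }

    /* echo appends a trailing newline, so do the same. */
    int rc = (fprintf(f, "%s\n", value) < 0) ? -1 : 0;

    /* stdio buffers the data, so the kernel sees the write (and may reject
     * it) only when the stream is flushed at fclose(). */
    if (fclose(f) != 0) {
        perror(path);
        rc = -1;
    }
    return rc;
}

/* Hypothetical wrapper for steps 4 and 5. Must run as root; on success
 * the second write normally returns only after the system has resumed. */
int suspend_to_disk_platform(void)
{
    if (write_sysfs("/sys/power/disk", "platform") != 0)
        return -1;
    return write_sysfs("/sys/power/state", "disk");
}
```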
Created attachment 6698: console screenshot of softlockup resuming from runlevel 3 (no fb)
Created attachment 6699: console screenshot of softlockup resuming from runlevel 5 (no fb)
Created attachment 6700: config options
Created attachment 6701: lspci -vvv
Created attachment 6702: dmesg from the boot before the suspend
Created attachment 6703: screenshot with the ACPI patches from bz #3469
Sorry about the blurriness. I thought it might be helpful to see that with those patches, agp_intel_probe appears in the lockup trace, which points to something in the video I/O path as the culprit. It also shows that the lockup happens with or without radeonfb loaded, and with or without radeon DRI loaded.
Created attachment 6704: dmesg from a successful resume using pci=routeirq
Not likely to help, but included for completeness.
Just turn off softlockup detection...
OK, running a build with "# CONFIG_DETECT_SOFTLOCKUP is not set". I'll test with that and report on the outcome. I had been assuming that the softlockup might shed light on the accompanying hang/hard lockup (no response to SysRq sequences). Softlockup detection is something I CAN live without. Thanks.
I rebuilt the same kernel (2.6.14.3) without CONFIG_DETECT_SOFTLOCKUP. Now when I boot without pci=routeirq, the only change is that I get no output at all when the system hangs during resume. Everything else is the same: no response to any keyboard sequence, not even Alt-SysRq-B.

I had originally turned softlockup detection *on* to see if I could find out anything about why the system was hanging during resume; FC4 kernels don't have softlockup detection turned on by default. (They don't have SWSUSP enabled either, but IMNSHO SWSUSP is a requirement for a laptop.) I guess I didn't make it clear before that this is not just a softlockup report but a full system hang. FWIW, the temperature reported after a hard power-off is directly related to how long the system is left in this hung state, and it exceeds the active/passive trip points.

Also FWIW, a 2.6.15 kernel (I've been running/tracking 2.6.15 since rc5) behaves much better, and in my testing it hangs on only about 1 in 5 resumes from suspend to disk. I initially tried 2.6.15 without pci=routeirq and it only hung on the 5th or 6th resume cycle. Once it did (and the softlockup output was similar), I just put the option back on the command line. I attributed the improvement to the ACPI changes that went into 2.6.15, but that is only my guess.

I should also mention that without softlockup detection, the restored console screen shows this traceback message:

    Debug: sleeping function called from invalid context at mm/slab.c:2486
    in_atomic():0, irqs_disabled():1
     [<c0145c93>] kmem_cache_alloc+0x40/0x52
    ...

It shows up during the suspend phase (or at least that's where I guess it comes from), and then nothing else happens until I physically power the system off. Prior to filing the bug, I had thought (based on the softlockup output) that the problem might be somewhere in the interaction between the ACPI and Intel 845 AGP drivers.
Later, I thought that couldn't be so, since the softlockup detector was able to print its traceback, and what the traceback might actually be showing was a demonstration of the Heisenberg Uncertainty Principle. But then I could be wrong about that too.
Please try unloading some drivers and see if it helps. Some drivers may be buggy.
FWIW, I am already unloading the following modules: uhci_hcd, button, ieee1394, ohci1394, pcmcia_core, rsrc_nonstatic, yenta_socket. I am attaching the output of lsmod (lsmod.20060123). I have also already reproduced this without radeonfb. I'll do some more testing to make sure that nothing is loaded: I'll rebuild my initrd from a blank modprobe.conf, boot into single-user mode, and see what I get. Thanks
Created attachment 7099: output of lsmod
OK, removing snd-maestro3 seems to be the magic bullet. With only the following modules loaded:

    nls_utf8     2241  0
    video       16325  0
    battery      9541  0
    ac           4933  0
    hw_random    5589  0
    ext3       130249  2
    jbd         57941  1 ext3

resume worked fine without pci=routeirq. I added one driver at a time and tested suspend/resume after each; it broke with snd-maestro3. So I went back to the "minimal" list above and tested with and without just snd-maestro3 and its dependencies. It fails with snd-maestro3 loaded and works without it (I tried each case twice). The dependencies don't seem to break it, as I can leave those loaded and suspend/resume works OK. Now I can find out what changed in that driver between 2.6.12 and now. Odd, though, that pci=routeirq makes it OK. Thanks.
The snd driver might not call pci_enable_device.
It definitely calls pci_enable_device() in resume.
Can you provide a screenshot with CONFIG_DETECT_SOFTLOCKUP off? I'd like to know at which step the resume hangs. Thanks.
Created attachment 7418: screenshot of hung resume with "CONFIG_DETECT_SOFTLOCKUP is not set"
I've been testing with 2.6.16-rc4, which appears to include a new ACPI release (20060127), and I've taken it through 9 swsusp/resume cycles (seven from runlevel 1 and two from runlevel 5) without pci=routeirq and without a problem.
It's magically solved :).