Most recent kernel where this bug did not occur: ?
Distribution: Debian
Hardware Environment: A7V8X motherboard (VT8235 PCI bridge, VT82xxxxx UHCI USB 1.1 controller)
Software Environment: vanilla 2.6.15.4 kernel
Problem Description: When I put the system into suspend-to-RAM with

    echo mem > /sys/power/state

everything seems fine, but the system resumes immediately (though normally). Afterwards the system seems perfectly stable. If I try the same thing after unloading the driver, suspend-to-RAM works as expected.
Steps to reproduce:

    modprobe uhci_hcd
    echo mem > /sys/power/state
Created attachment 7533: dmesg of immediate resume
Created attachment 7534: lspci -vvv
Alan, this is for you :)
From the log:

    uhci_hcd 0000:00:10.0: uhci_resume
    uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: legsup = 0x2000

This is bad; it indicates that the BIOS is setting bits it doesn't control. I have no way of knowing whether this could be the cause of the immediate resume, however. (I sort of doubt it; after all, the BIOS shouldn't care whether or not uhci-hcd is loaded.)

Here's a good test to try. In drivers/usb/core/hcd-pci.c:usb_hcd_pci_suspend(), find the two lines that call pci_enable_wake() and comment them out. In theory that will prevent the UHCI controllers from issuing wakeup requests. In practice, who knows? The conditions under which a controller will issue a wakeup request aren't documented anywhere, nor are the actions taken by the BIOS or the ACPI interpreter. So there's no way to tell without trying it.
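To make the legsup complaint concrete, here is a small user-space sketch of the kind of test that produces that log line. It assumes the LEGSUP masks used in Linux's UHCI code: 0x8f00 for the write-1-to-clear status bits and 0x5040 for the read-only/reserved bits. These values are recalled from the kernel source, not taken from this report. Any other set bit is treated as something the driver did not leave behind. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed LEGSUP masks (as in Linux's UHCI quirk code). */
#define UHCI_USBLEGSUP_RWC 0x8f00 /* write-1-to-clear status bits */
#define UHCI_USBLEGSUP_RO  0x5040 /* read-only and reserved bits */
#define USBLEGSUP_PIRQDEN  0x2000 /* bit 13: USBPIRQDEN, owned by the OS */

/* The two masks must not overlap, or the check below would be meaningless. */
static_assert((UHCI_USBLEGSUP_RWC & UHCI_USBLEGSUP_RO) == 0, "masks overlap");

/* Hypothetical helper: true if any ordinary read/write bit of LEGSUP is set.
 * A controller the driver suspended itself should have none of these set,
 * so seeing one means someone else changed the register, and the driver
 * would respond by resetting the controller. */
bool legsup_needs_reset(uint16_t legsup)
{
	return (legsup & ~(UHCI_USBLEGSUP_RO | UHCI_USBLEGSUP_RWC)) != 0;
}

/* Hypothetical helper: true if the PIRQ enable bit specifically is set. */
bool legsup_pirq_enabled(uint16_t legsup)
{
	return (legsup & USBLEGSUP_PIRQDEN) != 0;
}
```

With the value from the log, legsup_needs_reset(0x2000) returns true. Bit 13 (0x2000) is a read/write bit that lies outside both masks.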
Nope, commenting out pci_enable_wake() didn't help, unfortunately.
I'm inclined to call this a bug in the BIOS. Why else would the system wake up immediately when the controllers aren't enabled to make wakeup requests? Just out of curiosity, does it make any difference if you unplug all your USB devices before suspending? Also, it's worth a shot fiddling with the USB settings in your BIOS setup. Maybe you can convince the BIOS to stop interfering with things the operating system is supposed to be in control of.
I have a card reader built into the machine. I just tried disconnecting it from the motherboard, and indeed, now suspend works as expected! So it has something to do with the card reader... but the controller is still doing something wrong, isn't it? [As of tomorrow I will be away for a week, so my responses will lag a bit.]
Or it has something to do with the fact that a device is connected. But yes, something is going wrong somewhere; it's just not easy to tell what or where. There's no way to debug the BIOS or fix problems in it.

You can try this (with the card reader plugged back in):

    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/power/state

which should suspend just that one UHCI controller, and then run

    lspci -vvv -s10.2

This should show whether or not the suspended controller is sending a wakeup signal. In fact, do the test twice: once with the original kernel and once with those pci_enable_wake() calls commented out.
Do I need "USB selective suspend/resume and wakeup" (CONFIG_USB_SUSPEND) for that? Also, what should I look for in the output of lspci -vvv to see whether a controller is sending wake events?
For this test it's better if you don't define CONFIG_USB_SUSPEND.

Here's the relevant part of your earlier lspci output:

    0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI])
            Subsystem: Asustek Computer, Inc. VT6202 USB2.0 4 port controller
            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
            Latency: 32, Cache Line Size: 0x08 (32 bytes)
            Interrupt: pin C routed to IRQ 18
            Region 4: I/O ports at a800 [size=32]
            Capabilities: [80] Power Management version 2
                    Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                    Status: D0 PME-Enable- DSel=0 DScale=0 PME-

The important parts are on the last line. "PME-" means that the controller doesn't want to assert the PME (Power Management Event) signal, and "PME-Enable-" means that it isn't allowed to assert PME even if it wants to.
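That lspci line is a decoding of the PCI Power Management Control/Status register (PMCSR). Below is a minimal C sketch that assumes the standard PMCSR layout from the PCI Power Management specification:

- power state in bits 1:0
- PME_En in bit 8
- Data_Select in bits 12:9
- Data_Scale in bits 14:13
- PME_Status in bit 15

format_pmcsr() and pme_asserted() are hypothetical helpers. The output format only imitates the lspci line quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PMCSR fields (PCI PM spec; the same values appear in Linux's pci_regs.h). */
#define PM_CTRL_STATE_MASK 0x0003 /* bits 1:0, D0..D3hot */
#define PM_CTRL_PME_ENABLE 0x0100 /* bit 8: PME may be asserted */
#define PM_CTRL_DATA_SEL   0x1e00 /* bits 12:9 */
#define PM_CTRL_DATA_SCALE 0x6000 /* bits 14:13 */
#define PM_CTRL_PME_STATUS 0x8000 /* bit 15: device wants to assert PME */

/* Format the register the way the lspci "Status:" line above shows it. */
void format_pmcsr(uint16_t pmcsr, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "D%u PME-Enable%c DSel=%u DScale=%u PME%c",
	         (unsigned)(pmcsr & PM_CTRL_STATE_MASK),
	         (pmcsr & PM_CTRL_PME_ENABLE) ? '+' : '-',
	         (unsigned)((pmcsr & PM_CTRL_DATA_SEL) >> 9),
	         (unsigned)((pmcsr & PM_CTRL_DATA_SCALE) >> 13),
	         (pmcsr & PM_CTRL_PME_STATUS) ? '+' : '-');
}

/* PME reaches the platform only if the device wants to assert it
 * (PME_Status) and is allowed to (PME_En). */
bool pme_asserted(uint16_t pmcsr)
{
	return (pmcsr & PM_CTRL_PME_ENABLE) && (pmcsr & PM_CTRL_PME_STATUS);
}
```

A raw value of 0x0000 formats as "D0 PME-Enable- DSel=0 DScale=0 PME-", which is the line quoted above. A wakeup via PME needs both bits, which is why pme_asserted() tests the two together.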
I recompiled the usbcore module both with and without the lines commented out as described in comment #4, using rmmod and modprobe to switch between them (no reboot); if this is not sufficient, please tell me. Then I carried out the tests. In all cases the status line stays the same:

    Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Note that after

    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/power/state

that file never contains anything but a 0, even for controllers with nothing connected to them.
I forgot a couple of things for the test. Before doing

    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/power/state

you first have to do this:

    rmmod usb-storage
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/3-2/3-2:1.0/power/state
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/3-2/power/state
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/3-0:1.0/power/state
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/power/state

That's because the device PM core won't let you suspend a device without suspending all its children first.

After running this test, you can try making a change to the UHCI driver. In drivers/usb/host/uhci-hcd.c, find the resume_detect_interrupts_are_broken() routine and make it always return 1. It will be interesting to see whether this changes the behavior.
OK, I did:

    rmmod usb-storage
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/3-2/3-2:1.0/power/state
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/3-2/power/state
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/3-0:1.0/power/state
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/usb3/power/state
    echo 3 >/sys/devices/pci0000:00/0000:00:10.2/power/state
    lspci -vvv -s 10.2

with all four combinations of:

    pci_enable_wake() commented out / not commented out
    resume_detect_interrupts_are_broken() returning 1 / doing the normal thing

This was done without CONFIG_USB_SUSPEND. After that I also tried returning 1 from resume_detect_interrupts_are_broken() with CONFIG_USB_SUSPEND=y. In all cases the last line of lspci -vvv was the same, and all the power/state files still contained 0, except the first one, which contained a 1.
The power/state files should contain 2 or 3. I don't know why they don't. What shows up in the dmesg log when you run that test? Also, did changing resume_detect_... fix the behavior?
Just did some checking... It was a foolish mistake on my part. You have to write echo -n 3 >... on each line. Without the "-n" it doesn't work.
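If you script this test, the point of "-n" is that each write carries exactly one byte and no trailing newline. Below is a rough C equivalent of the "echo -n 3 >" commands. It writes to the power/state files in the children-first order used in this thread. Some caveats:

- The paths are the ones from this report and will differ on other systems.
- write_pm_state(), suspend_order and suspend_in_order() are hypothetical names.
- The sketch assumes the power/state sysfs interface of the 2.6-era kernels discussed here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Equivalent of `echo -n <state> > path`: write exactly one byte, no '\n'.
 * Returns 0 on success, -1 on failure with errno set. */
int write_pm_state(const char *path, char state)
{
	assert(state == '0' || state == '2' || state == '3');
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ssize_t n = write(fd, &state, 1);
	int saved_errno = errno;
	close(fd);
	if (n != 1) {
		errno = (n < 0) ? saved_errno : EIO;
		return -1;
	}
	return 0;
}

/* Children before parents, as the device PM core requires (paths from
 * this report; adjust bus and port numbers for another system). */
const char *const suspend_order[] = {
	"/sys/devices/pci0000:00/0000:00:10.2/usb3/3-2/3-2:1.0/power/state",
	"/sys/devices/pci0000:00/0000:00:10.2/usb3/3-2/power/state",
	"/sys/devices/pci0000:00/0000:00:10.2/usb3/3-0:1.0/power/state",
	"/sys/devices/pci0000:00/0000:00:10.2/usb3/power/state",
	"/sys/devices/pci0000:00/0000:00:10.2/power/state",
};

/* Write "3" to each file in order; stop at the first failure. */
int suspend_in_order(const char *const paths[], size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (write_pm_state(paths[i], '3') != 0) {
			fprintf(stderr, "%s: %s\n", paths[i], strerror(errno));
			return -1;
		}
	}
	return 0;
}
```

A caller would run suspend_in_order(suspend_order, 5) as root, with usb-storage unloaded, and then inspect lspci -vvv -s 10.2 as before.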
OK, that gives more varied output ;) With no modifications, the last line of lspci now reads:

    Status: D3 PME-Enable+ DSel=0 DScale=0 PME-

With pci_enable_wake() commented out:

    Status: D3 PME-Enable- DSel=0 DScale=0 PME-

Having resume_detect_interrupts_are_broken() return 1 makes no difference and doesn't fix the problem.
So in one case PME is enabled and in the other it isn't, as you would expect. But in neither case is PME asserted, so it's not the cause of your immediate resume. At this point I'm quite sure there's something wrong with your BIOS. You could check whether an upgrade is available.

In the meantime, here's something else to try. Keep the resume_detect_interrupts_are_broken() change, and also edit the suspend_rh() routine just below it. Change the line that says

    outw(USBCMD_EGSM | USBCMD_CF, uhci->io_addr + USBCMD);

to

    outw(USBCMD_CF, uhci->io_addr + USBCMD);

In other words, get rid of the USBCMD_EGSM. (The string EGSM occurs only twice in the source file, so it should be easy to spot.) This will leave the suspended controller in essentially the same state as if uhci-hcd had never been loaded.
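For reference, here is what that one-line change does to the value written into the USB Command register. This is a minimal sketch that assumes the bit values from the UHCI specification (RS = 0x0001, EGSM = 0x0008, CF = 0x0040). suspend_usbcmd() is a hypothetical helper that only computes the value and performs no I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UHCI USB Command register bits (values per the UHCI specification). */
#define USBCMD_RS   0x0001 /* Run/Stop */
#define USBCMD_EGSM 0x0008 /* Enter Global Suspend Mode */
#define USBCMD_CF   0x0040 /* Configure Flag */

/* CF | EGSM should equal 0048, the usbcmd value reported by debugfs in this thread. */
static_assert((USBCMD_CF | USBCMD_EGSM) == 0x0048, "unexpected bit values");

/* Value a suspend routine like suspend_rh() would write to USBCMD.
 * use_egsm = true models the original line, false the suggested edit.
 * Run/Stop is never set, so the controller stays halted either way. */
uint16_t suspend_usbcmd(bool use_egsm)
{
	uint16_t cmd = USBCMD_CF;
	if (use_egsm)
		cmd |= USBCMD_EGSM;
	return cmd;
}
```

With EGSM the routine writes 0x0048, and without it 0x0040. The only difference is whether the halted controller is also told to enter global suspend.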
Removing USBCMD_EGSM helped. Now suspend/resume works as it should. Does this mean there is indeed a BIOS bug? I'll check the Asus site for updates...
It means there's a bug in either the BIOS or the controller hardware. Maybe both. Or perhaps it's an undocumented "feature". If you can get any information out of Asus it might help. VIA refuses to answer questions about this sort of thing.
OK, I can try that, but what exactly should I ask? What is the BIOS doing wrong? What does USBCMD_EGSM mean? Does it hurt to keep USBCMD_EGSM removed from that outw()?
You could ask them under what conditions the UHCI controller will wake up a system from the various sleep states. In particular, ask them why it would wake up the system when a device is connected but there is no connect change or wakeup request pending. (At least, I assume the card reader isn't sending a wakeup request. If it is, that might explain why the computer wakes up immediately. But it's not supposed to be sending a wakeup request. You could find out for certain by doing all those "echo -n 3 >..." commands, then mounting /sys/kernel/debug with "-t debugfs", and then copying the contents of /sys/kernel/debug/uhci/0000:00:10.2.)

I don't know exactly what the BIOS is doing. But one thing it's doing wrong is setting the USBPIRQDEN bit in the UHCI LEGSUP register. That bit is supposed to be controlled entirely by the operating system; the BIOS is not supposed to touch it at all.

USBCMD_EGSM is the "Enter Global Suspend Mode" bit in the UHCI USB Command register. It tells the controller to suspend the entire USB bus and to respond to wakeup requests. It's okay to leave it turned off, provided you also make sure that resume_detect_interrupts_are_broken() always returns 1. (If you don't make that second change, the controller won't do anything when you plug in a USB device!)
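The condition described above can be read straight off a root-hub port status word: a device is present, with no connect change and no resume (wakeup) pending. This sketch assumes the UHCI PORTSC bit layout as defined in Linux's uhci-hcd.h (CCS = 0x0001, CSC = 0x0002, RD = 0x0040). connected_without_events() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UHCI root-hub port status/control bits (as in Linux's uhci-hcd.h). */
#define USBPORTSC_CCS 0x0001 /* Current Connect Status: device present */
#define USBPORTSC_CSC 0x0002 /* Connect Status Change */
#define USBPORTSC_RD  0x0040 /* Resume Detect: remote wakeup seen */

/* True if a device is attached but the port reports neither a connect
 * change nor a resume request, i.e. nothing that ought to wake the system. */
bool connected_without_events(uint16_t portsc)
{
	return (portsc & USBPORTSC_CCS) &&
	       !(portsc & (USBPORTSC_CSC | USBPORTSC_RD));
}
```

The debugfs dump in the next comment contains two port status values. For stat2 = 0x1495 (card reader attached) the check returns true. For stat1 = 0x0480 (nothing connected) it returns false.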
After the series of 'echo -n 3 > ...' commands, these are the contents of /sys/kernel/debug/uhci/0000:00:10.2:

    Root-hub state: suspended
    HC status
      usbcmd = 0048 Maxp32 CF EGSM
      usbstat = 0020 HCHalted
      usbint = 0002
      usbfrnum = (1)144
      flbaseadd = 031bb000
      sof = 40
      stat1 = 0480 OverCurrent
      stat2 = 1495 Suspend OverCurrent Enabled Connected
    Frame List
    Skeleton QHs
Since the only bit set in the USB Status register is the HCHalted bit, we can be certain that the controller doesn't want to issue an interrupt. We can also be pretty sure that it isn't trying to wake up the system. But apparently some combination of the BIOS firmware and motherboard circuitry is causing it to do so regardless.
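The same reasoning can be expressed as code: an interrupt needs a status bit that some enabled source can report, and HCHalted is not such a bit. This is a simplified sketch that assumes the UHCI USBSTS and USBINTR bit values from the UHCI specification. status_requests_interrupt() is a hypothetical helper. It conservatively treats every non-resume event bit as pending regardless of the enable mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UHCI USB Status register bits. */
#define USBSTS_USBINT 0x0001 /* transfer completion (IOC / short packet) */
#define USBSTS_ERROR  0x0002 /* USB error interrupt */
#define USBSTS_RD     0x0004 /* Resume Detect */
#define USBSTS_HSE    0x0008 /* Host System Error */
#define USBSTS_HCPE   0x0010 /* Host Controller Process Error */
#define USBSTS_HCH    0x0020 /* HCHalted: plain status, not an event */

/* UHCI USB Interrupt Enable register: resume interrupt enable. */
#define USBINTR_RESUME 0x0002

bool status_requests_interrupt(uint16_t usbsts, uint16_t usbintr)
{
	/* Resume Detect only raises an interrupt if it is enabled. */
	if ((usbsts & USBSTS_RD) && (usbintr & USBINTR_RESUME))
		return true;
	/* Simplification: other event bits count as pending without
	 * checking their individual enable bits. */
	if (usbsts & (USBSTS_USBINT | USBSTS_ERROR | USBSTS_HSE | USBSTS_HCPE))
		return true;
	/* HCHalted alone never requests anything. */
	return false;
}
```

For the dump above (usbstat = 0x0020, usbint = 0x0002) the function returns false. The resume interrupt is armed, but Resume Detect is clear.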
OK, I tried to get some information out of ASUS, but I can't get past the support staff, who say "sorry we don't do linux". A few more questions before I give up on fixing this issue :( What functionality am I missing if I remove USBCMD_EGSM (comment #17)? It doesn't seem to hurt, but why is it there in the first place?
Without EGSM you might lose remote wakeup capability. For example, suppose you have a USB mouse plugged into the controller and both the root hub and the mouse are suspended. The system might then not wake them up automatically when you move the mouse or press a button. I'm not sure, because I've never tried it.

Also, you pay a very small amount of extra system overhead. With EGSM set, the controller will interrupt the CPU whenever a device is plugged in or unplugged. Without it, the CPU has to poll the controller 4 times per second. It's not a big deal.
Can you retry with 2.6.18? USB suspend changed a lot...
The problem still exists in 2.6.18. But I think Alan concluded it was a bug in my hardware.
That's what it looks like. The only remaining question is whether the driver should change, say by detecting your particular type of motherboard and then avoiding EGSM. Or would you be happy enough just leaving the built-in card reader unplugged? Here's an interesting idea that just occurred to me. Suppose you do unplug the card reader but then plug in an external USB device (it would have to be a full-speed or low-speed device, or else you would have to rmmod ehci-hcd). Would you then see the same immediate resume behavior?
I just tried with the card reader unplugged and a USB camera plugged in; the same behaviour occurs. And yes, I'd rather have the workaround in the kernel; that way I don't have to patch every kernel release I compile. Unplugging a built-in card reader every time you resume is a bit inconvenient...
The workaround needs to be specific to your sort of motherboard; as far as I know nobody else has the same kind of problem. Please attach the output from dmidecode for your system.
Created attachment 9153: dmidecode
Created attachment 9155: Disable EGSM on ASUS motherboard

Here's a patch for 2.6.18 you can try out. It avoids turning on EGSM if it detects a motherboard of your type with a device connected.
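A check of this kind could look roughly like the following user-space sketch. It is not the attached patch. It makes three assumptions:

- The board is identified by its DMI board name. In the kernel, that string would come from dmi_get_system_info(DMI_BOARD_NAME).
- The name is "A7V8X", as in this report's hardware description.
- "A device connected" means the CCS bit is set in some root-hub port.

egsm_is_broken() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define USBPORTSC_CCS 0x0001 /* Current Connect Status: device present */

/* Assumed DMI board name of the affected motherboard. */
static const char bad_board[] = "A7V8X";

/* True when EGSM should be avoided: the board matches and at least one
 * root-hub port currently has a device connected. */
bool egsm_is_broken(const char *dmi_board_name,
                    const uint16_t *portsc, int nports)
{
	assert(portsc != NULL || nports == 0);
	if (dmi_board_name == NULL || strcmp(dmi_board_name, bad_board) != 0)
		return false;
	for (int i = 0; i < nports; i++)
		if (portsc[i] & USBPORTSC_CCS)
			return true;
	return false;
}
```

A suspend routine would then set EGSM only when this returns false. Keying the quirk on both the board and a connected device keeps global suspend, and the wakeup support that comes with it, on all other systems.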
Yup, that works. Thanks for working on this btw!
Okay, I'll queue up the patch for submission and mark this bug report closed.