Bug 5227
Summary: | mounted usb block devices don't work after resume from S3 | ||
---|---|---|---|
Product: | Drivers | Reporter: | richlv |
Component: | USB | Assignee: | Alan Stern (stern) |
Status: | REJECTED INVALID | ||
Severity: | normal | CC: | acpi-bugzilla, dbrownell |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.13.1 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
usb-storage suspend and resume callbacks
Fix assignments to hcd->state in uhci-hcd
dmesg output
new dmesg output
Diagnostic for UHCI suspend routine
uhci diag suspend/resume dmesg output
Add logical disconnect when CONFIG_USB_SUSPEND isn't set
Diagnostic for ACPI/BIOS suspend routines
trying to suspend with acpi=off
suspend and resume with uhci&acpi debug patches
Allow PIRQD to be on during resume
Allow PIRQD to be on during resume
dmesg_output_with pirqd patch applied
Allow resume after controller reset
dmesg output with 'allow resuma after reset' |
Description
richlv
2005-09-12 01:31:42 UTC
Sounds like a USB bug to me. Did this ever work before? If you unload the USB driver before suspend and reload it after resume, did it work?

> Did this ever work before?
have no idea, i have successfully resumed only since 2.6.13 :) (http://bugzilla.kernel.org/show_bug.cgi?id=4390), but it was the same in 2.6.13.
> If you unload the USB driver before suspend and reload it after resume, did it work?
yes. the problem arises only if the device has been mounted during suspending

testing usb devices i also stumbled upon another strange problem - if i mount a usb device after resuming, kde laptop daemon stops responding. there is nothing in the logfiles. could this be related and what can i do to debug this problem better ?

oh. actually it (klaptopd) starts responding again after some time. also easily reproducible - resume, test klaptopd - responds. mount usb device - stops responding. wait a while, it responds again. mount usb device - stops responding. (it does not matter if the device is unmounted when waiting for klaptopd to respond - it responds in both cases). the wait time required is pretty long, at least several minutes. everything works ok if there has been no resuming.

Was this a real suspend/resume cycle, like suspend-to-RAM? Or was it a swsusp "checkpoint, then powerdown, then powerup and reset and restore checkpoint"? If so then you _have_ in fact broken the connection to the USB disk.

it was suspend to memory. in swsusp case, i suspect, there would be a script (hibernation script in suspend2 project ?) that would unmount specific drives and mount them upon resuming.

Can you let me know what HCD(s) were involved with this? And please confirm that this is with CONFIG_USB_SUSPEND=n. Some recent changes to usbcore seem to have caused regressions in OHCI (and hence I assume EHCI) in some suspend/resume paths. Yes, I'd certainly expect S1 and S3 suspend to preserve mounts. In fact I've tested equivalent stuff many times, just not with the last few kernels.

1. umm. i'm not sure what hcd is ;) looking it up, one of the explanations was "Hard Copy Device", so i'll assume that this is about devices that i was able to reproduce the problem with.
a) kingston data traveller ii+;
b) a compactflash card reader, labelled as kingston, recognized as "Vendor: eUSB".
2. grepping .config for USB_SUSPEND produces only
# CONFIG_USB_SUSPEND is not set

oh. ok, so ehci & uhci are compiled in. lspci reports that usb controllers are via vt82 family uhci usb 1.1. (maybe this means i should try without ehci compiled in ?)

OK, I did some testing recently and found there have indeed been regressions in the behavior of the USB stack in the past few releases. Annoying. http://marc.theaimsgroup.com/?l=linux-usb-devel&m=112745488014635&w=2 is the overview of a set of patches that fix all the issues I've been able to reproduce, as well as finally fixing some of the structural issues that have needed attention for some time. There's still a separate issue that usb-storage has no suspend() and resume() methods, however, so there may be issues associated with that ...

The USB power management patches queued for 2.6.15 at http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-04-usb/ fix a variety of issues. Maybe this one; give them a try and let us know how they behave for you.

um. i would appreciate some hand holding this time :)
1. which kernel version can i apply these to ?
2. should i just apply all of them and check for the problems ?
3. will simple "for i in `cat series`; do patch..." do the trick or will email headers prevent this ?
thanks.

Sorry about that lack-of-directions. The patches are maintained with "quilt", so the "series" file holds the order in which to apply the patches. (As says the README...) Simplest is to copy the whole directory to .../linux/patches (toplevel kernel directory) then "quilt push -a".

Created attachment 6276 [details]
usb-storage suspend and resume callbacks
You _may_ need this in addition to the patches that make
suspend/resume behave properly.
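
For readers unfamiliar with quilt, the role of the "series" file mentioned above can be sketched in a few lines of Python. This is only an illustration, not quilt's own code: read_series is a hypothetical helper, and it assumes the usual layout of one patch per line, with lines starting with "#" treated as comments and anything after the patch name (such as "-p1") ignored. The patch names in the example are made up.

```python
def read_series(text):
    """Return patch names in apply order from the text of a quilt-style series file."""
    patches = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        patches.append(line.split()[0])  # keep the name, drop options like -p1
    return patches


# Hypothetical series content; a real file lists the actual patch names.
example_series = """\
# usb power management
usb-example-one.patch
usb-example-two.patch -p1

usb-example-three.patch
"""

for name in read_series(example_series):
    print(name)
```

In practice "quilt push -a" does this bookkeeping itself; the sketch only shows why the order in the file matters.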
oh. ok, i found quilt-0.42. now the last bit - which kernel version do i apply these to ? :)

current kernel (2.6.14-rc4) should suffice.

all patches applied successfully (except this one attached here, i'll test it later). results :

1. with usb flash memory attached it failed to suspend at all (not mounted) :
Oct 11 16:23:46 is kernel: Stopping tasks: ================================================|
Oct 11 16:23:46 is kernel: usb-storage 1-2:1.0: no suspend?
Oct 11 16:23:46 is kernel: Could not suspend device 1-2: error -16
Oct 11 16:23:46 is kernel: Some devices failed to suspend
Oct 11 16:23:46 is kernel: Restarting tasks... done
2. suspending phase has a heavily distorted image. with previous versions i could see some text messages (about some pci interrupts etc) - regression.
3. after resuming (without usb block device) my usb mouse stopped working
4. with usb mouse attached it freezes upon resume. it resumes if i move the usb mouse (i waited for a minute). with mouse removed works as expected (resumes in a couple of seconds)

with attached patch applied (clean) :
1. my mouse does not work after resuming;
2. with usb blockdevice attached (unmounted) computer freezes upon resuming. if i remove usb device, it stalls for 5 more seconds, then resumes.

some data from syslog :
Oct 11 16:38:11 is kernel: Stopping tasks: ================================================|
Oct 11 16:38:11 is kernel: ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
Oct 11 16:38:11 is kernel: Restarting tasks... done
Oct 11 16:38:20 is kernel: usb 1-1: device descriptor read/64, error -110
Oct 11 16:39:29 is kernel: Stopping tasks: ================================================|
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot reset port 2 (err = -113)
Oct 11 16:39:29 is last message repeated 4 times
Oct 11 16:39:29 is kernel: hub 1-0:1.0: Cannot enable port 2. Maybe the USB cable is bad?
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot disable port 2 (err = -113)
Oct 11 16:39:29 is kernel: Badness in hcd_endpoint_disable at drivers/usb/core/hcd.c:1368
Oct 11 16:39:29 is kernel: [<c02b1c30>] hcd_endpoint_disable+0x140/0x150
Oct 11 16:39:29 is kernel: [<c02b3b80>] usb_disable_endpoint+0x50/0x70
Oct 11 16:39:29 is kernel: [<c02aebdb>] ep0_reinit+0x1b/0x50
Oct 11 16:39:29 is kernel: [<c02af632>] hub_port_connect_change+0x1e2/0x410
Oct 11 16:39:29 is kernel: [<c02a000a>] pci_iomap+0xea/0xf0
Oct 11 16:39:29 is kernel: [<c02afb3f>] hub_events+0x2df/0x460
Oct 11 16:39:29 is kernel: [<c02afcd7>] hub_thread+0x17/0xf0
Oct 11 16:39:29 is kernel: [<c01303b0>] autoremove_wake_function+0x0/0x60
Oct 11 16:39:29 is kernel: [<c01303b0>] autoremove_wake_function+0x0/0x60
Oct 11 16:39:29 is kernel: [<c02afcc0>] hub_thread+0x0/0xf0
Oct 11 16:39:29 is kernel: [<c012fea5>] kthread+0xa5/0xb0
Oct 11 16:39:29 is kernel: [<c012fe00>] kthread+0x0/0xb0
Oct 11 16:39:29 is kernel: [<c0101379>] kernel_thread_helper+0x5/0xc

after a bunch of badness :
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot reset port 2 (err = -113)
Oct 11 16:39:29 is last message repeated 4 times
Oct 11 16:39:29 is kernel: hub 1-0:1.0: Cannot enable port 2. Maybe the USB cable is bad?
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot disable port 2 (err = -113)

after some more badness (probably the moment i removed the device ?) :
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot disable port 2 (err = -113)
Oct 11 16:39:29 is kernel: ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
Oct 11 16:39:29 is kernel: usb 1-2: device descriptor read/64, error -110
Oct 11 16:39:29 is last message repeated 3 times
Oct 11 16:39:29 is kernel: usb 1-2: device not accepting address 9, error -110
Oct 11 16:39:29 is kernel: Restarting tasks... done

so comparing to 2.6.13.2 i have a lot of regressions :)

What USB controllers are you using? Please send /sbin/lspci -vv | grep HCI output ...
Yes, that one patch is needed to prevent the errors you saw without it. If you're using UHCI there was a BIOS-related patch submitted (and AFAIK not in that patchset) ... try disabling all usb mouse/keyboard support in your BIOS.

00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])
00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])
00:11.4 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])
(and one fw controller)
the computer itself has only two usb ports, i think both are connected to one controller. i'll try disabling usb mouse in bios today, too.

there are no parameters for usb configuration in bios. this is fujitsu-siemens lifebook c-1020.

Re-assigning to Alan, as this touches both UHCI and usb-storage which are drivers I don't generally touch. Most specifically the issue looks to be a problem with UHCI and S3 resume. The "badness" is a warning that I suspect may no longer be useful; I seem to recall I had removed it in one patch, but it stopped mattering for OHCI and EHCI so I left it in.

Created attachment 6293 [details]
Fix assignments to hcd->state in uhci-hcd
Okay, I've got a patch for you to try. It fixes a similar problem on my system with Standby. I couldn't test suspend to RAM because my computer doesn't want to wake up afterward! I've lost track of the patches David had you try. So this is meant to go on top of 2.6.14-rc4 plus http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-all-2.6.14-rc4.patch plus David's suspend/resume patch for usb-storage.

ok, here's what i did... took rc-4. applied gregkh-2.6 patch. applied both patches attached to this issue. i _did not_ try to apply patches david previously suggested (gregkh-04-usb). all patches applied successfully. result - suspends and resumes with usb mouse normally (just as plain 2.6.13.*).
if i add usb block device, it does not suspend (similar to previously). logs were slightly different, though, so here they are. blockdevice was not mounted.
Oct 14 11:11:38 is kernel: Stopping tasks: =====================================================|
Oct 14 11:11:38 is kernel: sda: assuming drive cache: write through
Oct 14 11:11:38 is kernel: sda: assuming drive cache: write through
Oct 14 11:11:38 is kernel: hub 1-0:1.0: suspend error -16
Oct 14 11:11:38 is kernel: Could not suspend device 1-0:1.0: error -16
Oct 14 11:11:38 is kernel: Some devices failed to suspend
Oct 14 11:11:38 is kernel: Restarting tasks... done

Did you apply the patch attached to comment #13? If not, then apply it too. In any case, please turn on the USB verbose debugging option in the kernel configuration (CONFIG_USB_DEBUG) and rebuild the USB drivers. Then run your test as before, and this time post the dmesg log starting from the point where you plug in the mass storage device.

Re comment #24 ... Alan, I suspect this is because SCSI isn't marking devices as suspended. You'd be able to observe the lack of this in sysfs after "echo -n 3 > /sys/class/scsi_device/.../power/state", or by having drivers/usb/core/usb.c::verify_suspended() print a message for unsuspended devices...

yes, i did apply both patches that are attached to this issue. turns out, inability to suspend is not connected to block device, but to mouse. this is a regression from 2.6.13.2. additionally, blockdevice behaviour has changed slightly : usb device shows up as accessible, but empty.

what i did :
enable usb & usb storage debug
add usb mouse; dmesg > add_mouse.txt
add usb flash memory -> add_flash.txt
suspended & resumed successfully -> resumed.txt
used mouse while trying to suspend -> used_mouse-failed_to_suspend.txt
this is a regression from 2.6.13.2, where i can move the mouse around as much as i wish during suspend, it does not prevent suspending.
mounted flash, suspended & resumed -> resumed_with_mounted.txt
at this point df shows correct information about mounted device, i can sort of access it, but it shows up as empty.
when i access it -> usb_access_after_resume.txt
umount it, try to mount, fails with "not a valid block device" -> umount_mount_not-a-valid-device.txt
now try to mount it as sdb, not sda -> after_failure_mount_sdb.txt

some of dmesg outputs might be smaller and not contain all events that happened between two captures (i learned about -s switch only after these tests ;) ). i hope they have enough information. if not, i can redo specific tests.

Where are your dmesg log files?

Without seeing the logs I can't be sure, but probably the reason you can't use the mounted storage device after resuming is because there was no suspend power available to maintain the USB connection. Without a small amount of current during the suspend, it will appear to the computer as though you had unplugged the device while the machine was asleep. After resume the device is rediscovered, but it shows up as a new device: /dev/sdb instead of /dev/sda, for instance. The end result is that you can't use the mounted filesystem -- exactly as if you had unplugged and replugged the device while the computer was awake.

By the way, do you still have CONFIG_USB_SUSPEND unset in your .config?

Created attachment 6329 [details]
dmesg output
whoops. kicking myself in the nuts. attached logfiles this time.
usb suspend still is disabled.
the computer is on the repair now (cd-rom replacement), so i won't be able to
do tests for a couple of days, but if needed, put up testing scenarios, i'll do
them when i get it back :)
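
The renaming effect described a few comments up (the device comes back as /dev/sdb while the existing mount still refers to /dev/sda) can be spotted from userspace. The following is a rough sketch, assuming the caller supplies the text of /proc/mounts and the set of block devices the kernel currently knows about (how that set is gathered, for instance from /sys/block, is left open); stale_mounts and the sample data are hypothetical.

```python
def stale_mounts(proc_mounts_text, known_devices):
    """Return (device, mountpoint) pairs whose device is no longer known."""
    stale = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts lines look like: device mountpoint fstype options dump pass
        if len(fields) < 2 or not fields[0].startswith("/dev/"):
            continue
        device, mountpoint = fields[0], fields[1]
        if device not in known_devices:
            stale.append((device, mountpoint))
    return stale


# Made-up sample: the stick was mounted as sda1 but reappeared as sdb1.
sample_mounts = """\
/dev/hda1 / ext3 rw 0 0
proc /proc proc rw 0 0
/dev/sda1 /mnt/usb vfat rw 0 0
"""
known = {"/dev/hda1", "/dev/sdb", "/dev/sdb1"}

print(stale_mounts(sample_mounts, known))
```

A stale entry of this kind is what produces the "shows up in df but cannot be used" symptom reported above.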
It's hard to tell what's going on because of all the messages from usb-storage in your log files. It would help if you could try doing the tests again, after turning off CONFIG_USB_STORAGE_DEBUG. Also, you should apply this new patch: http://marc.theaimsgroup.com/?l=linux-kernel&m=112966405019175&w=2 I think that will solve your "mouse-movement" problem -- which from the logs, appears to have nothing at all to do with moving the mouse.

Has there been any progress on this? Try using 2.6.14 together with the corresponding gregkh-all-2.6.14.patch changes. That contains all the fixes we've been discussing.

local siemens service is still waiting for replacement cd-rom, as i am told. as soon as i get the computer back, i'll test it. to make sure i understand the steps required :
take 2.6.14, enable usb debug (but not storage debug).
apply http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-all-2.6.14-rc4.patch
apply both patches attached to this issue;
apply http://marc.theaimsgroup.com/?l=linux-kernel&m=112966405019175&w=2
right ? :)

The last two steps are not necessary. The patches attached to this report and that other patch have been merged into 2.6.14 or gregkh-all-*. In fact, right now it looks like gregkh-all-2.6.14 is against -git3, so you should start from 2.6.14-git3. If there are more updates by the time you get your hardware running again, start from whichever 2.6.14 base the gregkh-all patch is based on.

so... took 2.6.14, applied 2.6.14-git5 (as latest gregkh-all was against this), applied gregkh-all-2.6.14-git5.patch. enabled usb debug, disabled usb storage debug.
short story : in some cases usb device was accessible without errors after resuming even if it was mounted when suspending (though i suspect this might be connected to the delay between suspend/resume - maybe power was not terminated for usb device if this time was too short ?); in some cases i could read top directory but not subdirectories; and also previous behaviour with device being recognized as sdb1 and mounted one not working was reproducible.

this time i used dmesg -c to clear any events left over from previous run, so every file should contain only events that happened during that action.

list :
- clean.txt - a clean boot with debug kernel;
- added_mouse.txt - self explanatory :)
- suspend_with_mouse*.txt - suspended with mouse only, both still and actively moved. suspending in both cases was successful, and it seems that dmesg output is almost identical (only some device ids differ). btw, what was the reason behind inability to suspend when mouse was moved (as it was very easily reproducible, but you mentioned that this problem was not connected to mouse);
- block_dev_attached.txt - attached cf reader with a card;
- susp_mouse_block.txt - suspended with both mouse & blockdev attached (unmounted);
- block_mounted.txt - mounted the blockdevice;
- susp_mouse_block_mounted.txt - suspended with both devices attached and blockdev mounted. in this case device was perfectly accessible after resuming;
- susp_block_mounted.txt - suspended and resumed with both dev attached and blockdev mounted. in this case there was longer delay between suspend and resume;
- attempt_to_access_block_after_resume.txt - now blockdev was not accessible. attempts to access it produced errors.

there were also a couple of error messages even when blockdev was accessible after resuming - if they are not connected to this problem, maybe i should file a new issue ?

Created attachment 6466 [details]
new dmesg output
Created attachment 6487 [details]
Diagnostic for UHCI suspend routine
Your new logs are good. I can see two problems; one is a bug, and the other I
don't know about. Oddly enough, the bug partially compensates for the other
problem -- this explains why your storage device was sometimes usable after
resuming. Given that strange second problem, it should not have been usable
any of the times!
The bug I know how to fix, so let's not worry about it for now. The attached
patch adds some logging statements to the UHCI driver; they may help to show
where the second problem comes from. When you try it out, it doesn't matter if
the storage device is mounted or not. And you don't have to send a complete
set of logs; all I need to see is what happens during the suspend & resume.
You asked why earlier it seemed that moving the mouse would prevent the system
from suspending? I don't remember all the details, but it was connected with
the fact that you moved the mouse the _second_ time you suspended and not the
_first_ time. Something changed in between, and that something was the real
cause of the problem. Maybe what changed was that you plugged in the storage
device. Anyway the fix has been included in 2.6.14, so that particular problem
should not bother you again.
Created attachment 6493 [details]
uhci diag suspend/resume dmesg output
applied the attached patch to previously used kernel.
with usb mouse and single blockdevice attached suspended & resumed. this is a
dmesg output after resuming.
This last test result definitely shows what's going wrong. When the UHCI driver suspends a controller, it tells the controller not to issue any more interrupts. In the log file, that setting is confirmed by the lines that say: "read_config 0, val 0000". 0000 means interrupts are turned off, whereas 2000 means interrupts are turned on.

During the resume procedure, the UHCI driver tests to make sure that interrupts are still turned off. If they aren't, it means that something has changed the controller's state unexpectedly, leaving the driver no choice but to perform a complete reset (which will cause all existing connections to be broken). That's exactly what happened, as you can see from the lines that say: "uhci_check_and_reset_hc: legsup = 0x2000" and "Performing full reset".

Now the question becomes: Why were interrupts turned back on, and what piece of code did it? Was it done by the BIOS, or was it done by ACPI? A good test for you to try would be to boot with "acpi=off" on the boot command line, then do exactly the same thing as last time. If interrupts still get turned on, then we'll know the BIOS is guilty.

Regardless of who did it, it's potentially dangerous. The driver can't handle interrupts while the controller is suspended, so if the controller does issue an interrupt it will spam the CPU and quickly cause the system to turn off the entire IRQ line.

Hi Alan,
>Why were interrupts turned back on
What does this mean? ACPI might enable PIC/IOAPIC, interrupt router before USB
resume. Is this harmful?
We discussed this issue before in pm-list, driver should call free_irq in
the .suspend method.
Re David's comment #38, surely you recall the recent discussions with Linus ... leading to the _removal_ of that troublesome "free_irq during suspend" code? That happened as I recall shortly before 2.6.13 froze. At any rate, something mucking with the UHCI IRQ is a different issue.
David

Shaohua: The interrupt-enable in question is a setting internal to the UHCI controller. (It's bit 0x2000 of the word at address 0xc0 in the PCI configuration space, to be exact.) It has nothing to do with PIC/IOAPIC.

Created attachment 6511 [details]
Add logical disconnect when CONFIG_USB_SUSPEND isn't set
Rich:
The attached patch fixes the bug I referred to earlier. After you apply it,
you should find that the storage device is _never_ accessible after resuming.
Given what is happening to your USB controllers, that's the correct behavior.
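
The interrupt-enable check discussed in the preceding comments can be pictured with a little bit arithmetic. The sketch below is a simplified userspace model, not the uhci-hcd code: it uses only the two numbers given in this thread (the word at offset 0xc0 of PCI configuration space and bit 0x2000 within it), and the function and constant names are invented for illustration. It assumes the caller already has a raw dump of the controller's configuration space as bytes.

```python
import struct

LEGSUP_OFFSET = 0xC0       # config-space offset quoted in this thread
INTERRUPT_ENABLE = 0x2000  # interrupt-enable bit quoted in this thread


def legsup_from_config(config_bytes):
    """Extract the 16-bit little-endian word at LEGSUP_OFFSET."""
    (value,) = struct.unpack_from("<H", config_bytes, LEGSUP_OFFSET)
    return value


def interrupts_enabled(legsup):
    """True if the interrupt-enable bit is set in the given register value."""
    return bool(legsup & INTERRUPT_ENABLE)


def changed_while_asleep(legsup_at_suspend, legsup_at_resume):
    """True if interrupts were off at suspend but on again at resume."""
    return (not interrupts_enabled(legsup_at_suspend)
            and interrupts_enabled(legsup_at_resume))


# Fabricated 256-byte config dump with only the bit of interest set.
dump = bytearray(256)
struct.pack_into("<H", dump, LEGSUP_OFFSET, 0x2000)

print(hex(legsup_from_config(bytes(dump))))
print(changed_while_asleep(0x0000, 0x2000))
```

The 0x0000 and 0x2000 values in the last line are the ones reported in the logs above; a True result corresponds to the situation in which the driver falls back to a full reset.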
this is the latest bios version available, so i would like to report this problem to fujitsu-siemens if this really is a bios problem. trying to suspend with acpi=off from commandline results in segmentation fault ;) how can i make sure that it is the bios problem (or find out what exactly causes the problem if it is not the bios) ?

Created attachment 6549 [details]
Diagnostic for ACPI/BIOS suspend routines
This patch is meant to go on top of the one in attachment #6487 [details]. It will print out the value of the interrupt-enable bit after ACPI is finished (just before calling the BIOS to suspend) and before ACPI restarts (just after resuming from the BIOS). That should tell us whether it's getting set by ACPI or by the BIOS. You can test this the same as before -- and don't use "acpi=off"! However, it would be interesting to see the dmesg log showing what went wrong earlier when you tried to suspend with "acpi=off". You should attach that also.

Created attachment 6559 [details]
trying to suspend with acpi=off
this is dmesg output with the same kernel and patchset as in comment #36, booted with acpi=off and tried to suspend (echo "mem" > /sys/power/state)

Created attachment 6560 [details]
suspend and resume with uhci&acpi debug patches
this is dmesg output with the same kernel and patchset as in comment #36, + attachment #6549 [details] (but without the one in comment #41, as i was not sure it would not change the result)
usb mouse and blockdevice attached, bd mounted. suspend/resume cycle, dmesg right after that.

trying to find familiar letters in second output ;) i stopped at these lines :
ACPI before, config 0, 0x0000
Back to C!
ACPI after, config 0, 0x2000
i suppose this means that value has changed in between when only bios should be able to change anything. if so, i would appreciate some short writeup that i could hand to fuj-siem :)

I think the problem about suspending with "acpi=off" has been fixed in 2.6.15-rc1. It's worth a try.
Tell the vendor that either their BIOS or else their ACPI tables cause the USBPIRQDEN bit in the UHCI LEGSUP registers to be set after resuming from S3 sleep, even though it was not set when entering the sleep.

2.6.15-rc1 with acpi=off doesn't do anything when trying to suspend. dmesg output is empty, too. thanks for your help, i'll report this problem with usb to fujitsu siemens.

i got this response from fujitsu-siemens (i must admit, i'm pleasantly surprised by their speed, though i have almost no technical knowledge to assess the answer ;) ) :

"The operating system saves and restores any registers that are not in the AUX power well. This is defined in the Enhanced Host Controller Interface (EHCI) specification, but is not defined in the UHCI or OHCI specifications. Because of this, the operating system does not save or restore any registers for UHCI or OHCI. The following rules apply:
Power management rules for USB host controllers transitioning from USB suspend to USB working. This is what is expected for: Sleep State S3 Sleep State S0
1. The controller must not reset the USB bus or cause a disconnect or power loss on any of the root USB ports.
2. The system BIOS must not enable any type of legacy USB BIOS or otherwise enable the host controller to a run state.
3. If a Peripheral Component Interconnect (PCI) reset occurs in addition to rule 1, the BIOS must restore all host registers to their state prior to entering low power. Root ports should not indicate connect or enable status changes.
4. The controller hardware must be in a functional state that is capable of driving resume and entering the run state without requiring a global hardware reset that otherwise would result in a USB bus reset driven on the root ports. "

additionally, in the answer was some finnish text (that supposedly refers to ms windows xp). as i have no knowledge of finnish i have no idea what it exactly says :) :
"Eli k

I don't understand Finnish either, although there are people on the Linux mailing lists who do.

That reply from Fujitsu-Siemens wasn't very helpful, as you probably realized. They didn't answer the question at all. Also, where they write:
"The operating system saves and restores any registers that are not in the AUX power well. This is defined in the Enhanced Host Controller Interface (EHCI) specification, but is not defined in the UHCI or OHCI specifications. Because of this, the operating system does not save or restore any registers for UHCI or OHCI."
The last sentence should instead say: "Because of this, the operating system saves and restores all registers for UHCI and OHCI." They got the logic exactly backwards.

Do you have an email contact that I can use to communicate with them directly?

as i was told later, finnish text was not significant from technical viewpoint. i received this answer from Lehikoinen Pauli. i hope he doesn't mind too much being contacted by you ;)
Pauli.Lehikoinen at.well.he probably doesn't.like unsolicited.mail-cut here fujitsu-siemens.com

Created attachment 6606 [details]
Allow PIRQD to be on during resume
Well, that interchange with Fujitsu-Siemens wasn't much help.
Take out all those other patches I sent, keeping only the gregkh-all patch, and
then apply this one. It tells the uhci-hcd driver that it's okay for PIRQD to
be on during a resume. Perhaps it will solve your problem... The only way to
know is to try it.
some problems again... took latest gregkh-all (gregkh-all-2.6.15-rc1-git6), successfully applied git6, gregkh-all and this latest pirqd patch. unfortunately, computer did not come back from suspend. tried to remove some patches - turns out plain rc1-git6 resumes ok, so it was something in gregkh-all that causes a regression regarding the resume. unfortunately, logfiles seem to contain no relevant information. should i file a separate bugreport regarding this problem ? if so, under which component (as i have no idea what causes it to freeze) ?

then i tried previously used gregkh-all, 2.6.14-git5. this time pirqd patch did not apply cleanly :
patching file drivers/usb/host/uhci-hcd.c
Hunk #1 succeeded at 720 (offset 7 lines).
Hunk #2 FAILED at 755.
1 out of 2 hunks FAILED -- saving rejects to file drivers/usb/host/uhci-hcd.c.rej
patching file drivers/usb/host/pci-quirks.c
Hunk #1 succeeded at 28 (offset 6 lines).

and what can we determine from fuj-siem conversation ? did the link to ms guidelines help anyway ? to me it seems that ms has some guidelines that fuj-siem has adhered to - but technically they are pretty bad solution - is this so ? in this case, what choices do we have if a suspend is requested with usb blockdevices mounted (or probably any usb device that requires some permanent connection) ? i understand that there is basically no way to maintain this connection if bios keeps resetting that value, right ?

Created attachment 6631 [details]
Allow PIRQD to be on during resume
Here's a version of the new patch which should apply okay to 2.6.15-rc1-git6
without gregkh-all. (I actually adjusted it for 2.6.15-rc1-git4 but there
shouldn't be any problem with slightly later versions.) Try using it; I think
the gregkh-all patch isn't needed any more for your testing.
The suspend/resume problem you saw may go away when 2.6.15-rc2 and its
associated gregkh-all patch come out. Hold off for a while before reporting
it.
The conversation didn't tell us much except that Fujitsu-Siemens doesn't
understand the problem and isn't interested in fixing it. The Microsoft
guidelines document was about startup and shutdown; it did not mention
suspend/resume.
If the only change the BIOS makes is to reset this one bit, then the new patch
should fix the problem. But if the BIOS makes other changes as well, then
there may be no way to maintain the USB connection.
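
The distinction drawn here, between the BIOS flipping only this one bit and the BIOS changing other things too, can be expressed as a masked comparison. This is a conceptual sketch only and does not reproduce the attached patch; the 0x2000 value is the bit named earlier in the thread, and unexpected_changes is a made-up helper.

```python
PIRQD_BIT = 0x2000  # the bit discussed in this thread


def unexpected_changes(before, after, tolerated=PIRQD_BIT):
    """Return the bits that differ between two 16-bit values, ignoring tolerated ones."""
    return (before ^ after) & ~tolerated & 0xFFFF


# Values reported earlier in the thread: 0x0000 before suspend, 0x2000 after.
print(hex(unexpected_changes(0x0000, 0x2000)))
# A hypothetical case where another bit changed as well.
print(hex(unexpected_changes(0x0000, 0x2010)))
```

A zero result means nothing besides the tolerated bit changed; anything else corresponds to the "other changes" case, where keeping the connection alive may not be possible.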
applied the patch, but nothing changed in the behaviour. mounted usb blockdevice is still inaccessible after resuming (rejecting io to dead device, found as sdb, blahblah)

"The conversation didn't tell us much except that Fujitsu-Siemens doesn't understand the problem and isn't interested in fixing it. The Microsoft guidelines document was about startup and shutdown; it did not mention suspend/resume."

i'm thinking about a detailed situation description, including all previous references and obtained information, coupled with a kind request to forward the information to responsible technical person. if this really is a clear problem with the bios, maybe such information can get them moving. given that latest patch didn't help, maybe we can dump all related parameters before/after, see which ones are unnecessarily reset and inform them ?

now, this situation pretty much shows why closed source bios/firmware is a bad idea - a company will inevitably abandon older systems - and in this case bios probably was outsourced, so it might require quite an effort from fuj-siem, too. but support for older systems is important for corporations, and this case might seriously impact our view on their ability to support the systems. i hope they will cooperate on this one :)

"given that latest patch didn't help, maybe we can dump all related parameters before/after, see which ones are unnecessary reset and inform them ?"

Most of the important parameters are given in the dmesg log. What does it say?

Created attachment 6650 [details]
dmesg_output_with pirqd patch applied
ok, a couple of dmesg outputs for suspend-resume cycle with usb blockdev
mounted (no other patches (additional debug output or whatever) were applied in
this case) :
susp-res_wo_usb_debug.txt - 2.6.15-rc1-git6 with second pirqd patch applied, no
usb debugging;
susp-res_with_usb_debug.txt - 2.6.15-rc1-git6 with second pirqd patch applied,
usb debugging enabled;
rc2_gregkh-all_pirqd.txt - 2.6.15-rc2-git2 with associated gregkh-all and first
pirqd patch, usb debugging enabled.
The log shows that the problem still exists:
uhci_hcd 0000:00:11.2: uhci_resume
uhci_hcd 0000:00:11.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:11.2: Performing full reset
The second line indicates that the entire UHCI controller has been reset, not just the LEGSUP register. This could be because the BIOS reset it, or it could be because the controller lost power during the suspend -- although it shouldn't lose power during a suspend-to-RAM. In either case, there's no way to maintain the connection to a storage device.

If you can find out from Fujitsu-Siemens who supplied their BIOS, maybe those people would be willing to listen and fix it.

There's always an alternative, of course. You could unmount the storage device and rmmod usb-storage before doing the suspend, and then put things back after the resume. Short of a BIOS upgrade, however, I don't see any way to preserve the mount across a suspend.

bios is made by insyde software, but it says on their webpage :
"For end user BIOS upgrades or other support, we recommend you contact your computer manufacturer."
i probably will try to compose tomorrow some sort of lengthy and explanatory mail to fujitsu-siemens in hope that they will at least forward that to somebody who would fully understand the problem.

It's possible that apart from the BIOS problems, the controller may actually lose power during the suspend. If that's so it doesn't matter what the BIOS does, because the connection will be lost no matter what. I can't think of any easy way for you to tell whether power is present. Does the computer wake up if you plug or unplug a USB device during the suspend?

the computer doesn't wakeup if i replug usb mouse when it is suspended. if the mouse is plugged and the computer is suspended, the mouse still glows (it is an optical mouse) and laser responds to objects, so i guess it still receives power.

Created attachment 6737 [details]
Allow resume after controller reset
Okay, so the power was maintained.
This patch will allow the resume to proceed even when the controller has been
reset. See what happens when you use it; depending on the kind of reset
performed by the BIOS your storage device may or may not be accessible. And
remember to post the dmesg log.
Created attachment 6747 [details]
dmesg output with 'allow resuma after reset'
nope, did not work after resuming. when accessing, got those "rejecting i/o to
dead device" messages again; the device was found as sdb.
The log shows that the BIOS has reset the UHCI controller enough to destroy the USB connections to the mouse and the storage device. So it doesn't much matter what the Linux driver does; there's no way it can keep the connections going.
If you can't get the BIOS fixed, the only thing to do is unmount the storage device before suspending and remount it after resuming.
Since we've proved now that this isn't a kernel bug, I will change the bug-report status accordingly.

About comment #64 ... I don't know this specific hardware, but in general there's no requirement that USB controllers stay powered during S3. I've certainly had power-efficient laptops that turn off OHCI controllers in S3; no reason UHCI shouldn't sometimes do the same thing.
That much, Linux should recover from -- disconnecting devices etc. The IRQ thing is another matter, and it does seem that something odd is going on there.

ok, i put this issue off for a while, so i'll try to get myself together and pester fujitsu-siemens again tomorrow :)
regarding possibility of power shutdown during suspend - maybe this could be treated as something similar, maybe by blacklist ? currently we are left with some inaccessible sda device, maybe it is possible to remove and re-detect (either by noticing the problem upon resume or by checking the blacklist) ?
and what about "the irq thing" :) ? what exactly is it (is it the 'spurious irq' that comes up sometimes ?), is it something related to hardware/bios problems ? what could i debug to possibly report fuj/siem again ?
additionally, i filed another, probably similar, problem with usb (bug 5765) - maybe one of you might spot the problem right away after digging through this issue :)

Your suspend problem can't be fixed from within the kernel, but you can fix it in userspace by unmounting the filesystem before the suspend and remounting it after the resume. That's what I suggested earlier.
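For illustration only, the userspace workaround described above (unmount and rmmod usb-storage before suspending, then put things back after resuming) might be sketched roughly as follows. This is not something posted in the thread. The function names, the mount point and the device node are hypothetical, and the sketch assumes the standard umount, rmmod, modprobe and mount commands are run as root from whatever script triggers the suspend.

```python
import subprocess


def pre_suspend_commands(mount_point):
    """Commands to run before suspending: unmount, then unload usb-storage."""
    return [
        ["umount", mount_point],
        ["rmmod", "usb-storage"],
    ]


def post_resume_commands(device, mount_point):
    """Commands to run after resuming: reload usb-storage, then remount."""
    return [
        ["modprobe", "usb-storage"],
        ["mount", device, mount_point],
    ]


def run_all(commands, dry_run=True):
    """Run each command in order; with dry_run=True only print them."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command
            subprocess.run(cmd, check=True)
    return commands
```

A suspend script could call run_all(pre_suspend_commands("/mnt/usb")) before suspending and run_all(post_resume_commands("/dev/sda1", "/mnt/usb")) afterwards, passing dry_run=False to actually execute the commands; both paths here are placeholders. Two caveats: the device may need a moment to be detected again after modprobe, and it may come back under a different name (it was found as sdb in the test above), so a real script would probably wait for the device and mount it by label or UUID instead.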
The IRQ thing David was talking about is the problem you first wrote about to Fujitsu-Siemens. Fixing it won't solve anything so long as the BIOS resets the USB controller, however.

...and it seems that usb implementation of this laptop is seriously messed up.
i used usb optical mouse to check powering of usb devices (not exactly the best way but i hope it will give at least some information).
if it is the only device, the light goes off during the suspend, then it turns on and stays on while suspended. during resume, light goes off, then turns on again. if this is an indicator of power provided to usb devices, then seems it is cut off both during suspend and resume (but kept on when suspended).
now, things get more interesting when i add another device. if i have a usb mouse and a blockdevice, mouse loses power twice, then works when resumed. but if i exchange them (this laptop has two usb ports), upon resuming power is resumed for a moment, then lost and the mouse does not work...
with 2.6.13.2 it _does_ work. most of the time. sometimes the computer does not resume correctly, it starts as if a cold boot was performed. sometimes upon resume it stalls in the console and screen starts to light up (unevenly, from the sides).
the funny thing, i am unable to reproduce these problems with 2.6.15-rc6... tried several dozen times, none exposed these problems.
after testing these combinations, i think i will try to sum up some of the findings to fujitsu/siemens - and then, i guess, i'll have to leave it for a while as i am losing track of what and how many times i have tested.