Bug 5227 - mounted usb block devices don't work after resume from S3
Status: REJECTED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: i386 Linux
Importance: P2 normal
Assignee: Alan Stern
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-12 01:31 UTC by richlv
Modified: 2005-12-20 14:46 UTC

See Also:
Kernel Version: 2.6.13.1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
usb-storage suspend and resume callbacks (1.16 KB, patch)
2005-10-11 03:55 UTC, David Brownell
Fix assignments to hcd->state in uhci-hcd (1020 bytes, patch)
2005-10-13 13:48 UTC, Alan Stern
dmesg output (9.00 KB, application/bzip2)
2005-10-18 01:37 UTC, richlv
new dmesg output (3.83 KB, application/bzip2)
2005-11-04 00:49 UTC, richlv
Diagnostic for UHCI suspend routine (834 bytes, patch)
2005-11-07 10:04 UTC, Alan Stern
uhci diag suspend/resume dmesg output (7.43 KB, text/plain)
2005-11-08 00:58 UTC, richlv
Add logical disconnect when CONFIG_USB_SUSPEND isn't set (565 bytes, patch)
2005-11-09 14:00 UTC, Alan Stern
Diagnostic for ACPI/BIOS suspend routines (1.87 KB, patch)
2005-11-12 08:35 UTC, Alan Stern
trying to suspend with acpi=off (1.17 KB, text/plain)
2005-11-13 10:09 UTC, richlv
suspend and resume with uhci&acpi debug patches (7.48 KB, text/plain)
2005-11-13 10:12 UTC, richlv
Allow PIRQD to be on during resume (2.20 KB, patch)
2005-11-17 10:59 UTC, Alan Stern
Allow PIRQD to be on during resume (2.36 KB, patch)
2005-11-21 09:53 UTC, Alan Stern
dmesg_output_with pirqd patch applied (3.18 KB, application/bzip2)
2005-11-23 02:40 UTC, richlv
Allow resume after controller reset (1.68 KB, patch)
2005-12-01 08:15 UTC, Alan Stern
dmesg output with 'allow resuma after reset' (6.42 KB, text/plain)
2005-12-02 01:47 UTC, richlv

Description richlv 2005-09-12 01:31:42 UTC
if a usb block device (reproduced with compactflash card in an usb reader, usb 
memory stick) is mounted when machine is put to sleep, it is found as a new 
device upon resume.

for example, if it is a single device, it is first found as sda. upon resume it 
is found as sdb, and sda is not working anymore (sdb is accessible and works).

trying to access mounted filesystem results in an error :
Sep 11 17:14:15 is kernel: scsi1 (0:0): rejecting I/O to dead device
Comment 1 Shaohua 2005-09-12 02:24:27 UTC
Sounds like a USB bug to me. Did this ever work before?
If you unload the USB driver before suspend and reload it after resume, did it 
work?
Comment 2 richlv 2005-09-12 02:45:04 UTC
> Did this ever work before?

have no idea, i have successfully resumed only since 2.6.13 :)
(http://bugzilla.kernel.org/show_bug.cgi?id=4390), but it was the same in 2.6.13.

> If you unload the USB driver before suspend and reload it after resume, did it 
> work?

yes. the problem arises only if the device has been mounted during suspending

testing usb devices i also stumbled upon another strange problem - if i mount an 
usb device after resuming, kde laptop daemon stops responding. there is nothing 
in the logfiles.

could this be related and what can i do to debug this problem better ?
Comment 3 richlv 2005-09-12 03:06:23 UTC
oh. actually it (klaptopd) starts responding again after some time. also easily 
reproducible - resume, test klaptopd - responds. mount usb device - stops 
responding. wait a while, it responds again. mount usb device - stops 
responding. (it does not matter if the device is unmounted when waiting for 
klaptopd to respond - it responds in both cases).
the wait time required is pretty long, at least several minutes.

everything works ok if there has been no resuming.
Comment 4 David Brownell 2005-09-12 07:23:36 UTC
Was this a real suspend/resume cycle, like suspend-to-RAM?

Or was it a swsusp "checkpoint, then powerdown, then powerup
and reset and restore checkpoint"?  If so then you _have_ in
fact broken the connection to the USB disk.

Comment 5 richlv 2005-09-12 07:34:28 UTC
it was suspend to memory

in swsusp case, i suspect, there would be a script (hibernation script in 
suspend2 project ?) that would unmount specific drives and mount them upon 
resuming.
Comment 6 David Brownell 2005-09-14 22:00:59 UTC
Can you let me know what HCD(s) were involved with this? 
And please confirm that this is with CONFIG_USB_SUSPEND=n. 
 
Some recent changes to usbcore seem to have caused regressions 
in OHCI (and hence I assume EHCI) in some suspend/resume paths. 
 
Yes, I'd certainly expect S1 and S3 suspend to preserve mounts. 
In fact I've tested equivalent stuff many times, just not with 
the last few kernels. 
Comment 7 richlv 2005-09-14 23:26:16 UTC
1. umm. i'm not sure what hcd is ;)
looking up one of explanations was "Hard Copy Device", so i'll assume that this 
is about devices that i was able to reproduce the problem with.

a) kingston data traveller ii+;
b) a compactflash card reader, labelled as kingston, recognized as "Vendor: 
eUSB".

2. grepping .config for USB_SUSPEND produces only
# CONFIG_USB_SUSPEND is not set
Comment 8 richlv 2005-09-15 00:29:14 UTC
oh. ok, so ehci & uhci are compiled in.
lspci reports that usb controllers are via vt82 family uhci usb 1.1.
(maybe this means i should try without ehci compiled in ?)
Comment 9 David Brownell 2005-09-23 08:52:10 UTC
OK, I did some testing recently and found there have indeed been
regressions in the behavior of the USB stack in the past few
releases.  Annoying.

http://marc.theaimsgroup.com/?l=linux-usb-devel&m=112745488014635&w=2

is the overview of a set of patches that fix all the issues I've
been able to reproduce, as well as finally fixing some of the
structural issues that have needed attention for some time.

There's still a separate issue that usb-storage has no suspend()
and resume() methods, however, so there may be issues associated
with that ...

Comment 10 David Brownell 2005-10-10 20:04:10 UTC
The USB power management patches queued for 2.6.15 at 
 
   http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-04-usb/ 
 
fix a variety of issues.  Maybe this one; give them a try and let 
us know how they behave for you. 
Comment 11 richlv 2005-10-10 23:23:54 UTC
um. i would appreciate some hand holding this time :)

1. which kernel version can i apply these to ?
2. should i just apply all of them and check for the problems ?
3. will simple "for i in `cat series`; do patch..." do the trick or will email 
headers prevent this ?

thanks.
Comment 12 David Brownell 2005-10-11 03:31:31 UTC
Sorry about that lack-of-directions.  The patches are maintained 
with "quilt", so the "series" file holds the order in which 
to apply the patches.  (As the README says...)  Simplest is to 
copy the whole directory to .../linux/patches (toplevel kernel 
directory) then "quilt push -a". 
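
The role of the "series" file can be illustrated with a small sketch. It only lists the patches in the order quilt would push them; it does not apply anything. The patch names in the sample are hypothetical, and treating "#" as the start of a comment and ignoring per-patch options such as "-p0" are assumptions about the series format rather than a statement of everything quilt accepts.

```python
def read_series(text):
    """Return patch file names from quilt 'series' text, in apply order."""
    patches = []
    for line in text.splitlines():
        # Assumption: '#' starts a comment; drop it and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep only the file name, ignoring per-patch options such as -p0.
        patches.append(line.split()[0])
    return patches


# Hypothetical series contents, for illustration only.
example = """\
# USB power management
usb-pm-01.patch
usb-pm-02.patch -p0

usb-pm-03.patch
"""

for name in read_series(example):
    print(name)
```

A listing like this also answers the earlier question about looping over `cat series` by hand: the order in the file is the order that matters.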
 
 
 
 
Comment 13 David Brownell 2005-10-11 03:55:08 UTC
Created attachment 6276
usb-storage suspend and resume callbacks

You _may_ need this in addition to the patches that make
suspend/resume behave properly.
Comment 14 richlv 2005-10-11 04:17:40 UTC
oh. ok, i found quilt-0.42.
now the last bit - which kernel version do i apply these to ? :)
Comment 15 David Brownell 2005-10-11 05:04:12 UTC
current kernel (2.6.14-rc4) should suffice.  
Comment 16 richlv 2005-10-11 06:33:15 UTC
all patches applied successfully (except this one attached here, i'll test it 
later).
results :

1. with usb flash memory attached it failed to suspend at all (not mounted) :
Oct 11 16:23:46 is kernel: Stopping tasks: ================================================|
Oct 11 16:23:46 is kernel: usb-storage 1-2:1.0: no suspend?
Oct 11 16:23:46 is kernel: Could not suspend device 1-2: error -16
Oct 11 16:23:46 is kernel: Some devices failed to suspend
Oct 11 16:23:46 is kernel: Restarting tasks... done

2. suspending phase has a heavily distorted image. with previous versions i 
could see some text messages (about some pci interrupts etc) - regression.

3. after resuming (without usb block device) my usb mouse stopped working

4. with usb mouse attached it freezes upon resume. it resumes if i move the usb 
mouse (i waited for a minute). with mouse removed works as expected (resumes in 
a couple of seconds)
Comment 17 richlv 2005-10-11 06:41:51 UTC
with attached patch applied (clean) :

1. my mouse does not work after resuming;

2. with usb blockdevice attached (unmounted) computer freezes upon resuming. if 
i remove usb device, it stalls for 5 more seconds, then resumes.

some data from syslog :

Oct 11 16:38:11 is kernel: Stopping tasks: ================================================|
Oct 11 16:38:11 is kernel: ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
Oct 11 16:38:11 is kernel: Restarting tasks... done
Oct 11 16:38:20 is kernel: usb 1-1: device descriptor read/64, error -110
Oct 11 16:39:29 is kernel: Stopping tasks: ================================================|
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot reset port 2 (err = -113)
Oct 11 16:39:29 is last message repeated 4 times
Oct 11 16:39:29 is kernel: hub 1-0:1.0: Cannot enable port 2.  Maybe the USB cable is bad?
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot disable port 2 (err = -113)
Oct 11 16:39:29 is kernel: Badness in hcd_endpoint_disable at drivers/usb/core/hcd.c:1368
Oct 11 16:39:29 is kernel:  [<c02b1c30>] hcd_endpoint_disable+0x140/0x150
Oct 11 16:39:29 is kernel:  [<c02b3b80>] usb_disable_endpoint+0x50/0x70
Oct 11 16:39:29 is kernel:  [<c02aebdb>] ep0_reinit+0x1b/0x50
Oct 11 16:39:29 is kernel:  [<c02af632>] hub_port_connect_change+0x1e2/0x410
Oct 11 16:39:29 is kernel:  [<c02a000a>] pci_iomap+0xea/0xf0
Oct 11 16:39:29 is kernel:  [<c02afb3f>] hub_events+0x2df/0x460
Oct 11 16:39:29 is kernel:  [<c02afcd7>] hub_thread+0x17/0xf0
Oct 11 16:39:29 is kernel:  [<c01303b0>] autoremove_wake_function+0x0/0x60
Oct 11 16:39:29 is kernel:  [<c01303b0>] autoremove_wake_function+0x0/0x60
Oct 11 16:39:29 is kernel:  [<c02afcc0>] hub_thread+0x0/0xf0
Oct 11 16:39:29 is kernel:  [<c012fea5>] kthread+0xa5/0xb0
Oct 11 16:39:29 is kernel:  [<c012fe00>] kthread+0x0/0xb0
Oct 11 16:39:29 is kernel:  [<c0101379>] kernel_thread_helper+0x5/0xc

after a bunch of badness :

Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot reset port 2 (err = -113)
Oct 11 16:39:29 is last message repeated 4 times
Oct 11 16:39:29 is kernel: hub 1-0:1.0: Cannot enable port 2.  Maybe the USB cable is bad?
Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot disable port 2 (err = -113)

after some more badness (probably the moment i removed the device ?) :

Oct 11 16:39:29 is kernel: hub 1-0:1.0: cannot disable port 2 (err = -113)
Oct 11 16:39:29 is kernel: ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
Oct 11 16:39:29 is kernel: usb 1-2: device descriptor read/64, error -110
Oct 11 16:39:29 is last message repeated 3 times
Oct 11 16:39:29 is kernel: usb 1-2: device not accepting address 9, error -110
Oct 11 16:39:29 is kernel: Restarting tasks... done

so comparing to 2.6.13.2 i have a lot of regressions :)
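
For reference, the numbers in these log lines are negated Linux errno values. Below is a minimal sketch for decoding them, assuming the usual i386 Linux numbering; the table covers only the three codes that appear in this report, and the function name is illustrative.

```python
import re

# Linux errno values seen in this report (i386 numbering assumed).
LINUX_ERRNO = {
    16: "EBUSY",          # device or resource busy
    110: "ETIMEDOUT",     # connection timed out
    113: "EHOSTUNREACH",  # no route to host
}


def decode_errors(log_line):
    """Return symbolic names for 'error -N' / 'err = -N' codes in a log line."""
    codes = re.findall(r"(?:error|err =) -(\d+)", log_line)
    return [LINUX_ERRNO.get(int(code), "unknown") for code in codes]


print(decode_errors("usb 1-1: device descriptor read/64, error -110"))
print(decode_errors("hub 1-0:1.0: cannot reset port 2 (err = -113)"))
print(decode_errors("Could not suspend device 1-2: error -16"))
```

Read this way, the failed suspend is a "busy" refusal (-16), while the resume failures are timeouts (-110) and an unreachable controller (-113).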
Comment 18 David Brownell 2005-10-11 16:22:41 UTC
What USB controllers are you using?  Please send 
/sbin/lspci -vv | grep HCI 
output ... 
 
Yes, that one patch is needed to prevent the errors 
you saw without it.  If you're using UHCI there was 
a BIOS-related patch submitted (and AFAIK not in that 
patchset) ... try disabling all usb mouse/keyboard 
support in your BIOS. 
Comment 19 richlv 2005-10-11 23:01:55 UTC
00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])
00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])
00:11.4 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])

(and one fw controller)
the computer itself has only two usb ports, i think both are connected to one 
controller.

i'll try disabling usb mouse in bios today, too.
Comment 20 richlv 2005-10-12 00:27:08 UTC
there are no parameters for usb configuration in bios.
this is fujitsu-siemens lifebook c-1020.
Comment 21 David Brownell 2005-10-12 20:14:11 UTC
Re-assigning to Alan, as this touches both UHCI and usb-storage 
which are drivers I don't generally touch.  Most specifically 
the issue looks to be a problem with UHCI and S3 resume. 
 
The "badness" is a warning that I suspect may no longer be useful; 
I seem to recall I had removed it in one patch, but it stopped 
mattering for OHCI and EHCI so I left it in. 
 
Comment 22 Alan Stern 2005-10-13 13:48:21 UTC
Created attachment 6293
Fix assignments to hcd->state in uhci-hcd

Okay, I've got a patch for you to try.  It fixes a similar problem on my system
with Standby.  I couldn't test suspend to RAM because my computer doesn't want
to wake up afterward!

I've lost track of the patches David had you try.  So this is meant to go on
top of 2.6.14-rc4 plus

http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-all-2.6.14-rc4.patch


plus David's suspend/resume patch for usb-storage.
Comment 23 richlv 2005-10-14 01:15:42 UTC
ok, here's what i did...

took rc-4. applied gregkh-2.6 patch. applied both patches attached to this issue. 
i _did not_ try to apply patches david previously suggested (gregkh-04-usb).

all patches applied successfully.

result - suspends and resumes with usb mouse normally (just as plain 2.6.13.*). 
if i add usb block device, it does not suspend (similar as previously). logs 
were slightly different, though, so here they are. blockdevice was not mounted.

Oct 14 11:11:38 is kernel: Stopping tasks: =====================================================|
Oct 14 11:11:38 is kernel: sda: assuming drive cache: write through
Oct 14 11:11:38 is kernel: sda: assuming drive cache: write through
Oct 14 11:11:38 is kernel: hub 1-0:1.0: suspend error -16
Oct 14 11:11:38 is kernel: Could not suspend device 1-0:1.0: error -16
Oct 14 11:11:38 is kernel: Some devices failed to suspend
Oct 14 11:11:38 is kernel: Restarting tasks... done
Comment 24 Alan Stern 2005-10-14 07:10:54 UTC
Did you apply the patch attached to comment #13?  If not, then apply it too.

In any case, please turn on the USB verbose debugging option in the kernel
configuration (CONFIG_USB_DEBUG) and rebuild the USB drivers.  Then run your
test as before, and this time post the dmesg log starting from the point where
you plug in the mass storage device.
Comment 25 David Brownell 2005-10-15 08:01:33 UTC
Re comment #24 ... Alan, I suspect this is because SCSI isn't marking 
devices as suspended.  You'd be able to observe the lack of this in 
sysfs after "echo -n 3 > /sys/class/scsi_device/.../power/state", or 
by having drivers/usb/core/usb.c::verify_suspended() print a message 
for unsuspended devices... 
Comment 26 richlv 2005-10-17 02:49:54 UTC
yes, i did apply both patches that are attached to this issue.
turns out, inability to suspend is not connected to block device, but to mouse. 
this is a regression from 2.6.13.2.

additionally, blockdevice behaviour has changed slightly : usb device shows up 
as accessible, but empty.

what i did :

enable usb & usb storage debug

add usb mouse; dmesg > add_mouse.txt
add usb flash memory -> add_flash.txt
suspended & resumed successfully -> resumed.txt
used mouse while trying to suspend -> used_mouse-failed_to_suspend.txt

this is a regression from 2.6.13.2, where i can move the mouse around as much as 
i wish during suspend, it does not prevent suspending.

mounted flash, suspended & resumed -> resumed_with_mounted.txt

at this point df shows correct information about mounted device, i can sort of 
access it, but it shows up as empty.
when i access it -> usb_access_after_resume.txt

umount it, try to mount, fails with "not a valid block device" -> 
umount_mount_not-a-valid-device.txt
now try to mount it as sdb, not sda -> after_failure_mount_sdb.txt


some of dmesg outputs might be smaller and not contain all events that happened 
between two captures (i learned about -s switch only after these tests ;) ). i 
hope they have enough information.
if not, i can redo specific tests.
Comment 27 Alan Stern 2005-10-17 07:31:51 UTC
Where are your dmesg log files?

Without seeing the logs I can't be sure, but probably the reason you can't use
the mounted storage device after resuming is because there was no suspend power
available to maintain the USB connection.  Without a small amount of current
during the suspend, it will appear to the computer as though you had unplugged
the device while the machine was asleep.  After resume the device is
rediscovered, but it shows up as a new device: /dev/sdb instead of /dev/sda, for
instance.  The end result is that you can't use the mounted filesystem --
exactly as if you had unplugged and replugged the device while the computer was
awake.

By the way, do you still have CONFIG_USB_SUSPEND unset in your .config?
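
The situation described above (a filesystem still mounted from a device node that no longer corresponds to any device) can be spotted with a small sketch like the following. It compares text in the format of /proc/mounts with text in the format of /proc/partitions. The sample data is hypothetical, and the check assumes the dead device no longer appears in /proc/partitions, which may not hold on every kernel.

```python
def stale_mounts(mounts_text, partitions_text):
    """Return (device, mountpoint) pairs whose device is not in the partition list."""
    present = set()
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name (header row is skipped).
        if len(fields) == 4 and fields[0].isdigit():
            present.add(fields[3])

    stale = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            name = fields[0][len("/dev/"):]
            if name not in present:
                stale.append((fields[0], fields[1]))
    return stale


# Hypothetical state after resume: the stick came back as sdb, sda1 is still mounted.
partitions = """\
major minor  #blocks  name

   3     0   39070080 hda
   3     1   39070048 hda1
   8    16     250880 sdb
   8    17     250864 sdb1
"""
mounts = """\
/dev/hda1 / ext3 rw 0 0
/dev/sda1 /mnt/flash vfat rw 0 0
"""

print(stale_mounts(mounts, partitions))
```

On a live system the two strings would come from reading /proc/mounts and /proc/partitions.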
Comment 28 richlv 2005-10-18 01:37:26 UTC
Created attachment 6329
dmesg output

whoops. kicking myself in the nuts. attached logfiles this time.

usb suspend still is disabled.
the computer is on the repair now (cd-rom replacement), so i won't be able to
do tests for a couple of days, but if needed, put up testing scenarios, i'll do
them when i get it back :)
Comment 29 Alan Stern 2005-10-18 12:55:41 UTC
It's hard to tell what's going on because of all the messages from usb-storage
in your log files.  It would help if you could try doing the tests again, after
turning off CONFIG_USB_STORAGE_DEBUG.

Also, you should apply this new patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=112966405019175&w=2

I think that will solve your "mouse-movement" problem -- which from the logs,
appears to have nothing at all to do with moving the mouse.
Comment 30 Alan Stern 2005-10-31 12:28:25 UTC
Has there been any progress on this?  Try using 2.6.14 together with the
corresponding gregkh-all-2.6.14.patch changes.  That contains all the fixes
we've been discussing.
Comment 31 richlv 2005-10-31 22:40:04 UTC
local siemens service is still waiting for replacement cd-rom, as i am told. as 
soon as i get the computer back, i'll test it.

to make sure i understand the steps required :
take 2.6.14, enable usb debug (but not storage debug).
apply http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-all-2.6.14-rc4.patch

apply both patches attached to this issue;
apply http://marc.theaimsgroup.com/?l=linux-kernel&m=112966405019175&w=2

right ? :)
Comment 32 Alan Stern 2005-11-01 08:52:15 UTC
The last two steps are not necessary.  The patches attached to this report and
that other patch have been merged into 2.6.14 or gregkh-all-*.

In fact, right now it looks like gregkh-all-2.6.14 is against -git3, so you
should start from 2.6.14-git3.  If there are more updates by the time you get
your hardware running again, start from whichever 2.6.14 base the gregkh-all
patch is based on.
Comment 33 richlv 2005-11-04 00:48:41 UTC
so...
took 2.6.14, applied 2.6.14-git5 (as latest gregkh-all was against this), 
applied
gregkh-all-2.6.14-git5.patch.

enabled usb debug, disabled usb storage debug.

short story : in some cases usb device was accessible without errors after 
resuming even if it was mounted when suspending (though i suspect this might be 
connected to the delay between suspend/resume - maybe power was not terminated 
for usb device if this time was too short ?);
in some cases i could read top directory but not subdirectories;
and also previous behaviour with device being recognized as sdb1 and mounted one 
not working was reproducible.

this time i used dmesg -c to clear any events left over from previous run, so 
every file should contain only event that happened during that action.

list :

- clean.txt - a clean boot with debug kernel;
- added_mouse.txt - self explanatory :)
- suspend_with_mouse*.txt - suspended with mouse only, both still and actively 
moved. suspending in both cases was successful, and it seems that dmesg output 
is almost identical (only some device ids differ). btw, what was the reason 
behind inability to suspend when mouse was moved (as it was very easy 
reproducible, but you mentioned that this problem was not connected to mouse);
- block_dev_attached.txt - attached cf reader with a card;
- susp_mouse_block.txt - suspended with both mouse & blockdev attached 
(unmounted);
- block_mounted.txt - mounted the blockdevice;
- susp_mouse_block_mounted.txt - suspended with both devices attached and 
blockdev mounted. in this case device was perfectly accessible after resuming;
- susp_block_mounted.txt - suspended and resumed with both dev attached and 
blockdev mounted. in this case there was longer delay between suspend and 
resume;
- attempt_to_access_block_after_resume.txt - now blockdev was not accessible. 
attempts to access it produced errors.

there were also a couple of error messages even when blockdev was accessible 
after resuming - if they are not connected to this problem, maybe i should file 
a new issue ?
Comment 34 richlv 2005-11-04 00:49:21 UTC
Created attachment 6466
new dmesg output
Comment 35 Alan Stern 2005-11-07 10:04:46 UTC
Created attachment 6487
Diagnostic for UHCI suspend routine

Your new logs are good.  I can see two problems; one is a bug, and the other I
don't know about.  Oddly enough, the bug partially compensates for the other
problem -- this explains why your storage device was sometimes usable after
resuming.  Given that strange second problem, it should not have been usable
any of the times!

The bug I know how to fix, so let's not worry about it for now.  The attached
patch adds some logging statements to the UHCI driver; they may help to show
where the second problem comes from.  When you try it out, it doesn't matter if
the storage device is mounted or not.  And you don't have to send a complete
set of logs; all I need to see is what happens during the suspend & resume.

You asked why earlier it seemed that moving the mouse would prevent the system
from suspending?  I don't remember all the details, but it was connected with
the fact that you moved the mouse the _second_ time you suspended and not the
_first_ time.  Something changed in between, and that something was the real
cause of the problem.  Maybe what changed was that you plugged in the storage
device.  Anyway the fix has been included in 2.6.14, so that particular problem
should not bother you again.
Comment 36 richlv 2005-11-08 00:58:50 UTC
Created attachment 6493
uhci diag suspend/resume dmesg output

applied the attached patch to previously used kernel.
with usb mouse and single blockdevice attached suspended & resumed. this is a
dmesg output after resuming.
Comment 37 Alan Stern 2005-11-08 07:41:28 UTC
This last test result definitely shows what's going wrong.

When the UHCI driver suspends a controller, it tells the controller not to issue
any more interrupts.  In the log file, that setting is confirmed by the lines
that say: "read_config 0, val 0000".  0000 means interrupts are turned off,
whereas 2000 means interrupts are turned on.

During the resume procedure, the UHCI driver tests to make sure that interrupts
are still turned off.  If they aren't, it means that something has changed the
controller's state unexpectedly, leaving the driver no choice but to perform a
complete reset (which will cause all existing connections to be broken).  That's
exactly what happened, as you can see from the lines that say:
"uhci_check_and_reset_hc: legsup = 0x2000" and "Performing full reset".

Now the question becomes: Why were interrupts turned back on, and what piece of
code did it?  Was it done by the BIOS, or was it done by ACPI?  A good test for
you to try would be to boot with "acpi=off" on the boot command line, then do
exactly the same thing as last time.  If interrupts still get turned on, then
we'll know the BIOS is guilty.

Regardless of who did it, it's potentially dangerous.  The driver can't handle
interrupts while the controller is suspended, so if the controller does issue an
interrupt it will spam the CPU and quickly cause the system to turn off the
entire IRQ line.
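
The check described in this comment can be modelled in a few lines. This is a simplified sketch of the decision as explained here, not the actual uhci-hcd code: it looks only at the 0x2000 interrupt-enable bit, whereas the real driver may examine other bits as well. The function names are made up for illustration; the log formats are the ones quoted above.

```python
INTERRUPT_ENABLE = 0x2000  # the bit discussed in this comment


def parse_value(log_line):
    """Extract the hex value from lines such as 'read_config 0, val 0000'
    or 'uhci_check_and_reset_hc: legsup = 0x2000'."""
    for marker in ("val", "="):
        if marker in log_line:
            return int(log_line.rsplit(marker, 1)[1].strip(), 16)
    raise ValueError("no value found in: " + log_line)


def resume_action(legsup):
    """Simplified model: an unexpectedly enabled interrupt forces a full reset."""
    if legsup & INTERRUPT_ENABLE:
        return "full reset"  # existing USB connections are broken
    return "resume"          # controller state is as the driver left it


suspend_value = parse_value("read_config 0, val 0000")
resume_value = parse_value("uhci_check_and_reset_hc: legsup = 0x2000")
print(resume_action(suspend_value), "/", resume_action(resume_value))
```

With the values from the posted log, the suspend-time reading passes the check and the resume-time reading triggers the reset, which matches the "Performing full reset" line.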
Comment 38 Shaohua 2005-11-08 17:14:38 UTC
Hi Alan,
>Why were interrupts turned back on
What does this mean? ACPI might enable PIC/IOAPIC, interrupt router before USB 
resume. Is this harmful?
We discussed this issue before in pm-list, driver should call free_irq in 
the .suspend method.
Comment 39 David Brownell 2005-11-08 19:39:20 UTC
Re David's comment #38, surely you recall the recent discussions with 
Linus ... leading to the _removal_ of that troublesome "free_irq during 
suspend" code?  That happened as I recall shortly before 2.6.13 froze. 
 
At any rate, something mucking with the UHCI IRQ is a different issue. 
 
Comment 40 Alan Stern 2005-11-09 07:38:30 UTC
David Shaohua:

The interrupt-enable in question is a setting internal to the UHCI controller. 
(It's bit 0x2000 of the word at address 0xc0 in the PCI configuration space, to
be exact.)  It has nothing to do with PIC/IOAPIC.
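
To make the location concrete, here is a minimal sketch that reads the 16-bit word at offset 0xc0 from a copy of a device's PCI configuration space and tests bit 0x2000. The buffer below is synthetic. On a real system the bytes could come from a sysfs file such as /sys/bus/pci/devices/0000:00:11.2/config (that path is an assumption based on the lspci output earlier in this report, and reading beyond the first 64 bytes normally requires root).

```python
import struct

LEGSUP_OFFSET = 0xC0       # address of the word in PCI configuration space
INTERRUPT_ENABLE = 0x2000  # bit within that word


def read_legsup(config_bytes):
    """Return the little-endian 16-bit word at offset 0xc0."""
    (value,) = struct.unpack_from("<H", config_bytes, LEGSUP_OFFSET)
    return value


def interrupt_enabled(config_bytes):
    """True if bit 0x2000 of the word at 0xc0 is set."""
    return bool(read_legsup(config_bytes) & INTERRUPT_ENABLE)


# Synthetic 256-byte configuration space with the bit set, for illustration.
config = bytearray(256)
struct.pack_into("<H", config, LEGSUP_OFFSET, 0x2000)

print(hex(read_legsup(config)), interrupt_enabled(config))
```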
Comment 41 Alan Stern 2005-11-09 14:00:11 UTC
Created attachment 6511
Add logical disconnect when CONFIG_USB_SUSPEND isn't set

Rich:

The attached patch fixes the bug I referred to earlier.  After you apply it,
you should find that the storage device is _never_ accessible after resuming. 
Given what is happening to your USB controllers, that's the correct behavior.
Comment 42 richlv 2005-11-11 07:17:08 UTC
this is the latest bios version available, so i would like to report this 
problem to fujitsu-siemens if this really is a bios problem.

trying to suspend with acpi=off from commandline results in segmentation fault 
;)

how can i make sure that it is the bios problem (or find out what exactly causes 
the problem if it is not the bios) ?
Comment 43 Alan Stern 2005-11-12 08:35:52 UTC
Created attachment 6549
Diagnostic for ACPI/BIOS suspend routines

This patch is meant to go on top of the one in attachment #6487.  It will print
out the value of the interrupt-enable bit after ACPI is finished (just before
calling the BIOS to suspend) and before ACPI restarts (just after resuming from
the BIOS).  That should tell us whether it's getting set by ACPI or by the
BIOS.

You can test this the same as before -- and don't use "acpi=off"!  However, it
would be interesting to see the dmesg log showing what went wrong earlier when
you tried to suspend with "acpi=off".  You should attach that also.
Comment 44 richlv 2005-11-13 10:09:20 UTC
Created attachment 6559
trying to suspend with acpi=off

this is dmesg output with the same kernel and patchset as in comment #36,
booted with acpi=off and tried to suspend (echo "mem" > /sys/power/state)
Comment 45 richlv 2005-11-13 10:12:49 UTC
Created attachment 6560
suspend and resume with uhci&acpi debug patches

this is dmesg output with the same kernel and patchset as in comment #36, +
attachment #6549 (but without the one in comment #41, as i was not sure it
would not change the result)

usb mouse and blockdevice attached, bd mounted. suspend/resume cycle, dmesg
right after that.
Comment 46 richlv 2005-11-13 10:23:18 UTC
trying to find familiar letters in second output ;) i stopped at these lines :

ACPI before, config 0, 0x0000
Back to C!
ACPI after, config 0, 0x2000

i suppose this means that value has changed in between when only bios should be 
able to change anything.

if so, i would appreciate some short writeup that i could hand to fuj-siem :)
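
The comparison made by eye above can also be done mechanically. The sketch below pairs the "ACPI before" and "ACPI after" lines printed by the diagnostic patch and reports where bit 0x2000 differs. Treating the number after "config" as an index identifying the controller is an assumption, as is the exact line format, which is taken from the three lines quoted in this comment.

```python
import re

LINE = re.compile(r"ACPI (before|after), config (\d+), (0x[0-9a-fA-F]+)")


def changed_across_bios(log_text, bit=0x2000):
    """Return the config indexes whose `bit` differs between the
    'ACPI before' and 'ACPI after' diagnostic lines."""
    values = {}
    for when, index, value in LINE.findall(log_text):
        values[(int(index), when)] = int(value, 16)

    changed = []
    for (index, when), before in sorted(values.items()):
        if when != "before":
            continue
        after = values.get((index, "after"))
        if after is not None and (before ^ after) & bit:
            changed.append(index)
    return changed


log = """\
ACPI before, config 0, 0x0000
Back to C!
ACPI after, config 0, 0x2000
"""

print(changed_across_bios(log))
```

A non-empty result means the bit changed between the last point where ACPI code ran before sleeping and the first point where it ran after waking.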
Comment 47 Alan Stern 2005-11-14 07:58:35 UTC
I think the problem about suspending with "acpi=off" has been fixed in
2.6.15-rc1.  It's worth a try.

Tell the vendor that either their BIOS or else their ACPI tables cause the
USBPIRQDEN bit in the UHCI LEGSUP registers to be set after resuming from S3
sleep, even though it was not set when entering the sleep.
Comment 48 richlv 2005-11-15 00:48:50 UTC
2.6.15-rc1 with acpi=off doesn't do anything when trying to suspend. dmesg 
output is empty, too.

thanks for your help, i'll report this problem with usb to fujitsu siemens.
Comment 49 richlv 2005-11-15 23:26:50 UTC
i got this response from fujitsu-siemens (i must admit, i'm pleasantly surprised 
by their speed, though i have almost no technical knowledge to assess the answer ;) ) :

"The operating system saves and restores any registers that are not in the AUX 
power well. This is defined in the Enhanced Host Controller Interface (EHCI) 
specification, but is not defined in the UHCI or OHCI specifications. Because of 
this, the operating system does not save or restore any registers for UHCI or 
OHCI. The following rules apply:

Power management rules for USB host controllers transitioning from USB suspend 
to USB working. This is what is expected for: 
Sleep State S3
Sleep State S0
1. The controller must not reset the USB bus or cause a disconnect or power loss 
on any of the root USB ports. 
2. The system BIOS must not enable any type of legacy USB BIOS or otherwise 
enable the host controller to a run state. 
3. If a Peripheral Component Interconnect (PCI) reset occurs in addition to rule 
1, the BIOS must restore all host registers to their state prior to entering low 
power. Root ports should not indicate connect or enable status changes. 
4. The controller hardware must be in a functional state that is capable of 
driving resume and entering the run state without requiring a global hardware 
reset that otherwise would result in a USB bus reset driven on the root ports. "

additionally, in the answer was some finnish text (that supposedly refers to ms 
windows xp). as i have no knowledge of finnish i have no idea what it exactly 
says :) :
"Eli k
Comment 50 Alan Stern 2005-11-16 07:48:23 UTC
I don't understand Finnish either, although there are people on the Linux
mailing lists who do.

That reply from Fujitsu-Siemens wasn't very helpful, as you probably realized. 
They didn't answer the question at all.  Also, where they write:

"The operating system saves and restores any registers that are not in the AUX 
power well. This is defined in the Enhanced Host Controller Interface (EHCI) 
specification, but is not defined in the UHCI or OHCI specifications. Because of 
this, the operating system does not save or restore any registers for UHCI or 
OHCI."

The last sentence should instead say: "Because of this, the operating system
saves and restores all registers for UHCI and OHCI."  They got the logic exactly
backwards.

Do you have an email contact that I can use to communicate with them directly?
Comment 51 richlv 2005-11-16 08:06:47 UTC
as i was told later, finnish text was not significant from technical viewpoint.

i received this answer from Lehikoinen Pauli. i hope he doesn't mind too much 
being contacted by you ;)

Pauli.Lehikoinen at.well.he probably doesn't.like unsolicited.mail-cut here 
fujitsu-siemens.com
Comment 52 Alan Stern 2005-11-17 10:59:11 UTC
Created attachment 6606
Allow PIRQD to be on during resume

Well, that interchange with Fujitsu-Siemens wasn't much help.

Take out all those other patches I sent, keeping only the gregkh-all patch, and
then apply this one.  It tells the uhci-hcd driver that it's okay for PIRQD to
be on during a resume.  Perhaps it will solve your problem...  The only way to
know is to try it.
Comment 53 richlv 2005-11-21 01:22:21 UTC
some problems again...

took latest gregkh-all (gregkh-all-2.6.15-rc1-git6), successfully applied git6, 
gregkh-all and this latest pirqd patch. unfortunately, computer did not come 
back from suspend.

tried to remove some patches - turns out plain rc1-git6 resumes ok, so it was 
something in gregkh-all that causes a regression regarding the resume. 
unfortunately, logfiles seem to contain no relevant information.

should i file a separate bugreport regarding this problem ? if so, under which 
component (as i have no idea what causes it to freeze) ?


then i tried previously used gregkh-all, 2.6.14-git5. this time pirqd patch did 
not apply cleanly :

patching file drivers/usb/host/uhci-hcd.c
Hunk #1 succeeded at 720 (offset 7 lines).
Hunk #2 FAILED at 755.
1 out of 2 hunks FAILED -- saving rejects to file drivers/usb/host/uhci-hcd.c.rej
patching file drivers/usb/host/pci-quirks.c
Hunk #1 succeeded at 28 (offset 6 lines).


and what can we determine from fuj-siem conversation ? did the link to ms 
guidelines help in any way ? to me it seems that ms has some guidelines that 
fuj-siem has adhered to - but technically they are a pretty bad solution - is 
this so ?

in this case, what choices do we have if a suspend is requested with usb 
blockdevices mounted (or probably any usb device that requires some permanent 
connection) ? i understand that there is basically no way to maintain this 
connection if bios keeps resetting that value, right ?
Comment 54 Alan Stern 2005-11-21 09:53:49 UTC
Created attachment 6631 [details]
Allow PIRQD to be on during resume

Here's a version of the new patch which should apply okay to 2.6.15-rc1-git6
without gregkh-all.  (I actually adjusted it for 2.6.15-rc1-git4 but there
shouldn't be any problem with slightly later versions.)  Try using it; I think
the gregkh-all patch isn't needed any more for your testing.

The suspend/resume problem you saw may go away when 2.6.15-rc2 and its
associated gregkh-all patch come out.  Hold off for a while before reporting
it.

The conversation didn't tell us much except that Fujitsu-Siemens doesn't
understand the problem and isn't interested in fixing it.  The Microsoft
guidelines document was about startup and shutdown; it did not mention
suspend/resume.

If the only change the BIOS makes is to reset this one bit, then the new patch
should fix the problem.  But if the BIOS makes other changes as well, then
there may be no way to maintain the USB connection.
Comment 55 richlv 2005-11-21 23:13:17 UTC
applied the patch, but nothing changed in the behaviour. mounted usb blockdevice 
is still inaccessible after resuming (rejecting io to dead device, found as sdb, 
blahblah)

"The conversation didn't tell us much except that Fujitsu-Siemens doesn't
understand the problem and isn't interested in fixing it.  The Microsoft
guidelines document was about startup and shutdown; it did not mention
suspend/resume."

i'm thinking about a detailed situation description, including all previous 
references and obtained information, coupled with a kind request to forward the 
information to the responsible technical person. if this really is a clear 
problem with the bios, maybe such information can get them moving.

given that latest patch didn't help, maybe we can dump all related parameters 
before/after, see which ones are unnecessarily reset and inform them ?

now, this situation pretty much shows why closed source bios/firmware is a bad 
idea - a company will inevitably abandon older systems - and in this case bios 
probably was outsourced, so it might require quite an effort from fuj-siem, too.

but support for older systems is important for corporations, and this case might 
seriously impact our view on their ability to support the systems. i hope they 
will cooperate on this one :)
Comment 56 Alan Stern 2005-11-22 07:08:14 UTC
"given that latest patch didn't help, maybe we can dump all related parameters 
before/after, see which ones are unnecessarily reset and inform them ?"

Most of the important parameters are given in the dmesg log.  What does it say?
Comment 57 richlv 2005-11-23 02:40:47 UTC
Created attachment 6650 [details]
dmesg_output_with pirqd patch applied

ok, a couple of dmesg outputs for suspend-resume cycle with usb blockdev
mounted (no other patches (additional debug output or whatever) were applied in
this case) :

susp-res_wo_usb_debug.txt - 2.6.15-rc1-git6 with second pirqd patch applied, no
usb debugging;

susp-res_with_usb_debug.txt - 2.6.15-rc1-git6 with second pirqd patch applied,
usb debugging enabled;

rc2_gregkh-all_pirqd.txt - 2.6.15-rc2-git2 with associated gregkh-all and first
pirqd patch, usb debugging enabled.
Comment 58 Alan Stern 2005-11-23 07:44:16 UTC
The log shows that the problem still exists:

uhci_hcd 0000:00:11.2: uhci_resume
uhci_hcd 0000:00:11.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:11.2: Performing full reset

The second line indicates that the entire UHCI controller has been reset, not
just the LEGSUP register.  This could be because the BIOS reset it, or it could
be because the controller lost power during the suspend -- although it shouldn't
lose power during a suspend-to-RAM.  In either case, there's no way to maintain
the connection to a storage device.
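
For reference, a minimal Python sketch of how the "cmd = 0x0000" value in that
log can be read.  The bit positions are the USBCMD bits from the UHCI
specification.  The rule in controller_state_lost (a suspended controller should
be stopped, with EGSM and the Configure Flag still set) is an assumption about
what the driver expects, not a copy of the kernel code, and the function names
are made up for illustration.

```python
import re
from typing import List

# USBCMD bit positions as given in the UHCI specification.
USBCMD_RUN = 0x0001        # Run/Stop
USBCMD_EGSM = 0x0008       # Enter Global Suspend Mode
USBCMD_CONFIGURE = 0x0040  # Configure Flag

# Matches the debug line quoted above and captures the 16-bit register value.
CMD_LINE = re.compile(r"uhci_check_and_reset_hc: cmd = 0x([0-9a-fA-F]{4})")


def find_reported_cmd_values(dmesg_text: str) -> List[int]:
    """Collect every USBCMD value reported by uhci_check_and_reset_hc."""
    return [int(m.group(1), 16) for m in CMD_LINE.finditer(dmesg_text)]


def controller_state_lost(cmd: int) -> bool:
    """Return True if USBCMD no longer looks like a suspended controller.

    Assumption: after a suspend the controller should be stopped (RUN clear)
    with EGSM and CONFIGURE still set.  Anything else suggests that the
    register contents were wiped while the system was asleep.
    """
    running = bool(cmd & USBCMD_RUN)
    configured = bool(cmd & USBCMD_CONFIGURE)
    suspended = bool(cmd & USBCMD_EGSM)
    return running or not configured or not suspended
```

With the value from the log, controller_state_lost(0x0000) is true: neither EGSM
nor the Configure Flag survived, which is consistent with a reset of the whole
controller rather than of a single register.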

If you can find out from Fujitsu-Siemens who supplied their BIOS, maybe those
people would be willing to listen and fix it.

There's always an alternative, of course.  You could unmount the storage device
and rmmod usb-storage before doing the suspend, and then put things back after
the resume.  Short of a BIOS upgrade, however, I don't see any way to preserve
the mount across a suspend.
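
A rough Python sketch of that workaround, assuming the system suspends to RAM
by writing "mem" to /sys/power/state.  The mount point and device name are
placeholders, and the function names are made up for illustration.

```python
import subprocess


def suspend_plan(mount_point, device):
    """Return the commands to run before and after the suspend, in order."""
    before = [
        ["umount", mount_point],
        ["rmmod", "usb-storage"],
    ]
    after = [
        ["modprobe", "usb-storage"],
        ["mount", device, mount_point],
    ]
    return before, after


def suspend_to_ram(state_file="/sys/power/state"):
    """Ask the kernel for suspend-to-RAM (S3); returns after the resume."""
    with open(state_file, "w") as f:
        f.write("mem")


def suspend_with_remount(mount_point, device,
                         run=subprocess.check_call, suspend=suspend_to_ram):
    """Unmount and unload usb-storage, suspend, then put things back.

    `run` and `suspend` are parameters so the sequence can be tried out
    without touching the real system.
    """
    before, after = suspend_plan(mount_point, device)
    for cmd in before:
        run(cmd)      # check_call raises if umount or rmmod fails
    suspend()
    for cmd in after:
        run(cmd)
```

It would be called as root, for example
suspend_with_remount("/mnt/usb", "/dev/sda1").  After the resume the device may
need a moment to be detected again and may show up under a different name, so a
real script would probably wait for it and mount by label rather than by device
node.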
Comment 59 richlv 2005-11-27 09:34:11 UTC
bios is made by insyde software, but it says on their webpage : "For end user 
BIOS upgrades or other support, we recommend you contact your computer 
manufacturer."

i probably will try to compose tomorrow some sort of lengthy and explanatory 
mail to fujitsu-siemens in hope that they will at least forward that to somebody 
who would fully understand the problem.
Comment 60 Alan Stern 2005-11-27 12:48:58 UTC
It's possible that apart from the BIOS problems, the controller may actually
lose power during the suspend.  If that's so it doesn't matter what the BIOS
does, because the connection will be lost no matter what.

I can't think of any easy way for you to tell whether power is present.  Does
the computer wake up if you plug or unplug a USB device during the suspend?
Comment 61 richlv 2005-11-28 00:58:13 UTC
the computer doesn't wake up if i replug usb mouse when it is suspended.

if the mouse is plugged and the computer is suspended, the mouse still glows (it 
is an optical mouse) and laser responds to objects, so i guess it still receives 
power.
Comment 62 Alan Stern 2005-12-01 08:15:53 UTC
Created attachment 6737 [details]
Allow resume after controller reset

Okay, so the power was maintained.

This patch will allow the resume to proceed even when the controller has been
reset.  See what happens when you use it; depending on the kind of reset
performed by the BIOS your storage device may or may not be accessible.  And
remember to post the dmesg log.
Comment 63 richlv 2005-12-02 01:47:55 UTC
Created attachment 6747 [details]
dmesg output with 'allow resuma after reset'

nope, did not work after resuming. when accessing, got those "rejecting i/o to
dead device" messages again, device was found as sdb.
Comment 64 Alan Stern 2005-12-02 07:03:30 UTC
The log shows that the BIOS has reset the UHCI controller enough to destroy the
USB connections to the mouse and the storage device.  So it doesn't much matter
what the Linux driver does; there's no way it can keep the connections going.

If you can't get the BIOS fixed, the only thing to do is unmount the storage
device before suspending and remount it after resuming.

Since we've proved now that this isn't a kernel bug, I will change the
bug-report status accordingly.
Comment 65 David Brownell 2005-12-02 08:53:42 UTC
About comment #64 ... I don't know this specific hardware, but 
in general there's no requirement that USB controllers stay 
powered during S3.  I've certainly had power-efficient laptops 
that turn off OHCI controllers in S3; no reason UHCI shouldn't 
sometimes do the same thing.  That much, Linux should recover 
from -- disconnecting devices etc. 
 
The IRQ thing is another matter, and it does seem that something 
odd is going on there. 
Comment 66 richlv 2005-12-20 10:58:24 UTC
ok, i put this issue off somewhat, so i'll try to get myself together and 
pester fujitsu-siemens again tomorrow :)

regarding possibility of power shutdown during suspend - maybe this could be 
treated as something similar, maybe by blacklist ?
currently we are left with some inaccessible sda device, maybe it is possible to 
remove and re-detect (either by noticing the problem upon resume or by checking 
the blacklist) ?

and what about "the irq thing" :) ?
what exactly is it (is it the 'spurious irq' that comes up sometimes ?), is it 
something related to hardware/bios problems ? what could i debug to possibly 
report to fuj/siem again ?

additionally, i filed another, probably similar, problem with usb (bug 5765) - 
maybe one of you might spot the problem right away after digging through this 
issue :)
Comment 67 Alan Stern 2005-12-20 11:52:16 UTC
Your suspend problem can't be fixed from within the kernel, but you can fix it
in userspace by unmounting the filesystem before the suspend and remounting it
after the resume.  That's what I suggested earlier.

The IRQ thing David was talking about is the problem you first wrote about to
Fujitsu-Siemens.  Fixing it won't solve anything so long as the BIOS resets the
USB controller, however.
Comment 68 richlv 2005-12-20 14:46:54 UTC
...and it seems that usb implementation of this laptop is seriously messed up.
i used usb optical mouse to check powering of usb devices (not exactly the best 
way but i hope it will give at least some information).

if it is the only device, the light goes off during the suspend, then it turns 
on and stays on while suspended. during resume, light goes off, then turns on 
again.

if this is an indicator of power provided to usb devices, then it seems it is 
cut off both during suspend and resume (but kept on when suspended).

now, things get more interesting when i add another device. if i have a usb 
mouse and a blockdevice, mouse loses power twice, then works when resumed. but 
if i exchange them (this laptop has two usb ports), upon resuming power is 
resumed for a moment, then lost and the mouse does not work...

with 2.6.13.2 it _does_ work. most of the times. sometimes the computer does not 
resume correctly, it starts as if a cold boot was performed. sometimes upon 
resume it stalls in the console and screen starts to light up (unevenly, from 
the sides).
the funny thing, i am unable to reproduce these problems with 2.6.15-rc6...
tried several dozen times, none exposed these problems.

after testing these combinations, i think i will try to sum up some of the 
findings to fujitsu/siemens - and then, i guess, i'll have to leave it for a 
while as i am losing track of what and how many times i have tested.
