Bug 42730 - ACPI suspend to disk no longer functions when USB key is present - Lenovo ThinkPad T510
Summary: ACPI suspend to disk no longer functions when USB key is present - Lenovo Thi...
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: power-management_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-04 21:28 UTC by rds1944
Modified: 2012-03-12 22:30 UTC (History)
3 users (show)

See Also:
Kernel Version: 3.2.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
tail of syslog shows errors (7.93 KB, text/plain)
2012-02-04 21:30 UTC, rds1944
Details
3.2.5 config with debugfs enabled (52.56 KB, text/plain)
2012-02-07 20:35 UTC, rds1944
Details
Output from 0u after flash drive is inserted (117.27 KB, text/plain)
2012-02-08 20:58 UTC, rds1944
Details
Output from 0u when inserted flash drive is removed (119.88 KB, text/plain)
2012-02-08 20:59 UTC, rds1944
Details
/var/log/debug after boot with flash drive present before start (33.52 KB, text/plain)
2012-02-08 21:01 UTC, rds1944
Details
/var/log/dmesg after boot with drive present (43.41 KB, application/octet-stream)
2012-02-08 21:02 UTC, rds1944
Details
/var/log/dmesg after boot with FD (43.41 KB, text/plain)
2012-02-08 21:10 UTC, rds1944
Details
dmesg file with FRAME_POINTER set after failed s2disk (hibernation) (46.38 KB, text/plain)
2012-02-09 01:38 UTC, rds1944
Details
Patch to log SCSI scanning and INQUIRY commands (1.37 KB, patch)
2012-02-09 18:51 UTC, Alan Stern
Details | Diff
Check for freezing in SCSI scanning threads (1.36 KB, patch)
2012-02-10 22:10 UTC, Alan Stern
Details | Diff
dmesg after resume with your patch (51.23 KB, text/plain)
2012-02-11 00:36 UTC, rds1944
Details
Diagnostics for SCSI scanning and the freezer (1.49 KB, patch)
2012-02-13 22:19 UTC, Alan Stern
Details | Diff
dmesg + PRINTK_TIME + latest patch (59.67 KB, text/plain)
2012-02-14 02:00 UTC, rds1944
Details
Revised diagnostics for SCSI scanning and the freezer (1.21 KB, patch)
2012-02-14 15:36 UTC, Alan Stern
Details | Diff
dmesg - cold boot with patch 3 (60.59 KB, text/plain)
2012-02-14 16:23 UTC, rds1944
Details
dmesg - after resume from hibernation with patch 3 (60.57 KB, text/plain)
2012-02-14 16:24 UTC, rds1944
Details
Freezer test while scanning (475 bytes, patch)
2012-02-14 20:00 UTC, Alan Stern
Details | Diff
debug - patch 4 (40.34 KB, text/plain)
2012-02-14 21:01 UTC, rds1944
Details
dmesg - patch 4 (54.69 KB, text/plain)
2012-02-14 21:02 UTC, rds1944
Details
syslog - patch 4 (10.28 KB, text/plain)
2012-02-14 21:02 UTC, rds1944
Details
dmesg.txt using patch 4 (5.49 KB, text/plain)
2012-02-15 22:05 UTC, rds1944
Details
0u.txt from usbmon test inserting FD (118.97 KB, text/plain)
2012-02-16 16:32 UTC, rds1944
Details
dmesg.txt output (2.20 KB, text/plain)
2012-02-16 16:32 UTC, rds1944
Details
bash script to run this test (304 bytes, text/plain)
2012-02-16 16:33 UTC, rds1944
Details
0u.txt (119.07 KB, text/plain)
2012-02-16 18:10 UTC, rds1944
Details
dmesg2.txt (5.55 KB, text/plain)
2012-02-16 18:11 UTC, rds1944
Details
patch4test1 script (303 bytes, text/plain)
2012-02-16 18:11 UTC, rds1944
Details
patch4test2 script (336 bytes, text/plain)
2012-02-16 18:12 UTC, rds1944
Details
0u.txt (120.60 KB, text/plain)
2012-02-16 21:37 UTC, rds1944
Details
dmesg.txt (4.41 KB, text/plain)
2012-02-16 21:38 UTC, rds1944
Details
patch4test bash script (usbmon + 10s delay) (543 bytes, text/plain)
2012-02-16 21:44 UTC, rds1944
Details
Freeze the block-device polling workqueue (3.79 KB, patch)
2012-02-17 17:39 UTC, Alan Stern
Details | Diff
Implement the prepare stage for SCSI suspend (1.95 KB, patch)
2012-02-17 17:40 UTC, Alan Stern
Details | Diff
usb-storage scanning thread freezing fix and test (769 bytes, patch)
2012-02-17 17:41 UTC, Alan Stern
Details | Diff
0u.txt (118.97 KB, text/plain)
2012-02-17 18:20 UTC, rds1944
Details
dmesg.txt (2.89 KB, text/plain)
2012-02-17 18:20 UTC, rds1944
Details
Fix freezing of usb-storage scanning thread (700 bytes, patch)
2012-02-17 20:00 UTC, Alan Stern
Details | Diff
Convert the scanning routine from a kthread to a delayed-work (6.34 KB, patch)
2012-02-20 20:49 UTC, Alan Stern
Details | Diff
Convert the scanning routine from a kthread to a delayed-work, for 3.2 kernel (6.45 KB, patch)
2012-02-21 15:36 UTC, Alan Stern
Details | Diff

Description rds1944 2012-02-04 21:28:39 UTC
Am now using kernel 3.1 without problems.

echo disk > /sys/power/state

works fine.

It does NOT work in any of the 3.2.x series.

Attached is my syslog from the latest experiment with 3.2.4.

Using Slackware 13.37 on a Lenovo ThinkPad T510.
Comment 1 rds1944 2012-02-04 21:30:57 UTC
Created attachment 72283 [details]
tail of syslog shows errors
Comment 2 rds1944 2012-02-05 16:12:23 UTC
Forgot to include this.

The failure ONLY occurs when a USB 2.0 flash drive is in a socket. The flash drive filesystems are NOT mounted.

Without a flash drive, hibernation is ok.

But, once again, nothing like this happens in 3.1 kernel.
Comment 3 Alan Stern 2012-02-07 16:38:55 UTC
Follow the instructions in the kernel source file Documentation/usb/usbmon.txt to collect a usbmon trace showing what happens when the flash drive is first plugged in.  Attach the trace to this bug report; it should reveal the reason for the problem.
Comment 4 rds1944 2012-02-07 20:35:02 UTC
Created attachment 72313 [details]
3.2.5 config with debugfs enabled
Comment 5 rds1944 2012-02-07 20:38:01 UTC
debugfs fs NOT created. No dir /sys/kernel/debug. Cannot do the tests, of course. Please advise.
Comment 6 Alan Stern 2012-02-08 15:55:37 UTC
Your config file attached to comment #4 contains this line:

# CONFIG_DEBUG_FS is not set

Fix that and you should be all set.
Comment 7 rds1944 2012-02-08 20:58:30 UTC
Created attachment 72326 [details]
Output from 0u after flash drive is inserted
Comment 8 rds1944 2012-02-08 20:59:56 UTC
Created attachment 72327 [details]
Output from 0u when inserted flash drive is removed
Comment 9 rds1944 2012-02-08 21:01:06 UTC
Created attachment 72328 [details]
/var/log/debug after boot with flash drive present before start
Comment 10 rds1944 2012-02-08 21:02:06 UTC
Created attachment 72329 [details]
/var/log/dmesg after boot with drive present
Comment 11 rds1944 2012-02-08 21:08:21 UTC
Alan

Thanks for suggestion on config. It now shows debugging info. I collected 0u in 2 different scenarios, described above, & attached debug & dmesg logs just after boot into a console using a 3.2.5 kernel. I do not know how to read errors in ou files. Hopefully, you do.

As an aside, I tried Fedora 16 (has a 3.2.x kernel) last week. It, too, did not hibernate.

Richard

PS Are you the maintainer of the hibernate module?
Comment 12 rds1944 2012-02-08 21:10:03 UTC
Created attachment 72330 [details]
/var/log/dmesg after boot with FD
Comment 13 Alan Stern 2012-02-08 21:47:05 UTC
Oddly, the files all look quite normal.  There's no sign that any process is hung while trying to communicate with the flash drive and no errors in the usbmon traces.

A few seconds after you plug in the drive, does Alt-SysRq-W show anything interesting?  Note: You should type this in a VT console, not a graphical window.  Also, the output will be more meaningful if you enable CONFIG_FRAME_POINTER in your kernel.

(Instead of sending the contents of /var/log/whatever, in the future it will be better if you send the output from the "dmesg" command.  In fact, a dmesg dump following a hibernation failure with CONFIG_FRAME_POINTER enabled would be a good thing to see.)

No, I'm not the maintainer of the hibernation code (which is part of the Power Management subsystem).  But I am one of the USB maintainers.
Comment 14 rds1944 2012-02-09 01:38:54 UTC
Created attachment 72331 [details]
dmesg file with FRAME_POINTER set after failed s2disk (hibernation)

Alan

Can you decipher this traceback?

Richard
Comment 15 rds1944 2012-02-09 01:42:00 UTC
> does Alt-SysRq-W show anything interesting? 

Nope
Comment 16 Alan Stern 2012-02-09 18:51:38 UTC
Created attachment 72343 [details]
Patch to log SCSI scanning and INQUIRY commands

Yes, indeed I can decipher it.  It says that a process named usb-stor-scan was running and could not be frozen.  That process should show up when you run "ps -Awf" or something similar, after plugging in the drive -- even if you don't try to hibernate.  It also should show up in the Alt-SysRq-W output.

The process couldn't be frozen because it was waiting for a SCSI command to be executed.  Although it doesn't say explicitly, it sure looks like the command in question is an INQUIRY.  But the usbmon trace showed the INQUIRY command was executed normally!

I don't know what's going wrong.  Maybe this patch will help.  Let's see what it adds to the dmesg output when you plug in the flash drive.
Comment 17 Alan Stern 2012-02-09 19:10:19 UTC
Hold on a minute!  I just noticed that the problem occurred when you were trying to resume from hibernation, not when you were trying to hibernate!  That makes a big difference.

This means that the usb-stor-scan thread couldn't be frozen because it was waiting for some other thread to carry out the INQUIRY command, and that other thread had already been frozen.  This explains everything -- except that I don't know what the other thread is.

Let me look into this further.
Comment 18 Alan Stern 2012-02-10 21:58:26 UTC
Question: On your system, what shows up in /sys/module/scsi_mod/parameters/scan ?
Comment 19 Alan Stern 2012-02-10 22:10:57 UTC
Created attachment 72354 [details]
Check for freezing in SCSI scanning threads

This isn't exactly the correct solution, but it's worth trying to see if it helps at all.
Comment 20 rds1944 2012-02-11 00:36:45 UTC
Created attachment 72355 [details]
dmesg after resume with your patch
Comment 21 rds1944 2012-02-11 00:46:56 UTC
cat /sys/module/scsi_mod/parameters/scan produces "sync" on both 3.1 & 3.2.5

Re: Your patch on scsi_scan.c

Good news: It worked! (in a console)

Not so good news: screen flashed "garbage" as it hibernated. v 3.1 does not.

Now why is this suddenly a problem when it all functioned correctly in 3.1 & the preceeding 2.6.x series?

Have you been able to replicate this on any of your hardware?

Why am I the only Linux user to date that observes this? Usually IBM/Lenovo ThinkPad machines are among the most Linux friendly.

Attached is the dmesg output after resume; looks pretty good to me.

What next?
Comment 22 Alan Stern 2012-02-13 22:19:21 UTC
Created attachment 72368 [details]
Diagnostics for SCSI scanning and the freezer

Okay, so we're on the right track.  I don't know about garbage flashing on the screen; that may be totally unrelated.

I can't answer why the problem appeared in 3.2 until I know exactly what the problem is.  That's what the attached patch is for.  You should remove the previous patch, so that the error occurs again, and apply this one so that we can get more information about what's happening.  And while you're at it, please enable CONFIG_PRINTK_TIME so that the dmesg log contains timestamps.

I have not been able to reproduce your problem on my system (although I did uncover a couple of other problems with hibernation while trying).  And then today an update to systemd prevented my test system from booting!  So I have to solve that before I can run more tests over here.  Still, the output you get from this diagnostic patch will help.
Comment 23 rds1944 2012-02-14 02:00:14 UTC
Created attachment 72369 [details]
dmesg + PRINTK_TIME + latest patch
Comment 24 rds1944 2012-02-14 02:01:13 UTC
Attached above is dmesg using your second patch only
after resume from hibernation. Of course, it failed.
Hopefully, you'll garner enough info for the next
phase. I did set CONFIG_PRINTK_TIME as is obvious.

In my experience, flashes of garbage in suspend
indicate a flurry of error messages to stdout.

As an aside, I've looked at & played with systemd
(via Fedora 16). I can not see the point. An awful
lot of non-intuitive complexity for little benefit,
at least on my machine. I still use the old BSD/
SysVinit scheme (rc.S, rc.M, ...). The only time
'hog' is dhcpcd, so I just launch it early in the
background (... &). I'm up & running in a few seconds.
Good enough for me. And really simple to maintain.

What is your principal hardware & Linux distro?
Comment 25 Alan Stern 2012-02-14 15:36:14 UTC
Created attachment 72377 [details]
Revised diagnostics for SCSI scanning and the freezer

Unfortunately that log didn't contain the information I was looking for, i.e., the errors during resume from hibernation.  It was overwhelmed by the normal activity of your primary disk drive.  Here's a revised version of the patch that attempts to print out only the information of interest.

I regularly use a couple of different systems, an HP desktop and an ASUS laptop.  They're both running Fedora 16 32-bit.
Comment 26 rds1944 2012-02-14 16:23:54 UTC
Created attachment 72378 [details]
dmesg - cold boot with patch 3
Comment 27 rds1944 2012-02-14 16:24:59 UTC
Created attachment 72379 [details]
dmesg - after resume from hibernation with patch 3
Comment 28 rds1944 2012-02-14 16:30:29 UTC
Applied patch 3 (no patch 1 or 2). dmesg output from before & after hibernation are above. Any clues there?

Is it in any way useful to run diff on the relevant files in 3.1 & 3.2.x?

(When you installed Fedora 16, did you use GUI installer interface, or Anaconda kickstart cfg file, or?)
Comment 29 Alan Stern 2012-02-14 20:00:03 UTC
Created attachment 72381 [details]
Freezer test while scanning

This helped a little.  Apparently the initial INQUIRY command doesn't complete properly.  The freezer's timeout is 20 seconds and the command's timeout is a little more, so the freezer gives up first and aborts the hibernation attempt.
  
The question is why does the INQUIRY fail in this way.  I have no idea -- it worked perfectly during the cold boot.  The relevant parts of the two logs are here:

Cold boot:
[    3.912853] scsi 6:0:0:0: sdev req 12
[    3.914546] scsi 6:0:0:0: sdev cmd 12
[    3.916550] usb-storage 2-1.2:1.0: usb cmd 12
[    3.918240] scsi 6:0:0:0: Direct-Access              Patriot Memory   PMAP PQ: 0 ANSI: 0 CCS

Here the INQUIRY succeeded in less than 2 milliseconds.

Hibernation resume:
[    3.541509] scsi 6:0:0:0: sdev req 12
[    3.541571] scsi 6:0:0:0: sdev cmd 12
[    3.541621] usb-storage 2-1.2:1.0: usb cmd 12
[   23.498301] 
[   23.498326] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
[   23.498408] usb-stor-scan   D f461dcb0     0    51      2 0x00800004

Here the INQUIRY didn't succeed at all.  But then later on, after a reset, it did work normally.

Here's another test patch for you to try.  This is a little different from the others.  Boot with the flash drive unplugged, and start by doing "echo freezer >/sys/power/pm_test".  Then type "echo disk >/sys/power/state" but don't press the Return key; just leave the cursor sitting at the end of the line.

Then plug in the flash drive.  The new patch will cause a 10-second delay before scanning begins, so after plugging it in, wait about five seconds and then hit the Return key.  The question is whether the same sort of error will occur.  (And by the way, for testing it's a good idea always to leave both CONFIG_PRINTK_TIME and CONFIG_FRAME_POINTER enabled.)

(My Fedora 16 is an upgrade from Fedora 14, which in turn was an upgrade from Fedora 12, which in turn ...)
Comment 30 rds1944 2012-02-14 21:01:21 UTC
Created attachment 72383 [details]
debug - patch 4
Comment 31 rds1944 2012-02-14 21:02:01 UTC
Created attachment 72384 [details]
dmesg - patch 4
Comment 32 rds1944 2012-02-14 21:02:37 UTC
Created attachment 72385 [details]
syslog - patch 4
Comment 33 rds1944 2012-02-14 21:10:56 UTC
Followed your instructions for patch 4. Waited 10 sec after FD insert then hit CR on "echo disk > /sys/power state". After many sec, got "write error: device or resource busy". I've attached above the subsequent logs (debug, dmesg, & syslog). How can device (sdb?) be busy? It's not even mounted.

FRAME_POINTER is enabled.
Comment 34 Alan Stern 2012-02-15 16:02:25 UTC
You're supposed to hit Return after about 5 seconds, not 10 seconds.  If you run the test from a VT console, you should see the "10-second test delay..." message appear when you plug in the flash drive; hit Return less than 10 seconds after it shows up.

I can't tell what these attachments are.  Some of them look like files from /var/log.  Forget about those files, they aren't very useful for kernel debugging.  All I care about is the output from the dmesg command.

The file attached to comment #31 _is_ dmesg output, but it looks like the output from before you ran the test rather than after.  It doesn't contain any important information.

One thing you can do to help shrink the dmesg output is, before you start a test, to run "dmesg -C".  That will clear all the old entries out of the kernel's log buffer so that afterwards you'll get only the messages that were generated during the test, as opposed to everything that has happened since boot-up.
Comment 35 rds1944 2012-02-15 22:05:41 UTC
Created attachment 72391 [details]
dmesg.txt using patch 4

Repeated the test. dmesg.txt shows output. dmesg -c used to clear kernel log before the test. "10 second test delay..." was NOT echoed to screen after inserting FD. Waited 5 sec or so before doing the s2disk command. System hangs for about 20 sec, then I get "write error: device or resource busy" onthe VT console. Next I grabbed the dmesg stuff. (Nice small file now.) I repeated the test several times & always saw the same result.
Comment 36 Alan Stern 2012-02-16 15:22:46 UTC
Okay, that's great!  It did just what I hoped -- the bug showed up, and you didn't have to go through a full hibernation cycle to do it.  This will make further testing a lot easier.

The reason for the message not being echoed to the screen is probably just that the console's log level was set too low for it to show up.  You can increase the log level by typing Alt-SysRq-7 or doing "echo 7 >/proc/sys/kernel/printk".

For the next test, I'd like you to do the same thing as here, but this time collect a usbmon trace while the test is running.  Instructions for usbmon are in Documentation/usb/usbmon.txt.  The 0u file will be fine; start the "cat" process before plugging in the flash drive.
Comment 37 rds1944 2012-02-16 16:32:22 UTC
Created attachment 72403 [details]
0u.txt from usbmon test inserting FD

Here is 0u.txt while inserting FD. NB: This kernel still has the 10 sec delay patch in it. Also attached is dmesg.txt & script I use to run the test.
Comment 38 rds1944 2012-02-16 16:32:59 UTC
Created attachment 72404 [details]
dmesg.txt output
Comment 39 rds1944 2012-02-16 16:33:43 UTC
Created attachment 72405 [details]
bash script to run this test
Comment 40 Alan Stern 2012-02-16 16:45:14 UTC
You didn't do the "echo freezer >/sys/power/pm_test" and "echo disk >/sys/power/state" steps.  This should be exactly the same as the test in comment #35 (including the 10-second delay) except for the addition of the usbmon tracing.
Comment 41 rds1944 2012-02-16 18:09:36 UTC
Command sequence

Cold boot, no FD
In VT1, start usbmon script (patch4test2)
In VT2, run 10s delay script (patch4test1)
Insert FD, wait 5s, CR
Wait ...
Kill usbmon in VT1
Get output (0u.txt dmesg.txt), attached.
Comment 42 rds1944 2012-02-16 18:10:15 UTC
Created attachment 72406 [details]
0u.txt
Comment 43 rds1944 2012-02-16 18:11:10 UTC
Created attachment 72407 [details]
dmesg2.txt
Comment 44 rds1944 2012-02-16 18:11:56 UTC
Created attachment 72408 [details]
patch4test1 script
Comment 45 rds1944 2012-02-16 18:12:28 UTC
Created attachment 72409 [details]
patch4test2 script
Comment 46 rds1944 2012-02-16 21:37:01 UTC
Reran your tests using only one bash script, patch4test. Attached is 0u.txt & dmesg.txt. Looks pretty much as before.
Comment 47 rds1944 2012-02-16 21:37:39 UTC
Created attachment 72411 [details]
0u.txt
Comment 48 rds1944 2012-02-16 21:38:10 UTC
Created attachment 72412 [details]
dmesg.txt
Comment 49 rds1944 2012-02-16 21:44:44 UTC
Created attachment 72413 [details]
patch4test bash script (usbmon + 10s delay)
Comment 50 Alan Stern 2012-02-16 22:20:24 UTC
The usbmon trace in comment #42 holds the key.  These two lines:

f45ce240 539815062 S Ci:1:004:0 s a1 fe 0000 0000 0001 1 <
f45ce240 539815267 C Ci:1:004:0 -2 0

That -2 status code was unexpected.  Now I think I know the cause of the problem.  Let me do some testing over here; I'll get back to you later on.
Comment 51 Alan Stern 2012-02-17 17:38:09 UTC
Okay, I think it's all set.  While working on this, I found two other suspend-related bugs, so I'm going to attach three patches.  The first two are fixes for those other bugs, and the third will contain the usb-storage fix (along with the 10-second delay so you can test it easily).

These patches are for the 3.2.x kernel version.  The usb-storage fix, in particular, will have to be changed for 3.3.  But that's okay; I can take care of it now that I know what the problem is.

For this, just do the same test as before.  Attach the dmesg and usbmon output so I can make sure they are okay.  If everything works right, I'll take the 10-second delay out of the third patch so you can try it with real hibernation.
Comment 52 Alan Stern 2012-02-17 17:39:09 UTC
Created attachment 72425 [details]
Freeze the block-device polling workqueue
Comment 53 Alan Stern 2012-02-17 17:40:06 UTC
Created attachment 72426 [details]
Implement the prepare stage for SCSI suspend
Comment 54 Alan Stern 2012-02-17 17:41:12 UTC
Created attachment 72427 [details]
usb-storage scanning thread freezing fix and test

Note: The previous usb-storage patch should be removed before applying this one.
Comment 55 rds1944 2012-02-17 18:20:00 UTC
Created attachment 72428 [details]
0u.txt
Comment 56 rds1944 2012-02-17 18:20:33 UTC
Created attachment 72429 [details]
dmesg.txt
Comment 57 rds1944 2012-02-17 18:28:01 UTC
I did a fresh rebuild from scratch using ONLY the last 3 patches you posted. Output from the test is above. It looks good! No obvious errors (no -2 in 0u.txt & dmesg says all was frozen). So what's your assessment? Ready for a full hibernation test? In next round, please be specific on which patches to apply. I'm accumulating a large collection.

Could you look & see what was damaged in the evolution from 3.1.x -> 3.2.x? Did someone accidentally delete something? Was an algorithm altered? Or..?

Thanks.
Comment 58 Alan Stern 2012-02-17 20:00:01 UTC
Created attachment 72430 [details]
Fix freezing of usb-storage scanning thread

Here you go.  This patch replaces the previous one, so it's meant to go along with the second- and third-to-last patches from before (i.e., the block-device polling and SCSI prepare-stage patches).  Go ahead and try it with a full hibernation.

What changed between 3.1 and 3.2 is the way the usb-storage scanning thread got frozen.  You can see the actual change here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=f02fe890ece7d695a5744b20

I okay'ed that change, not realizing that enabling signals to be delivered to the scanning thread would mess up its I/O delays.  In addition, your flash drive has a bug of its own -- it refuses to carry out a SCSI command while waiting for a control transfer to complete.  Normally this wouldn't matter, but since the signal delivery caused the scanning thread to cancel its initial control transfer rather than completing it (as shown by the -2 in the usbmon trace), the flash drive just hung.  That in turn hung up the scanning thread long enough to abort the freezer and ruin the resume from hibernation.

Anyway, my patch now does things the right way.  It does not allow signals to be delivered, and it makes the scanning thread freezable at the place where it matters, that is, just before the I/O begins instead of at the start of a potentially long delay.
Comment 59 rds1944 2012-02-17 20:34:20 UTC
It worked in a console (some flashing text not readable). But, it completely failed in X. (Hibernated & powered down. Crashed during resume while extracting the swap image, I think.) Just rebooted the machine. I did this with the config for debugging mode.
Comment 60 Alan Stern 2012-02-17 20:59:10 UTC
That's bad.  On the other hand, if it got as far as trying to read in the memory image from swap then it's doing better than before -- it means the original bug has been fixed.  The new problem is likely to be something completely different.

Is there any way you can collect any information about the crash?  For example, can you boot with a network or serial console?  Even if you can't, it might help to boot with "no_console_suspend" and "loglevel=8" on the grub command line.

Also, can you verify that hibernation is reliable when you don't start it from within X?  For example, if X is running but you switch back to a VT, does it work?  And even if X isn't running, does hibernation occasionally fail?
Comment 61 rds1944 2012-02-17 22:42:08 UTC
So weird. Now it works. Running mc in an xterm & FireFox, I did s2disk & it resumed properly a few times. It also functioned when I switched from a running X to a VT & hibernated.

I do not have access to network boot nor a serial console.

I use lilo, not grub(2). So I added

    append="loglevel=8 no_consle_suspend"

to the per image section & reran lilo.

I think we need more testing of these mods.

I'll try building 3.2.6 with your 3 patches & NO debugging set. (As I normally build & use a kernel.)

Stay tuned.
Comment 62 rds1944 2012-02-17 23:43:15 UTC
Except for extraneous flashes on the screen (broken diagonal lines) during hibernation AND resume, 3.2.6 with your 3 new patches functions, sans debugging in config, as it should. s2ram is ok, too.

Others should check this out & verify reliability.

When might the patches be committed to a stable release? 3.3.0?

Thanks for your dedicated effort in resolving this issue.
Comment 63 Alan Stern 2012-02-20 20:49:59 UTC
Created attachment 72453 [details]
Convert the scanning routine from a kthread to a delayed-work

It turns out that last patch still has potential problems (it could hang if you unplug the flash drive at the wrong moment just when starting to hibernate).  Here's a different solution which should always work.  Please try it out.

The second patch has already been accepted and should appear in a 3.2 stable release soon.  The first should be accepted before long.  I'll wait to submit this third patch until you have tested it.
Comment 64 rds1944 2012-02-21 00:28:02 UTC
latest patch revision has errors. please advise. below is output from patch.

patching file block/genhd.c
patching file include/linux/workqueue.h
patching file kernel/workqueue.c
1
patching file drivers/scsi/scsi_pm.c
patching file drivers/scsi/scsi_priv.h
2
patching file drivers/usb/storage/usb.h
patching file drivers/usb/storage/usb.c
Hunk #2 FAILED at 827.
Hunk #3 succeeded at 919 (offset -1 lines).
Hunk #4 succeeded at 950 (offset -1 lines).
Hunk #5 succeeded at 990 (offset -1 lines).
1 out of 5 hunks FAILED -- saving rejects to file drivers/usb/storage/usb.c.rej
3
Comment 65 rds1944 2012-02-21 03:52:35 UTC
I'm lost! Last patch has an error (? It's serious, no kernel is compiled.). In any case, can you possibly post ONE single patch file that contains all 3 patches in correct sequence, after having fixed latest snafu? Thanks.
Comment 66 Alan Stern 2012-02-21 15:36:05 UTC
Created attachment 72457 [details]
Convert the scanning routine from a kthread to a delayed-work, for 3.2 kernel

My fault, sorry.  I forgot that you are using a 3.2 kernel, whereas my development and testing uses 3.3.  The newly attached patch is the same as before but back-ported to 3.2, so it should apply with no problem.

I don't like to distribute combined patches.  It makes changes more difficult; if something is wrong with only one patch, tracking it down is that much harder.

However, if you want, you can easily combine the three patches into a single file yourself simply by concatenating them.
Comment 67 rds1944 2012-02-21 17:46:19 UTC
It finally works, 3.2.5. Hibernates & resumes as expected.

Will 3.3.0 stable release have all these patches incorporated?
Comment 68 Alan Stern 2012-02-21 18:01:33 UTC
Probably it will.  Ask again a few weeks from now.
Comment 69 Len Brown 2012-03-06 02:52:58 UTC
closing, as this is fixed upstream.
Comment 70 Florian Mickler 2012-03-12 22:30:05 UTC
A patch referencing this bug report has been merged in Linux v3.3-rc5:

commit bb94a406682770a35305daaa241ccdb7cab399de
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Tue Feb 21 13:16:32 2012 -0500

    usb-storage: fix freezing of the scanning thread

Note You need to log in before you can comment on or make changes to this bug.