Bug 14782
Summary: | Suspend hangs with SD card inserted | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Gary Trakhman (gary.trakhman) |
Component: | Block Layer | Assignee: | Jens Axboe (axboe) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | akpm, florian, lenb, rjw, xelfium, yakui.zhao |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://bugs.launchpad.net/ubuntu/+source/linux/+bug/492684 | ||
Kernel Version: | 2.6.32 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 14230, 56331 | ||
Attachments: | lspci; dmesg; some errors; acpidump; acpidump-gary; try the debug patch; lspci -v output for HP 530 laptop |
Description
Gary Trakhman
2009-12-10 18:53:45 UTC
Odd thing I noticed: it works fine at the GDM prompt, but once I log in, I get the strange behaviour.

I tentatively reassigned this to acpi.

Created attachment 24171
lspci
Lenovo SL500, same problem. The first one or two suspends work fine. After the next one I get a black screen, but I can still switch to the console.

Created attachment 24172
dmesg
dmesg: one ACPI warning
ACPI Warning: _BQC returned an invalid level (20090903/video-631)

Created attachment 24173
some errors
some errors:
Dec 13 22:56:01 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 22:56:30 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 22:57:06 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 23:00:35 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 23:01:02 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
from dmesg:
input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input1
ACPI: Lid Switch [LID]
This problem occurs when I close the lid of my notebook. When I suspend with the power button, i.e. via the handler line
PWRF) echo -n mem >/sys/power/state ;;
resume works with no problems.

Please attach the acpidump output.

Created attachment 24175
acpidump
It breaks for me whether I close the lid, run pm-suspend from the command line, or use the keyboard hotkey.

Created attachment 24183
acpidump-gary
For me, the keyboard hotkey and pm-suspend (latest version; I use 64-bit Arch Linux) both work fine. Only the lid does not work.

Alois, are you sure you're having the same problem as me? My system just hangs. I can't switch to the console or anything.

Gary, I think so, yes. I have to shut down the laptop at least 2x. Can you tell me your versions of the X server and pm-utils, and whether you have any errors in /var/log/errors.log?

Created attachment 24195
try the debug patch
Does anyone have an opportunity to try the patch on the latest kernel and see whether the following message is still printed?
> PM: Device PNP0C0D:00 failed to resume: error 1
Thanks.
Hi, Alois
It seems that the issue on your box is different from the one on Gary's. The screen is blank when the LID is used to suspend/resume, right? If so, will you please try the following patch on the latest kernel and see whether it works for you?
>http://lists.freedesktop.org/archives/intel-gfx/2009-December/005143.html
Thanks.

Hi, Gary
Will you please boot the system with the boot option "drm.debug=0x06 printk.time=1" and test whether the system can be suspended/resumed several times in console mode? Please do the following test:
1. kill the process that uses /proc/acpi/event (use the command "lsof /proc/acpi/event" to find the process)
2. dmesg >dmesg_before; echo mem > /sys/power/state; dmesg >dmesg_after;
3. after the system enters the suspended state, please press the power button and see whether the system can be resumed.
4. if the system can be resumed, please repeat steps 2 and 3 several times.
When it fails to resume, please reboot the system and attach the output of dmesg_before/dmesg_after.
Thanks.
Yakui.

I will test it soon, but it seems you're expecting different symptoms. My system never fails to resume. It hangs during suspend, before it has actually suspended. Should I still follow the same process? Also, it's fine in console-only mode; the problem occurs only once I've logged in through gdm.

(In reply to comment #16)
> Created an attachment (id=24195)
> try the debug patch
>
> Do someone have an opportunity to try the patch on the latest kernel and see
> whether the following message is complained?
> > PM: Device PNP0C0D:00 failed to resume: error 1
>
> Thanks.
I applied this patch to kernel 2.6.32.1 and I see no error in /var/errors.log. I think this patch works.

(In reply to comment #17)
> Hi, Alois
> It seems that your issue on your box is different with that on Gary's
> one.
I think so too now. Very sorry for bringing a report of a different bug into this one.
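The test steps requested above (snapshot dmesg, suspend, snapshot dmesg again, repeat) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names, the numbered snapshot file names, and the `STATE_FILE` and `DRY_RUN` variables are hypothetical additions for illustration and are not part of the original instructions. It needs root to write to /sys/power/state.

```shell
#!/bin/sh
# Sketch of the repeated suspend test. STATE_FILE and DRY_RUN are
# hypothetical knobs added for illustration only.
STATE_FILE="${STATE_FILE:-/sys/power/state}"

# Step 1 helper: print the PIDs of processes holding /proc/acpi/event open.
acpi_event_holders() {
    lsof -t /proc/acpi/event 2>/dev/null
}

# Step 2: one suspend attempt with dmesg captured before and after.
# $1 = cycle number, used to name the snapshot files.
suspend_once() {
    n="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run; do not touch the machine.
        echo "dmesg > dmesg_before_$n; echo mem > $STATE_FILE; dmesg > dmesg_after_$n"
        return 0
    fi
    dmesg > "dmesg_before_$n"
    echo mem > "$STATE_FILE"   # blocks until the machine has resumed
    dmesg > "dmesg_after_$n"
}

# Step 4: repeat the attempt. $1 = number of cycles.
suspend_cycles() {
    i=1
    while [ "$i" -le "$1" ]; do
        suspend_once "$i" || return 1
        i=$((i + 1))
    done
}
```

Running `DRY_RUN=1; suspend_cycles 3` prints the three command lines without suspending. In a real run, a missing dmesg_after_N file after a forced reboot indicates that cycle N never returned from suspend.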
(In reply to comment #17)
> Hi, Alois
> It seems that your issue on your box is different with that on Gary's
> one.
> The screen is blank when the LID is used to suspend/resume, right?
> If so, will you please try the following patch on the latest kernel and
> see whether it works for you?
>
> >http://lists.freedesktop.org/archives/intel-gfx/2009-December/005143.html
>
> Thanks.
I can't apply the patch from http://lists.freedesktop.org/archives/intel-gfx/2009-December/005143.html to kernel 2.6.32.1:
patching file drivers/acpi/button.c
patching file drivers/gpu/drm/i915/i915_drv.h
Hunk #1 FAILED at 549.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_drv.h.rej
patching file drivers/gpu/drm/i915/intel_lvds.c
Hunk #1 succeeded at 679 (offset -7 lines).
Hunk #2 succeeded at 1092 with fuzz 1 (offset -102 lines).

commit 13c199c0d0cf78b27592991129fb8cbcfc5164de
Author: Zhao Yakui <yakui.zhao@intel.com>
Date: Tue Dec 15 22:01:57 2009 +0800

ACPI: Use the return result of ACPI lid notifier chain correctly

shipped in linux-2.6.33 before rc1
sent to stable@kernel.org for 2.6.32.y

closed

That's Alois's issue that got solved, not mine. I still can't suspend in 2.6.33-rc1.

To reproduce the behavior, I have to be logged in to my X desktop, but then I can switch over to a virtual terminal. Is this ok? I will perform the test soon.

(In reply to comment #18)
> Hi, Gary
> Will you please boot the system with the boot option of "drm.debug=0x06
> printk.time=1" and test whether the system can be suspended/resumed several
> times under the console mode? Please do the following test:
> 1. kill the process who uses the /proc/acpi/event(use the command of
> "lsof /proc/acpi/event" to get the process)
> 2. dmesg >dmesg_before; echo mem > /sys/power/state; dmesg >dmesg_after;
> 3. after the system enters the system, please press the power button and
> see whether the system can be resumed.
> 4. if the system can be resumed, please repeat the step 2/3 several
> times.
>
> When it fails to be resumed, please reboot the system and attach the output
> of dmesg_before/dmesg_after?
>
> Thanks.
> Yakui.

Gary,
Please try with and without CONFIG_DRM_I915_KMS. This looks like a graphics problem to me.
Re-assign to Yakui.

Hi, Gary
Will you please boot the system into console mode with KMS enabled and do the following test?
1. kill the process using /proc/acpi/event (use the command "lsof /proc/acpi/event" to get the process id)
2. echo mem > /sys/power/state; dmesg >dmesg_after1
3. wait for one minute and then press the power button to see whether the box can be resumed.
4. If it can be resumed, please "echo mem > /sys/power/state; dmesg >dmesg_after2"
5. wait for one minute and then press the power button to see whether the box can be resumed.
If it can't be resumed, please reboot the box and see whether the file dmesg_after2 was created.
It would be great if you could boot the system into console mode with KMS disabled and do the above test again. (You can add the boot option "nomodeset".)
Thanks.

Using KMS in console only mode:
Rebooted into single user mode with 2.6.33-rc1 from mainline ppa.
1. No processes using /proc/acpi/event
2. Works fine.

No KMS in console only mode:
Rebooted into single user mode.
1. No processes using /proc/acpi/event
2. Will not resume from suspend, no dmesg files obviously.

Using KMS when logged in with X:
1. Switched into VT for running the commands.
2. acpid kept creating new processes using /proc/acpi/event, did a 'service acpid stop' and that fixed it.
3. Works fine.

This confuses me. Seems intermittent now when it wasn't before.

After multiple reboots I was able to get it to fail consistently again.
Using KMS when logged in with X, take 2:
1. Killed acpid.
2. Suspended once ok, have dmesg from this.
3. Suspended second time, fails.

Happens in 2.6.33-rc2 as well.

Happens with the Ubuntu lucid alpha1 livecd running 2.6.32 as well, so it's probably not a configuration issue.

Working on a git-bisect for this issue.
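The KMS and no-KMS runs above are only comparable if each boot really used the intended mode. As a small sketch, the check below looks for the "nomodeset" boot option mentioned in the thread. The function name `kms_disabled` is hypothetical, and the check covers only the boot option, not whether CONFIG_DRM_I915_KMS was set at build time.

```shell
#!/bin/sh
# kms_disabled [CMDLINE]
# Succeeds if the given kernel command line contains "nomodeset".
# With no argument it inspects the running kernel via /proc/cmdline.
# The function name is a hypothetical helper for this illustration.
kms_disabled() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Pad with spaces so the option only matches as a whole word.
    case " $cmdline " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `kms_disabled && echo "KMS off for this boot"` could be recorded next to each dmesg snapshot so the results are not mixed up later.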
03ba3782e8dcc5b0e1efe440d33084f066e38cae is the first bad commit
commit 03ba3782e8dcc5b0e1efe440d33084f066e38cae
Author: Jens Axboe <jens.axboe@oracle.com>
Date: Wed Sep 9 09:08:54 2009 +0200

writeback: switch to per-bdi threads for flushing data

This gets rid of pdflush for bdi writeout and kupdated style cleaning. pdflush writeout suffers from lack of locality and also requires more threads to handle the same workload, since it has to work in a non-blocking fashion against each queue. This also introduces lumpy behaviour and potential request starvation, since pdflush can be starved for queue access if others are accessing it.

A sample ffsb workload that does random writes to files is about 8% faster here on a simple SATA drive during the benchmark phase. File layout also seems a LOT more smooth in vmstat:
...

Git Bisect log:
git bisect start
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git bisect bad 22763c5cf3690a681551162c15d34d935308c8d7
# bad: [73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df] Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
git bisect bad 73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df
# bad: [73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df] Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
git bisect bad 73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df
# bad: [d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect bad d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b
# good: [32e6a0c82e7a7991a02414d830f262e1f4db73e6] WAN: remove deprecated PCI_DEVICE_ID from PCI200SYN driver.
git bisect good 32e6a0c82e7a7991a02414d830f262e1f4db73e6
# bad: [7193bea53f9d9730bbc859777c2f86c76349914d] Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad 7193bea53f9d9730bbc859777c2f86c76349914d
# good: [2d4ff66ad7b8811d0c75ccccad346496f67cb43a] Merge branch 'topic/hda' into for-linus
git bisect good 2d4ff66ad7b8811d0c75ccccad346496f67cb43a
# good: [89af571ca633ada14d17746519a179553a732d31] Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
git bisect good 89af571ca633ada14d17746519a179553a732d31
# good: [0b767b4df360bd442434d9d40b8a495e64202254] crypto: hmac - Prehash ipad/opad
git bisect good 0b767b4df360bd442434d9d40b8a495e64202254
# good: [fd30afa454282bbe1b36d5d77bd72c0ea5b3f97c] Merge branch 'topic/usb-audio' into for-linus
git bisect good fd30afa454282bbe1b36d5d77bd72c0ea5b3f97c
# good: [81bd5f6c966cf2f137c2759dfc78abdffcff055e] crypto: sha-s390 - Fix warnings in import function
git bisect good 81bd5f6c966cf2f137c2759dfc78abdffcff055e
# bad: [a9c86d42599519f3d83b5f46bdab25046fe47b84] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect bad a9c86d42599519f3d83b5f46bdab25046fe47b84
# bad: [f09b00d3e789a88fa6c7c03cedc62cb65c1de0cb] writeback: add some debug inode list counters to bdi stats
git bisect bad f09b00d3e789a88fa6c7c03cedc62cb65c1de0cb
# good: [66f3b8e2e103a0b93b945764d98e9ba46cb926dd] writeback: move dirty inodes from super_block to backing_dev_info
git bisect good 66f3b8e2e103a0b93b945764d98e9ba46cb926dd
# bad: [d0bceac747b547c0b4769b91fec7d3c15600153f] writeback: get rid of pdflush completely
git bisect bad d0bceac747b547c0b4769b91fec7d3c15600153f
# bad: [03ba3782e8dcc5b0e1efe440d33084f066e38cae] writeback: switch to per-bdi threads for flushing data
git bisect bad 03ba3782e8dcc5b0e1efe440d33084f066e38cae

First-Bad-Commit : 03ba3782e8dcc5b0e1efe440d33084f066e38cae

Can you double check that HEAD at 03ba3782e8dcc5b0e1efe440d33084f066e38cae is definitely broken and HEAD at 66f3b8e2e103a0b93b945764d98e9ba46cb926dd definitely always works? Are you using any special file systems?

03ba3782e8dcc5b0e1efe440d33084f066e38cae (a) is definitely broken in the same way as the releases I started with (2.6.32 rc1-final and 2.6.33 rc1-rc2).

66f3b8e2e103a0b93b945764d98e9ba46cb926dd (b) is good in the sense that I can suspend lots of times without a hang, but it is not stable. My system will just hang after a minute or so of use. Seems random.

gary@gary-laptop:~$ cat /etc/mtab
/dev/sda5 / ext4 rw,errors=remount-ro,commit=600 0 0
proc /proc proc rw 0 0
none /sys sysfs rw,noexec,nosuid,nodev 0 0
none /sys/fs/fuse/connections fusectl rw 0 0
none /sys/kernel/debug debugfs rw 0 0
none /sys/kernel/security securityfs rw 0 0
udev /dev tmpfs rw,mode=0755 0 0
none /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0
none /dev/shm tmpfs rw,nosuid,nodev 0 0
none /var/run tmpfs rw,nosuid,mode=0755 0 0
none /var/lock tmpfs rw,noexec,nosuid,nodev 0 0
none /lib/init/rw tmpfs rw,nosuid,mode=0755 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,noexec,nosuid,nodev 0 0
gvfs-fuse-daemon /home/gary/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,user=gary 0 0
gary@gary-laptop:~$

2.6.33-rc3 still has the problem

Just to clarify, I did recheck those two commits before I posted comment #36.

On Monday 11 January 2010, Gary Trakhman wrote:
> yes, I can still reproduce it with any 2.6.32 and 2.6.33-rc kernel.
> Never happens on 2.6.31.
still occurs with 2.6.33-rc4. If I want to figure it out myself, what should I do? I'm new at this, but I know C. Can I just run a 2.6.32 kernel, and undo this commit, or is there going to be a lot more to it than that?

(In reply to comment #40)
> still occurs with 2.6.33-rc4. If I want to figure it out myself, what should
> I do? I'm new at this, but I know C. Can I just run a 2.6.32 kernel, and undo
> this commit,
That would be a good start.
> or is there going to be a lot more to it than that?
Depends on what you get when the bisected commit is reverted.

Hi guys!
I have the same issue on my HP 530 laptop. It freezes after two suspend/resume cycles. It hangs during the third suspend - display is black, nothing works except hard power off. Kernels 2.6.32.x and 2.6.33-rc5 are affected. Tell me if I can help with testing or something else.

ha! maybe now they'll listen to me since I'm not the only one in the world with this problem :-). Can you give an lspci -v output so we can compare hardware and try to figure it out?

On Fri, Jan 29, 2010 at 2:22 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14782
>
> Alexander Saprykin <xelfium@gmail.com> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |xelfium@gmail.com
>
> --- Comment #42 from Alexander Saprykin <xelfium@gmail.com> 2010-01-29 07:22:41 ---
> Hi guys!
> I have the same issue on my HP 530 laptop. It freezes after two
> suspend/resume cycles. It hangs during third suspend - display is black,
> nothing works except hard power off. Kernels 2.6.32.x and 2.6.33-rc5 are
> affected. Tell me if I can help with testing or something else.

Created attachment 24781
lspci -v output for HP 530 laptop
(In reply to comment #43)
> ha! maybe now they'll listen to me since I'm not the only one in the
> world with this problem :-). Can you give an lspci -v output so we
> can compare hardware and try to figure it out?
I've created an attachment. By the way, with the 2.6.32 kernel my laptop resumed from suspend only with a black screen, but that was fixed somewhere in the 2.6.32.x versions.

Very odd, there are not really any similarities.

Yours:
945 chipset, integrated graphics
Intel Pro/100 Ethernet
intel 3945ABG wireless

Mine:
gm45 chipset, integrated graphics
realtek 1gb lan
atheros 2425 wireless

Are you running a 64-bit kernel? I am. Haven't tried with a 32 yet.

(In reply to comment #46)
> Very odd, there are not really any similarities.
>
> Yours:
> 945 chipset, integrated graphics
> Intel Pro/100 Ethernet
> intel 3945ABG wireless
>
> Mine:
> gm45 chipset, integrated graphics
> realtek 1gb lan
> atheros 2425 wireless
>
> Are you running a 64-bit kernel? I am. Haven't tried with a 32 yet.
My Intel Core Duo doesn't support 64 bit, so I am running 32 bit only. It seems the problem is in the kernel.

maybe you could try a git bisect and see if we end up in the same place? It takes a while though to do all the recompiling.. I doubt the validity of mine b/c it was intermittent once without explanation.

On Fri, Jan 29, 2010 at 11:35 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14782
>
> --- Comment #47 from Alexander Saprykin <xelfium@gmail.com> 2010-01-29 16:35:56 ---
> (In reply to comment #46)
>> Very odd, there are not really any similarities.
>>
>> Yours:
>> 945 chipset, integrated graphics
>> Intel Pro/100 Ethernet
>> intel 3945ABG wireless
>>
>> Mine:
>> gm45 chipset, integrated graphics
>> realtek 1gb lan
>> atheros 2425 wireless
>>
>> Are you running a 64-bit kernel? I am. Haven't tried with a 32 yet.
>
> My intel core duo doesn't support 64 bit, so I am running 32 bit only. Seems
> problem is in kernel.

(In reply to comment #48)
> maybe you could try a git bisect and see if we end up in the same
> place? It takes a while though to do all the recompiling.. I doubt
> the validity of mine b/c it was intermittent once without explanation.
No problem, I can compile and check. But I'm not familiar with git, so it would be nice if you could give me the bisected sources to compile. Or you can tell me the command to get the bisected sources from git.

well,

http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

but you have to recompile the kernel about 12 times this way. It does a binary search starting from first known good and first known bad. You recompile, and then tell it if the current kernel is good or bad.

for instance, numbers 1 (good) out of 100 (bad)

it'll pick revision 50, you recompile, tell it bad. It'll pick 25, recompile, tell it good. It'll pick 37.... until you find the first bad commit.

On Fri, Jan 29, 2010 at 12:55 PM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14782
>
> --- Comment #49 from Alexander Saprykin <xelfium@gmail.com> 2010-01-29 17:55:13 ---
> (In reply to comment #48)
>> maybe you could try a git bisect and see if we end up in the same
>> place? It takes a while though to do all the recompiling.. I doubt
>> the validity of mine b/c it was intermittent once without explanation.
>
> No problems, I can compile and check. But I'm not familiar with git, so it
> would be nice if you will give me bisected sources to compile. Or you can
> tell me the command to get bisected sources from git.
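The binary search described above (1 is good, 100 is bad, then 50, 25, 37, ...) can be played through on plain integers. This is a toy sketch only: `simulate_bisect` is a hypothetical name, revisions are modelled as consecutive numbers, and real `git bisect` walks a commit graph with merges, so the commits it picks will not be simple midpoints.

```shell
#!/bin/sh
# simulate_bisect GOOD BAD FIRST_BAD
# Prints every revision the search would ask you to build and test,
# then the first bad revision it converges on. Hypothetical helper;
# revisions are plain integers, unlike a real commit history.
simulate_bisect() {
    good="$1"; bad="$2"; first_bad="$3"
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))      # halfway between the two bounds
        if [ "$mid" -ge "$first_bad" ]; then
            echo "test $mid: bad"
            bad="$mid"                   # the bug is at or before mid
        else
            echo "test $mid: good"
            good="$mid"                  # the bug was introduced after mid
        fi
    done
    echo "first bad: $bad"
}
```

Calling `simulate_bisect 1 100 42` starts with revisions 50, 25 and 37, as in the example above, and needs seven builds in total because each answer halves the remaining range. On a kernel tree the equivalent session starts with `git bisect start`, followed by `git bisect good <commit>` and `git bisect bad <commit>` for the known endpoints, as in the bisect log earlier in this report.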
(In reply to comment #50)
> well,
>
> http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/
>
> but you have to recompile the kernel about 12 times this way. It does
> a binary search starting from first known good and first known bad.
> You recompile, and then tell it if the current kernel is good or bad.
>
> for instance, numbers 1 (good) out of 100 (bad)
>
> it'll pick revision 50, you recompile, tell it bad. It'll pick 25,
> recompile, tell it good. It'll pick 37.... until you find the first
> bad commit.
Thanks for a good guide, I'll try to figure out the bad commit, maybe today or on the weekend.

Hi!
I was out of internet access for a while. I've found out that the bug has a random nature. It can appear at any suspend, so it is very hard to pin down one bad commit. I've checked 2.6.33-rc7 - the bug is still here. Is there a way to get more information? Maybe some kind of trace or debug output at the time it occurs? I will try to test with git bisect more thoroughly.

Eureka!! It only happens to me when I have an SD card inserted during suspend. This leads me to believe the block layer commit found by my bisect is indeed related.

Btw, just tested it on 2.6.33-rc8 and it still has the bad behavior. I can suspend however many times I want. If I insert an SD card, the next suspend will hang.

OK, then this is identical to an issue I'm currently discussing with Alan Stern. A solution is forthcoming.

I've found that my laptop hangs on suspend after using VirtualBox modules - even if I rmmod them.

I always use virtualbox, and it doesn't affect my issue... seems to be a different problem?

On Thu, Feb 25, 2010 at 5:14 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14782
>
> --- Comment #55 from Alexander Saprykin <xelfium@gmail.com> 2010-02-25 10:14:12 ---
> I've found, that my laptop hangs on suspend after using VirtualBox modules -
> even if I rmmod them.

(In reply to comment #56)
> I always use virtualbox, and it doesn't affect my issue... seems to be
> a different problem?
Yeah, maybe. Since 2.6.32 I can't use virtualbox and switched to KVM.

what's the status? I haven't tried to reproduce it since I figured out I could stop it by removing an SD card, but I'm noticing other people have the same problem.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/503233
https://bugs.launchpad.net/ubuntu/+source/linux-mvl-dove/+bug/530432
https://bugs.launchpad.net/ubuntu/+source/pm-utils/+bug/468298

Hi Gary! Can you check if it is still a problem in 2.6.36?

Seems like it's no longer a problem in 2.6.36.

Ok, I will close this bug report. I don't think we will have any chance of finding something to backport to 2.6.32.y? Maybe it's fixed already... |
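Since the hang in this report depended on an SD card being inserted, anyone retesting on an affected kernel may want to record whether a card was present before each suspend. The sketch below assumes the card reader is driven by the kernel's MMC block driver, which names its devices mmcblk*; USB card readers usually appear as sd* instead and would be missed. The function names and the `SYSFS_BLOCK` override are hypothetical.

```shell
#!/bin/sh
# List MMC/SD block devices. SYSFS_BLOCK is a hypothetical override so the
# function can be pointed at a scratch directory; it defaults to /sys/block.
list_mmc_block_devices() {
    dir="${SYSFS_BLOCK:-/sys/block}"
    for entry in "$dir"/mmcblk*; do
        [ -e "$entry" ] || continue   # the glob matched nothing
        basename "$entry"
    done
}

# Succeeds if at least one mmcblk device is present.
sd_card_present() {
    [ -n "$(list_mmc_block_devices)" ]
}
```

A test run could then log, for example, `sd_card_present && echo "card inserted"` next to each suspend attempt.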