Bug 14782

Summary: Suspend hangs with SD card inserted
Product: IO/Storage
Reporter: Gary Trakhman (gary.trakhman)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, florian, lenb, rjw, xelfium, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
URL: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/492684
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 14230, 56331
Attachments: lspci
             dmesg
             some errors
             acpidump
             acpidump-gary
             try the debug patch
             lspci -v output for HP 530 laptop

Description Gary Trakhman 2009-12-10 18:53:45 UTC
This is my strange experience. I'm running Ubuntu Karmic. All 2.6.31 kernels work fine. When I began testing 2.6.32 RCs from git, my system would hang during the second suspend; the first one always worked. This occurs with -zen and Ubuntu mainline PPA kernels. The screen keeps displaying whatever was on it at the time. When it occurs, I must power off by holding down the power button for 4 seconds.

Here's a link to the Ubuntu bug report, which has all my system information.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/492684
Comment 1 Gary Trakhman 2009-12-10 18:56:51 UTC
One odd thing I noticed: it works fine at the GDM prompt, but once I log in, I get the strange behaviour.
Comment 2 Andrew Morton 2009-12-10 22:52:11 UTC
I tentatively reassigned this to acpi.
Comment 3 Alois Nespor 2009-12-13 22:10:05 UTC
Created attachment 24171 [details]
lspci
Comment 4 Alois Nespor 2009-12-13 22:11:07 UTC
Lenovo SL500, same problem.

The first one or two suspends work fine. After the next one, the screen is black, but I can switch to a console.
Comment 5 Alois Nespor 2009-12-13 22:11:36 UTC
Created attachment 24172 [details]
dmesg
Comment 6 Alois Nespor 2009-12-13 22:12:48 UTC
dmesg: one ACPI warning
ACPI Warning: _BQC returned an invalid level (20090903/video-631)
Comment 7 Alois Nespor 2009-12-13 22:16:21 UTC
Created attachment 24173 [details]
some errors

some errors:
Dec 13 22:56:01 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 22:56:30 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 22:57:06 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 23:00:35 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Dec 13 23:01:02 lenovo kernel: PM: Device PNP0C0D:00 failed to resume: error 1
Comment 8 Alois Nespor 2009-12-13 22:39:25 UTC
from dmesg:

input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input1
ACPI: Lid Switch [LID]

and this problem occurs when I close the lid of my notebook.

When I suspend with the power button,

           PWRF)   echo -n mem >/sys/power/state ;;

resume works with no problems.
Comment 9 Zhang Rui 2009-12-14 01:31:24 UTC
please attach the acpidump output
Comment 10 Alois Nespor 2009-12-14 04:56:50 UTC
Created attachment 24175 [details]
acpidump
Comment 11 Gary Trakhman 2009-12-14 17:03:17 UTC
Breaks for me whether I close the lid or use pm-suspend from command line or use the keyboard hotkey.
Comment 12 Gary Trakhman 2009-12-14 17:04:49 UTC
Created attachment 24183 [details]
acpidump-gary
Comment 13 Alois Nespor 2009-12-14 20:25:40 UTC
For me, the keyboard hotkey and pm-suspend (latest version; I use Arch Linux 64-bit) both work fine; only the lid doesn't work.
Comment 14 Gary Trakhman 2009-12-14 20:30:52 UTC
Alois, are you sure you're having the same problem as me? My system just hangs; I can't switch to a console or anything.
Comment 15 Alois Nespor 2009-12-14 20:39:00 UTC
Gary, I think so, yes. I have to shut the laptop down at least twice.
Can you tell me your X server and pm-utils versions, and whether you have any errors in /var/log/errors.log?
Comment 16 ykzhao 2009-12-15 14:07:51 UTC
Created attachment 24195 [details]
try the debug patch

Could someone try the patch on the latest kernel and see whether the following message is still reported?
   > PM: Device PNP0C0D:00 failed to resume: error 1

Thanks.
Comment 17 ykzhao 2009-12-15 14:14:11 UTC
Hi, Alois
    It seems that the issue on your box is different from the one on Gary's. The screen is blank when the LID is used to suspend/resume, right?
    If so, will you please try the following patch on the latest kernel and see whether it works for you?
    >http://lists.freedesktop.org/archives/intel-gfx/2009-December/005143.html

Thanks.
Comment 18 ykzhao 2009-12-15 14:21:21 UTC
Hi, Gary
    Will you please boot the system with the boot option "drm.debug=0x06 printk.time=1" and test whether the system can be suspended/resumed several times in console mode? Please do the following test:
    1. Kill the process that uses /proc/acpi/event (use the command "lsof /proc/acpi/event" to find it).
    2. dmesg >dmesg_before; echo mem > /sys/power/state; dmesg >dmesg_after;
    3. After the system enters suspend, press the power button and see whether the system resumes.
    4. If the system resumes, repeat steps 2 and 3 several times.

When it fails to resume, please reboot the system and attach the output of dmesg_before/dmesg_after.

Thanks.
    Yakui.
Comment 19 Gary Trakhman 2009-12-15 14:40:37 UTC
I will test it soon, but it seems you're expecting different symptoms. My system never fails to resume; it hangs during suspend, before actually suspending. Should I still do the same process? Also, it's fine in console-only mode; the problem occurs only once I've logged in through GDM.
Comment 20 Alois Nespor 2009-12-15 16:44:02 UTC
(In reply to comment #16)
> Created an attachment (id=24195) [details]
> try the debug patch
> 
> Do someone have an opportunity to try the patch on the latest kernel and see whether the following message is complained?
>    > PM: Device PNP0C0D:00 failed to resume: error 1
> 
> Thanks.

I patched kernel 2.6.32.1 with this patch and I see no errors in /var/errors.log. I think this patch works.
Comment 21 Alois Nespor 2009-12-15 16:54:06 UTC
(In reply to comment #17)
> Hi, Alois
>     It seems that your issue on your box is different with that on Gary's one.


I think so too. Sorry for adding my report of a different bug to this one.
Comment 22 Alois Nespor 2009-12-15 19:54:05 UTC
(In reply to comment #17)

I can't apply patch from http://lists.freedesktop.org/archives/intel-gfx/2009-December/005143.html

kernel 2.6.32.1:

patching file drivers/acpi/button.c
patching file drivers/gpu/drm/i915/i915_drv.h
Hunk #1 FAILED at 549.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_drv.h.rej
patching file drivers/gpu/drm/i915/intel_lvds.c
Hunk #1 succeeded at 679 (offset -7 lines).
Hunk #2 succeeded at 1092 with fuzz 1 (offset -102 lines).
Comment 23 Len Brown 2009-12-17 03:47:22 UTC
commit 13c199c0d0cf78b27592991129fb8cbcfc5164de
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Tue Dec 15 22:01:57 2009 +0800

    ACPI: Use the return result of ACPI lid notifier chain correctly

shipped in linux-2.6.33 before rc1

sent to stable@kernel.org for 2.6.32.y

closed
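Alois's log shows "failed to resume: error 1", and the fix's title says the lid notifier chain's return value was not being used correctly. One reading, which is our assumption and not taken from the patch itself: the value 1 is NOTIFY_OK, a notifier-chain status, reported as if it were an error code. The sketch below ports the constants and the notifier_to_errno() helper from the kernel's include/linux/notifier.h to Python to show how a chain result is meant to be converted; the actual commit may handle it differently.

```python
# Notifier-chain return codes, copied from the Linux kernel's include/linux/notifier.h.
NOTIFY_DONE = 0x0000       # don't care
NOTIFY_OK = 0x0001         # suits me
NOTIFY_STOP_MASK = 0x8000  # don't call further
NOTIFY_BAD = NOTIFY_STOP_MASK | 0x0002
NOTIFY_STOP = NOTIFY_OK | NOTIFY_STOP_MASK


def notifier_to_errno(ret: int) -> int:
    """Python port of notifier_to_errno(): 0 for success, negative for a veto."""
    ret &= ~NOTIFY_STOP_MASK  # ignore the "don't call further" bit
    return NOTIFY_OK - ret if ret > NOTIFY_OK else 0


raw = NOTIFY_OK                                  # every callback said "suits me"
print("raw status:", raw)                        # raw status: 1
print("converted:", notifier_to_errno(raw))      # converted: 0
print("vetoed:", notifier_to_errno(NOTIFY_BAD))  # vetoed: -1
```

Passing the raw value straight back as a resume status would read as "error 1" even though every callback succeeded.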
Comment 24 Gary Trakhman 2009-12-18 16:39:39 UTC
That's Alois's issue that got solved, not mine.  I still can't suspend in 2.6.33-rc1.
Comment 25 Gary Trakhman 2009-12-18 16:43:33 UTC
To reproduce the behavior, I have to be logged in to my X desktop, but then I can switch over to a virtual terminal. Is this OK? I will perform the test soon.

(In reply to comment #18)
> Hi, Gary
>     Will you please boot the system with the boot option of "drm.debug=0x06 printk.time=1" and test whether the system can be suspened/resumed several times under the console mode? Please do the following test:
>     1. kill the process who uses the /proc/acpi/event(use the command of "lsof /proc/acpi/event" to get the process)
>     2. dmesg >dmesg_before; echo mem > /sys/power/state; dmesg >dmesg_after;
>     3. after the system enters the system, please press the power button and see whether the system can be resumed.
>     4. if the system can be resumed, please repeat the step 2/3 several times.
> 
> When it fails to be resumed, please reboot the system and attach the output of dmesg_before/dmesg_after?
> 
> Thanks.
>     Yakui.
Comment 26 Len Brown 2009-12-22 02:46:50 UTC
Gary,
Please try with and without CONFIG_DRM_I915_KMS
Comment 27 Zhang Rui 2009-12-22 06:02:40 UTC
this looks like a graphics problem to me.
re-assign to Yakui.
Comment 28 ykzhao 2009-12-24 05:44:48 UTC
Hi, Gary
    Will you please boot the system into console mode with KMS enabled and do the following test?
    1. Kill the process using /proc/acpi/event (use the command "lsof /proc/acpi/event" to get the process id).
    2. echo mem > /sys/power/state; dmesg >dmesg_after1
    3. Wait for one minute and then press the power button to see whether the box resumes.
    4. If it resumes, run "echo mem > /sys/power/state; dmesg >dmesg_after2".
    5. Wait for one minute and then press the power button to see whether the box resumes.

    If it doesn't resume, please reboot the box and check whether the file dmesg_after2 was created.

    It would be great if you could also boot the system into console mode with KMS disabled and repeat the test. (You can add the boot option "nomodeset".)

thanks.
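Yakui's two-cycle procedure can also be scripted. The Python sketch below is an illustration, not a tool from this thread; the names suspend_cycles, read_kernel_log and save_durably are hypothetical. It writes "mem" to /sys/power/state (on a real machine that write returns only after resume), then saves the kernel log to dmesg_after1, dmesg_after2, and so on. After a forced reboot, the first missing file marks the cycle that hung. Each file is fsync'ed rather than left to the filesystem's periodic commit (Gary's /etc/mtab later in the thread shows ext4 mounted with commit=600), so it is more likely to survive holding down the power button. It assumes root privileges and that acpid has already been stopped as in step 1.

```python
import os
import subprocess
from typing import Callable


def read_kernel_log() -> str:
    """Return the kernel ring buffer as printed by dmesg(1)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def save_durably(path: str, text: str) -> None:
    """Write text to path and fsync it so it is on disk before the next suspend."""
    with open(path, "w") as f:
        f.write(text)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write it to the disk now


def suspend_cycles(count: int,
                   state_path: str = "/sys/power/state",
                   read_log: Callable[[], str] = read_kernel_log,
                   out_dir: str = ".") -> list[str]:
    """Suspend to RAM `count` times, saving dmesg_after1 ... dmesg_afterN.

    After a forced reboot, the first missing dmesg_afterN marks the cycle that hung.
    """
    saved = []
    for i in range(1, count + 1):
        # Closing the file issues the write; on real sysfs it returns only after
        # the machine has resumed (after the user presses the power button).
        with open(state_path, "w") as state:
            state.write("mem")
        path = os.path.join(out_dir, f"dmesg_after{i}")
        save_durably(path, read_log())
        saved.append(path)
    return saved
```

For example, running suspend_cycles(2) as root from a console covers steps 2 to 5; waiting a minute and pressing the power button remain manual.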
Comment 29 Gary Trakhman 2009-12-24 07:29:34 UTC
Using KMS in console only mode:
Rebooted into single user mode with 2.6.33-rc1 from mainline ppa.
1. No processes using /proc/acpi/event
2. Works fine.

No KMS in console only mode:
Rebooted into single user mode.
1. No processes using /proc/acpi/event
2. Will not resume from suspend, so obviously no dmesg files.

Using KMS when logged in with X: 
1. Switched into VT for running the commands.
2. acpid kept creating new processes using /proc/acpi/event, did a 'service acpid stop' and that fixed it.
3. Works fine.
This confuses me.  Seems intermittent now when it wasn't before.  

After multiple reboots I was able to get it to fail consistently again.
Using KMS when logged in with X take 2: 
1. Killed acpid.
2. Suspended once OK; I have dmesg from this.
3. Suspending a second time fails.
Comment 30 Gary Trakhman 2009-12-26 17:31:13 UTC
happens in 2.6.33-rc2 as well
Comment 31 Gary Trakhman 2009-12-26 18:05:10 UTC
Happens with the Ubuntu Lucid alpha1 live CD running 2.6.32 as well, so it's probably not a configuration issue.
Comment 32 Gary Trakhman 2009-12-30 08:01:14 UTC
Working on a git-bisect for this issue.
Comment 33 Gary Trakhman 2009-12-30 18:54:09 UTC
03ba3782e8dcc5b0e1efe440d33084f066e38cae is the first bad commit
commit 03ba3782e8dcc5b0e1efe440d33084f066e38cae
Author: Jens Axboe <jens.axboe@oracle.com>
Date:   Wed Sep 9 09:08:54 2009 +0200

    writeback: switch to per-bdi threads for flushing data

    This gets rid of pdflush for bdi writeout and kupdated style cleaning.
    pdflush writeout suffers from lack of locality and also requires more
    threads to handle the same workload, since it has to work in a
    non-blocking fashion against each queue. This also introduces lumpy
    behaviour and potential request starvation, since pdflush can be starved
    for queue access if others are accessing it. A sample ffsb workload that
    does random writes to files is about 8% faster here on a simple SATA drive
    during the benchmark phase. File layout also seems a LOT more smooth in
    vmstat:
...


Git Bisect log:

git bisect start
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git bisect bad 22763c5cf3690a681551162c15d34d935308c8d7
# bad: [73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df] Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
git bisect bad 73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df
# bad: [73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df] Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
git bisect bad 73c583e4e2dd0fbbf2fafe0cc57ff75314fe72df
# bad: [d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect bad d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b
# good: [32e6a0c82e7a7991a02414d830f262e1f4db73e6] WAN: remove deprecated PCI_DEVICE_ID from PCI200SYN driver.
git bisect good 32e6a0c82e7a7991a02414d830f262e1f4db73e6
# bad: [7193bea53f9d9730bbc859777c2f86c76349914d] Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad 7193bea53f9d9730bbc859777c2f86c76349914d
# good: [2d4ff66ad7b8811d0c75ccccad346496f67cb43a] Merge branch 'topic/hda' into for-linus
git bisect good 2d4ff66ad7b8811d0c75ccccad346496f67cb43a
# good: [89af571ca633ada14d17746519a179553a732d31] Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
git bisect good 89af571ca633ada14d17746519a179553a732d31
# good: [0b767b4df360bd442434d9d40b8a495e64202254] crypto: hmac - Prehash ipad/opad
git bisect good 0b767b4df360bd442434d9d40b8a495e64202254
# good: [fd30afa454282bbe1b36d5d77bd72c0ea5b3f97c] Merge branch 'topic/usb-audio' into for-linus
git bisect good fd30afa454282bbe1b36d5d77bd72c0ea5b3f97c
# good: [81bd5f6c966cf2f137c2759dfc78abdffcff055e] crypto: sha-s390 - Fix warnings in import function
git bisect good 81bd5f6c966cf2f137c2759dfc78abdffcff055e
# bad: [a9c86d42599519f3d83b5f46bdab25046fe47b84] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect bad a9c86d42599519f3d83b5f46bdab25046fe47b84
# bad: [f09b00d3e789a88fa6c7c03cedc62cb65c1de0cb] writeback: add some debug inode list counters to bdi stats
git bisect bad f09b00d3e789a88fa6c7c03cedc62cb65c1de0cb
# good: [66f3b8e2e103a0b93b945764d98e9ba46cb926dd] writeback: move dirty inodes from super_block to backing_dev_info
git bisect good 66f3b8e2e103a0b93b945764d98e9ba46cb926dd
# bad: [d0bceac747b547c0b4769b91fec7d3c15600153f] writeback: get rid of pdflush completely
git bisect bad d0bceac747b547c0b4769b91fec7d3c15600153f
# bad: [03ba3782e8dcc5b0e1efe440d33084f066e38cae] writeback: switch to per-bdi threads for flushing data
git bisect bad 03ba3782e8dcc5b0e1efe440d33084f066e38cae
Comment 34 Rafael J. Wysocki 2009-12-30 20:56:07 UTC
First-Bad-Commit : 03ba3782e8dcc5b0e1efe440d33084f066e38cae
Comment 35 Jens Axboe 2009-12-30 21:27:12 UTC
Can you double check that HEAD at 03ba3782e8dcc5b0e1efe440d33084f066e38cae is definitely broken and HEAD at 66f3b8e2e103a0b93b945764d98e9ba46cb926dd definitely always works?

Are you using any special file systems?
Comment 36 Gary Trakhman 2009-12-31 01:45:44 UTC
03ba3782e8dcc5b0e1efe440d33084f066e38cae (a) is definitely broken in the same way as the releases I started with (2.6.32 rc1-final and 2.6.33 rc1-rc2).

66f3b8e2e103a0b93b945764d98e9ba46cb926dd (b) is good in the sense that I can suspend lots of times without a hang, but it is not stable.  My system will just hang after a minute or so of use.  Seems random.

gary@gary-laptop:~$ cat /etc/mtab
/dev/sda5 / ext4 rw,errors=remount-ro,commit=600 0 0
proc /proc proc rw 0 0
none /sys sysfs rw,noexec,nosuid,nodev 0 0
none /sys/fs/fuse/connections fusectl rw 0 0
none /sys/kernel/debug debugfs rw 0 0
none /sys/kernel/security securityfs rw 0 0
udev /dev tmpfs rw,mode=0755 0 0
none /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0
none /dev/shm tmpfs rw,nosuid,nodev 0 0
none /var/run tmpfs rw,nosuid,mode=0755 0 0
none /var/lock tmpfs rw,noexec,nosuid,nodev 0 0
none /lib/init/rw tmpfs rw,nosuid,mode=0755 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,noexec,nosuid,nodev 0 0
gvfs-fuse-daemon /home/gary/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,user=gary 0 0
gary@gary-laptop:~$
Comment 37 Gary Trakhman 2010-01-07 06:19:07 UTC
2.6.33-rc3 still has the problem
Comment 38 Gary Trakhman 2010-01-07 21:30:14 UTC
just to clarify, I did recheck those two commits before I posted comment #36
Comment 39 Rafael J. Wysocki 2010-01-11 19:34:12 UTC
On Monday 11 January 2010, Gary Trakhman wrote:
> yes, I can still reproduce it with any 2.6.32 and 2.6.33-rc kernel.
> Never happens on 2.6.31.
Comment 40 Gary Trakhman 2010-01-13 17:12:32 UTC
still occurs with 2.6.33-rc4.  If I want to figure it out myself, what should I do?  I'm new at this, but I know C.  Can I just run a 2.6.32 kernel, and undo this commit, or is there going to be a lot more to it than that?
Comment 41 Rafael J. Wysocki 2010-01-13 20:47:32 UTC
(In reply to comment #40)
> still occurs with 2.6.33-rc4.  If I want to figure it out myself, what should I do?  I'm new at this, but I know C.  Can I just run a 2.6.32 kernel, and undo this commit,

That would be a good start.

> or is there going to be a lot more to it than that?

Depends on what you get when the bisected commit is reverted.
Comment 42 Alexander Saprykin 2010-01-29 07:22:41 UTC
Hi guys!
I have the same issue on my HP 530 laptop. It freezes after two suspend/resume cycles. It hangs during the third suspend: the display is black and nothing works except a hard power off. Kernels 2.6.32.x and 2.6.33-rc5 are affected. Tell me if I can help with testing or anything else.
Comment 43 Gary Trakhman 2010-01-29 08:29:44 UTC
ha! maybe now they'll listen to me since I'm not the only one in the
world with this problem :-).  Can you give an lspci -v output so we
can compare hardware and try to figure it out?

Comment 44 Alexander Saprykin 2010-01-29 16:15:22 UTC
Created attachment 24781 [details]
lspci -v output for HP 530 laptop
Comment 45 Alexander Saprykin 2010-01-29 16:18:58 UTC
(In reply to comment #43)
> ha! maybe now they'll listen to me since I'm not the only one in the
> world with this problem :-).  Can you give an lspci -v output so we
> can compare hardware and try to figure it out?
> 

I've created an attachment. By the way, with the 2.6.32 kernel my laptop resumed from suspend only to a black screen, but that was fixed somewhere in the 2.6.32.x series.
Comment 46 Gary Trakhman 2010-01-29 16:22:55 UTC
Very odd, there are not really any similarities.

Yours:
945 chipset, integrated graphics
Intel Pro/100 Ethernet
intel 3945ABG wireless

Mine:
gm45 chipset, integrated graphics
realtek 1gb lan
atheros 2425 wireless

Are you running a 64-bit kernel?  I am. I haven't tried a 32-bit one yet.
Comment 47 Alexander Saprykin 2010-01-29 16:35:56 UTC
(In reply to comment #46)

My Intel Core Duo doesn't support 64-bit, so I am running 32-bit only. It seems the problem is in the kernel.
Comment 48 Gary Trakhman 2010-01-29 16:54:02 UTC
maybe you could try a git bisect and see if we end up in the same
place?  It takes a while though to do all the recompiling..  I doubt
the validity of mine b/c it was intermittent once without explanation.

Comment 49 Alexander Saprykin 2010-01-29 17:55:13 UTC
(In reply to comment #48)
> maybe you could try a git bisect and see if we end up in the same
> place?  It takes a while though to do all the recompiling..  I doubt
> the validity of mine b/c it was intermittent once without explanation.
> 

No problem, I can compile and check. But I'm not familiar with git, so it would be nice if you could give me the bisected sources to compile, or tell me the command to get them from git.
Comment 50 Gary Trakhman 2010-01-29 18:12:52 UTC
well, http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

but you have to recompile the kernel about 12 times this way.  It does
a binary search starting from first known good and first known bad.
You recompile, and then tell it if the current kernel is good or bad.

for instance, numbers 1 (good) out of 100 (bad)

it'll pick revision 50, you recompile, tell it bad.  It'll pick 25,
recompile, tell it good.  It'll pick 37.... until you find the first
bad commit.
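The halving in the 1-to-100 example can be sketched in a few lines of Python. This is a simplification: real git bisect works on the commit graph and picks a commit that splits the remaining candidates roughly in half rather than an integer midpoint, and bisect_first_bad and max_builds are hypothetical names. Starting from 1 good and 100 bad, it tests 50, 25 and 37 first, as described above. The worst-case number of builds is log2 of the size of the range, rounded up, so 12 builds are enough for a range of up to 2^12 = 4096 commits.

```python
import math
from typing import Callable


def bisect_first_bad(good: int, bad: int,
                     is_bad: Callable[[int], bool]) -> tuple[int, list[int]]:
    """Return the first bad revision after `good` and the revisions tested on the way.

    Revisions are plain integers on a straight line; is_bad(rev) stands in for
    "build that kernel, boot it, and try to reproduce the hang".
    """
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if is_bad(mid):
            bad = mid   # first bad revision is mid or earlier
        else:
            good = mid  # first bad revision is after mid
    return bad, tested


def max_builds(good: int, bad: int) -> int:
    """Worst-case number of builds: log2 of the candidate count, rounded up."""
    candidates = bad - good
    return math.ceil(math.log2(candidates)) if candidates > 1 else 0


# Comment 50's example: 1 is good, 100 is bad; pretend revision 30 broke things.
print(bisect_first_bad(1, 100, lambda rev: rev >= 30))  # (30, [50, 25, 37, 31, 28, 29, 30])
print(max_builds(1, 100))                               # 7
```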

Comment 51 Alexander Saprykin 2010-01-29 19:39:30 UTC
(In reply to comment #50)

Thanks for the good guide. I'll try to find the bad commit, maybe today or over the weekend.
Comment 52 Alexander Saprykin 2010-02-08 09:03:05 UTC
Hi!
I was without internet access for a while. I've found that the bug is random: it can appear on any suspend, so it is very hard to pin down the bad commit. I've checked 2.6.33-rc7 and the bug is still there. Is there a way to get more information, maybe some kind of trace or debug output at the moment it occurs? I will try to do the git bisect more carefully.
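Intermittent hangs undermine bisection: a kernel marked good after a single clean suspend may just have been lucky, and one wrong verdict sends the search into the wrong half. A rough way to size the testing, assuming (hypothetically; the thread gives no rate) that each suspend on a bad kernel hangs independently with probability p, is to keep suspending until the chance of having missed the bug, (1 - p)^n, falls below a chosen threshold. The function names below are illustrative.

```python
def miss_probability(p: float, attempts: int) -> float:
    """Chance that a hang with per-suspend probability p stays hidden for `attempts` suspends."""
    return (1.0 - p) ** attempts


def attempts_needed(p: float, max_miss: float) -> int:
    """Fewest clean suspends in a row before calling a kernel good, keeping miss chance <= max_miss."""
    if not (0.0 < p <= 1.0 and 0.0 < max_miss < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 < max_miss < 1")
    attempts = 1
    while miss_probability(p, attempts) > max_miss:
        attempts += 1
    return attempts


print(attempts_needed(0.5, 0.01))   # 7
print(attempts_needed(0.25, 0.01))  # 17
```

For example, if half of all suspends on a bad kernel hang, seven clean suspends in a row bring the miss chance below 1%; if one in four hangs, it takes seventeen.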
Comment 53 Gary Trakhman 2010-02-25 08:26:43 UTC
Eureka!!

It only happens to me when I have an SD card inserted during suspend.  This leads me to believe the block layer commit during my bisect is indeed related.  Btw, just tested it on 2.6.33-rc8 and it still has the bad behavior.

I can suspend however many times I want.  If I insert an SD card, the next suspend will hang.
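Gary's finding suggests a simple check before suspending: is a card present? The sketch below assumes the card reader is driven by the kernel's MMC/SD subsystem, so an inserted card shows up as /sys/block/mmcblk*; readers attached over USB appear as ordinary sdX disks and would not be caught, and a machine with built-in eMMC storage also has mmcblk devices. mmc_block_devices and safe_to_suspend are hypothetical helpers, for instance to be called from a pre-suspend hook that asks the user to remove the card.

```python
import os


def mmc_block_devices(sys_block: str = "/sys/block") -> list[str]:
    """List block devices named mmcblk* (cards on an MMC/SD host controller)."""
    try:
        names = os.listdir(sys_block)
    except FileNotFoundError:
        return []  # no sysfs at this path: report nothing found
    return sorted(name for name in names if name.startswith("mmcblk"))


def safe_to_suspend(sys_block: str = "/sys/block") -> bool:
    """Hypothetical pre-suspend check: False while an SD/MMC card appears present."""
    return not mmc_block_devices(sys_block)


if not safe_to_suspend():
    print("Remove the SD card before suspending:", ", ".join(mmc_block_devices()))
```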
Comment 54 Jens Axboe 2010-02-25 09:00:07 UTC
OK, then this is identical to an issue I'm currently discussing with Alan Stern. A solution is forthcoming.
Comment 55 Alexander Saprykin 2010-02-25 10:14:12 UTC
I've found that my laptop hangs on suspend after using the VirtualBox modules, even if I rmmod them.
Comment 56 Gary Trakhman 2010-02-25 10:16:54 UTC
I always use virtualbox, and it doesn't affect my issue... seems to be
a different problem?

Comment 57 Alexander Saprykin 2010-02-25 10:20:08 UTC
(In reply to comment #56)
> I always use virtualbox, and it doesn't affect my issue... seems to be
> a different problem?
> 

Yeah, maybe. Since 2.6.32 I can't use virtualbox and switched to KVM.
Comment 58 Gary Trakhman 2010-04-29 13:30:38 UTC
what's the status?  I haven't tried to reproduce it since I figured out I could stop it by removing an SD card, but I'm noticing other people have the same problem.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/503233
https://bugs.launchpad.net/ubuntu/+source/linux-mvl-dove/+bug/530432
https://bugs.launchpad.net/ubuntu/+source/pm-utils/+bug/468298
Comment 59 Florian Mickler 2010-10-26 09:21:57 UTC
Hi Gary!
Can you check if it is still a problem in 2.6.36?
Comment 60 Gary Trakhman 2010-11-14 12:08:27 UTC
Seems like it's no longer a problem in 2.6.36.
Comment 61 Florian Mickler 2010-11-14 16:29:35 UTC
Ok, I will close this bug report. 

I don't suppose we have any chance of finding something to backport to 2.6.32.y? Maybe it's fixed already...