Bug 14334

Summary: pcmcia suspend regression from 2.6.31.1 to 2.6.31.2 - Dell Inspiron 600m
Product: Power Management
Reporter: Jose Marino (braket)
Component: Hibernation/Suspend
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, gowdy, jithin1987, rjw, yuhongbao_386
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31.2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 14230
Attachments:
Output from lspci -v
dmesg after emulated suspend (2.6.32-rc3)
Full dmesg right after boot
.config file
Output from acpidump
Full dmesg with lapic + simulated suspend
Logs for simulated suspend after unloading ethernet and usb modules
yenta: Split resume
Simulated suspend log
yenta: Split resume (fixed)
Real suspend log (patch from #33 applied)
PCMCIA PM: Always reinsert cards on resume
PCMCIA PM: Always reinsert cards on resume (fixed)
Suspend/resume log (patch from #40 applied)
yenta: Don't enable interrupts on resume
yenta: Resume debug patch
yenta: Resume debug patch 2
verbose suspend/resume log (patch from #53 applied)
ACPI: Resume all devices early
suspend/resume log (patches from #53 and #60 applied)
yenta: Disable CSC interrupts on resume
PM: Do not suspend and resume device irqs
verbose full boot log (patch from comment #53 applied)
yenta: Resume the first socket only
verbose suspend/resume log (patch from comment #75 applied)
dmesg output with patch from comment #75
yenta: Split resume into early and late parts
verbose suspend/resume log (patch from comment #78 applied)
dmesg output with patch from comment #78
PM / yenta : Split resume into early and late parts
dmesg output with patch from comment #85
PM / yenta: Split resume into early and late parts (rev. 4)

Description Jose Marino 2009-10-06 15:44:54 UTC
I upgraded from 2.6.31.1 to 2.6.31.2 and my laptop no longer wakes up from suspend. The laptop is a Dell Inspiron 600m.

I located a commit that mentions suspend/resume and bisected around it. Here's the result of the bisect:

53024df259e37ad49ee3d1f3721d4cecdd7bc357 is the first bad commit
commit 53024df259e37ad49ee3d1f3721d4cecdd7bc357
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Tue Sep 29 00:11:03 2009 +0200

    PM / yenta: Fix cardbus suspend/resume regression
    
    commit 0c570cdeb8fdfcb354a3e9cd81bfc6a09c19de0c upstream.
    
    Since 2.6.29 the PCI PM core have been restoring the standard
    configuration registers of PCI devices in the early phase of
    resume.  In particular, PCI devices without drivers have been handled
    this way since commit 355a72d75b3b4f4877db4c9070c798238028ecb5
    (PCI: Rework default handling of suspend and resume).  Unfortunately,
    this leads to post-resume problems with CardBus devices which cannot
    be accessed in the early phase of resume, because the sockets they
    are on have not been woken up yet at that point.
    
    To solve this problem, move the yenta socket resume to the early
    phase of resume and, analogously, move the suspend of it to the late
    phase of suspend.  Additionally, remove some unnecessary PCI code
    from the yenta socket's resume routine.
    
    Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13092, which is a
    post-2.6.28 regression.
    
    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Reported-by: Florian <fs-kernelbugzilla@spline.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 55b935c46d3cea356087b7070ba4072c18fa0215 16829a0439b5632f9027d1d66a2f3a32a66f0870 M	drivers

Modules related to pcmcia and yenta:
$ lsmod | grep -E "pcmcia|yenta"
pcmcia                 31816  0 
yenta_socket           23632  4 
rsrc_nonstatic         10532  1 yenta_socket
pcmcia_core            31444  3 pcmcia,yenta_socket,rsrc_nonstatic

I confirm that suspend/resume works fine if I unload these modules.
The output of 'lspci -v' is attached.
Comment 1 Jose Marino 2009-10-06 15:45:44 UTC
Created attachment 23287 [details]
Output from lspci -v
Comment 2 Rafael J. Wysocki 2009-10-06 21:04:20 UTC
How nice.  Regression fix that causes a regression to happen.

A couple of questions:
1. Is it sufficient to unload the yenta_socket module to make things work again?
2. Does it also fail with 2.6.32-rc3?
Comment 3 Rafael J. Wysocki 2009-10-06 22:58:05 UTC
One more question: it looks like the cardbus socket is actually empty when this failure takes place.  Is this correct?
Comment 4 Jose Marino 2009-10-06 23:03:54 UTC
(In reply to comment #2)
> How nice.  Regression fix that causes a regression to happen.
> 
> A couple of questions:
> 1. Is it sufficient to unload the yenta_socket module to make things work
> again?
> 2. Does it also fail with 2.6.32-rc3?

Haven't had a chance to try question 2 yet.

About question 1:

I tried this:
- I unload modules 'pcmcia' and 'yenta_socket' (can't unload 'yenta_socket' alone, it complains that it's in use).
- suspend
=> It restores fine
(This works also if I only unload module 'pcmcia')

The weird thing is that after unloading the module(s) and doing a good suspend/restore, I can load back those module(s) and suspend/restore seems to work just fine.

However, from a fresh reboot I try unloading and reloading the module(s) (with no suspend/resume in between) and the problem persists, i.e. the laptop fails to resume.
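
The "in use" complaint matches the "Used by" column in the lsmod output from the description. A minimal Python sketch of how an unload order could be derived from that output, assuming the usual lsmod columns (name, size, use count, comma-separated users). The function names are made up for illustration. Note that yenta_socket shows a use count of 4 with no module listed, so the sketch only captures the module-to-module part of the picture; the references that disappear once 'pcmcia' is unloaded are not visible in this data.

```python
# lsmod output copied from the bug description.
LSMOD_OUTPUT = """\
pcmcia                 31816  0 
yenta_socket           23632  4 
rsrc_nonstatic         10532  1 yenta_socket
pcmcia_core            31444  3 pcmcia,yenta_socket,rsrc_nonstatic
"""


def parse_lsmod(text):
    """Map module name -> (use count, set of modules named in 'Used by')."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name, use_count = fields[0], int(fields[2])
        users = set(fields[3].split(",")) if len(fields) > 3 else set()
        modules[name] = (use_count, users)
    return modules


def unload_order(modules):
    """Order modules so that each one comes after every module that uses it."""
    order, remaining = [], dict(modules)
    while remaining:
        # A module is ready once none of its named users is still loaded.
        ready = sorted(name for name, (_, users) in remaining.items()
                       if not (users & set(remaining)))
        if not ready:
            raise ValueError("dependency cycle in lsmod data")
        order.extend(ready)
        for name in ready:
            del remaining[name]
    return order
```

For the four modules above this puts 'pcmcia' and 'yenta_socket' first, then 'rsrc_nonstatic', then 'pcmcia_core'.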
Comment 5 Jose Marino 2009-10-06 23:04:23 UTC
(In reply to comment #3)
> One more question: it looks like the cardbus socket is actually empty when
> this
> failure takes place.  Is this correct?

That is correct, the cardbus socket is empty.
Comment 6 Jose Marino 2009-10-07 17:11:44 UTC
(In reply to comment #2)
> How nice.  Regression fix that causes a regression to happen.
> 
> A couple of questions:
> 1. Is it sufficient to unload the yenta_socket module to make things work
> again?
> 2. Does it also fail with 2.6.32-rc3?

About question 2:

It still fails with 2.6.32-rc3.
Comment 7 Rafael J. Wysocki 2009-10-07 19:49:19 UTC
Thanks for testing.

Before we start any advanced debugging, can you please try if the patch from http://patchwork.kernel.org/patch/51834/ helps by chance?
Comment 8 Jose Marino 2009-10-08 16:58:33 UTC
(In reply to comment #7)
> Thanks for testing.
> 
> Before we start any advanced debugging, can you please try if the patch from
> http://patchwork.kernel.org/patch/51834/ helps by chance?

Patch on top of v2.6.31.2 didn't help, problem is still there.
Comment 9 Jose Marino 2009-10-08 17:52:43 UTC
Also, just tested v2.6.31.3 and the problem is still there.
Comment 10 Rafael J. Wysocki 2009-10-08 20:06:46 UTC
Please check if this works:

# echo core > /sys/power/pm_test
# echo mem > /sys/power/state

(this should simulate suspend, wait for 5 seconds and then simulate resume).
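
For repeated testing, the two writes above could be wrapped in a small script. This is only a sketch: the function name and the sysfs_power parameter are made up for illustration (the parameter exists so the function can be tried against a scratch directory), and on a real system it needs root and a kernel that exposes /sys/power/pm_test.

```python
import os


def simulated_suspend(level="core", state="mem", sysfs_power="/sys/power"):
    """Select a pm_test level, then request the sleep state.

    On a real system the second write returns only after the simulated
    suspend/resume cycle has finished.
    """
    with open(os.path.join(sysfs_power, "pm_test"), "w") as f:
        f.write(level)
    with open(os.path.join(sysfs_power, "state"), "w") as f:
        f.write(state)
```

After testing, writing "none" to /sys/power/pm_test switches back to real suspend.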
Comment 11 Rafael J. Wysocki 2009-10-08 20:07:32 UTC
Please use 2.6.32-rc3 (or later) for testing from now on.
Comment 12 Jose Marino 2009-10-08 20:37:50 UTC
Ok, it seems I got something here. I booted 2.6.32-rc3 and ran the two echo commands. After about 5s of simulated suspend, it came back all by itself. I found this backtrace in the logs:

kernel: ------------[ cut here ]------------
kernel: WARNING: at drivers/base/sys.c:353 __sysdev_resume+0xba/0xd0()
kernel: Hardware name: Inspiron 600m                   
kernel: Interrupts enabled after irqrouter_resume+0x0/0x3e
kernel: Modules linked in: radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core xt_recent xt_state reiserfs snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss pcmcia usbhid snd_intel8x0 snd_intel8x0m snd_ac97_codec ac97_bus fan snd_pcm iTCO_wdt b44 sr_mod lp snd_timer yenta_socket parport_pc snd snd_page_alloc rsrc_nonstatic pcmcia_core uhci_hcd button parport cdrom thermal psmouse ssb iTCO_vendor_support ehci_hcd battery intel_agp agpgart dcdbas ac acpi_cpufreq processor rtc_cmos rtc_core rtc_lib
kernel: Pid: 3594, comm: bash Tainted: G        W  2.6.32-rc3-jos-rc #34
kernel: Call Trace:
kernel: [<c10282ed>] warn_slowpath_common+0x6d/0xa0
kernel: [<c11962ca>] ? __sysdev_resume+0xba/0xd0
kernel: [<c11962ca>] ? __sysdev_resume+0xba/0xd0
kernel: [<c1028366>] warn_slowpath_fmt+0x26/0x30
kernel: [<c11962ca>] __sysdev_resume+0xba/0xd0
kernel: [<c1151f66>] ? irqrouter_resume+0x0/0x3e
kernel: [<c1196337>] sysdev_resume+0x57/0xd0
kernel: [<c1127a7a>] ? __const_udelay+0x1a/0x20
kernel: [<c104ee93>] suspend_devices_and_enter+0x173/0x1b0
kernel: [<c12790b2>] ? printk+0x18/0x1e
kernel: [<c104efd8>] enter_state+0x108/0x130
kernel: [<c104e72b>] state_store+0x6b/0xb0
kernel: [<c104e6c0>] ? state_store+0x0/0xb0
kernel: [<c1121b90>] kobj_attr_store+0x20/0x30
kernel: [<c10c4db4>] sysfs_write_file+0x94/0xf0
kernel: [<c10834ea>] vfs_write+0x9a/0x180
kernel: [<c108eb4a>] ? sys_dup3+0xda/0x130
kernel: [<c10c4d20>] ? sysfs_write_file+0x0/0xf0
kernel: [<c108368d>] sys_write+0x3d/0x70
kernel: [<c1002d08>] sysenter_do_call+0x12/0x26
kernel: ---[ end trace 4eaa2a86a8e2da24 ]---
Comment 13 Rafael J. Wysocki 2009-10-08 22:07:56 UTC
This is puzzling to me.

I don't really think irqrouter_resume() can enable interrupts by itself, so it must have been done by something else, apparently executing in parallel with irqrouter_resume(). That should not be possible, because this code runs on one CPU with interrupts disabled (i.e. it should be atomic context).
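
When comparing several logs, the two relevant lines of such a warning can be pulled out mechanically. A rough Python sketch, assuming the log lines look like the ones in comment #12; the function and key names are made up for illustration. The "callback" it reports is only the sysdev callback after which interrupts were found enabled, which, as said above, does not prove that this callback enabled them.

```python
import re

# Patterns modelled on the two lines quoted in comment #12.
WARNING_RE = re.compile(r"WARNING: at (?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+")
CULPRIT_RE = re.compile(r"Interrupts enabled after (?P<callback>\w+)\+")

SAMPLE = [
    "kernel: WARNING: at drivers/base/sys.c:353 __sysdev_resume+0xba/0xd0()",
    "kernel: Interrupts enabled after irqrouter_resume+0x0/0x3e",
]


def summarize_warning(log_lines):
    """Collect where the check fired and which callback ran just before it."""
    summary = {}
    for line in log_lines:
        m = WARNING_RE.search(line)
        if m:
            summary["site"] = "%s:%s" % (m.group("file"), m.group("line"))
            summary["checker"] = m.group("func")
        m = CULPRIT_RE.search(line)
        if m:
            summary["callback"] = m.group("callback")
    return summary
```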
Comment 14 Rafael J. Wysocki 2009-10-08 22:16:42 UTC
Please attach full dmesg output containing the trace from comment #12.
Comment 15 Jose Marino 2009-10-08 22:53:32 UTC
Created attachment 23312 [details]
dmesg after emulated suspend (2.6.32-rc3)
Comment 16 Rafael J. Wysocki 2009-10-08 23:05:18 UTC
Please attach /proc/interrupts from your system.
Comment 17 Rafael J. Wysocki 2009-10-08 23:10:56 UTC
Also please attach /proc/cmdline.
Comment 18 Jose Marino 2009-10-08 23:18:13 UTC
# cat /proc/interrupts
           CPU0       
  0:      34564    XT-PIC-XT        timer
  1:        169    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  3:          2    XT-PIC-XT      
  4:          4    XT-PIC-XT      
  5:         58    XT-PIC-XT        Intel 82801DB-ICH4, Intel 82801DB-ICH4 Modem
  7:          1    XT-PIC-XT        parport0
  8:        157    XT-PIC-XT        rtc0
  9:          2    XT-PIC-XT        acpi
 10:          2    XT-PIC-XT      
 11:        312    XT-PIC-XT        ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, yenta, yenta, radeon@pci:0000:01:00.0, eth0
 12:       8320    XT-PIC-XT        i8042
 14:       5875    XT-PIC-XT        ata_piix
 15:        920    XT-PIC-XT        ata_piix
NMI:          0   Non-maskable interrupts
LOC:          0   Local timer interrupts
SPU:          0   Spurious interrupts
CNT:          0   Performance counter interrupts
PND:          0   Performance pending work
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:          1   Machine check polls
ERR:          0
MIS:          0


# cat /proc/cmdline
root=/dev/sda2 hpet=force vga=0x318
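
For reference, the two things that stand out in this output (every line is routed through XT-PIC, and IRQ 11 is shared by many devices including both yenta sockets) can be extracted with a short script. A sketch only, assuming the single-CPU column layout shown above; the function name is made up for illustration and the sample is a subset of the lines above.

```python
SAMPLE = """\
           CPU0       
  3:          2    XT-PIC-XT      
  5:         58    XT-PIC-XT        Intel 82801DB-ICH4, Intel 82801DB-ICH4 Modem
  9:          2    XT-PIC-XT        acpi
 11:        312    XT-PIC-XT        ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, yenta, yenta, radeon@pci:0000:01:00.0, eth0
NMI:          0   Non-maskable interrupts
"""


def shared_irqs(text):
    """Return {irq: (chip, [handlers])} for numbered IRQs with more than one handler."""
    result = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # header line or a named counter such as NMI
        fields = rest.split(None, 2)  # count, chip, handler list (one CPU column)
        if len(fields) < 3:
            continue  # IRQ line without any registered handler
        handlers = [h.strip() for h in fields[2].split(",")]
        if len(handlers) > 1:
            result[int(head.strip())] = (fields[1], handlers)
    return result
```

On the sample this reports IRQ 5 and IRQ 11 as shared, with "yenta" appearing twice on IRQ 11.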
Comment 19 Rafael J. Wysocki 2009-10-09 20:54:52 UTC
The hardware configuration of your system seems to be unusual.

Either the ACPI BIOS tells the kernel not to use the IO-APIC or your kernel configuration is somewhat strange.

Please attach your .config and a full boot log.
Comment 20 Len Brown 2009-10-09 21:50:43 UTC
Does this machine have an IOAPIC?
I would expect it to, based on the use of the PCI interrupt
link devices...

I don't see the ioapic disabled on the cmdline -- perhaps it is
disabled in the .config?  If yes, what do you see in the
complete dmesg -s64000 and /proc/interrupts if you enable
the IOAPIC?

please attach the .config used, and also the output from acpidump.

Please re-test without using hpet=force
Comment 21 Jose Marino 2009-10-10 22:29:23 UTC
Created attachment 23334 [details]
Full dmesg right after boot
Comment 22 Jose Marino 2009-10-10 22:31:00 UTC
Created attachment 23335 [details]
.config file
Comment 23 Jose Marino 2009-10-10 22:42:00 UTC
I tested without "hpet=force" and the problem is still there.
Comment 24 Jose Marino 2009-10-10 22:44:16 UTC
Created attachment 23336 [details]
Output from acpidump
Comment 25 Jose Marino 2009-10-10 23:07:06 UTC
About the IOAPIC: as far as I know the laptop has one and it is enabled. The full boot log has an interesting backtrace that seems to be related to it.
Comment 26 Rafael J. Wysocki 2009-10-11 12:17:10 UTC
Yes, it does.

However, the IO-APIC is disabled by the BIOS on your system and the problem with suspend-resume seems to be related to the fact that yenta uses an ISA interrupt.

Please boot with lapic in the kernel command line and attach the boot log.
Comment 27 Rafael J. Wysocki 2009-10-11 13:25:01 UTC
(In reply to comment #26)
> problem with suspend-resume seems to be related to the fact that yenta uses
> an ISA interrupt.

Scratch that, one of the yenta messages confused me.
Comment 28 Jose Marino 2009-10-11 13:25:18 UTC
Created attachment 23337 [details]
Full dmesg with lapic + simulated suspend

Adding lapic to the kernel parameters fixed the early backtrace about the APIC, however it did not fix the suspend issue.
This dmesg includes a simulated suspend that still shows the problem.
Man what a shitty BIOS does this system have, I have to force the hpet and the APIC. Thanks for the heads up about the lapic option.
Comment 29 Rafael J. Wysocki 2009-10-11 13:27:59 UTC
Please also try to suspend with all of the USB drivers and the Ethernet driver unloaded.
Comment 30 Jose Marino 2009-10-11 13:52:47 UTC
Created attachment 23338 [details]
Logs for simulated suspend after unloading ethernet and usb modules

I unloaded these modules:
# rmmod b44 ssb uhci_hcd ehci_hcd

Problem is still there. I don't see any difference in the logs but just in case here they are (from module unload to simulated resume).
Comment 31 Rafael J. Wysocki 2009-10-11 14:20:56 UTC
Created attachment 23339 [details]
yenta: Split resume

Please check if this debug patch fixes the resume issue.
Comment 32 Jose Marino 2009-10-11 14:37:26 UTC
Created attachment 23340 [details]
Simulated suspend log

I applied the patch. It gave me this warning at compilation:
  CC [M]  drivers/pcmcia/yenta_socket.o
drivers/pcmcia/yenta_socket.c: In function ‘yenta_dev_resume_noirq’:
drivers/pcmcia/yenta_socket.c:1277: warning: control reaches end of non-void function

Rebooted and tested it with no results (I think). I attach the simulated suspend log.
Comment 33 Rafael J. Wysocki 2009-10-11 17:27:15 UTC
Created attachment 23341 [details]
yenta: Split resume (fixed)

Sorry for the build issue, I'm attaching a fixed patch for the record.

Does "no results" mean that resume doesn't work with this patch applied?
Comment 34 Rafael J. Wysocki 2009-10-11 17:28:00 UTC
The real resume, that is.
Comment 35 Jose Marino 2009-10-11 18:07:09 UTC
Created attachment 23342 [details]
Real suspend log (patch from #33 applied)

I tested the real suspend/resume (no simulation) with the latest patch (no compilation warnings) and this time the laptop was able to suspend and resume.
I attach the logs from the start of the suspend.
Comment 36 Rafael J. Wysocki 2009-10-11 19:42:40 UTC
Great, thanks for testing.  Now, I have to figure out what makes the difference.
Comment 37 Stephen J. Gowdy 2009-10-13 07:58:56 UTC
I think I have the same problem (resume okay in 2.6.31.1 but hangs in 2.6.31.2) with a HP Compaq 6910p laptop. I'm rebuilding 2.6.31.3 with the patch now to see if it is fixed.

Is it useful to attach dmesg or anything if the patch does work?
Comment 38 Stephen J. Gowdy 2009-10-13 12:57:53 UTC
Yes, the patch also fixed it for me.
Comment 39 Rafael J. Wysocki 2009-10-25 08:33:55 UTC
Created attachment 23522 [details]
PCMCIA PM: Always reinsert cards on resume

Please test if this patch still makes things work for you (on top of 2.6.32-rc5).
Comment 40 Rafael J. Wysocki 2009-10-25 08:37:12 UTC
Created attachment 23523 [details]
PCMCIA PM: Always reinsert cards on resume (fixed)

Sorry, the previously attached patch was broken, please test this one instead (on top of 2.6.32-rc5).
Comment 41 Jose Marino 2009-10-25 14:27:13 UTC
Created attachment 23524 [details]
Suspend/resume log (patch from #40 applied)

Applied patch from Comment #40 and I get the same result as with patch from Comment #33.
That is: the laptop is able to suspend and resume fine (the real suspend/resume) but the logs show a backtrace.

Just in case, I'm posting the logs from suspend to resume, although they look identical to the logs I posted in Comment #35.
Comment 42 Rafael J. Wysocki 2009-10-25 21:59:17 UTC
Thanks for testing, the backtrace does not seem to be related to the PCMCIA resume problem you're observing.

The patch from comment #40 is a merge candidate, but I need a reporter of bug #13092 to test it to confirm that it doesn't break things again for him.
Comment 43 Stephen J. Gowdy 2009-10-26 21:40:40 UTC
The patch in comment #40 also works for me. I don't have any traceback in my logs.
Comment 44 Jose Marino 2009-10-27 04:03:13 UTC
(In reply to comment #42)
> Thanks for testing, the backtrace does not seem to be related to the PCMCIA
> resume problem you're observing.
> 
> The patch from comment #40 is a merge candidate, but I need a reporter of bug
> #13092 to test it to confirm that it doesn't break things again for him.

You are right. The backtrace is not related to the PCMCIA problem. It seems to be a different problem introduced between 2.6.31 and 2.6.32-rc1. I'll try to get more info about it and open a new bug.
Comment 45 Rafael J. Wysocki 2009-10-27 07:38:38 UTC
Unfortunately, as it turns out, the patch from comment #40 is not really a good solution, since it may lead to some other problems.  For this reason, we'll need to find the root cause of the resume problem you're experiencing.

Generally speaking, there seems to be an ordering violation of some sort and at the moment I think there are two possibilities: (1) the socket resume has to be carried out after resume_device_irqs() has run (although I don't really see why) or (2) one of the other devices has to be resumed before the socket.

I'm going to prepare a few debug patches to figure this out.
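
To make possibility (1) concrete, here is a toy Python model of the ordering. It is a deliberate simplification built only from what is said in this bug entry (an early resume phase, resume_device_irqs(), and a later regular resume phase); the phase labels and the function name are made up, and it does not model possibility (2), which is about ordering between devices within a phase.

```python
# Simplified resume sequence, as assumed for this illustration.
RESUME_SEQUENCE = [
    "early (noirq) device resume",
    "resume_device_irqs()",
    "regular (late) device resume",
]


def broken_constraints(placement, must_follow):
    """List the (action, phase) requirements that a placement violates.

    placement:   action -> phase in which the action runs
    must_follow: (action, phase) pairs meaning the action must run after phase
    """
    order = {phase: index for index, phase in enumerate(RESUME_SEQUENCE)}
    return [(action, phase) for action, phase in must_follow
            if order[placement[action]] <= order[phase]]
```

With the socket power-up placed in the early phase and a requirement that it follows resume_device_irqs(), the function reports a violation; placing it in the regular phase reports none.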
Comment 46 Rafael J. Wysocki 2009-10-27 07:40:00 UTC
(In reply to comment #44)
> (In reply to comment #42)
> > Thanks for testing, the backtrace does not seem to be related to the PCMCIA
> > resume problem you're observing.
> > 
> > The patch from comment #40 is a merge candidate, but I need a reporter of
> bug
> > #13092 to test it to confirm that it doesn't break things again for him.
> 
> You are right. The backtrace is not related to the PCMCIA problem. It seems
> to
> be a different problem introduced between 2.6.31 and 2.6.32-rc1. I'll try to
> get more info about it and open a new bug.

There already is one, bug #14483 .
Comment 47 Rafael J. Wysocki 2009-10-27 08:11:24 UTC
Created attachment 23542 [details]
yenta: Don't enable interrupts on resume

Please remove the patch from comment #40 and try this one on top of 2.6.32-rc5.
Comment 48 Stephen J. Gowdy 2009-10-27 14:16:36 UTC
I tried the patch from comment #47 but it didn't resolve this issue. My laptop would not resume. The power light comes on but the wireless light doesn't and nothing else happens.
Comment 49 Jose Marino 2009-10-27 14:23:09 UTC
I applied the patch from comment #47 on top of 2.6.32-rc5 and the laptop did not resume.
Comment 50 Stephen J. Gowdy 2009-10-27 14:24:25 UTC
BTW, the devices on the PCMCIA bus are:

02:06.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b9)
02:06.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b9)
02:06.2 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 03)
02:06.3 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 20)
02:06.4 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev ff)

[root@antonia ~]# pccardctl status
Socket 0:
  no card
Socket 1:
  3.3V 16-bit PC Card
  Subdevice 0 (function 0) bound to driver "ide-cs"
[root@antonia ~]# pccardctl ident
Socket 0:
  no product info available
Socket 1:
  product info: "RICOH", "Bay8Controller", "", ""
  manfid: 0x0000, 0x0000
  function: 254 ((null))
Comment 51 Rafael J. Wysocki 2009-10-27 20:04:26 UTC
Have you tried with all of the PCMCIA cards removed?
Comment 52 Stephen J. Gowdy 2009-10-27 20:19:56 UTC
I don't have any cards inserted, it is built into the laptop.
Comment 53 Rafael J. Wysocki 2009-10-27 20:58:18 UTC
Created attachment 23552 [details]
yenta: Resume debug patch

Another debug patch.  It should make yenta_set_socket() return an error code on resume right before calling yenta_set_power().  This will result in non-functional PCMCIA after the resume, but let's see if that changes anything.
Comment 54 Jose Marino 2009-10-27 23:38:48 UTC
This patch made the suspend/resume work for me.
Only difference I see in the logs is this line:

pcmcia_socket pcmcia_socket1: cs: unable to apply power.
Comment 55 Rafael J. Wysocki 2009-10-28 17:36:31 UTC
OK, so clearly powering up the socket causes resume to fail. Perhaps some power resources needed to power up the socket are not available at this point.

Please build the kernel (with the patch from comment #53 applied) with CONFIG_PM_VERBOSE set, try to suspend and resume and attach the output of dmesg generated after that.
Comment 56 Rafael J. Wysocki 2009-10-28 18:11:22 UTC
Created attachment 23562 [details]
yenta: Resume debug patch 2

Please also check what happens if you replace the patch from comment #53 with this one.
Comment 57 Jose Marino 2009-10-28 18:59:10 UTC
Created attachment 23563 [details]
verbose suspend/resume log (patch from #53 applied)
Comment 58 Jose Marino 2009-10-28 19:23:22 UTC
(In reply to comment #56)
> Created an attachment (id=23562) [details]
> yenta: Resume debug patch 2
> 
> Please also check what happens if you replace the patch from comment #53 with
> this one.

This patch made the laptop fail to resume again.
Comment 59 Rafael J. Wysocki 2009-10-28 19:28:27 UTC
OK, thanks for testing.

So I guess one of the ACPI devices between LNXSYSTM and dock.0, inclusive, should be resumed before yenta on your system.

I'm going to prepare a debug patch to verify this.
Comment 60 Rafael J. Wysocki 2009-10-28 21:00:17 UTC
Created attachment 23564 [details]
ACPI: Resume all devices early

Please apply this patch instead of the previous debug patches and test if the resume problem remains with it.
Comment 61 Jose Marino 2009-10-28 21:30:37 UTC
(In reply to comment #60)
> Created an attachment (id=23564) [details]
> ACPI: Resume all devices early
> 
> Please apply this patch instead of the previous debug patches and test if the
> resume problem remains with it.

No luck, the resume problem remains with this patch. Laptop did not come back from suspend.
Comment 62 Rafael J. Wysocki 2009-10-28 21:48:20 UTC
Hmm.  Please apply both the patch from comment #60 and the patch from comment #53, build the kernel with CONFIG_PM_VERBOSE set, try to suspend and resume and, if successful, please attach the dmesg output generated afterwards.
Comment 63 Jose Marino 2009-10-28 22:05:43 UTC
Created attachment 23568 [details]
suspend/resume log (patches from #53 and #60 applied)

Option CONFIG_PM_VERBOSE is set.
Comment 64 Rafael J. Wysocki 2009-10-28 22:27:33 UTC
Well, at the moment I'm out of ideas.  It would be helpful to check where exactly it crashes during failing resume, but for now I'm not sure how to do that.
Comment 65 Rafael J. Wysocki 2009-10-29 20:52:38 UTC
Created attachment 23591 [details]
yenta: Disable CSC interrupts on resume

Please try this patch (without any of the previous patches).  It's similar to the patch from comment #47, but goes a bit further than that one.

I think that the problem is related to interrupts, but I'm still not sure how exactly.
Comment 66 Jose Marino 2009-10-29 22:55:07 UTC
(In reply to comment #65)
> Created an attachment (id=23591) [details]
> yenta: Disable CSC interrupts on resume
> 
> Please try this patch (without any of the previous patches).  It's similar to
> the patch from comment #47, but goes a bit further than that one.
> 
> I think that the problem is related to interrupts, but I'm still not sure how
> exactly.

Sadly the patch didn't help. Laptop did not come back from suspend.
Comment 67 Jose Marino 2009-10-29 23:17:41 UTC
Just for fun I enabled "Suspend/resume event tracing" (CONFIG_PM_TRACE_RTC) and suspended with patch from comment #65.
After the resume failed and I rebooted the machine I did:

# dmesg -s 1000000 | grep 'hash matches'
bdi 1:3: hash matches

Does this help in any way?
Comment 68 Rafael J. Wysocki 2009-10-30 09:35:03 UTC
(In reply to comment #67)
> Just for fun I enabled "Suspend/resume event tracing" (CONFIG_PM_TRACE_RTC)
> and
> suspended with patch from comment #65.
> After the resume failed and I rebooted the machine I did:
> 
> # dmesg -s 1000000 | grep 'hash matches'
> bdi 1:3: hash matches
> 
> Does this help in any way?

Well, it might if I knew what 'bdi' was.

Any chance to add some more context from this dmesg?
Comment 69 Jose Marino 2009-10-30 13:57:45 UTC
It turns out I was using the TRACE_RTC option wrong. I didn't know it had to be enabled just before suspend with:
# echo 1 > /sys/power/pm_trace

I guess that bdi message from before is garbage. I tried again, this time enabling pm_trace and got this in dmesg:

  Magic number: 0:971:186
  hash matches drivers/base/power/main.c:342
pci 0000:02:01.1: hash matches

And from lspci:

# lspci | grep "02:01.1"
02:01.1 CardBus bridge: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller (rev 20)
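
The lookup I did by hand can be scripted for the next test run. A small Python sketch, assuming dmesg lines without timestamp prefixes (as pasted above) and the usual "slot description" lspci layout; the function names are made up for illustration.

```python
import re

# Matches device lines such as "pci 0000:02:01.1: hash matches"; the
# "hash matches drivers/base/power/main.c:342" file line does not match.
HASH_MATCH_RE = re.compile(r"^\s*(?P<bus>\S+) (?P<device>\S+): hash matches\s*$")


def devices_with_hash_match(dmesg_text):
    """Return (bus, device) pairs that pm_trace flagged after the reboot."""
    matches = []
    for line in dmesg_text.splitlines():
        m = HASH_MATCH_RE.match(line)
        if m:
            matches.append((m.group("bus"), m.group("device")))
    return matches


def lspci_description(lspci_text, pci_device):
    """Look up a device like '0000:02:01.1' in plain lspci output."""
    slot = pci_device.split(":", 1)[1]  # drop the leading PCI domain
    for line in lspci_text.splitlines():
        if line.startswith(slot + " "):
            return line[len(slot) + 1:]
    return None
```

As the earlier "bdi 1:3" result shows, a match is only meaningful if pm_trace was enabled right before the failing suspend.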
Comment 70 Rafael J. Wysocki 2009-10-30 15:28:28 UTC
Created attachment 23598 [details]
PM: Do not suspend and resume device irqs

OK, that appears to confirm what we have sort of known already.

I still would like to confirm that the problem is related to interrupts, though.  Please try this patch.  It's not very likely to work, but if it works, it'll give us the confirmation.
Comment 71 Jose Marino 2009-10-30 17:39:59 UTC
(In reply to comment #70)
> Created an attachment (id=23598) [details]
> PM: Do not suspend and resume device irqs
> 
> OK, that appears to confirm what we have sort of known already.
> 
> I still would like to confirm that the problem is related to interrupts,
> though.  Please try this patch.  It's not very likely to work, but if it
> works,
> it'll give us the confirmation.

The patch did not work. Laptop didn't come back from suspend.
Comment 72 Rafael J. Wysocki 2009-10-30 18:03:49 UTC
Please attach full boot log from the kernel with CONFIG_PM_VERBOSE set.
Comment 73 Jose Marino 2009-10-30 18:43:12 UTC
Created attachment 23600 [details]
verbose full boot log (patch from comment #53 applied)

This dmesg log has patch from comment #53 applied. It also includes a suspend/resume cycle. The suspend starts on line 957 which contains:
b44: eth0: powering down PHY
Comment 74 Rafael J. Wysocki 2009-10-30 19:10:00 UTC
Thanks.

I've just noticed that both your and Stephen's machines each have two CardBus bridges, so I guess any machines with two CardBus bridges will be affected by this issue.
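
Anyone who wants to check whether their machine falls into this category can count the CardBus bridges in plain lspci output. A short Python sketch, assuming the "slot class: description" layout used in comment #50 (the sample is taken from there); the function name is made up for illustration.

```python
# Subset of the lspci output from comment #50.
SAMPLE = """\
02:06.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b9)
02:06.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b9)
02:06.2 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 03)
"""


def cardbus_bridges(lspci_text):
    """Return the slots of all devices whose class is 'CardBus bridge'."""
    slots = []
    for line in lspci_text.splitlines():
        slot, _, rest = line.partition(" ")
        if rest.startswith("CardBus bridge:"):
            slots.append(slot)
    return slots
```

Two or more returned slots would match the configuration described above; whether that alone is enough to trigger the failure is still a guess at this point.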
Comment 75 Rafael J. Wysocki 2009-10-31 09:35:51 UTC
Created attachment 23606 [details]
yenta: Resume the first socket only

Although we'll probably end up redoing the yenta suspend/resume, I think it's worth debugging this issue a bit more.

Please check if your box resumes with this patch applied (without any other patches previously attached to this bug entry).  In case it doesn't, please apply the patch from comment #53 in addition to it, try to suspend-resume and attach the output of dmesg.
Comment 76 Jose Marino 2009-10-31 13:24:13 UTC
Created attachment 23608 [details]
verbose suspend/resume log (patch from comment #75 applied)

I applied patch from comment #75 and the computer was able to suspend/resume fine. This is patch from comment #75 alone on top of v2.6.32-rc5.
Comment 77 Stephen J. Gowdy 2009-10-31 16:41:39 UTC
Created attachment 23609 [details]
dmesg output with patch from comment #75

Same here... with patch from comment #75 suspend/resume works. I've attached dmesg with the PM_DEBUG on also.
Comment 78 Rafael J. Wysocki 2009-10-31 18:39:36 UTC
Created attachment 23611 [details]
yenta: Split resume into early and late parts

OK, thanks for testing.

Can you try this patch in turn, please?
Comment 79 Jose Marino 2009-10-31 20:40:18 UTC
Created attachment 23612 [details]
verbose suspend/resume log (patch from comment #78 applied)

I applied the patch from comment #78 and it also makes the resume work. To clarify, that's patch from comment #78 by itself on top of v2.6.32-rc5.
Comment 80 Rafael J. Wysocki 2009-10-31 21:08:50 UTC
Great, thanks for testing.  I'm going to post this patch as a proposed fix and let's see what people say.
Comment 81 Stephen J. Gowdy 2009-10-31 21:17:40 UTC
Created attachment 23613 [details]
dmesg output with patch from comment #78

Patch in comment #78 also worked for me. I've attached the dmesg output. The difference (I guess expected) is that it also resumes ide2.

Thanks!
Comment 82 Rafael J. Wysocki 2009-10-31 21:32:51 UTC
(In reply to comment #81)
> Created an attachment (id=23613) [details]
> dmesg output with patch from comment #78
> 
> Patch in comment #78 also worked for me.

Great!

> I've attached the dmesg output. The difference (I guess expected) is that it
> also resumes ide2.

Yes, that is expected, both sockets should be fully functional after resume with this patch applied.
Comment 83 Rafael J. Wysocki 2009-10-31 21:36:06 UTC
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Patch : http://patchwork.kernel.org/patch/56791/
Comment 84 Rafael J. Wysocki 2009-10-31 23:02:02 UTC
Ignore-Patch : http://patchwork.kernel.org/patch/56791/
Comment 85 Rafael J. Wysocki 2009-10-31 23:12:36 UTC
Created attachment 23615 [details]
PM / yenta : Split resume into early and late parts

Unfortunately I made a mistake testing the previous patch, as I didn't notice that my distro was automatically ejecting all PCMCIA cards before suspend.  Sorry for wasting your time.

Please test this patch (on top of 2.6.32-rc5 without any other patches) and report back.
Comment 86 Jose Marino 2009-11-01 00:07:02 UTC
(In reply to comment #85)
> Created an attachment (id=23615) [details]
> PM / yenta : Split resume into early and late parts
> 
> Unfortunately I made a mistake testing the previous patch, as I didn't notice
> that my distro was automatically ejecting all PCMCIA cards before suspend. 
> Sorry for wasting your time.
> 
> Please test this patch (on top of 2.6.32-rc5 without any other patches) and
> report back.

This patch works for me. Laptop suspends and resumes fine.
Comment 87 Rafael J. Wysocki 2009-11-01 08:27:38 UTC
Great, thanks for the confirmation.
Comment 88 Rafael J. Wysocki 2009-11-01 08:37:00 UTC
Patch : http://patchwork.kernel.org/patch/56828/
Comment 89 Stephen J. Gowdy 2009-11-01 08:41:04 UTC
Created attachment 23616 [details]
dmesg output with patch from comment #85

Same again, patch in comment #85 is good for me too... dmesg attached.
Comment 90 Jithin Emmanuel 2009-11-02 01:22:50 UTC
Hi, one question. Will there be any 2.6.31-6 with the patch, or will it be available only with 2.6.32 when it's released?
Comment 91 Rafael J. Wysocki 2009-11-02 12:41:44 UTC
(In reply to comment #90)
> Hi one doubt. Will there be any 2.6.31-6 with the patch

Yes.
Comment 92 Rafael J. Wysocki 2009-11-02 19:22:51 UTC
Created attachment 23623 [details]
PM / yenta: Split resume into early and late parts (rev. 4)

Jose, Stephen, attached is the final version of the patch from comment #85.  It's not much different from the previous one (cosmetic changes only), but I'd appreciate it very much if you could double check that it works (on top of 2.6.32-rc5).
Comment 93 Jose Marino 2009-11-02 23:58:00 UTC
This latest patch works fine for me.
Comment 94 Rafael J. Wysocki 2009-11-03 09:18:57 UTC
Great, thanks for the confirmation.
Comment 96 Yuhong Bao 2009-11-04 06:07:59 UTC
"However, the IO-APIC is disabled by the BIOS on your system"
I am not surprised. Even after Intel integrated the IO-APIC into the ICH back in 1999, most uniprocessor motherboards did not enable it, thinking that it is only for SMP systems. Around 2000-2001, MS began pushing mobo and system makers to enable the IO-APIC on uniprocessors too, even making it a PC2001 and a Windows logo requirement. But MS made the requirement apply to desktop systems only, so it is not surprising that even as late as the Pentium M era, not all laptops enabled the IO-APIC. My Celeron M laptop (ASUS A3N) does, but not all do.
Comment 97 Jose Marino 2009-11-04 13:25:40 UTC
For the record I stopped using lapic in 2.6.31.1, which I currently use as my stable kernel. After comment #26 I added lapic to the kernel boot parameters but that caused the laptop to freeze randomly on suspend.
Comment 98 Yuhong Bao 2009-11-04 15:42:40 UTC
BTW, don't confuse the lapic with the IO-APIC. It is true that usually when the IO-APIC is disabled by the BIOS the lapic is also disabled too, which is why Linux can reenable it, but it is not the same thing as enabling the IO-APIC, which has to be done by the BIOS.
Comment 99 Yuhong Bao 2009-11-11 04:54:37 UTC
"It is true that usually when the IO-APIC is disabled by the BIOS the lapic is also disabled too"
The reason is that Windows's non-IO-APIC (8259) HALs did not even try to use the local APIC. From http://www.microsoft.com/whdc/archive/io-apic.mspx (a page written when MS was beginning to encourage mobo makers to enable the IO-APIC on uniprocessors):
"However, without an I/O APIC in the system, the local APICs are useless [to Windows]. In such a situation, Windows 2000 has to revert to using the 8259 PIC."