Bug 34682

Summary: acer-wmi: 2.6.38.3 commit 8215af0 breaks resume from suspend to RAM on Acer Travelmate 8172
Product: Drivers
Reporter: Leho Kraav (leho)
Component: Platform_x86
Assignee: drivers_platform_x86 (drivers_platform_x86)
Status: RESOLVED INSUFFICIENT_DATA
Severity: normal
CC: alan, brudley, jlee, rvossen
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.38.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  dmesg.log
  acpidump.dat
  lspci.log
  lspci-n.log
  0001-acer-wmi-support-to-set-communication-device-state.patch
  acpidump-bios-1.15.dat

Description Leho Kraav 2011-05-07 21:23:15 UTC
Created attachment 56962 [details]
dmesg.log

culprit is acer-wmi commit 88210a07190401c7f5293a8c9eb26c693536bd1b ("acer-wmi: does not set persistence state by rfkill_init_sw_state")

Resume from suspend to RAM stopped working after initially upgrading from 2.6.38.2 to 2.6.38.4. The symptom is that the laptop hangs hard after displaying the framebuffer console on resume; it does not make it back to X and nothing gets written to the logs.

Reverting the commit restores ability to resume.

This commit is useful, because I was pleasantly surprised to see the laptop maintain WIFI state across reboots with it. Breaking resume is quite an unfortunate side-effect, hoping this gets fixed up.

Attachment has a single suspend, resume cycle in dmesg.
Comment 1 Leho Kraav 2011-05-16 15:36:24 UTC
shortening commit hash. I'm not sure this is getting any attention from the right people or automatics (regression reports at http://marc.info/?l=linux-acpi&r=1&b=201104&w=4), is this an ACPI thing?
Comment 2 Leho Kraav 2011-05-16 17:15:48 UTC
looks like my commit hash from another kernel branch was wrong as well, corrected in title.
Comment 3 Leho Kraav 2011-05-16 17:16:19 UTC
and found the origins for this issue: bug 31002
Comment 5 Lee, Chun-Yi 2011-05-17 02:45:31 UTC
In acer-wmi's suspend/resume callback functions, there are no statements related to the killswitch state.

I am tracing the behavior in rfkill framework for suspend to RAM.
Comment 6 Lee, Chun-Yi 2011-05-17 11:01:42 UTC
I can't reproduce this issue on my Aspire one ZG8 with the 2.6.39-rc7 kernel.

I am tracing the behavior in rfkill framework for suspend to RAM.
Comment 7 Leho Kraav 2011-05-17 11:06:22 UTC
are you saying possible improvements to rfkill in 2.6.39 might be fixing it for this particular acer-wmi patch?
Comment 8 Leho Kraav 2011-05-17 11:11:26 UTC
although looking at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=net/rfkill;hb=693d92a1bbc9e42681c42ed190bd42b636ca876f there is pretty much nothing there.
Comment 9 Lee, Chun-Yi 2011-05-17 12:45:40 UTC
I just checked the rfkill resume callback in rfkill/core.c: it runs set_block after S3 resume when the killswitch state is not persistent.
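
The resume behavior described above can be illustrated with a small Python model. This is only a sketch of the logic, not the kernel code: the names `Rfkill`, `persistent`, `sw_blocked` and `rfkill_resume` are hypothetical stand-ins, and it assumes that the resume callback re-applies the cached soft-block state through the driver's set_block only when the switch is not persistent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rfkill:
    """Toy model of one rfkill switch (illustrative names, not kernel API)."""
    name: str
    persistent: bool                   # True if the platform keeps the state itself
    sw_blocked: bool                   # soft-block state cached by the rfkill core
    set_block: Callable[[bool], None]  # driver callback, e.g. the one in acer-wmi


def rfkill_resume(rfkill: Rfkill) -> bool:
    """Model of the resume path; returns True if the driver callback ran."""
    if rfkill.persistent:
        # Persistent switch: the cached state is not pushed back on resume.
        return False
    # Non-persistent switch: re-apply the cached soft-block state.
    rfkill.set_block(rfkill.sw_blocked)
    return True


# Record every set_block call instead of touching real hardware.
calls: List[bool] = []
wlan = Rfkill("acer-wireless", persistent=False, sw_blocked=False,
              set_block=calls.append)
rfkill_resume(wlan)
print(calls)
```

In this model a non-persistent switch always triggers one driver callback on resume, which is the path that ends in a wmi method call for acer-wmi.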

Leho, 
Could you please apply and test the following 2 patches that were generated to fix bko#32862?

0001-acer-wmi-does-not-allow-negative-number-set-to-init.patch
https://bugzilla.kernel.org/attachment.cgi?id=54492

0001-acer-wmi-check-the-existence-of-internal-3G-device.patch
https://bugzilla.kernel.org/attachment.cgi?id=56422

Maybe this issue can also be fixed by the above 2 patches.

And, 
Could you please attach the acpidump result from your Acer Travelmate 8172? You just need:
 acpidump > acpidump.dat

Then attach acpidump.dat to this bug.


Thanks
Comment 10 Lee, Chun-Yi 2011-05-17 12:50:29 UTC
You can also download the above 2 patches from bko#32862:
https://bugzilla.kernel.org/show_bug.cgi?id=32862
Comment 11 Leho Kraav 2011-05-17 16:03:43 UTC
patches do not help, machine hangs exactly the same on resume. i *think* a single difference i noticed was that i have now disabled openrc hotplugging, and that allowed the machine resume without hanging while i only had xdm running and wlan not enabled.

after i enabled wlan and did the sleep, resume cycle - same hanging behavior.
Comment 12 Leho Kraav 2011-05-17 17:00:56 UTC
is there a way to do some very low level tracing to find out what call hangs the machine on resume?
Comment 13 Lee, Chun-Yi 2011-05-17 23:40:28 UTC
+ Please attach the acpidump result to this bug:
       acpidump > acpidump.dat
  and
  please attach the lspci and lspci -n results to this bug:
       lspci > lspci.log and lspci -n > lspci-n.log

I believe this issue is not related to acer-wmi alone, because acer-wmi calls a wmi method provided by the BIOS to disable the wifi device. It may also be related to the EC or the wifi driver.

We need to trace the DSDT to find out how the wmi method disables the wifi device.

+ If you want to trace this issue at a low level, maybe you can try TRACE_RESUME:
 linux-2.6/Documentation/power/s2ram.txt

Honestly, I have tried this trace resume approach before, but I didn't get what I wanted.

+ Please do the following tests:
  a. Please revert my 8215af0 patch on 2.6.38.3, then do suspend/resume, then change the acer-wireless and acer-bluetooth killswitches manually. Maybe the machine will crash when the killswitch state is changed manually after S3 resume.
  b. Please apply my 8215af0 patch, but remove your wifi driver, then do the same S3 resume test. Check whether it gets stuck in the wifi driver.
Comment 14 Lee, Chun-Yi 2011-05-17 23:53:21 UTC
I just looked at your dmesg and have some findings:

+ Your machine supports the new acer WMID_GUID3 method, and your machine has wifi and bluetooth, but no 3G device:
[   37.752934] acer-wmi: Acer Laptop ACPI-WMI Extras
[   37.753108] acer-wmi: Function bitmap for Communication Button: 0x801
[   37.753115] acer-wmi: Brightness must be controlled by generic video driver
[   37.754727] input: Acer WMI hotkeys as /devices/virtual/input/input9

+ Did find any bad thing for ACPI when system boot.

+ Your wifi module is Broadcom BCM43xx

[   38.383208] wlc_bmac_attach:: deviceid 0x4357 nbands 1 board 0xe021 macaddr: 78:e4:00:35:c2:c8
[   38.390752] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   38.391069] wl_set_hint: Sending country code US to MAC80211
[   38.391090] wl0: Broadcom BCM43xx 802.11 MAC80211 Driver (1.82.8.0) (Compiled at 23:47:05 on May  7 2011)

+ S3 resume on your machine already finished at: 

[   91.293205] wl_ops_set_rts_threshold: Enter
[   91.297936] PM: resume of devices complete after 1205.206 msecs
[   91.298367] Restarting tasks ... done.
[   91.299391] video LNXVIDEO:00: Restoring backlight state
[   91.399416] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
...

I did not see anything bad in dmesg for S3 resume. But your wlan0 takes a long time to check the link:

[   92.294689] tg3 0000:03:00.0: eth0: Link is down
[  767.710607] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  769.074723] wlan0: authenticate with 00:16:ce:4d:7c:b5 (try 1)
[  769.077113] wlan0: authenticated
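
To put a number on the delay visible in the excerpt above, the timestamps can be compared with a short script. This is a minimal sketch; the helper names `parse_dmesg` and `largest_gap` are made up for illustration, and a large gap by itself does not show which component was responsible (the machine may simply have been idle).

```python
import re
from typing import List, Optional, Tuple

# dmesg lines look like "[  767.710607] message text"
_TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_dmesg(lines: List[str]) -> List[Tuple[float, str]]:
    """Return (timestamp, message) pairs, skipping lines without a timestamp."""
    entries = []
    for line in lines:
        match = _TS.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def largest_gap(entries: List[Tuple[float, str]]) -> Optional[Tuple[float, str, str]]:
    """Return (gap_seconds, message_before, message_after) for the biggest gap."""
    best = None
    for (t0, m0), (t1, m1) in zip(entries, entries[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, m0, m1)
    return best


sample = [
    "[   92.294689] tg3 0000:03:00.0: eth0: Link is down",
    "[  767.710607] ADDRCONF(NETDEV_UP): wlan0: link is not ready",
    "[  769.074723] wlan0: authenticate with 00:16:ce:4d:7c:b5 (try 1)",
    "[  769.077113] wlan0: authenticated",
]
gap, before, after = largest_gap(parse_dmesg(sample))
print(f"{gap:.1f} s between '{before}' and '{after}'")
```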

Please remove the wifi driver, then do the S3 resume test again.

And, 
Please attach the acpidump and lspci results as described in Comment #13

Thanks
Comment 15 Lee, Chun-Yi 2011-05-17 23:54:17 UTC
Sorry for my typo in Comment #14:
+ Did NOT find any bad thing for ACPI when system boot.
Comment 16 Lee, Chun-Yi 2011-05-18 03:08:53 UTC
Found DSDT from bko#35232, I thought it's the same machine, tracing...
Comment 17 Leho Kraav 2011-05-18 22:12:49 UTC
Created attachment 58342 [details]
acpidump.dat

attaching lspci logs and acpidump here as well, just for completeness sake.

will probably be able to test TRACE_RESUME, and wifi driver exclusion tomorrow.
Comment 18 Leho Kraav 2011-05-18 22:14:11 UTC
Created attachment 58352 [details]
lspci.log
Comment 19 Leho Kraav 2011-05-18 22:14:30 UTC
Created attachment 58362 [details]
lspci-n.log
Comment 20 Lee, Chun-Yi 2011-05-19 06:37:23 UTC
Created attachment 58422 [details]
0001-acer-wmi-support-to-set-communication-device-state.patch

After tracing WMAA (WMID_GUID3) and WMBA (WMID_GUID1) in your DSDT, I found a small difference between the new GUID3 method and the old GUID1 method.

I am not sure it's the root cause of this issue, but it would be better to use the new GUID3 method to set the device state.

If you want, please try this patch; it works fine for me on my Acer TravelMate 8572. It sets the device state through the new wmi method provided by the Acer BIOS.
Comment 21 Leho Kraav 2011-05-19 06:48:05 UTC
ok so this is meant to be applied on top of 8215af0 and *NOT* the patches from bug 32862, right?
Comment 22 Lee, Chun-Yi 2011-05-19 07:21:55 UTC
Actually, 
I would suggest that you:
 - use the newest acer-wmi driver from the 2.6.39 kernel 
 - apply the 2 patches from bug 32862
 - apply the patch from Comment #20

But, of course, if you don't want to backport the acer-wmi driver from 2.6.39 to 2.6.38, you can still try the patch from Comment #20 directly.

And, again, 
I am not sure this patch can fix the issue, because so far we have not captured any kernel OOPS or confirmed whether the crash is in the EC, the wifi driver, or any other driver.
Comment 23 Leho Kraav 2011-05-24 20:13:19 UTC
i have not been able to do testing yet due to time constraints. but i should note here a new sighting.

i have run with blacklisted acer-wmi for some days now, this leaves me with just 2 rfkill items after cold boot. wifi always seems to be initially turned on. 

tonight i wanted to disable wifi for a bit and switch to gigabit cable instead. found that it takes a single rfkill button toggle without acer-wmi to completely hang the machine. very very bad.
Comment 24 Lee, Chun-Yi 2011-05-24 23:08:37 UTC
Leho, 

Could you please do a simple test?
just need:
 - shutdown your notebook, remove ac-power and battery
 - plugin ac-power then boot your notebook
 - Follow your comment#23, try to crash your machine by press rfkill button.

Please tell me you still can crash it or not.
Then, please remove your wireless driver then try to reproduce again.
Comment 25 Leho Kraav 2011-06-16 21:37:03 UTC
hi jlee

i've been travelling in N-Am, only now got an instance of pf-sources 2.6.39_2 compiled.

i should note that as of last some days my machine has started to totally unexpectedly hard freeze during regular X work with 2.6.38.5-zen. screen either goes completely blank or displays primary console with X cursor staying on screen, but magic sysrq nor anything else works. nothing gets to syslog-ng (i should probably try sync) nor can i see any kernel BUG messages on screen.. either it's also something to do with acer-wmi or it is btrfs, i think.

back to acer-wmi, first observation about resuming on 2.6.39-pf2 has been:
 * FAIL going to sleep from acer-wmi auto-set unblocked state after cold boot, resume will hang 100%. also tried with acpi_sleep=s3_bios just in case, that didn't change anything.
 * SUCCESS manually blocking radios with rfkill key after acer-wmi auto-set unblocked state, then going to sleep with blocked radios
 * SUCCESS after doing one manual block cycle, enabling radios with rfkill key, going to sleep

it seems like there's some kind of insanity going on after cold boot that gets straightened out by acer-wmi later.

i will be now trying comment#24 suggestions, first of all on this 2.6.39-pf2 kernel, 2.6.38.5-zen+ will come later.
Comment 26 Leho Kraav 2011-06-16 22:16:13 UTC
(In reply to comment #24)
> Could you please do a simple test?
>  - Follow your comment#23, try to crash your machine by press rfkill button.
> 
> Please tell me you still can crash it or not.
> Then, please remove your wireless driver then try to reproduce again.

Just repeating my comment#23 with 2.6.39-pf2, blacklisted acer-wmi, did NOT do battery and AC removal for now:
 * single rfkill item "phy0: Wireless LAN" visible after cold boot (on 2.6.38.5-zen+, bluetooth was also visible)
 * toggling radio states with rfkill key does NOT hang machine anymore
 * sleep - resume cycle seems to work without any issues

This is how my rfkill key toggles through radio states without acer-wmi:

COLD BOOT

$ sudo rfkill list
0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: no

rfkill key press #1:
$ sudo rfkill list
0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: no
1: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no

rfkill key press #2:
$ sudo rfkill list
0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: yes

rfkill key press #3:
$ sudo rfkill list
0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: yes
2: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no

rfkill key press #4:
$ sudo rfkill list
0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: yes

rfkill key press #5 (= back to start):
$ sudo rfkill list
0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: no

Not entirely sure what to make of this, is the cycle normal?
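
For comparing the states between key presses, the `rfkill list` output shown above can be turned into structured data. The following is a minimal sketch that assumes the exact text layout pasted in this comment; `parse_rfkill_list` is a hypothetical helper name, not part of the rfkill tool.

```python
from typing import Dict, List


def parse_rfkill_list(text: str) -> List[Dict[str, object]]:
    """Parse 'rfkill list' text into one dict per device."""
    devices: List[Dict[str, object]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Soft blocked:") and devices:
            devices[-1]["soft"] = line.split(":", 1)[1].strip() == "yes"
        elif line.startswith("Hard blocked:") and devices:
            devices[-1]["hard"] = line.split(":", 1)[1].strip() == "yes"
        elif line[0].isdigit():
            # Header line such as "0: phy0: Wireless LAN"
            index, name, kind = (part.strip() for part in line.split(":", 2))
            devices.append({"index": int(index), "name": name,
                            "type": kind, "soft": False, "hard": False})
    return devices


# State after "rfkill key press #3" as pasted above.
SAMPLE = """\
0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: yes
2: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no
"""

for dev in parse_rfkill_list(SAMPLE):
    print(dev["index"], dev["name"],
          "soft-blocked" if dev["soft"] else "soft-ok",
          "hard-blocked" if dev["hard"] else "hard-ok")
```

Diffing the parsed lists from two consecutive key presses makes it easier to see which device appeared, disappeared, or changed its hard-block state.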
Comment 27 Lee, Chun-Yi 2011-06-30 03:48:58 UTC
Sorry for the late reply.

(In reply to comment #25)
> hi jlee
> 
> i've been travelling in N-Am, only now got an instance of pf-sources 2.6.39_2
> compiled.
> 
> i should note that as of last some days my machine has started to totally
> unexpectedly hard freeze during regular X work with 2.6.38.5-zen. screen
> either
> goes completely blank or displays primary console with X cursor staying on
> screen, but magic sysrq nor anything else works. nothing gets to syslog-ng (i
> should probably try sync) nor can i see any kernel BUG messages on screen..
> either it's also something to do with acer-wmi or it is btrfs, i think.
> 
> back to acer-wmi, first observation about resuming on 2.6.39-pf2 has been:
>  * FAIL going to sleep from acer-wmi auto-set unblocked state after cold
>  boot,
> resume will hang 100%. also tried with acpi_sleep=s3_bios just in case, that
> didn't change anything.

For this FAIL case could you please help do the following 2 testing?

 + remove Broadcom driver then do the same testing: 
Please add broadcom driver to add blacklist then test. I thought the driver must be b43 or b43legacy, please check.

 + please add set acer-wmi parameter ec_raw_mode=1, then do the same testing: 
You can add the following statement to /etc/modprobe.d
    options acer-wmi ec_raw_mode=1

>  * SUCCESS manually blocking radios with rfkill key after acer-wmi auto-set
> unblocked state, then going to sleep with blocked radios
>  * SUCCESS after doing one manual block cycle, enabling radios with rfkill
>  key,
> going to sleep
> 
> it seems like there's some kind of insanity going on after cold boot that
> gets
> straightened out by acer-wmi later.
> 

So it looks like S3 resume succeeds if we change the rfkill state with the rfkill key before S3 suspend?
Comment 28 Lee, Chun-Yi 2011-06-30 03:55:59 UTC
(In reply to comment #26)
> (In reply to comment #24)
> > Could you please do a simple test?
> >  - Follow your comment#23, try to crash your machine by press rfkill
> button.
> > 
> > Please tell me you still can crash it or not.
> > Then, please remove your wireless driver then try to reproduce again.
> 
> Just repeating my comment#23 with 2.6.39-pf2, blacklisted acer-wmi, did NOT
> do
> battery and AC removal for now:
>  * single rfkill item "phy0: Wireless LAN" visible after cold boot (on
> 2.6.38.5-zen+, bluetooth was also visible)
>  * toggling radio states with rfkill key does NOT hang machine anymore
>  * sleep - resume cycle seems to work without any issues
> 

Per your information, the hardware rfkill key did not crash the Broadcom driver.

> This is how my rfkill key toggles through radio states without acer-wmi:
> 
> COLD BOOT
> 
> $ sudo rfkill list
> 0: phy0: Wireless LAN
>         Soft blocked: no
>         Hard blocked: no
> 
> rfkill key press #1:
> $ sudo rfkill list
> 0: phy0: Wireless LAN
>         Soft blocked: no
>         Hard blocked: no
> 1: hci0: Bluetooth
>         Soft blocked: no
>         Hard blocked: no
> 
> rfkill key press #2:
> $ sudo rfkill list
> 0: phy0: Wireless LAN
>         Soft blocked: no
>         Hard blocked: yes
> 
> rfkill key press #3:
> $ sudo rfkill list
> 0: phy0: Wireless LAN
>         Soft blocked: no
>         Hard blocked: yes
> 2: hci0: Bluetooth
>         Soft blocked: no
>         Hard blocked: no
> 
> rfkill key press #4:
> $ sudo rfkill list
> 0: phy0: Wireless LAN
>         Soft blocked: no
>         Hard blocked: yes
> 
> rfkill key press #5 (= back to start):
> $ sudo rfkill list
> 0: phy0: Wireless LAN
>         Soft blocked: no
>         Hard blocked: no
> 
> Not entirely sure what to make of this, is the cycle normal?

Yes, the cycle is the normal, default BIOS/EC behavior on Acer notebooks; we call it ec_raw_mode. acer-wmi sets the EC to launch manager mode to disable this BIOS behavior, because it might conflict with rfkill-input and userland applications.

Please follow comment #27 and set ec_raw_mode=1; then we can confirm whether the launch manager mode is the problem on your machine.
Comment 29 Leho Kraav 2011-07-06 09:28:40 UTC
Side note: I had my first failed resume this morning with 2.6.39-pf2, *acer-wmi is blacklisted*.

Quite unsure about the exact cause, but last night I did switch between WLAN and cable Ethernet several times. Do not remember what state I left the wlan adapter in.

Other than this, it had been working very stable for weeks.

> >  * FAIL going to sleep from acer-wmi auto-set unblocked state after cold
> boot,
> > resume will hang 100%. also tried with acpi_sleep=s3_bios just in case,
> that
> > didn't change anything.
> 
> For this FAIL case could you please help do the following 2 testing?
> 
>  + remove Broadcom driver then do the same testing: 
> Please add broadcom driver to add blacklist then test. I thought the driver
> must be b43 or b43legacy, please check.

I un-blacklisted acer-wmi and blacklisted broadcom.

I also had to blacklist brcmsmac, since just blacklisting broadcom did not seem to stop the wifi adapter from properly working and bringing up wlan0 interface.

It seems that broadcom is actually a phy driver that is unrelated to this wlan adapter. It is not shown being used at all in lsmod. 

After blacklisting broadcom and brcmsmac, machine seems to be successful in resuming every time. Tried 4-5 times in a row. Used the rfkill key inbetween testing sleep, no apparent problems at all.

>  + please add set acer-wmi parameter ec_raw_mode=1, then do the same testing: 
> You can add the following statement to /etc/modprobe.d
>     options acer-wmi ec_raw_mode=1

dmesg shows "Enabled EC raw mode", so parameter is working.

It is unclear, am I supposed to test EC raw mode with brcmsmac blacklisted?

Currently brcmsmac is still blacklisted, machine seems to have no problem resuming.

Did not reboot, enabled brcmsmac, then used rfkill key to bring up wlan0, sleep, resume = FAIL!

Reboot, brcmsmac, acer-wmi ec_raw_mode=1 enabled, sleep, resume = FAIL!
Comment 30 Leho Kraav 2011-07-06 09:31:20 UTC
Right now, it does not look like I have any other choice but to keep blacklisting acer-wmi. Awaiting your comments with great interest!
Comment 31 Lee, Chun-Yi 2011-07-12 00:47:14 UTC
Thanks to Leho for the additional test results.
After reviewing all comments on this bug again, I think:

 + rfkill calls back the set_block function in acer-wmi on S3 resume, because acer-wmi's rfkill is not persistent. Calling set_block makes sure the rfkill state is in sync with the REAL hardware state.

 + acer-wmi's set_block sets the device state by evaluating a wmi method on S3 resume. (HERE, we need to make sure the wmi3 method is evaluated on the Travelmate 8172.)

 + The Acer BIOS receives the wmi request and changes the wireless RF or power state, which means the Broadcom brcmsmac driver also needs to update the RF/power state. 
(HERE, we need to check set_block and the resume callback in brcmsmac.)

Simply put, acer-wmi requests the BIOS to update the RF or power state of the Broadcom wireless hardware, and this causes the brcmsmac driver to crash on S3 resume.


Next steps:

+ No other reports describe the same problem with non-Broadcom wireless modules; I will double check on my Acer machine.

+ We need to do some testing in acer-wmi's set_block callback function; I will attach a patch.

+ We need to trace brcmsmac's .resume callback function and check which statement it is pending on.
Comment 32 Leho Kraav 2011-08-22 22:19:25 UTC
i have upgraded to pf-sources-3.0.1 in the meanwhile. brcmsmac is pretty seriously broken with my wifi card compared to pf-sources-2.6.39-r2, which was working relatively stable.

because today i was crashing because of it, i decided to un-blacklist acer-wmi also to test this issue. it exhibited exactly the same broken behavior.

i switched to broadcom-sta today, am hoping i will get some stable behavior out of wifi card now or find an atheros mini-pci-express somewhere.
Comment 33 Lee, Chun-Yi 2011-08-23 00:33:29 UTC
We need help from the brcmsmac experts!
Comment 34 Leho Kraav 2011-08-23 06:48:39 UTC
yes. forgot to mention earlier, Acer has also released a new BIOS 1.15 for TravelMate 8172T a few months back.

i can't believe some manufacturers still don't have any kind of notification or subscription system for such updates. gigabyte and intel have at least had rss feeds for some years already. but that's another story.

i installed this BIOS yesterday, i will be diffing vs previous and uploading here.
Comment 35 Leho Kraav 2011-08-23 06:57:31 UTC
uh, i mean acer doesn't have any notification system and i will be diffing dmesg output.
Comment 36 Roland Vossen 2011-08-23 11:28:51 UTC
Hi guys,

I am one of the brcmsmac developers. I will do some suspend/resume tests on a recent kernel on a Dell Latitude E6410 (I don't have an Acer available).

Leho, how does broadcom-sta work for you ? Does it solve the issue ?

Thanks, Roland.
Comment 37 Leho Kraav 2011-08-23 11:40:21 UTC
hi roland, thanks for jumping in. broadcom-sta has been stable for a day. i will put it through some sleep cycles (praying that btrfs can take many more possible lockups.. :>) and report back after a week or so.

this is my current working setup:

 $ lsmod | sort
ac                      1628  0 
acpi_cpufreq            4373  1 
auth_rpcgss            26335  1 nfs
battery                 4545  0 
broadcom                4758  0 
cifs                  192096  2 
coretemp                4368  0 
cpufreq_stats           1989  0 
fan                     1734  0 
i2c_i801                6062  0 
intel_ips               7609  0 
iptable_filter           932  1 
ip_tables               7031  1 iptable_filter
ipt_ULOG                3517  6 
iTCO_wdt                9509  0 
lib80211                2662  2 lib80211_crypt_tkip,wl
lib80211_crypt_tkip     6335  0 
libphy                 11601  2 broadcom,tg3
lockd                  51860  1 nfs
md4                     2657  0 
mei                    22365  0 
Module                  Size  Used by
mperf                    823  1 acpi_cpufreq
nf_conntrack           34295  4 nf_conntrack_ftp,nf_conntrack_irc,xt_state,nf_conntrack_ipv4
nf_conntrack_ftp        3940  0 
nf_conntrack_ipv4       4207  21 
nf_conntrack_irc        2339  0 
nf_defrag_ipv4           755  1 nf_conntrack_ipv4
nfs                   209119  0 
nfs_acl                 1555  1 nfs
pcspkr                  1195  0 
processor              20572  1 acpi_cpufreq
psmouse                27983  0 
rtc_cmos                6424  0 
sg                     12757  0 
snd                    29834  7 snd_seq,snd_seq_device,snd_hda_codec_conexant,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
snd_hda_codec          45814  2 snd_hda_codec_conexant,snd_hda_intel
snd_hda_codec_conexant    33109  1 
snd_hda_intel          15369  0 
snd_page_alloc          4877  2 snd_hda_intel,snd_pcm
snd_pcm                41846  2 snd_hda_intel,snd_hda_codec
snd_seq                32895  0 
snd_seq_device          3557  1 snd_seq
snd_timer              11850  2 snd_seq,snd_pcm
squashfs               17335  1 
sunrpc                134816  4 nfs,lockd,auth_rpcgss,nfs_acl
zram                    6230  1 
tg3                   103273  0 
thermal                 6046  0 
uvcvideo               46484  0 
videodev               54792  1 uvcvideo
wl                   2590250  0 
wmi                     5794  0 
xt_limit                1020  5 
xt_mac                   639  0 
xt_state                 787  21
Comment 38 Roland Vossen 2011-08-23 14:29:48 UTC
I tested suspend/resume on the latest brcmsmac driver on a Dell Latitude E6410, kernel 3.1.0-rc1+, with a BCM43225 mPCIe card. System resumes normally from suspend.

Can you check the firmware images used? The output of the 'md5sum' utility on the firmware in /lib/firmware/brcm should be:

96cf06e4ff9f0c04a0f26ebefdf32e3d  bcm43xx-0.fw
48882412db63b4e2dd9c26571a29a799  bcm43xx_hdr-0.fw

Thanks, Roland.
Comment 39 Roland Vossen 2011-08-24 07:43:47 UTC
In addition to my md5sum request above: what distribution are we talking about ? Gentoo ?
Comment 40 Leho Kraav 2011-08-24 08:40:14 UTC
leho@travelmate /lib/firmware/brcm $ qlist -IUv firmware
sys-kernel/linux-firmware-20110311

leho@travelmate /lib/firmware/brcm $ ls -l
total 362
-rwxr-xr-x 1 root root 269595 11. apr   10:33 bcm4329-fullmac-4-218-248-5.bin
-rwxr-xr-x 1 root root   1604 11. apr   10:33 bcm4329-fullmac-4-218-248-5.txt
-rwxr-xr-x 1 root root  97376 11. apr   10:33 bcm43xx-0-610-809-0.fw
lrwxrwxrwx 1 root root     22 11. apr   10:40 bcm43xx-0.fw -> bcm43xx-0-610-809-0.fw
-rwxr-xr-x 1 root root    180 11. apr   10:33 bcm43xx_hdr-0-610-809-0.fw
lrwxrwxrwx 1 root root     26 11. apr   13:48 bcm43xx_hdr-0.fw -> bcm43xx_hdr-0-610-809-0.fw

leho@travelmate /lib/firmware/brcm $ md5sum bcm43xx-0-610-809-0.fw 
4fa5003e3d0ca540ef973a6b375a43c6  bcm43xx-0-610-809-0.fw

leho@travelmate /lib/firmware/brcm $ md5sum bcm43xx_hdr-0-610-809-0.fw 
9e2256e0bb25e8d73aed0907d254d60f  bcm43xx_hdr-0-610-809-0.fw

---

there is a newer firmware package available, installed it now. will make test runs with it. looks like june firmware is matching yours.

leho@travelmate /lib/firmware/brcm $ qlist -IUv firmware
sys-kernel/linux-firmware-20110604

leho@travelmate /lib/firmware/brcm $ md5sum bcm43xx-0.fw 
96cf06e4ff9f0c04a0f26ebefdf32e3d  bcm43xx-0.fw

leho@travelmate /lib/firmware/brcm $ md5sum bcm43xx_hdr-0.fw 
48882412db63b4e2dd9c26571a29a799  bcm43xx_hdr-0.fw
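
The checksum comparison done by hand above can also be scripted. This is a minimal sketch using only the Python standard library; the expected values are the two checksums quoted in comment 38, while `md5_of` and `check_firmware` are hypothetical helper names chosen for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Expected checksums as quoted in comment 38.
EXPECTED = {
    "bcm43xx-0.fw": "96cf06e4ff9f0c04a0f26ebefdf32e3d",
    "bcm43xx_hdr-0.fw": "48882412db63b4e2dd9c26571a29a799",
}


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_firmware(directory: Path, expected: Dict[str, str] = EXPECTED) -> Dict[str, str]:
    """Return 'ok', 'mismatch' or 'missing' for every expected firmware file."""
    results = {}
    for name, want in expected.items():
        path = directory / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in check_firmware(Path("/lib/firmware/brcm")).items():
        print(f"{status:8} {name}")
```

Because `Path.open` follows symlinks, this checks whatever file the `bcm43xx-0.fw` and `bcm43xx_hdr-0.fw` links currently point to.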
Comment 41 Leho Kraav 2011-08-24 08:40:32 UTC
(In reply to comment #39)
> In addition to my md5sum request above: what distribution are we talking
> about
> ? Gentoo ?

correct.
Comment 42 Leho Kraav 2011-08-24 09:36:10 UTC
there seems to be improvement. with kernel 3.0.2 and linux-firmware-20110604, brcmsmac on its own *and* with acer-wmi seems to wake up properly now, based on an initial test of three sleep-resume cycles.

like i said earlier, i have also upgraded bios. could be a contributor.

one immediate difference is that phy0 seems to always be hard blocked when waking up now.

i will also have to test for the brcmsmac-related freezes i experienced two days ago, when there were no associatable wlan base stations around and i was trying to find one. 

selecting an entry in wpa_gui would freeze the process somewhere deep kernel level, since many other file opening processes would also freeze (tail /var/log/messages etc). hitting rfkill key to disable wlan unfroze these processes. when trying to select another entry from wpa_gui, system froze completely.
Comment 43 Leho Kraav 2011-08-26 11:51:49 UTC
today i can report that after modularizing i915, i can now again reproduce the freeze on resume when acer-wmi and brcmsmac are loaded together. when i unload either one of them, resume succeeds.

i've now blacklisted acer-wmi again for the time being.
Comment 44 Roland Vossen 2011-08-30 09:06:03 UTC
Hello Leho,

I freed up time to look at this problem again. Some questions.

What is the reason that you modularized i915 ? Any idea why modularizing it made the issue reappear ?

Would you be able to create a suspend/resume trace as suggested by Chun-Yi https://bugzilla.kernel.org/show_bug.cgi?id=34682#c13 ?

I am currently debugging an other issue (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637240) that could be related.

Thanks, Roland.
Comment 45 Leho Kraav 2011-08-30 10:20:59 UTC
Created attachment 70852 [details]
acpidump-bios-1.15.dat

acpidump from updated bios
Comment 46 Leho Kraav 2011-08-30 13:38:24 UTC
(In reply to comment #44)

> What is the reason that you modularized i915 ? Any idea why modularizing it
> made the issue reappear ?

I wanted a no-reboot way to test various performance parameters of i915.

Based on the further testing I've now done, "reappear" may not be the correct word. I think something has been consistently broken all the time, but partly mitigated. It is not 100% certain that i915 modularization has anything to do with this bug. See below.

> Would you be able to create a suspend/resume trace as suggested by Chun-Yi
> https://bugzilla.kernel.org/show_bug.cgi?id=34682#c13 ?

OK, went ahead and enabled these:

 $ zgrep -e PM_DEBUG -e PM_TRACE /proc/config.gz
CONFIG_PM_DEBUG=y
CONFIG_CAN_PM_TRACE=y
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
# CONFIG_PCIEASPM_DEBUG is not set

First I rebooted into debug-kernel and forgot to un-blacklist acer-wmi. So I just manually did modprobe acer-wmi and suspended. To my surprise, machine resumed successfully. Did several test cycles and they all succeeded.

Then I un-blacklisted acer-wmi in /etc/modprobe.d/blacklist.conf and rebooted.
Did a suspend cycle and now machine hangs as expected.

It would appear there's something to do with module or hardware loading / initialization order. Out of subsequent dmesg:

[    1.789437]   Magic number: 0:304:98
[    1.789602]   hash matches drivers/base/power/main.c:565
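
When collecting several of these traces, the PM trace lines can be pulled out of a saved dmesg automatically. This is a minimal sketch that assumes the two message formats shown above ("Magic number: a:b:c" and "hash matches file:line"); `parse_pm_trace` and `PmTrace` are hypothetical names, and device-level "hash matches" lines without a file:line suffix are ignored here.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

_MAGIC = re.compile(r"Magic number:\s*(\d+):(\d+):(\d+)")
_HASH = re.compile(r"hash matches\s+(\S+):(\d+)")


class PmTrace(NamedTuple):
    magic: Optional[Tuple[int, int, int]]  # the three fields of the magic number
    locations: List[Tuple[str, int]]       # (source file, line) candidates


def parse_pm_trace(lines: Iterable[str]) -> PmTrace:
    """Extract the PM trace magic number and matching source locations."""
    magic = None
    locations: List[Tuple[str, int]] = []
    for line in lines:
        found = _MAGIC.search(line)
        if found:
            magic = tuple(int(group) for group in found.groups())
            continue
        found = _HASH.search(line)
        if found:
            locations.append((found.group(1), int(found.group(2))))
    return PmTrace(magic, locations)


sample = [
    "[    1.789437]   Magic number: 0:304:98",
    "[    1.789602]   hash matches drivers/base/power/main.c:565",
]
trace = parse_pm_trace(sample)
print(trace.magic, trace.locations)
```

The reported location is only a hash match, so it narrows down where the last trace point was recorded rather than proving where the hang occurred.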

> I am currently debugging an other issue
> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637240) that could be
> related.

I read through this bug. It seems similar to what I experienced in comment 42 of this bug. Everything was apparently working stably when a known base station was near, but the system would freeze in other locations where no known base stations were around. I am almost certain updating linux-firmware has solved that issue for me, since this behavior has no longer occurred at all in various travel situations.
Comment 47 Leho Kraav 2013-08-04 19:55:11 UTC
Note: I no longer have this machine and am unable to provide any further input here :/

Just glad I got rid of Broadcom. Never again.