Most recent kernel where this bug did not occur: 188.8.131.52
Hardware Environment: Medion MD 96400 (aka MSI S270)
Software Environment: latest archlinux
- no battery information beneath /sys
- wakeup from S3 doesn't work anymore (LEDs just blink)
Steps to reproduce:
Disable the following kernel config options:
[ ] Deprecated /proc/acpi files
[ ] Deprecated power /proc/acpi folders
[ ] Deprecated /proc/acpi/event support
and boot 2.6.24-rcX
.config, acpidump, dmesg output attached (all from 2.6.24-rc5-git3)
This was the old output:
jan@jan ~ $ cat /proc/acpi/battery/BAT1/*
design capacity: 4400 mAh
last full capacity: 2976 mAh
battery technology: rechargeable
design voltage: 14800 mV
design capacity warning: 0 mAh
design capacity low: 0 mAh
capacity granularity 1: 1 mAh
capacity granularity 2: 1 mAh
model number: MS-1013
battery type: LION
capacity state: ok
charging state: charged
present rate: 0 mA
remaining capacity: 2976 mAh
present voltage: 16681 mV
Created attachment 14067 [details]
output from acpidump
Created attachment 14068 [details]
dotconfig from 2.6.24-rc5-git3
Created attachment 14069 [details]
output from dmesg
what do you see under /sys/class/power_*/ ?
root@jan ~ # ll /sys/class/power_supply/
lrwxrwxrwx 1 root root 0 16. Dez 2007 ADP1 -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:05/PNP0C09:00/ACPI0003:00/power_supply/ADP1
lrwxrwxrwx 1 root root 0 16. Dez 2007 BAT1 -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:05/PNP0C09:00/PNP0C0A:00/power_supply/BAT1
root@jan ~ # ls /sys/class/power_supply/BAT1/*
/sys/class/power_supply/BAT1/alarm /sys/class/power_supply/BAT1/manufacturer /sys/class/power_supply/BAT1/type
/sys/class/power_supply/BAT1/charge_full /sys/class/power_supply/BAT1/model_name /sys/class/power_supply/BAT1/uevent
/sys/class/power_supply/BAT1/charge_full_design /sys/class/power_supply/BAT1/present /sys/class/power_supply/BAT1/voltage_min_design
/sys/class/power_supply/BAT1/charge_now /sys/class/power_supply/BAT1/status /sys/class/power_supply/BAT1/voltage_now
driver hid modalias path power power_supply subsystem uevent
root@jan ~ # cat /sys/class/power_supply/BAT1/*
cat: /sys/class/power_supply/BAT1/device: Is a directory
cat: /sys/class/power_supply/BAT1/power: Is a directory
cat: /sys/class/power_supply/BAT1/subsystem: Is a directory
hrmz.. From the above output I guess everything is working fine and my userspace tools (laptop-mode, powertop, xfce-battery-plugin) are not ready for the hard cut yet?
sorry for wasting your time. there's still the non-working s2ram though :) maybe that'll be fixed with "[*] Deprecated power /proc/acpi folders" too. I'll try it now
Oki the battery issues are 'solved'. I'm really sorry, I should've thought of that.
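For userspace tools making the same move away from /proc/acpi/battery, the attributes listed above can be read straight from sysfs. A minimal shell sketch, assuming the charge_now, charge_full and status files shown in the listing exist and hold plain integers or a status word. The function name battery_summary is made up for illustration, and batteries that expose other attribute names are not handled:

```shell
#!/bin/sh
# Print "<status> <percent>%" for one power_supply directory.
# battery_summary is a hypothetical helper name, not part of any tool.
battery_summary() {
    dir=${1:-/sys/class/power_supply/BAT1}
    # Bail out if the expected attribute files are missing.
    [ -r "$dir/charge_now" ] && [ -r "$dir/charge_full" ] || return 1
    now=$(cat "$dir/charge_now")
    full=$(cat "$dir/charge_full")
    # Avoid dividing by zero on a battery that reports no capacity.
    [ "$full" -gt 0 ] || return 1
    status=$(cat "$dir/status" 2>/dev/null || echo unknown)
    echo "$status $((now * 100 / full))%"
}
```

It would be called as `battery_summary /sys/class/power_supply/BAT1`; a non-zero return means the expected files were not found.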
But s2ram issues remain. Laptop takes a little longer to get into S3 (like 2s) and if I want to wake it from S3 the screen stays black and Caps-/Scroll-lock keys are blinking. How can I debug that?
Let's use this bug entry for tracking the s2ram issue from now on.
So, s2ram worked before 2.6.24-rc1 and now it doesn't?
If the issue is 100% reproducible, I'd carry out a bisection to find the offending patch.
it is. I heard what amazing things git-bisect can do and gave it a go:
# good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
git-bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1
# bad: [c9927c2bf4f45bb85e8b502ab3fb79ad6483c244] Linux 2.6.24-rc1
git-bisect bad c9927c2bf4f45bb85e8b502ab3fb79ad6483c244
# bad: [9ac52315d4cf5f561f36dabaf0720c00d3553162] sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
git-bisect bad 9ac52315d4cf5f561f36dabaf0720c00d3553162
# bad: [038a5008b2f395c85e6e71d6ddf3c684e7c405b0] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git-bisect bad 038a5008b2f395c85e6e71d6ddf3c684e7c405b0
# good: [dd6d1844af33acb4edd0a40b1770d091a22c94be] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect good dd6d1844af33acb4edd0a40b1770d091a22c94be
the next kernels won't compile however. I tried a few "git-reset --hard HEAD~3" and now I'm on "HEAD is now at ad7379d... [NET]: Fix the prototype of call_netdevice_notifiers." but the kernel still fails to compile: http://jan.willies.info/make.log
I don't know how to proceed now.
I also have no battery info on my MSI-GX700 laptop. The problem appeared in 2.6.24-rc4. From 2.6.23 through 2.6.24-rc3 there was another problem: the battery charge was sometimes shown incorrectly.
[*] Deprecated /proc/acpi files
[*] Deprecated power /proc/acpi folders
[*] Deprecated /proc/acpi/event support
The above mentioned items are enabled. But since 2.6.24-rc4 I have neither "BAT1" under /sys/class/power_supply/ nor "battery" under /proc/acpi/.
(In reply to comment #10)
Well, I'd expect it to compile if you do
$ git bisect start
$ git bisect good dd6d1844af33acb4edd0a40b1770d091a22c94be
$ git bisect bad 038a5008b2f395c85e6e71d6ddf3c684e7c405b0
$ git reset --hard 03233b90b0977d577322a6e1ddd56d9cc570d406
(compiles for me).
whooho: 81873e9ccd5731ca77027bdb32b34904e7af25d0 is first bad commit. After ~40 recompiles in the last days finally some christmas cookies for me :)
It seems git-bisect isn't good with that whole merging thing yet, I had to nuke the tree several times and clone a new one to make it compile again. Compilation always failed with the same error message (see link above).
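Commits that fail to build need not stall a bisection. Current git versions provide `git bisect skip`, and `git bisect run` treats exit code 125 from its script as "skip this commit". A sketch of the core logic of such a script, assuming the build and test commands are supplied by the caller. classify_commit is an illustrative name, and a suspend/resume test that can hang the machine still has to be judged by hand:

```shell
#!/bin/sh
# classify_commit BUILD_CMD TEST_CMD
# Return status follows the `git bisect run` convention:
#   0 = good, 1 = bad, 125 = cannot be tested (skip).
classify_commit() {
    build_cmd=$1
    test_cmd=$2
    # A commit that does not build tells us nothing: ask git to skip it.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125
    # Collapse every test failure to 1 so it is never mistaken for a skip.
    sh -c "$test_cmd" >/dev/null 2>&1 || return 1
    return 0
}
```

A script that ends with a call such as `classify_commit "make" "./my-test.sh"` (both commands are placeholders) could then be handed to `git bisect run`.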
Somewhere since around 03233b90b0977d577322a6e1ddd56d9cc570d406 suspend didn't work anymore, saying:
Switching from vt7 to vt1
s2ram_do: Device or resource busy
switching back to vt7
Then suspend worked again but resuming failed. Keyboard LEDs are blinking and the whole laptop freezes. So the culprit for this ticket is 81873e9ccd5731ca77027bdb32b34904e7af25d0. Sounds plausible to me, cause sometimes I can move the mouse for a few secs after resuming from S3 before the whole thing freezes. I guess wlan kicks in then.
What all kernels have in common: after a successful suspend/resume a proper reboot doesn't work anymore. Last message is 'system restarting.' but nothing happens. I have to hard-poweroff the laptop.
Thanks for doing this work!
Please verify if suspend works for you with the rt2x00 driver unloaded.
Also, if commit 81873e9ccd5731ca77027bdb32b34904e7af25d0 can be cleanly reverted, please revert it and see if suspend works.
The last issue is sort of known. Please see Bug #6655.
(In reply to comment #14)
> Thanks for doing this work!
> Please verify if suspend works for you with the rt2x00 driver unloaded.
I mean, with unmodified 2.6.24-rc6 (or a later kernel).
(In reply to comment #15)
> (In reply to comment #14)
> > Please verify if suspend works for you with the rt2x00 driver unloaded.
> I mean, with unmodified 2.6.24-rc6 (or a later kernel).
It works with latest git and rt2x00 unloaded!
I tried reverting the patch but it failed:
jan@jan ~/src/linux-2.6 $ git-revert 81873e9ccd5731ca77027bdb32b34904e7af25d0
CONFLICT (content): Merge conflict in drivers/net/wireless/rt2x00/rt2x00lib.h
Automatic revert failed. After resolving the conflicts,
mark the corrected paths with 'git add <paths>' and commit the result.
CC [M] drivers/net/wireless/rt2x00/rt2x00dev.o
In file included from drivers/net/wireless/rt2x00/rt2x00dev.c:35:
drivers/net/wireless/rt2x00/rt2x00lib.h:32: error: expected identifier or '(' before '<<' token
In file included from drivers/net/wireless/rt2x00/rt2x00dev.c:35:
drivers/net/wireless/rt2x00/rt2x00lib.h:36:1: warning: "LINK_TUNE_INTERVAL" redefined
drivers/net/wireless/rt2x00/rt2x00lib.h:33:1: warning: this is the location of the previous definition
drivers/net/wireless/rt2x00/rt2x00lib.h:37:1: warning: "RFKILL_POLL_INTERVAL" redefined
drivers/net/wireless/rt2x00/rt2x00lib.h:34:1: warning: this is the location of the previous definition
drivers/net/wireless/rt2x00/rt2x00lib.h:38:9: error: invalid suffix "..." on floating constant
make: *** [drivers/net/wireless/rt2x00/rt2x00dev.o] Error 1
make: *** [drivers/net/wireless/rt2x00] Error 2
make: *** [drivers/net/wireless] Error 2
make: *** [drivers/net] Error 2
make: *** [drivers] Error 2
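The '<<' token error at rt2x00lib.h:32 is what a compiler reports when it hits a leftover conflict marker, which suggests the build was started before the revert conflict had been resolved. A small check that can be run on a file before rebuilding, assuming nothing beyond POSIX grep (has_conflict_markers is an illustrative name):

```shell
#!/bin/sh
# has_conflict_markers FILE
# Succeeds (exit 0) and prints the offending lines if FILE still
# contains git conflict markers at the start of a line.
has_conflict_markers() {
    grep -n -E '^(<<<<<<<|=======|>>>>>>>)( |$)' "$1"
}
```

Here it would be run as `has_conflict_markers drivers/net/wireless/rt2x00/rt2x00lib.h`. Any lines it prints have to be resolved by hand, followed by 'git add' on the file as the revert message says.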
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > Please verify if suspend works for you with the rt2x00 driver unloaded.
> > I mean, with unmodified 2.6.24-rc6 (or a later kernel).
> It works with latest git and rt2x00 unloaded!
OK, thanks for the confirmation.
Technically, this is not a regression, since the rt2x00 driver was not present in 2.6.23, so this is a new breakage.
I'll reassign the bug to the wireless team, but if no one picks it up, I'll have a closer look at it.
For now, as a workaround, please unload the rt2x00 module before suspend.
What is the RFKill state when you are suspending, and does it matter if you change the state before resuming?
Also, could you try enabling debugfs and check what the contents of the "dev_flags" rt2x00 debugfs entry are?
root@jan ~ # cat /sys/class/rfkill/rfkill0/state
changing it to 0 makes suspend/resume work!
(btw, my hardware-wlan-button makes the wlan-LED go on/off but doesn't change the state of rfkill)
Could you disable RFKILL in your config and test if suspend and resume is working then?
that works too
Created attachment 14272 [details]
Suspend & resume fix
Could you try attached patch and see if that improves things?
sorry, I was wrong before. It doesn't work with RFKILL disabled. I didn't test it in the range of my AP, that's why it worked I guess (btw, the range of rt2500pci is poor, sth like 2-3m :( )
With your patch, it doesn't crash on resume. Instead the display stays black and no keypresses are recognized. In fact, it's the same except those LEDs aren't blinking. I have to hard-poweroff the laptop.
Is there anything in the logs about the resume?
yes, now that you say it:
Jan 4 11:17:10 jan Clocksource tsc unstable (delta = -191677227 ns)
Jan 4 11:17:14 jan WARNING: at net/mac80211/rx.c:1486 __ieee80211_rx()
Jan 4 11:17:14 jan Pid: 4388, comm: dbus-daemon Not tainted 2.6.24-rc6-gb8c9a187-dirty #2
Jan 4 11:17:14 jan [<f8e89f59>] __ieee80211_rx+0xbf9/0xd10 [mac80211]
Jan 4 11:17:14 jan [<c01475b0>] file_read_actor+0x0/0x130
Jan 4 11:17:14 jan [<f8e662d6>] rt2x00pci_rxdone+0x96/0x1c0 [rt2x00pci]
Jan 4 11:17:14 jan [<f8e7c0bd>] ieee80211_tasklet_handler+0xdd/0xf0 [mac80211]
Jan 4 11:17:14 jan [<f8e66348>] rt2x00pci_rxdone+0x108/0x1c0 [rt2x00pci]
Jan 4 11:17:14 jan [<c0121a63>] tasklet_action+0x33/0x70
Jan 4 11:17:14 jan [<c01219b2>] __do_softirq+0x42/0x90
Jan 4 11:17:14 jan [<c0121a27>] do_softirq+0x27/0x30
Jan 4 11:17:14 jan [<c0121c95>] irq_exit+0x65/0x70
Jan 4 11:17:14 jan [<c0106733>] do_IRQ+0x43/0x80
Jan 4 11:17:14 jan [<c016a451>] sys_read+0x41/0x70
Jan 4 11:17:14 jan [<c0104b8f>] common_interrupt+0x23/0x28
Jan 4 11:17:14 jan =======================
Created attachment 14285 [details]
Move data into 4-byte aligned memory address
Hmm that means RX has properly kicked in again after resume.
The warning is fixable with attached patch, _but_ that shouldn't be the cause of the blank screen.
Is there anything else in the log?
with your latest move-data-into-4-byte..patch even the above warning and oops disappeared.
it's not only that my display is black, keyboard isn't working either.
OK, I am running out of ideas at the moment. Could you grab the rt2x00 git tree and see if the problem persists there?
Same black screen problem with rt2x00-git. No logs.
Just to make it clear: It works with 2.6.23 and external rt2x00-2.0.7, but it doesn't since it's integrated into mainline kernel.
Unfortunately, 2.0.7 is bad comparison material; it works because it didn't do anything...
But to summarize:
- rfkill: radio enabled - failure
- rfkill: radio disabled - success
- ifconfig up: failure
- ifconfig down: ?
It seems to be related to the radio state, could you confirm that it is working when the interface is down when suspending?
- ifconfig up: failure
- ifconfig down: success!
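Until the driver is fixed, the "ifconfig down: success" result can be turned into a workaround by wrapping the suspend call. A sketch, assuming the interface is wlan0 and that suspend is done with s2ram as elsewhere in this report. The IFCONFIG and SUSPEND variables and the function name are illustrative hooks so that other commands can be substituted, and wireless settings may need to be reapplied after resume:

```shell
#!/bin/sh
# suspend_with_iface_down [IFACE]
# Take the wireless interface down, suspend, then bring it back up.
# IFCONFIG and SUSPEND are overridable so other tools can be substituted.
suspend_with_iface_down() {
    iface=${1:-wlan0}
    ifconfig_cmd=${IFCONFIG:-ifconfig}
    suspend_cmd=${SUSPEND:-s2ram}
    # Do not suspend at all if the interface could not be taken down.
    "$ifconfig_cmd" "$iface" down || return 1
    rc=0
    "$suspend_cmd" || rc=$?
    # Restore the interface even if suspend itself reported an error.
    "$ifconfig_cmd" "$iface" up
    return "$rc"
}
```

Run as root, `suspend_with_iface_down wlan0` returns the exit status of the suspend command.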
Ok, this gives me a clue on where to look for this issue.
Hopefully I'll have a patch in a few days.
Created attachment 14391 [details]
Always check DEVICE_PRESENT when handling callback functions
In addition to the previous 2 patches, could you try this new patch.
This will block all calls from mac80211 while the device is busy suspending/resuming. Because the panic happens when the interface is up, I believe the bug occurs because mac80211 is using a callback function just before the device is correctly reinitialized.
That didn't work, sorry. Same error as always except this happens after a successful resume from "ifconfig wlan0 down && s2ram":
root@jan ~ # ifconfig wlan0 up
SIOCSIFFLAGS: No buffer space available
Created attachment 14425 [details]
Completely unregister interface during suspend
Could you unapply patch 14391 and replace it with this patch? This will completely unregister the interface from mac80211. The result is that all your configuration on the interface will be lost, but at least mac80211 will not be bothering rt2x00 during suspend/resume.
sorry, that doesn't work either :(
I'm sorry, but I have run out of ideas on this issue.
I'll still look into this, but it will take me more time to really figure this one out. :(
No worries. I'd like to take the opportunity to thank you and Rafael for your work and help on this ticket; it's much appreciated. Keep up the good work.
Lots of rt2x00 changes between 2.6.24 and 2.6.25.
Could you retest with rt2x00 2.6.25 to see if the problem persists?
yes, I have been testing continuously. With -rc2 my laptop freezes instantly when I fire up wpa_supplicant. There are no logs. Loading the rt2500pci module is fine though and scanning works too.
(In reply to comment #41)
> yes, I have been testing continuously. With -rc2 my laptop freezes instantly
> when I fire up wpa_supplicant. There are no logs. Loading the rt2500pci module
> is fine though and scanning works too.
^- without going into suspend/resume
I've filed a new report for the crashes: http://bugzilla.kernel.org/show_bug.cgi?id=10058
Bug 10058 is now closed, can you retest with 2.6.25-rc3?
2.6.25-rc3 won't let me go into suspend. Will keep trying
Still no success. Suspend works now, but resume freezes the laptop as usual. When I do 'ifconfig wlan0 down' resume works, but I have this in my logs afterwards:
BUG: unable to handle kernel NULL pointer dereference at 00000120
IP: [<f8db7ebb>] :rt2500pci:rt2500pci_rfkill_poll+0xb/0x20
*pde = 00000000
Oops: 0000 [#1] PREEMPT
Modules linked in: nfs lockd sunrpc radeon drm cifs ext2 snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss pcmcia rtc_cmos rtc_core rtc_lib yenta_socket rsrc_nonstatic pcmcia_core psmouse pcspkr snd_atiixp snd_ac97_codec ac97_bus k8temp snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 pl2303 usbserial sg msi_laptop evdev thermal fan button battery ac arc4 ecb rt2500pci rt2x00pci rt2x00lib rfkill input_polldev mac80211 cfg80211 eeprom_93cx6 fuse tun powernow_k8 freq_table processor 8139too mii aes_i586 aes_generic lrw gf128mul usb_storage usbhid hid dm_crypt dm_mod sr_mod cdrom sd_mod ehci_hcd ohci_hcd usbcore
Pid: 4840, comm: ipolldevd Not tainted (2.6.25-rc4-00008-g053398d #42)
EIP: 0060:[<f8db7ebb>] EFLAGS: 00010296 CPU: 0
EIP is at rt2500pci_rfkill_poll+0xb/0x20 [rt2500pci]
EAX: 00000000 EBX: f6ef3dc0 ECX: 00000000 EDX: f8db8fa0
ESI: f6de4e40 EDI: f5ea33c0 EBP: f887e180 ESP: f5f3df70
DS: 0068 ES: 007b FS: 0000 GS: 0000 SS: 0068
Process ipolldevd (pid: 4840, ti=f5f3c000 task=f72c2000 task.ti=f5f3c000)
Stack: f8db254a 00000000 f6de4e40 f6de4e54 f887e194 f6de4e58 f6de4e54 c012c571
f72c2150 f72c2150 0000007b 0000007b f5ea33c8 f5ea33c0 c012cde0 00000000
c012ce59 00000000 f72c2000 c012faf0 f5f3dfc0 f5f3dfc0 fffffffc f5ea33c0
[<f8db254a>] rt2x00rfkill_poll+0x1a/0x60 [rt2x00lib]
[<f887e194>] input_polled_device_work+0x14/0x40 [input_polldev]
Code: d3 ea b9 04 00 00 00 85 d2 0f 95 40 12 83 e3 04 0f bc c9 d3 eb 85 db 5b 0f 95 40 13 c3 8d 76 00 8b 40 4c b9 01 00 00 00 0f bc c9 <8b> 80 20 01 00 00 83 e0 01 d3 e8 c3 89 f6 8d bc 27 00 00 00 00
EIP: [<f8db7ebb>] rt2500pci_rfkill_poll+0xb/0x20 [rt2500pci] SS:ESP 0068:f5f3df70
---[ end trace 3e4bf11b2672b232 ]---
Could you try disabling rt2x00 rfkill support and retry?
I'll try to come up with a patch for this panic as soon as possible.
yes, disabling rt2500-rfkill works smoothly. rfkill didn't recognize the key on my laptop anyway
Created attachment 15188 [details]
Add suspend & resume handlers to rt2x00rfkill
Excellent, at least there is some progress with this bug. :)
I have a new patch that should fix the panic with rfkill enabled.
Please test this one, and let me know if it works.
yay! success! works great now, thanks
Excellent, I'll push the patch upstream. :)
I'm afraid it only works when connected to a WLAN :/ basically it's the other way round now. When I'm connected to LAN (and AC), my laptop won't come up after a suspend.
But is it rt2x00 that is causing the problem this time, or another driver?
might as well be another driver. sorry for the long delay but I was busy with RL. Unfortunately I have a new laptop now and can't test anymore. But I guess this can be closed since I was the only one who experienced it anyway. Thank you for your help on this and keep up the good work. rt2x00 works really well with 2.6.25
Ok, I'll close this bug.
Hopefully it won't reappear for you again. :)