Most recent kernel where this bug did not occur: ?
Distribution: Debian unstable with 2.6.15-rc5 from kernel.org
Hardware Environment: x86 notebook - Acer Travelmate 529TXV
Software Environment:
Problem Description:
On starting swsusp (echo disk > /sys/power/state) the machine hangs if the module snd_ali5451 is loaded. If that module is unloaded first, it works fine.

The problem is that unloading is not always easy (tasks like artsd or kmix keep the sound devices open), and that the mixer settings are lost on unload.

The messages on swsusp are (on the suspend console):

swsusp: Need to copy 8240 pages
swsusp: critical section/: done (8240 pages copied)
PCI: Setting latency timer of device 0000:00:01.0 to 64
ACPI: PCI Interrupt 0000:00:06.0[A] -> Link [PILH] -> GSI 10 (level, low) -> IRQ 10
ACPI: PCI Interrupt 0000:00:10.0[A] -> no GSI - using IRQ 15
ACPI: PCI Interrupt 0000:00:13.0[A] -> Link [PILB] -> GSI 11 (level, low) -> IRQ 11
Yenta O2: res at 0x94/0xD4: ea/00
Yenta O2: enabling read prefetch/write burst
ACPI: PCI Interrupt 0000:00:13.1[A] -> Link [PILB] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:14.0[A] -> Link [PILI] -> GSI 11 (level, low) -> IRQ 11

After that it just hangs.

.config highlights:
PREEMPT_NONE=y
X86_UP_APIC=y
X86_UP_IOAPIC=y
X86_LOCAL_APIC=y
X86_IO_APIC=y
NOHIGHMEM=y
HZ_100=y
HZ=100
ACPI=y
Try 2.6.16-rc1. If it's related with a soft lock-up (too long delay in irq disabled context), it was already fixed in that version.
No, doesn't work. I get two additional lines on swsusp after the ones reported before:

pnp: Failed to activate device 00:04.
pnp: Failed to activate device 00:09.

There it hangs. SYSRQ-Pc shows a lot of output, but I can't scroll back, so here are the visible lines:

common_interrupt+0x1a/0x20
__do_softirq+0x2c/0x7d
do_softirq+0x22/0x26
common_interrupt+0x1a/0x20
enable_irq+0x8e/0x93
ide_config_drive_speed+0x15a/0x355
ali15x3_tune_chipset+0x15e/0x166
config_chipset_for_dma+0x24/0x33
ali15x3_config_drive_for_dma+0x7e/0xfa
ide_do_request+0x525/0x72a

and down to syscall_call+0x7/0xb.

Hope that helps!
BTW, after SYSRQ+Pc, SYSRQ no longer works. Before that I could e.g. do SYSRQ+H. Now the machine is completely dead; I have to power it off.
According to the report, it's rather ali15x3 which causes soft lockup...
Maybe it's some kind of interference - but I definitely unload snd_ali5451.
Tested 2.6.16rc4; on suspend:

pnp: Failed to activate device 00:05.
pnp: Device 00:09 activated.
pnp: Failed to activate device 00:0a.

SYSRQ-PC:
common_interrupt, __do_softirq, do_softirq, do_IRQ, common_interrupt, enable_irq, ide_config_drive_speed, ali15x3_tune_chipset, config_chipset_for_dma, ide_do_request, ide_do_drive_cmd, generic_ide_resume, blk_end_sync_rq, resume_device, dpm_resume, device_resume, pm_suspend_disk, enter_state, state_store, sysfs_write_file, vfs_write, sys_write, syscall_call

SYSRQ does still work - that's better than with 2.6.16-rc1.
Created attachment 7413: dmesg of boot done with 2.6.16rc4
Created attachment 7414: /proc/interrupts
Created attachment 7415: /proc/ioports
Does it still happen in 2.6.17-rc4?
Sorry, no. 2.6.17rc4 says on suspend:

pnp: Failed to activate device 00:05.
pnp: Device 00:09 activated.
pnp: Failed to activate device 00:0a.

00:05 is the PS2 keyboard (i8042), 00:0a is INT12 (i8042).

SYSRQ/P shows __do_IRQ, common_interrupt, do_softirq, common_interrupt, enable_irq, ali15x3_tune_chipset, ali15x3_config_drive_for_dma, ide_do_drive_cmd, ...

Anything else I can do to help?
You mean it is fixed in 2.6.17-rc4? That would be good news, no? Otherwise try noapic, nolapic, and show us /proc/interrupts.
Sorry, I wasn't clear. No, it does not work. Attached /proc/interrupts, a dmesg of the boot. Last lines on suspend as written above. Thank you for your efforts!
Created attachment 8161: /proc/interrupts with 2.6.17rc4
Created attachment 8162: dmesg with 2.6.17rc4
Interrupt is shared between eth0 and soundcard, AFAICT. Can you try what happens when you suspend/resume without eth0 drivers?
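For readers who want to check for shared interrupt lines themselves, here is a minimal sketch. It assumes the usual /proc/interrupts layout, in which the handlers on one IRQ are listed separated by ", ". The function name `shared_irqs` is made up for this example.

```shell
#!/bin/sh
# Print the /proc/interrupts lines that carry more than one handler.
# $1 = file to read (default: /proc/interrupts); shared_irqs is a
# hypothetical helper name, not an existing tool.
shared_irqs() {
    # Numbered IRQ lines only (skips the CPU header, NMI, LOC, ...),
    # and only those whose handler list contains a ", " separator.
    awk '/^ *[0-9]+:/ && /, / { sub(/^ */, ""); print }' "${1:-/proc/interrupts}"
}
```

Calling `shared_irqs` without arguments reads the live /proc/interrupts; passing a saved copy (such as one of the attachments) as the first argument works the same way.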
Booted with 2.6.17rc4 *without* giving noapic nolapic.
Removed e100 module
echo disk > /sys/power/state
Machine hangs.

Same with "noapic nolapic".
> Booted with 2.6.17rc4 *without* giving noapic nolapic.
> Removed e100 module
> echo disk > /sys/power/state
> Machine hangs.
>
> Same with "noapic nolapic".

Thanks for testing => it is probably not a shared interrupt problem. If you remove the sound modules, does it survive suspend?

Pavel
Yes. My swsusp.sh tries to save the volume levels, kills all sound users (kmix, artsd, ogg123, etc.), removes the module, and suspends. On wake-up it loads the module, loads arts and kmix and re-sets the volume levels.
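The actual swsusp.sh was not posted, so the following is only a sketch of the sequence described above. Assumptions not taken from the thread: mixer levels are saved and restored with `alsactl`, sound users are stopped with `killall`, the state file path is arbitrary, and `DRY_RUN=1` prints the commands instead of running them. Restarting artsd and kmix after resume is left out because it depends on the desktop session.

```shell
#!/bin/sh
# Hypothetical suspend wrapper: unload the sound module around swsusp.
MODULE=snd_ali5451
SOUND_USERS="kmix artsd ogg123"
STATE_FILE=${STATE_FILE:-/tmp/asound.state}   # assumed location

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

suspend_with_sound_unloaded() {
    run alsactl -f "$STATE_FILE" store          # save volume levels
    for prog in $SOUND_USERS; do
        run killall "$prog"                     # release the sound devices
    done
    run rmmod "$MODULE"                         # fails if still in use
    run sh -c 'echo disk > /sys/power/state'    # returns after resume
    run modprobe "$MODULE"
    run alsactl -f "$STATE_FILE" restore        # re-set volume levels
}
```

Running `DRY_RUN=1 suspend_with_sound_unloaded` first shows the command sequence without touching the machine.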
So it works if you rmmod ali before suspend. Good. Now... could you look at differences between _suspend and rmmod code paths (and _resume and insmod) to figure out what is wrong? printk() is your friend...
AFAICT _suspend and _remove are completely different.
They both do a pci_get_drvdata(), but _remove just calls
pci_set_drvdata(pci, NULL);
whereas _suspend does a fair bit of suspending single pieces ...

Can you give me a hint where to start?
> AFAICT _suspend and _remove are completely different.
> They both do a pci_get_drvdata(), but _remove just calls
> pci_set_drvdata(pci, NULL);
> whereas _suspend does a fair bit of suspending single pieces ...
>
> Can you give me a hint where to start?

I'm not an ALSA expert, sorry. It should be possible to duplicate the _remove routine into _suspend... there's no reason why these should be different. Or try to find the author of that _suspend piece, or something like that.

Pavel
As of http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;h=353b28bafd1b962359a866ff263a7fad833d29a1;f=sound/pci/ali5451/ali5451.c there's been quite some activity. Takashi, you're already on the CC:-list. Can you help me? Thank you in advance!
Hmm, how about the uploaded patch?
Created attachment 8176: Fix chip initialization in resume of ali5451
Sorry, doesn't work. Booted the machine, patched the module, modules_install, rmmod, modprobe, played some music, echo disk > /sys/power/state: hangs as before.
Does it still hang if you don't play sounds during suspend? Maybe it's an irq issue. Could you try the patch below?
Created attachment 8192: Reinitialize irq in PM
No, I don't even play sound *while* suspending - just before. I'll try the patch - on top of the other patch, or instead of it? Thank you!
I tried this patch on top of the other - doesn't help.
Can you please verify if the problem still happens on 2.6.18?
2.6.18 won't let me suspend. There are some messages scrolling by, but they don't get into dmesg or /var/log/messages and are nearly immediately replaced by the original console screen. I can read (or believe I see :-)

ACPI: PCI interrupt disabled for ...
ACPI: PCI interrupt disabled for ...
ACPI: PCI interrupt disabled for ...
ACPI: PCI interrupt disabled for ...
ACPI: PCI interrupt disabled for ...
Class driver suspend failed for cpu0
Could not power down device firmware: -22
Some devices failed to power down, suspend aborted
PCI: Enabling device ...
This probably is Bug #7188. Please try to remove the acpi_cpufreq modules before the suspend.
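A small sketch of how one might check whether the module is loaded before trying to remove it. It assumes the standard /proc/modules format, where the module name is the first field of each line. `module_loaded` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit status 0) if the named module appears in the modules list.
# $1 = module name, $2 = list to read (default: /proc/modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

With that in place, `module_loaded acpi_cpufreq && rmmod acpi_cpufreq` before `echo disk > /sys/power/state` only attempts the unload when the module is actually present. This requires the driver to be built as a module rather than into the kernel.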
I have now compiled the kernel with acpi_cpufreq as a module, and got it to suspend after playing a sound. It worked when suspending *while* playing, too.