Bug 5763 - ali5451 sound module hangs on swsusp
Summary: ali5451 sound module hangs on swsusp
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-19 22:35 UTC by Ph. Marek
Modified: 2011-07-30 05:20 UTC
CC List: 5 users

See Also:
Kernel Version: debian 2.6.14-1, -2 and plain 2.6.15-rc5, 2.6.16-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg of boot (12.46 KB, text/plain)
2006-02-20 00:33 UTC, Ph. Marek
/proc/interrupts (549 bytes, text/plain)
2006-02-20 00:35 UTC, Ph. Marek
/proc/ioports (996 bytes, text/plain)
2006-02-20 00:35 UTC, Ph. Marek
/proc/interrupts with 2.6.17rc4 (549 bytes, text/plain)
2006-05-21 23:08 UTC, Ph. Marek
dmesg with 2.6.17rc4 (13.04 KB, text/plain)
2006-05-21 23:08 UTC, Ph. Marek
Fix chip initialization in resume of ali5451 (1.48 KB, patch)
2006-05-22 10:35 UTC, Takashi Iwai
Reinitialize irq in PM (992 bytes, patch)
2006-05-23 06:21 UTC, Takashi Iwai

Description Ph. Marek 2005-12-19 22:35:15 UTC
Most recent kernel where this bug did not occur: ?
Distribution: debian unstable with 2.6.15-rc5 from kernel.org
Hardware Environment: x86 notebook - Acer Travelmate 529TXV
Software Environment: 
Problem Description:

On starting swsusp (echo disk > /sys/power/state) the machine hangs if the
module snd_ali5451 is loaded.
If that module is unloaded first, it works fine.

The problem is that unloading is not always easy (tasks like artsd or kmix
accessing sound devices) and that on unload the mixer settings are lost.

The messages on swsusp are (on the suspend console):
  swsusp: Need to copy 8240 pages
  swsusp: critical section/: done (8240 pages copied)
  PCI: Setting latency timer of device 0000:00:01.0 to 64
  ACPI: PCI Interrupt 0000:00:06.0[A] -> Link [PILH] -> GSI 10 (level, low) -> IRQ 10
  ACPI: PCI Interrupt 0000:00:10.0[A] -> no GSI - using IRQ 15
  ACPI: PCI Interrupt 0000:00:13.0[A] -> Link [PILB] -> GSI 11 (level, low) -> IRQ 11
  Yenta O2: res at 0x94/0xD4: ea/00
  Yenta O2: enabling read prefetch/write burst
  ACPI: PCI Interrupt 0000:00:13.1[A] -> Link [PILB] -> GSI 11 (level, low) -> IRQ 11
  ACPI: PCI Interrupt 0000:00:14.0[A] -> Link [PILI] -> GSI 11 (level, low) -> IRQ 11
After that it just hangs.

.config highlights:
PREEMPT_NONE=y
X86_UP_APIC=y
X86_UP_IOAPIC=y
X86_LOCAL_APIC=y
X86_IO_APIC=y
NOHIGHMEM=y
HZ_100=y
HZ=100
ACPI=y
Comment 1 Takashi Iwai 2006-01-19 04:12:34 UTC
Try 2.6.16-rc1.  If it's related to a soft lock-up (too long a delay in an
irq-disabled context), it was already fixed in that version.
Comment 2 Ph. Marek 2006-01-19 23:19:49 UTC
No, doesn't work.

I get two additional lines on swsusp after the ones reported before:

  pnp: Failed to activate device 00:04.
  pnp: Failed to activate device 00:09.

There it hangs.

SYSRQ-Pc shows much output, but I can't scroll back - so here are the visible lines:
  common_interrupt+0x1a/0x20
  __do_softirq+0x2c/0x7d
  do_softirq+0x22/0x26
  common_interrupt+0x1a/0x20
  enable_irq+0x8e/0x93
  ide_config_drive_speed+0x15a/0x355
  ali15x3_tune_chipset+0x15e/0x166
  config_chipset_for_dma+0x24/0x33
  ali15x3_config_drive_for_dma+0x7e/0xfa
  ide_do_request+0x525/0x72a

and down to syscall_call+0x7/0xb.

Hope that helps!
Comment 3 Ph. Marek 2006-01-19 23:22:32 UTC
BTW, after SYSRQ+PC, SYSRQ no longer works.
Before that I could, e.g., do SYSRQ+H.

Now the machine is completely dead; I have to power it off.
Comment 4 Takashi Iwai 2006-01-31 02:32:28 UTC
According to the report, it's rather ali15x3 which causes soft lockup...
Comment 5 Ph. Marek 2006-01-31 02:46:15 UTC
Maybe it's some kind of interference - but I definitely unload snd_ali5451.
Comment 6 Ph. Marek 2006-02-20 00:32:16 UTC
Tested 2.6.16rc4; on suspend:
   pnp: Failed to activate device 00:05.
   pnp: Device 00:09 activated.
   pnp: Failed to activate device 00:0a.
SYSRQ-PC:
   common_interrupt, __do_softirq, do_softirq, do_IRQ, common_interrupt,
   enable_irq, ide_config_drive_speed, ali15x3_tune_chipset, 
   config_chipset_for_dma, ide_do_request, ide_do_drive_cmd, generic_ide_resume,
   blk_end_sync_rq, resume_device, dpm_resume, device_resume, pm_suspend_disk,
   enter_state, state_store, sysfs_write_file, vfs_write, sys_write, syscall_call
SYSRQ does still work - that's better than with 2.6.16rc1.

Comment 7 Ph. Marek 2006-02-20 00:33:42 UTC
Created attachment 7413
dmesg of boot

done with 2.6.16rc4
Comment 8 Ph. Marek 2006-02-20 00:35:03 UTC
Created attachment 7414
/proc/interrupts
Comment 9 Ph. Marek 2006-02-20 00:35:37 UTC
Created attachment 7415
/proc/ioports
Comment 10 Pavel Machek 2006-05-19 02:56:56 UTC
Does it still happen in 2.6.17-rc4?
Comment 11 Ph. Marek 2006-05-19 04:51:49 UTC
Sorry, no.

2.6.17rc4 says on suspend:
  pnp: Failed to activate device 00:05.
  pnp: Device 00:09 activated.
  pnp: Failed to activate device 00:0a.

00:05 is the PS2 keyboard (i8042), 00:0a is INT12 (i8042)

SYSRQ/P shows
__do_IRQ, common_interrupt, do_softirq, common_interrupt, enable_irq, 
ali15x3_tune_chipset, ali15x3_config_drive_for_dma, ide_do_drive_cmd, ...


Anything else I can do to help?
Comment 12 Pavel Machek 2006-05-20 09:52:44 UTC
You mean it is fixed in 2.6.17-rc4? That would be good news, no?

Otherwise try noapic, nolapic, and show us /proc/interrupts.
Comment 13 Ph. Marek 2006-05-21 23:05:00 UTC
Sorry, I wasn't clear.
No, it does not work.

Attached /proc/interrupts, a dmesg of the boot.
Last lines on suspend as written above.

Thank you for your efforts!
Comment 14 Ph. Marek 2006-05-21 23:08:03 UTC
Created attachment 8161
/proc/interrupts with 2.6.17rc4
Comment 15 Ph. Marek 2006-05-21 23:08:36 UTC
Created attachment 8162
dmesg with 2.6.17rc4
Comment 16 Pavel Machek 2006-05-22 02:36:29 UTC
Interrupt is shared between eth0 and soundcard, AFAICT. Can you try what happens
when you suspend/resume without eth0 drivers?
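
One way to check for shared interrupt lines is to filter /proc/interrupts for entries that list more than one device. The following is a minimal sketch, assuming the usual layout in which devices sharing an IRQ are separated by commas; the function name shared_irqs is hypothetical.

```shell
#!/bin/sh
# Print the /proc/interrupts entries whose IRQ is shared by several devices.
# Assumption: shared handlers appear as a comma-separated list on one line.

shared_irqs() {
    # $1 = file to read (defaults to the live /proc/interrupts)
    # Keep numbered IRQ lines only, and of those only the ones with a comma.
    awk '/^ *[0-9]+:/ && /,/ { print }' "${1:-/proc/interrupts}"
}

# Only query the live file where it exists (Linux).
if [ -r /proc/interrupts ]; then
    shared_irqs
fi
```

If the sound card and eth0 show up on the same output line, they share an IRQ. A saved copy of the file can be passed as the first argument instead.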
Comment 17 Ph. Marek 2006-05-22 02:57:05 UTC
Booted with 2.6.17rc4 *without* giving noapic nolapic.
Removed e100 module
echo disk > /sys/power/state
Machine hangs.

Same with "noapic nolapic".
Comment 18 Pavel Machek 2006-05-22 03:00:56 UTC
> Booted with 2.6.17rc4 *without* giving noapic nolapic.
> Removed e100 module
> echo disk > /sys/power/state
> Machine hangs.
> 
> Same with "noapic nolapic".

Thanks for test => it is probably not shared interrupt problem. If you
remove the sound modules, does it survive suspend?

								Pavel
Comment 19 Ph. Marek 2006-05-22 03:11:39 UTC
Yes.
My swsusp.sh tries to save the volume levels, kills all sound users (kmix, 
artsd, ogg123, etc.), removes the module, and suspends.

On wake-up it loads the module, loads arts and kmix and re-sets the volume 
levels.
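
The workaround described above could look roughly like the sketch below. It is not the reporter's actual swsusp.sh: the use of alsactl and killall, the state file path, and the function names are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a suspend wrapper that unloads snd_ali5451 before suspending.
# Assumptions (not from the report): alsactl saves/restores the mixer,
# killall stops the sound clients, and the state file path is arbitrary.

DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only, 0 = really run them
MIXER_STATE="${MIXER_STATE:-/tmp/ali5451-mixer.state}"   # hypothetical path
SOUND_USERS="kmix artsd ogg123"   # sound clients named in the report

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

suspend_cycle() {
    run alsactl store -f "$MIXER_STATE"        # save volume levels
    for p in $SOUND_USERS; do
        run killall "$p"                       # free the sound devices
    done
    run modprobe -r snd_ali5451                # unload the module
    run sh -c 'echo disk > /sys/power/state'   # returns after resume
    run modprobe snd_ali5451                   # reload after wake-up
    run alsactl restore -f "$MIXER_STATE"      # restore volume levels
}

suspend_cycle
```

Running it with DRY_RUN=0 as root would perform the real cycle. Restarting arts and kmix after resume is left out here, since those normally run inside the user's desktop session.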

Comment 20 Pavel Machek 2006-05-22 03:21:42 UTC
So it works if you rmmod ali before suspend. Good.

Now... could you look at differences between _suspend and rmmod code
paths (and _resume and insmod) to figure out what is wrong? printk()
is your friend...

Comment 21 Ph. Marek 2006-05-22 03:43:37 UTC
AFAICT _suspend and _remove are completely different.
They both do a pci_get_drvdata(), but _remove just calls 
   pci_set_drvdata(pci, NULL);
whereas _suspend does a fair bit of suspending single pieces ...


Can you give me a hint where to start?
Comment 22 Pavel Machek 2006-05-22 03:54:45 UTC
> AFAICT _suspend and _remove are completely different.
> They both do a pci_get_drvdata(), but _remove just calls 
>    pci_set_drvdata(pci, NULL);
> whereas _suspend does a fair bit of suspending single pieces ...
> 
> 
> Can you give me a hint where to start?

I'm not ALSA expert, sorry. It should be possible to duplicate _remove
routine into suspend... there's no reason why these should be
different. Or try to find author of that _suspend piece or something
like that.
								Pavel
Comment 23 Ph. Marek 2006-05-22 04:01:29 UTC
As of 
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;h=353b28bafd1b962359a866ff263a7fad833d29a1;f=sound/pci/ali5451/ali5451.c
there's been quite some activity.

Takashi, you're already on the CC:-list.
Can you help me?

Thank you in advance!
Comment 24 Takashi Iwai 2006-05-22 10:33:35 UTC
Hmm, how about the uploaded patch?
Comment 25 Takashi Iwai 2006-05-22 10:35:15 UTC
Created attachment 8176
Fix chip initialization in resume of ali5451
Comment 26 Ph. Marek 2006-05-22 22:16:36 UTC
Sorry, doesn't work.

Booted the machine, patched the module, modules_install, rmmod, modprobe, 
played some music, echo disk > /sys/power/state:
hangs as before.

Comment 27 Takashi Iwai 2006-05-23 06:19:52 UTC
Still hangs up if you don't play sounds during suspend?

Maybe it's an irq issue.  Could you try the patch below?
Comment 28 Takashi Iwai 2006-05-23 06:21:48 UTC
Created attachment 8192
Reinitialize irq in PM
Comment 29 Ph. Marek 2006-05-23 22:26:19 UTC
No, I don't even play sound *while* suspending - just before.
I'll try the patch - on top of the other patch, or instead of it?


Thank you!
Comment 30 Ph. Marek 2006-05-24 00:05:21 UTC
I tried this patch on top of the other - doesn't help.
Comment 31 Rafael J. Wysocki 2006-09-29 03:10:08 UTC
Can you please verify if the problem still happens on 2.6.18?
Comment 32 Ph. Marek 2006-09-29 04:53:00 UTC
2.6.18 won't let me suspend.
There are some messages scrolling by, but they don't get into dmesg 
or /var/log/messages and are nearly immediately replaced by the original 
console screen.

I can read (or believe to see :-)
  ACPI: PCI interrupt disabled for ...
  ACPI: PCI interrupt disabled for ...
  ACPI: PCI interrupt disabled for ...
  ACPI: PCI interrupt disabled for ...
  ACPI: PCI interrupt disabled for ...
  Class driver suspend failed for cpu0
  Could not power down device firmware: -22
  Some devices failed to power down, suspend aborted
  PCI: Enabling device ...
Comment 33 Rafael J. Wysocki 2006-09-29 05:15:47 UTC
This probably is Bug #7188.  Please try to remove the acpi_cpufreq modules
before the suspend.
Comment 34 Ph. Marek 2006-09-29 05:38:34 UTC
I have now compiled the kernel with acpi_cpufreq as a module, and got it to
suspend after playing a sound.
It worked when suspending *while* playing, too.
