Bug 52181
Description
Francesco Muzio
2013-01-02 20:58:32 UTC
Even with my solution the problem sometimes reappears, and when it happens I see this error in the dmesg output:
hda-intel: azx_get_response timeout, switching to polling mode: last cmd=0x00df000d

Damn, options added to a module in the file /etc/modules don't work.
Now I have added the options in a .conf file in the /etc/modprobe.d/ directory:
options snd-hda-intel model=acer-aspire probe-mask=1
and it seems to work well on each boot.
If you need more information/debug output, please ask me.

Could you give alsa-info.sh outputs at both working and non-working states, and attach them to this bugzilla?
But I'm wondering how the model=acer-aspire option works for you. This codec chip doesn't have this model string, so it shouldn't change anything...

Created attachment 91031 [details]
a lot of outputs of alsa-info.sh
I have run the alsa-info.sh script many times.
For simplicity, from now on I will call the problem that afflicts me (confirmed by the presence of the message "hda-intel: azx_get_response timeout, switching to polling mode ....") "The problem".
I have a lot of outputs:
1) alsa-info_HDMI_Enabled_NotWork.txt -> contains the output after a boot with the Analog and HDMI devices both enabled. In this situation speaker-test is unable to work because the HDMI device is enumerated/recognized before the Analog one.
But I haven't seen any "azx_get_response timeout" in the dmesg.
The following outputs were taken with HDMI disabled in the BIOS:
2) alsa-info_ko_HDMI_Disabled_noParameter.txt -> contains the output after a boot with snd-hda-intel loaded without parameters. The problem occurred again.
3) alsa-info_ko_HDMI_Disabled_OnlyMaskParameterSet.txt -> contains the output after a boot with snd-hda-intel loaded with probe-mask=1. The problem occurred again.
4) alsa-info_ok_HDMI_Disabled_ModelAndMaskParameterSet.txt -> contains the output after a boot with snd-hda-intel loaded with probe-mask=1 and model=acer-aspire. The problem did not occur.
5) alsa-info_ok_HDMI_Disabled_ModelAndMaskParameterSet_2nd_boot.txt -> contains the output after a boot with snd-hda-intel loaded with probe-mask=1 and model=acer-aspire. The problem did not occur.
6) alsa-info_ok_HDMI_Disabled_noParameter.txt -> contains the output after a boot with snd-hda-intel loaded without parameters. The problem did not occur.
I have uploaded the output of two boots with "snd-hda-intel model=acer-aspire probe-mask=1" because the system sometimes assigns IRQ 45 and sometimes IRQ 46 to the device, but in both situations the device works.
I have tried the solution with only the probe-mask parameter set, but it doesn't work.
For now the snd-hda-intel model=acer-aspire probe-mask=1 modprobe solution seems to work and I haven't yet seen the azx_get_response message in the dmesg output.
I have attached a tar.gz file with all 6 outputs just described.
I hope this helps.
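
Since the options were first placed in /etc/modules and did not take effect, it can help to confirm which values the loaded module actually received. The following is only a sketch: the helper name show_params is made up for illustration, and it assumes the running kernel exposes the snd_hda_intel parameters under /sys/module/snd_hda_intel/parameters/ (sysfs spells the names with underscores).

```shell
# Print "name=value" for each requested parameter found in a module's
# sysfs parameters directory.
# $1 = directory, remaining arguments = parameter names.
# Hypothetical helper, not part of alsa-utils or the kernel tree.
show_params() {
    dir=$1
    shift
    for p in "$@"; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            # The parameter file is missing or unreadable on this kernel.
            printf '%s=<not exposed>\n' "$p"
        fi
    done
}

# Possible use after a boot (assumed path):
#   show_params /sys/module/snd_hda_intel/parameters model probe_mask
```

Comparing this output between a working and a non-working boot shows whether the two boots really differed in the options applied.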
Created attachment 91041 [details]
another output of alsa-info.sh
I have to correct myself: tonight I found the error
hda-intel: azx_get_response timeout, switching to polling mode: last cmd=0x003b8000
in the dmesg output when the module snd-hda-intel had been loaded with both parameters....
I have also attached this output, taken with alsa-info.sh, named alsa-info_ko_HDMI_Disabled_ModelAndMaskParameterSet.txt
You can pass the index=1,0 option to snd-hda-intel to swap the HDMI and analog devices. (But this will screw up the index again when only the analog is loaded, as it assumes there are two instances.)
Since the problem is related to the interrupt, try passing the enable_msi=0 option, too.
Also, the model option you passed is maybe just a coincidence. I guess the string value doesn't matter, and you'll get the same effect even if you pass a string like model=foobar.

Created attachment 91111 [details]
alsa-info output for a boot with snd_hda_intel index=1,0 enable_msi=0
After the first boots I can say it "seems to work", but I must boot my machine many more times to be sure.
I have attached the alsa-info output of my last boot.
A small problem still remains (it also occurs when the 2 audio devices are enabled but not swapped):
when I (or the system) run the command
alsactl restore
this warning appears on stdout:
alsactl: set_control:1464: Cannot write control '3:3:0:Playback Channel Map:0' : File descriptor in bad state
Maybe this problem is not related to the kernel; in any case I have reported it.
Thanks for your support.

Another question: this netbook has a combo jack for microphone and headphones. It works on Windows 8 but not on Linux. When I plug into the combo jack, the audio is redirected to the headphones but the internal microphone continues to record instead of the external one. How can I debug/solve this issue? Is snd-hda-intel able to support a combo jack?

The combo jack on an Acer laptop was already supported in the 3.8 kernel, but it's not applied to the AO725 by default. Try the patch below on the latest Linus tree (or 3.8-rc4).

Created attachment 91441 [details]
Test patch
I have applied the patch to the latest kernel (3.8-rc4) and I'm happy to say "It works!" I have also seen that in the latest version there is no warning when I run "alsactl restore". Very happy, thank you!

OK, the patch is merged to the sound git tree. It'll be included in the next pull request for 3.8-rc5. Thanks for the quick tests.

Created attachment 91811 [details]
dmesg output after a suspend to ram
Unfortunately I must reopen this bug report, because the "azx_get_response timeout" problem (the main problem) still appears after a suspend-to-ram.
I have attached the dmesg log taken after a suspend-to-ram/resume cycle.

A patch referencing this bug report has been merged in Linux v3.8-rc5:
commit ec50b4cea63fdcd1f1c428b93a47986d74c244b8
Author: Takashi Iwai <tiwai@suse.de>
Date: Sat Jan 19 12:17:54 2013 +0100
    ALSA: hda - Add fixup for Acer AO725 laptop

Sorry, there was a little bit of confusion. The patch that you have indicated is related to the external microphone bug. In this bug report I have reported 2 bugs: one has been patched (the external mic) but the other isn't solved.
I was testing the proposed enable_msi=0 solution. But every time I put my netbook into the suspend-to-ram state, the sound doesn't work correctly on resume.
My conclusions:
when I boot with snd_hda_intel index=1,0 enable_msi=0 the sound works until I suspend the machine to RAM (every time);
when I boot with snd_hda_intel index=1,0 enable_msi=1 the sound may not work, but if it works after the boot, it also works after a suspend to RAM.

Are you sure about it? It doesn't explain why it makes such a difference.
Does this happen also with hibernation?
Also, what kernel are you testing? If it's 3.8-rc, please use 3.8-rc5. Until rc4 there is a known bug regarding the resume.
Note that "azx_response timeout" itself isn't a big problem. It may happen on many machines and is mostly harmless (unless going to the single_cmd mode). So, you can't judge the problem only from that message.
If a PCM stream doesn't work (i.e. repeating), check whether the interrupt count changes via any sound access, i.e. PCM stream playback/capture or a mixer value change. If the irq count doesn't change, it's a problem of the PCI controller.
If you face a silent PCM output instead, this might be a different problem.

The problem is always the "repeating" one, reported in the description of this bug report.
And no, the bug doesn't happen if the netbook is suspended to disk.
I have used version 3.7.1 of the kernel available in the Debian experimental branch, because this machine has a 1 GHz CPU and it's a pain to compile an entire kernel.
So I compiled it (3.8-rc4 + patch) only to test your patch, as a "mutilated" kernel (e.g. no wireless) to reduce the compile time, but it's a kernel unusable in any other context. If necessary I can recompile the latest version (v3.8-rc5) to see what happens with it. But it takes time...

Created attachment 91951 [details]
alsa-info.sh output before and after a suspend to ram
I have attached the alsa-info.sh output before and after a suspend to RAM.
I almost forgot:
yes, I know that the "azx_response timeout" warning is mostly harmless, but on my netbook it always occurs when the audio driver stops working correctly.
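
The interrupt-count check suggested earlier in this thread (compare the count before and after some sound access) could be scripted roughly as follows. This is a sketch under assumptions: irq_count is a made-up helper name, and it relies on the usual /proc/interrupts layout in which the per-CPU counters directly follow the IRQ number.

```shell
# Sum the per-CPU interrupt counters of every line that matches a pattern.
# $1 = pattern (e.g. "snd"), $2 = file to read (default: /proc/interrupts).
# Hypothetical helper for illustration only.
irq_count() {
    awk -v pat="$1" '
        $0 ~ pat {
            # Field 1 is the IRQ label ("45:"); the numeric fields after
            # it are the per-CPU counters. Stop at the first non-number.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { print total + 0 }
    ' "${2:-/proc/interrupts}"
}

# Possible use:
#   before=$(irq_count snd)
#   speaker-test -c 2 -l 1 >/dev/null 2>&1
#   after=$(irq_count snd)
#   echo "interrupts during playback: $((after - before))"
```

If the difference stays at zero while a stream is supposedly playing, that points at interrupt delivery rather than at the codec, as described above. The check is only meaningful when the audio IRQ is not shared with another device.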

As mentioned, the most certain way is to check whether the interrupt counts increase. See /proc/interrupts and check whether the number for the audio driver changes when you do something with the audio (supposing it's a non-shared irq).
If it's an interrupt issue, the culprit is more likely at a deeper level, e.g. PCI core, ACPI, etc. (The fact that only S3 is affected implies it, too...)

I have tried the latest kernel (v3.8-rc6). The problem still exists when I suspend my netbook to RAM.
The interrupt count changes frequently during a session of use (fetched with cat /proc/interrupts | grep snd).
I must add a detail: I don't use pulseaudio; every program talks directly to ALSA.
When the problem occurs, programs like speaker-test, mpg123 and VLC freeze. KDE system sounds and some KDE players (like dragon player) seem to keep working correctly despite this. (Debian is running KDE 4.8.x with the phonon-gstreamer backend installed.)
How can I investigate this bug further?

The problem also occurs if I disable the "multicore support" in the BIOS.

Does the problem still happen with the 3.12 kernel? If so, this might be due to some codec verb accesses that this codec doesn't like. In that case, try the patch below.
If the patch helps, try to narrow down the code disabled by the patch. The second ifdef chunk should be harmless. Either the first one or the third one (or both) hits the problem...

Created attachment 116621 [details]
Test fix patch for codec stalls
I'm sorry for the long delay. The bug doesn't happen if I use the pulseaudio sound server, so I am using pulseaudio.
But with recent kernels (v3.12 and v3.13) a strange message has appeared in the dmesg, repeated many times:
hda-codec: out of range cmd 0:20:400:ffffffff
The audio works well, but I don't understand what it means or whether this message is important.
See attachment 123391 [details], which contains the dmesg of my AO725 booted with kernel v3.13.

It means some wrong COEF write has been performed, which is likely the result of a wrong COEF read. It might be the cause of the codec communication stall, so it can be a serious problem.
Please try the patch in comment 24 and see whether this message is gone.

Created attachment 124131 [details]
dmesg output after patch submitted on comment #24

Yes, with attachment 116621 [details] the "hda_codec: out of range cmd..." messages are gone. See the dmesg.

OK, could you check which #if 0 actually fixes the issue? There are three ifdefs, and likely only one of them is effective. My bet is the first one (around the alc269_fill_coef() call).

I have booted my netbook with three different patched kernels, each with only one #if block applied.
With the first #if, the message "hda-codec: out of range cmd....." is not shown.
With the second #if, the message "hda-codec: out of range cmd....." is shown many times, and while more "hda-codec: out of range cmd 0:20:400:ffffffff" lines are being printed the kernel seems to stall; then it boots normally.
With the third #if, the message "hda-codec: out of range cmd....." is shown many times, but the machine boots normally, as when running a standard unpatched kernel.

Thanks for the quick tests. So my guess seems correct; the first chunk is the culprit.
Could you try the patch below instead of the previous one? If this is confirmed to work, I'm going to merge it upstream.

Created attachment 124291 [details]
Fix patch
Created attachment 124521 [details]
dmesg output after patch attachment 124291 [details]

I have tried your attachment 124291 [details] as a patch and it works. See my dmesg.
Thanks

It turned out that the patch brings a problem on other machines with the same codec. (The output is gone.) So, this doesn't seem to be specific to the codec but to the machine.
Could you revert the previous fix and try the next one instead, to see whether it still fixes the problem?

Created attachment 125781 [details]
Better fix patch
Created attachment 125811 [details]
dmesg output of booting kernel patched with attachment 125781 [details]

I have tried this patch but I see some strange behaviour.
After logging in to my user session I heard VLC playing music.
After a few minutes I tried to connect my combo earphones+microphone, but audio does not seem to work with the external earphones (unpatched 3.13 kernels work). However, the external microphone seems to work (the audacity level bar reacts when I speak into it).
Also, after a logout the "hda-codec:" errors return (I see them on stdout).
After more connections/disconnections of the audio jack the sound stopped working, and after the reboot the analog audio devices seem to have disappeared!
But everything seems to return to the standard behaviour after a cold boot with the older (and unpatched) kernel 3.12.
I have to say, I honestly haven't tried the earphones with the previous patches. Should I do it?
The dmesg of the first boot is attached.

Hrm, OK, then we seem to need some of the COEFs, narrowing down the range to disable.
Could you figure out which COEF call triggers the problem? For example, you can put a printk() for each line in alc269_fill_coef() (and its further calls), and see where the problem happens.

I know what the printk() function is, but I don't think I am able to use it in the right way. What do you want to print? What should I use as parameters for the calls of this function?
Do you want to see the result of this printk() on an unpatched kernel or on the kernel with the (latest) patch?

I renew my request: please tell me how to debug the kernel as you mean, and where and how to put the printk in the source code.
Also, I am seeing a reproducible rule: the messages "hda-codec: out of range cmd ....." are printed only if the netbook is running without the AC power connected.
When the netbook is powered from AC I don't see any messages like "hda-codec: out of range cmd" in the dmesg output.

Just put a printk() with a unique string at each line of alc269_fill_coef(), e.g.

static void alc269_fill_coef(struct hda_codec *codec)
{
    struct alc_spec *spec = codec->spec;
    int val;

    if (spec->codec_variant != ALC269_TYPE_ALC269VB)
        return;

    printk("XXX alc269_fill_coef %d\n", __LINE__);
    if ((alc_get_coef0(codec) & 0x00ff) < 0x015) {
        printk("XXX alc269_fill_coef %d\n", __LINE__);
        alc_write_coef_idx(codec, 0xf, 0x960b);
        printk("XXX alc269_fill_coef %d\n", __LINE__);
        alc_write_coef_idx(codec, 0xe, 0x8817);
    }

    printk("XXX alc269_fill_coef %d\n", __LINE__);
    if ((alc_get_coef0(codec) & 0x00ff) == 0x016) {
        printk("XXX alc269_fill_coef %d\n", __LINE__);
        alc_write_coef_idx(codec, 0xf, 0x960b);
        printk("XXX alc269_fill_coef %d\n", __LINE__);
        alc_write_coef_idx(codec, 0xe, 0x8814);
....

and check at which point the "out of range cmd" error happens. The last call before the error message is supposed to be the one triggering the error.

I'm sorry for the delay.
I have made these tests with the latest kernel available on Debian testing (3.14) and I have saved a dmesg output in both cases:
- dmesg_AC: machine booted with AC power plugged in. In this test all seems to work as expected. I have also run VLC with a song, and after this action no more output seems to be printed in the dmesg.
- dmesg_BATT: machine booted without AC power plugged in. In this case the "out of range cmd" error never happens, but I found a lot of "XXX alc269_fill_coef" messages.
In dmesg_BATT please note that after the message
[ 120.378846] XXX alc269_fill_coef 4648
I started playing a song with VLC (and the music works).
I have modified the file patch_realtek.c; this file is attached in the next messages, so you can see the line number assigned to each printk.

Created attachment 136511 [details]
patch_realtek.c with alc269_fill_coef() modified
Created attachment 136521 [details]
dmesg output with AC plugged
Created attachment 136531 [details]
dmesg output without AC plugged
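
The rule given with the printk() example (the last marker printed before the error points at the triggering call) can be applied mechanically to a saved dmesg file. The helper below is only a sketch: the name last_marker_before_error is made up, and the two patterns are simply the strings quoted in this thread.

```shell
# For every "out of range cmd" line in a saved dmesg file, print the most
# recent "XXX alc269_fill_coef" marker seen before it.
# $1 = path to the saved dmesg output.
# Hypothetical helper for illustration only.
last_marker_before_error() {
    awk '
        /XXX alc269_fill_coef/ { last = $0 }
        /out of range cmd/ {
            m = (last == "" ? "(no marker yet)" : last)
            print m " -> " $0
        }
    ' "$1"
}

# Possible use (file name as described in the comment above):
#   last_marker_before_error dmesg_BATT
```

If the file contains no "out of range cmd" line, the helper prints nothing, which matches the case where the error was not reproduced during that boot.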

You have to repeat testing the kernel until you get the problematic state. Or, it might be a timing issue. Then gradually reduce the printk's that look OK, and try until you hit the problem again.
Anyway, the only purpose of this test is to identify which command results in the error. So, without seeing the error, it doesn't help at all, unfortunately.

I have commented out some lines in the previous patch_realtek.c and I have added a printk at the top of the alc269_fill_coef() function, at line 4598.
I have run a kernel where the alc269_fill_coef() function contains the first printk at line 4598 and the last (uncommented) printk at line 4637.
In the dmesg I see that the "out of range" errors are printed before the printk at line 4598 and after the last printk at line 4637.
I think that the message has been raised by an instruction outside this function.

Then you need to figure out the caller of the error. For example, put a "WARN(1);" line at the error message in make_codec_cmd() in hda_codec.c. Then it'll give a stack trace like an Oops.
After that, try to reduce the command verbs by commenting them out or so, to identify the culprit.
But also note that the error message *might* not be the direct culprit of the bug itself. It might be a side-effect. But this is likely the way to trace the original bug.

When I try to compile the kernel with a "WARN(1);" line placed between "printk(KERN_ERR "hda-codec:...." and "return ~0;" I get a compilation error. Maybe it requires two or more arguments...
I'm not skilled with the Linux kernel, but I'm happy to learn more about how to debug it.
Please explain to me how to use the WARN macro correctly.

I'm sorry for the delay, but I'm not yet able to debug the problem.
Can you provide me with an example to understand how to use "WARN(1)" or other equivalent mechanisms to correctly debug the bug?
Thanks in advance

Put it like WARN(1, "XXX some message\n")

Created attachment 146571 [details]
dmesg output without AC plugged
I have patched the latest stable kernel available (3.16).
I have put a single
WARN(1, "WWW WARN\n");
at the end of the line 205 in the hda_codec.c file (between codec_err() and the return instruction in the make_codec_cmd() function)
the dmesg output is attached
ps: for this test I haven't patched the alc269_fill_coef() function with any printk.
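
To pull the stack trace that follows each WARN out of a long dmesg file, something like the sketch below may be enough. The helper name trace_after_marker and the default of 25 context lines are assumptions chosen for illustration; the marker text is the one used in the WARN call above.

```shell
# Print every line containing a fixed marker string, plus the N lines
# that follow it (where the call trace is expected to be).
# $1 = marker text, $2 = saved dmesg file, $3 = line count (default 25).
# Hypothetical helper for illustration only.
trace_after_marker() {
    grep -F -A "${3:-25}" -- "$1" "$2"
}

# Possible use:
#   dmesg > dmesg.txt
#   trace_after_marker 'WWW WARN' dmesg.txt
```

The functions listed in each trace show which caller reached make_codec_cmd() with the out-of-range command.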

What about the patch below?

Created attachment 146581 [details]
Check return value before COEF write for ALC269 variants
Created attachment 146741 [details]
dmesg output without AC plugged
This patch seems to work correctly; I have tested it with VLC on pulseaudio.
See the dmesg, which is free of error messages.