Bug 43155
Description
Matthieu Baerts
2012-04-23 23:28:47 UTC
Created attachment 73058 [details]
A few informations from lspci -vvv
Created attachment 73059 [details]
/proc/modules
Created attachment 73060 [details]
Output of scripts/ver_linux
This bug may be linked to bug #18512 [1], but that one was with an old version of the kernel, and for me the sound works.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=18512

Please give alsa-info.sh output (run with --no-upload option) to this bugzilla. It'll contain the necessary information.

Also, pass probe_mask=0x101 instead of 0x103. The kernel error happens at probing the secondary codec, at least.

Created attachment 73062 [details]
alsa-info (when I use this option: probe_mask=0x103)

(In reply to comment #5)
> Please give alsa-info.sh output (run with --no-upload option) to this
> bugzilla.
> It'll contain the necessary information.

Thank you for this quick answer!

Created attachment 73063 [details]
alsa-info (when I use this option: probe_mask=0x101)

(In reply to comment #6)
> Also, pass probe_mask=0x101 instead of 0x103. The kernel error happens at
> probing the secondary codec, at least.

I have exactly the same problem with probe_mask=0x101, with or without model=medion.

model=medion doesn't play any role in the 3.4 kernel. It was dropped.

Looking at the alsa-info.sh output, the HDMI codec has strange output. What happens if you pass the probe_mask=0x101 enable=1,0 options?

Created attachment 73064 [details]
alsa-info (when I use these options: probe_mask=0x101 enable=1,0)

(In reply to comment #9)
> model=medion doesn't play any role in 3.4 kernel. It was dropped.
>
> Looking at the alsa-info.sh output, the HDMI codec has the strange output.
> What happens if you pass probe_mask=0x101 enable=1,0 options?

Yes, you're right, these options fix the bug! Thank you for the help! :)
But is it a workaround or not?

Then try without the probe_mask option, with only the enable option. If the problem is only about the broken HDMI codec, it should work without the probe_mask override.

If only enable=1,0 suffices, it really means that probing the HDMI codec triggers the problem. And this is more a matter of the video side; e.g. lspci also shows the invalid entry.
Or, the problem is rather in the PCI core...

(In reply to comment #11)
> Then try without probe_mask option but only enable option.
> If the problem is only about the broken HDMI codec, it should work without
> probe_mask override.

Yes, you're right, the probe_mask option is not required.

> If only enable=1,0 suffices, it really means that probing HDMI codec triggers
> the problem. And, this is a thing more about the video side, e.g. lspci shows
> also the invalid entry. Or, the problem is rather in PCI core...

Should I change the component from Sound to PCI?

Well, the cause isn't clear yet. Does HDMI video output work with your machine at all?

Or, it might be because of the vgaswitcher. Since the HDMI video is disabled, the probing of the HDMI audio failed, too.

Check whether the probing works when AMD video is activated. For example, you can reload the snd-hda-intel driver without the enable option after switching to AMD graphics.

Could you give "lspci -vxxx" output? We might be able to check the activity by some register bits.

Created attachment 73065 [details]
alsa-info (without option but when AMD video card is not disabled)

(In reply to comment #13)
> Well, the cause isn't clear yet. Does HDMI video output work with your
> machine at all?

I don't know, I have no HDMI cable to test it.

(In reply to comment #14)
> Or, it might be because of the vgaswitcher. Since the HDMI video is
> disabled, the probing of the HDMI audio failed, too.
>
> Check whether the probing works when AMD video is activated. For example,
> you can reload snd-hda-intel driver without enable option after switching
> to AMD graphics.
>
> Could you give "lspci -vxxx" output? We might be able to check the activity
> by some register bits.

If I don't disable my AMD video card at startup (by removing this line in my /etc/rc.local: echo OFF > /sys/kernel/debug/vgaswitcheroo/switch), I don't have this bug either (as you can see in this new alsa-info file).

Created attachment 73066 [details]
Output of lspci -vxxx
(Note that I've just enabled my AMD video card, but I'm still using my Intel GPU.)

OK, then could you give lspci -vxxx output after turning off HDMI? I'd like to see whether the same check as vgaswitcher can be used to determine the activity of the HDMI codec PCI entry.

Created attachment 73067 [details]
Output of lspci -vxxx after having switched off AMD video card

(In reply to comment #18)
> OK, then could you give lspci -vxxx output after turning off HDMI?
> I'd like to see whether the same check as vgaswitcher can be used to
> determine the activity of HDMI codec PCI entry.

Sure! The AMD video card is now disabled (echo OFF > /sys/kernel/debug/vgaswitcheroo/switch) and here is the output of lspci -vxxx.

Thanks. The patch below is a very simple fix to skip the probing of the dead controller. It means that you'll lose the HDMI codec when the driver is loaded while the video is off. It's not an ideal solution, but it's better than the stall we have now.

At best, we'd need to integrate the switching mechanism via vgaswitcher. But vgaswitcher doesn't seem to support such extra clients (at least as is), so for the 3.4 kernel it'd be safer to take this minimal approach.

Anyway, let me know whether this works around the stall on your machine.

Created attachment 73068 [details]
Test patch for snd-hda-intel
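As an illustration of the idea behind this test patch (not the patch itself): the lspci dump of the switched-off device showed 0xff bytes, so one heuristic is to read the first few configuration dwords and treat the controller as dead when all of them come back as 0xffffffff. The sketch below applies that heuristic in userspace Python to a raw config-space dump. The function name `looks_dead` and the number of dwords checked are assumptions made for this example.

```python
import struct


def looks_dead(config: bytes, ndwords: int = 4) -> bool:
    """Return True if the first `ndwords` little-endian dwords all read 0xffffffff.

    `config` is a raw PCI configuration-space dump. `ndwords` is an
    illustrative choice, not a value taken from the test patch.
    """
    if len(config) < ndwords * 4:
        raise ValueError("config dump too short")
    dwords = struct.unpack_from("<%dI" % ndwords, config, 0)
    return all(d == 0xFFFFFFFF for d in dwords)


# Possible usage on a live system (device address taken from this report):
#   with open("/sys/bus/pci/devices/0000:01:00.1/config", "rb") as f:
#       print(looks_dead(f.read()))
```

A dump in which any of the checked dwords holds a real value (for example a valid vendor/device ID) is treated as alive.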
Thank you for this patch!

I applied your patch, but now I get a kernel Oops. I added a few printk calls, and it seems to crash when it calls this function:
pci_read_config_dword(pci, i * 4, &tmp);
where i = 0 (sound/pci/hda/hda_intel.c:azx_pci_sanity_check:2910).

It crashes with this message:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<(...)>] free_percpu+0x6c/0x130
(...)
Call Trace:
module_unload_free+0xde/0xf0
load_module+0x3f2/0x5e0
? module_sect_show+0x30/0x30
sys_init_module+0x60/0x220
system_call_fastpath+0x16/0x1b

I use version 3.4-rc4 of the kernel, and I only compile and install the files in this directory: sound/pci/hda

Can I do something more?

Hmm, the back trace doesn't show anything about the function call of pci_read_config_dword(). How did you figure it out?

Anyway, there was a bug in the module free. Check the patch below instead. It'll give some printks during the azx_pci_sanity_check() call, so we can see whether the crash happens there or not.

Created attachment 73081 [details]
Test patch #2
Created attachment 73082 [details]
Test patch for snd-hda-intel with printk

(In reply to comment #23)
> Hmm, the back trace doesn't show anything about the function call of
> pci_read_config_dword(). How did you figure it out?
>
> Anyway, there was a bug in the module free. Check patch the below instead.
> It'll give some printks during azx_pci_sanity_check() call, so we can see
> whether the crash happens there or not.

As you can see in the files attached to this bug report, I added a printk just before this line:
if (!azx_pci_sanity_check(pci))
just to check that pci != NULL, and it was.
And I added a printk just before:
pci_read_config_dword(pci, i * 4, &tmp);
I only saw one printk (with i = 0) and then it crashed.

Just before the crash, I saw these messages:
hda-intel: check patch: ffff8801218fe000
hda-intel: video is turned off via switcher?
hda-intel: pci_read_config_dword with 0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
(...)

But I can test your new patch if needed.

OK, it means that calling pci_read_config_*() triggers the Oops immediately in this state. We need a good way to tell that this PCI device is practically dead. I used pci_read_config_*() just because I've seen these 0xff in your lspci output.

Maybe we need to inspect the fields in struct pci_dev to see whether any good sign is found...

At least, you can try to replace pci_read_config_dword() with
raw_pci_read(pci_domain_nr(pci->bus), pci->bus->number, pci->devfn, i * 4, &tmp);

It seems an argument is missing:
/opt/linux_bug/build/linux-3.4-rc4/sound/pci/hda/hda_intel.c:2913:10: error: too few arguments to function ‘raw_pci_read’
raw_pci_read needs 6 arguments:
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn, int reg, int len, u32 *val)

Pass 4 for the len argument.

Now I get this warning when I try to compile hda_intel.c:
WARNING: "raw_pci_read" [/opt/linux_bug/build_dep/linux-3.4-rc4/sound/pci/hda/snd-hda-intel.ko] undefined!
And at startup, I get no message about hda-intel, only:
[ 28.929101] snd_hda_intel: Unknown symbol raw_pci_read (err 0)
[ 28.929125] snd_hda_intel: Unknown symbol raw_pci_read (err 0)
[ 28.929914] snd_hda_intel: Unknown symbol raw_pci_read (err 0)

It seems this doesn't work either. OK, let's take another approach.

The patch below checks the VGA adapter entry corresponding to the audio codec. The check is the same as in vga_switcheroo.

Created attachment 73095 [details]
Test patch #3
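The approach described for this patch depends on pairing the HDMI audio controller with its VGA adapter. A minimal sketch of that pairing follows, assuming the audio controller is a higher-numbered function of the same PCI slot whose function 0 is the GPU (as with 0000:01:00.1 and 0000:01:00.0 on this machine). The helper name `bound_vga_address` is hypothetical, and this is not the patch's code; a real check would also look at the vendor of the adapter before trusting the pairing.

```python
import re

# domain:bus:slot.function, e.g. "0000:01:00.1"
_PCI_ADDR = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def bound_vga_address(audio_addr: str) -> str:
    """Return the address of function 0 in the same slot as `audio_addr`.

    Assumption for this sketch: the GPU bound to an HDMI audio
    controller sits at function 0 of the same domain, bus and slot.
    """
    match = _PCI_ADDR.match(audio_addr)
    if match is None:
        raise ValueError("not a PCI address: %r" % audio_addr)
    domain, bus, slot, _function = match.groups()
    return "%s:%s:%s.0" % (domain, bus, slot)
```

For the onboard controller at 0000:00:1b.0 the function simply returns the same address, which is why the vendor check matters before treating the result as a switchable GPU.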
If the latest patch works, try the git branch topic/vga-switcheroo in the sound git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git topic/vga-switcheroo

It's based on 3.4-rc4, so you can pull the branch over Linus' tree. It contains the support for vga-switcheroo in the HD-audio driver. Note that the patches there are completely untested!!

(In reply to comment #30)
> It seems this doesn't work either. OK, let's take another approach.
>
> The patch below checks the VGA adapter entry corresponding to the audio
> codec.
> The check is as same as in vga_switcheroo.

Thank you for your help!

I tried this patch, but now I get a crash (unable to handle kernel NULL pointer dereference). I added a few printk calls, and for the first card it calls your new function (`check_hdmi_disabled`). It seems the vendor is not nVidia, ATI or AMD: this function returns 'false'. Then it calls `azx_create` and I get a crash. But a few seconds later, I can see the output of a printk I added just before the line with `azx_codec_create`.

But about vga-switcheroo: I added this line to my /etc/rc.local:
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
But when is this command run? dmesg gives me this output with Linux 3.2:
[ 26.913172] snd_hda_intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[ 26.913548] snd_hda_intel 0000:00:1b.0: irq 49 for MSI/MSI-X
[ 26.913927] snd_hda_intel 0000:00:1b.0: setting latency timer to 64
[ 26.968374] input: HDA Intel Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input6
[ 26.968550] input: HDA Intel Headphone as /devices/pci0000:00/0000:00:1b.0/sound/card0/input7
[ 26.969021] snd_hda_intel 0000:01:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[ 26.969089] snd_hda_intel 0000:01:00.1: irq 50 for MSI/MSI-X
[ 26.969113] snd_hda_intel 0000:01:00.1: setting latency timer to 64
[ 26.994587] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input8
(...)
[ 30.122885] radeon: switched off
(...)
[ 32.893359] hda-intel: spurious response 0x0:0x0, last cmd=0x370100
[ 32.893363] hda-intel: spurious response 0x0:0x0, last cmd=0x370100

It seems that my ATI video card is enabled and then switched off (I guess this card is initialised before 'vga-switcheroo' disables it). Also, it seems my ATI video card wants to use this HDA Intel module, but why? Why snd_hda_intel and not snd_hda_hdmi? Is it a bug or is it correct? Sorry for my lack of knowledge about that...

Do you want me to test your git branch?

You applied only the patch in comment 31 without any other patches, right?

If check_hdmi_disabled() returns false, it's fine, and it must not lead to a crash. There is no code change in that path. If it really triggered, something else was wrong. Double-check that you have a clean 3.4-rc and that only that patch is applied.

Also, please build the kernel with CONFIG_FRAME_POINTER if not selected. This will give a more useful stack trace at Oops. Then attach the Oops output in the bugzilla, too.

Note that your hardware has two audio controllers: one for Intel and one for AMD HDMI. snd-hda-intel is the only driver for all HD-audio controllers, no matter which vendor it is. Then there are sub-modules, snd-hda-codec and snd-hda-codec-*. These correspond to the HD-audio codec chips, not controller chips.

If you switch off the VGA later _after_ loading the sound driver, then it can be a problem. The patch in comment 31 checks only the availability at module loading time. For the runtime switch, proper support for vga-switcheroo would be necessary. The branch in the sound git tree should do that, but it's totally untested, of course. So, try it at your own risk.

BTW, I'm going to be off from tomorrow for a week, so further follow-up will be delayed.

(In reply to comment #34)
> Did you apply only the patch in comment 31 without any other patches, right?

Yes, I did.

> If check_hdmi_disabled() returns false, it's fine, and it must not lead to
> crash. There is no code change in that path. If it really triggered,
> something else was wrong. Double-check that you have a clean 3.4-rc and only
> that patch is applied.

Yes, I've already checked... Strange. Or maybe it's just because the two cards are initialised at the same time?

> Also, please build the kernel with CONFIG_FRAME_POINTER if not selected.

Yes, I have: CONFIG_FRAME_POINTER=y

> Then, attach the Oops output in the bugzilla, too.

I'm sorry, but how can I do that? When I get a crash at startup, I don't have any log about this crash in /var/log. Should I add an option at startup? Or is it possible to reload all the sound and video modules afterwards?

> Note that your hardware has two audio controllers: one for Intel and one for
> AMD HDMI. snd-hda-intel is the only driver for all HD-audio controllers, no
> matter which vendor is. Then there are sub modules, snd-hda-codec and
> snd-hda-codec-*. These correspond to the HD-audio codec chips, not
> controller chips.

OK, thank you for this explanation :)

> If you switch off the VGA later _after_ loading the sound driver, then it can
> be a problem. The patch in comment 31 checks only the availability at module
> loading time. For the runtime switch, a proper support for vga-switcheroo
> would be necessary. The branch in sound git tree should do that, but it's
> totally untested, of course. So, try it at your own risk.

I'll try your new patches asap!

> BTW, I'm going to be off from tomorrow for a week, thus the further follow-up
> will be delayed.

No problem! Thank you again for your help!

(In reply to comment #35)
> (In reply to comment #34)
> > Did you apply only the patch in comment 31 without any other patches,
> > right?
>
> Yes it is.
>
> > If check_hdmi_disabled() returns false, it's fine, and it must not lead to
> > crash. There is no code change in that path. If it really triggered,
> > something else was wrong. Double-check that you have a clean 3.4-rc and
> > only that patch is applied.
>
> Yes, I've already checked... Strange. Or maybe it's just because the two
> cards are initialised at the same time?

Maybe. It just looks as if it crashes at that point.

> > Also, please build the kernel with CONFIG_FRAME_POINTER if not selected.
>
> Yes, I have: CONFIG_FRAME_POINTER=y
>
> > Then, attach the Oops output in the bugzilla, too.
>
> I'm sorry but how can I do that? When I've a crash at startup, I don't have
> any log about this crash in /var/log. Should I have to add an option at
> startup?

So, doesn't your system start up at all after the Oops? If you still have control and a chance to run "dmesg", that would be best...

> Or is it possible to reload all modules about the sound and the video after?

Yes. Add "blacklist snd-hda-intel" to a modprobe.d/* file, and the sound driver won't be loaded at boot time. Then load the snd-hda-intel module manually via modprobe. If the machine hangs after that, try to capture the Oops screen with a digital camera.

Also, try to load the module with the enable=1,0 option at first. There should be no crash. If that passes, you can try with enable=0,1 instead. This will load only the AMD part.

Created attachment 73199 [details]
Output of dmesg with a crash (topic/vga-switcheroo)
Hello,
I tested the new patches and it crashes. (I'm attaching the output of dmesg.)
I just added 3 printk in vga_switcheroo.c (in these functions: vga_switcheroo_register_client, vga_switcheroo_register_audio_client, register_client => "=== matttbe: vga_switcheroo: __func__")
Now here is the content of /sys/kernel/debug/vgaswitcheroo/switch
0:IGD-Audio:+:Pwr:0000:00:02.0
1:IGD-Audio: :Pwr:0000:01:00.0
2:IGD-Audio:+:Pwr:0000:00:1b.0
(note that I still have this line in my rc.local: echo OFF > /sys/kernel/debug/vgaswitcheroo/switch)
If I set it to OFF a little later (at 1896.902163 according to the dmesg), this is what I get:
0:IGD-Audio:+:Pwr:0000:00:02.0
1:IGD-Audio: :Off:0000:01:00.0
2:IGD-Audio:+:Pwr:0000:00:1b.0
(and the sound still works)
I can add a few printk in hda_intel.c if you want.
Thanks. The sysfs output indicates that there are some bugs in the vga_switcheroo client registration code. The clients should be DIS, IGD, and DIS-Audio. (And the audio client should be 01:00.1.) I'll take a look at it next week.

But it'd also be helpful if you could figure out where the actual Oops occurs, e.g. by adding printk's.

Created attachment 73204 [details]
[topic/vga-switcheroo] Fixed a kernel Oops

(In reply to comment #38)
> Thanks. The sysfs output indicates that some bugs in the vga_switcheroo
> client registration code. The clients should be DIS, IGD, and DIS-Audio.
> (And the audio client should be 01:00.1). I'll take a look at it in the
> next week.

Thank you.

> But, it'd be also helpful if you can figure out where the actual Oops occurs,
> e.g. by adding printk's.

It seems this Oops is caused by the 'pci_slot_name' function used in printk's. (Can we use it if the card is disabled?)
Without this function (or with this tiny attached patch), it no longer crashes and it seems this bug is fixed: no freeze at startup, no error messages from hda in the dmesg, and the sound works :)

About vga_switcheroo:
# cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD-Audio:+:Pwr:0000:00:02.0
1:IGD-Audio: :Off:0000:01:00.0
2:IGD-Audio:+:Pwr:0000:00:1b.0
3:DIS-Audio: :Off:0000:01:00.1

Created attachment 73205 [details]
Test patch for topic/vgaswitcheroo with a few printk
I don't know if it's helpful but I added a few printk
Created attachment 73206 [details]
Output of dmesg without the crash (topic/vga-switcheroo)
And this is the output of dmesg with the previous patch.
Could you pull topic/vga-switcheroo again into a clean tree? I fixed some bugs there and the branch was rebased. The topmost commit ID is 0e4c0559c6519d040111392a964b74640c4f14e4. Check it after you have pulled again.

I fixed a few things again (e.g. removing pci_slot_name() calls). The topmost commit is

commit 5a19407b096971a11d0e6713764b9b5ebb281afe
Author: Takashi Iwai <tiwai@suse.de>
Date: Thu Apr 26 12:23:42 2012 +0200

ALSA: hda - Support VGA-switcheroo

Again one more fix. The topmost commit is now 3a93aac91be0b7a6562b74c6d8602fc96fa2c7bc.

Created attachment 73226 [details]
Output of dmesg without the crash 2012-05-09 (topic/vga-switcheroo)
Hello,
With the latest version available on your git branch, it works fine!
Thank you :)
About vga-switcheroo:
$ cat /sys/kernel/debug/vgaswitcheroo/switch
0:DIS: :Off:0000:01:00.0
1:IGD:+:Pwr:0000:00:02.0
2:DIS-Audio: :Off:0000:01:00.1
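For readers who want to check such a listing programmatically, here is a small parsing sketch. It assumes each line has the form index:name:flag:power:PCI address, with "+" in the flag field marking the active client, which matches the listings in this report. The `Client` type and `parse_switch` function are names made up for this example.

```python
from typing import List, NamedTuple


class Client(NamedTuple):
    index: int
    name: str      # e.g. "IGD", "DIS", "DIS-Audio"
    active: bool   # True when the flag field is "+"
    power: str     # e.g. "Pwr" or "Off"
    pci_addr: str  # e.g. "0000:01:00.1"


def parse_switch(text: str) -> List[Client]:
    """Parse the lines of a vgaswitcheroo 'switch' listing into Client tuples."""
    clients = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only four times: the PCI address itself contains colons.
        index, name, flag, power, addr = line.split(":", 4)
        clients.append(Client(int(index), name, flag == "+", power, addr))
    return clients
```

With the three lines above, the parser reports IGD as the only active client and both DIS and DIS-Audio as powered off.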
I also added a few printk's (just to confirm that everything works fine), and I'm going to attach the output of the dmesg command and the patch with the printk's.
Have a nice day,
Created attachment 73227 [details]
Test patch for topic/vgaswitcheroo with a few printk 2012-05-09
Could you try again with the topic/hda-switcheroo branch (not topic/vga-switcheroo)? This contains the patches based on the latest DRM tree. If this is confirmed to work, we can finally merge the patches for the 3.5 kernel. Thanks.

I found some obvious bugs and fixed them again. The fixed version of the topic/hda-switcheroo branch begins with the commit a82d51ed24bb7994f1f3dff18ec2eefe19385840. Check whether you have this one when you are testing.

Created attachment 73407 [details]
Content of the kern.log file with the crash (topic/hda-switcheroo)
Hello and sorry for the delay...
I've finally compiled the version from your topic/hda-switcheroo branch, but there is a crash (I got a black screen with a few yellow/white pixels).
It seems this bug is due to this i915 module but here you can find the content of the kern.log file with the backtrace of the crash.
PS: I tried to compile your 'master' branch but I got this error:
ERROR: "handle_edge_irq" [drivers/gpio/gpio-pch.ko] undefined!
ERROR: "irq_to_desc" [drivers/gpio/gpio-pch.ko] undefined!
make[3]: *** [__modpost] Error 1
make[2]: *** [modules] Error 2

Yes, the crash of i915 is known, and it was fixed in the drm tree. All the patches have already been merged upstream for 3.5-rc1, so try the latest Linus git tree instead.

I confirm that this bug has been fixed by your modifications, available in today's Linus git tree. Thank you for your help! :)
Have a nice day,
Matt

A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit 3e9e63dbd3745ba9ea10f0f86c93f4086c89d5b8
Author: Takashi Iwai <tiwai@suse.de>
Date: Thu Apr 26 14:29:48 2012 +0200

vga_switcheroo: Add the support for audio clients

A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit 9121947d696df7ea259c0102e449da9621b9cf92
Author: Takashi Iwai <tiwai@suse.de>
Date: Thu Apr 26 12:13:25 2012 +0200

ALSA: hda - Check the dead HDMI audio controller by vga-switcheroo