Bug 60769
Description
Alexander E. Patrakov
2013-08-19 15:15:18 UTC
Created attachment 107242 [details]
alsa-info.sh output
The relevant card is 00:03.0 (Intel Haswell HDMI)
Created attachment 107243 [details]
Kernel config
Created attachment 107245 [details]
Full dmesg
Created attachment 107267 [details]
Full dmesg after BIOS update
It was suggested that I update the BIOS and retry with -D hw:0,7. That didn't help: I still get the "playback write error (DMA or IRQ trouble?)" message even with the F4 version of the BIOS.
Created attachment 107373 [details]
/proc/interrupts, three times
On IRC, ohsix asked me for /proc/interrupts. Here is what I did:
cat /proc/interrupts > interrupts.txt
pasuspender -- aplay -D hw:1,0 /usr/share/sounds/startup3.wav # works, plays through analog output
cat /proc/interrupts >> interrupts.txt
pasuspender -- aplay -D hw:0,3 /usr/share/sounds/startup3.wav # hangs => ctrl+c
cat /proc/interrupts >> interrupts.txt
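The three snapshots appended to interrupts.txt can also be compared mechanically rather than by eye. The following is only a minimal Python sketch, assuming the usual /proc/interrupts layout (a header line of CPU names, then one row per IRQ with per-CPU counts followed by a description). The function names are illustrative and not part of any ALSA tool.

```python
from itertools import takewhile


def split_snapshots(text):
    """Split concatenated /proc/interrupts dumps at each 'CPU0 ...' header."""
    snapshots = []
    for line in text.splitlines():
        if line.split()[:1] == ["CPU0"]:
            snapshots.append([])      # a header line starts a new snapshot
        if snapshots and line.strip():
            snapshots[-1].append(line)
    return snapshots


def irq_totals(snapshot):
    """Map each IRQ label to (count summed over all CPUs, description)."""
    ncpu = len(snapshot[0].split())   # number of CPU columns in the header
    totals = {}
    for line in snapshot[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Leading numeric fields are the per-CPU counters (at most ncpu of them).
        counts = list(takewhile(str.isdigit, fields[:ncpu]))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(map(int, counts)), description)
    return totals


def irq_deltas(before, after):
    """Return how many interrupts each IRQ line received between two snapshots."""
    return {irq: after[irq][0] - before[irq][0] for irq in after if irq in before}
```

Feeding the contents of interrupts.txt to split_snapshots() and comparing consecutive snapshots with irq_deltas() would show whether the snd_hda_intel line of the HDMI card advanced at all during the hanging aplay run.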
Backup info from Alexander: "according to the log, this machine claims to have three HDMI outputs, while in fact it has one DVI (possibly wired for HDMI, at least ELD is successfully transferred over it) and two "real" HDMI. All of them fail to transfer audio in the same way." More info possibly relevant to this bug: the board appears to have general IRQ problems. 1. The analog audio sometimes stutters. There is the following message in dmesg while playing a DVD (note: this is about the different card!): [ 2158.573591] hda-intel: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj. 2. Sometimes the keyboard or mouse exhibits lags (characters get delayed or the cursor sticks for several hundred milliseconds, while other screen updates appear as they should). Initially I attributed them to the fact that they are wireless (and that's the first time I use a wireless keyboard and mouse) and suspected interference from some unknown source. But, given the above IRQ-related message, I decided to tell you about that, too. your still have deadlock and oops 1.351918] ====================================================== [ 1.351989] [ INFO: possible circular locking dependency detected ] [ 1.352056] 3.11.0-rc5+ #1 Not tainted [ 1.352122] ------------------------------------------------------- [ 1.352195] crda/311 is trying to acquire lock: [ 1.352383] microcode: CPU5 sig=0x306c3, pf=0x2, revision=0x9 [ 1.352258] (genl_mutex){+.+.+.}, at: [<ffffffff81596902>] genl_lock+0x12/0x20 [ 1.352562] but task is already holding lock: [ 1.352716] systemd-udevd[275]: renamed network interface eth0 to enp2s0 [ 1.352666] (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81592b09>] netlink_dump+0x29/0x240 [ 1.352940] which lock already depends on the new lock. 
[ 1.353039] the existing dependency chain (in reverse order) is: [ 1.353113] microcode: CPU6 sig=0x306c3, pf=0x2, revision=0x9 [ 1.353189] -> #1 (nlk->cb_mutex){+.+.+.}: [ 1.353385] [<ffffffff810ace97>] lock_acquire+0x87/0x130 [ 1.353482] [<ffffffff8166defc>] mutex_lock_nested+0x5c/0x3e0 [ 1.353575] [<ffffffff8159317f>] __netlink_dump_start+0xbf/0x1c0 [ 1.353686] [<ffffffff8159608c>] genl_family_rcv_msg+0x1bc/0x320 [ 1.353753] microcode: CPU7 sig=0x306c3, pf=0x2, revision=0x9 [ 1.353843] [<ffffffff81596989>] genl_rcv_msg+0x79/0xb0 [ 1.353938] [<ffffffff81595bd9>] netlink_rcv_skb+0xa9/0xc0 [ 1.354030] [<ffffffff81595eb7>] genl_rcv+0x27/0x40 [ 1.354123] [<ffffffff8159516d>] netlink_unicast+0x10d/0x190 [ 1.354214] [<ffffffff815955f9>] netlink_sendmsg+0x359/0x760 [ 1.354308] [<ffffffff8154e082>] sock_sendmsg+0xc2/0xe0 [ 1.354427] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba [ 1.354409] [<ffffffff8154e46c>] ___sys_sendmsg+0x37c/0x390 [ 1.354619] [<ffffffff81551144>] __sys_sendmsg+0x44/0x80 [ 1.354729] [<ffffffff8155118d>] SyS_sendmsg+0xd/0x20 [ 1.354835] [<ffffffff81678f16>] system_call_fastpath+0x1a/0x1f [ 1.354932] -> #0 (genl_mutex){+.+.+.}: [ 1.355129] [<ffffffff810ac028>] __lock_acquire+0x1528/0x1dd0 [ 1.355222] [<ffffffff810ace97>] lock_acquire+0x87/0x130 [ 1.355316] [<ffffffff8166defc>] mutex_lock_nested+0x5c/0x3e0 [ 1.355412] [<ffffffff81596902>] genl_lock+0x12/0x20 [ 1.355516] [<ffffffff81596aec>] ctrl_dumpfamily+0x12c/0x140 [ 1.355638] [<ffffffff81592b70>] netlink_dump+0x90/0x240 [ 1.355746] [<ffffffff81593010>] netlink_recvmsg+0x2f0/0x3a0 [ 1.355858] [<ffffffff8154dd29>] sock_recvmsg+0xd9/0xf0 [ 1.355964] [<ffffffff8154d562>] ___sys_recvmsg+0x112/0x2a0 [ 1.356076] [<ffffffff81551384>] __sys_recvmsg+0x44/0x80 [ 1.356181] [<ffffffff815513cd>] SyS_recvmsg+0xd/0x20 [ 1.356271] [<ffffffff81678f16>] system_call_fastpath+0x1a/0x1f [ 1.356411] other info that might help us debug this: [ 1.356700] Possible unsafe locking 
scenario: [ 1.356929] CPU0 CPU1 [ 1.357081] ---- ---- [ 1.357248] lock(nlk->cb_mutex); [ 1.357517] lock(genl_mutex); [ 1.357823] lock(nlk->cb_mutex); [ 1.358098] lock(genl_mutex); [ 1.358354] *** DEADLOCK *** [ 1.358486] 1 lock held by crda/311: [ 1.358541] #0: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81592b09>] netlink_dump+0x29/0x240 [ 1.358820] stack backtrace: [ 1.358898] CPU: 0 PID: 311 Comm: crda Not tainted 3.11.0-rc5+ #1 [ 1.358969] Hardware name: Gigabyte Technology Co., Ltd. H87N-WIFI/H87N-WIFI, BIOS F4 08/03/2013 [ 1.359059] ffffffff820ac640 ffff8804253af878 ffffffff8166a218 0000000000000001 [ 1.359296] ffffffff820ac640 ffff8804253af8c8 ffffffff81665ed3 ffff8804253af898 [ 1.359509] ffff8804253af948 00000000005ea0ec ffff880426be86f8 00000000005ea0ec [ 1.359758] Call Trace: [ 1.359813] [<ffffffff8166a218>] dump_stack+0x4f/0x84 [ 1.359878] [<ffffffff81665ed3>] print_circular_bug+0x2ae/0x2bf [ 1.359948] [<ffffffff810ac028>] __lock_acquire+0x1528/0x1dd0 [ 1.360010] [<ffffffff810aa356>] ? check_irq_usage+0x96/0xe0 [ 1.360086] [<ffffffff810ace97>] lock_acquire+0x87/0x130 [ 1.360146] [<ffffffff81596902>] ? genl_lock+0x12/0x20 [ 1.360223] [<ffffffff81169ff0>] ? __kmalloc_node_track_caller+0x290/0x390 [ 1.360287] [<ffffffff81596902>] ? genl_lock+0x12/0x20 [ 1.360363] [<ffffffff8166defc>] mutex_lock_nested+0x5c/0x3e0 [ 1.360424] [<ffffffff81596902>] ? genl_lock+0x12/0x20 [ 1.360501] [<ffffffff8155a197>] ? __kmalloc_reserve.isra.45+0x37/0xa0 [ 1.360564] [<ffffffff81596902>] genl_lock+0x12/0x20 [ 1.360661] [<ffffffff81596aec>] ctrl_dumpfamily+0x12c/0x140 [ 1.360722] [<ffffffff8159205f>] ? netlink_alloc_skb+0xaf/0x1e0 [ 1.360799] [<ffffffff81592b70>] netlink_dump+0x90/0x240 [ 1.360860] [<ffffffff81593010>] netlink_recvmsg+0x2f0/0x3a0 [ 1.360938] [<ffffffff8154dd29>] sock_recvmsg+0xd9/0xf0 [ 1.361000] [<ffffffff8154d562>] ___sys_recvmsg+0x112/0x2a0 [ 1.361078] [<ffffffff8106ee6e>] ? up_read+0x1e/0x40 [ 1.361138] [<ffffffff81674fcc>] ? 
__do_page_fault+0x1fc/0x570
[ 1.361215] [<ffffffff8113d55f>] ? might_fault+0x4f/0xa0
[ 1.361276] [<ffffffff8154d3f3>] ? move_addr_to_user+0x83/0xe0
[ 1.361354] [<ffffffff81551384>] __sys_recvmsg+0x44/0x80
[ 1.361416] [<ffffffff815513cd>] SyS_recvmsg+0xd/0x20
[ 1.361492] [<ffffffff81678f16>] system_call_fastpath+0x1a/0x1f

This is not a deadlock, just a lockdep warning that is unrelated to audio (i.e. a separate bug). If you want, I will blacklist the iwldvm module so that you don't see this.

As for my earlier comment about general IRQ problems - I stand corrected. This IS RF interference. In the pulseaudio -vvv log, I see front headphone jack status changes that correspond in time to the glitches. However, this computer's case does not have front panel audio jacks, so I left the motherboard pins that should lead to the front panel audio unconnected. And the keyboard stops lagging if I shift it a bit on my table. Also, the Dell monitor sometimes shows some spurious light near its sensor "buttons". So, I now believe all of this strange activity is actually unrelated to the HDMI bug.

I attempted to get rid of the bug by buying a different motherboard (MSI Z87I). The attempt was unsuccessful: the same bug manifests itself again on the new board.

Created attachment 109811 [details]
alsa-info.sh output from MSI board
This is with today's drm-intel/drm-intel-nightly
Created attachment 109821 [details]
dmesg from MSI board
Can you spot what's common (besides the form factor) between these two motherboards? Or should I suspect a faulty CPU?
which pin complex is DisplayPort and which pin complex is you HDMI ? Do their ELD contain the monitor name ? state.MID { control.1 { iface CARD name 'HDMI/DP,pcm=3 Jack' value true comment { access read type BOOLEAN count 1 } } control.6 { iface PCM device 3 name ELD value '100009006a100001000000000000000010ac16f044454c4c2055323431300907070000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000' comment { access 'read volatile' type BYTES count 83 } } control.7 { iface CARD name 'HDMI/DP,pcm=7 Jack' value true comment { access read type BOOLEAN count 1 } } control.12 { iface PCM device 7 name ELD value '100008006522000000000000000000001e6d01004c4720545615075009570700000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000' comment { access 'read volatile' type BYTES count 83 } } but your new motherboard does not have dual HDMI? 1.794216] hda-intel 0000:00:1b.0: codec_mask = 0x1 [ 1.794413] hda-intel 0000:00:1b.0: codec #0 probed OK [ 1.796130] hda_codec: invalid CONNECT_LIST verb 7[1]:0 [ 1.796132] hdmi: haswell: override pin connection 0x7 [ 1.796677] HDMI hot plug event: Codec=0 Pin=5 Device=0 Inactive=0 Presence_Detect=1 ELD_Valid=1 [ 1.796731] HDMI status: Codec=0 Pin=5 Presence_Detect=1 ELD_Valid=1 [ 1.796738] HDMI status: Codec=0 Pin=5 Presence_Detect=1 ELD_Valid=1 [ 1.798571] HDMI: detected monitor DELL U2410 at connection type HDMI [ 1.798573] HDMI: available speakers: FL/FR [ 1.798575] HDMI: supports coding type LPCM: channels = 2, rates = 32000 44100 48000, bits = 16 20 24 [ 1.800315] HDMI: detected monitor DELL U2410 at connection type HDMI [ 1.800318] HDMI: available speakers: FL/FR [ 1.800321] HDMI: supports coding type LPCM: channels = 2, rates = 32000 44100 48000, bits = 16 20 24 [ 1.800386] HDMI hot plug event: Codec=0 Pin=5 Device=0 Inactive=0 Presence_Detect=1 ELD_Valid=1 [ 1.800407] HDMI status: Codec=0 Pin=6 Presence_Detect=1 ELD_Valid=1 [ 1.800419] HDMI 
status: Codec=0 Pin=5 Presence_Detect=1 ELD_Valid=1

> which pin complex is DisplayPort and which pin complex is you HDMI ?

I don't know how to figure this out. According to what you pasted, control.6 is the DELL U2410, and control.12 is the LG TV. Physically, the DELL U2410 is connected to the DVI port using a DVI-to-HDMI cable, and the LG TV to the HDMI port. DisplayPort is unused.

> but your new motherboard does not have dual HDMI?

It sort-of does. It has one DVI connector that is seen as HDMI1 by xrandr and is actually connected to an HDMI encoder. It also has a "real" HDMI connector that is seen as HDMI2 by xrandr. And it has a DisplayPort. The old motherboard physically has one DVI and two HDMI sockets, but, from the software viewpoint, all three are HDMI. And actually, with the old board, the Dell monitor could produce audio clicks even when connected to the DVI output using a DVI-to-HDMI cable. So, in both cases, please treat the DVI physical connector as HDMI.

As you are using pulseaudio, what is the default sink? Do these two available HDMI sinks have the same priority? Post the output of:
pactl stat
pactl list

Sorry, I have swapped the motherboard back to the Gigabyte H87N-WIFI, because it is more energy-efficient (to avoid overloading the Streacom Nano150 PSU). So I cannot post any more output from the MSI board.

As for pulseaudio - it is completely irrelevant to this bug. As you can see from the comments above (including the original submission), the kernel complains even with pasuspender. If you want pulseaudio logs for some other reason, unrelated to this bug, please ask privately via e-mail.

Did you only test device 3, since your HDMI LG TV is connected to pin 6, which is device 5?
you are only connected your dell through DVI Advanced information - PCI Vendor/Device/Subsystem ID's !!------------------------------------------------------- 00:03.0 0403: 8086:0c0c (rev 06) Subsystem: 1462:7851 -- 00:1b.0 0403: 8086:8c20 (rev 04) Subsystem: 1462:d851 1.790068] hda-intel 0000:00:03.0: chipset global capabilities = 0x2001 [ 1.790207] hda-intel 0000:00:1b.0: chipset global capabilities = 0x4401 seem haswell only support 2 SDO only two independent HDMI streams for multistreaming the two streams must have different stream tags instead of Same tag 0x2 574956] hdmi_setup_stream: NID=0x5, pinctl=0x40 [ 17.574958] hda_codec_setup_stream: NID=0x2, stream=0x2, channel=0, format=0x4011 [ 17.578555] hdmi_setup_stream: NID=0x6, pinctl=0x40 [ 17.578557] hda_codec_setup_stream: NID=0x2, stream=0x2, channel=0, format=0x4011 I tested both devices. For X in 3, 7, 8 the following command produces either a click in either a TV or a monitor (on older kernels like 3.11) or no sound at all (on newer kernels from git, or on the "remaining" device that corresponds to the unconnected port), and logs "playback write error (DMA or IRQ trouble?)": pasuspender -- aplay -D hw:0,$X /usr/share/sounds/startup3.wav As for different vs the same stream tags - how do I test that it is indeed my problem? P.S. there is a cat show today, I will be able to reply and test your suggestions only after it. https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/plain/Documentation/sound/alsa/HD-Audio.txt Hint Strings ~~~~~~~~~~~~ The codec parser have several switches and adjustment knobs for matching better with the actual codec or device behavior. Many of them can be adjusted dynamically via "hints" strings as mentioned in the section above. 
try specify hint stick_stream = false for haswell hdmi - sticky_stream (bool): keep the PCM format, stream tag and ID as long as possible; default true - trigger_sense (bool): indicates that the jack detection needs the explicit call of AC_VERB_SET_PIN_SENSE verb - indep_hp (bool): provide the independent headphone PCM stream and the corresponding mixer control, if available For alc892 trigger_sense = false indepdent_hp = true it seem that MSI user manual does not mention how to support 7.1 for desktop motherboard with 5 stack even intel web site does not mention which jack need to be retask for (side channel playback) http://www.intel.com/support/motherboards/desktop/sb/CS-034198.htm 8-channel audio 8-channel audio is available only on certain Intel® Desktop Boards. After installing the audio driver from the Intel Express Installer CD, multi-channel audio can be enabled: Connect speakers to A, B, C, D, or E as shown in the figure below, up to eight speakers. Raymond, echo stick_stream = false > /sys/devices/pci0000:00/0000:00:03.0/sound/card0/hwC0D0/hints does not help the IRQ or DMA problem. And please do not hijack the bug for your multistreaming remarks. did you rebbot the computer or reloaded the modules or dynamic reconfiguration ? did you perform dynamic reconfigure # echo 1 > /sys/class/sound/hwC0D0/reconfig or Early Patching ~~~~~~~~~~~~~~ When CONFIG_SND_HDA_PATCH_LOADER=y is set, you can pass a "patch" as a firmware file for modifying the HD-audio setup before initializing the codec. This can work basically like the reconfiguration via sysfs in the above, but it does it before the first codec configuration. 
A patch file is a plain text file which looks like below: ------------------------------------------------------------------------ [codec] 0x12345678 0xabcd1234 2 [model] auto [pincfg] 0x12 0x411111f0 [verb] 0x20 0x500 0x03 0x20 0x400 0xff [hint] jack_detect = no ------------------------------------------------------------------------ The hd-audio driver reads the file via request_firmware(). Thus, a patch file has to be located on the appropriate firmware path, typically, /lib/firmware. For example, when you pass the option `patch=hda-init.fw`, the file /lib/firmware/hda-init.fw must be present. With the Gigabyte board, I did the following, without any effect except a single click. Note: no matter which HDMI device I specify, the click is most often in the TV speakers, not in the monitor output. But sometimes (rarely) the same aplay command produces a click both in the TV speakers and on the monitor output. Created /lib/firmware/hda-init.fw with the following contents: [hint] stick_stream = false Ran the following commands, without pulseaudio running: fuser -k /dev/snd/* ; rmmod snd-hda-intel modprobe snd-hda-intel patch=hda-init.fw aplay -D hdmi:0,0 /usr/share/sounds/startup3.wav aplay -D hdmi:0,2 /usr/share/sounds/startup3.wav # in another terminal Both aplay commands complete very slowly by themselves and lead to "playback write error (DMA or IRQ trouble?)" in the dmesg. should be sticky_stream = false sticky_stream (bool): keep the PCM format, stream tag and ID as long as possible; default true is it normal for the graphic driver get EDID from the dell monitor connected through dvi using DVI-HDMI adapter ? only HDMI and Displayport are defined in the two bits CONNECTOR TYPE field in EDID I have corrected the typo. Same effect: either a click in either the monitor or the TV, or no click at all, randomly. As for the EDID-over-HDMI-masquerading-as-DVI question - I think it is normal. 
At work, we had an Asrock-based machine with the onboard NVidia card, and a Fujitsu monitor connected to its DVI output via a DVI-to-HDMI cable. It worked as it should, including audio. I cannot recheck, as someone in the other department took that machine.

Yesterday I made an observation by adding a few printk()s into azx_interrupt() (yes, I know, that's bad). The IRQs come fine when codec parameters are changed in the driver (but of course they are not consumed by snd_pcm_period_elapsed()). They don't come at all (or only one IRQ comes) when they are needed for snd_pcm_period_elapsed(). Could anyone please tell me how to check if they are somehow disabled by mistake?

Created attachment 110171 [details]
Debug patch
A patch that adds some printks about IRQ handling
Created attachment 110181 [details]
Dmesg with the debugging patch above
Dmesg from today's drm-intel/drm-intel-nightly with the debugging patch on top. No special kernel or snd_hda_intel options.
First I attempted to play sound via hdmi:0,0, then through plughw:1,0. In the HDMI case, only the first (bogus) interrupt comes through after starting the playback. 3.11.0 behaves the same, so please disregard the earlier comment about a difference between kernel versions.
Also I tried the following stupid hack on top of 3.11.0:

1. Deleted the !azx_dev->irq_pending check from azx_irq_pending_work().
2. Replaced all cases where azx_position_ok() returns -1 with the return code 0.
3. Made azx_irq_pending_work() re-enqueue itself after msleep(10) instead of just returning when nothing is pending.

I.e. I effectively forced pending IRQ processing to run every 10 ms.

Result: no complaints about IRQ or DMA problems, and "time aplay ..." takes about half of the time it should take. With only one copy of aplay, there is no sound (no matter which device I use and which output I listen to). With two copies of aplay (one for hdmi:0,0 and one for hdmi:0,2) there is a crackling sound in the monitor's output and, if the TV is also connected, from the TV.

I'm experiencing the same issue on my GA-H87N-WIFI (same board), with an i5 4670 CPU.

Breakthrough: HDMI audio works using Debian's kernel (3.10-3-amd64, from linux-image-3.10-3-amd64_3.10.11-1_amd64.deb). So it is some regression introduced after v3.10.

This is not an introduced regression. It is a motherboard (hardware) bug well hidden by some configuration option that differs between my kernel and the Debian kernel. I have not yet identified the relevant set of configuration options, but disabling VT-d in the BIOS or, alternatively, passing intel_iommu=off on the kernel command line allows the hardware to play the first period. Then it hangs (receives no further interrupts) in my kernel and happily continues in the Debian kernel.

Forgot to say that CONFIG_INTEL_IOMMU_DEFAULT_ON=y in my kernel and =n in the Debian kernel.

OK, to get a working 3.10.14 kernel, two options are essential:

CONFIG_INTEL_IOMMU_DEFAULT_ON=n (with y, I get complete silence)
CONFIG_SND_HDA_PREALLOC_SIZE=64 (with 1024, only the first period is played)

I have not yet tried these options on newer kernels.

Changing these two options also helps on new kernels.

So, I have a viable solution to my problem, but I am leaving the bug open, as the failure mode is quite mysterious. Please add quirks so that IOMMU is not used on either of the boards, and so that MID does not accept too-big preallocated buffers (even though SUSE had 1024 as the default for a long time - now we have a situation where this leads to bugs).

Created attachment 110261 [details]
Good dmesg, just in case
Created attachment 110271 [details]
Good config
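For anyone comparing their own kernel configuration against the good and bad ones attached here, the two options named in this bug can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the threshold is an assumption that flags any preallocation size above 64, the only value reported as working at this point in the thread.

```python
def parse_kconfig(text):
    """Parse kernel .config text into a dict; '# CONFIG_X is not set' maps to 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def risky_settings(options):
    """List settings that this bug report associates with broken HDMI audio."""
    findings = []
    if options.get("CONFIG_INTEL_IOMMU_DEFAULT_ON", "n") == "y":
        findings.append("CONFIG_INTEL_IOMMU_DEFAULT_ON=y")
    prealloc = int(options.get("CONFIG_SND_HDA_PREALLOC_SIZE", "0"))
    if prealloc > 64:  # assumption: 64 is the known-good value from this report
        findings.append("CONFIG_SND_HDA_PREALLOC_SIZE=%d" % prealloc)
    return findings
```

Passing the text of a .config file (or of a decompressed /proc/config.gz, where the kernel provides it) to parse_kconfig() and then to risky_settings() returns an empty list for the good configuration described above.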
Did you try the intel_iommu=on,igfx_off boot option? This keeps the IOMMU enabled for other devices but turns it off for the broken graphics (and HDMI audio).

Regarding the buffer size, try the snoop=false option for snd-hda-intel first. Restricting the preallocation size is not a real solution; it just happens to work, and it is inappropriate to put it into the driver statically as a quirk.

Tried intel_iommu=on,igfx_off snd_hda_intel.snoop=0

Results are the same as with just disabling the IOMMU - i.e. one period plays, and then it gets stuck. So intel_iommu=igfx_off is useful, while snd_hda_intel.snoop=0 has no effect.

Another thing to test is to pass enable_msi=0 to snd-hda-intel.

Also tried: intel_iommu=igfx_off snd_hda_intel.snoop=0 snd_hda_intel.enable_msi=0

Result: the card still gets stuck.

P.S. I am currently in #alsa on freenode as patrakov and will be there for one hour; let's talk there if you want more interactive debugging.

OK, then try to adjust the preallocation size on the fly. It can be changed via a proc file, e.g.

echo 1024 > /proc/asound/card1/pcm3p/sub0/prealloc

I don't quite understand what you wanted me to test in comment #46. So I built some kernels with CONFIG_SND_HDA_PREALLOC_SIZE set to different values, and also tried to echo various values to /proc/asound/card0/pcm?p/sub0/prealloc (because HDMI is card0 here).

Result: no matter how the value ends up in prealloc, values of 84 and below work, 88 and above don't, and once the card has seen 88 or higher while playing, there is no way out. The card won't play even the first period when I run aplay again after "correcting" the situation.

Note: this testing was with a 44100 Hz S16_LE stereo wav file. Will reboot now and retest with different files.
With prealloc = 88, this works: Slave: Hardware PCM card 0 'HDA Intel MID' device 3 subdevice 0 Its setup is: stream : PLAYBACK access : MMAP_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 32000 exact rate : 32000 (32000/1) msbits : 16 buffer_size : 16000 period_size : 4000 period_time : 125000 tstamp_mode : NONE period_step : 1 avail_min : 4000 period_event : 0 start_threshold : 16000 stop_threshold : 16000 silence_threshold: 0 silence_size : 0 boundary : 9007199254740992000 appl_ptr : 0 hw_ptr : 0 This doesn't: Slave: Hardware PCM card 0 'HDA Intel MID' device 3 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 22052 period_size : 5513 period_time : 125011 tstamp_mode : NONE period_step : 1 avail_min : 5513 period_event : 0 start_threshold : 22052 stop_threshold : 22052 silence_threshold: 0 silence_size : 0 boundary : 6207086186423386112 appl_ptr : 0 hw_ptr : 0 With prealloc = 84, this works: Slave: Hardware PCM card 0 'HDA Intel MID' device 3 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 21504 period_size : 5376 period_time : 121904 tstamp_mode : NONE period_step : 1 avail_min : 5376 period_event : 0 start_threshold : 21504 stop_threshold : 21504 silence_threshold: 0 silence_size : 0 boundary : 6052837899185946624 appl_ptr : 0 hw_ptr : 0 Build CONFIG_SND_HDA_PREALLOC_SIZE=1024 as is, but adjust the prealloc size later on the running system via echo to a proc file. At the untouched state, confirm that it shouldn't work (it's 1024). Then change to 64, and check that it works. Then echo 1024, check that it's actually changed (cat the same proc file will show the current value), then confirm that it breaks again. 
Also, if it has something to do with buffer pages, you can try to disable 64bit DMA. Adding AZX_DCAPS_NO_64BIT should do it:

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -610,7 +610,7 @@ enum {
 	 AZX_DCAPS_COUNT_LPIB_DELAY)
 
 #define AZX_DCAPS_INTEL_PCH \
-	(AZX_DCAPS_INTEL_PCH_NOPM | AZX_DCAPS_PM_RUNTIME)
+	(AZX_DCAPS_INTEL_PCH_NOPM | AZX_DCAPS_PM_RUNTIME | AZX_DCAPS_NO_64BIT)
 
 /* quirks for ATI SB / AMD Hudson */
 #define AZX_DCAPS_PRESET_ATI_SB \

I have not tried your patch, but found that intel_iommu=on,igfx_off snd_hda_intel.align_buffer_size=1 fixes the problem. Should I still try the patch?

Hm, interesting. I haven't heard of this for an Intel chip. Maybe it is specific to the combination of Haswell, HDMI and IOMMU. But, yes, it's still interesting to check whether the patch influences the behavior. Test with and without the align_buffer_size option.

The patch does not help me to remove any of the following kernel parameters: intel_iommu=on,igfx_off snd_hda_intel.align_buffer_size=1. OTOH, it does not make the IRQ situation worse when both of those parameters are present. In other words, the patch has no visible effect.

Is the alignment only needed for Haswell? How about your Intel HDA controller with ALC892?

859051] HDMI: detected monitor LG TV at connection type HDMI
[ 1.859054] HDMI: available speakers: FL/FR LFE FC RL/RR RC FLC/FRC RLC/RRC FLW/FRW FLH/FRH TC FCH
[ 1.859056] HDMI: supports coding type AC-3: channels = 6, rates = 32000 44100 48000, max bitrate = 640000
[ 1.859059] HDMI: supports coding type LPCM: channels = 2, rates = 32000 44100 48000 96000 192000, bits = 16 20 24

Does playing audio with a different rate, channel count and format to two HDMI TVs at the same time work?
321.906316] hdmi_setup_stream: NID=0x5, pinctl=0x40 [ 321.906317] hda_codec_setup_stream: NID=0x2, stream=0x2, channel=0, format=0x4011 [ 339.856738] hda-intel 0000:00:03.0: azx_pcm_prepare: bufsize=0x10000, format=0x4011 [ 339.857090] hdmi_setup_stream: NID=0x7, pinctl=0x40 [ 339.857091] hda_codec_setup_stream: NID=0x3, stream=0x1, channel=0, format=0x4011 It is indeed possible to output two independent streams with different sample rates via two HDMI outputs. The analog controller never had any problems that required patches or kernel parameters. It also allows independent analog and spdif streams with different sample rates. Hello Takashi, I find it surprising that, despite the correct options being known, there is still no patch that adds a quirk. Is there any additional information that I need to add to the bug? have you ask those Intel developers if you think your Intel controller need align buffer ? http://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pci/hda/hda_intel.c?id=2ae66c26550cd94b0e2606a9275eb0ab7070ad0e If I read the code correctly, AZX_DCAPS_BUFSIZE means "this device does not need buffer size alignment". And this code says that no Intel HD audio device needs buffer size alignment: /* Generic Intel */ { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_ANY_ID), .class = PCI_CLASS_MULTIMEDIA_HD_AUDIO << 8, .class_mask = 0xffffff, .driver_data = AZX_DRIVER_ICH | AZX_DCAPS_BUFSIZE }, ...which is false according to the findings in this bug. I will try to insert an entry before this and see if it helps. I do realize that I have not indicated whether both quirks apply to both boards - I don't know yet and have to test. Unfortunately, the construction of the fanless case makes it hard to disassemble, so I will definitely not test the MSI board today :( An Intel engineer is already in the CC list. Well, the biggest question is whether this is specific to what parameter, IOW, which condition triggers the bug. 
Is it only on your mobo models, or only on a certain chipset, or only on some chipset revisions, or specific to (all) Haswell models, or happening only in combination with the IOMMU (or any other kernel) setup? It's hard to answer without more testing.

And I have a bunch of (more than 20) Haswell machines here for testing, and none of them shows such a problem. That's why I haven't applied any patch so far; i.e. you're the only person hitting the issue, and it can be worked around by a known option :)

Once we can narrow down the condition, I'd be glad to apply the fix.

As I promised, I have temporarily swapped the motherboard for the MSI one. Results of the test: the MSI motherboard also needs both quirks (intel_iommu=on,igfx_off snd_hda_intel.align_buffer_size=1). So, as far as I am able to generalize from my data, the quirk needs to go on any 8086:0c0c device, regardless of any subsystem vendor/device IDs.

Just to stress this again, the bug needs some very specific conditions to trigger:

1. CONFIG_SND_HDA_PREALLOC_SIZE=1024 (or in fact anything greater than 128)
2. CONFIG_INTEL_IOMMU_DEFAULT_ON=y
3. IOMMU enabled in the BIOS
4. A sound file with S16_LE, 44100 Hz, stereo - otherwise alsa-lib would choose a period size that is a multiple of 128
5. Testing directly with aplay -D hdmi:0,X - not via pulseaudio, and not via speaker-test, otherwise a non-round period size will not be chosen

Takashi: are you sure that you have satisfied all those conditions during your tests? Even one unsatisfied condition could lead to you happily saying "there is no bug on this machine", while it in fact exists.

Hrm, OK, maybe it's safer to set the alignment for Haswell HDMI generically. But before doing it: could you check whether the same problem happens even without IOMMU? I guess this is independent of that.

On the Gigabyte board, snd_hda_intel.align_buffer_size=1 is needed even with IOMMU disabled in the BIOS. Let me recheck with MSI, and please join #alsa or #pulseaudio on freenode if you want more interactive debugging.

Well, the MSI board does not even have a setting to disable IOMMU in the BIOS. So there is nothing to test.

OK, thanks for the quick tests. Below is the patch I'm going to apply. Please let me know if this works.

Created attachment 113501 [details]
Fix buffer alignment for Haswell HDMI controllers
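The hw_params dumps posted earlier in this bug can be checked against the multiple-of-128 condition with simple arithmetic. The sketch below is only an illustration in Python and does not reproduce the attached driver patch. It assumes that the alignment is counted in bytes and that S16_LE stereo means 4 bytes per frame; the function name is hypothetical.

```python
ALIGN = 128  # bytes; assumed to match the multiple-of-128 rule discussed in this bug


def check_alignment(period_frames, buffer_frames, channels=2, sample_bits=16):
    """Return (period_bytes, buffer_bytes, aligned) for a PCM configuration."""
    bytes_per_frame = channels * sample_bits // 8
    period_bytes = period_frames * bytes_per_frame
    buffer_bytes = buffer_frames * bytes_per_frame
    aligned = period_bytes % ALIGN == 0 and buffer_bytes % ALIGN == 0
    return period_bytes, buffer_bytes, aligned


# Values taken from the three dumps in this report (period_size, buffer_size):
print(check_alignment(4000, 16000))   # prealloc = 88, 32000 Hz: reported working
print(check_alignment(5513, 22052))   # prealloc = 88, 44100 Hz: reported failing
print(check_alignment(5376, 21504))   # prealloc = 84, 44100 Hz: reported working
```

Under these assumptions the two working configurations come out aligned (16000/64000 bytes and 21504/86016 bytes), while the failing one does not (22052 and 88208 bytes leave remainders of 36 and 16 modulo 128), which is consistent with align_buffer_size=1 making the problem go away.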
I'm confused by the IOMMU aspect of this. Surely the DMA for the *audio* will still come from the HD Audio controller, not from the graphics device? So turning off the IOMMU for the graphics device really shouldn't make any difference?

And if there's some mixup in hardware and the DMA transactions are actually happening with the PCI source-id of the graphics device instead of the HD audio controller... then we should see *faults*. But we don't; instead the DMA just silently fails?

How is the integration between the graphics and audio devices supposed to work?

In the case of Intel HDMI, the audio device is a kind of slave of the GPU. The actual data is handled solely on the graphics side. And, on recent Intel chips like Haswell, both are integrated more tightly. For example, if the GPU is turned off, just accessing the audio PCI registers gives a kernel Oops or a hard lockup, although the device looks independent at the PCI level.

Please could you upload the DMAR tables from the offending machines? (/sys/firmware/acpi/tables/DMAR)

(In reply to Takashi Iwai from comment #67)
> Created attachment 113501 [details]
> Fix buffer alignment for Haswell HDMI controllers

The patch works.

Created attachment 113511 [details]
DMAR from the MSI board
I will not attach the DMAR from the Gigabyte board today, because it is rather hard to disassemble and reassemble the whole computer. At any given moment, at most one of those boards can be in my computer, because I don't have two Haswell CPUs.
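For readers who want to look inside a DMAR dump like the ones attached here, the table can be walked with a short Python sketch. It assumes the layout from the ACPI and Intel VT-d specifications: a 36-byte ACPI header, one byte of host address width, one byte of flags, 10 reserved bytes, then a list of remapping structures that each start with a 16-bit type and a 16-bit length. Only the DRHD fields are decoded, and the function name is illustrative.

```python
import struct

DMAR_TYPES = {0: "DRHD", 1: "RMRR", 2: "ATSR", 3: "RHSA", 4: "ANDD"}


def parse_dmar(blob):
    """Parse a raw ACPI DMAR table into a header summary and remapping structures."""
    signature, length = struct.unpack_from("<4sI", blob, 0)
    if signature != b"DMAR":
        raise ValueError("not a DMAR table")
    blob = blob[:length]
    checksum_ok = sum(blob) % 256 == 0   # all bytes of an ACPI table sum to 0 mod 256
    host_address_width = blob[36] + 1    # the field stores the width minus one
    entries = []
    offset = 48                          # 36-byte header + width, flags, 10 reserved
    while offset + 4 <= len(blob):
        kind, size = struct.unpack_from("<HH", blob, offset)
        if size < 4:
            break                        # malformed entry; stop rather than loop forever
        entry = {"type": DMAR_TYPES.get(kind, "type %d" % kind), "length": size}
        if kind == 0 and size >= 16:
            # DRHD: flags(1), reserved(1), segment(2), register base address(8)
            flags, _, segment, base = struct.unpack_from("<BBHQ", blob, offset + 4)
            entry.update(include_pci_all=bool(flags & 1), segment=segment, base=base)
        entries.append(entry)
        offset += size
    return {"checksum_ok": checksum_ok,
            "host_address_width": host_address_width,
            "entries": entries}
```

Reading /sys/firmware/acpi/tables/DMAR in binary mode (usually as root) and passing the bytes to parse_dmar() lists the remapping hardware units and shows which one has the INCLUDE_PCI_ALL flag set. Decoding the device scope entries, which would be needed to see which unit covers 00:02.0 or 00:03.0, is left out of this sketch.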
(yes I know this is unrelated to this bug)

(In reply to Raymond from comment #22)
> it seem that MSI user manual does not mention how to support 7.1 for desktop
> motherboard with 5 stack

Please don't worry. On the MSI board, 7.1 works over analog output out of the box. Here is what goes where, according to both the labels and the actual behavior:

Line In: blue jack
Front Left/Right: green jack
Microphone: red jack
Surround left/right: black jack
Center/LFE: orange (meant to be brown?) jack
Side Left/Right: light-gray jack

But your info did not have the channel mode to select 6ch, 8ch.

http://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pci/hda?id=a07a949be6eb1c9aab06adaadce72dbd27b7d9cb

ALSA: hda - Fix multi-io channel mode management

The multi-io channels can vary not only from 1 to 6 but also may vary from 6 to 8 or such.

spec->multi_ios = 1 when only the blue jack is used for retasking

    if (spec->multi_ios == 2) {
        for (i = 0; i < 2; i++)
            spec->private_dac_nids[spec->multiout.num_dacs++] = spec->multi_io[i].dac;
    } else if (spec->multi_ios) {
        spec->multi_ios = 0;
        badness += BAD_MULTI_IO;
    }

Raymond: there was indeed no such switch. From my "user" viewpoint, I don't see why it would be necessary on that board, as there are enough connectors and there is never any need to retask jacks. IOW, there is indeed no way to switch the board into a 5.1 mode and no need to do so.

Created attachment 113661 [details]
DMAR from the Gigabyte board
I have replaced the board again in order to get the DMAR info. Unfortunately, in the process, one of the adhesive pads that keep the nuts from the passive cooling system in place was damaged.
This means slightly suboptimal position of the passive heatsink (but it still survives the full CPU load) and, more importantly, physical impossibility to remove the heatsink again (well, maybe with the help of thin flat pliers this is fixable, but I don't have them at hand now). So no more motherboard changes for your testing, I am stuck with Gigabyte :(
I do have a backup copy of "lspci -nnvv", "dmidecode --dump-bin" and all ACPI tables that were exposed in sysfs from the MSI board.
Erased that lspci/dmidecode backup by accident :(

David Woodhouse: with both DMARs attached to this bug, why is there no further progress on the IOMMU side of the bug?

Alexander - I am considering the MSI Z87I and am wondering if your bug is still present in kernel version 3.13.6 (current stable) or in 3.14-rc7 (current rc).

The buffer alignment bug is fixed. As for the IOMMU bug, I don't know. I still have a workaround on my kernel command line and I am too lazy to check whether it is still needed. However, the IOMMU bug, even if it still exists, is so easy to work around that you should not consider it an argument against this board.

3.14-rc7 should have the bug fix patch by Takashi. Not sure about 3.13.6.

Ping about the IOMMU bug.

Ping about the IOMMU bug.

The IOMMU broke the HDMI sound outputs on my ASUS B85M-G. I am developing Intel IGD virtualization, so 'igfx_off' is not an option for me. Should I file another bug to cover this issue?

*** Bug 86311 has been marked as a duplicate of this bug. ***
*** Bug 67321 has been marked as a duplicate of this bug. ***
*** Bug 86221 has been marked as a duplicate of this bug. ***

Another ping.

I've also had problems with Haswell + HDMI audio + IOMMU. For me, using the media player Kodi to bitstream AC3 or DTS data over HDMI doesn't work (other types of audio data work though). This only happens with intel_iommu=on or CONFIG_INTEL_IOMMU_DEFAULT_ON=y. With IOMMU enabled I also see this line in dmesg:
[ 35.168233] snd_hda_intel 0000:00:03.0: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
My hardware is an Intel Haswell NUC D54250WYK. I posted more info at http://forum.kodi.tv/showthread.php?tid=224324 if that helps.

Hi there,
I also seem to be affected by this issue (Haswell i7-4790k, Asus Maximus Hero VII).
No HDMI Audio from integrated Intel HD 4600 graphics, with intel_iommu=on (I need this turned on in order to do GPU passthrough for a discrete Nvidia adapter).
There is no audio being output, even though every command I try seems to confirm the HDMI audio device is present and 'working'.
Finally, can anyone confirm whether or not Skylake CPUs are also affected by this issue, or is the issue limited to Haswell CPUs only?
Thanks,

I also have the sound problem with intel_iommu=on and HDMI output on a Haswell CPU + Nvidia GPU (Intel i7-4702HQ 2.2-3.2Ghz and GT750M). Is there any hope of getting this fixed to make PCI passthrough possible? Thanks!
Outputs + logs: https://bbs.archlinux.org/viewtopic.php?id=204460

(In reply to etfaker from comment #88)
> I also have the sound problem with intel_iommu=on and hdmi output via HDMI
> on a haswell cpu + nvidia gpu (Intel i7-4702HQ 2.2-3.2Ghz and GT750M). Is
> there any hope of getting this fixed to make PCI passthrough possible?
> Thanks !
>
> Outputs + logs:
> https://bbs.archlinux.org/viewtopic.php?id=204460
I should add that I also have no sound, or very crappy/stuttering/low-quality sound, via HDMI with intel_iommu=on, which makes it useless :/ but I need it.

And .. after some days when I thought I was going mad ... or that something was wrong with my mobo ... I found this bug, which fits perfectly! I had never used my onboard HDMI .. now I use it, since I pass the Nvidia card through to a VM and use the onboard HDMI for the regular system (maybe even to launch Kodi, if audio worked). So, workarounds? Any way I can help debug? I'm on a 4.4.3 kernel, the CPU is an i7-4790 (without the K), the mobo is an ASUS H97-PRO GAMER. I don't use it for gaming though .. but more as a server and desktop ..

Hi again,
Just a quick update subsequent to my original comment (see below).
I recently upgraded from Haswell to Skylake (i6700k, Asus Maximus Hero VIII Alpha) and it would appear that I no longer have the HDMI audio issue..
As in:
- I have vt-d enabled in the bios
- My currently booted kernel has iommu=on (verified via 'cat /proc/cmdline')
- The iommu seems to be working (verified via 'dmesg | grep iommu' and 'find /sys/kernel/iommu_groups/ -type l')
- With the above in place, I appear to have working HDMI audio (from the integrated Intel HD 530 graphics)
This would suggest the problem is Haswell specific, since Skylake doesn't appear to be affected?
Thanks,
(In reply to Alza from comment #87)
> Hi there,
>
> I also seem to be affected by this issue (Haswell i7-4790k, Asus Maximus
> Hero VII).
>
> No HDMI Audio from integrated Intel HD 4600 graphics, with intel_iommu=on (I
> need this turned on in order to do GPU passthrough for a discrete Nvidia
> adapter).
>
> There is no audio being output, even though every command I try seems to
> confirm the HDMI audio device is present and 'working'.
>
> Finally, can anyone confirm whether or not Skylake CPU's are also affected
> by this issue, or is the issue limited to Haswell CPU's only?
>
> Thanks,

I have a problem with HDMI audio on Haswell integrated video that may be related, as described in this PulseAudio bug report: https://bugs.freedesktop.org/show_bug.cgi?id=94804
If I have "intel_iommu=on" on the kernel command line without adding "igfx_off", the HDMI audio works, but with significant timing problems.
When playing a video, either the audio and video quickly start to drift out of sync, up to a few seconds off, or (in VLC's case) it starts having audio dropouts and complaining like this:
core warning: picture is too late to be displayed (missing 25 ms)
core debug: picture might be displayed late (missing 13 ms)
core warning: playback way too early (-121209): playing silence
core debug: inserting 5818 zeroes
core warning: playback too early (-40158): down-sampling
core warning: timing screwed (drift: -81704 us): stopping resampling
core warning: playback too early (-82366): down-sampling
core warning: playback way too early (-121032): playing silence
core debug: inserting 5809 zeroes
core debug: auto hiding mouse cursor
core warning: playback too early (-40230): down-sampling
core debug: auto hiding mouse cursor
core warning: timing screwed (drift: -80762 us): stopping resampling
core warning: playback too early (-81447): down-sampling
core warning: playback way too early (-120541): playing silence
core debug: inserting 5785 zeroes

Also having this on a Lenovo ThinkPad T440. Intel(R) Core(TM) i3-4010U CPU @ 1.70GHz, Haswell

Sorry for the spam, I forgot to mention that I'm running a 4.9.6 kernel (and have also seen the issue in previous kernel versions).

Is there any chance that this bug will ever get fixed?

This bug is still present in the latest linux-mainline 5.3-rc6 :(

Another instance of this bug: https://www.reddit.com/r/archlinux/comments/ghozc1/cantt_get_audio_from_intel_graphics_hdmi_out/
Dear maintainers, maybe it's time to fix the IOMMU driver? Or, if IOMMU maintainers are indeed unreachable, maybe ALSA maintainers can detect this condition (iommu on, Haswell HDMI) and at least emit a warning in the log referencing this bug and a possible workaround?

Does it make a difference when you boot with 'intremap=off' on the kernel command line?

"intel_iommu=on intremap=off" yields no sound.

Created attachment 289285 [details]
DMAR table of Haswell macbookpro11,1 [8086:0a2e]
I have been able to reproduce this bug on a Haswell system, specifically a 2013 Retina MacBookPro (macbookpro11,1) with an integrated Intel graphics card, with a single HDMI out.
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell-ULT Integrated Graphics Controller [8086:0a2e] (rev 09)
00:03.0 Audio device [0403]: Intel Corporation Haswell-ULT HD Audio Controller [8086:0a0c] (rev 09)
(I'm attaching the full lspci -nn)
I can reliably reproduce the symptoms of lagging audio and eventual complete failure of HDMI audio. I can work around this issue by disabling the IOMMU either completely or just for the integrated graphics.
I haven't tested the kernel patch, but I can do so if you believe it is still relevant to try and fix this upstream.
I hope I can help with this, thank you very much for your work!
I'm attaching the DMAR table of my system, as well as my boot/kernel log for IOMMU-off and IOMMU-on
Created attachment 289287 [details]
lspci of affected haswell-macbookpro11,1
Created attachment 289289 [details]
dmesg-haswell-macbook11,1 (iommu is ON)
Created attachment 289291 [details]
dmesg-haswell-macbook11,1 (iommu BUT with igfx_off)
Sound pause and resume lag a few seconds behind the video; adding intel_iommu=on,igfx_off solves this problem on an Intel(R) Celeron(R) 2955U.

Another instance in the wild: https://www.reddit.com/r/debian/comments/14lje8f/comment/jpwmckm/?context=3

I can confirm that this bug affects me in Debian Bookworm (kernel 6.1) + Pipewire with this hardware (and a DisplayPort -> HDMI adapter):
================================================
System:
Host: T440s Kernel: 6.1.0-13-amd64 arch: x86_64 bits: 64 Desktop: Cinnamon
v: 5.6.8 Distro: Debian GNU/Linux 12 (bookworm)
Machine:
Type: Laptop System: LENOVO product: 20AR006RUS v: ThinkPad T440s
Mobo: LENOVO model: 20AR006RUS v: SDK0E50510 PRO
UEFI: LENOVO v: GJETA4WW (2.54 )
date: 03/27/2020
CPU:
Info: dual core Intel Core i5-4300U [MT MCP] speed (MHz): avg: 805
min/max: 800/2900
Graphics:
Device-1: Intel Haswell-ULT Integrated Graphics driver: i915 v: kernel
Device-2: Chicony Integrated Camera type: USB driver: uvcvideo
Display: x11 server: X.Org v: 1.21.1.7 driver: X: loaded: modesetting
unloaded: fbdev,vesa dri: crocus gpu: i915 resolution: 1366x768~60Hz
API: OpenGL v: 4.6 Mesa 22.3.6 renderer: Mesa Intel HD Graphics 4400 (HSW
GT2)
Network:
Device-1: Intel Ethernet I218-LM driver: e1000e
Device-2: Intel Wireless 7260 driver: iwlwifi
Device-3: Intel Bluetooth wireless interface type: USB driver: btusb
================================================
I also tried the Debian Backports 6.4 kernel with the same results.
The workaround suggested earlier in this report works:
> edit /etc/default/grub to add
> GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on,igfx_off"
However, it should be noted that openSUSE Tumbleweed + Pipewire on this same hardware does *NOT* have this bug.
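To check whether the workaround quoted above is actually in effect after a reboot, something like the following sketch may help. The function name classify_iommu_cmdline and its output format are made up for illustration; it only inspects the command-line text and does not query the hardware, and "default" means the outcome depends on the kernel's CONFIG_INTEL_IOMMU_DEFAULT_ON setting.

```shell
#!/bin/sh
# Sketch: report what a kernel command line asks of the Intel IOMMU.
# classify_iommu_cmdline is a hypothetical helper for this thread only.
# The body runs in a subshell, so set -f and IFS changes stay local.
classify_iommu_cmdline() (
    set -f                      # do not glob-expand command-line words
    state=default               # no intel_iommu=on/off seen yet
    igfx=enabled                # IOMMU still applies to integrated graphics
    for arg in $1; do
        case $arg in
            intel_iommu=*)
                # The value is a comma-separated list, e.g. "on,igfx_off".
                IFS=,
                for opt in ${arg#intel_iommu=}; do
                    case $opt in
                        on) state=on ;;
                        off) state=off ;;
                        igfx_off) igfx=disabled ;;
                    esac
                done
                ;;
        esac
    done
    echo "iommu=$state igfx_iommu=$igfx"
)

# Classify the running kernel's command line when it is available.
if [ -r /proc/cmdline ]; then
    classify_iommu_cmdline "$(cat /proc/cmdline)"
fi
```

On Debian-based systems, the edit to /etc/default/grub takes effect only after running update-grub as root and rebooting.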