It seems there is a bug preventing machines from booting with certain IEEE 1394 FireWire devices. This has been reported in multiple places but doesn't seem to have a report here. Examples:

https://bugzilla.suse.com/show_bug.cgi?id=1215436
https://bugzilla.redhat.com/show_bug.cgi?id=2240973

I experienced the same issue with this card:
https://www.amazon.com/gp/product/B07QPDN3XK/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

It looks like it uses a VIA VT6307 chip, and it was causing MCE error reports on my computer while trying to boot kernel 6.5 on Fedora 38. I could boot normally with 6.4 and with Windows. Here are those MCE errors:

```
[ 0.860834] mce: [Hardware Error]: Machine check events logged
[ 0.860834] microcode: CPU20: patch_level=0x0a201025
[ 0.860835] microcode: CPU21: patch_level=0x0a201025
[ 0.860836] microcode: CPU23: patch_level=0x0a201025
[ 0.860836] microcode: CPU22: patch_level=0x0a201025
[ 0.860837] mce: [Hardware Error]: CPU 17: Machine Check: 0 Bank 0: bc00080001010135
[ 0.860845] fbcon: Taking over console
[ 0.860847] mce: [Hardware Error]: TSC 0 ADDR fca000f0 MISC d012000000000000 IPID 1000b000000000
[ 0.860854] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1696955537 SOCKET 0 APIC b microcode a201025
[ 0.860860] microcode: CPU0: patch_level=0x0a201025
[ 0.861676] microcode: Microcode Update Driver: v2.2.
```
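For readers who want to interpret the `Bank 0` status value in the log above, here is a minimal sketch that decodes only the architectural flag bits (63 to 57) of an x86 `MCi_STATUS` register. The function name `decode_mce_status` is made up for illustration, and the vendor-specific lower fields (error code, extended error code) are deliberately not decoded.

```shell
# Decode the architectural flag bits (63..57) of a 64-bit x86 MCi_STATUS value.
# decode_mce_status is a hypothetical helper, not part of any existing tool.
# Argument: the status as exactly 16 hex digits, without a 0x prefix.
decode_mce_status() {
    # Only the top byte is inspected, which avoids 64-bit overflow
    # in shell arithmetic.
    hi=$(printf '%s' "$1" | cut -c1-2)
    top=$(( 0x$hi ))
    out=""
    bit=7    # bit 7 of the top byte is bit 63 of the register
    for name in VAL OVER UC EN MISCV ADDRV PCC; do
        if [ $(( (top >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit - 1 ))
    done
    printf '%s\n' "$out"
}

# Status value taken from the log in this report.
decode_mce_status bc00080001010135
```

For this value the set flags are VAL, UC, EN, MISCV and ADDRV: a valid, uncorrected error with the ADDR and MISC registers populated, which matches the `ADDR` and `MISC` fields printed in the log.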
Could you please bisect? https://docs.kernel.org/admin-guide/bug-bisect.html
CC'ing Takashi Sakamoto, who probably introduced the regression.
I'll try to get that done tonight or tomorrow. Just `git bisect bad dcadfd7f7c74ef9ee415e072a19bdf6c085159eb` and then build, correct?
(In reply to Artem S. Tashkinov from comment #1)
> Could you please bisect?
>
> https://docs.kernel.org/admin-guide/bug-bisect.html

Sorry, I'm not that familiar with git bisect. I tried:

```
git bisect start
git bisect bad dcadfd7f7c74ef9ee415e072a19bdf6c085159eb
git bisect good ddfcf8fb914438b422892f56717508867dfbd6af #This is the tag for kernel 6.4.15 which is the version I was previously using with no issue
```

and got `[44c026a73be8038f03dbdeef028b642880cf1511] Linux 6.4-rc3`, which doesn't seem particularly helpful.

Let me know if there are other troubleshooting steps I can help with. It's my first kernel bug report, so it will be a bit of a learning experience.
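A note on the bisect output above: the commit that `git bisect` prints after the first good/bad pair is not a result but the next candidate to test. Each round means building and booting that commit, then reporting `git bisect good` or `git bisect bad`, until git names the first bad commit. The loop can be sketched on a throwaway repository; everything in it (the temporary repo, the `state` file, the ten commits) is made up for illustration, and `git bisect run` automates the verdict that would be given by hand after each kernel boot.

```shell
# Toy demonstration of the bisect loop on a throwaway repository.
# The repo, file names and commit count are illustrative assumptions.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.email bisect@example.invalid
git config user.name "Bisect Demo"
git config commit.gpgsign false

# Ten commits; commit 7 introduces the "regression" (state flips to bad).
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then echo bad > state; else echo good > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
    i=$(( i + 1 ))
done

# Mark the newest commit bad and the oldest good; git checks out a midpoint.
git bisect start HEAD HEAD~9 >/dev/null

# Exit status 0 means "good", 1 means "bad"; with a kernel, this step is
# the manual build, boot and check for the MCE.
git bisect run sh -c 'grep -q good state' >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"

git bisect reset >/dev/null 2>&1
```

With ten commits, the search needs only three or four test rounds; a kernel bisect between two releases typically needs more rounds, each one a full build and boot.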
Hi Ian and Artem,

I'm a maintainer of the Linux FireWire subsystem, and I'm sorry for the regression.

At present, the issue appears only when the extension card uses a design similar to the ASMedia ASM1083 and ASM1085 PCIe-to-PCI bus bridge, so it does not apply widely (e.g. not to VIA VT1615, TI chipsets, Agere FW643, and so on).

According to the bisection done by SUSE staff (Stuart Rogers and Jiri Slaby), commit dcadfd7f7c74 ("firewire: core: use union for callback of transaction completion") introduces the issue. The change executes `readl` on a register (`CYCLE_TIME`) defined in 1394 OHCI. Such register access is quite common and does nothing unusual. So the more familiar people are with low-level software, the more puzzled they are: "why does such a simple register access cause system reboots?"

In my experiments, the host system reboots even if the 1394 OHCI hardware in question is bound to a virtual machine by PCI passthrough (vfio-pci). Furthermore, without the PCIe-to-PCI bridge chip, the system reboot is not triggered. I purchased 1394 OHCI hardware (VIA VT6306 and VT6307) to investigate the issue, and found that it works well with the latest Linux FireWire subsystem.

In my opinion, my experiments suggest that the reboot is triggered quite low in the software stack (e.g. the Linux PCI subsystem) or by the hardware itself due to some quirk of the bridge chip. In fact, there are known issues related to the bridge chip, e.g. with interrupts, DMA, and so on. I have not figured it out yet.

In my current understanding, there is a longstanding potential problem with using the bridge chip under Linux. Unfortunately, the change to the Linux FireWire subsystem revealed it.
I have two of these ASMedia bridges in my system, one for FireWire:

25:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 94, IOMMU group 25
	Bus: primary=25, secondary=26, subordinate=26, sec-latency=32
	I/O behind bridge: e000-efff [size=4K] [16-bit]
	Memory behind bridge: fc900000-fc9fffff [size=1M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Power Management version 3
	Capabilities: [80] Express PCI-Express to PCI/PCI-X Bridge, MSI 00
	Capabilities: [c0] Subsystem: Device [0000:0000]
	Capabilities: [100] Virtual Channel

26:00.0 FireWire (IEEE 1394) [0c00]: VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller [1106:3044] (rev 80) (prog-if 10 [OHCI])
	Subsystem: VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller [1106:3044]
	Flags: bus master, stepping, medium devsel, latency 32, IRQ 94, IOMMU group 25
	Memory at fc900000 (32-bit, non-prefetchable) [size=2K]
	I/O ports at e000 [size=128]
	Capabilities: [50] Power Management version 2
	Kernel driver in use: firewire_ohci
	Kernel modules: firewire_ohci

And one for the sound card, which is:

27:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 03) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 24, IOMMU group 26
	Bus: primary=27, secondary=28, subordinate=28, sec-latency=32
	I/O behind bridge: d000-dfff [size=4K] [16-bit]
	Memory behind bridge: [disabled] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Power Management version 3
	Capabilities: [80] Express PCI-Express to PCI/PCI-X Bridge, MSI 00
	Capabilities: [c0] Subsystem: Device [0000:0000]
	Capabilities: [100] Virtual Channel

28:04.0 Multimedia audio controller [0401]: C-Media Electronics Inc CMI8788 [Oxygen HD Audio] [13f6:8788]
	Subsystem: ASUSTeK Computer Inc. Virtuoso 100 (Xonar Essence STX) [1043:835c]
	Flags: bus master, medium devsel, latency 32, IRQ 24, IOMMU group 26
	I/O ports at d000 [size=256]
	Capabilities: [c0] Power Management version 2
	Kernel driver in use: snd_virtuoso
	Kernel modules: snd_virtuoso

Just in case you're looking to prove whether 1394 is involved or not, it may help to know other hardware with that specific bridge chip? I have the MCE and failure to boot on a Fedora 6.5.7 kernel. 6.4.7 is okay.
Hi Tony Vroon,

> Just in case you're looking to prove whether 1394 is involved or
> not, it may help to know other hardware with that specific bridge
> chip? I have the MCE and failure to boot on a Fedora 6.5.7 kernel.
> 6.4.7 is okay.

Thanks for the report. However, as I posted on LKML[1], the issue seems to appear only with the combination of the 1394 OHCI hardware in question (i.e. VIA VT6307), the PCIe-to-PCI bridge in question (i.e. ASM1083), and a recent AMD chipset for Ryzen machines (e.g. B450, X370, X570). I guess you use such an AMD chipset when encountering the MCE failure.

At present, I believe multiple causes underlie the issue. The Linux kernel potentially has the problem, and the change to the firewire-ohci module reveals it indirectly.

[1] https://lore.kernel.org/lkml/20231016155657.GA7904@workstation.local/

Regards
(In reply to Takashi Sakamoto from comment #7)
> Thanks for the report. However, as I posted in LKML[1], the issue
> seems to appear just in the combination of the issued 1394 OHCI
> hardware (i.e. VIA6307), the issued PCIe-to-PCI bridge (i.e. ASM1083),
> and recent AMD chipset for Ryzen machines (I.e. B450, X370, X570). I
> guess you use such AMD chipset when encountering the MCE failure.

Absolutely right, MSI MS-7D53 (MPG X570S EDGE MAX WIFI) with an X570 chipset.
> Absolutely right, MSI MS-7D53 (MPG X570S EDGE MAX WIFI) with an X570 chipset.

Okay. That is what I expected.

Could I ask you to test snd_virtuoso with your Xonar sound card without loading the firewire-ohci kernel module? You can add `module_blacklist=firewire_ohci` to the command line of your v6.5.7 kernel for that purpose.

Even if the above test succeeds, I think it does not immediately mean that the ASM1083 works well on the VT6306/7/8 side, since the bridge chip is used in different ways, patterns, and configurations. However, it is helpful information for investigating the issue.

Thanks
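When running this test, it helps to confirm that the blacklist entry actually reached the kernel. The kernel documents the parameter as `module_blacklist` (singular, taking a comma-separated list), so a misspelled name is silently ignored. The following is a minimal sketch of such a check; `is_blacklisted` is a hypothetical helper, not an existing tool, and it only inspects the command line string it is given.

```shell
# is_blacklisted MODULE CMDLINE
# Hypothetical helper: succeeds if CMDLINE contains a module_blacklist=
# parameter whose comma-separated list includes MODULE.
is_blacklisted() {
    module=$1
    cmdline=$2
    # Unquoted on purpose: split the command line on whitespace.
    for word in $cmdline; do
        case $word in
            module_blacklist=*)
                list=${word#module_blacklist=}
                # Wrap in commas so only whole list entries match.
                case ",$list," in
                    *",$module,"*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# Check the command line of the running kernel.
if is_blacklisted firewire_ohci "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "firewire_ohci is blacklisted on the kernel command line"
else
    echo "firewire_ohci is not blacklisted on the kernel command line"
fi
```

After booting, `lsmod` can additionally confirm that the module is not loaded.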
I get the reboot even with modules_blacklist=firewire_ohci it seems.
Do we want to look at the ASPM refactoring that went in with 6.5?
> I get the reboot even with modules_blacklist=firewire_ohci it seems.

OK. I presume that the ALSA PCM/Control character devices for your Xonar card work as expected after booting up in that case, right?

> Do we want to look at the ASPM refactoring that went in with 6.5?

Unfortunately, I can reproduce the issue with a backported firewire-ohci module in the v6.2 kernel. It occurs on my AMD X370 chipset (Gigabyte GA-AX370-Gaming 5 rev. 1.0, F51h BIOS version).
*** This bug has been marked as a duplicate of bug 217993 ***
Hi,

The change to the 1394 OHCI driver, aimed at suppressing the unexpected system reboot on AMD Ryzen machines[1], has been merged into Linux kernel v6.7[2]. It has also been applied to the following stable and longterm kernel releases:

* 6.6.11[3]
* 6.1.72[4]
* 5.15.147[5]
* 5.10.208[6]
* 5.4.267[7]
* 4.19.305[8]
* 4.14.336[9]

Once your downstream distribution provides the corresponding kernel packages, you should no longer encounter the unexpected system reboot.

Note that the following combination of hardware is not necessarily suitable, depending on your use case:

* Any type of AMD Ryzen machine
* 1394 OHCI hardware consisting of:
  * Asmedia ASM1083/1085
  * VIA VT6306/6307/6308

When working with a time-aware protocol, such as audio sample processing, it is advisable to avoid this combination. The change comes with a functional limitation: the software stack does not provide precise hardware time in this case.

If you choose to continue using the AMD Ryzen machine, the recommendation is to replace the 1394 OHCI hardware with another one. Conversely, if you choose to continue using the 1394 OHCI hardware, the recommendation is to use a machine provided by a vendor other than AMD.

Thanks for your report and your long patience.

[1] https://git.kernel.org/torvalds/linux/c/ac9184fbb847
[2] https://lore.kernel.org/lkml/CAHk-=widprp4XoHUcsDe7e16YZjLYJWra-dK0hE1MnfPMf6C3Q@mail.gmail.com/
[3] https://lore.kernel.org/lkml/2024011058-sheep-thrower-d2f8@gregkh/
[4] https://lore.kernel.org/lkml/2024011052-unsightly-bronze-e628@gregkh/
[5] https://lore.kernel.org/lkml/2024011541-defective-scuff-c55e@gregkh/
[6] https://lore.kernel.org/lkml/2024011532-lustiness-hybrid-fc72@gregkh/
[7] https://lore.kernel.org/lkml/2024011519-mating-tag-1f62@gregkh/
[8] https://lore.kernel.org/lkml/2024011508-shakiness-resonant-f15e@gregkh/
[9] https://lore.kernel.org/lkml/2024011046-ecology-tiptoeing-ce50@gregkh/

Thanks

Takashi Sakamoto
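To compare a kernel version against the releases listed in the comment above, a small check along these lines may be useful. It is only a sketch: `has_firewire_fix` is a hypothetical helper, it assumes upstream-style `major.minor.patch` version strings, and it cannot see distribution backports (for example, vendor kernels that keep the patch level at 0), so a negative answer is not conclusive for such kernels.

```shell
# has_firewire_fix VERSION
# Hypothetical helper: succeeds if VERSION is v6.7 or newer, or is at or
# past the fixed point release of one of the stable series listed above.
has_firewire_fix() {
    ver=${1%%-*}                     # drop a suffix such as -200.fc38.x86_64
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}
    case $rest in
        *.*) patch=${rest#*.} ;;
        *)   patch=0 ;;
    esac
    patch=${patch%%[!0-9]*}
    patch=${patch:-0}

    case "$major.$minor" in
        6.6)  need=11 ;;
        6.1)  need=72 ;;
        5.15) need=147 ;;
        5.10) need=208 ;;
        5.4)  need=267 ;;
        4.19) need=305 ;;
        4.14) need=336 ;;
        *)
            # Not a listed stable series: fixed only from v6.7 onwards.
            if [ "$major" -gt 6 ]; then return 0; fi
            if [ "$major" -eq 6 ] && [ "$minor" -ge 7 ]; then return 0; fi
            return 1
            ;;
    esac
    [ "$patch" -ge "$need" ]
}

# Check the running kernel.
if has_firewire_fix "$(uname -r)"; then
    echo "$(uname -r): at or past a release listed as carrying the fix"
else
    echo "$(uname -r): older than the listed releases (backports not detected)"
fi
```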