Bug 217993 - Kernel 6.5 causes instant crash into bootloop
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: AMD Linux
Importance: P3 high
Assignee: Takashi Sakamoto
URL: https://bugzilla.suse.com/show_bug.cg...
Keywords:
Duplicates: 217994
Depends on:
Blocks:
 
Reported: 2023-10-10 21:00 UTC by matthias.schrumpf
Modified: 2024-01-17 18:22 UTC
CC List: 8 users

See Also:
Kernel Version: 6.5
Subsystem:
Regression: Yes
Bisected commit-id: dcadfd7f7c74ef9ee415e072a19bdf6c085159eb



Description matthias.schrumpf 2023-10-10 21:00:25 UTC
Kernel 6.5 causes a crash immediately after selecting it in GRUB or trying to boot with it via other means. This crash always leads into a bootloop if no other kernel is selected. 

A successful boot with kernel 6.5 is impossible, so no log data could be collected. 

This error can be reproduced on many distros including Arch, Endeavour OS, Manjaro, Fedora and OpenSuse. 

The computer is working perfectly fine with older kernels up to and including 6.4. CPU, RAM and hard drives have all been checked thoroughly and no errors could be found. 

Current OS: 
Operating System: Fedora Linux 38
KDE Plasma Version: 5.27.8
KDE Frameworks Version: 5.110.0
Qt Version: 5.15.10
Kernel Version: 6.4.15-200.fc38.x86_64 (64-bit)
Graphics Platform: X11

Hardware: 
Processors: 16 × AMD Ryzen 7 5800X 8-Core Processor
Memory: 62.7 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7C91
System Version: 1.0
Comment 1 Artem S. Tashkinov 2023-10-11 14:41:55 UTC
Could you please bisect?

https://docs.kernel.org/admin-guide/bug-bisect.html

Otherwise this bug report has very slim chances of being fixed.
Comment 2 Bagas Sanjaya 2023-10-12 09:39:15 UTC
Do you have any out-of-tree modules that may cause this regression?
Comment 3 matthias.schrumpf 2023-10-16 21:01:04 UTC
(In reply to Artem S. Tashkinov from comment #1)
> Could you please bisect?
> 
> https://docs.kernel.org/admin-guide/bug-bisect.html
> 
> Otherwise this bug report has very slim chances of being fixed.
I'm sorry, I don't have the basic knowledge necessary for performing a bisection. 

(In reply to Bagas Sanjaya from comment #2)
> Do you have any out-of-tree modules that may cause this regression?
No, not to my knowledge. I tried this with live images or fresh installs of Arch, Endeavour OS, Manjaro, Fedora and OpenSuse and I didn't change anything beyond the settings and customizations that these distros do by default.
Comment 4 Bagas Sanjaya 2023-10-16 23:55:15 UTC
On 17/10/2023 04:01, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217993
> 
> --- Comment #3 from matthias.schrumpf@freenet.de ---
> (In reply to Artem S. Tashkinov from comment #1)
>> Could you please bisect?
>>
>> https://docs.kernel.org/admin-guide/bug-bisect.html
>>
>> Otherwise this bug report has very slim chances of being fixed.
> I'm sorry, I don't have the basic knowledge necessary for performing a
> bisection. 
> 

Then refer to Documentation/admin-guide/bug-bisect.rst in the kernel
sources for instructions.

> (In reply to Bagas Sanjaya from comment #2)
>> Do you have any out-of-tree modules that may cause this regression?
> No, not to my knowledge. I tried this with live images or fresh installs of
> Arch, Endeavour OS, Manjaro, Fedora and OpenSuse and I didn't change anything
> beyond the settings and customizations that these distros do by default.
> 

So you have this regression there?
Comment 5 matthias.schrumpf 2023-10-18 21:36:42 UTC
(In reply to Bagas Sanjaya from comment #4)
> So you have this regression there?

What do you mean?
Comment 6 Artem S. Tashkinov 2023-10-24 14:54:14 UTC
Given that you are seemingly the only Linux user who has this issue, your only chance of getting it fixed is to perform the regression testing (bisection) yourself.

The URL provided above has enough information to do so.

If that's not enough, you may simply Google the appropriate questions, e.g.

1) How to compile and install the Linux kernel
2) How to install GCC in Distro_X

It's all quite easy once you get down to it. Unfortunately, no one but you can do it.
Comment 7 Mark Broadworth 2023-10-25 22:31:30 UTC
This is occurring for me as well on the generic 6.5 kernel that Ubuntu 23.10 installed.

Seems to happen early in the bootup process (at least before amdgpu is loaded).

Basic Hardware:
AMD Ryzen 7 5800X 8-Core Processor
Asus TUF GAMING X570-PLUS (BIOS 4802 06/15/2023)
64 GB RAM
Radeon 6800 XT

Demonstrated regression:
v6.4.14 - good
v6.5 - bad

I'm attempting a bisect.
Comment 8 Mark Broadworth 2023-10-28 23:29:35 UTC
This appears to have been introduced with:

commit dcadfd7f7c74ef9ee415e072a19bdf6c085159eb (HEAD -> dcadfd7f7c7)
Author: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Date:   Tue May 30 08:12:40 2023 +0900

    firewire: core: use union for callback of transaction completion
    
    In 1394 OHCI, the OUTPUT_LAST descriptor of Asynchronous Transmit (AT)
    request context has timeStamp field, in which 1394 OHCI controller
    record the isochronous cycle when the packet was sent for the request
    subaction. Additionally, for the case of split transaction in IEEE 1394,
    Asynchronous Receive (AT) request context is used for response subaction
    to finish the transaction. The trailer quadlet of descriptor in the
    context has timeStamp field, in which 1394 OHCI controller records the
    isochronous cycle when the packet arrived.
    
    Current implementation of 1394 OHCI controller driver stores values of
    both fields to internal structure as time stamp, while Linux FireWire
    subsystem provides no way to access to it. When using asynchronous
    transaction service provided by the subsystem, callback function is passed
    to kernel API. The prototype of callback function has the lack of argument
    for the values.
    
    This commit adds a new callback function for the purpose. It has an
    additional argument to point to the constant array with two elements. For
    backward compatibility to kernel space, a new union is also adds to wrap
    two different prototype of callback function. The fw_transaction structure
    has the union as a member and a boolean flag to express which function
    callback is available.
    
    The core function is changed to handle the two cases; with or without
    time stamp. For the error path to process transaction, the isochronous
    cycle is computed by current value of CYCLE_TIMER register in 1394 OHCI
    controller. Especially for the case of timeout of split transaction, the
    expected isochronous cycle is computed.
    
    Link: https://lore.kernel.org/r/20230529113406.986289-6-o-takashi@sakamocchi.jp
    Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
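
For readers unfamiliar with the pattern the commit message describes, here is a minimal sketch in plain user-space C of a union that wraps two callback prototypes and is selected by a boolean flag. All names below (complete_cb, transaction, transaction_complete, record and the two callbacks) are hypothetical and are not the kernel's actual definitions. The sketch only illustrates the dispatch idea; it does not show the real driver code and says nothing about why the boot crash occurs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the kernel's real definitions. */

/* Legacy prototype: completion callback without time stamps. */
typedef void (*complete_cb_t)(int rcode, void *data);

/* New prototype: extra pointer to a constant two-element array
 * (time stamp of the request, time stamp of the response). */
typedef void (*complete_cb_tstamp_t)(int rcode, const uint32_t tstamp[2],
                                     void *data);

/* The union wraps both prototypes so one member slot serves either. */
union complete_cb {
    complete_cb_t without_tstamp;
    complete_cb_tstamp_t with_tstamp;
};

struct transaction {
    union complete_cb callback;
    bool with_tstamp;      /* tells which union member is valid */
    void *callback_data;
};

/* Dispatch to whichever callback the flag says was registered. */
static void transaction_complete(const struct transaction *t, int rcode,
                                 uint32_t request_tstamp,
                                 uint32_t response_tstamp)
{
    if (t->with_tstamp) {
        const uint32_t tstamp[2] = { request_tstamp, response_tstamp };

        assert(t->callback.with_tstamp != NULL);
        t->callback.with_tstamp(rcode, tstamp, t->callback_data);
    } else {
        assert(t->callback.without_tstamp != NULL);
        t->callback.without_tstamp(rcode, t->callback_data);
    }
}

/* Example callbacks that record what they were given. */
struct record {
    int rcode;
    uint32_t request;
    uint32_t response;
    bool saw_tstamp;
};

static void on_complete(int rcode, void *data)
{
    struct record *r = data;

    r->rcode = rcode;
    r->saw_tstamp = false;
}

static void on_complete_tstamp(int rcode, const uint32_t tstamp[2], void *data)
{
    struct record *r = data;

    r->rcode = rcode;
    r->request = tstamp[0];
    r->response = tstamp[1];
    r->saw_tstamp = true;
}
```

Assuming a caller stores on_complete_tstamp in .callback.with_tstamp and sets .with_tstamp to true, transaction_complete() hands the callback both time stamps. With the flag cleared, the legacy member is called instead and the time stamps are dropped, which is how existing callers can stay unchanged.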
Comment 9 Mark Broadworth 2023-10-28 23:41:01 UTC
I have a firewire card in my system. Affected kernels boot fine with the firewire card removed.

06:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev 80)

What are the next steps?
Comment 10 Mark Broadworth 2023-10-28 23:43:52 UTC
(The print on the chip itself reads VIA VT6307)
Comment 11 matthias.schrumpf 2023-10-30 11:39:35 UTC
(In reply to Mark Broadworth from comment #9)
> I have a firewire card in my system. Affected kernels boot fine with the
> firewire card removed.
> 
> 06:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)]
> IEEE 1394 OHCI Controller (rev 80)
> 
> What are the next steps?

Oh my god. It was the Firewire card the whole time?

I also had a VIA VT6307 in my computer. I just removed it and now I can boot with the 6.5+ kernels without any issues. 

I wasn't using this anyway, I just never would have expected it to cause such a problem. 
Hope you find a solution that works for you.
Comment 12 Mario Limonciello (AMD) 2023-11-07 21:28:54 UTC
*** Bug 217994 has been marked as a duplicate of this bug. ***
Comment 13 rjbgolding 2023-11-20 05:09:22 UTC
I am also experiencing this issue and reported it for Ubuntu at
 https://bugs.launchpad.net/bugs/2043905
What I did not report there is that I also tried the Ubuntu-built 6.6-rc4 generic 64-bit kernel and hit the same problem, so this may affect the 6.6 kernels as well as the 6.5 kernels.

My system is
CPU: AMD Ryzen 5600X 
MB:  MSI MPG Gaming Plus (BIOS 7C56v1F dated 12 Oct 2023)
RAM: 64 GB DDR4-3200
GPU: AMD Radeon RX6600
Comment 14 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-27 07:46:15 UTC
FWIW, it's a known issue that most likely still happens with mainline; for details see this message and the two replies to it: https://lore.kernel.org/all/20231108051638.GA194133@workstation.local/
Comment 15 Mario Limonciello (AMD) 2024-01-17 18:22:50 UTC
A solution has been committed for this:

https://github.com/torvalds/linux/commit/ac9184fbb8478dab4a0724b279f94956b69be827
