Bug 218296

Summary: Kernel 6.6.8 locks up shortly after booting.
Product: Linux
Reporter: Chris Rankin (rankincj)
Component: Kernel
Assignee: Virtual assignee for kernel bugs (linux-kernel)
Status: NEW
Severity: normal
CC: rdunlap
Priority: P3
Hardware: i386
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
6.5.13 config
6.6.8 config
dmesg log for Linux 6.5.0
6.4.16 config
dmesg log for 6.4.16
Crash dump - 1
Crash dump - 2
Crash dump - 3
Crash dump - 4

Description Chris Rankin 2023-12-21 00:12:41 UTC
I have an ancient UP i586 machine which successfully runs 6.4.16 but which crashes without logging an oops shortly after booting either 6.5.13 or 6.6.8.

This bug *might* be network-related, but it is not fixed by the following patch:
```
--- linux-6.5/include/net/neighbour.h.orig	2023-12-10 22:11:54.079741645 +0000
+++ linux-6.5/include/net/neighbour.h	2023-12-10 22:12:24.920364781 +0000
@@ -162,7 +162,7 @@
 	struct rcu_head		rcu;
 	struct net_device	*dev;
 	netdevice_tracker	dev_tracker;
-	u8			primary_key[0];
+	u8			primary_key[];
 } __randomize_layout;
 
 struct neigh_ops {
```
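For context, the patch above only replaces a GNU zero-length array (`primary_key[0]`) with a C99 flexible array member (`primary_key[]`). The following is a minimal user-space sketch of that pattern, not kernel code: `struct demo_neigh` and both helper functions are hypothetical names invented for illustration and are only loosely modelled on `struct neighbour`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a struct that ends in a variable-length key. */
struct demo_neigh {
	unsigned int	key_len;	/* number of valid bytes in primary_key */
	unsigned char	primary_key[];	/* C99 flexible array member, must be last */
};

/* Allocate the fixed header and the key bytes as one block. */
static struct demo_neigh *demo_neigh_alloc(const void *key, unsigned int key_len)
{
	struct demo_neigh *n;

	assert(key != NULL);
	n = malloc(sizeof(*n) + key_len);
	if (!n)
		return NULL;
	n->key_len = key_len;
	memcpy(n->primary_key, key, key_len);
	return n;
}

/* Return non-zero if the stored key matches the given key exactly. */
static int demo_neigh_key_equal(const struct demo_neigh *n,
				const void *key, unsigned int key_len)
{
	return n->key_len == key_len &&
	       memcmp(n->primary_key, key, key_len) == 0;
}
```

Calling `demo_neigh_alloc()` with a 4-byte IPv4 address, for example, produces a single allocation that holds both the header and the key, and the caller releases it with `free()`. With the `[]` form the compiler treats the array as having an unknown length rather than as a zero-sized object, which is presumably why this kind of conversion was worth testing here.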
The dmesg log seems to get this far before stopping:
```
NET: Registered PF_INET6 protocol family
Segment Routing with IPv6
In-situ OAM (IOAM) with IPv6
bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
br0: port 1(eth1) entered blocking state
br0: port 1(eth1) entered disabled state
e100 0000:01:04.0 eth1: entered allmulticast mode
e100 0000:01:04.0 eth1: entered promiscuous mode
e100 0000:01:04.0 eth1: NIC Link is Up 100 Mbps Full Duplex
br0: port 2(eth2) entered blocking state
br0: port 2(eth2) entered disabled state
e100 0000:01:05.0 eth2: entered allmulticast mode
e100 0000:01:05.0 eth2: entered promiscuous mode
e100 0000:01:05.0 eth2: NIC Link is Up 100 Mbps Full Duplex
br0: port 2(eth2) entered blocking state
br0: port 2(eth2) entered forwarding state
br0: port 1(eth1) entered blocking state
br0: port 1(eth1) entered forwarding state
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp-with-tls transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
e100 0000:00:0f.0 eth0: NIC Link is Up 100 Mbps Full Duplex
```
Comment 1 Chris Rankin 2023-12-21 00:16:13 UTC
Created attachment 305637
6.5.13 config
Comment 2 Chris Rankin 2023-12-21 00:17:38 UTC
Created attachment 305638
6.6.8 config
Comment 3 Chris Rankin 2023-12-21 22:20:09 UTC
Created attachment 305642
dmesg log for Linux 6.5.0

Linux 6.5.0 also fails on this UP machine.
Comment 4 Randy Dunlap 2023-12-22 04:22:53 UTC
Chris, does a 6.4 kernel run successfully?
Please post a successful boot log and kernel .config file.
Comment 5 Chris Rankin 2023-12-22 09:20:30 UTC
Created attachment 305643
6.4.16 config
Comment 6 Chris Rankin 2023-12-22 09:21:32 UTC
Created attachment 305644
dmesg log for 6.4.16

Dmesg log for successful boot with 6.4.16.
Comment 7 Chris Rankin 2023-12-22 15:56:20 UTC
Created attachment 305645
Crash dump - 1

6.6.8 again, except recompiled with an updated toolchain. Once the kernel had locked up, I managed to trigger an oops via SysRq-"kill all tasks".

I was obviously only able to capture what would fit on my screen at the time.
Comment 8 Chris Rankin 2023-12-22 15:57:39 UTC
Created attachment 305646
Crash dump - 2

Same crash, screenshot 2.
Comment 9 Chris Rankin 2023-12-22 15:58:21 UTC
Created attachment 305647
Crash dump - 3

Same crash, screenshot 3.
Comment 10 Chris Rankin 2023-12-22 15:59:22 UTC
Created attachment 305648
Crash dump - 4

Same crash, screenshot 4.

I gave up after this and rebooted the box back to 6.4.16.
Comment 11 Chris Rankin 2023-12-23 11:33:30 UTC
One obvious theory is that the e100 driver could be broken as of 6.5.0.