Bug 219014

Summary: Kernel panic - not syncing: VFS: Unable to mount root fs on "UUID=XXX" or unknown-block(0,0)
Product: Platform Specific/Hardware
Reporter: Jack D (wm2vdghq)
Component: ARM
Assignee: linux-arm-kernel (linux-arm-kernel)
Status: NEW
Severity: normal
CC: regressions
Priority: P3
Hardware: ARM
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  .config works in kernel 6.8.12
  .config for kernel 6.9.x - generates kernel panic on boot
  git bisect log
  full boot log on panic
  uboot start script

Description Jack D 2024-07-07 20:49:42 UTC
Created attachment 306542 [details]
.config works in kernel 6.8.12

Hi.

I'm marking as arm64 since git bisect identifies the first problematic commit as:
  6d75c6f40a03c97e1ecd683ae54e249abb9d922b

git bisect log is attached.

Any kernel tagged v6.9x and later fails to boot with the following error:

[    0.420154] Initramfs unpacking failed: invalid magic at start of compressed archive
[    0.433178] Freeing initrd memory: 8820K
... snip ...
[    0.701958] List of all partitions:
[    0.705604] No filesystem could mount root, tried: 
[    0.705608] 
[    0.712166] Kernel panic - not syncing: VFS: Unable to mount root fs on "UUID=a889afec-1ced-4493-ab8b-cf1569819c49" or unknown-block(0,0)
[    0.724903] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.8-sheeva64 #sheeva64
[    0.732356] Hardware name: gti sheeva64 development board (DT)
[    0.738369] Call trace:

full start-up log attached.

kernel 6.8.12 and earlier builds boot no problem. 

Info:

- hardware: sheeva64 plug (Marvell Armada 3720 dual core ARMv8 Cortex-A53)
- uboot version: U-Boot 2017.03-armada-18.09.1-gc338309136
  - uboot init script with kernel cmdline is attached (I haven't changed this in at least a few years)
- ext4 rootfs is on external usb drive
- kernel 6.8.12 and earlier boot without issue (see attached: .config.6.8.12)
- problem first seen in commit: 6d75c6f40a03c97e1ecd683ae54e249abb9d922b (see attached: git_bisect.log)
- any kernel tagged 6.9x and later that I've tried fails with above error (see attached: .config.6.9.8.broken)
- I'm building the kernel by cross-compiling on debian "testing" (`crossbuild-essential-arm64`)
- the `initrd.img-6.9x` file itself appears valid:
  - `lsinitramfs` works no problem
  - `file initrd.img-6.9x` output is the same as the working initrds (`Zstandard compressed data (v0.8+), Dictionary ID: None`)
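
The kernel's "invalid magic at start of compressed archive" message means the bytes it found at the initrd address did not start with a signature it recognizes. A minimal sketch of the same kind of check, applied to a file on disk (or to a memory dump saved to a file), follows. The function names and the list of formats are illustrative assumptions, not the kernel's own code.

```python
# Leading byte signatures of common initramfs formats.
# Illustrative list; the kernel only accepts formats it was built with.
MAGICS = [
    (bytes.fromhex("28b52ffd"), "zstd"),
    (bytes.fromhex("1f8b"), "gzip"),
    (bytes.fromhex("fd377a585a00"), "xz"),
    (b"BZh", "bzip2"),
    (bytes.fromhex("02214c18"), "lz4 (legacy frame)"),
    (b"070701", "cpio newc (uncompressed)"),
    (b"070702", "cpio newc with CRC (uncompressed)"),
]


def identify(head: bytes) -> str:
    """Return the format whose signature matches the start of head."""
    for magic, name in MAGICS:
        if head.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return identify(f.read(8))
```

If the on-disk file is identified correctly but the kernel still reports an invalid magic, the bytes were most likely altered after the bootloader loaded them into memory.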

The bisected commit 6d75c6f40a03c97e1ecd683ae54e249abb9d922b is a merge of some 130+ arm64-related commits, so I'm having a hard time narrowing this down further. I'm glad to help test/troubleshoot but I'd need someone to give me some pointers.

thanks.

Jack Donohue
wm2vdghq@duck.com
Comment 1 Jack D 2024-07-07 20:51:44 UTC
Created attachment 306543 [details]
.config for kernel 6.9.x - generates kernel panic on boot
Comment 2 Jack D 2024-07-07 20:52:11 UTC
Created attachment 306544 [details]
git bisect log
Comment 3 Jack D 2024-07-07 20:53:26 UTC
Created attachment 306545 [details]
full boot log on panic
Comment 4 Jack D 2024-07-07 20:55:59 UTC
Created attachment 306546 [details]
uboot start script
Comment 6 Jack D 2024-07-08 19:30:51 UTC
Thanks for checking. Looks like that patch is for linux 6.8.x (the file in the patch doesn't exist at all in 6.9-rc1).

linux 6.8.x is working okay for me. The problem I'm experiencing starts in linux 6.9-rc1 and later.
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-09 09:17:36 UTC
Did you recheck whether fe46a7dd189e25604716c03576d05ac8a5209743 (the first parent of 6d75c6f40a03c97e1ecd683ae54e249abb9d922b) really is fine? Rechecking whether the second parent (1ef21fcd6a50f0) really is fine might be wise, too.
Comment 8 Jack D 2024-07-10 23:49:39 UTC
Hi.

Thank you for the pointers. Here are some new data points:

fe46a7dd189e (parent of 6d75c6f40a03) boots with no problem. I have done this several times now.

i.e., building from:
  git checkout -b test-fe46a7dd189e fe46a7dd189e
boots okay.


1ef21fcd6a50f0 (the last commit id in 6d75c6f40a03) also boots with no problem.

i.e., building from:
  git checkout -b test-1ef21fcd6a50 1ef21fcd6a50
is also good.


6d75c6f40a03 does not boot. I have done this several times now.

i.e.,
  git checkout -b test-6d75c6f40a03 6d75c6f40a03


We've exceeded the limits of my git knowledge, but I guess this means that the arm64 tree itself at commit 6d75c6f40a03 was good, but it was out-of-sync with linux-6.9.y and broke something (for me) when it merged.

In fact it looks like commit id 6d75c6f40a03 was at approx. linux-6.8-rc3. So that commit was good for linux-6.8-rc3, but not so good for linux-6.9.y (for me, at least).

I guess my next option is to methodically apply the individual commits from 6d75c6f40a03 on top of fe46a7dd189e in order until it breaks?

Or I'd be grateful for any other suggestions for tackling this.

Thanks again.
Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-12 14:05:21 UTC
(In reply to Jack D from comment #8)
> Or I'd be grateful for any other suggestions for tackling this.

Yeah, sounds like two changes that work fine independently break when in the same tree. A more advanced bisection could help: https://lore.kernel.org/all/20240306100153.32d305f7@meshulam.tesarici.cz/
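
One common way to bisect when both parents of a merge are good is to replay the second parent's commits on top of the first parent and bisect the resulting linear history. The sketch below only builds the git command list and runs nothing. The branch name and function name are hypothetical, and the rebase may stop on conflicts that need manual resolution.

```python
def rebased_bisect_plan(first_parent, second_parent, branch="bisect-linear"):
    """Return git commands that linearize a merge for bisection.

    The commands are returned as argument lists and are NOT executed here.
    """
    return [
        # Start a scratch branch at the tip of the merged-in side.
        ["git", "checkout", "-b", branch, second_parent],
        # Replay its commits on top of the known-good first parent.
        ["git", "rebase", first_parent],
        # Bisect between the rebased tip (bad) and the first parent (good).
        ["git", "bisect", "start", branch, first_parent],
    ]


# Commit ids taken from this report.
plan = rebased_bisect_plan("fe46a7dd189e", "1ef21fcd6a50f0")
for cmd in plan:
    print(" ".join(cmd))
```

Before starting the bisect, it is worth confirming that a kernel built from the rebased tip actually reproduces the failure; if it boots, the problem lies in the merge itself rather than in any single commit.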

Also note that decd347c2a75d3 ("x86/efistub: Reinstate soft limit for initrd loading") [v6.9-rc2] fixes an error that leads to a problem that causes your boot failure.

IOW: it's messy, so let's better bring in some developers. Can I CC you on a public mail (this would expose your email address to the world)?
Comment 10 Jack D 2024-07-12 18:41:12 UTC
Hi.
Thanks for looking into this. It is no problem with the email address as I always use disposables.

I'm going to take some time this weekend and see if I can narrow it down better (not sure how successful I'll be). So early next week I might be able to give a better starting point for someone if they were to look into it.

I don't see anyone else complaining about this so I'm starting to wonder if there is just something weird in my setup.

Jack.
Comment 11 Jack D 2024-07-16 05:35:54 UTC
Hi.
I saw a reply from Will Deacon on the mailing list:

> My guess (based on times I've seen this sort of thing in the past) is
> that the kernel binary grew in size and the broken bootloader is loading
> the kernel over the initramfs instead of parsing the 'image_size' field
> in the boot header.

I'm not sure of the protocol (whether I'm supposed to reply directly on the mailing list), so I'll reply here.

Yes, the 6.9x and now 6.10 kernels generate bigger vmlinuz than the working 6.8x kernels, so that seems like a real possibility.
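
The 'image_size' field mentioned in the quote lives in the 64-byte header at the start of an uncompressed arm64 Image. It gives the kernel's in-memory footprint, which is larger than the file because it includes BSS. A minimal sketch for reading it, assuming the file is an uncompressed Image (a gzip-compressed vmlinuz would need decompressing first); the function name is illustrative.

```python
import struct

# "ARM\x64" read as a little-endian u32, stored at byte offset 56.
ARM64_MAGIC = 0x644D5241

# Header layout: code0, code1, text_offset, image_size, flags,
# res2, res3, res4, magic, res5 (all little-endian, 64 bytes total).
HEADER = struct.Struct("<IIQQQQQQII")


def parse_arm64_header(data: bytes) -> dict:
    """Extract text_offset, image_size and flags from an arm64 Image header."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 64 bytes")
    fields = HEADER.unpack_from(data)
    if fields[8] != ARM64_MAGIC:
        raise ValueError("not an uncompressed arm64 Image (bad magic)")
    return {
        "text_offset": fields[2],
        "image_size": fields[3],
        "flags": fields[4],
    }


def parse_arm64_image(path: str) -> dict:
    """Read and parse the header from an Image file on disk."""
    with open(path, "rb") as f:
        return parse_arm64_header(f.read(HEADER.size))
```

Comparing 'image_size' between the working 6.8.x build and the failing 6.9.x build shows how much extra memory the newer kernel claims beyond its file size.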

I'll follow-up on that (need to read-up on uboot). Until now I have been using the predefined uboot addr variables that came with the board.
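
Whether fixed load addresses leave enough room can be checked with a simple interval test. In the sketch below the two addresses are HYPOTHETICAL placeholders, not the sheeva64's actual uboot variables; only the 8820K initrd size comes from the boot log above. It also simplifies by assuming the kernel runs at the address it was loaded to.

```python
def ranges_overlap(start_a: int, size_a: int, start_b: int, size_b: int) -> bool:
    """True if [start_a, start_a+size_a) and [start_b, start_b+size_b) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


# HYPOTHETICAL load addresses, for illustration only.
KERNEL_ADDR = 0x0500_0000
INITRD_ADDR = 0x0600_0000  # 16 MiB above the kernel

# From the boot log: "Freeing initrd memory: 8820K".
INITRD_SIZE = 8820 * 1024


def initrd_clobbered(image_size: int) -> bool:
    """Check whether a kernel of the given in-memory size reaches the initrd."""
    return ranges_overlap(KERNEL_ADDR, image_size, INITRD_ADDR, INITRD_SIZE)
```

With these placeholder values, a kernel whose in-memory size stays under 16 MiB fits, while a slightly larger one runs into the start of the initrd, which would be consistent with the "invalid magic" message.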

I will try to follow-up in the next couple of days on this.

Thanks again to everyone for the help and pointers.
Comment 12 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-16 06:26:46 UTC
(In reply to Jack D from comment #11)
> if I'm supposed to reply directly on the mailing list

Yes, please reply there; feel free to ignore this ticket, unless you need a place to attach files or something.

But there is no strong need to repost that update on the list; just do so once you have investigated uboot.