Bug 74901 - kernel panic during boot init
Summary: kernel panic during boot init
Status: NEW
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P1 high
Assignee: other_other
URL:
Keywords:
Depends on: 74261
Blocks:
Reported: 2014-04-27 07:59 UTC by Plamen
Modified: 2014-05-18 07:16 UTC

See Also:
Kernel Version: 3.14
Subsystem:
Regression: No
Bisected commit-id:


Attachments
patch to list available filesystems and show do_mount_root error codes (1.57 KB, patch)
2014-04-27 08:17 UTC, Plamen
Patch to make the kernel try to mount root using all available filesystems during boot (778 bytes, patch)
2014-04-27 09:21 UTC, Plamen
proposed fix (1.29 KB, patch)
2014-04-27 11:47 UTC, Plamen

Description Plamen 2014-04-27 07:59:45 UTC
Currently, Linux does this during init, in mount_block_root:

	get_fs_names(fs_names);
retry:
	for (p = fs_names; *p; p += strlen(p)+1) {
		int err = do_mount_root(name, p, flags, root_mount_data);
		switch (err) {
			case 0:
				goto out;
			case -EACCES:
				flags |= MS_RDONLY;
				goto retry;
			case -EINVAL:
				continue;
		}
		/*
		 * Allow the user to distinguish between failed sys_open
		 * and bad superblock on root device.
		 * and give them a list of the available devices
		 * ...
		 */
	}

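The above code moves on to the next filesystem only when the current one returns -EINVAL. A return of 0 ends the loop with root mounted, and -EACCES restarts the loop read-only. Any other error code falls out of the switch and ends in a panic.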
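
A minimal user-space sketch of that control flow, for illustration only. fake_mount and try_original are hypothetical names, the canned return values are assumptions taken from the logs in this report, and the -EACCES retry is left out to keep it short. It is not the kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for do_mount_root(): returns a canned error code
 * per filesystem name instead of touching any device. */
int fake_mount(const char *fs)
{
	if (strcmp(fs, "xfs") == 0)
		return -ENOSYS;	/* the code xfs returned in this report */
	if (strcmp(fs, "btrfs") == 0)
		return 0;	/* the filesystem actually on the root device */
	return -EINVAL;		/* "not my superblock" */
}

/* Walk a NUL-separated name list that ends with an empty name, the same
 * way the loop above does. Returns the name that mounted, or NULL when
 * the loop gives up. The last error code seen is stored in *last_err. */
const char *try_original(const char *fs_names, int *last_err)
{
	const char *p;

	for (p = fs_names; *p; p += strlen(p) + 1) {
		int err = fake_mount(p);

		*last_err = err;
		switch (err) {
		case 0:
			return p;
		case -EINVAL:
			continue;	/* try the next filesystem */
		}
		/* Any other code lands here; the real kernel panics. */
		return NULL;
	}
	return NULL;
}
```

Called with the list "ext4\0xfs\0btrfs\0", try_original stops at xfs and never reaches btrfs, which matches the behaviour described here.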

The result is a kernel panic during boot if, for some reason, the filesystem being tried returns an error code other than 0, EACCES or EINVAL, like so:

[   11.532543] BTRFS: selftest: Running hole first btrfs_get_extent test
[   11.604166] rtc_cmos 00:03: setting system clock to 2014-04-25 17:32:39 UTC (1398447159)
[   11.677846] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input3
[   11.682145] md: Skipping autodetection of RAID arrays. (raid=autodetect will force)
[   11.720494] VFS: Cannot open root device "sda2" or unknown-block(8,2): error -38
[   11.723463] Please append a correct "root=" boot option; here are the available partitions:
[   11.727965] 0b00         1048575 sr0  driver: sr
[   11.730085] 0800        12582912 sda  driver: sd
[   11.732023]   0801           51200 sda1 000095d3-01
[   11.733742]   0802         5242880 sda2 000095d3-02
[   11.734573]   0803         4194304 sda3 000095d3-03
[   11.735406]   0804         3093504 sda4 000095d3-04
[   11.736257] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,2)
Comment 1 Plamen 2014-04-27 08:06:50 UTC
On a 3.14.1 kernel with ext4, xfs and btrfs built in, this resulted in an unbootable system; the root partition was formatted as btrfs.

See https://bugzilla.kernel.org/show_bug.cgi?id=74261
Comment 2 Plamen 2014-04-27 08:17:40 UTC
Created attachment 133861 [details]
patch to list available filesystems and show do_mount_root error codes
Comment 3 Plamen 2014-04-27 08:41:51 UTC
Applying the attached "[patch] patch to list available filesystems and show do_mount_root error codes" to 3.14.1 resulted in this:

[    0.000000] Linux version 3.14.1-mix64-00001-g1713529 (root@btrfs-debug-at-home) (gcc version 4.7.3 (CRUX-x86_64-multilib) ) #18 SMP Sun Apr 27 11:35:28 EEST 2014
[    0.000000] Command line: rw root=/dev/sda2 console=ttyS0,9600n8 console=tty0
...
[   12.438349] Btrfs loaded
[   12.444824] rtc_cmos 00:03: setting system clock to 2014-04-27 11:37:41 UTC (1398598661)
[   12.605535] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input3
[   12.618062] md: Waiting for all devices to be available before autodetect
[   12.626581] md: If you don't use raid, use raid=noautodetect
[   12.635685] md: Autodetecting RAID arrays.
[   12.640633] md: Scanned 0 and added 0 devices.
[   12.646080] md: autorun ...
[   12.649745] md: ... autorun DONE.
[   12.653710] ---> EACCESS=13, EINVAL=22, Available filesystems: ext3 ext2 ext4 fuseblk xfs btrfs
[   12.692559] -----> Tried ext3, error code is -22.
[   12.695326] -----> Tried ext2, error code is -22.
[   12.697948] -----> Tried ext4, error code is -22.
[   12.700464] -----> Tried fuseblk, error code is -22.
[   12.705165] -----> Tried xfs, error code is -38.
[   12.707686] VFS: Cannot open root device "sda2" or unknown-block(8,2): error -38
[   12.711523] Please append a correct "root=" boot option; here are the available partitions:
[   12.716142] 0b00         1048575 sr0  driver: sr
[   12.718683] 0800        12582912 sda  driver: sd
[   12.721216]   0801          204800 sda1 00061e1d-01
[   12.723865]   0802         4703232 sda2 00061e1d-02
[   12.726569]   0803         4726784 sda3 00061e1d-03
[   12.729182]   0804         2947072 sda4 00061e1d-04
[   12.731943] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,2)
[   12.736260] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.1-mix64-00001-g1713529 #18

All filesystems listed on this line:
[   12.653710] ---> EACCESS=13, EINVAL=22, Available filesystems: ext3 ext2 ext4 fuseblk xfs btrfs
were built in, not modules (the kernel used is monolithic).
Comment 4 Plamen 2014-04-27 08:48:31 UTC
From the previous comment it became obvious that xfs does not follow the return code rules that mount_block_root expects: trying to mount root as xfs clearly results in something other than 0, EACCES or EINVAL.

So, the kernel parameter
 rootfstype=btrfs
was added just to check the result. Here it is:

[    0.000000] Linux version 3.14.1-mix64-00001-g1713529 (root@btrfs-debug-at-home) (gcc version 4.7.3 (CRUX-x86_64-multilib) ) #18 SMP Sun Apr 27 11:35:28 EEST 2014
[    0.000000] Command line: rw root=/dev/sda2 console=ttyS0,9600n8 console=tty0 rootfstype=btrfs
...
[   12.265182] Btrfs loaded
[   12.272186] rtc_cmos 00:03: setting system clock to 2014-04-27 11:47:20 UTC (1398599240)
[   12.447433] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input3
[   12.454442] md: Waiting for all devices to be available before autodetect
[   12.458027] md: If you don't use raid, use raid=noautodetect
[   12.462997] md: Autodetecting RAID arrays.
[   12.465222] md: Scanned 0 and added 0 devices.
[   12.467685] md: autorun ...
[   12.469239] md: ... autorun DONE.
[   12.471090] ---> EACCESS=13, EINVAL=22, Available filesystems: btrfs
[   12.503695] BTRFS: device fsid 2ba08fbc-4b95-46cc-b638-299f16462620 devid 1 transid 488 /dev/root
[   12.514002] BTRFS info (device sda2): disk space caching is enabled
[   12.806683] VFS: Mounted root (btrfs filesystem) on device 0:13.
[   12.813976] -----> Tried btrfs, error code is 0.
[   12.877037] devtmpfs: mounted
[   12.884882] Freeing unused kernel memory: 932K (ffffffff81c85000 - ffffffff81d6e000)

And the system booted normally.
Comment 5 Plamen 2014-04-27 09:17:35 UTC
All of the above led to the solution: change

 case -EINVAL:

to

 default:

in the mount_block_root routine in init/do_mounts.c, like so:

        ...
	get_fs_names(fs_names);
retry:
	for (p = fs_names; *p; p += strlen(p)+1) {
		int err = do_mount_root(name, p, flags, root_mount_data);
		switch (err) {
			case 0:
				goto out;
			case -EACCES:
				flags |= MS_RDONLY;
				goto retry;
			default:
				continue;
		}
		/*
		 * Allow the user to distinguish between failed sys_open
		 * and bad superblock on root device.
		 * and give them a list of the available devices
		 * ...
		 */
	}

The above is also attached as the solution patch, fix_not_all_available_fs_tried_during_boot_panic.patch.
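
To illustrate the effect of the default: case without booting a kernel, here is a hedged user-space sketch of the patched loop. fake_table, fake_mount, try_patched and MS_RDONLY_SKETCH are hypothetical names, and the canned return codes are assumptions copied from the debug output in this report. Names that are not in the table return -ENODEV, as the logs show for filesystems that are not built in.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MS_RDONLY_SKETCH 1	/* hypothetical value, stands in for MS_RDONLY */

struct fake_fs {
	const char *name;
	int err;
};

/* Canned results, modeled on the debug output in this report. */
static const struct fake_fs fake_table[] = {
	{ "ext4", -EINVAL },
	{ "xfs", -ENOSYS },
	{ "btrfs", 0 },
};

/* Hypothetical stand-in for do_mount_root(); flags are accepted but unused. */
int fake_mount(const char *fs, int flags)
{
	size_t i;

	(void)flags;
	for (i = 0; i < sizeof(fake_table) / sizeof(fake_table[0]); i++)
		if (strcmp(fs, fake_table[i].name) == 0)
			return fake_table[i].err;
	return -ENODEV;		/* unknown filesystem type */
}

/* Patched control flow: every error except -EACCES moves on to the next
 * name. Returns the name that mounted, or NULL if none did. */
const char *try_patched(const char *fs_names)
{
	const char *p;
	int flags = 0;

retry:
	for (p = fs_names; *p; p += strlen(p) + 1) {
		int err = fake_mount(p, flags);

		switch (err) {
		case 0:
			return p;
		case -EACCES:
			flags |= MS_RDONLY_SKETCH;
			goto retry;
		default:
			continue;
		}
	}
	return NULL;		/* the real kernel panics at this point */
}
```

With the list "xfs\0some\0ext4\0btrfs\0" the sketch skips past the -ENOSYS and -ENODEV results and returns "btrfs". With btrfs removed from the list it returns NULL, which corresponds to the panic path.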
Comment 6 Plamen 2014-04-27 09:21:44 UTC
Created attachment 133871 [details]
Patch to make the kernel try to mount root using all available filesystems during boot

Fixes the problem of btrfs never being tried because xfs returns ENOSYS (38), which is none of 0, EACCES (13) or EINVAL (22).
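
For reading the numeric codes in the debug output, a small hedged helper that names the values seen in this report. root_err_name is a hypothetical name, not part of any patch here. On Linux the constants are EACCES = 13, ENODEV = 19, EINVAL = 22 and ENOSYS = 38.

```c
#include <errno.h>

/* Hypothetical helper: map the negative error codes that show up in this
 * report to their errno names. Anything else is reported as "other". */
const char *root_err_name(int err)
{
	switch (err) {
	case 0:
		return "success";
	case -EACCES:
		return "EACCES";
	case -ENODEV:
		return "ENODEV";
	case -EINVAL:
		return "EINVAL";
	case -ENOSYS:
		return "ENOSYS";
	default:
		return "other";
	}
}
```

Only the first and fourth cases are handled by the unpatched switch in mount_block_root; the rest are the ones that used to end the loop early.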
Comment 7 Plamen 2014-04-27 09:47:00 UTC
With both patches applied to 3.14.1 and the same kernel parameters as the ones used in comment 3:

[    0.000000] Linux version 3.14.1-mix64-00002-g9111951 (root@btrfs-debug-at-home) (gcc version 4.7.3 (CRUX-x86_64-multilib) ) #19 SMP Sun Apr 27 12:27:58 EEST 2014
[    0.000000] Command line: rw root=/dev/sda2 console=ttyS0,9600n8 console=tty0
...
[   12.713120] Btrfs loaded
[   12.719589] rtc_cmos 00:03: setting system clock to 2014-04-27 12:42:48 UTC (1398602568)
[   12.873388] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input3
[   12.887039] md: Waiting for all devices to be available before autodetect
[   12.895020] md: If you don't use raid, use raid=noautodetect
[   12.904627] md: Autodetecting RAID arrays.
[   12.909676] md: Scanned 0 and added 0 devices.
[   12.914995] md: autorun ...
[   12.918373] md: ... autorun DONE.
[   12.922487] ---> EACCESS=13, EINVAL=22, Available filesystems: ext3 ext2 ext4 fuseblk xfs btrfs
[   12.954855] -----> Tried ext3, error code is -22.
[   12.957565] -----> Tried ext2, error code is -22.
[   12.960202] -----> Tried ext4, error code is -22.
[   12.963495] -----> Tried fuseblk, error code is -22.
[   12.967808] -----> Tried xfs, error code is -38.
[   12.970805] BTRFS: device fsid 2ba08fbc-4b95-46cc-b638-299f16462620 devid 1 transid 494 /dev/root
[   12.977186] BTRFS info (device sda2): disk space caching is enabled
[   13.158690] VFS: Mounted root (btrfs filesystem) on device 0:13.
[   13.165852] -----> Tried btrfs, error code is 0.
[   13.181333] devtmpfs: mounted
[   13.188654] Freeing unused kernel memory: 932K (ffffffff81c85000 - ffffffff81d6e000)

we see that the system boots up normally, as expected.
Comment 8 Plamen 2014-04-27 11:22:38 UTC
If we try to fool the patched kernel (both patches applied) with some obviously invalid file system types using
 rootfstype=xfs,some,invalid,reiserfs,nilfs2,ext2,ext3,ext4,jfs,cifs,btrfs
kernel parameter, the kernel handles it gracefully:

[    0.000000] Linux version 3.14.1-mix64-00002-g9111951 (root@btrfs-debug-at-home) (gcc version 4.7.3 (CRUX-x86_64-multilib) ) #19 SMP Sun Apr 27 12:27:58 EEST 2014
[    0.000000] Command line: rw root=/dev/sda2 console=ttyS0,9600n8 console=tty0 rootfstype=xfs,some,invalid,reiserfs,nilfs2,ext2,ext3,ext4,jfs,cifs,btrfs
...
[   12.617966] Btrfs loaded
[   12.624626] rtc_cmos 00:03: setting system clock to 2014-04-27 14:15:00 UTC (1398608100)
[   12.786838] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input3
[   12.800316] md: Waiting for all devices to be available before autodetect
[   12.808461] md: If you don't use raid, use raid=noautodetect
[   12.817460] md: Autodetecting RAID arrays.
[   12.822685] md: Scanned 0 and added 0 devices.
[   12.828783] md: autorun ...
[   12.831067] md: ... autorun DONE.
[   12.832910] ---> EACCESS=13, EINVAL=22, Available filesystems: xfs some invalid reiserfs nilfs2 ext2 ext3 ext4 jfs cifs btrfs
[   12.864020] -----> Tried xfs, error code is -38.
[   12.866671] -----> Tried some, error code is -19.
[   12.869474] -----> Tried invalid, error code is -19.
[   12.872179] -----> Tried reiserfs, error code is -19.
[   12.874822] -----> Tried nilfs2, error code is -19.
[   12.878125] -----> Tried ext2, error code is -22.
[   12.882153] -----> Tried ext3, error code is -22.
[   12.885208] -----> Tried ext4, error code is -22.
[   12.887899] -----> Tried jfs, error code is -19.
[   12.890369] -----> Tried cifs, error code is -19.
[   12.893628] BTRFS: device fsid 2ba08fbc-4b95-46cc-b638-299f16462620 devid 1 transid 504 /dev/root
[   12.900106] BTRFS info (device sda2): disk space caching is enabled
[   13.151022] VFS: Mounted root (btrfs filesystem) on device 0:13.
[   13.158421] -----> Tried btrfs, error code is 0.
[   13.170442] devtmpfs: mounted
[   13.177874] Freeing unused kernel memory: 932K (ffffffff81c85000 - ffffffff81d6e000)
Comment 9 Plamen 2014-04-27 11:29:08 UTC
Here is how the kernel still panics with a helpful message when the correct FS is not tried:

[    0.000000] Linux version 3.14.1-mix64-00002-g9111951 (root@btrfs-debug-at-home) (gcc version 4.7.3 (CRUX-x86_64-multilib) ) #19 SMP Sun Apr 27 12:27:58 EEST 2014
[    0.000000] Command line: rw root=/dev/sda2 console=ttyS0,9600n8 console=tty0 rootfstype=xfs,some,invalid,reiserfs,nilfs2,ext2,ext3,ext4,jfs,cifs
...
[   12.201941] Btrfs loaded
[   12.209293] rtc_cmos 00:03: setting system clock to 2014-04-27 14:23:57 UTC (1398608637)
[   12.336936] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input3
[   12.342952] md: Waiting for all devices to be available before autodetect
[   12.346516] md: If you don't use raid, use raid=noautodetect
[   12.351081] md: Autodetecting RAID arrays.
[   12.353383] md: Scanned 0 and added 0 devices.
[   12.355845] md: autorun ...
[   12.357563] md: ... autorun DONE.
[   12.359404] ---> EACCESS=13, EINVAL=22, Available filesystems: xfs some invalid reiserfs nilfs2 ext2 ext3 ext4 jfs cifs
[   12.395783] -----> Tried xfs, error code is -38.
[   12.397336] -----> Tried some, error code is -19.
[   12.398905] -----> Tried invalid, error code is -19.
[   12.400494] -----> Tried reiserfs, error code is -19.
[   12.402111] -----> Tried nilfs2, error code is -19.
[   12.404234] -----> Tried ext2, error code is -22.
[   12.406153] -----> Tried ext3, error code is -22.
[   12.408071] -----> Tried ext4, error code is -22.
[   12.409868] -----> Tried jfs, error code is -19.
[   12.411383] -----> Tried cifs, error code is -19.
[   12.413003] List of all partitions:
[   12.414178] 0b00         1048575 sr0  driver: sr
[   12.415739] 0800        12582912 sda  driver: sd
[   12.417269]   0801          204800 sda1 00061e1d-01
[   12.418943]   0802         4703232 sda2 00061e1d-02
[   12.420581]   0803         4726784 sda3 00061e1d-03
[   12.422263]   0804         2947072 sda4 00061e1d-04
[   12.424110] No filesystem could mount root, tried:  xfs some invalid reiserfs nilfs2 ext2 ext3 ext4 jfs cifs
[   12.427976] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,2)
[   12.430643] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.14.1-mix64-00002-g9111951 #19
[   12.433175] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   12.436597]  ffff88005bd56039 ffff88005c8addb8 ffffffff817b4c00 0000000000000001
[   12.439309]  ffffffff819e9648 ffff88005c8ade38 ffffffff817b05f3 0000000000008000
[   12.442103]  0000000000000010 ffff88005c8ade48 ffff88005c8adde8 ffffff0034616473
[   12.444772] Call Trace:
[   12.445603]  [<ffffffff817b4c00>] dump_stack+0x46/0x58
[   12.447252]  [<ffffffff817b05f3>] panic+0xbc/0x1ba
[   12.448830]  [<ffffffff81c9a475>] mount_block_root+0x284/0x29c
[   12.450695]  [<ffffffff81c9a603>] mount_root+0x57/0x5b
[   12.452353]  [<ffffffff81c9a769>] prepare_namespace+0x162/0x19b
[   12.454547]  [<ffffffff81c9a117>] kernel_init_freeable+0x1df/0x1ef
[   12.456553]  [<ffffffff81c99894>] ? do_early_param+0x8c/0x8c
[   12.458330]  [<ffffffff817ab480>] ? rest_init+0x70/0x70
[   12.460042]  [<ffffffff817ab489>] kernel_init+0x9/0xf0
[   12.461701]  [<ffffffff817bda8c>] ret_from_fork+0x7c/0xb0
[   12.463485]  [<ffffffff817ab480>] ? rest_init+0x70/0x70
[   12.465289] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
Comment 10 Plamen 2014-04-27 11:34:00 UTC
All of the above is generated using the environment/kernel configs/etc described in and attached to bug 74261 - https://bugzilla.kernel.org/show_bug.cgi?id=74261

The attached patch https://bugzilla.kernel.org/attachment.cgi?id=133871 solves the problems described at bug 74261.
Comment 11 Plamen 2014-04-27 11:47:17 UTC
Created attachment 133881 [details]
proposed fix

While debugging a kernel panic with 3.14.1 it became clear that some
changes made some filesystems' mount routines return error codes other
than 0, EACCES and EINVAL. Such return codes result in the kernel
panicking without trying to mount root with all of the available
filesystems, as can be seen in bugzilla entries 74901 and 74261.

Make mount_block_root continue trying other available filesystems by
default, not only when the last tried returned EACCES or EINVAL.

Signed-off-by: Plamen Petrov <plamen.sisi@gmail.com>
Comment 12 Eric Sandeen 2014-05-18 02:03:49 UTC
Strange, XFS has returned ENOSYS for a very long time... since 2.6.12 we've done something like this:

+       /*
+        * We must be able to do sector-sized and sector-aligned IO.
+        */
+       if (sector_size > mp->m_sb.sb_sectsize) {
+               cmn_err(CE_WARN,
+                       "XFS: device supports only %u byte sectors (not %u)",
+                       sector_size, mp->m_sb.sb_sectsize);
+               error = ENOSYS;
+               goto fail;
+       }
Comment 13 Plamen 2014-05-18 07:16:49 UTC
Well, maybe these error sign issues will address the problem:

http://article.gmane.org/gmane.linux.kernel/1705038

Or maybe these error sign issues have nothing to do with all this, and it's just now that someone has found a case which shows there is a problem. After all, my debug code says:

[   12.864020] -----> Tried xfs, error code is -38.

and that is consistent with what you say - it's ENOSYS.

Anyway, mount_block_root needs fixing, because if one tries to trick the kernel using the rootfstype= parameter with some nonsense (look at my debug output), there are still return codes other than the accepted 0, EACCES and EINVAL. Hence the patch.

I've sent the patch 3 times to LKML, and 2 times CCed to Linus - latest here:

https://lkml.org/lkml/2014/5/13/971

Any idea what to do so that someone finally picks it up? I hope Linus will pick it up when he returns from his trip, but it has now been a month since my initial bug report...
