Bug 118401
Description
Kevin Brubeck Unhammer
2016-05-18 11:41:03 UTC
This is not an ext4 issue. It could be a problem with how the kernel was configured. It could be a hardware issue; it could be many things. But the problem seems to be that the root device is not getting detected, so this is before the file system gets involved.

Part of the problem is that hardware support issues are one of the things that take quite a lot of time to resolve remotely, especially if the user doesn't know how to get the necessary debugging information, and upstream developers are all volunteers when it comes to giving support to random end users. Normally this is work that gets done by distributions, especially if they are getting paid support $$$. But they don't want to support anything other than their standard kernel.

You can find kernel folks who also have a particular piece of hardware, or something similar, and they will help on a volunteer basis. The problem is that while many of us will have Thinkpads (for example), many fewer kernel developers are likely to have come across an All-in-one Lenovo A740 desktop.

This sort of problem is much better handled by someone who knows how to get the necessary debugging information (the kernel configuration used to build the kernel image you are trying to use, the kernel dmesg messages during the boot process, etc.). So if you are near a local Linux User's Group, that might be a good place to ask for help.

I will say that my Haswell-based Lenovo Thinkpad T540p works just fine with upstream kernels, so my guess is that it's mostly a kernel configuration issue, possibly combined with needing to make changes in the BIOS settings. So there's a good chance this isn't a kernel bug at all.

Created attachment 216681 [details]
kernel config 4.3.0-040300-generic
config of lowest tested not-booting kernel version
Created attachment 216691 [details]
kernel config 4.6.0-040600rc6-generic
kernel config of highest tested (not-booting) kernel version
Created attachment 216701 [details]
kernel config 4.2.8-040208-generic
kernel config of highest tested *booting* kernel
Thanks for taking the time to comment :)

https://help.ubuntu.com/community/Kernel/Compile says /boot/config* are the kernel configurations used to create the .deb's of the kernels (which I got from http://kernel.ubuntu.com/~kernel-ppa/mainline/), so I attached those at least. Maybe there's something relevant in the change from 4.2.8 to 4.3 (that didn't change back in 4.6)?

I did find some tips on https://wiki.archlinux.org/index.php/Boot_debugging and http://elinux.org/Kernel_Debugging_Tips – I'll try to figure out how to get a detailed log of the failing boot, and see if there are other kernel images that might work.

https://ext4.wiki.kernel.org/index.php/UpgradeToExt4 says in section "Converting an ext3 filesystem to ext4":

Note: The ext3 driver will be removed from the kernel in 4.3.

Your root filesystem is ext3. Maybe your second image (drop) says the root device is not found because of the above. Your options are:

1) Convert your data to ext4.
2) Enable the ext3 filesystem driver explicitly in your menuconfig/xconfig and build a kernel. After saving your .config, the diff against the older non-working configuration should look like this, i.e. with EXT3_FS added:

-# CONFIG_EXT3_FS is not set
+CONFIG_EXT3_FS=y
+CONFIG_EXT3_FS_POSIX_ACL=y
+CONFIG_EXT3_FS_SECURITY=y

3) Convert your data to ext2 and then try with the current non-working kernels.

The first file in the diff is the working one (4.2) and the second file is the non-working one.

# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
-CONFIG_EXT4_USE_FOR_EXT23=y
+CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_EXT4_ENCRYPTION=m
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set

It looks like EXT23 was removed from your config and EXT2 is used instead. The next diff has 4.2 as the first file and 4.6 as the second file (.config).
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
-CONFIG_EXT4_USE_FOR_EXT23=y
+CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_EXT4_ENCRYPTION=m

The ext4 file system will happily mount file systems that are intended for ext2 and ext3. That is, ext4 supports a superset of the features supported by the ext2 and ext3 file system implementations. CONFIG_EXT4_USE_FOR_EXT23 (before the ext3 driver was dropped) and CONFIG_EXT4_USE_FOR_EXT2 (after the ext3 driver was dropped) merely determine whether or not ext4 claims to be the ext2 or ext3 driver. This matters only if the file system type is explicitly used in the mount command (e.g., "mount -t ext3 ...") or if you are mounting a non-root file system in /etc/fstab.

Now, the distribution is doing something weird with initramfs, so it's possible this is the cause of the issue, but when the kernel is mounting the root file system it will try all of the file systems available to it. Furthermore, the error message about not being able to find the file system with the specific UUID isn't the error message in the boot logs that would indicate that the initramfs was trying to mount with ext3 and failed to find it.

Still, if you want to try compiling with CONFIG_EXT4_USE_FOR_EXT2, that certainly wouldn't hurt and would probably help once you get the system working.

Thanks for the information. It looks like this has nothing to do with the code; at least I could simulate the ALERT UUID= message with a wrong hex char in root=UUID on the kernel command line, and I get the same error. Also, the OP already tried CONFIG_EXT4_USE_FOR_EXT2 and it doesn't work for him, as per the diff of the .configs. The output of the blkid command from the initramfs shell would definitely help in figuring out more about this problem.
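The config comparison discussed above can be reproduced with a small script. This is only a sketch: the function names `ext_options` and `ext_config_diff` are made up for illustration, and the example paths in the trailing comment assume the attached configs are installed as /boot/config-<version>.

```shell
#!/bin/sh
# Print only the ext2/ext3/ext4-related option lines of a kernel config file,
# including the "# CONFIG_... is not set" lines.
ext_options() {
    grep -E 'CONFIG_EXT[234]_' "$1"
}

# Show the ext-related options that differ between two config files.
# $1 is the older (booting) config, $2 the newer (non-booting) one.
ext_config_diff() {
    old_tmp=$(mktemp) || return 1
    new_tmp=$(mktemp) || return 1
    ext_options "$1" | sort > "$old_tmp"
    ext_options "$2" | sort > "$new_tmp"
    # diff exits with 1 when the inputs differ; that is not an error here.
    diff -u "$old_tmp" "$new_tmp"
    status=$?
    rm -f "$old_tmp" "$new_tmp"
    [ "$status" -le 1 ]
}

# Hypothetical usage (paths are an assumption):
#   ext_config_diff /boot/config-4.2.8-040208-generic \
#                   /boot/config-4.6.0-040600rc6-generic
```

Sorting both sides first keeps the diff limited to options that really changed, rather than options that merely moved within the file.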
Hi Kevin,

Can you please get the required output, or close the bug if you consider the issue resolved, so that we can fix it if required?

Sorry for the late reply – I still haven't been able to get the keyboard working in initramfs, so not much more useful information :( I did try running blkid from an initramfs script:

$ cat /etc/initramfs-tools/scripts/init-premount/kbu-blkid
#!/bin/sh
PREREQ=""
prereqs()
{
echo "$PREREQ"
}
case $1 in
prereqs)
prereqs
exit 0
;;
esac
echo "KBU"
blkid
echo "/KBU"
$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.6.0-040600rc6-generic

but that just shows

running /scripts/init-premount ... KBU
/KBU

as if the "blkid" command had no output. Is there something more useful I could put in there?

Would it be useful at all to compile a kernel with debug symbols as in https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel#Debug_Symbols or if I change some other configuration options? (I don't really know what I should change though, since CONFIG_EXT4_USE_FOR_EXT2 is already tested as per the config diff).

There are 2 things.

1) You don't need to compile a kernel with debug symbols for now, or change any configuration. This doesn't seem to be related to the kernel configuration.

2) The images you posted in the bug report, like drop.jpg:

Begin: Waiting for root file system ...
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
[…]
done.
Gave up waiting for root device.
and drops me into an (initramfs) shell

When you drop into the (initramfs) shell, it would be great if you could get the /sbin/blkid output. It is strange that the keyboard doesn't work. I hope you have compiled the kernel after make config/menuconfig with:

a) make
b) make modules_install
c) make install

Step (b) is sometimes missed by many. It installs the necessary drivers in /lib/modules/`uname -r`, from where they are put into the initramfs.
So one possible reason is a missing make modules_install. The size of initrd.x.y.img should easily tell you that. lsinitramfs /boot/initrd.img-4.6.0 should list the necessary modules like usbkbd.ko. In my case:

lsinitramfs /boot/initrd.img-4.6.0+ | grep usbkbd
lib/modules/4.6.0+/kernel/drivers/hid/usbhid/usbkbd.ko

I guess it would be helpful to have these as well. I'm guessing that blkid not returning anything could also mean that your disk is not being detected by the block layer, or that something else is wrong.

echo "KBU"
cat /proc/partitions
/sbin/blkid
cat /proc/cmdline
dmesg | grep -i blocks
echo "/KBU"

Created attachment 218671 [details]
phone-image of output from initramfs-script
This is the updated script that led to the output in the attachment:
$ cat /etc/initramfs-tools/scripts/init-premount/kbu-blkid
#!/bin/sh
PREREQ=""
prereqs()
{
echo "$PREREQ"
}
case $1 in
prereqs)
prereqs
exit 0
;;
esac
echo "KBU"
echo 'cat /proc/partitions'
cat /proc/partitions
echo '/sbin/blkid'
/sbin/blkid
echo 'cat /proc/cmdline'
cat /proc/cmdline
echo 'dmesg | grep -i blocks'
dmesg | grep -i blocks
echo "/KBU"
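The label-then-run pattern in the script above could be factored into one helper, which keeps each label and command from drifting apart. This is a sketch only: `show` is a made-up name, and it assumes the initramfs shell provides `sh -c` and `printf`, which has not been checked on this machine.

```shell
#!/bin/sh
# Print a command line as a label, then run it through the shell so that
# pipelines inside the string (e.g. "dmesg | grep -i blocks") still work.
show() {
    printf '%s # gives:\n' "$1"
    sh -c "$1"
}

# Hypothetical usage inside an init-premount script:
#   show 'cat /proc/partitions'
#   show '/sbin/blkid'
#   show 'cat /proc/cmdline'
#   show 'dmesg | grep -i blocks'
```

A command that prints nothing, as blkid did here, then shows up as a label with no lines under it.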
Your block devices are not being detected due to some problem. Maybe the driver for them is not getting loaded. Can you post the complete dmesg from the working 4.2?

In the initramfs premount script, instead of

dmesg | grep -i blocks

you can add

dmesg | grep -i achi
dmesg | grep -i scsi
dmesg | grep -i sata
dmesg | grep -i ata[0-4]
cat /proc/scsi/scsi

The device below is missing in your 4.6. This is from your 4.2:

[7.6.] SCSI information (from /proc/scsi/scsi) (from running 4.2)
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST1000LM014-1EJ1 Rev: LIV6
Type: Direct-Access ANSI SCSI revision: 05

Created attachment 218791 [details]
dmesg from 4.2.0-35-generic
Here's the complete dmesg from 4.2.0 (am ssh-ing now; will get the initramfs dmesg's when I get back to the physical machine …)
Your 4.2 is a signed kernel, whereas the other two (4.6 and 4.4) aren't. I hope Secure Boot is already disabled in the BIOS. If not, please try that as well.

If the ahci part fails, i.e. dmesg | grep -i ahci produces no output, then you need to include these in /etc/initramfs-tools/modules:

ahci
libahci

and then run

update-initramfs -u

Created attachment 218851 [details]
phone-image of initramfs-script w/ahci, scsi dmesg's, 4.6
$ cat /etc/initramfs-tools/scripts/init-premount/kbu-blkid
#!/bin/sh
PREREQ=""
prereqs()
{
echo "$PREREQ"
}
case $1 in
prereqs)
prereqs
exit 0
;;
esac
echo "KBU"
echo 'cat /proc/partitions # gives:'
cat /proc/partitions
echo '/sbin/blkid # gives:'
/sbin/blkid
echo 'cat /proc/cmdline # gives:'
cat /proc/cmdline
echo 'dmesg | grep -i blocks # gives:'
dmesg | grep -i blocks
echo 'dmesg | grep -i ahci # gives:'
dmesg | grep -i ahci
echo 'dmesg | grep -i scsi # gives:'
dmesg | grep -i scsi
echo 'dmesg | grep -i sata # gives:'
dmesg | grep -i sata
echo 'dmesg | grep -i ata[0-4] # gives:'
dmesg | grep -i ata[0-4]
echo 'cat /proc/scsi/scsi # gives:'
cat /proc/scsi/scsi
echo "/KBU"
$ cat /etc/initramfs-tools/modules
# List of modules that you want to include in your initramfs.
# They will be loaded at boot time in the order below.
#
# Syntax: module_name [args ...]
#
# You must run update-initramfs(8) to effect this change.
#
# Examples:
#
# raid1
# sd_mod
ahci
libahci
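Adding the two modules to the list can be done idempotently, so that repeated test runs do not leave duplicate entries. A minimal sketch: `ensure_modules_listed` is a hypothetical helper, and it only recognises bare module names on a line of their own, not entries that carry arguments.

```shell
#!/bin/sh
# Append each named module to a modules list file unless it is already there.
# $1 is the list file; the remaining arguments are module names.
ensure_modules_listed() {
    list_file=$1
    shift
    for module in "$@"; do
        # -x matches whole lines only, so "ahci" does not match "libahci"
        # or a commented-out "# ahci"; -F treats the name literally.
        if ! grep -qxF "$module" "$list_file"; then
            printf '%s\n' "$module" >> "$list_file"
        fi
    done
}

# Hypothetical usage, run as root on the real system:
#   ensure_modules_listed /etc/initramfs-tools/modules ahci libahci
#   update-initramfs -u
```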
Created attachment 218861 [details]
phone-image of initramfs-script w/ahci, scsi dmesg's, 4.4 signed
Same script/modules as attachment 218851 [details], on linux-signed-image-4.4.0-22-generic from Ubuntu repos (so signing/secureboot seems irrelevant?)
Created attachment 218871 [details]
phone-image of initramfs-script w/ahci, scsi dmesg's, 4.2.8 (the working one)
Snatched an image as the text flew by in one of the working kernels, just for comparison. (The 4.2.0 I got through regular Ubuntu signed repos, but this 4.2.8 was from http://kernel.ubuntu.com/~kernel-ppa/mainline/ and has no corresponding "linux-signed-image".)
(In reply to Navin from comment #13)
> Instead of dmesg | grep -i blocks
>
> you can add
>
> dmesg | grep -i achi
I changed this to "ahci" – was that right?

Do you have the ahci modules present in your initramfs? Mine returns the following:

1) lsinitramfs /boot/initrd.img-4.7.0-rc1+ | grep -i ahci
lib/modules/4.7.0-rc1+/kernel/drivers/ata/ahci_platform.ko
lib/modules/4.7.0-rc1+/kernel/drivers/ata/ahci.ko
lib/modules/4.7.0-rc1+/kernel/drivers/ata/libahci.ko
lib/modules/4.7.0-rc1+/kernel/drivers/ata/libahci_platform.ko
lib/modules/4.7.0-rc1+/kernel/drivers/ata/acard-ahci.ko

2) If the above shows that the modules are present, confirm from your initramfs that they are loaded with grep -i ahci /proc/modules:

grep -i ahci /proc/modules
ahci 36864 2 - Live 0xffffffffc0044000
libahci 32768 1 ahci, Live 0xffffffffc0015000

Please post the output of (1) and (2) in your case.

3) If (1) and (2) work as described, i.e. the modules are present and loaded, then maybe it is something related to IRQs, as described in https://bugzilla.kernel.org/show_bug.cgi?id=111211, where comment 7 says "It works with 'pci=routeirq' parameter" on the kernel command line.
I asked for this because your 4.2 dmesg showed something like

[ 0.716610] ahci 0000:00:1f.2: version 3.0
[ 0.716620] ahci 0000:00:1f.2: can't find IRQ for PCI INT B; probably buggy MP table

re: 1)
$ lsinitramfs /boot/initrd.img-4.4.0-22-generic |grep -i ahci
lib/modules/4.4.0-22-generic/kernel/drivers/ata/ahci.ko
lib/modules/4.4.0-22-generic/kernel/drivers/ata/libahci.ko
lib/modules/4.4.0-22-generic/kernel/drivers/ata/acard-ahci.ko
lib/modules/4.4.0-22-generic/kernel/drivers/ata/libahci_platform.ko
lib/modules/4.4.0-22-generic/kernel/drivers/ata/ahci_platform.ko
$ lsinitramfs /boot/initrd.img-4.6.0-040600rc6-generic |grep -i ahci
lib/modules/4.6.0-040600rc6-generic/kernel/drivers/ata/ahci.ko
lib/modules/4.6.0-040600rc6-generic/kernel/drivers/ata/libahci.ko
lib/modules/4.6.0-040600rc6-generic/kernel/drivers/ata/acard-ahci.ko
lib/modules/4.6.0-040600rc6-generic/kernel/drivers/ata/libahci_platform.ko
lib/modules/4.6.0-040600rc6-generic/kernel/drivers/ata/ahci_platform.ko
$ lsinitramfs /boot/initrd.img-4.2.8-040208-generic |grep -i ahci
lib/modules/4.2.8-040208-generic/kernel/drivers/ata/ahci.ko
lib/modules/4.2.8-040208-generic/kernel/drivers/ata/libahci.ko
lib/modules/4.2.8-040208-generic/kernel/drivers/ata/acard-ahci.ko
lib/modules/4.2.8-040208-generic/kernel/drivers/ata/libahci_platform.ko
lib/modules/4.2.8-040208-generic/kernel/drivers/ata/ahci_platform.ko

1) First, why do these lines get printed in 4.2

[ 0.716610] ahci 0000:00:1f.2: version 3.0
[ 0.716620] ahci 0000:00:1f.2: can't find IRQ for PCI INT B; probably buggy MP table

whereas in 4.4 and 4.6 you don't even get to see the "version 3.0"? In your initramfs shell, you should check whether ahci is actually loaded in /proc/modules.

2) One more thing: I see that in 4.2 you have this in dmesg:

libata version 3.00 loaded.

I hope you get that message in 4.6 as well.
You can find that out in your initramfs scripts by running

dmesg | grep -i libata

That should tell you whether your modules are not loaded or not good, because of some changes or for some other reason like corruption. At the end of the initramfs script you can run

modprobe ahci

and see what this prints:

lsmod | grep ahci

That should tell you the reason why it is not printing those messages in dmesg.

Created attachment 219041 [details]
phone-image of initramfs-script w/libata dmesg's, modprobe ahci, 4.6
It's the same for 4.4, and appending pci=routeirq had no effect :(
What command can I use instead of lsmod after the modprobe? (is grep -i ahci /proc/modules equivalent? does initramfs modprobe accept --verbose?)
Yes, grep -i ahci /proc/modules would work. Running modprobe ahci should be fine. Then run grep -i ahci /proc/modules again. dmesg | tail should show the last 10 lines, including any error: if an error is thrown, it should be shown by dmesg | tail; otherwise the module should show up as loaded. At least from your phone screenshots it appears that the failure is in this part.

Created attachment 219071 [details]
phone-image initramfs "grep ahci /proc/modules" 4.6 where ahci in etc modules
Here I had ahci and libahci in /etc/initramfs-tools/modules, and 4.6 prints the same from "grep ahci /proc/modules" before and after the modprobe (and still not booting). Nothing interesting I can see from dmesg|tail.
Created attachment 219081 [details]
phone-image initramfs "grep ahci /proc/modules" 4.6 withOut ahci in etc modules
As 219071, but without ahci\nlibahci in /etc/initramfs-tools/modules – here the first grep gives no ahci, but after a modprobe it is in /proc/modules.
("lsinitramfs /boot/initrd.img-4.6.0-040600rc6-generic |grep -i ahci"
is identical whether ahci is in /etc/initramfs-tools/modules)
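The before/after check around modprobe can be written so that it matches the module name exactly instead of by substring. This is a sketch under assumptions: `module_loaded` and `report_module` are made-up names, and it relies on awk being available in the initramfs, which this thread has not confirmed (plain grep on /proc/modules was used above).

```shell
#!/bin/sh
# Succeed if the module named $1 appears in the first column of a file in
# /proc/modules format ($2, defaulting to /proc/modules itself).
module_loaded() {
    modules_file=${2:-/proc/modules}
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' "$modules_file"
}

# Print a one-line status for a module, for use before and after modprobe.
report_module() {
    if module_loaded "$1" "${2:-/proc/modules}"; then
        echo "$1: loaded"
    else
        echo "$1: not loaded"
    fi
}

# Hypothetical usage in an init-premount script:
#   report_module ahci
#   modprobe ahci
#   report_module ahci
```

Unlike grep -i ahci, the exact match on the first column does not report ahci as loaded when only libahci is listed.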
I get a similar error, with the ahci "used by" field as 0, only in virtual machines like qemu. I assume yours is on bare hardware. Maybe the BIOS mode is set to IDE, and that is why AHCI is not detected. If you change from IDE to AHCI, does it work?

If it still doesn't work, can you try the latest 4.7? If that doesn't work either, you should contact the ahci.c maintainers, saying that you have the latest 4.7 mainline:

ahci.c - AHCI SATA support
Maintained by: Tejun Heo <tj@kernel.org>
Please ALWAYS copy linux-ide@vger.kernel.org on emails.

I guess the issue could be easily resolved by the authors.

Created attachment 219091 [details]
BIOS showing AHCI mode
and when 4.2.0 is booted I get
$ lspci |grep -i sata
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
$ dmesg |grep ahci
[ 0.719414] ahci 0000:00:1f.2: version 3.0
[ 0.719425] ahci 0000:00:1f.2: can't find IRQ for PCI INT B; probably buggy MP table
[ 0.735712] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 4 ports 6 Gbps 0x1 impl SATA mode
[ 0.737592] ahci 0000:00:1f.2: flags: 64bit ncq pm led clo only pio slum part deso sadm sds apst
[ 0.739623] scsi host0: ahci
[ 0.745463] scsi host1: ahci
[ 0.747706] scsi host2: ahci
[ 0.749662] scsi host3: ahci
so I'll try 4.7
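To compare a booting and a non-booting kernel, the relevant dmesg lines can be reduced to a one-line summary per saved log. This is a sketch only: `ahci_probe_summary` is a hypothetical helper, and the patterns are derived from the 4.2.0 output quoted above, so other kernel versions may word these messages differently.

```shell
#!/bin/sh
# Summarise whether a saved dmesg log shows the AHCI controller being probed,
# and how many SCSI hosts the ahci driver registered.
ahci_probe_summary() {
    log=$1
    # Matches lines such as "ahci 0000:00:1f.2: AHCI 0001.0300 32 slots ..."
    if grep -q 'ahci [0-9a-f:.]*: AHCI ' "$log"; then
        # grep -c exits non-zero on a count of 0, hence the "|| true".
        hosts=$(grep -c 'scsi host[0-9]*: ahci' "$log" || true)
        echo "ahci probed, $hosts scsi host(s)"
    else
        echo "ahci not probed"
    fi
}

# Hypothetical usage, with a log saved from the booted system:
#   dmesg > /tmp/dmesg-4.2.0.txt
#   ahci_probe_summary /tmp/dmesg-4.2.0.txt
```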
Created attachment 219181 [details]
phone-image initramfs 4.7
This was without ahci/libahci in /etc/initramfs-tools/modules. Doesn't seem too different from attachment 219081 except that /proc/scsi/scsi doesn't exist (though maybe it's created after modprobing; haven't tried 4.7 with ahci/libahci in /etc/initramfs-tools/modules yet).
Thanks so much for your help Navin; getting in touch with the ide/ahci list now.