Bug 60758
Summary: | module scsi_wait_scan not found kernel panic on boot | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | zakrzewskim |
Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
Status: | NEW --- | ||
Severity: | blocking | CC: | ajb, antun, jamie.bainbridge, jz.researcher, kernel.tw, kometch, linf, Sam.shahl37, thorsten.kohfeldt, toracat, vincent.mc.li |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 3.10.5-3.11.1 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Kernel panic screenshot; diff -Npru linux-3.10.4 linux-3.10.5 | ||
My /etc/fstab:

timeout 5
default 0

title CentOS (3.10.6-1.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.10.6-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de
initrd /initramfs-3.10.6-1.el6.elrepo.x86_64.img

title CentOS (3.10.4-1.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.10.4-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de
initrd /initramfs-3.10.4-1.el6.elrepo.x86_64.img

Sorry, that was my grub.conf. Here's fstab:

proc /proc proc defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs defaults 0 0
sysfs /sys sysfs defaults 0 0
/dev/md0 none swap sw 0 0
/dev/md1 /boot ext4 defaults 0 1
/dev/md2 / ext4 rw,discard,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0 0 1
/dev/md3 /var/lib/mysql ext4 rw,discard,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0 0 0
/dev/md4 /home ext4 rw,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0 0 0

Looks like a dependency failure. Could you post the two .config files?

If your system fails with "module scsi_wait_scan not found", then it could be an init script issue on your CentOS box.

The last kernel with scsi_wait_scan.ko is v3.5.7; the module was removed in v3.6. No init script for 3.10 should use that module.

Another point: in your config, CONFIG_SCSI_SCAN_ASYNC is not set. Try setting it to Y and see what happens.

The removal was discussed here: http://www.mail-archive.com/initramfs@vger.kernel.org/msg02645.html

(In reply to Jeff Zhou from comment #5)
> If your system fails by "module scsi_wait_scan not found", then it could be
> the init script issue in your CentOS box.
>
> The last kernel with scsi_wait_scan.ko is v3.5.7, it has been removed ever
> since v3.6. Any init script for 3.10 should not use that module.
>
>
> Another point is in your config, the CONFIG_SCSI_SCAN_ASYNC is not set, try
> to turn it into Y and see what's happening.

Jeff,

For the fuller picture please see -- http://elrepo.org/bugs/view.php?id=401

This non-booting issue only occurs with one system. The reporter has other systems which do boot correctly using the same kernel(s).

As was explained in the referenced bug report (note 3235), the mention of "module scsi_wait_scan not found" is a red herring.

Note the following section from the 3.10.10 drivers/scsi/Kconfig file --

[quote]
config SCSI_SCAN_ASYNC
    bool "Asynchronous SCSI scanning"
    depends on SCSI
    help
      The SCSI subsystem can probe for devices while the rest of the
      system continues booting, and even probe devices on different
      busses in parallel, leading to a significant speed-up.

      If you have built SCSI as modules, enabling this option can
      be a problem as the devices may not have been found by the
      time your system expects them to have been. You can load the
      scsi_wait_scan module to ensure that all scans have completed.
      If you build your SCSI drivers into the kernel, then everything
      will work fine if you say Y here.

      You can override this choice by specifying "scsi_mod.scan=sync"
      or async on the kernel's command line.
[/quote]

It still makes a reference to the scsi_wait_scan module and advises against setting SCSI_SCAN_ASYNC=y when SCSI drivers have been built as modules.

It is unnecessary to build a new kernel to test your last point. Just appending "scsi_mod.scan=async" to the kernel boot line will be sufficient.

Perhaps the reporter will test with that and then report back?

Alan / burakkucat.
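The suggestion above, adding the parameter to the kernel boot line instead of rebuilding the kernel, could be applied to a legacy GRUB config roughly as follows. This is only a sketch: `append_kernel_arg` is a hypothetical helper name, and it assumes every boot entry has a single-line `kernel` directive, as in the grub.conf posted earlier.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB or CentOS): read a legacy GRUB
# config on stdin and append PARAM to every "kernel" line that does not
# already carry it. The result is written to stdout; nothing is edited
# in place.
# Usage: append_kernel_arg PARAM < grub.conf > grub.conf.new
append_kernel_arg() {
    awk -v arg="$1" '
        $1 == "kernel" {
            present = 0
            # Look for an exact match among the existing arguments.
            for (i = 2; i <= NF; i++) if ($i == arg) present = 1
            # Appending to $0 keeps the original indentation intact.
            if (!present) $0 = $0 " " arg
        }
        { print }
    '
}
```

One possible use, assuming the config lives at /boot/grub/grub.conf, is `append_kernel_arg scsi_mod.scan=async < /boot/grub/grub.conf > /tmp/grub.conf.new`, followed by a manual review before copying the file back. The helper does not remove a conflicting `scsi_mod.scan=sync` that may already be on the line.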
Yes, I can test ;) I will booting with these options: timeout 0 default 0 title CentOS (3.10.10-1.el6.elrepo.x86_64) root (hd0,1) kernel /vmlinuz-3.10.10-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=sync initrd /initramfs-3.10.10-1.el6.elrepo.x86_64.img Made a mistake, so once again: timeout 0 default 0 title CentOS (3.10.10-1.el6.elrepo.x86_64) root (hd0,1) kernel /vmlinuz-3.10.10-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=async initrd /initramfs-3.10.10-1.el6.elrepo.x86_64.img Thanks. I am a bit curious about the description in Kconfig, since the scsi_wait_scan.ko was built from scsi_wait_scan.c, which was removed in v.3.6. How to refer a non-exist module, as described in the section of "config SCSI_SCAN_ASYNC" From v.3.5.7 to v.3.6.1, there is a change in source code, but it seems the documentation in Kconfig has not been updated. (In reply to Alan Bartlett from comment #7) > (In reply to Jeff Zhou from comment #5) > > If your system fails by "module scsi_wait_scan not found", then it could be > > the init script issue in your CentOS box. > > > > The last kernel with scsi_wait_scan.ko is v3.5.7, it has been removed ever > > since v3.6. Any init script for 3.10 should not use that module. > > > > > > Another point is in your config, the CONFIG_SCSI_SCAN_ASYNC is not set, try > > to turn it into Y and see what's happening. > > Jeff, > > For the fuller picture please see -- > > http://elrepo.org/bugs/view.php?id=401 > > This non-booting issue only occurs with one system. The reporter has other > systems which do boot correctly using the same kernel(s). > > As was explained in the referenced bug report (note 3235), the mention of > "module scsi_wait_scan not found" is a red-herring. 
> > Note the following section from the 3.10.10 drivers/scsi/Kconfig file -- > > [quote] > config SCSI_SCAN_ASYNC > bool "Asynchronous SCSI scanning" > depends on SCSI > help > The SCSI subsystem can probe for devices while the rest of the > system continues booting, and even probe devices on different > busses in parallel, leading to a significant speed-up. > > If you have built SCSI as modules, enabling this option can > be a problem as the devices may not have been found by the > time your system expects them to have been. You can load the > scsi_wait_scan module to ensure that all scans have completed. > If you build your SCSI drivers into the kernel, then everything > will work fine if you say Y here. > > You can override this choice by specifying "scsi_mod.scan=sync" > or async on the kernel's command line. > [/quote] > > It still makes a reference to the scsi_wait_scan module and advises against > setting SCSI_SCAN_ASYNC=y when scsi drivers have been built as modules. > > It is unnecessary to build a new kernel to test, as per your last point. > Just appending "scsi_mod.scan=async" to the kernel boot line will be > sufficient. > > Perhaps the reporter will test with that and then report back? > > Alan / burakkucat. (In reply to zakrzewskim from comment #10) > Made a mistake, so once again: > > timeout 0 > default 0 > > title CentOS (3.10.10-1.el6.elrepo.x86_64) > root (hd0,1) > kernel /vmlinuz-3.10.10-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS > rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 > LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=async > initrd /initramfs-3.10.10-1.el6.elrepo.x86_64.img In thread http://elrepo.org/bugs/view.php?id=401 says 3.10.4-1 boots fine, would you run lsinitramfs to see if scsi_scan_wait module is there? From changelog 3.10.4 to 3.10.5, I did not see any modifications to that. lsinitramfs -bash: lsinitramfs: command not found What do I need to install ? 
The server does not boot with kernel 3.10.10-1.el6.elrepo.x86_64 :/

(In reply to zakrzewskim from comment #13)
> lsinitramfs
> -bash: lsinitramfs: command not found
>
> What do I need to install ?

You can use lsinitrd, which is included in the dracut package.

For completeness (so that we cover 'every angle') I have built a version of our kernel-ml-3.10.10 package (64-bit) with CONFIG_SCSI_SCAN_ASYNC=y, as Jeff has suggested. It is available to download from -- http://elrepo.org/people/ajb/tmp/

However, I do not expect that it will make any difference.

It doesn't work either. I don't know what's wrong. Only 3.10.4-1 works fine and I need to stick with it :/

(In reply to zakrzewskim from comment #17)
> It doesn't work too. I don't know what's wrong. Only 3.10.4-1 works fine and
> I need to stick with it :/

[1] May I ask whether you keep a separate directory for each kernel source version when building and installing, or unpack different versions into the same folder to save space when upgrading?

[2] Could you post a more complete boot log?

1. All of them are separate. I just install kernel-ml via yum.
2. I can only show such a log from kernel 3.10.4-1. I don't have KVM access to see what's going on. Since this is a production machine inside a datacenter, I can only rent a KVM for 2 hours.
3.
I found another bug - server is freezing and load is getting higher than 100: BUG: unable to handle kernel paging request at 0000010600000032 IP: [<ffffffff811dcb85>] SyS_epoll_ctl+0x145/0x420 PGD 0 Oops: 0000 [#1] SMP Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_iprange iptable_filter ip_tables netconsole configfs nct6775 hwmon_vid ipv6 cpufreq_ondemand ppdev iTCO_wdt iTCO_vendor_support shpchp coretemp hwmon acpi_cpufreq freq_table mperf kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr i2c_i801 parport_pc parport r8169 mii sg lpc_ich xhci_hcd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache raid1 sd_mod crc_t10dif mxm_wmi video ahci libahci wmi dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 3133 Comm: nginx Not tainted 3.10.4-1.el6.elrepo.x86_64 #1 Hardware name: MSI MS-7816/H87-G43 (MS-7816), BIOS V2.3 06/07/2013 task: ffff8807f18c0ac0 ti: ffff880753374000 task.ti: ffff880753374000 RIP: 0010:[<ffffffff811dcb85>] [<ffffffff811dcb85>] SyS_epoll_ctl+0x145/0x420 RSP: 0018:ffff880753375f18 EFLAGS: 00010202 RAX: 0000010600000002 RBX: ffff8807f23125c0 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffff880650213bc0 RDI: ffff88076704b808 RBP: ffff880753375f78 R08: 0000000000000002 R09: 0101010101010101 R10: 00007fff52ea2220 R11: 0000000000000202 R12: ffff880773f15c80 R13: 0000000000000001 R14: 0000000000000045 R15: ffff88076704b800 FS: 00007f8451c707c0(0000) GS:ffff88081ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000010600000032 CR3: 0000000771b3b000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff8807f445c648 0000000152ea22b0 ffff88076704b808 fffffff7810dfe46 11b8144080000001 
ffff880700000000 0000000001838b20 0000000000000001 0000000011b81440 0000000011bd9288 0000000011b6f8c0 00007fff52ea22b0 Call Trace: [<ffffffff815fd319>] system_call_fastpath+0x16/0x1b Code: 00 00 c7 45 ac 00 00 00 00 83 f8 01 0f 86 4d 01 00 00 49 8d 47 08 48 89 c7 48 89 45 b0 e8 44 4c 41 00 49 8b 47 70 48 85 c0 74 1f <48> 3b 58 30 48 89 c6 77 0d 72 68 44 89 f2 2b 50 38 83 fa 00 7e RIP [<ffffffff811dcb85>] SyS_epoll_ctl+0x145/0x420 RSP <ffff880753375f18> CR2: 0000010600000032 ---[ end trace 3c47cb214b7f3743 ]--- Yes, there could be extra bugs. Since this bug is about "scsi_wait_scan not found" in v3.10.5 and above, we can try to work on this one first. As suggested by Alan, the lsinitrd is available, it can be used to list out the modules of a initramfs file. Would you like to run lsinitrd for a working version (v.3.10.4-1) and the first version with issue (v.3.10.5-1) to show the modules in the initramfs file? lsinitrd initrd_v.xxx.img > v.xxx.lst Here you are: http://www.upemax.user.icpnet.pl/3.10.4-1.el6.lst (In reply to zakrzewskim from comment #21) > Here you are: http://www.upemax.user.icpnet.pl/3.10.4-1.el6.lst Yes then we see the scsi_scan_wait is not in 3.10.4 either, otherwise there will be a line :"drivers/scsi/scsi_wait_scan.ko". The rc.sysinit file in your system might need adjustment to remove the reference to scsi_wait_scan module. But that might not be the root cause, since 3.10.4 without scsi_wait_scan works fine under this script. There could be a regression between 3.10.4 to 3.10.5 cause your system hang, might be or might not be scsi problem. To find the root cause and fix it, probably need to roll back between 3.10.5 to 3.10.4, build and test the kernel for booting, git bisect could be helpful. Then we can check the details to see why the commit crashing the specific machine. Ok. Thank you. Maybe you need to get such board - H87-G43 to test it ? 
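The lsinitrd comparison requested above (one saved listing per kernel version) could be automated along these lines. This is a sketch under assumptions: the function names are made up for illustration, the listings are plain `lsinitrd image > file.lst` output with the path in the last column, and modules are uncompressed `.ko` files.

```shell
#!/bin/sh
# Print the sorted, unique module file names (*.ko) found in a saved
# lsinitrd listing. Assumes the file path is the last field on each line.
modules_in_listing() {
    awk '$NF ~ /\.ko$/ { n = split($NF, parts, "/"); print parts[n] }' "$1" |
        LC_ALL=C sort -u
}

# Compare two saved listings: "- name" marks a module found only in the
# first listing, "+ name" a module found only in the second.
compare_listings() {
    cmp_old=$(mktemp) || return 1
    cmp_new=$(mktemp) || { rm -f "$cmp_old"; return 1; }
    modules_in_listing "$1" > "$cmp_old"
    modules_in_listing "$2" > "$cmp_new"
    LC_ALL=C comm -23 "$cmp_old" "$cmp_new" | sed 's/^/- /'
    LC_ALL=C comm -13 "$cmp_old" "$cmp_new" | sed 's/^/+ /'
    rm -f "$cmp_old" "$cmp_new"
}
```

With two listings saved as, say, 3.10.4.lst and 3.10.5.lst (hypothetical file names), `compare_listings 3.10.4.lst 3.10.5.lst` would show which drivers were packed into one initramfs but not the other. Only file names are compared, so the kernel version embedded in the paths does not produce spurious differences.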
(1) I have critically examined the configuration files used (from 3.10 to 3.10.10), paying particular attention to 3.10.4 v 3.10.5. There is nothing untoward in the configuration that can account for this problem.

(2) I have performed a diff (diff -Npru) between the sources of 3.10.4 and 3.10.5. Subsequently checking the output for scsi references does not show anything obvious, to me. The output is attached, as the file diff-3.10.4-to-3.10.5.txt

(3) Grep'ing the standard RHEL 6 /etc/rc.d/rc.sysinit file shows something interesting on line 165 but, once again, I do not think it is relevant.

[quote]
[Duo2 ~]$ grep -n -C 10 scsi /etc/rc.d/rc.sysinit
155-# Configure kernel parameters
156-update_boot_stage RCkernelparam
157-apply_sysctl
158-
159-# Set the hostname.
160-update_boot_stage RChostname
161-action $"Setting hostname ${HOSTNAME}: " hostname ${HOSTNAME}
162-[ -n "${NISDOMAIN}" ] && domainname ${NISDOMAIN}
163-
164-# Sync waiting for storage.
165:{ rmmod scsi_wait_scan ; modprobe scsi_wait_scan ; rmmod scsi_wait_scan ; } >/dev/null 2>&1
166-
167-# Device mapper & related initialization
168-if ! __fgrep "device-mapper" /proc/devices >/dev/null 2>&1 ; then
169-       modprobe dm-mod >/dev/null 2>&1
170-fi
171-
172-if [ -f /etc/crypttab ]; then
173-    init_crypto 0
174-fi
175-
[/quote]

Created attachment 107395 [details]
diff -Npru linux-3.10.4 linux-3.10.5
(In reply to zakrzewskim from comment #23)
> Ok. Thank you.
>
> Maybe you need to get such board - H87-G43 to test it ?

I would like to help if I can get such a board. There are also recent reports of Linux boot issues with this H87-G43 board: https://forum-en.msi.com/index.php?topic=170643.0

There are lots of check-ins for the ACPI and DRM drivers in 3.10.5: https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.10.5

With the machine in hand, it would be much easier to revert the check-ins one by one to see what's going on.

Hetzner is using these boards inside their EX40 or EX40-SSD dedicated servers: http://wiki.hetzner.de/index.php/Wake_On_LAN/en

One more thing - these boards do not boot kernel-lt-3.0.94-1 or the latest official CentOS kernel either!

Kernel 3.1.11-2 does not boot either :/

(In reply to Alan Bartlett from comment #24)
> (3) Grep'ing the standard RHEL 6 /etc/rc.d/rc.sysinit file shows something
> interesting on line 165 but, once again, I do not think it is relevant.

Nice find. I will try booting with this line commented out ;)

The standard CentOS 6 kernel has this module:

/lib/modules/2.6.32-358.18.1.el6.x86_64/kernel/drivers/scsi/scsi_wait_scan.ko
/lib/modules/2.6.32-358.el6.x86_64/kernel/drivers/scsi/scsi_wait_scan.ko

That's why it's referring to it.
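Commenting out the rc.sysinit line found above could be done with a small filter like the one below. It is a sketch, not a tested fix: `comment_out_wait_scan` is a hypothetical name, it matches the `modprobe scsi_wait_scan` text rather than a fixed line number (line 165 may differ between releases), and at best it silences the userland message.

```shell
#!/bin/sh
# Hypothetical filter: read an init script on stdin and prefix "#" to any
# line that runs "modprobe scsi_wait_scan", leaving existing comments and
# all other lines untouched. Output goes to stdout.
comment_out_wait_scan() {
    awk '
        /^[ \t]*#/ { print; next }                          # already a comment
        /modprobe[ \t]+scsi_wait_scan/ { print "#" $0; next }
        { print }
    '
}
```

A cautious way to use it would be `comment_out_wait_scan < /etc/rc.d/rc.sysinit > /tmp/rc.sysinit.new`, then diffing the two files before replacing the original. Running the filter twice gives the same result as running it once.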
It seems I'm not the only one with this problem: http://www.gossamer-threads.com/lists/linux/kernel/1747344 Maybe this will help: lspci 00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06) 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) 00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06) 00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04) 00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04) 00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 04) 00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d4) 00:1c.1 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #2 (rev d4) 00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4) 00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 04) 00:1f.0 ISA bridge: Intel Corporation H87 Express Chipset LPC Controller (rev 04) 00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 04) 00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04) 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06) 03:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03) Tested again. Here are the results: http://files.tinypic.pl/i/00449/clv7pa58vgxk.jpg http://files.tinypic.pl/i/00449/nawfo9bwhyn8.jpg Please help the current kernel is just unstable ! 
Please note that was with:

#{ rmmod scsi_wait_scan ; modprobe scsi_wait_scan ; rmmod scsi_wait_scan ; } >/dev/null 2>&1

and

title CentOS (3.11.1-2.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.11.1-2.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=async
initrd /initramfs-3.11.1-2.el6.elrepo.x86_64.img

(In reply to zakrzewskim from comment #34)
> Tested again. Here are the results:
>
> http://files.tinypic.pl/i/00449/clv7pa58vgxk.jpg
> http://files.tinypic.pl/i/00449/nawfo9bwhyn8.jpg
>
> Please help the current kernel is just unstable !

In the second of the images, above, there is a suggestion to add "rdshell" to the kernel command line. Have you tried it? Perhaps it will allow you to gather some more information about the problem, which might assist Jeff's investigations.

I forgot to add it. I will try again soon.

Hi: I had the same problem with 3.10.16 on CentOS 6.4 x86_64 too... I don't know whether it's a mistake in my config or whether other people see it too.

I reinstalled my system and installed a new kernel rpm with 3.17, and it seems fine. I think it might have been a mistake on my side.

I checked my boot.log on CentOS 6.4 x86_64 with 3.10.17 today. It seems

FATAL: Module scsi_wait_scan not found

still appears, but at least I can boot into the system.

Please test (In reply to newbie from comment #40)
> I check my boot.log on CentOS 6.4 x86_64 with 3.10.17 today,
> seem
>
> FATAL: Module scsi_wait_scan not found
>
> still appear, but can boot in to the system at least.

Please test kernel 3.11.6 too.

(In reply to zakrzewskim from comment #41)
> Please test (In reply to newbie from comment #40)
> > I check my boot.log on CentOS 6.4 x86_64 with 3.10.17 today,
> > seem
> >
> > FATAL: Module scsi_wait_scan not found
> >
> > still appear, but can boot in to the system at least.
>
> Please test kernel 3.11.6 too.

I have to solve another problem with my 3.10.17 config first ...

3.10.18 still shows one line

FATAL: Module scsi_wait_scan not found

but boots OK.

(In reply to newbie from comment #43)
> 3.10.18 still show one line
>
> FATAL: Module scsi_wait_scan not found
>
> but boot ok.

That is not a message output by the kernel but is the result of line number 165 in the "userland" file, /etc/rc.d/rc.sysinit, of RHEL 6 (and its clones). If you do not like seeing the message, just "comment out" line number 165 in your /etc/rc.d/rc.sysinit file.

> If you do not like seeing the message,
> just "comment out" line number 165 in your /etc/rc.d/rc.sysinit file.

That doesn't seem to help; the message still appears and I don't know why.

But the real problem is this:
I use two 200G SATA disks (sda/sdb) in a software RAID1 (md).
Three days ago one disk failed and dropped out of the array,
but I only discovered it today.
After I re-added it with mdadm and rebooted,
the bad disk showed read errors constantly
and was unusable, so I booted from the other, good disk, then...

FATAL: Module scsi_wait_scan not found

then...
the message loops~

With 3.10.17 and 3.10.18 the situation is the same: I can't boot into the system.
So finally I booted the CentOS 2.6 kernel to get into the system and save my data,
and have shut the machine down for now... ><
What about kernel 3.12 ?

3.12 doesn't boot either. The same reason...

Oddly, I can boot into 3.12.0-rc4 but not 3.13.0-rc3; neither of them has scsi_wait_scan.ko compiled. I see the error message "FATAL: Module scsi_wait_scan not found" for both 3.12.0-rc4 and 3.13.0-rc3, but it only appears once for 3.12.0-rc4. For 3.13.0-rc3 it is repeated many times and the kernel finally panics:

FATAL: Module scsi_wait_scan not found.
FATAL: Module scsi_wait_scan not found.
FATAL: Module scsi_wait_scan not found.
FATAL: Module scsi_wait_scan not found.
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
CPU: 3 PID: 1 Comm: init Tainted: GF 3.13.0-rc3 #27
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 0000000000000001 ffff8802108f3d98 ffffffff8156c50b 000000000000fffe
 ffffffff817e0d00 ffff8802108f3e18 ffffffff8156c298 ffffffff00000010
 ffff8802108f3e28 ffff8802108f3dc8 ffff8802108f17b8 ffff8802108f3dd8
Call Trace:
 [<ffffffff8156c50b>] dump_stack+0x49/0x5e
 [<ffffffff8156c298>] panic+0xbb/0x1d5
 [<ffffffff8104e74b>] find_new_reaper+0x17b/0x180
 [<ffffffff8104f8d5>] forget_original_parent+0x45/0x1c0
 [<ffffffff8111e110>] ? perf_cgroup_switch+0x180/0x180
 [<ffffffff8104fa67>] exit_notify+0x17/0x130
 [<ffffffff8104fd6e>] do_exit+0x1ee/0x480
 [<ffffffff81050051>] do_group_exit+0x51/0xc0
 [<ffffffff810500d7>] SyS_exit_group+0x17/0x20
 [<ffffffff81578dd2>] system_call_fastpath+0x16/0x1b

I tried commenting out the scsi_wait_scan line in rc.sysinit and booting with the kernel parameter 'scsi_mod.scan=async'; neither helped.

Can someone test it on H87-G43 board ?

Hi all,

I got exactly the same problem on a CentOS 6.4 64-bit KVM guest while trying to update my kernel to mainline 3.13-rc5. The console repeatedly prints "FATAL: Module scsi_wait_scan not found." and the kernel finally panics. The whole output is mainly the same as that pasted by Vincent Li.

After following this bugzilla I suspected that dracut does not match the kernel: kernel 3.6 and later have dropped the scsi_wait_scan module, but CentOS 6.4's dracut is hardcoded to add the instruction "modprobe scsi_wait_scan" to the /init script in the initramfs image. So I tried the following approaches for confirmation:

1. Update the kernel to 3.5 (which still has the scsi_wait_scan module): it boots fine.
2. Update dracut to the latest version in the git tree, since it has removed the redundant "modprobe scsi_wait_scan" when building for 3.6 and later kernels. It doesn't work; it reports that the root device could not be found, as pasted in step 3.
3. Decompress the initramfs and modify the /init script by removing the line "modprobe scsi_wait_scan". That doesn't work either:

(snips)
dracut Warning: No root device "block:/dev/disk/by-uuid/cedcbd9c-32eb-4a3f-9dad-ae5fc560642a" found
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
dracut Warning: Signal caught!
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
CPU: 1 PID: 1 Comm: init Tainted: GF 3.13.0-rc5 #6
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 0000000000000001 ffff88005dbf9d98 ffffffff8156c92b 000000000000fffe
 ffffffff817dd350 ffff88005dbf9e18 ffffffff8156c6b8 ffffffff00000010
 ffff88005dbf9e28 ffff88005dbf9dc8 ffff88005dbdf7b8 ffff88005dbf9dd8
Call Trace:
 [<ffffffff8156c92b>] dump_stack+0x49/0x5e
 [<ffffffff8156c6b8>] panic+0xbb/0x1d5
 [<ffffffff8104e75b>] find_new_reaper+0x17b/0x180
 [<ffffffff8104f8e5>] forget_original_parent+0x45/0x1c0
 [<ffffffff8111e1f0>] ?
perf_cgroup_switch+0x180/0x180 [<ffffffff8104fa77>] exit_notify+0x17/0x130 [<ffffffff8104fd7e>] do_exit+0x1ee/0x480 [<ffffffff81050061>] do_group_exit+0x51/0xc0 [<ffffffff810500e7>] SyS_exit_group+0x17/0x20 [<ffffffff81579212>] system_call_fastpath+0x16/0x1b ------------------------------------------------------------------------------- From 2 and 3 it seems that scsi_wait_scan is necessary in some cases, like H87 board and my KVM case. Any idea, Can anyone tell me how to move on? Thanks in advance. What king of KVM do you use ? Is this Proxmox ? (In reply to zakrzewskim from comment #51) > What king of KVM do you use ? Is this Proxmox ? Hi, it's qemu based on KVM. And the host is fedora20 64bit. I install CentOS6.4 via a liveCD. Here is my xml file for guest configuration, maybe it's helpful for reproduction. [root@localhost home]# cat /etc/libvirt/qemu/CentOS6.4.xml <!-- WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: virsh edit CentOS6.4 or other application using the libvirt API. 
--> <domain type='kvm'> <name>CentOS6.4</name> <uuid>52eb93be-64ad-45e8-9e19-5bac87e57bca</uuid> <memory unit='KiB'>1572864</memory> <currentMemory unit='KiB'>1572864</currentMemory> <vcpu placement='static'>2</vcpu> <os> <type arch='x86_64' machine='pc-i440fx-1.6'>hvm</type> <boot dev='cdrom'/> <boot dev='hd'/> </os> <features> <acpi/> <apic/> <pae/> </features> <clock offset='utc'> <timer name='rtc' tickpolicy='catchup'/> <timer name='pit' tickpolicy='delay'/> <timer name='hpet' present='no'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/bin/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='qcow2'/> <source file='/var/lib/libvirt/images/CentOS6.4.img'/> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </disk> <disk type='block' device='cdrom'> <driver name='qemu' type='raw'/> <target dev='hda' bus='ide'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> <controller type='usb' index='0' model='ich9-ehci1'> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/> </controller> <controller type='usb' index='0' model='ich9-uhci1'> <master startport='0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/> </controller> <controller type='usb' index='0' model='ich9-uhci2'> <master startport='2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/> </controller> <controller type='usb' index='0' model='ich9-uhci3'> <master startport='4'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/> </controller> <controller type='pci' index='0' model='pci-root'/> <controller type='ide' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> </controller> <controller type='virtio-serial' index='0'> <address type='pci' domain='0x0000' 
bus='0x00' slot='0x06' function='0x0'/> </controller> <interface type='network'> <mac address='52:54:00:cb:94:2c'/> <source network='default'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target type='serial' port='0'/> </console> <channel type='spicevmc'> <target type='virtio' name='com.redhat.spice.0'/> <address type='virtio-serial' controller='0' bus='0' port='1'/> </channel> <input type='tablet' bus='usb'/> <input type='mouse' bus='ps2'/> <graphics type='spice' autoport='yes'/> <sound model='ich6'> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </sound> <video> <model type='qxl' ram='65536' vram='65536' heads='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </video> <memballoon model='virtio'> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> </memballoon> </devices> </domain> hello, After diving into it I found that in my qemu-VM case "FATAL: Module scsi_wait_scan" is not the root cause, because even I backport the scsi_wait_scan module, it still can't boot, but reports "dracut Warning: No root device "block:/dev/disk/by-uuid/cedcbd9c-32eb-4a3f-9dad-ae5fc560642a" found". I bisect through the 3.13-rc6 tree of upstream, and locates this commit broke my initramfs, [root@CentOS6 linux]# git bisect bad 1cf7e9c68fe84248174e998922b39e508375e7c1 is the first bad commit commit 1cf7e9c68fe84248174e998922b39e508375e7c1 Author: Jens Axboe <axboe@kernel.dk> Date: Fri Nov 1 10:52:52 2013 -0600 virtio_blk: blk-mq support Switch virtio-blk from the dual support for old-style requests and bios to use the block-multiqueue. 
Acked-by: Asias He <asias@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de> though it's somthing about virtio driver(my guest uses virtio as the storage driver), looking into this commit it is mainly about C code changes, not module compiling or not. Also I have checked the modules compiled, in both cases(with and without this commit) we get virtio_blk.ko module. But the difference is that with this commit virtio_blk.ko isn't packed into the initramfs. However between both cases there is no environmental changes, with exactly the same config, same dracut, same gcc, everything...So I don't know why dracut doesn't pack the virtio_blk.ko into the initramfs, and more kidding I find that it packed the floppy.ko instead. (In my case as a workround we can compile virtio moduels into kernel or use other disk bus driver such as IDE or USB instead) But one thing I don't understand can someone tell me why, Will dracut look through the kernel tree(C codes) to find some useful information to pack the final initramfs? Ps. Related to the initial creative of this bugzilla, I guess it may also be caused by some storage drivers are not packed into initramfs. thanks (In reply to Lin Feng from comment #53) > though it's somthing about virtio driver(my guest uses virtio as the storage > driver), looking into this commit it is mainly about C code changes, not > module compiling or not. Also I have checked the modules compiled, in both > cases(with and without this commit) we get virtio_blk.ko module. But the > difference is that with this commit virtio_blk.ko isn't packed into the > initramfs. > > However between both cases there is no environmental changes, with exactly > the same config, same dracut, same gcc, everything...So I don't know why > dracut doesn't pack the virtio_blk.ko into the initramfs, and more kidding I > find that it packed the floppy.ko instead. 
> (In my case as a workround we can compile virtio moduels into kernel or use > other disk bus driver such as IDE or USB instead) > > But one thing I don't understand can someone tell me why, Will dracut look > through the kernel tree(C codes) to find some useful information to pack the > final initramfs? Thank you for this extensive analysis. I, too, was having the same problem on my KVM guest that uses virtio (host=RHEL 6.5 and guest=CentOS 6.5). Just to reconfirm your findings, I created initramfs with a '--add-drivers virtio_blk' option and the kernel booted just fine. It would indeed be great if we could find out why dracut fails to pick up some particular modules. I've had this problem on my CentOS 6 boxes on 3.10, 3.12, until I've cleared the .config files from the kernel source directory and you would need to have Virtio suppoer for block, PCI, net devices compiled in kernel (either a module or embedded). Good guide to follow is: http://www.linux-kvm.org/page/Virtio This has been tested on following kernels: kernel 3.10.27 kernel 3.10.28 kernel 3.12.8 kernel 3.12.9 kernel 3.13 Currently using the 3.12.8 kernel on one of my VMs : root@dev01 [~]# lsinitrd /boot/initrd-3.12.8.img | grep virt -rw-r--r-- 1 root root 28720 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/block/virtio_blk.ko -rw-r--r-- 1 root root 26688 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/scsi/virtio_scsi.ko drwxr-xr-x 2 root root 0 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio -rw-r--r-- 1 root root 12752 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio.ko -rw-r--r-- 1 root root 18304 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio_mmio.ko -rw-r--r-- 1 root root 20744 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio_pci.ko -rw-r--r-- 1 root root 20912 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio_ring.ko I also tried to introduced something to CentOS 6.5 and KVM on Ubuntu13.10, was the kernel compile the Kernel 3.13.2, but it 
does not boot and keeps printing "module scsi_wait_scan not found". The lsinitrd output below contains no block/virtio_blk.ko:

# lsinitrd /boot/initramfs-3.13.2.img | grep virt
drwxr-xr-x 2 root root 0 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio
-rw-r--r-- 1 root root 13216 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio/virtio.ko
-rw-r--r-- 1 root root 19944 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio/virtio_pci.ko
-rw-r--r-- 1 root root 19448 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio/virtio_ring.ko

Regenerating the initramfs with dracut did not help. Is there a way to deal with this?

I have filed a bug against dracut:

https://bugzilla.redhat.com/show_bug.cgi?id=1067669

which gives an explanation at least for kernel 3.13.

That kernel version introduces multi queue block i/o for virtio_blk, which in turn no longer calls blk_init_queue() but blk_mq_init_queue() instead.

Dracut matches against symbol blk_init_queue but not yet against symbol blk_mq_init_queue.

This should be fixed for Fedora and also for all RHEL derivatives ...

In the meantime there is this WORKAROUND:

1) Check whether virtio_blk is in the initrd:
   # for i in /boot/initramfs-* ; do echo $i: ; lsinitrd $i | grep virt ; done
2) If virtio_blk is missing in the relevant initrd, then
   # echo 'add_drivers+="virtio_blk"' >/etc/dracut.conf.d/force-vitio_blk-to-ensure-boot.conf
   (NOTE that the .conf extension is mandatory!)
3) Then rebuild the initrd.

Please verify/confirm (add comments to) that redhat/dracut bug/solution, so it gets the dracut maintainers' attention.

https://bugzilla.redhat.com/show_bug.cgi?id=1067669

(In reply to Thorsten Kohfeldt from comment #57)
> I have filed a bug against dracut:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1067669
>
> which gives an explanation at least for kernel 3.13.
>
> That kernel version introduces multi queue block i/o for virtio_blk, which
> in turn does not any more call blk_init_queue() but blk_mq_init_queue()
> instead.
>
> Dracut matches against symbol blk_init_queue but not yet against symbol
> blk_mq_init_queue.

Good, based on the bisect it seems that this is the root cause, thanks :)

This was resolved in dracut with http://git.kernel.org/cgit/boot/dracut/dracut.git/commit/?id=faa17f09218ed7e2ce4362cc2d9319f8d5b7a37f which was included in Red Hat's dracut-004-356.el6 package. This bug can be closed.

@jamie, I am facing the same error too. Looking for the solution, I tried to go to http://git.kernel.org/cgit/boot/dracut/dracut.git/commit/?id=faa17f09218ed7e2ce4362cc2d9319f8d5b7a37f but it seems that commit is bad or not available. Can you help me out here?

Looks like the repo is here now: https://github.com/dracutdevs/dracut/commit/faa17f09218ed7e2ce4362cc2d9319f8d5b7a37f

It's been over 10 years since this commit. I don't remember anything about it, sorry.

Thanks for the prompt response @Jamie.
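
For reference, the check-and-force workaround from comment #57 can be sketched as two small shell helpers. This is only a sketch under assumptions: the function names listing_has_virtio_blk and force_driver_conf_line are made up for illustration, the config file name in the usage comments is arbitrary (only the .conf extension matters, as noted above), and the add_drivers line is the one quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): succeeds if the
# file listing read from stdin, e.g. lsinitrd output, mentions virtio_blk.ko.
listing_has_virtio_blk() {
    grep -q 'virtio_blk\.ko'
}

# Hypothetical helper: print the dracut config line that forces the given
# driver into the initramfs, in the form used in comment #57.
force_driver_conf_line() {
    printf 'add_drivers+="%s"\n' "$1"
}

# Possible usage on an affected guest (run as root; the image path and the
# config file name are assumptions, adjust them to your system):
#
#   img=/boot/initramfs-$(uname -r).img
#   if ! lsinitrd "$img" | listing_has_virtio_blk; then
#       force_driver_conf_line virtio_blk > /etc/dracut.conf.d/virtio_blk.conf
#       dracut -f "$img" "$(uname -r)"
#   fi
```

Keeping the check separate from the rebuild means the listing test can be run against any initramfs image, including one for a kernel that is not currently booted, before anything on disk is changed.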
Created attachment 107218
Kernel panic screenshot

After upgrading the kernel from 3.10.4-1 to 3.10.5-1 or 3.10.6-1, the system refuses to boot and prints a long error message containing "module scsi_wait_scan not found", followed by a kernel panic. I'm on CentOS 6.4 64-bit.