Bug 60758 - module scsi_wait_scan not found kernel panic on boot
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: linux-scsi@vger.kernel.org
Reported: 2013-08-16 19:17 UTC by zakrzewskim
Modified: 2015-10-08 02:55 UTC

See Also:
Kernel Version: 3.10.5-3.11.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel panic screenshot (132.54 KB, image/jpeg)
2013-08-16 19:17 UTC, zakrzewskim
diff -Npru linux-3.10.4 linux-3.10.5 (159.94 KB, text/plain)
2013-09-03 15:44 UTC, Alan Bartlett

Description zakrzewskim 2013-08-16 19:17:18 UTC
Created attachment 107218
Kernel panic screenshot

After upgrading the kernel from 3.10.4-1 to 3.10.5-1 or 3.10.6-1, the system refuses to boot: it prints a long "module scsi_wait_scan not found" error message and then the kernel panics.

I'm on CentOS 6.4 64-bit.
Comment 1 zakrzewskim 2013-08-16 19:18:12 UTC
My /etc/fstab:

timeout 5
default 0

title CentOS (3.10.6-1.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.10.6-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de
initrd /initramfs-3.10.6-1.el6.elrepo.x86_64.img

title CentOS (3.10.4-1.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.10.4-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de
initrd /initramfs-3.10.4-1.el6.elrepo.x86_64.img
Comment 2 zakrzewskim 2013-08-16 23:32:02 UTC
Sorry that was my grub.conf.

Here's fstab:

proc /proc proc defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs defaults 0 0
sysfs /sys sysfs defaults 0 0
/dev/md0 none           swap  sw                                                                                                              0       0
/dev/md1 /boot          ext4  defaults                                                       0       1
/dev/md2 /              ext4  rw,discard,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0        0       1
/dev/md3 /var/lib/mysql ext4  rw,discard,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0        0       0
/dev/md4 /home          ext4  rw,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0                0       0
Comment 3 Jeff Zhou 2013-08-29 05:13:08 UTC
Looks like a dependency failure. Could you post the two .config files?
Comment 4 zakrzewskim 2013-08-30 07:35:25 UTC
Here you are: http://www.upemax.user.icpnet.pl/config-3.10.9-1.el6.elrepo.x86_64
Comment 5 Jeff Zhou 2013-08-31 03:39:01 UTC
If your system fails with "module scsi_wait_scan not found", then the problem could be in an init script on your CentOS box.

The last kernel with scsi_wait_scan.ko is v3.5.7; the module was removed in v3.6. No init script for 3.10 should use that module.


Another point: in your config, CONFIG_SCSI_SCAN_ASYNC is not set. Try setting it to Y and see what happens.
Comment 6 Jeff Zhou 2013-08-31 04:00:30 UTC
There was some discussion about this removal at the time:

http://www.mail-archive.com/initramfs@vger.kernel.org/msg02645.html
Comment 7 Alan Bartlett 2013-08-31 16:02:53 UTC
(In reply to Jeff Zhou from comment #5)
> If your system fails by "module scsi_wait_scan not found", then it could be
> the init script issue in your CentOS box.
> 
> The last kernel with scsi_wait_scan.ko is v3.5.7, it has been removed ever
> since v3.6. Any init script for 3.10 should not use that module.
> 
> 
> Another point is in your config, the CONFIG_SCSI_SCAN_ASYNC is not set, try
> to turn it into Y and see what's happening.

Jeff, 

For the fuller picture please see --

http://elrepo.org/bugs/view.php?id=401

This non-booting issue only occurs with one system. The reporter has other systems which do boot correctly using the same kernel(s).

As was explained in the referenced bug report (note 3235), the mention of "module scsi_wait_scan not found" is a red herring.

Note the following section from the 3.10.10 drivers/scsi/Kconfig file --

[quote]
config SCSI_SCAN_ASYNC
        bool "Asynchronous SCSI scanning"
        depends on SCSI
        help
          The SCSI subsystem can probe for devices while the rest of the
          system continues booting, and even probe devices on different
          busses in parallel, leading to a significant speed-up.

          If you have built SCSI as modules, enabling this option can
          be a problem as the devices may not have been found by the
          time your system expects them to have been.  You can load the
          scsi_wait_scan module to ensure that all scans have completed.
          If you build your SCSI drivers into the kernel, then everything
          will work fine if you say Y here.

          You can override this choice by specifying "scsi_mod.scan=sync"
          or async on the kernel's command line.
[/quote]

It still makes a reference to the scsi_wait_scan module and advises against setting SCSI_SCAN_ASYNC=y when SCSI drivers have been built as modules.

It is unnecessary to build a new kernel to test, as per your last point. Just appending "scsi_mod.scan=async" to the kernel boot line will be sufficient.

Perhaps the reporter will test with that and then report back?
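
For checking afterwards that the option really reached the kernel, a minimal sketch follows. It assumes the booted kernel exposes the parameter at /sys/module/scsi_mod/parameters/scan; the helper name scan_mode_from_cmdline is made up for illustration and is not part of any package.

```shell
#!/bin/sh
# Print the value given to scsi_mod.scan= in a kernel command line string.
# Prints nothing if the option is absent. Hypothetical helper name.
scan_mode_from_cmdline() {
    for word in $1; do
        case "$word" in
            scsi_mod.scan=*) printf '%s\n' "${word#scsi_mod.scan=}" ;;
        esac
    done
}

# Usage on a running system:
#   scan_mode_from_cmdline "$(cat /proc/cmdline)"
#   cat /sys/module/scsi_mod/parameters/scan   # assumed sysfs location
```

If the first command prints nothing, the option was not passed by the boot loader at all.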

Alan / burakkucat.
Comment 8 zakrzewskim 2013-08-31 16:49:42 UTC
Yes, I can test ;)
Comment 9 zakrzewskim 2013-08-31 17:01:53 UTC
I will boot with these options:

timeout 0
default 0

title CentOS (3.10.10-1.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.10.10-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=sync
initrd /initramfs-3.10.10-1.el6.elrepo.x86_64.img
Comment 10 zakrzewskim 2013-08-31 17:03:08 UTC
Made a mistake, so once again:

timeout 0
default 0

title CentOS (3.10.10-1.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.10.10-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=async
initrd /initramfs-3.10.10-1.el6.elrepo.x86_64.img
Comment 11 Jeff Zhou 2013-08-31 20:50:22 UTC
(In reply to Alan Bartlett from comment #7)

Thanks. I am a bit curious about the description in Kconfig, since scsi_wait_scan.ko was built from scsi_wait_scan.c, which was removed in v3.6. How can the help text of "config SCSI_SCAN_ASYNC" refer to a module that no longer exists?

Between v3.5.7 and v3.6.1 the source code changed, but it seems the documentation in Kconfig was not updated.
Comment 12 Jeff Zhou 2013-08-31 21:03:44 UTC
(In reply to zakrzewskim from comment #10)
> Made a mistake, so once again:
> 
> timeout 0
> default 0
> 
> title CentOS (3.10.10-1.el6.elrepo.x86_64)
> root (hd0,1)
> kernel /vmlinuz-3.10.10-1.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS
> rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16
> LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=async
> initrd /initramfs-3.10.10-1.el6.elrepo.x86_64.img

The thread at
http://elrepo.org/bugs/view.php?id=401

says 3.10.4-1 boots fine. Would you run lsinitramfs to see whether the scsi_wait_scan module is there? In the changelog from 3.10.4 to 3.10.5, I did not see any modifications to it.
Comment 13 zakrzewskim 2013-08-31 21:26:05 UTC
lsinitramfs
-bash: lsinitramfs: command not found

What do I need to install ?
Comment 14 zakrzewskim 2013-08-31 22:26:15 UTC
The server does not boot with kernel 3.10.10-1.el6.elrepo.x86_64 :/
Comment 15 Akemi Yagi 2013-08-31 22:39:35 UTC
(In reply to zakrzewskim from comment #13)
> lsinitramfs
> -bash: lsinitramfs: command not found
> 
> What do I need to install ?

You can use lsinitrd, which is included in the dracut package.
Comment 16 Alan Bartlett 2013-09-01 00:17:42 UTC
For completeness (so that we cover 'every angle') I have built a version of our kernel-ml-3.10.10 package (64-bit) with CONFIG_SCSI_SCAN_ASYNC=y, as Jeff has suggested. It is available to download from --

http://elrepo.org/people/ajb/tmp/

However I do not expect that will make any difference.
Comment 17 zakrzewskim 2013-09-01 00:48:37 UTC
It doesn't work either. I don't know what's wrong. Only 3.10.4-1 works fine, so I have to stick with it :/
Comment 18 Jeff Zhou 2013-09-01 02:16:38 UTC
(In reply to zakrzewskim from comment #17)
> It doesn't work too. I don't know what's wrong. Only 3.10.4-1 works fine and
> I need to stick with it :/

[1] Do you keep a separate directory for each version of the kernel source when building and installing, or do you unpack different versions into the same folder to save space when upgrading?

[2] Could you post a more complete boot log?
Comment 19 zakrzewskim 2013-09-02 14:48:57 UTC
1. All of them are separate. I just install kernel-ml via yum.

2. I can only provide such a log for kernel 3.10.4-1. I don't have a KVM console to see what's going on. Since this is a production machine in a datacenter, I can only rent a KVM for 2 hours.

3. I found another bug: the server freezes and the load climbs above 100:

BUG: unable to handle kernel paging request at 0000010600000032
IP: [<ffffffff811dcb85>] SyS_epoll_ctl+0x145/0x420
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_iprange iptable_filter ip_tables netconsole configfs nct6775 hwmon_vid ipv6 cpufreq_ondemand ppdev iTCO_wdt iTCO_vendor_support shpchp coretemp hwmon acpi_cpufreq freq_table mperf kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr i2c_i801 parport_pc parport r8169 mii sg lpc_ich xhci_hcd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache raid1 sd_mod crc_t10dif mxm_wmi video ahci libahci wmi dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 3133 Comm: nginx Not tainted 3.10.4-1.el6.elrepo.x86_64 #1
Hardware name: MSI MS-7816/H87-G43 (MS-7816), BIOS V2.3 06/07/2013
task: ffff8807f18c0ac0 ti: ffff880753374000 task.ti: ffff880753374000
RIP: 0010:[<ffffffff811dcb85>]  [<ffffffff811dcb85>] SyS_epoll_ctl+0x145/0x420
RSP: 0018:ffff880753375f18  EFLAGS: 00010202
RAX: 0000010600000002 RBX: ffff8807f23125c0 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880650213bc0 RDI: ffff88076704b808
RBP: ffff880753375f78 R08: 0000000000000002 R09: 0101010101010101
R10: 00007fff52ea2220 R11: 0000000000000202 R12: ffff880773f15c80
R13: 0000000000000001 R14: 0000000000000045 R15: ffff88076704b800
FS:  00007f8451c707c0(0000) GS:ffff88081ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000010600000032 CR3: 0000000771b3b000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff8807f445c648 0000000152ea22b0 ffff88076704b808 fffffff7810dfe46
 11b8144080000001 ffff880700000000 0000000001838b20 0000000000000001
 0000000011b81440 0000000011bd9288 0000000011b6f8c0 00007fff52ea22b0
Call Trace:
 [<ffffffff815fd319>] system_call_fastpath+0x16/0x1b
Code: 00 00 c7 45 ac 00 00 00 00 83 f8 01 0f 86 4d 01 00 00 49 8d 47 08 48 89 c7 48 89 45 b0 e8 44 4c 41 00 49 8b 47 70 48 85 c0 74 1f <48> 3b 58 30 48 89 c6 77 0d 72 68 44 89 f2 2b 50 38 83 fa 00 7e
RIP  [<ffffffff811dcb85>] SyS_epoll_ctl+0x145/0x420
 RSP <ffff880753375f18>
CR2: 0000010600000032
---[ end trace 3c47cb214b7f3743 ]---
Comment 20 Jeff Zhou 2013-09-02 19:45:13 UTC
Yes, there could be additional bugs.

Since this bug is about "scsi_wait_scan not found" in v3.10.5 and above, let's work on this one first.

As Akemi suggested in comment #15, lsinitrd is available; it can list the modules in an initramfs file.
Could you run lsinitrd on a working version (v3.10.4-1) and on the first failing version (v3.10.5-1) to show the modules in each initramfs file?

lsinitrd initrd_v.xxx.img > v.xxx.lst
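
Once both listings are saved, they can be compared mechanically. A rough sketch, not a tested recipe: the function name modules_only_in_first is hypothetical, and the sed pattern assumes that lsinitrd prints each module as a path ending in .../kernel/<subpath>.ko, which may differ between dracut versions.

```shell
#!/bin/sh
# Print the modules (path below .../kernel/) that are listed in the first
# saved lsinitrd listing but not in the second. Hypothetical helper name.
modules_only_in_first() {
    first=$(mktemp) && second=$(mktemp) || return 1
    # Strip everything up to /kernel/ so that the kernel version in the
    # path does not make every line look different.
    sed -n 's|.*/kernel/\(.*\.ko\)$|\1|p' "$1" | LC_ALL=C sort -u > "$first"
    sed -n 's|.*/kernel/\(.*\.ko\)$|\1|p' "$2" | LC_ALL=C sort -u > "$second"
    LC_ALL=C comm -23 "$first" "$second"
    rm -f "$first" "$second"
}

# Usage, with file names chosen only as an example:
#   modules_only_in_first 3.10.4-1.el6.lst 3.10.5-1.el6.lst
```

Swapping the two arguments shows the modules that are only in the newer image.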
Comment 21 zakrzewskim 2013-09-02 19:55:16 UTC
Here you are: http://www.upemax.user.icpnet.pl/3.10.4-1.el6.lst
Comment 22 Jeff Zhou 2013-09-03 05:09:51 UTC
(In reply to zakrzewskim from comment #21)
> Here you are: http://www.upemax.user.icpnet.pl/3.10.4-1.el6.lst

Yes, so we see that scsi_wait_scan is not in 3.10.4 either; otherwise there would be a line "drivers/scsi/scsi_wait_scan.ko".

The rc.sysinit file on your system might need adjusting to remove the reference to the scsi_wait_scan module.

But that might not be the root cause, since 3.10.4 works fine with this script even without scsi_wait_scan.

There could be a regression between 3.10.4 and 3.10.5 that causes your system to hang; it may or may not be a SCSI problem.

To find the root cause and fix it, we probably need to step through the changes between 3.10.4 and 3.10.5, building and boot-testing the kernel at each step; git bisect could be helpful. Then we can check the details to see why that commit crashes this specific machine.
Comment 23 zakrzewskim 2013-09-03 10:33:36 UTC
Ok. Thank you.

Maybe you need to get one of these boards (H87-G43) to test it?
Comment 24 Alan Bartlett 2013-09-03 15:41:23 UTC
(1) I have critically examined the configuration files used (from 3.10 to 3.10.10), paying particular attention to 3.10.4 versus 3.10.5.

There is nothing untoward in the configuration that can account for this problem.

(2) I have performed a diff (diff -Npru) between the sources of 3.10.4 and 3.10.5.

Subsequently checking the output for SCSI references does not show anything obvious to me. The output is attached as the file diff-3.10.4-to-3.10.5.txt

(3) Grepping the standard RHEL 6 /etc/rc.d/rc.sysinit file shows something interesting on line 165 but, once again, I do not think it is relevant.

[quote]
[Duo2 ~]$ grep -n -C 10 scsi /etc/rc.d/rc.sysinit
155-# Configure kernel parameters
156-update_boot_stage RCkernelparam
157-apply_sysctl
158-
159-# Set the hostname.
160-update_boot_stage RChostname
161-action $"Setting hostname ${HOSTNAME}: " hostname ${HOSTNAME}
162-[ -n "${NISDOMAIN}" ] && domainname ${NISDOMAIN}
163-
164-# Sync waiting for storage.
165:{ rmmod scsi_wait_scan ; modprobe scsi_wait_scan ; rmmod scsi_wait_scan ; } >/dev/null 2>&1
166-
167-# Device mapper & related initialization
168-if ! __fgrep "device-mapper" /proc/devices >/dev/null 2>&1 ; then
169-       modprobe dm-mod >/dev/null 2>&1
170-fi
171-
172-if [ -f /etc/crypttab ]; then
173-    init_crypto 0
174-fi
175-
[/quote]
Comment 25 Alan Bartlett 2013-09-03 15:44:00 UTC
Created attachment 107395
diff -Npru linux-3.10.4 linux-3.10.5
Comment 26 Jeff Zhou 2013-09-04 04:48:02 UTC
(In reply to zakrzewskim from comment #23)
> Ok. Thank you.
> 
> Maybe you need to get such board - H87-G43 to test it ?

I would be glad to help if I could get such a board.

For this H87-G43 board, there are also recent reports of boot problems with Linux:
https://forum-en.msi.com/index.php?topic=170643.0

There are lots of check-ins for the ACPI and DRM drivers in 3.10.5:
https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.10.5

With access to the machine, it would be much easier to revert the check-ins and see what's going on.
Comment 27 zakrzewskim 2013-09-04 09:15:18 UTC
Hetzner uses these boards in their EX40 and EX40-SSD dedicated servers:

http://wiki.hetzner.de/index.php/Wake_On_LAN/en
Comment 28 zakrzewskim 2013-09-04 09:16:21 UTC
One more thing: these boards do not boot kernel-lt-3.0.94-1 or the latest official CentOS kernel either!
Comment 29 zakrzewskim 2013-09-22 08:23:11 UTC
Kernel 3.1.11-2 does not boot either :/
Comment 30 zakrzewskim 2013-09-22 13:38:22 UTC
(In reply to Alan Bartlett from comment #24)
> (3) Grep'ing the standard RHEL 6 /etc/rc.d/rc.sysinit file shows something
> interesting on line 165 but, once again, I do not think it is relevant.

Nice find. I will try booting with this line commented out ;)
Comment 31 zakrzewskim 2013-09-22 13:55:15 UTC
The standard CentOS 6 kernels do have this module:

/lib/modules/2.6.32-358.18.1.el6.x86_64/kernel/drivers/scsi/scsi_wait_scan.ko
/lib/modules/2.6.32-358.el6.x86_64/kernel/drivers/scsi/scsi_wait_scan.ko

That's why it's referring to it.
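
To see at a glance which installed kernels ship the module and which do not, something like the following could be used. It is only a sketch: the function name kernels_with_module is made up, and it assumes the usual layout of one directory per kernel version below /lib/modules.

```shell
#!/bin/sh
# List the kernel versions (directory names directly below the modules
# root) that contain a file named <module>.ko, optionally compressed.
# Hypothetical helper name.
kernels_with_module() {
    ( cd "$1" && find . -type f -name "$2.ko*" | cut -d/ -f2 | LC_ALL=C sort -u )
}

# Usage:
#   kernels_with_module /lib/modules scsi_wait_scan
```

A kernel version missing from the output has no such module on disk, which is what the kernel-ml packages would be expected to show here.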
Comment 32 zakrzewskim 2013-09-22 14:15:38 UTC
It seems I'm not the only one with this problem: http://www.gossamer-threads.com/lists/linux/kernel/1747344
Comment 33 zakrzewskim 2013-09-22 21:01:16 UTC
Maybe this will help:

lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d4)
00:1c.1 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #2 (rev d4)
00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H87 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
03:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03)
Comment 34 zakrzewskim 2013-09-22 22:23:14 UTC
Tested again. Here are the results:

http://files.tinypic.pl/i/00449/clv7pa58vgxk.jpg
http://files.tinypic.pl/i/00449/nawfo9bwhyn8.jpg

Please help, the current kernel is just unstable!
Comment 35 zakrzewskim 2013-09-22 22:31:54 UTC
Please note that was with:

#{ rmmod scsi_wait_scan ; modprobe scsi_wait_scan ; rmmod scsi_wait_scan ; } >/dev/null 2>&1

and

title CentOS (3.11.1-2.el6.elrepo.x86_64)
root (hd0,1)
kernel /vmlinuz-3.11.1-2.el6.elrepo.x86_64 ro root=/dev/md2 rd_NO_LUKS rd_NO_DM nomodeset crashkernel=auto SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de scsi_mod.scan=async
initrd /initramfs-3.11.1-2.el6.elrepo.x86_64.img
Comment 36 Alan Bartlett 2013-09-22 23:16:14 UTC
(In reply to zakrzewskim from comment #34)
> Tested again. Here are the results:
> 
> http://files.tinypic.pl/i/00449/clv7pa58vgxk.jpg
> http://files.tinypic.pl/i/00449/nawfo9bwhyn8.jpg
> 
> Please help the current kernel is just unstable !

In the second of the images, above, there is a suggestion to add "rdshell" to the kernel command line. 

Have you tried it? 

Perhaps it will allow you to gather some more information about the problem, which might assist Jeff's investigations.
Comment 37 zakrzewskim 2013-09-23 09:24:18 UTC
I forgot to add it. I will try again soon.
Comment 38 newbie 2013-10-19 15:38:59 UTC
Hi:

I had the same problem with 3.10.16 on CentOS 6.4 x86_64 too...
I don't know whether I made a mistake in make config, or whether other people see it too?
Comment 39 newbie 2013-10-22 06:24:56 UTC
I reinstalled my system and installed a new kernel rpm with 3.17, and it seems fine.
I think the problem might have been caused by some mistake of mine.
Comment 40 newbie 2013-10-23 11:21:29 UTC
I checked my boot.log on CentOS 6.4 x86_64 with 3.10.17 today.
It seems that

    FATAL: Module scsi_wait_scan not found

still appears, but the system at least boots.
Comment 41 zakrzewskim 2013-10-23 11:23:16 UTC
(In reply to newbie from comment #40)
>  I check my boot.log on CentOS 6.4 x86_64 with 3.10.17 today,
> seem
> 
>     FATAL: Module scsi_wait_scan not found
> 
> still appear, but can boot in to the system at least.

Please test kernel 3.11.6 too.
Comment 42 newbie 2013-10-23 13:48:08 UTC
(In reply to zakrzewskim from comment #41)
> Please test (In reply to newbie from comment #40)
> >  I check my boot.log on CentOS 6.4 x86_64 with 3.10.17 today,
> > seem
> > 
> >     FATAL: Module scsi_wait_scan not found
> > 
> > still appear, but can boot in to the system at least.
> 
> Please test kernel 3.11.6 too.

I am solving another problem with my 3.10.17 config right now ...
Comment 43 newbie 2013-11-06 11:55:12 UTC
3.10.18 still shows one line

 FATAL: Module scsi_wait_scan not found

but boots OK.
Comment 44 Alan Bartlett 2013-11-06 15:52:39 UTC
(In reply to newbie from comment #43)
> 3.10.18 still show one line
> 
>  FATAL: Module scsi_wait_scan not found
> 
> but boot ok.

That is not a message output by the kernel but is the result of line number 165 in the "userland" file, /etc/rc.d/rc.sysinit, of RHEL 6 (and its clones).

If you do not like seeing the message, just "comment out" line number 165 in your /etc/rc.d/rc.sysinit file.
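
As an illustration only, the edit can also be scripted. The sketch below makes no assumption about the line number; it comments out every line that mentions scsi_wait_scan and keeps a backup next to the file. The function name comment_out_wait_scan is hypothetical, and it is safer to try it on a copy of rc.sysinit first.

```shell
#!/bin/sh
# Comment out each line of the given file that mentions scsi_wait_scan
# and is not already a comment. The original is kept as <file>.bak.
# Hypothetical helper name.
comment_out_wait_scan() {
    cp -p "$1" "$1.bak" || return 1
    sed '/^[[:space:]]*#/!s/^\(.*scsi_wait_scan.*\)$/#\1/' "$1.bak" > "$1"
}

# Usage (as root, on a copy first):
#   cp /etc/rc.d/rc.sysinit /tmp/rc.sysinit.test
#   comment_out_wait_scan /tmp/rc.sysinit.test
#   diff /tmp/rc.sysinit.test.bak /tmp/rc.sysinit.test
```

Running it twice does not add a second comment marker, because lines that already start with # are skipped.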
Comment 45 newbie 2013-11-08 15:34:29 UTC
> If you do not like seeing the message,
> just "comment out" line number 165 in your /etc/rc.d/rc.sysinit file.

That does not seem to help; the message is still there, and I don't know why.
But the real problem is this:

I use two 200 GB SATA disks (sda/sdb) in a software RAID1 (md).
Three days ago one disk had errors and dropped out,
but I only discovered it today.
After I re-added it with mdadm and rebooted,
the bad disk reported read errors constantly
and I could not use it normally, so I booted from the other, good disk, then...

 FATAL: Module scsi_wait_scan not found

then...

the message loops~

With 3.10.17 and 3.10.18 the situation is the same: I can't boot into the system.
So finally I booted the CentOS 2.6 kernel to get into the system and save my data,
and I am shutting down now... ><
Comment 46 zakrzewskim 2013-11-13 22:06:41 UTC
What about kernel 3.12 ?
Comment 47 zakrzewskim 2013-11-14 11:19:25 UTC
3.12 doesn't boot too. The same reason...
Comment 48 Vincent Li 2013-12-06 21:18:11 UTC
Oddly, I can boot into 3.12.0-rc4 but not 3.13.0-rc3; neither of them has scsi_wait_scan.ko compiled. I see the error message "FATAL: Module scsi_wait_scan not found" with both 3.12.0-rc4 and 3.13.0-rc3, but it appears only once with 3.12.0-rc4. With 3.13.0-rc3 it repeats many times and the kernel finally panics:

FATAL: Module scsi_wait_scan not found.
FATAL: Module scsi_wait_scan not found.
FATAL: Module scsi_wait_scan not found.
FATAL: Module scsi_wait_scan not found.

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100

CPU: 3 PID: 1 Comm: init Tainted: GF            3.13.0-rc3 #27
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 0000000000000001 ffff8802108f3d98 ffffffff8156c50b 000000000000fffe
 ffffffff817e0d00 ffff8802108f3e18 ffffffff8156c298 ffffffff00000010
 ffff8802108f3e28 ffff8802108f3dc8 ffff8802108f17b8 ffff8802108f3dd8
Call Trace:
 [<ffffffff8156c50b>] dump_stack+0x49/0x5e
 [<ffffffff8156c298>] panic+0xbb/0x1d5
 [<ffffffff8104e74b>] find_new_reaper+0x17b/0x180
 [<ffffffff8104f8d5>] forget_original_parent+0x45/0x1c0
 [<ffffffff8111e110>] ? perf_cgroup_switch+0x180/0x180
 [<ffffffff8104fa67>] exit_notify+0x17/0x130
 [<ffffffff8104fd6e>] do_exit+0x1ee/0x480
 [<ffffffff81050051>] do_group_exit+0x51/0xc0
 [<ffffffff810500d7>] SyS_exit_group+0x17/0x20
 [<ffffffff81578dd2>] system_call_fastpath+0x16/0x1b

I tried commenting out the scsi_wait_scan line in rc.sysinit and adding the kernel boot parameter 'scsi_mod.scan=async'; neither helped.
Comment 49 zakrzewskim 2013-12-06 21:19:44 UTC
Can someone test it on an H87-G43 board?
Comment 50 Lin Feng 2013-12-26 09:00:24 UTC
Hi all,

I got exactly the same problem on a CentOS 6.4 64-bit KVM guest. While trying to update my kernel to mainline 3.13-rc5, the console repeatedly writes "FATAL: Module scsi_wait_scan not found." and finally the kernel panics. The whole output is mainly the same as that pasted by Vincent Li.

After following this bugzilla I suspect that dracut is mismatched with the kernel: kernel 3.6 and later have dropped the scsi_wait_scan module, but CentOS 6.4's dracut is hardcoded to add the instruction "modprobe scsi_wait_scan" to the /init script in the initramfs image.

So I tried the following approaches for confirmation:
1. Update the kernel to 3.5 (it still has the scsi_wait_scan module): it boots fine.
2. Update dracut to the latest version in the git tree, since it has removed the redundant "modprobe scsi_wait_scan" when building for 3.6 and later kernels. But it doesn't work; it tells me that it could not find the root device, as pasted in step 3.
3. Decompress the initramfs and modify the /init script by removing the line "modprobe scsi_wait_scan". But it doesn't work either:
(snips)
dracut Warning: No root device "block:/dev/disk/by-uuid/cedcbd9c-32eb-4a3f-9dad-ae5fc560642a" found





dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.


dracut Warning: Signal caught!

dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100

CPU: 1 PID: 1 Comm: init Tainted: GF            3.13.0-rc5 #6
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 0000000000000001 ffff88005dbf9d98 ffffffff8156c92b 000000000000fffe
 ffffffff817dd350 ffff88005dbf9e18 ffffffff8156c6b8 ffffffff00000010
 ffff88005dbf9e28 ffff88005dbf9dc8 ffff88005dbdf7b8 ffff88005dbf9dd8
Call Trace:
 [<ffffffff8156c92b>] dump_stack+0x49/0x5e
 [<ffffffff8156c6b8>] panic+0xbb/0x1d5
 [<ffffffff8104e75b>] find_new_reaper+0x17b/0x180
 [<ffffffff8104f8e5>] forget_original_parent+0x45/0x1c0
 [<ffffffff8111e1f0>] ? perf_cgroup_switch+0x180/0x180
 [<ffffffff8104fa77>] exit_notify+0x17/0x130
 [<ffffffff8104fd7e>] do_exit+0x1ee/0x480
 [<ffffffff81050061>] do_group_exit+0x51/0xc0
 [<ffffffff810500e7>] SyS_exit_group+0x17/0x20
 [<ffffffff81579212>] system_call_fastpath+0x16/0x1b
-------------------------------------------------------------------------------
 
From 2 and 3 it seems that scsi_wait_scan is necessary in some cases, like the H87 board and my KVM case.

Any ideas? Can anyone tell me how to move on? Thanks in advance.
Comment 51 zakrzewskim 2013-12-26 22:20:46 UTC
What kind of KVM do you use? Is this Proxmox?
Comment 52 Lin Feng 2013-12-27 01:14:59 UTC
(In reply to zakrzewskim from comment #51)
> What kind of KVM do you use? Is this Proxmox?

Hi, it's qemu based on KVM, and the host is Fedora 20 64-bit. I installed CentOS 6.4 via a live CD. Here is my XML file for the guest configuration; maybe it's helpful for reproduction.

[root@localhost home]# cat /etc/libvirt/qemu/CentOS6.4.xml 
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh edit CentOS6.4
or other application using the libvirt API.
-->

<domain type='kvm'>
  <name>CentOS6.4</name>
  <uuid>52eb93be-64ad-45e8-9e19-5bac87e57bca</uuid>
  <memory unit='KiB'>1572864</memory>
  <currentMemory unit='KiB'>1572864</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.6'>hvm</type>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/CentOS6.4.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:cb:94:2c'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' autoport='yes'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
</domain>
Comment 53 Lin Feng 2013-12-31 06:22:02 UTC
hello,

After digging into it, I found that in my qemu VM case the "FATAL: Module scsi_wait_scan" message is not the root cause: even after I backported the scsi_wait_scan module, the guest still could not boot and instead reported "dracut Warning: No root device "block:/dev/disk/by-uuid/cedcbd9c-32eb-4a3f-9dad-ae5fc560642a" found".

I bisected the upstream 3.13-rc6 tree and found that this commit broke my initramfs:
[root@CentOS6 linux]# git bisect bad
1cf7e9c68fe84248174e998922b39e508375e7c1 is the first bad commit
commit 1cf7e9c68fe84248174e998922b39e508375e7c1
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Nov 1 10:52:52 2013 -0600

    virtio_blk: blk-mq support
    
    Switch virtio-blk from the dual support for old-style requests and bios
    to use the block-multiqueue.
    
    Acked-by: Asias He <asias@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

Although it touches the virtio driver (my guest uses virtio as the storage driver), this commit is mainly C code changes and does not affect whether the module gets built. I also checked the compiled modules: in both cases (with and without this commit) we get the virtio_blk.ko module. The difference is that with this commit virtio_blk.ko isn't packed into the initramfs.

However, there are no environmental changes between the two cases: exactly the same config, same dracut, same gcc, everything. So I don't know why dracut doesn't pack virtio_blk.ko into the initramfs and, even stranger, I found that it packed floppy.ko instead.
(In my case, as a workaround, we can compile the virtio modules into the kernel or use another disk bus driver such as IDE or USB instead.)

One thing I don't understand, and maybe someone can tell me: does dracut look through the kernel tree (the C code) to find information it uses when packing the final initramfs?

P.S. Regarding the original report in this bugzilla, I guess it may also be caused by some storage drivers not being packed into the initramfs.

thanks
Comment 54 Akemi Yagi 2014-01-22 21:17:56 UTC
(In reply to Lin Feng from comment #53)

> though it's something about virtio driver(my guest uses virtio as the storage
> driver), looking into this commit it is mainly about C code changes, not
> module compiling or not. Also I have checked the modules compiled, in both
> cases(with and without this commit) we get virtio_blk.ko module. But the
> difference is that with this commit virtio_blk.ko isn't packed into the
> initramfs. 
> 
> However between both cases there is no environmental changes, with exactly
> the same config, same dracut, same gcc, everything...So I don't know why
> dracut doesn't pack the virtio_blk.ko into the initramfs, and more kidding I
> find that it packed the floppy.ko instead. 
> (In my case as a workaround we can compile virtio modules into kernel or use
> other disk bus driver such as IDE or USB instead)
> 
> But one thing I don't understand can someone tell me why, Will dracut look
> through the kernel tree(C codes) to find some useful information to pack the
> final initramfs?

Thank you for this extensive analysis. I, too, was having the same problem on my KVM guest that uses virtio (host=RHEL 6.5 and guest=CentOS 6.5). Just to reconfirm your findings, I created initramfs with a '--add-drivers virtio_blk' option and the kernel booted just fine.

It would indeed be great if we could find out why dracut fails to pick up some particular modules.
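
For anyone wanting to repeat this, here is a minimal sketch of the rebuild command. The function name rebuild_cmd is made up for illustration, and the image path /boot/initramfs-<version>.img is an assumption based on the CentOS naming seen elsewhere in this report. It only prints the command so it can be reviewed before being run as root.

```shell
# Hypothetical helper: print (do not run) a dracut command that regenerates
# the initramfs for a given kernel version with one extra driver forced in.
rebuild_cmd() {
    kver="$1"      # kernel version, e.g. the output of `uname -r`
    driver="$2"    # module name without the .ko suffix, e.g. virtio_blk
    # Assumed image location: /boot/initramfs-<kver>.img
    printf 'dracut --force --add-drivers %s /boot/initramfs-%s.img %s\n' \
        "$driver" "$kver" "$kver"
}
```

For example, `rebuild_cmd "$(uname -r)" virtio_blk` prints the command line for the running kernel.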
Comment 55 abiko 2014-01-29 10:19:53 UTC
I had this problem on my CentOS 6 boxes with 3.10 and 3.12 until I cleared the .config files from the kernel source directory. You also need Virtio support for block, PCI, and net devices enabled in the kernel (either as a module or built in).

A good guide to follow is: http://www.linux-kvm.org/page/Virtio

This has been tested on the following kernels:

kernel 3.10.27
kernel 3.10.28
kernel 3.12.8
kernel 3.12.9
kernel 3.13

Currently using the 3.12.8 kernel on one of my VMs:
root@dev01 [~]# lsinitrd /boot/initrd-3.12.8.img | grep virt
-rw-r--r--   1 root     root        28720 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/block/virtio_blk.ko
-rw-r--r--   1 root     root        26688 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/scsi/virtio_scsi.ko
drwxr-xr-x   2 root     root            0 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio
-rw-r--r--   1 root     root        12752 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio.ko
-rw-r--r--   1 root     root        18304 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio_mmio.ko
-rw-r--r--   1 root     root        20744 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio_pci.ko
-rw-r--r--   1 root     root        20912 Jan 25 04:14 lib/modules/3.12.8/kernel/drivers/virtio/virtio_ring.ko
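
A listing like the one above can be reduced to bare module names, which makes a missing driver easier to spot. This is only a sketch: the function name list_modules is made up, and it assumes module paths end in ".ko" as they do in this output.

```shell
# Hypothetical helper: read an lsinitrd listing on stdin and print the names
# of the kernel modules it contains, one per line, sorted.
list_modules() {
    # Keep the last field of lines whose path ends in ".ko" (directories and
    # other files are skipped), then strip the directory part and the suffix.
    awk '/\.ko[[:space:]]*$/ { print $NF }' \
        | sed -e 's|.*/||' -e 's|\.ko$||' \
        | sort
}
```

Usage would be `lsinitrd /boot/initrd-3.12.8.img | list_modules`.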
Comment 56 kometch 2014-02-07 15:54:26 UTC
I also tried this with a CentOS 6.5 guest under KVM on Ubuntu 13.10. I compiled kernel 3.13.2, but it does not boot and keeps printing "module scsi_wait_scan not found".

In the lsinitrd output below, there is no /block/virtio_blk.ko.

# lsinitrd /boot/initramfs-3.13.2.img | grep virt
drwxr-xr-x 2 root root 0 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio
-rw-r--r-- 1 root root 13216 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio/virtio.ko
-rw-r--r-- 1 root root 19944 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio/virtio_pci.ko
-rw-r--r-- 1 root root 19448 Feb 8 00:50 lib/modules/3.13.2/kernel/drivers/virtio/virtio_ring.ko

Regenerating the initramfs with dracut did not help.

Is there a way to deal with this?
Comment 57 Thorsten Kohfeldt 2014-02-21 05:31:39 UTC
I have filed a bug against dracut:

https://bugzilla.redhat.com/show_bug.cgi?id=1067669

which gives an explanation at least for kernel 3.13.


That kernel version introduces multi-queue block I/O for virtio_blk, which in turn no longer calls blk_init_queue() but blk_mq_init_queue() instead.

Dracut matches against symbol blk_init_queue but not yet against symbol blk_mq_init_queue.

This should be fixed for Fedora and also for all RHEL derivatives ...
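
To illustrate the mismatch: the sketch below is a simplified stand-in, not dracut's actual code. It assumes the block-driver check amounts to searching the module file for a pattern of known symbol names; the function name looks_like_block_driver is made up for this example.

```shell
# Simplified illustration (an assumption about the mechanism, not dracut's
# code): treat a module as a block driver if its file contains any of the
# symbol names in the given extended regular expression.
looks_like_block_driver() {
    module_file="$1"   # path to a .ko file
    symbols="$2"       # e.g. 'blk_init_queue|blk_mq_init_queue'
    # -E enables alternation with "|"; -q only sets the exit status.
    grep -E -q -- "$symbols" "$module_file"
}
```

With a pattern that contains only blk_init_queue, a module that references only blk_mq_init_queue is skipped; once blk_mq_init_queue is added to the pattern, the same module matches.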


In the meantime there is this WORKAROUND:

1) check if virtio_blk is in initrd:
# for i in /boot/initramfs-* ; do echo $i: ; lsinitrd $i | grep virt ; done

2) if virtio_blk is missing in the relevant initrd, then

# echo 'add_drivers+="virtio_blk"' >/etc/dracut.conf.d/force-virtio_blk-to-ensure-boot.conf
   (NOTE that the .conf extension is mandatory !)

3) then rebuild the initrd
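
Steps 1 and 2 could be wrapped in small helpers like the ones below. This is a sketch under assumptions, not a tested tool: the function names initrd_has_driver and force_driver_conf and the file name force-<driver>.conf are made up, and only /etc/dracut.conf.d comes from step 2. The value is written with surrounding spaces so that entries from several conf files do not run together.

```shell
# Hypothetical helper for step 1: succeed if the lsinitrd listing on stdin
# contains <driver>.ko.
initrd_has_driver() {
    grep -q -- "/$1\.ko"
}

# Hypothetical helper for step 2: write a dracut drop-in that forces <driver>
# into future images. The file name must end in .conf.
force_driver_conf() {
    driver="$1"
    conf_dir="${2:-/etc/dracut.conf.d}"   # default taken from step 2
    printf 'add_drivers+=" %s "\n' "$driver" > "$conf_dir/force-$driver.conf"
}
```

For example: `lsinitrd /boot/initramfs-3.13.2.img | initrd_has_driver virtio_blk || force_driver_conf virtio_blk`, followed by rebuilding the initrd as in step 3.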
Comment 58 Thorsten Kohfeldt 2014-02-21 05:35:10 UTC
Please verify/confirm (add comments to) that redhat/dracut bug/solution, so it gets dracut maintainers' attention.

https://bugzilla.redhat.com/show_bug.cgi?id=1067669
Comment 59 Lin Feng 2014-02-21 05:53:40 UTC
(In reply to Thorsten Kohfeldt from comment #57)
> I have filed a bug against dracut:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1067669
> 
> which gives an explanation at least for kernel 3.13.
> 
> 
> That kernel version introduces multi queue block i/o for virtio_blk, which
> in turn does not any more call blk_init_queue() but blk_mq_init_queue()
> instead.
> 
> Dracut matches against symbol blk_init_queue but not yet against symbol
> blk_mq_init_queue.
> 
Good, based on the bisect it seems that it's the root cause, thanks :)
Comment 60 Jamie Bainbridge 2015-10-08 02:55:56 UTC
This was resolved in dracut with http://git.kernel.org/cgit/boot/dracut/dracut.git/commit/?id=faa17f09218ed7e2ce4362cc2d9319f8d5b7a37f which was included in Red Hat's dracut-004-356.el6 package.

This bug can be closed.
