About one boot in three, the system on which I installed a kernel obtained from linux/kernel/git/roland/infiniband.git (for-next branch) hangs during boot. When the system hangs, the symptoms are as follows (see also the attached screenshot):
- Console tty echo still works: any keys pressed are echoed on the screen, and caps lock works.
- The console switching keys (Alt-F1 / Alt-F2) have no effect; maybe the virtual consoles have not yet been initialized.
- The Ethernet interfaces have not yet been brought up. One of the Ethernet interfaces is connected with a crossover cable to another Linux system. That second system logged the message "kernel: sky2 eth0: Link is down." during shutdown of the first system, but it has not yet reported that the first system brought its Ethernet interfaces back up.
- The following message is displayed several times on the console: "ifup-dhcp [...] trap invalid opcode ip:... sp:... error:0 in bash[...]"
Created attachment 22817: Console screenshot
Created attachment 22818: Kernel config
Kernel module list when booting succeeds:
$ lsmod
Module                  Size    Used by
rdma_ucm                13408   0
ib_srp                  29472   0
scsi_transport_srp      7288    1 ib_srp
scsi_tgt                15272   1 scsi_transport_srp
hid_belkin              3192    0
ib_ipoib                83512   0
ib_iser                 36176   0
ib_uverbs               35176   1 rdma_ucm
rdma_cm                 33068   2 rdma_ucm,ib_iser
ib_cm                   39184   3 ib_srp,ib_ipoib,rdma_cm
ib_umad                 15304   0
iw_cm                   10712   1 rdma_cm
ib_sa                   24480   4 ib_srp,ib_ipoib,rdma_cm,ib_cm
ib_addr                 9032    1 rdma_cm
mlx4_ib                 45320   0
ib_mad                  43000   4 ib_cm,ib_umad,ib_sa,mlx4_ib
iscsi_tcp               13924   0
libiscsi_tcp            20028   1 iscsi_tcp
ib_core                 69744   12 rdma_ucm,ib_srp,ib_ipoib,ib_iser,ib_uverbs,rdma_cm,ib_cm,ib_umad,iw_cm,ib_sa,mlx4_ib,ib_mad
libiscsi                47584   3 ib_iser,iscsi_tcp,libiscsi_tcp
ipv6                    314240  30 ib_ipoib,ib_addr
scsi_transport_iscsi    38600   4 ib_iser,iscsi_tcp,libiscsi
af_packet               24848   0
cpufreq_conservative    9584    0
cpufreq_userspace       4256    0
cpufreq_powersave       2040    0
acpi_cpufreq            9328    0
fuse                    68184   1
md_mod                  99116   0
dm_mod                  78384   0
coretemp                7780    0
hwmon                   3816    1 coretemp
8250_pnp                18520   0
8250                    26952   1 8250_pnp
button                  7064    0
mlx4_core               90376   1 mlx4_ib
serial_core             24336   1 8250
sky2                    54204   0
pcspkr                  3160    0
sg                      27648   0
sr_mod                  16516   0
usbhid                  23424   0
hid                     44292   2 hid_belkin,usbhid
sd_mod                  40224   4
uhci_hcd                26024   0
ehci_hcd                40000   0
usbcore                 174228  4 usbhid,uhci_hcd,ehci_hcd
ext3                    134408  2
mbcache                 9624    1 ext3
jbd                     57000   1 ext3
fan                     4216    0
ide_pci_generic         5372    0
ide_core                83936   1 ide_pci_generic
ata_generic             6332    0
ata_piix                27804   3
thermal                 17792   0
processor               47868   1 acpi_cpufreq
pata_jmicron            4408    0
ahci                    40524   0
Kernel command line:
$ cat /proc/cmdline
root=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part6 resume=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part5 splash=silent vga=0 edd=off

Note: adding slub_debug=FZPU to the kernel command line did not reveal any extra information. So far I have not been able to reproduce this issue with that parameter added.
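When comparing boots with and without slub_debug, it can help to confirm which parameters the running kernel actually received. Below is a minimal Python sketch that splits a command line like the one above into name/value pairs. The helper name parse_cmdline is hypothetical and not part of any existing tool, quoted values containing spaces are not handled, and slub_debug=FZPU has been appended to the reported command line purely for illustration.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into a {name: value} mapping.

    Bare flags (tokens without '=') map to None. Quoted values that
    contain spaces are not handled by this simple sketch.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Command line from the report, with slub_debug=FZPU appended (assumption:
# this is how the parameter was added when testing).
CMDLINE = (
    "root=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part6 "
    "resume=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part5 "
    "splash=silent vga=0 edd=off slub_debug=FZPU"
)

params = parse_cmdline(CMDLINE)
print("slub_debug =", params.get("slub_debug"))
```

On a live system the same function could be fed the contents of /proc/cmdline instead of the hard-coded string.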
Userspace: openSUSE 11.1 with the InfiniBand software provided by openSUSE (the OFED InfiniBand stack has not been installed on this system).

Hardware info:
$ lspci
00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 1
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 5
00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 6
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller
01:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX IB DDR, PCIe 2.0 5GT/s] (rev a0)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
03:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b1)
05:01.0 VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 01)
05:02.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 14)
05:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 70)
Note: the same system boots fine with the 2.6.30.4 and several older kernels.
Can you please test 2.6.31-rc9 too?
I have not yet been able to reproduce this issue with the 2.6.31 kernel (final release, not the rc9). Still testing with the latest infiniband.git kernel, which is based on 2.6.31-rc9.
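Since the hang shows up only about one boot in three, a handful of clean boots does not say much on its own. As a rough sketch, assuming each boot hangs independently with the same probability (an assumption, not something established in this report), the number of consecutive clean boots needed before "cannot reproduce" becomes meaningful can be estimated as follows. The function name clean_boots_needed is hypothetical.

```python
import math


def clean_boots_needed(hang_rate: float, confidence: float) -> int:
    """Smallest N such that (1 - hang_rate)**N <= 1 - confidence.

    In other words: if the bug were still present and each boot hung
    independently with probability hang_rate, the chance of seeing N
    clean boots in a row by luck would be at most 1 - confidence.
    """
    if not 0.0 < hang_rate < 1.0:
        raise ValueError("hang_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


# Hang rate taken from the report: roughly one boot in three.
HANG_RATE = 1.0 / 3.0

for level in (0.95, 0.99):
    boots = clean_boots_needed(HANG_RATE, level)
    print(f"{level:.0%} confidence: {boots} consecutive clean boots")
```

If the hangs are not independent (for example, if they depend on temperature or on a cold versus warm boot), this estimate is too optimistic.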
Closing as invalid: the hangs were caused by an unreliable motherboard.