Bug 14043 - System sometimes hangs during boot
Status: CLOSED INVALID
Product: Other
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assigned To: other_other
Depends on:
Blocks: 13615
Reported: 2009-08-23 18:04 UTC by Bart Van Assche
Modified: 2009-09-29 21:12 UTC (History)

See Also:
Kernel Version: 2.6.31-rc3
Tree: Mainline
Regression: Yes


Attachments
Console screenshot (712.98 KB, image/jpeg)
2009-08-23 18:05 UTC, Bart Van Assche
Kernel config (49.70 KB, text/plain)
2009-08-23 18:06 UTC, Bart Van Assche

Description Bart Van Assche 2009-08-23 18:04:25 UTC
About one out of three times, the system on which I installed a kernel obtained from linux/kernel/git/roland/infiniband.git (for-next branch) hangs during boot. When that happens, the symptoms are as follows (see also the attached screenshot):
- Console tty echo still worked: any keys pressed were echoed on the screen, and Caps Lock worked.
- The console switching keys (Alt-F1 / Alt-F2) did not have any effect; maybe the virtual consoles had not yet been initialized.
- The Ethernet interfaces had not yet been brought up. One of the Ethernet interfaces is connected with a crossover cable to another Linux system. That second system had logged the message "kernel: sky2 eth0: Link is down." during shutdown of the first system, but it had not yet reported that the first system had brought up its Ethernet interfaces.
- The following message was displayed several times on the console: "ifup-dhcp [...] trap invalid opcode ip:... sp:... error:0 in bash[...]"
Comment 1 Bart Van Assche 2009-08-23 18:05:35 UTC
Created attachment 22817 [details]
Console screenshot
Comment 2 Bart Van Assche 2009-08-23 18:06:45 UTC
Created attachment 22818 [details]
Kernel config
Comment 3 Bart Van Assche 2009-08-23 18:36:44 UTC
Kernel module list when booting succeeds:
$ lsmod                                  
Module                  Size  Used by                
rdma_ucm               13408  0                      
ib_srp                 29472  0                      
scsi_transport_srp      7288  1 ib_srp               
scsi_tgt               15272  1 scsi_transport_srp   
hid_belkin              3192  0                      
ib_ipoib               83512  0                      
ib_iser                36176  0                      
ib_uverbs              35176  1 rdma_ucm             
rdma_cm                33068  2 rdma_ucm,ib_iser     
ib_cm                  39184  3 ib_srp,ib_ipoib,rdma_cm
ib_umad                15304  0                        
iw_cm                  10712  1 rdma_cm                
ib_sa                  24480  4 ib_srp,ib_ipoib,rdma_cm,ib_cm
ib_addr                 9032  1 rdma_cm                      
mlx4_ib                45320  0                              
ib_mad                 43000  4 ib_cm,ib_umad,ib_sa,mlx4_ib  
iscsi_tcp              13924  0                              
libiscsi_tcp           20028  1 iscsi_tcp                    
ib_core                69744  12 rdma_ucm,ib_srp,ib_ipoib,ib_iser,ib_uverbs,rdma_cm,ib_cm,ib_umad,iw_cm,ib_sa,mlx4_ib,ib_mad                                    
libiscsi               47584  3 ib_iser,iscsi_tcp,libiscsi_tcp                  
ipv6                  314240  30 ib_ipoib,ib_addr                               
scsi_transport_iscsi    38600  4 ib_iser,iscsi_tcp,libiscsi                     
af_packet              24848  0                                                 
cpufreq_conservative     9584  0                                                
cpufreq_userspace       4256  0                                                 
cpufreq_powersave       2040  0                                                 
acpi_cpufreq            9328  0                                                 
fuse                   68184  1                                                 
md_mod                 99116  0                                                 
dm_mod                 78384  0                                                 
coretemp                7780  0                                                 
hwmon                   3816  1 coretemp                                        
8250_pnp               18520  0                                                 
8250                   26952  1 8250_pnp
button                  7064  0
mlx4_core              90376  1 mlx4_ib
serial_core            24336  1 8250
sky2                   54204  0
pcspkr                  3160  0
sg                     27648  0
sr_mod                 16516  0
usbhid                 23424  0
hid                    44292  2 hid_belkin,usbhid
sd_mod                 40224  4
uhci_hcd               26024  0
ehci_hcd               40000  0
usbcore               174228  4 usbhid,uhci_hcd,ehci_hcd
ext3                  134408  2
mbcache                 9624  1 ext3
jbd                    57000  1 ext3
fan                     4216  0
ide_pci_generic         5372  0
ide_core               83936  1 ide_pci_generic
ata_generic             6332  0
ata_piix               27804  3
thermal                17792  0
processor              47868  1 acpi_cpufreq
pata_jmicron            4408  0
ahci                   40524  0
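A module list like the one above is easier to compare against the list from another kernel (for example 2.6.30.4) once it is parsed. The following is only a minimal sketch: the helper names parse_lsmod and module_diff are hypothetical, and it assumes the usual three- or four-column lsmod layout shown in this comment.

```python
def parse_lsmod(text):
    """Parse `lsmod` output into {module: (size, use_count, [users])}."""
    modules = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a module row.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        name, size, used = fields[0], int(fields[1]), int(fields[2])
        # The fourth column, when present, is a comma-separated user list.
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[name] = (size, used, users)
    return modules


def module_diff(old, new):
    """Return (only_in_old, only_in_new) as sorted lists of module names."""
    return sorted(set(old) - set(new)), sorted(set(new) - set(old))


# A few rows copied from the listing above.
sample = """\
Module                  Size  Used by
ib_srp                 29472  0
scsi_transport_srp      7288  1 ib_srp
ib_cm                  39184  3 ib_srp,ib_ipoib,rdma_cm
"""
mods = parse_lsmod(sample)
print(mods["ib_cm"])
```

Calling module_diff on the parsed output of two kernels shows which modules were loaded under only one of them.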
Comment 4 Bart Van Assche 2009-08-23 18:38:44 UTC
Kernel command line:
$ cat /proc/cmdline
root=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part6 resume=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part5 splash=silent vga=0 edd=off

Note: adding slub_debug=FZPU to the kernel command line has not revealed any extra information so far. I have not yet been able to reproduce this issue with that parameter set.
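To check which parameters a given boot actually used, the contents of /proc/cmdline can be split into key/value pairs. This is a rough sketch, not a full parser: parse_cmdline and describe_slub_debug are hypothetical helper names, quoted values containing spaces are not handled, and the flag meanings are the ones I recall from the kernel's SLUB documentation.

```python
# Meaning of the slub_debug flag letters used in this report.
SLUB_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
    "U": "user tracking",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value or None}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def describe_slub_debug(value):
    """Translate a flag string such as 'FZPU' into readable names."""
    flags = value.split(",")[0]  # ignore an optional slab-name suffix
    return [SLUB_FLAGS.get(flag, "unknown flag " + flag) for flag in flags]


# The command line quoted above, without slub_debug.
cmdline = ("root=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part6 "
           "resume=/dev/disk/by-id/ata-ST3160815AS_6RA2TMXQ-part5 "
           "splash=silent vga=0 edd=off")
params = parse_cmdline(cmdline)
print("slub_debug" in params)
print(describe_slub_debug("FZPU"))
```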
Comment 5 Bart Van Assche 2009-08-23 18:51:38 UTC
Userspace: openSUSE 11.1 with the InfiniBand software provided by openSUSE (the OFED InfiniBand stack has not been installed on this system).

Hardware info:
$ lspci
00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 1
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 5
00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 6
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller
01:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX IB DDR, PCIe 2.0 5GT/s] (rev a0)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
03:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b1)
05:01.0 VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 01)
05:02.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 14)
05:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 70)
Comment 6 Bart Van Assche 2009-08-26 06:08:46 UTC
Note: the same system boots fine with 2.6.30.4 and several older kernels.
Comment 7 Rafael J. Wysocki 2009-09-06 19:53:54 UTC
Can you please test 2.6.31-rc9 too?
Comment 8 Bart Van Assche 2009-09-12 15:32:04 UTC
I have not yet been able to reproduce this issue with the 2.6.31 kernel (the final release, not rc9). I am still testing with the latest infiniband.git kernel, which is based on 2.6.31-rc9.
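With an intermittent hang, "not yet reproduced" only means something after enough boots. The arithmetic below is a sketch under two assumptions that the report does not establish: that each boot hangs independently, and that the hang rate is the roughly one in three stated in the description. The function names are hypothetical.

```python
import math


def miss_probability(hang_rate, boots):
    """Chance that `boots` consecutive boots all succeed by luck."""
    return (1.0 - hang_rate) ** boots


def boots_needed(hang_rate, confidence):
    """Smallest number of clean boots whose miss probability is at most
    1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


hang_rate = 1.0 / 3.0  # "about one out of three times" from the description
print(miss_probability(hang_rate, 3))  # 8/27, about 0.296
print(boots_needed(hang_rate, 0.95))   # 8
```

Under these assumptions, three clean boots would still happen by chance about 30% of the time, while eight clean boots in a row bring that below 5%.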
Comment 9 Bart Van Assche 2009-09-19 14:24:40 UTC
Closing as invalid because the hangs were caused by an unreliable motherboard.
