Bug 42969 - Softlockup during boot during init_sd
Summary: Softlockup during boot during init_sd
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: linux-scsi@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-21 06:46 UTC by Eric Sesterhenn
Modified: 2012-09-04 11:04 UTC (History)
2 users (show)

See Also:
Kernel Version: 3.4-rc6
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Bisect Log (2.36 KB, text/plain)
2012-03-21 06:46 UTC, Eric Sesterhenn
Details
Kernel Log from a working kernel (328.88 KB, text/plain)
2012-03-21 06:48 UTC, Eric Sesterhenn
Details
Kernel Log from a failing kernel (245.83 KB, text/plain)
2012-03-21 06:48 UTC, Eric Sesterhenn
Details
lspci output (60.60 KB, text/plain)
2012-03-21 06:49 UTC, Eric Sesterhenn
Details
multipath -ll output (9.36 KB, text/plain)
2012-03-21 06:50 UTC, Eric Sesterhenn
Details
dmidecode output (13.41 KB, text/plain)
2012-03-21 06:53 UTC, Eric Sesterhenn
Details
Kernel 3.4-rc6 log file (105.71 KB, text/plain)
2012-05-10 21:55 UTC, Stefan Pietsch
Details

Description Eric Sesterhenn 2012-03-21 06:46:54 UTC
Created attachment 72662 [details]
Bisect Log

Hi,

when switching from kernels 2.6 to 3 we run into a softlockup during sd_init.

The last working kernel we used was Debian Backports 2.6.39-
bpo.2-amd64. The newest kernel we tested which failed is vanilla 3.3-rc5.

The problem roughly happens in three out of four boots.


[   60.219724] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:257]                                                                  [   60.219726] Modules linked in: sd_mod(+) crc_t10dif uhci_hcd(+) lpfc 
scsi_transport_fc scsi_tgt ehci_hcd usbcore igb scsi_mod usb_common dca                                                                             [   60.219736] CPU 0                                                                               [   60.219737] Modules linked in: sd_mod(+) crc_t10dif uhci_hcd(+) lpfc 
scsi_transport_fc scsi_tgt ehci_hcd usbcore igb scsi_mod usb_common dca                                                                             [   60.219743]                                                                      [   60.219746] Pid: 257, comm: modprobe Not tainted 3.2.0-1-amd64 #1 HITACHI     
BladeSymphony F51   /7TPBVa                             
[   60.219750] RIP: 0010:[<ffffffff810703c0>]  [<ffffffff810703c0>] 
smp_call_function_many+0x1c7/0x1de                                   
[   60.219759] RSP: 0018:ffff880630939d48  EFLAGS: 00000202                                                                        
[   60.219761] RAX: 0000000000000200 RBX: 0000000000000010 RCX: 
0000000000000000                                                         
[   60.219763] RDX: 0000000000000010 RSI: 0000000000000200 RDI: 
0000000000000286                                                         
[   60.219766] RBP: 000000000000000f R08: 0000000000000200 R09: 
ffffffff8168cac0                                                         
[   60.219768] R10: 00000000000163a8 R11: 00000000000163a8 R12: 
ffffffffffffff95                                                         
[   60.219770] R13: 0000000000000286 R14: 0000000000000200 R15: 
0000000000000010                                                         
[   60.219773] FS:  00007f29dea69700(0000) GS:ffff88033fc00000(0000) 
knlGS:0000000000000000                                              
[   60.219776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                [   60.219778] CR2: ffffe8fcffc00000 CR3: 00000006308cb000 CR4: 
00000000000006f0                                                         
[   60.219780] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000                                                         
[   60.219782] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400                                                         
[   60.219785] Process modprobe (pid: 257, threadinfo ffff880630938000, task 
ffff88063170caa0)                                           
[   60.219787] Stack:                                                                          [   60.219788]  0000000000000246 01ffffff810eaa96 0000000000000088 
ffffffff810e9109                                                      
[   60.219794]  ffff88032cdc8f00 000000000000fef0 00000000ffffffff 
000000000000000f                                                      
[   60.219798]  000000000000003c ffffffff810703f2 00000000000163a8 
ffffffff81070409                                                      
[   60.219802] Call Trace:                                                                          [   60.219809]  [<ffffffff810e9109>] ? kmem_cache_size+0x4/0x4                                                         [   60.219813]  [<ffffffff810703f2>] ? smp_call_function+0x1b/0x20                                                     [   60.219816]  [<ffffffff81070409>] ? on_each_cpu+0x12/0x35                                                           [   60.219820]  [<ffffffff810eb078>] ? do_tune_cpucache+0xef/0x3d6                                                     [   60.219823]  [<ffffffff810eb530>] ? enable_cpucache+0x78/0x9d                                                       [   60.219827]  [<ffffffff810eba3a>] ? kmem_cache_create+0x3fb/0x476                                                   [   60.219834]  [<ffffffffa018609c>] ? init_sd+0x9c/0x1000 [sd_mod]                                                                      
[   60.219837]  [<ffffffffa0186000>] ? 0xffffffffa0185fff                                                              [   60.219840]  [<ffffffffa0186000>] ? 0xffffffffa0185fff                                                              [   60.219844]  [<ffffffff81002085>] ? do_one_initcall+0x75/0x12c                                                      [   60.219847]  [<ffffffffa0186000>] ? 0xffffffffa0185fff                                                              [   60.219851]  [<ffffffff81074efd>] ? sys_init_module+0x10c/0x25b                                                     [   60.219856]  [<ffffffff813457d2>] ? system_call_fastpath+0x16/0x1b                                                  [   60.219858] Code: 48 89 5a 08 48 89 c6 89 6b 28 48 89 1d 1a 3e 59 00 e8 a9 
04 2d 00 0f ae f0 4c 89 ef ff 15 11 3e 5a 00 80 7c 24 0f 00 75 04 eb 08 <f3> 
90 f6 43 20 01 75 f8 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41                                                            
[   60.219880] Call Trace:                                                                          [   60.219884]  [<ffffffff810e9109>] ? kmem_cache_size+0x4/0x4                                                         [   60.219887]  [<ffffffff810703f2>] ? smp_call_function+0x1b/0x20                                                     [   60.219890]  [<ffffffff81070409>] ? on_each_cpu+0x12/0x35                                                           [   60.219894]  [<ffffffff810eb078>] ? do_tune_cpucache+0xef/0x3d6                                                     [   60.219897]  [<ffffffff810eb530>] ? enable_cpucache+0x78/0x9d                                                       [   60.219901]  [<ffffffff810eba3a>] ? kmem_cache_create+0x3fb/0x476                                                   [   60.219906]  [<ffffffffa018609c>] ? init_sd+0x9c/0x1000 [sd_mod]                                                                      
[   60.219909]  [<ffffffffa0186000>] ? 0xffffffffa0185fff                                                              [   60.219911]  [<ffffffffa0186000>] ? 0xffffffffa0185fff                                                              [   60.219914]  [<ffffffff81002085>] ? do_one_initcall+0x75/0x12c                                                      [   60.219917]  [<ffffffffa0186000>] ? 0xffffffffa0185fff                                                              [   60.219920]  [<ffffffff81074efd>] ? sys_init_module+0x10c/0x25b                                                     [   60.219924]  [<ffffffff813457d2>] ? system_call_fastpath+0x16/0x1b           


A collegue tried a bisect, but the offending commit seemed USB related and reverting did not fix the problem for us.
Comment 1 Eric Sesterhenn 2012-03-21 06:48:02 UTC
Created attachment 72663 [details]
Kernel Log from a working kernel
Comment 2 Eric Sesterhenn 2012-03-21 06:48:56 UTC
Created attachment 72664 [details]
Kernel Log from a failing kernel
Comment 3 Eric Sesterhenn 2012-03-21 06:49:27 UTC
Created attachment 72665 [details]
lspci output
Comment 4 Eric Sesterhenn 2012-03-21 06:50:16 UTC
Created attachment 72666 [details]
multipath -ll output
Comment 5 Eric Sesterhenn 2012-03-21 06:53:10 UTC
Created attachment 72667 [details]
dmidecode output
Comment 6 Eric Sesterhenn 2012-04-25 14:39:09 UTC
Hi,

is there any additional information I can provide here? This really troubles us, since this means that only one of four boots succeed and the other three need a manual restart.

Regards, Eric
Comment 7 Stefan Pietsch 2012-05-10 21:51:00 UTC
Kernel 3.4-rc6 does not work either. The system seems to boot up after the soft lockup, but hangs with "Setting sysfs variables".

Have a look at the attached log file.
Comment 8 Stefan Pietsch 2012-05-10 21:55:54 UTC
Created attachment 73245 [details]
Kernel 3.4-rc6 log file
Comment 9 Alan 2012-09-04 11:04:22 UTC
Bugzilla is mostly used for tracking reports. It's a better idea to discuss it on linux-scsi@vger.kernel.org if you want to be involved in actively fixing it.

Note You need to log in before you can comment on or make changes to this bug.