Bug 26692

Summary: Fusion MPT SPI bringup takes 80 seconds when the kernel is quiet
Product: SCSI Drivers Reporter: Ferenc Wágner (wferi)
Component: OtherAssignee: scsi_drivers-other
Status: RESOLVED INSUFFICIENT_DATA    
Severity: normal CC: alan, jrnieder
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.1.0-rc3 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg after booting 2.6.37 quiet mtpbase.mpt_debug_level=0x8188
dmesg of a slow boot ('quiet' passed to the kernel) with mpt_debug_level=0xbbbff9
dmesg of a quick boot ('debug' passed to the kernel) with mpt_debug_level=0xbbbff9

Description Ferenc Wágner 2011-01-14 15:40:21 UTC
Created attachment 43542 [details]
dmesg after booting 2.6.37 quiet mtpbase.mpt_debug_level=0x8188

This is the upstream followup for http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607906, where you can find some basic kernel logs. Instead those I attach a log with mpt_debug_level=0x8188 (rather arbibtary value snatched from #11045, please ask for a better one).

In short: on an IBM x345 loading the mptspi module under 2.6.32 oopses, unless the kernel is started with 'quiet', in which case it works all right. Under 2.6.37 the module still works all right when the 'quiet' parameter is present, but introduces an 80-second boot delay without it (but at least manages to avoid the oops).
Comment 1 Ferenc Wágner 2011-01-14 15:46:58 UTC
Oh no, I mixed it up...

Actually, the 'quiet' parameter causes the problems (the oops under 2.6.32 and the delay under 2.6.37), and the thing works perfectly when 'quiet' is not present on the kernel command line. To be precise, even after booting with 'quiet', issuing a 'dmesg -n8' before 'modprobe mptspi' avoids the oops under 2.6.32.
Comment 2 Jonathan Nieder 2011-08-05 20:55:52 UTC
If I am reading the first log at <http://bugs.debian.org/607906> correctly, the oops happened in the READ32 of

        /* Read doorbell and check for active bit */
        if (!(CHIPREG_READ32(&ioc->chip->Doorbell) & MPI_DOORBELL_ACTIVE))
                return -5;

But that's only from a quick look, and it doesn't make sense in light of your later observations.

Would it be possible to sprinkle the code with printk calls to figure out what's causing trouble (and for example where time is being spent in the current version)?
Comment 3 Ferenc Wágner 2011-08-24 16:49:59 UTC
I rechecked the issue with pristine 3.1.0-rc3, and everything is the same: the 'quiet' kernel parameter introduces delays in the boot process. I will add printks here and there, but some guidance would be most useful in that.
Comment 4 Ferenc Wágner 2011-08-25 17:00:48 UTC
If I boot with 'quiet mptbase.mpt_fwfault_debug=1' I always reach the following panic:

[   19.692064] Kernel panic - not syncing: ioc0: Firmware is halted due to command timeout
[   19.692066] 
[   19.701704] Pid: 195, comm: scsi_scan_2 Not tainted 3.1.0-rc3+ #44
[   19.707953] Call Trace:
[   19.710482]  [<c12ce914>] ? panic+0x4d/0x14a
[   19.714830]  [<f82eba69>] ? mpt_halt_firmware+0x50/0x50 [mptbase]
[   19.721001]  [<f82ebab3>] ? mpt_SoftResetHandler+0x4a/0x2e3 [mptbase]
[   19.727519]  [<f82ed8df>] ? mpt_Soft_Hard_ResetHandler+0xb/0x1f [mptbase]
[   19.734385]  [<f82edef9>] ? mpt_config+0x457/0x4b5 [mptbase]
[   19.740124]  [<f82a955e>] ? mptspi_write_spi_device_pg1+0x2a4/0x2f7 [mptspi]
[   19.747246]  [<f82a960d>] ? mptspi_write_width+0x5c/0x61 [mptspi]
[   19.753414]  [<f82a9d9f>] ? mptspi_target_alloc+0x38/0x1a6 [mptspi]
[   19.759761]  [<f82a9eed>] ? mptspi_target_alloc+0x186/0x1a6 [mptspi]
[   19.766194]  [<f8215cc4>] ? scsi_alloc_target+0x194/0x1ed [scsi_mod]
[   19.772624]  [<f8216bd4>] ? __scsi_scan_target+0x40/0x536 [scsi_mod]
[   19.779054]  [<c100b93d>] ? __switch_to+0xe3/0x155
[   19.783920]  [<c12cf192>] ? schedule+0x5f7/0x622
[   19.788622]  [<f8217111>] ? scsi_scan_channel+0x47/0x76 [scsi_mod]
[   19.794881]  [<f8217200>] ? scsi_scan_host_selected+0xc0/0xff [scsi_mod]
[   19.801657]  [<f8217290>] ? do_scsi_scan_host+0x51/0x56 [scsi_mod]
[   19.807914]  [<f82172a9>] ? do_scan_async+0x14/0x107 [scsi_mod]
[   19.813910]  [<f8217295>] ? do_scsi_scan_host+0x56/0x56 [scsi_mod]
[   19.820170]  [<c1052417>] ? kthread+0x65/0x6a
[   19.824602]  [<c10523b2>] ? kthread_worker_fn+0x119/0x119
[   19.830080]  [<c12d553e>] ? kernel_thread_helper+0x6/0x10
[   19.835566] Rebooting in 10 seconds..

I hope it more or less answers where the time is being spent. Sometime instead of rebooting the console output continues as:

[   19.839551] ------------[ cut here ]------------
[   19.839551] WARNING: at arch/x86/kernel/smp.c:118 resched_task+0x59/0x5b()
[   19.839551] Hardware name: eserver xSeries 345 -[867091X]-
[   19.839551] Modules linked in: sg sr_mod cdrom mptspi(+) mptscsih ata_generic mptbase scsi_transport_spi pata_serverworks libata ohci_hcd ehci_hcd scsi_mod e1000 usbcore floppy
[   19.839551] Pid: 6, comm: migration/0 Not tainted 3.1.0-rc3+ #44
[   19.839551] Call Trace:
[   19.839551]  [<c102c396>] ? resched_task+0x59/0x5b
[   19.839551]  [<c103bd97>] ? warn_slowpath_common+0x68/0x7a
[   19.839551]  [<c103bdb5>] ? warn_slowpath_null+0xc/0xf
[   19.839551]  [<c102c396>] ? resched_task+0x59/0x5b
[   19.839551]  [<c102cd44>] ? check_preempt_curr+0x35/0x5f
[   19.839551]  [<c1036463>] ? active_load_balance_cpu_stop+0x1a0/0x1ed
[   19.839551]  [<c10362c3>] ? pull_task+0x34/0x34
[   19.839551]  [<c1075829>] ? cpu_stopper_thread+0xbd/0x150
[   19.839551]  [<c12cf192>] ? schedule+0x5f7/0x622
[   19.839551]  [<c10365f4>] ? try_to_wake_up+0x144/0x14d
[   19.839551]  [<c102d364>] ? __wake_up_common+0x34/0x59
[   19.839551]  [<c10322de>] ? complete+0x28/0x36
[   19.839551]  [<c107576c>] ? res_counter_charge+0xcd/0xcd
[   19.839551]  [<c1052417>] ? kthread+0x65/0x6a
[   19.839551]  [<c10523b2>] ? kthread_worker_fn+0x119/0x119
[   19.839551]  [<c12d553e>] ? kernel_thread_helper+0x6/0x10
[   19.839551] ---[ end trace af3cae9d17823201 ]---

but it's probably unrelated. I also recompiled with CONFIG_FUSION_LOGGING=y so setting 'mptbase.mpt_debug_level=-1' actually has an effect. But this of course can't be seen when coupled with 'quiet', so the above console output doesn't show this. I'm trying to find a better mpt_debug_level value.
Comment 5 Ferenc Wágner 2011-08-25 17:56:32 UTC
Created attachment 70222 [details]
dmesg of a slow boot ('quiet' passed to the kernel) with mpt_debug_level=0xbbbff9
Comment 6 Ferenc Wágner 2011-08-25 18:00:33 UTC
Created attachment 70232 [details]
dmesg of a quick boot ('debug' passed to the kernel) with mpt_debug_level=0xbbbff9

I post a "normal" dmesg for comparison. The boot is also quick if I simply leave off the 'quiet' parameter, 'debug' isn't required at all.
Comment 7 Jonathan Nieder 2012-02-15 21:08:07 UTC
(In reply to comment #5)
> scsi0 : ioc0: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=27

Is there a firmware update available?
Comment 8 Ferenc Wágner 2012-02-16 15:25:25 UTC
http://www.lsi.com/support/products/Pages/LSI53C1030.aspx doesn't make it particularly clear for me, but if 01000e00 corresponds to FW-1.0.14.0, then probably there is: FW-1.03.29.20 or FW-1.03.34. IS, IT and IME doesn't say much to me, I'll give them a try.
Comment 9 Alan 2012-08-14 15:36:49 UTC
Any updates on this ?