Bug 197877 - arcmsr fails to initialize Areca ARC-1110/ARC-1120 on some systems
Status: RESOLVED CODE_FIX
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other
Hardware: Intel Linux
Importance: P1 high
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-15 04:07 UTC by kr
Modified: 2019-07-02 22:12 UTC
0 users

See Also:
Kernel Version: 3.18+
Tree: Mainline
Regression: No


Attachments
.config for 4.14 (178.86 KB, text/plain)
2017-11-15 13:32 UTC, kr
Full dmesg log (kernel 4.14) (41.70 KB, text/plain)
2017-11-15 13:33 UTC, kr
Output from lspci -vvv (12.16 KB, text/plain)
2017-11-15 13:34 UTC, kr

Description kr 2017-11-15 04:07:54 UTC
Since the Areca driver was upgraded to v1.30.00.04 in kernel 3.18, ARC-1110 and ARC-1120 controllers are no longer properly initialized on some systems. All kernels from 3.18 onwards are affected.

Loading the module generates the following messages in the log:

---------------------
[  634.409073] Areca RAID Controller5: Model ARC-1120, F/W V1.49 2010-12-02
[  634.410504] scsi host5: Areca SATA RAID Controller (RAID6 capable)
               arcmsr version v1.30.00.04-20140919

[  634.410832] arcmsr 0000:07:0e.0: irq 19 for MSI/MSI-X
[  634.410893] arcmsr5: msi enabled
[  655.017019] arcmsr5: abort device command of scsi id = 0 lun = 0
[  655.017032] arcmsr5: scsi id = 0 lun = 0 ccb = '0xeeb00000' poll command abort successfully
[  676.009017] arcmsr5: abort device command of scsi id = 1 lun = 0
[  676.009030] arcmsr5: scsi id = 1 lun = 0 ccb = '0xeeb00660' poll command abort successfully

[---- (the above two lines repeat for SCSI IDs 2-15) ---]

[  970.025230] scsi 5:0:16:0: Processor         Areca    RAID controller  R001 PQ: 0 ANSI: 0 CCS
---------------------

No logical or passthrough drives are detected regardless of controller configuration. 

The problem only affects certain hardware. The above was taken from a ProLiant ML370 G3 (Intel x86). The issue was reproduced on x86_64 using an AMD-based Fujitsu PC. Both systems work as expected with kernel 3.17.8, and the exact same controllers work fine under any kernel when used in other systems, such as a ProLiant ML350 G5 (x86 or x86_64).
Comment 1 kr 2017-11-15 13:32:45 UTC
Created attachment 260677
.config for 4.14
Comment 2 kr 2017-11-15 13:33:57 UTC
Created attachment 260679
Full dmesg log (kernel 4.14)
Comment 3 kr 2017-11-15 13:34:22 UTC
Created attachment 260681
Output from lspci -vvv
Comment 4 kr 2017-11-17 19:39:47 UTC
Building the latest Areca driver (1.40.00.02 from http://www.areca.us/support/s_linux/driver/Source%20Code/arcmsr-1.40.00.02-source-only.dkms.tar.gz) against kernel 3.17.8 introduces the bug, so it's definitely the driver rather than some other issue with the kernel.

The latest driver that works on the affected systems seems to be 1.20.0X.15-130619 (ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Linux/DRIVER/SourceCode/arcmsr.1.20.0X.15-130619.zip), which unfortunately doesn't compile against recent kernels.
Comment 5 kr 2017-11-18 12:03:31 UTC
Interrupts are handled differently by the more recent driver. From a working system running kernel 3.17.8:

--- /proc/interrupts ---
            CPU0       CPU1       CPU2       CPU3
   0:        133          0          0          0   IO-APIC-edge      timer
   1:          1         11          0          0   IO-APIC-edge      i8042
   6:          0          3          0          0   IO-APIC-edge      floppy
   7:          0          0          0          0   IO-APIC-edge      parport0
   8:          0          1          0          0   IO-APIC-edge      rtc0
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  11:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb1
  12:          0        165          0          0   IO-APIC-edge      i8042
  14:          1        254          0          0   IO-APIC-edge      pata_serverworks
  15:          0          0          0          0   IO-APIC-edge      pata_serverworks
  16:          0       5658          0          0   IO-APIC   10-fasteoi   sata_sil
  17:          0          0          0          0   IO-APIC    1-fasteoi   hpilo
  18:          0        247          0          0   IO-APIC   13-fasteoi   eth0
  19:          0         35          0          0   IO-APIC    8-fasteoi   arcmsr
 NMI:          0          0          0          0   Non-maskable interrupts
 LOC:       8967       8566      11465       8375   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 IWI:          0          1          0          0   IRQ work interrupts
 RTR:          1          0          0          0   APIC ICR read retries
 RES:       3182       1918       6329       2498   Rescheduling interrupts
 CAL:        992         16         11       1265   Function call interrupts
 TLB:        134        139        181        189   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          1          1          1          1   Machine check polls
 THR:          0          0          0          0   Hypervisor callback interrupts
 ERR:          0
 MIS:          0
------------------------

Unloading arcmsr v1.20.0X.15 and loading v1.40.00.02 instead results in this change:

--- /proc/interrupts ---
  20:          0          0          0          0   PCI-MSI-edge      arcmsr
------------------------
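
The difference between the two listings can also be checked programmatically. The following is a minimal sketch, not part of the driver or any existing tool: the function name irq_types and the two sample strings are assumptions made for illustration, with the sample rows copied from the listings above. It extracts the interrupt type that /proc/interrupts reports for a given driver name.

```python
def irq_types(interrupts_text, driver):
    """Return {irq_number: type_description} for rows whose last field is `driver`.

    Hypothetical helper: it only handles rows that end with the driver name,
    so IRQ lines shared by several drivers are not covered.
    """
    result = {}
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        irq = head.strip()
        if not sep or not irq.isdigit():
            continue  # skip the CPU header and named rows such as NMI or LOC
        fields = rest.split()
        if not fields or fields[-1] != driver:
            continue
        # Skip the per-CPU counters; what remains before the driver name
        # is the interrupt chip/type description.
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        result[int(irq)] = " ".join(fields[i:-1])
    return result


# Rows copied from the two listings above.
OLD_ROW = "  19:          0         35          0          0   IO-APIC    8-fasteoi   arcmsr"
NEW_ROW = "  20:          0          0          0          0   PCI-MSI-edge      arcmsr"

for label, text in (("old driver", OLD_ROW), ("new driver", NEW_ROW)):
    for irq, kind in irq_types(text, "arcmsr").items():
        print(label, "IRQ", irq, kind, "(MSI)" if "MSI" in kind else "(legacy INTx)")
```

On a live system the text would come from reading /proc/interrupts rather than from the sample strings.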

Here's the output from lspci. On a working system:

--- lspci -vvv with driver v1.20.0X.15 ---
07:0e.0 RAID bus controller: Areca Technology Corp. ARC-1120 8-Port PCI-X to SATA RAID Controller
        Subsystem: Areca Technology Corp. ARC-1120 8-Port PCI-X to SATA RAID Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping+ SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
        Latency: 64 (32000ns min), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 19
        NUMA node: 0
        Region 0: Memory at f7ff0000 (32-bit, non-prefetchable) [size=4K]
        Region 2: Memory at f7800000 (32-bit, prefetchable) [size=4M]
        [virtual] Expansion ROM at f7f00000 [disabled] [size=64K]
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable- Count=1/2 Maskable- 64bit+
                Address: 00000000fee0f00c  Data: 41a1
        Capabilities: [e0] PCI-X non-bridge device
                Command: DPERE+ ERO- RBC=1024 OST=8
                Status: Dev=07:0e.0 64bit+ 133MHz+ SCD- USC- DC=bridge DMMRBC=1024 DMOST=4 DMCRS=32 RSCEM- 266MHz- 533MHz-
        Kernel driver in use: arcmsr
        Kernel modules: arcmsr
------------------------------------------

On the same system after loading the most recent arcmsr driver:

--- lspci -vvv with driver v1.40.00.02 ---
07:0e.0 RAID bus controller: Areca Technology Corp. ARC-1120 8-Port PCI-X to SATA RAID Controller
        Subsystem: Areca Technology Corp. ARC-1120 8-Port PCI-X to SATA RAID Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping+ SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
        Latency: 64 (32000ns min), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 20
        NUMA node: 0
        Region 0: Memory at f7ff0000 (32-bit, non-prefetchable) [size=4K]
        Region 2: Memory at f7800000 (32-bit, prefetchable) [size=4M]
        [virtual] Expansion ROM at f7f00000 [disabled] [size=64K]
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/2 Maskable- 64bit+
                Address: 00000000fee0f00c  Data: 41a1
        Capabilities: [e0] PCI-X non-bridge device
                Command: DPERE+ ERO- RBC=1024 OST=8
                Status: Dev=07:0e.0 64bit+ 133MHz+ SCD- USC- DC=bridge DMMRBC=1024 DMOST=4 DMCRS=32 RSCEM- 266MHz- 533MHz-
        Kernel driver in use: arcmsr
        Kernel modules: arcmsr
------------------------------------------
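
The two lspci dumps differ only in the routed IRQ (19 vs. 20) and in the MSI capability flag (Enable- vs. Enable+). As a rough sketch of how to pull out just those two values, assuming the output of lspci -vvv for a single device is available as a string (the function name msi_state and the shortened sample strings are illustrative, not from any existing tool):

```python
import re


def msi_state(lspci_text):
    """Return (irq, msi_enabled) parsed from one device's `lspci -vvv` output.

    Either element is None if the corresponding line is not present.
    """
    irq = re.search(r"routed to IRQ (\d+)", lspci_text)
    # Matches the plain MSI capability only; an "MSI-X:" line does not match.
    msi = re.search(r"MSI: Enable([+-])", lspci_text)
    return (
        int(irq.group(1)) if irq else None,
        (msi.group(1) == "+") if msi else None,
    )


# The two relevant lines from each dump above.
OLD_DUMP = """\
        Interrupt: pin A routed to IRQ 19
        Capabilities: [d0] MSI: Enable- Count=1/2 Maskable- 64bit+
"""
NEW_DUMP = """\
        Interrupt: pin A routed to IRQ 20
        Capabilities: [d0] MSI: Enable+ Count=1/2 Maskable- 64bit+
"""

print("old driver:", msi_state(OLD_DUMP))
print("new driver:", msi_state(NEW_DUMP))
```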
Comment 6 kr 2019-07-02 22:12:33 UTC
This bug was caused by the arcmsr driver attempting to use MSIs on non-MSI systems.

This behavior may have been fixed (see https://patchwork.kernel.org/patch/10073751/), but regardless, in recent kernels MSIs can be manually disabled with the arcmsr module parameter "msi_enable=0".
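
The parameter can be passed at load time (modprobe arcmsr msi_enable=0) or set persistently with an "options arcmsr msi_enable=0" line in a file under /etc/modprobe.d/. To check which value a loaded module is actually using, something like the following sketch may help. It assumes the driver exports the parameter under /sys/module/arcmsr/parameters/msi_enable, which depends on how the driver declares it and is not confirmed in this report; the function name is hypothetical.

```python
from pathlib import Path


def msi_enable_setting(path="/sys/module/arcmsr/parameters/msi_enable"):
    """Return the parameter value as a string, or None if it cannot be read.

    Assumption: the default path exists only if arcmsr is loaded and the
    parameter is exported to sysfs.
    """
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


value = msi_enable_setting()
if value is None:
    print("msi_enable not readable (module not loaded or parameter not exported)")
else:
    print("arcmsr msi_enable =", value)
```

If the value is "0" and /proc/interrupts still shows a PCI-MSI entry for arcmsr, the module was probably loaded before the option took effect and needs to be reloaded.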
