Bug 202133 - genirq: ‘HP FlexFabric 554FLB Adapter’ not working with kernel 4.14.88
Summary: genirq: ‘HP FlexFabric 554FLB Adapter’ not working with kernel 4.14.88
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: HotPlug
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-04 11:56 UTC by liang gao
Modified: 2019-01-15 12:24 UTC (History)

See Also:
Kernel Version: 4.14
Subsystem:
Regression: No
Bisected commit-id:

Description liang gao 2019-01-04 11:56:01 UTC
After upgrading the kernel from 4.14.40 to 4.14.88, I found that the 'HP FlexFabric 10Gb 2-port 554FLB Adapter' is not working. There are errors in the dmesg log.

The error info:
	[ 1046.980480] lpfc 0000:04:00.3: 1:1303 Link Up Event x1 received Data: x1 x0 x4 x0 x0 x0 0
	[ 1046.980482] lpfc 0000:04:00.3: 1:(0):2753 PLOGI failure DID:020009 Status:x3/x103
	[ 1050.435167] lpfc 0000:04:00.2: 0:(0):2753 PLOGI failure DID:010012 Status:x3/x103
	[ 1065.713327] lpfc 0000:04:00.3: 1:(0):2753 PLOGI failure DID:040002 Status:x3/x103
	[ 1072.331933] lpfc 0000:04:00.2: 0:(0):2753 PLOGI failure DID:030003 Status:x3/x103
	[ 1137.628132] lpfc 0000:04:00.2: 0:(0):0748 abort handler timed out waiting for aborting I/O (xri:x64) to complete: ret 0x2003, ID 2, LUN 0
	[ 1137.644257] lpfc 0000:04:00.2: 0:(0):0713 SCSI layer issued Device Reset (2, 0) return x2002
	[ 1139.676124] lpfc 0000:04:00.3: 1:(0):0748 abort handler timed out waiting for aborting I/O (xri:x464) to complete: ret 0x2003, ID 4, LUN 0
	[ 1139.692242] lpfc 0000:04:00.3: 1:(0):0713 SCSI layer issued Device Reset (4, 0) return x2002
	[ 1197.664150] lpfc 0000:04:00.2: 0:(0):0724 I/O flush failure for context LUN : cnt x1
	[ 1197.664344] lpfc 0000:04:00.2: 0:(0):0723 SCSI layer issued Target Reset (2, 0) return x2002
	[ 1199.704116] lpfc 0000:04:00.3: 1:(0):0724 I/O flush failure for context LUN : cnt x1
	[ 1199.704368] lpfc 0000:04:00.3: 1:(0):0723 SCSI layer issued Target Reset (4, 0) return x2002
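For anyone triaging a longer capture of the same log, here is a minimal sketch that counts lpfc message IDs per PCI function. The regular expression is derived only from the lines quoted above; the names `LPFC_LINE` and `summarize` are made up for illustration, and other lpfc message formats may not match.

```python
import re
from collections import Counter

# Shape assumed from the quoted lines:
#   [ <timestamp>] lpfc <pci address>: <board>:(<vpi>):<msgid> <text>
# The "(<vpi>):" part is missing on some lines (e.g. the Link Up event).
LPFC_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+lpfc\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"(?P<board>\d+):(?:\((?P<vpi>\d+)\):)?(?P<msgid>\d{4})\s+(?P<text>.*)"
)


def summarize(log_text):
    """Return a Counter keyed by (pci_address, message_id)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LPFC_LINE.search(line)
        if match:
            counts[(match.group("pci"), match.group("msgid"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python3 lpfc_summary.py
    for (pci, msgid), n in sorted(summarize(sys.stdin.read()).items()):
        print(pci, msgid, n)
```

Grouping by PCI function makes it easier to see whether both ports (04:00.2 and 04:00.3) fail in the same way, as they appear to in the excerpt above.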

At first I thought the lpfc driver itself was the cause of the errors, but they still occur after updating the lpfc driver to the latest version.
To find the root cause, we bisected the releases between 4.14.41 and 4.14.88, building and boot-testing each kernel.
The bisect identified commit ef86f3a72adb8a7931f67335560740a7ad696d1d as the cause; with this commit applied, the regression appears immediately after boot.

Commit info:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.14.y&id=ef86f3a72adb8a7931f67335560740a7ad696d1d

During testing I also found a second problem: the HP FlexServer B390 server fails to boot with an "hpsa driver timeout". It looks like hpsa does not detect the hard drives and udevd stalls.
After upgrading from v4.14.54 to v4.14.55 the HP system does not boot, but it boots fine with a v4.14.55 kernel that has this commit reverted.
So commit ef86f3a72adb8a7931f67335560740a7ad696d1d also affects the HP Smart Array P220i RAID device. Because the v4.14.88 kernel boots fine, I think subsequent commits may have fixed the hpsa problem, but not the lpfc one.

If there is any more information I can provide, just ask. I need some help; do you have any suggestions?
Comment 1 Greg Kroah-Hartman 2019-01-04 12:25:27 UTC
On Fri, Jan 04, 2019 at 11:56:01AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=202133
> 
>             Bug ID: 202133
>            Summary: genirq: ‘HP FlexFabric 554FLB Adapter’  not working
>                     with kernel 4.14.88
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 4.14

All USB bugs should be sent to the linux-usb@vger.kernel.org mailing
list, and not entered into bugzilla.  Please bring this issue up there,
if it is still a problem in the latest kernel release.
Comment 2 liang gao 2019-01-05 02:26:51 UTC
Hi,
   It is not a USB bug, but a bug related to CPU hotplug support.
Based on my test results, the failure of the 'HP FlexFabric 10Gb 2-port 554FLB Adapter' appears to be related to commit ef86f3a72adb8a7931f67335560740a7ad696d1d (genirq/affinity: assign vectors to all possible CPUs).
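To illustrate why spreading vectors over all possible CPUs could matter on a machine where some possible CPUs are not online, here is a toy model. It is not the kernel's algorithm and it does not confirm that this is what happens on this server; it only shows how an even split across possible CPUs can leave some vectors with no online CPU. All function names are hypothetical. The CPU lists use the format found in /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online.

```python
def parse_cpulist(text):
    """Parse a kernel-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def spread_vectors(num_vectors, cpus):
    """Toy model: split the sorted CPUs into num_vectors contiguous groups."""
    ordered = sorted(cpus)
    base, extra = divmod(len(ordered), num_vectors)
    groups, start = [], 0
    for vec in range(num_vectors):
        size = base + (1 if vec < extra else 0)
        groups.append(set(ordered[start:start + size]))
        start += size
    return groups


def vectors_without_online_cpu(groups, online):
    """Return the indexes of vectors whose CPU group has no online CPU."""
    return [i for i, group in enumerate(groups) if not (group & online)]


if __name__ == "__main__":
    # Assumed example values, not taken from the affected server.
    possible = parse_cpulist("0-7")
    online = parse_cpulist("0-3")
    groups = spread_vectors(4, possible)
    print(vectors_without_online_cpu(groups, online))  # prints [2, 3]
```

In this model, a driver that sends work to vectors 2 or 3 would never see a completion interrupt handled, which would be consistent with timeouts, but the actual cause on this hardware still needs to be confirmed.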
Comment 3 Greg Kroah-Hartman 2019-01-05 08:13:38 UTC
On Sat, Jan 05, 2019 at 02:26:51AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=202133
> 
> --- Comment #2 from liang gao (liang_gao@h3c.com) ---
> Hi,
>    It is not a USB bug,but a bug of about CPU hotplug supporting.
> Based on my test results, the error of 'HP FlexFabric 10Gb 2-port 554FLB
> Adapter' should be related to the ef86f3a72adb8a7931f67335560740a7ad696d1d
> commit(genirq/affinity: assign vectors to all possible CPUs).

Please discuss stable kernel issues on the stable@vger.kernel.org
mailing list.
Comment 4 liang gao 2019-01-15 12:24:44 UTC
After upgrading the kernel from 4.14.40 to 4.14.88, I found that the 'HP FlexFabric 10Gb 2-port 554FLB Adapter' device is not working. There are errors in the dmesg log.
The server is an 'HP FlexServer B390'.

Device info:
lspci -n | grep 04:00
04:00.2 0c04: 19a2:0714 (rev 01)
…
04:00.2 Fibre Channel: Emulex Corporation OneConnect 10Gb FCoE Initiator (be3) (rev 01)
        Subsystem: Hewlett-Packard Company NC554FLB 10Gb 2-port FlexFabric Converged Network Adapter
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin C routed to IRQ 95
        …
        Kernel driver in use: lpfc
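
If the numeric IDs need to be extracted for a report or a script, a small sketch like the following can split an `lspci -n` line into its fields. The pattern is based only on the single line shown above, and `parse_lspci_n` is a made-up helper name.

```python
import re

# Shape assumed from the quoted line:
#   <bus:dev.fn> <class>: <vendor>:<device> (rev <rev>)
LSPCI_N = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev\s+(?P<rev>[0-9a-f]{2})\))?"
)


def parse_lspci_n(line):
    """Return a dict of the fields in one `lspci -n` line, or None."""
    match = LSPCI_N.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_lspci_n("04:00.2 0c04: 19a2:0714 (rev 01)"))
```

For the line above this gives class 0c04 with vendor 19a2 and device 0714, matching the "Fibre Channel: Emulex Corporation" entry in the verbose output.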

The error info:
         [ 1046.980480] lpfc 0000:04:00.3: 1:1303 Link Up Event x1 received Data: x1 x0 x4 x0 x0 x0 0
         [ 1046.980482] lpfc 0000:04:00.3: 1:(0):2753 PLOGI failure DID:020009 Status:x3/x103
         [ 1050.435167] lpfc 0000:04:00.2: 0:(0):2753 PLOGI failure DID:010012 Status:x3/x103
         [ 1065.713327] lpfc 0000:04:00.3: 1:(0):2753 PLOGI failure DID:040002 Status:x3/x103
         [ 1072.331933] lpfc 0000:04:00.2: 0:(0):2753 PLOGI failure DID:030003 Status:x3/x103
         [ 1137.628132] lpfc 0000:04:00.2: 0:(0):0748 abort handler timed out waiting for aborting I/O (xri:x64) to complete: ret 0x2003, ID 2, LUN 0
         [ 1137.644257] lpfc 0000:04:00.2: 0:(0):0713 SCSI layer issued Device Reset (2, 0) return x2002
         [ 1139.676124] lpfc 0000:04:00.3: 1:(0):0748 abort handler timed out waiting for aborting I/O (xri:x464) to complete: ret 0x2003, ID 4, LUN 0
         [ 1139.692242] lpfc 0000:04:00.3: 1:(0):0713 SCSI layer issued Device Reset (4, 0) return x2002
         [ 1197.664150] lpfc 0000:04:00.2: 0:(0):0724 I/O flush failure for context LUN : cnt x1
         [ 1197.664344] lpfc 0000:04:00.2: 0:(0):0723 SCSI layer issued Target Reset (2, 0) return x2002
         [ 1199.704116] lpfc 0000:04:00.3: 1:(0):0724 I/O flush failure for context LUN : cnt x1
         [ 1199.704368] lpfc 0000:04:00.3: 1:(0):0723 SCSI layer issued Target Reset (4, 0) return x2002

At first I thought the lpfc driver itself was the cause of the errors, but they still occur after updating the lpfc driver to the latest version.
To find the root cause, we checked the kernel versions from 4.14.41 to 4.14.88, building and boot-testing each kernel.
After bisecting, I found that the commit causing this error is ef86f3a72adb8a7931f67335560740a7ad696d1d; when I reverted the commit, the issue went away.

Reverted commit info:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.14.y&id=ef86f3a72adb8a7931f67335560740a7ad696d1d

During testing I also found a second issue: the 'HP FlexServer B390' server fails to boot with an "hpsa driver timeout". It looks like hpsa does not detect the hard drives and udevd stalls.
After upgrading from v4.14.54 to v4.14.55 the HP system did not boot, but it boots fine with a v4.14.55 kernel that has the commit reverted.
So commit ef86f3a72adb8a7931f67335560740a7ad696d1d also caused the failure of the HP Smart Array P220i RAID device. Because the v4.14.88 kernel boots fine, I think subsequent commits may have fixed the hpsa issue, but not the lpfc issue.

If there is any more information I can provide, just ask. Any suggestions?

Thanks
Liang
