Bug 208507 - BISECTED: i2c timeout loading module ddbridge with commit d2345d1231d80ecbea5fb764eb43123440861462
Summary: BISECTED: i2c timeout loading module ddbridge with commit d2345d1231d80ecbea5...
Status: REOPENED
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-09 15:01 UTC by Berni
Modified: 2023-05-05 19:04 UTC
CC List: 4 users

See Also:
Kernel Version: 6.1.4, 5.8.y, 5.7.y, 5.4.y, 4.19.y
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg on 4.19.132 (79.83 KB, text/plain)
2020-07-09 15:01 UTC, Berni
dmesg on 4.19.132 with reverted commit (74.69 KB, text/plain)
2020-07-09 15:03 UTC, Berni
git bisect log on debian buster kernel repo (1.43 KB, text/plain)
2020-07-09 15:04 UTC, Berni
git bisect log on linux 4.19.y repo (1.56 KB, text/plain)
2020-07-09 15:05 UTC, Berni
lscpu output (1.47 KB, text/plain)
2020-07-09 15:05 UTC, Berni
lspci -vvv output (86.12 KB, text/plain)
2020-07-09 15:06 UTC, Berni
dmesg on 5.7.8 (80.58 KB, text/plain)
2020-07-10 07:38 UTC, Berni
dmesg on 5.7.8 with reverted commit (75.51 KB, text/plain)
2020-07-10 07:40 UTC, Berni
dmesg on 5.8.0-rc4 (80.90 KB, text/plain)
2020-07-10 08:32 UTC, Berni
dmesg on 5.8.0-rc4 with reverted commit (75.97 KB, text/plain)
2020-07-10 08:32 UTC, Berni
lspci -vvvnn v5.7.10 plain (88.44 KB, text/plain)
2020-07-28 14:01 UTC, Berni
lspci -vvvnn v5.7.10 with revert (88.44 KB, text/plain)
2020-07-28 14:01 UTC, Berni
diff between lspci -vvvnn v5.7.10 plain and with revert (2.21 KB, patch)
2020-07-28 14:03 UTC, Berni
dmesg with kernel 6.1.4 (88.27 KB, text/plain)
2023-01-25 15:34 UTC, Berni
dmesg with kernel 6.1.4 with options ddbridge msi=1 (83.23 KB, text/plain)
2023-01-25 15:36 UTC, Berni
lspci -vvvnn with kernel 6.1.4 (98.19 KB, text/plain)
2023-01-25 15:38 UTC, Berni
lspci -vvvnn with kernel 6.1.4 with options ddbridge msi=1 (98.19 KB, text/plain)
2023-01-25 15:38 UTC, Berni

Description Berni 2020-07-09 15:01:11 UTC
Created attachment 290179 [details]
dmesg on 4.19.132

OS: Debian 10.4 Buster
CPU: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz
Hardware: Supermicro Super Server
Mainboard: Supermicro X10SDV
DVB card: Digital Devices Cine S2 V7 Advanced DVB adapter

Issue:
=====
Loading the kernel module ddbridge fails with i2c timeouts; see the attached dmesg. The DVB media adapter is unusable.
This started after a Linux kernel upgrade from 4.19.98-1+deb10u1 to 4.19.118-2+deb10u1.

A git bisect based on the Debian kernel repo on branch buster identified as first bad commit: [1fb0eb795661ab9e697c3a053b35aa4dc3b81165] Update to 4.19.116.

Another git bisect based on upstream Linux kernel repo on branch v4.19.y identified as first bad commit: [d2345d1231d80ecbea5fb764eb43123440861462] PCI: Add boot interrupt quirk mechanism for Xeon chipsets.

Other affected Debian kernel version: 5.6.14+2~bpo10+1
I tested this version via buster-backports, because so far I was unable to build my own kernel from 5.6.y or even 5.7.y.

Workaround:
==========
Reverting the mentioned commit d2345d1231d80ecbea5fb764eb43123440861462 on top of 4.19.132 fixes the problem. Reverting the same commit on 4.19.118 or 4.19.116 also fixes the problem.

It seems I can only add one attachment now, so I will add more attachments after the bug is submitted.

Thanks and Regards
Berni
Comment 1 Berni 2020-07-09 15:03:48 UTC
Created attachment 290181 [details]
dmesg on 4.19.132 with reverted commit
Comment 2 Berni 2020-07-09 15:04:43 UTC
Created attachment 290183 [details]
git bisect log on debian buster kernel repo
Comment 3 Berni 2020-07-09 15:05:22 UTC
Created attachment 290185 [details]
git bisect log on linux 4.19.y repo
Comment 4 Berni 2020-07-09 15:05:56 UTC
Created attachment 290187 [details]
lscpu output
Comment 5 Berni 2020-07-09 15:06:25 UTC
Created attachment 290189 [details]
lspci -vvv output
Comment 6 Berni 2020-07-09 16:39:17 UTC
Debian Bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964734
Comment 7 Berni 2020-07-10 07:38:58 UTC
Created attachment 290193 [details]
dmesg on 5.7.8
Comment 8 Berni 2020-07-10 07:40:08 UTC
Created attachment 290195 [details]
dmesg on 5.7.8 with reverted commit
Comment 9 Berni 2020-07-10 07:48:25 UTC
I am now also able to build newer kernel versions.

v5.7.8 test results:
BAD: 5.7.8
GOOD: 5.7.8 with reverted commit b88bf6c3b6ff ("PCI: Add boot interrupt quirk mechanism for Xeon chipsets")
Comment 10 Berni 2020-07-10 08:32:13 UTC
Created attachment 290199 [details]
dmesg on 5.8.0-rc4
Comment 11 Berni 2020-07-10 08:32:57 UTC
Created attachment 290201 [details]
dmesg on 5.8.0-rc4 with reverted commit
Comment 12 Berni 2020-07-10 08:34:52 UTC
Tests with master (5.8.0-rc4) show the same results:

BAD: 5.8.0-rc4
GOOD: 5.8.0-rc4 with reverted commit b88bf6c3b6ff ("PCI: Add boot interrupt quirk mechanism for Xeon chipsets")
Comment 13 Sean V Kelley 2020-07-27 23:06:09 UTC
Interesting, the platform is a D-1500-based Xeon, which makes it a Broadwell-based Xeon.

I don't see it in the logs, but I'm assuming the device ID is 0x6f28.

Those Xeons have the capability to disable the legacy INTx route to the ICH:

Xeon D1500 Data sheet, Volume 2 (Registers), 
#5.6.41 cipintrc Coherent Interface Protocol Interrupt Control.
Type: CFG PortID: N/A
Bus: 0 Device: 5 Function: 0 Offset: 0x14c

25:25 RW 0x0 dis_intx_route2ich:

Setting the bit above disables legacy INTx forwarding to the ICH.
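
To make the bit arithmetic concrete, here is a minimal Python sketch of setting and testing bit 25 of a 32-bit CIPINTRC value. It only manipulates integers and does not touch PCI config space. The function names are hypothetical; only the register offset and bit position are taken from the data sheet excerpt above.

```python
# Values quoted from the data sheet excerpt above.
CIPINTRC_OFFSET = 0x14C          # config-space offset on bus 0, device 5, function 0
DIS_INTX_ROUTE2ICH_BIT = 25      # bit 25:25, RW, default 0x0
DIS_INTX_ROUTE2ICH_MASK = 1 << DIS_INTX_ROUTE2ICH_BIT


def set_dis_intx_route2ich(cipintrc: int) -> int:
    """Return the 32-bit register value with dis_intx_route2ich set (hypothetical helper)."""
    return (cipintrc | DIS_INTX_ROUTE2ICH_MASK) & 0xFFFFFFFF


def is_intx_route_disabled(cipintrc: int) -> bool:
    """True if legacy INTx forwarding to the ICH is disabled in this register value."""
    return bool(cipintrc & DIS_INTX_ROUTE2ICH_MASK)
```

With the default register value of 0, setting the bit yields 0x02000000.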

In looking at the lspci output (assuming this is without the revert):

05:00.0 Multimedia controller: Digital Devices GmbH Cine V7
	Subsystem: Digital Devices GmbH Cine V7
	Physical Slot: 2
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 16

I'd be curious about your output of lspci -vvvnn with the revert.

I'm just wondering whether, with this particular vendor's card, there is some sort of expectation that the IRQ will always be routed to the PCH...

Thanks,

Sean
Comment 14 Berni 2020-07-28 14:01:23 UTC
Created attachment 290649 [details]
lspci -vvvnn v5.7.10 plain
Comment 15 Berni 2020-07-28 14:01:47 UTC
Created attachment 290651 [details]
lspci -vvvnn v5.7.10 with revert
Comment 16 Berni 2020-07-28 14:03:31 UTC
Created attachment 290653 [details]
diff between lspci -vvvnn v5.7.10 plain and with revert
Comment 17 Berni 2020-07-28 14:22:08 UTC
Hello Sean,
yes, you are right, the platform is a Broadwell-based Xeon D-1500 and the device ID is indeed 0x6f28.

I have added three more files:
a) lspci -vvvnn on v5.7.10 plain
b) lspci -vvvnn on v5.7.10 with revert
c) diff between the two

Many Thanks
Berni
Comment 18 Sean V Kelley 2020-08-05 22:37:17 UTC
Thanks Berni.  The only difference that I can see is that when you revert the patch the DD Cine v7 card no longer shows INTx status as pending. When the patch is not reverted, the card is forced to follow INTx emulation. 
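
One way to compare two lspci dumps for this difference is to extract the DisINTx flag from the Control line and the INTx flag from the Status line. The following is a rough Python sketch, assuming the text follows the `lspci -vvv` layout quoted in comment 13; `intx_flags` is a hypothetical helper and not part of any existing tool.

```python
import re


def intx_flags(lspci_text: str) -> dict:
    """Extract INTx-related flags from one device's `lspci -vvv` text.

    Returns a dict with:
      intx_disabled: True if the Control line shows DisINTx+
      intx_pending:  True if the Status line shows INTx+
    Keys are omitted when the corresponding line or flag is not found.
    """
    flags = {}
    for raw in lspci_text.splitlines():
        line = raw.strip()
        if line.startswith("Control:"):
            m = re.search(r"\bDisINTx([+-])", line)
            if m:
                flags["intx_disabled"] = m.group(1) == "+"
        elif line.startswith("Status:"):
            # \b keeps this from matching inside "DisINTx".
            m = re.search(r"\bINTx([+-])", line)
            if m:
                flags["intx_pending"] = m.group(1) == "+"
    return flags
```

Running it on the plain and reverted dumps and comparing the two dicts would show whether only the pending status differs.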

This driver is not handling INTx emulation and on top of that the MSI support is marked "experimental":

config DVB_DDBRIDGE_MSIENABLE
        bool "Enable Message Signaled Interrupts (MSI) per default (EXPERIMENTAL)"

I'm willing to wager that Debian's kernel configuration is *not* enabling MSI.

Further, if you were to enable DVB_DDBRIDGE_MSIENABLE, I do not think you would see this issue requiring the revert.

The alternative is to plumb support for proper INTx emulation handling.

Could you enable DVB_DDBRIDGE_MSIENABLE in your system and see if you get the timeouts without the revert?

Thanks,

Sean
Comment 19 Berni 2020-09-10 15:43:50 UTC
Hello Sean,

sorry for not answering until today.

Indeed, MSI is not enabled by default, so I enabled it with a module option in /etc/modprobe.d/:

 options ddbridge msi=1

With that setting, I can confirm the card is working fine with the current standard kernel version 4.19.132-1 in Debian 10 Buster as well as the following versions taken from Buster Backports:

5.7.10-1~bpo10+1
5.6.14-2~bpo10+1
5.5.17-1~bpo10+1
5.4.19-1~bpo10+1
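
A quick way to confirm that the option took effect might be to read the parameter back from sysfs. This is a minimal Python sketch. It assumes the ddbridge module exposes `msi` under /sys/module/ddbridge/parameters/, which is the standard sysfs location for readable module parameters but is not verified here; `read_module_param` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str,
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the current value of a loaded module's parameter, or None.

    None means the module is not loaded, the parameter is not exported
    to sysfs, or the file is not readable.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumption: the parameter is named "msi", as in the modprobe option above.
    print("ddbridge msi =", read_module_param("ddbridge", "msi"))
```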

Many Thanks for the help and Best Regards
Berni
Comment 20 Berni 2020-09-10 15:46:16 UTC
Mark as Resolved - Invalid.
Comment 21 Bjorn Helgaas 2020-09-10 16:08:50 UTC
Having to specify a module option doesn't *seem* like a real resolution.  It's certainly a workaround, but it seems like other Debian users will likely trip over this, and it's a lot of hassle to find the workaround.

Sean, you mention "plumbing support for INTx emulation" above.  Is that in the ddbridge driver?  Elsewhere?  Is that really the only way to fix this without requiring a module parameter or kernel config change?
Comment 22 Berni 2020-09-11 06:57:31 UTC
Agreed, I was a bit too fast. Reopen it.
Comment 23 Sean V Kelley 2020-10-07 17:28:30 UTC
I've more time to look at this now with some PCI/RCEC patches close to merging hopefully!

Bjorn,

I believe it is in the driver and the way they have implemented it.  What I asked you to test was just to confirm that it is a driver issue and not the quirk itself per se.  I will take a closer look now.

Sean
Comment 24 Berni 2021-05-03 16:48:32 UTC
Hello Sean,
I saw some patches regarding PCI merged in Dec 2020. So I just made another test with 5.12.0-11146-g8ca5297e7e3 which still has the same issue.

Let me know if you have any other news or a patch to try out.

Thanks
Berni
Comment 25 Salvatore Bonaccorso 2021-10-17 12:55:25 UTC
Hi Sean,

Did you have a chance to look into this?

Regards,
Salvatore
Comment 26 Bjorn Helgaas 2023-01-17 23:05:37 UTC
Is this still a problem?  If so, how can we make progress on it?  I don't think Sean is available any more, but I guess a dmesg log from a current kernel, e.g., v6.1, could be a start if somebody else can work on it.
Comment 27 Berni 2023-01-24 17:03:05 UTC
I will try to test with a recent kernel asap and report back the result including dmesg log.
Comment 28 Berni 2023-01-25 15:34:09 UTC
Created attachment 303645 [details]
dmesg with kernel 6.1.4
Comment 29 Berni 2023-01-25 15:36:43 UTC
Created attachment 303646 [details]
dmesg with kernel 6.1.4 with options ddbridge msi=1
Comment 30 Berni 2023-01-25 15:38:03 UTC
Created attachment 303647 [details]
lspci -vvvnn with kernel 6.1.4
Comment 31 Berni 2023-01-25 15:38:35 UTC
Created attachment 303648 [details]
lspci -vvvnn with kernel 6.1.4 with options ddbridge msi=1
Comment 32 Berni 2023-01-25 15:42:48 UTC
This is still a problem with kernel 6.1.4, see attachments for dmesg and lspci output.
Comment 33 Bjorn Helgaas 2023-05-05 19:04:36 UTC
I'm not sure we can find anybody with the right combination of knowledge about Xeon INTx handling, ddbridge, i2c, etc, to work on this.  

But I would suggest replying to this email thread:
https://lore.kernel.org/all/20200709191722.GA6054@bjorn-Precision-5520/
because I think the people who *might* be able to help only pay attention to email, not to bugzilla.
