Bug 215907

Summary: 5.12 Regression -- Null pointer exception on resume from S3
Product: Drivers
Reporter: Samuel Clark (slc2015)
Component: I2C
Assignee: Drivers/I2C virtual user (drivers-i2c)
Status: NEW
Severity: normal
CC: jarkko.nikula, pmenzel+bugzilla.kernel.org, regressions, slc2015
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.12 and later
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: ACPI dump
attachment-12555-0.html
Fix for handling unexpected real interrupt
attachment-27488-0.html

Description Samuel Clark 2022-04-27 20:55:03 UTC
Running Manjaro on a Gigabyte B660M DS3H DDR4 with a custom 5.17 kernel. Confirmed on other distributions and on kernels back to 5.12; the issue is not present on the most recent 5.11 kernel. The dmesg traceback points to the I2C DesignWare driver, specifically drivers/i2c/busses/i2c-designware-master.c:369. It seems the msgs struct passed in to i2c_dw_xfer_msg is NULL.

Similar issue seems to be reported here:  https://lore.kernel.org/lkml/YY5BRrE8bLyvd3PB@smile.fi.intel.com/t/  

lspci output: https://pastebin.com/MwFM2VBJ
dmesg from crashed kernel: https://pastebin.com/t6GsHjkq
kernel config: https://pastebin.com/awrSve5u
Comment 1 Samuel Clark 2022-04-27 20:57:31 UTC
CPU info

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            12th Gen Intel(R) Core(TM) i5-12400
    CPU family:          6
    Model:               151
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            5
    CPU(s) scaling MHz:  34%
    CPU max MHz:         5600.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4993.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   288 KiB (6 instances)
  L1i:                   192 KiB (6 instances)
  L2:                    7.5 MiB (6 instances)
  L3:                    18 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected
Comment 2 Samuel Clark 2022-04-27 21:21:36 UTC
Testing shows that the issue happens when /sys/power/pm_test is set to "platform" or lower.
Comment 3 Jarkko Nikula 2022-05-06 08:31:40 UTC
Hi

So the reason it doesn't occur on v5.11 and earlier is that I2C DesignWare support for Alder Lake-S arrived in v5.12 with commit c7b79a752871 ("mfd: intel-lpss: Add Intel Alder Lake PCH-S PCI IDs").

Can you attach a dump of the ACPI tables here? The tool below is typically available in the acpica-tools or acpi-tools package, or a similar one. Please run it as root.

acpidump -o acpi.dump
Comment 4 Samuel Clark 2022-05-06 13:35:00 UTC
Created attachment 300896 [details]
ACPI dump

Here is the acpi dump for this machine
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-06-20 08:59:33 UTC
(In reply to Jarkko Nikula from comment #3)

> Can you attach here the dump of ACPI tables? 

Did you have a chance to look into them? Samuel provided them some time ago.
Comment 6 Samuel Clark 2022-06-21 15:17:17 UTC
Created attachment 301247 [details]
attachment-12555-0.html

Thanks. A recent UEFI update for the board completely resolved this issue.
Prior to that, disabling IOAPIC 24-119 options in the BIOS worked as a
temporary fix.

Comment 7 Jarkko Nikula 2022-06-22 11:20:04 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #5)
> (In reply to Jarkko Nikula from comment #3)
> 
> > Can you attach here the dump of ACPI tables? 
> 
> Did you have a chance to look into them? Samuel provided them a some time
> ago.

Ah, sorry, I forgot to reply. I didn't find anything obvious in the dumps and was sidetracked by other tasks. Glad to hear the UEFI update fixes the issue. Unfortunately that doesn't help those who are unable to upgrade or unaware of the update, so a workaround would still be good to have.

Fortunately I was recently loaned a machine with a Gigabyte motherboard that shows the issue, so I'll have a chance to debug it next week.
Comment 8 Jarkko Nikula 2022-09-22 13:12:55 UTC
Created attachment 301846 [details]
Fix for handling unexpected real interrupt
Comment 9 Jarkko Nikula 2022-09-22 13:18:01 UTC
Sorry for the long delay, but I finally figured out a fix for this issue.

slc2015@gmail.com: I believe you can no longer verify the fix after the UEFI update, but is it OK that I added your "Reported-by" tag to the patch? If not, I will remove it before sending the patch upstream.
Comment 10 Samuel Clark 2022-09-25 01:13:45 UTC
Created attachment 301869 [details]
attachment-27488-0.html

I'm not able to test but glad there's a fix. You can include the tag.
Comment 11 Jarkko Nikula 2022-10-12 06:31:33 UTC
This is now merged into git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git as commit 301c8f5c32c8 ("i2c: designware: Fix handling of real but unexpected device interrupts").
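
For readers following along, the crash in the description comes from interrupt handling touching transfer state (msgs) while no transfer is in progress. Below is a minimal userspace sketch in C of one way to guard against that. It assumes a handler that checks an "active transfer" flag first and masks the controller's interrupts when the interrupt is real but unexpected. It is not the code from commit 301c8f5c32c8: every name in it (fake_dw_dev, fake_msg, fake_xfer_start, fake_isr, FAKE_INTR_TX_EMPTY, FAKE_IRQ_*) is hypothetical, and the register model is heavily simplified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not the kernel's definitions. */
#define FAKE_INTR_TX_EMPTY 0x10u

enum fake_irqreturn { FAKE_IRQ_NONE, FAKE_IRQ_HANDLED };

struct fake_msg {
	const uint8_t *buf;
	size_t len;
};

struct fake_dw_dev {
	bool xfer_active;      /* true only while a transfer is in flight */
	struct fake_msg *msgs; /* NULL or stale outside a transfer */
	size_t bytes_sent;     /* bytes "written" by the handler so far */
	uint32_t intr_mask;    /* models the interrupt enable mask */
	uint32_t intr_stat;    /* models raw pending interrupt bits */
};

/* Transfer path: publish the message first, then enable the interrupt. */
static void fake_xfer_start(struct fake_dw_dev *dev, struct fake_msg *msgs)
{
	dev->msgs = msgs;
	dev->bytes_sent = 0;
	dev->xfer_active = true;
	dev->intr_mask = FAKE_INTR_TX_EMPTY;
}

static enum fake_irqreturn fake_isr(struct fake_dw_dev *dev)
{
	uint32_t pending = dev->intr_stat & dev->intr_mask;

	if (!pending)
		return FAKE_IRQ_NONE; /* nothing enabled is pending */

	if (!dev->xfer_active) {
		/*
		 * Real but unexpected interrupt, for example one that
		 * arrives around resume before any transfer was started.
		 * State is unset or stale, so mask everything and never
		 * dereference dev->msgs.
		 */
		dev->intr_mask = 0;
		dev->intr_stat = 0;
		return FAKE_IRQ_HANDLED;
	}

	/* Expected interrupt: dev->msgs was set by fake_xfer_start(). */
	dev->bytes_sent += dev->msgs->len;
	dev->intr_stat &= ~pending;
	dev->intr_mask = 0;
	dev->xfer_active = false;
	return FAKE_IRQ_HANDLED;
}
```

In this model, calling fake_isr() on a zeroed fake_dw_dev whose mask and status both have FAKE_INTR_TX_EMPTY set returns FAKE_IRQ_HANDLED and clears the mask without reading msgs. After fake_xfer_start(), the same interrupt is processed normally.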
Comment 12 Paul Menzel 2022-10-12 13:51:27 UTC
Awesome work. Thank you all.

@Samuel, just for the record, can you please comment on which firmware version you were using when it was not working, and which version fixed it?

PS: Also, when replying via email, please remove the quote/citation, as the Bugzilla Web interface does not hide it.