Bug 202453 - TRACE irq/18-i801_smb Tainted when enabled threadirqs in kernel commandline.
Summary: TRACE irq/18-i801_smb Tainted when enabled threadirqs in kernel commandline.
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Drivers/I2C virtual user
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-30 00:52 UTC by Carlos Jimenez
Modified: 2021-01-04 20:58 UTC (History)
5 users (show)

See Also:
Kernel Version: 5.0.0-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg,lsmod,lspci. (68.15 KB, text/plain)
2019-01-30 00:52 UTC, Carlos Jimenez
Details
dmesg-kernel-5.0.1 (61.79 KB, text/plain)
2019-03-11 22:30 UTC, Carlos Jimenez
Details
signature.asc (833 bytes, application/pgp-signature)
2021-01-04 20:58 UTC, Wolfram Sang
Details

Description Carlos Jimenez 2019-01-30 00:52:03 UTC
Created attachment 280861 [details]
dmesg,lsmod,lspci.

System:    Host: igloo-l440 Kernel: 5.0.0-rc4 x86_64 bits: 64 Desktop: MATE 1.20.3
           Distro: Gentoo Base System release 2.6
Machine:   Device: laptop System: LENOVO product: 20ASS06C00 v: ThinkPad L440 serial: N/A
           Mobo: LENOVO model: 20ASS06C00 v: 0B98401 Pro serial: N/A
           UEFI: LENOVO v: J4ET93WW(1.93) date: 05/30/2018
Battery    BAT0: charge: 42.2 Wh 98.1% condition: 43.0/47.5 Wh (91%)
CPU:       Dual core Intel Core i5-4200M (-MT-MCP-) cache: 3072 KB
           clock speeds: max: 3100 MHz 1: 3046 MHz 2: 3035 MHz 3: 3042 MHz 4: 3024 MHz
Graphics:  Card: Intel 4th Gen Core Processor Integrated Graphics Controller
           Display Server: X.Org 1.20.3 drivers: intel (unloaded: modesetting) Resolution: 1366x768@60.00hz
           OpenGL: renderer: Mesa DRI Intel Haswell Mobile version: 4.5 Mesa 18.3.2
Audio:     Card-1 Intel 8 Series/C220 Series High Definition Audio Controller driver: snd_hda_intel
           Card-2 Intel Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller driver: snd_hda_intel
           Sound: Advanced Linux Sound Architecture v: k5.0.0-rc4
Network:   Card: Realtek RTL8192EE PCIe Wireless Network Adapter driver: rtl8192ee
           IF: wlp2s0 state: up mac: 00:71:cc:0b:b1:af
Drives:    HDD Total Size: 320.1GB (25.7% used)
           ID-1: /dev/sda model: Hitachi_HTS54503 size: 320.1GB
Partition: ID-1: / size: 99G used: 67G (69%) fs: btrfs dev: /dev/sda4
           ID-2: swap-1 size: 5.25GB used: 0.00GB (0%) fs: swap dev: /dev/sda5
           ID-3: swap-2 size: 1.61GB used: 0.00GB (0%) fs: swap dev: /dev/zram0
           ID-4: swap-3 size: 1.61GB used: 0.00GB (0%) fs: swap dev: /dev/zram1
           ID-5: swap-4 size: 1.61GB used: 0.00GB (0%) fs: swap dev: /dev/zram2
           ID-6: swap-5 size: 1.61GB used: 0.00GB (0%) fs: swap dev: /dev/zram3
Sensors:   System Temperatures: cpu: 44.0C mobo: 37.0C
           Fan Speeds (in rpm): cpu: 0
Info:      Processes: 208 Uptime: 1 min Memory: 396.3/3832.7MB Client: Shell (bash) inxi: 2.3.56 

[   18.261222] ------------[ cut here ]------------
[   18.261229] irq 31 handler irq_default_primary_handler+0x0/0x10 enabled interrupts
[   18.261235] WARNING: CPU: 2 PID: 989 at kernel/irq/handle.c:153 __handle_irq_event_percpu+0x19f/0x1b0
[   18.261235] Modules linked in: crct10dif_pclmul rmi_smbus rmi_core crc32_pclmul ghash_clmulni_intel aesni_intel videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common aes_x86_64 crypto_simd arc4 rtl8192ee cryptd glue_helper intel_cstate btusb btrtl btbcm btintel bluetooth ecdh_generic thinkpad_acpi vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) msr btcoexist rtl_pci rtlwifi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi videodev media snd_hda_intel nvram iTCO_wdt snd_hda_codec mac80211 iTCO_vendor_support snd_hda_core ofpart intel_uncore cmdlinepart intel_spi_platform intel_spi spi_nor mtd pktcdvd cfg80211 psmouse snd_hwdep ledtrig_audio pcspkr intel_rapl_perf wmi_bmof input_leds snd_pcm rfkill snd_timer snd evdev battery mei_me mei ac i2c_i801 pcc_cpufreq ie31200_edac soundcore lpc_ich mac_hid serio_raw atkbd libps2 xhci_pci xhci_hcd ehci_pci ehci_hcd wmi i8042 serio
[   18.261262] CPU: 2 PID: 989 Comm: irq/18-i801_smb Tainted: G           OE     5.0.0-rc4 #1
[   18.261263] Hardware name: LENOVO 20ASS06C00/20ASS06C00, BIOS J4ET93WW(1.93) 05/30/2018
[   18.261273] RIP: 0010:__handle_irq_event_percpu+0x19f/0x1b0
[   18.261274] Code: 05 ff ff ff e9 bc 02 00 00 45 31 ed e9 04 ff ff ff 48 8b 13 44 89 e6 48 c7 c7 18 c1 84 9c c6 05 7c d3 45 01 01 e8 5b 8f fa ff <0f> 0b eb c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
[   18.261275] RSP: 0018:ffffba9e402a3dc0 EFLAGS: 00010282
[   18.261276] RAX: 0000000000000000 RBX: ffff9fec0e847300 RCX: ffffffff9ca3b408
[   18.261277] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff9d122a0c
[   18.261278] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000319
[   18.261278] R10: 0000000000000001 R11: 0000000000000000 R12: 000000000000001f
[   18.261279] R13: 0000000000000000 R14: ffffba9e402a3e04 R15: ffff9fec1728ee00
[   18.261279] FS:  0000000000000000(0000) GS:ffff9fec1b080000(0000) knlGS:0000000000000000
[   18.261280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.261281] CR2: 000055749ceef980 CR3: 000000004e20e005 CR4: 00000000001606e0
[   18.261281] Call Trace:
[   18.261284]  ? irq_finalize_oneshot.part.1+0xf0/0xf0
[   18.261286]  handle_irq_event_percpu+0x30/0x80
[   18.261288]  handle_irq_event+0x37/0x54
[   18.261289]  handle_simple_irq+0x65/0x90
[   18.261291]  generic_handle_irq+0x27/0x30
[   18.261293]  i2c_handle_smbus_host_notify+0x24/0x40
[   18.261296]  i801_isr+0x183/0x2e0 [i2c_i801]
[   18.261297]  irq_forced_thread_fn+0x2a/0x70
[   18.261298]  irq_thread+0xe7/0x160
[   18.261299]  ? wake_threads_waitq+0x30/0x30
[   18.261300]  ? irq_thread_dtor+0x80/0x80
[   18.261303]  kthread+0x112/0x130
[   18.261304]  ? kthread_park+0x80/0x80
[   18.261307]  ret_from_fork+0x1f/0x40
[   18.261309] ---[ end trace 984341c6ba3a1c95 ]---
[
Comment 1 Carlos Jimenez 2019-01-30 00:55:28 UTC
I notice this happening a way back from 4.20.X
I dont know if previews versions have this bug when  "threadirqs" is enabled.
Thanks in Advance.
Comment 2 Carlos Jimenez 2019-03-11 22:29:43 UTC
UPDATE: Problem Persist linux kernel 5.0.1


[   25.521716] ------------[ cut here ]------------
[   25.521722] irq 31 handler irq_default_primary_handler+0x0/0x10 enabled interrupts
[   25.521727] WARNING: CPU: 0 PID: 1345 at kernel/irq/handle.c:153 __handle_irq_event_percpu+0x1bd/0x1d0
[   25.521727] Modules linked in: rmi_smbus rmi_core mac80211 crypto_simd cryptd thinkpad_acpi glue_helper cfg80211 nvram intel_cstate psmouse input_leds bnep rtsx_pci_ms memstick ofpart cmdlinepart rtsx_pci_sdmmc mmc_core intel_spi_platform intel_spi spi_nor mtd iTCO_wdt joydev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media iTCO_vendor_support nls_iso8859_1 btusb btrtl btbcm btintel nls_cp437 snd_hda_codec_realtek bluetooth vfat fat ecdh_generic snd_hda_codec_generic snd_hda_codec_hdmi pktcdvd mei_me rfkill intel_uncore wmi_bmof intel_rapl_perf ledtrig_audio pcspkr rtsx_pci snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd mei battery pcc_cpufreq i2c_i801 ie31200_edac ac soundcore evdev lpc_ich vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) uas usb_storage hid_dr ff_memless hid_generic usbhid hid serio_raw xhci_pci atkbd xhci_hcd libps2 i8042 wmi serio
[   25.521756] CPU: 0 PID: 1345 Comm: irq/18-i801_smb Tainted: G           OE     5.0.1 #1
[   25.521757] Hardware name: LENOVO 20ASS06C00/20ASS06C00, BIOS J4ET93WW(1.93) 05/30/2018
[   25.521759] RIP: 0010:__handle_irq_event_percpu+0x1bd/0x1d0
[   25.521760] Code: 82 0c ff ff ff e9 19 03 00 00 45 31 e4 e9 0b ff ff ff 49 8b 17 89 ee 48 c7 c7 a0 c1 64 bc c6 05 20 ae 64 01 01 e8 bd 8a fa ff <0f> 0b eb c4 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44
[   25.521761] RSP: 0018:ffffbcaf403efdb0 EFLAGS: 00010282
[   25.521762] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006
[   25.521763] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff9908db015610
[   25.521763] RBP: 000000000000001f R08: 0000000000000334 R09: 0000000000000001
[   25.521764] R10: 0000000000000001 R11: 0000000000000004 R12: 0000000000000000
[   25.521764] R13: ffffbcaf403efdf4 R14: ffff9908d61fe200 R15: ffff9908d61f2f80
[   25.521765] FS:  0000000000000000(0000) GS:ffff9908db000000(0000) knlGS:0000000000000000
[   25.521766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   25.521767] CR2: 00007f9a45d653c9 CR3: 000000004940e004 CR4: 00000000001606f0
[   25.521767] Call Trace:
[   25.521772]  ? irq_finalize_oneshot.part.1+0xf0/0xf0
[   25.521774]  handle_irq_event+0x58/0xb0
[   25.521775]  handle_simple_irq+0x65/0xb0
[   25.521777]  generic_handle_irq+0x27/0x30
[   25.521780]  i2c_handle_smbus_host_notify+0x24/0x40
[   25.521783]  i801_isr+0x183/0x2e0 [i2c_i801]
[   25.521785]  irq_forced_thread_fn+0x2a/0x70
[   25.521787]  irq_thread+0x19b/0x240
[   25.521789]  ? irq_thread_fn+0x60/0x60
[   25.521790]  ? irq_thread_dtor+0xb0/0xb0
[   25.521792]  kthread+0x157/0x170
[   25.521793]  ? kthread_park+0x80/0x80
[   25.521795]  ret_from_fork+0x1f/0x40
[   25.521797] ---[ end trace c8e5d889fc5c2bdb ]---
Comment 3 Carlos Jimenez 2019-03-11 22:30:33 UTC
Created attachment 281749 [details]
dmesg-kernel-5.0.1
Comment 4 Jean Delvare 2019-08-06 14:07:20 UTC
The reason why the kernel is tainted is because you loaded out-of-tree, unsigned kernel modules (vbox*). This has nothing to do with the threadirqs kernel parameter. It shows in the logs simply because taint status is always displayed when you hit a WARNING.

Is this happening with vanilla kernels or gentoo kernels?

Thomas, I'm not sure if this is a bug in the i2c-i801 driver or something more general in how we handle the interrupts under threadirqs. Any suggestion how to investigate this further?
Comment 5 Oleksandr Natalenko 2020-12-03 09:17:32 UTC
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1873673

Does the patch from comment #6 work for you?
Comment 6 Thomas Gleixner 2020-12-03 17:36:03 UTC
On Thu, Dec 03 2020 at 09:17, bugzilla-daemon wrote:
> See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1873673
>
> Does the patch from comment #6 work for you?

It papers over the problem and will break preempt-RT.

Thanks,

        tglx
Comment 7 Thomas Gleixner 2020-12-03 19:04:00 UTC
On Tue, Aug 06 2019 at 14:07, bugzilla-daemon wrote:
> Jean Delvare (jdelvare@suse.de) changed:
>
> Is this happening with vanilla kernels or gentoo kernels?
>
> Thomas, I'm not sure if this is a bug in the i2c-i801 driver or something
> more
> general in how we handle the interrupts under threadirqs. Any suggestion how
> to
> investigate this further?

Bah. What happens is that the i2c-i801 driver interrupt handler does:

i801_isr()

      ...
        return i801_host_notify_isr(priv);

which invokes:

      i2c_handle_smbus_host_notify()

which in turn invokes

      generic_handle_irq()

and that explodes with forced interrupt threading because it's called
with interrupts enabled.

The root of all evil is the usage of generic_handle_irq() under the
assumption that this is always called from hard interrupt context. That
assumption is not true since 8d32a307e4fa ("genirq: Provide forced
interrupt threading") which went into 2.6.39 almost 10 years ago.

Seems you got lucky that since 10 years someone no user of this uses a
threaded interrupt handler, like some of the i2c drivers actually do. :)

So there are a couple of options to fix this:

   1) Set the IRQF_NO_THREAD flag and replace the waitqueue as that
      won't work on RT.

      Looking at the usage it's a single waiter wakeup and a single
      waiter at a time because the xfer is fully serialized by the
      core. So we can switch it to rcuwait, if there would be
      rcu_wait_event_timeout(), but that's fixable.

   2) Have a wrapper around handle_generic_irq() which ensures that
      interrupts are disabled before invoking it.

   3) Make the interrupt which is dispatched there to be requested with
      [devm_]request_any_context_irq(). That also requires that the
      NESTED_THREAD flag is set on the irq descriptor.

      That's exactly made for the use case where the dispatching
      interrupt can be either threaded or in hard interrupt context.

      But that's lots of churn.

And because we have so many options, here is the question:

   Is i2c_handle_smbus_host_notify() guaranteed to be called from hard
   interrupt handlers (assumed that we use #1 above)?

   I can't tell because there is also i2c_slave_host_notify_cb() which
   invokes it and my i2c foo is not good enough to figure that out.

If that's the case the #1 would be the straight forward solution. If
not, then you want #2 because then the problem will just pop up via the
slave thing and that might be not as trivial to fix as this one.

If you can answer that, I can send you the proper patch :)

Thanks,

        tglx
Comment 8 Oleksandr Natalenko 2020-12-03 19:40:43 UTC
(In reply to Thomas Gleixner from comment #7)
>    Is i2c_handle_smbus_host_notify() guaranteed to be called from hard
>    interrupt handlers (assumed that we use #1 above)?

As far as I can grep through bus drivers, yes, it is called from hard interrupt handlers only.

i2c_handle_smbus_host_notify() is indeed called from i2c_slave_host_notify_cb(), which, in its turn, is set to be called as ->slave_cb() via i2c_slave_event() wrapper only.

Also, check [1], slide #9. I'm not sure about that "usually" word though since I couldn't find any examples of "unusual" usage.

/* not an i2c guru here either, just looking around the code */

[1] https://elinux.org/images/f/f6/ELCE15-WolframSang-ShinyNewI2CSlaveFramework.pdf
Comment 9 Thomas Gleixner 2020-12-03 23:46:13 UTC
On Thu, Dec 03 2020 at 19:40, bugzilla-daemon wrote:
>>    Is i2c_handle_smbus_host_notify() guaranteed to be called from hard
>>    interrupt handlers (assumed that we use #1 above)?
>
> As far as I can grep through bus drivers, yes, it is called from hard
> interrupt
> handlers only.
>
> i2c_handle_smbus_host_notify() is indeed called from
> i2c_slave_host_notify_cb(), which, in its turn, is set to be called as
> ->slave_cb() via i2c_slave_event() wrapper only.
>
> Also, check [1], slide #9. I'm not sure about that "usually" word though
> since
> I couldn't find any examples of "unusual" usage.

The question is whether it's guaranteed under all circumstances
including forced irq threading. The i801 driver has assumptions about
this, so I wouldn't be surprised if there are more.

Thanks,

        tglx
Comment 10 Oleksandr Natalenko 2020-12-04 20:26:41 UTC
On Thu, Dec 03, 2020 at 07:04:00PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=202453
> 
> --- Comment #7 from Thomas Gleixner (tglx@linutronix.de) ---
> On Tue, Aug 06 2019 at 14:07, bugzilla-daemon wrote:
> > Jean Delvare (jdelvare@suse.de) changed:
> >
> > Is this happening with vanilla kernels or gentoo kernels?
> >
> > Thomas, I'm not sure if this is a bug in the i2c-i801 driver or something
> > more
> > general in how we handle the interrupts under threadirqs. Any suggestion
> how
> > to
> > investigate this further?
> 
> Bah. What happens is that the i2c-i801 driver interrupt handler does:
> 
> i801_isr()
> 
>       ...
>         return i801_host_notify_isr(priv);
> 
> which invokes:
> 
>       i2c_handle_smbus_host_notify()
> 
> which in turn invokes
> 
>       generic_handle_irq()
> 
> and that explodes with forced interrupt threading because it's called
> with interrupts enabled.
> 
> The root of all evil is the usage of generic_handle_irq() under the
> assumption that this is always called from hard interrupt context. That
> assumption is not true since 8d32a307e4fa ("genirq: Provide forced
> interrupt threading") which went into 2.6.39 almost 10 years ago.
> 
> Seems you got lucky that since 10 years someone no user of this uses a
> threaded interrupt handler, like some of the i2c drivers actually do. :)
> 
> So there are a couple of options to fix this:
> 
>    1) Set the IRQF_NO_THREAD flag and replace the waitqueue as that
>       won't work on RT.
> 
>       Looking at the usage it's a single waiter wakeup and a single
>       waiter at a time because the xfer is fully serialized by the
>       core. So we can switch it to rcuwait, if there would be
>       rcu_wait_event_timeout(), but that's fixable.
> 
>    2) Have a wrapper around handle_generic_irq() which ensures that
>       interrupts are disabled before invoking it.
> 
>    3) Make the interrupt which is dispatched there to be requested with
>       [devm_]request_any_context_irq(). That also requires that the
>       NESTED_THREAD flag is set on the irq descriptor.
> 
>       That's exactly made for the use case where the dispatching
>       interrupt can be either threaded or in hard interrupt context.
> 
>       But that's lots of churn.
> 
> And because we have so many options, here is the question:
> 
>    Is i2c_handle_smbus_host_notify() guaranteed to be called from hard
>    interrupt handlers (assumed that we use #1 above)?
> 
>    I can't tell because there is also i2c_slave_host_notify_cb() which
>    invokes it and my i2c foo is not good enough to figure that out.
> 
> If that's the case the #1 would be the straight forward solution. If
> not, then you want #2 because then the problem will just pop up via the
> slave thing and that might be not as trivial to fix as this one.
> 
> If you can answer that, I can send you the proper patch :)

tglx suggested moving this to the appropriate mailing lists, so I'mm
Cc'ing those.

Jean, Wolfram, Benjamin, or someone else, could you please check Thomas'
questions above and let us know what you think?

I'll copy-paste my attempt to answer this in bugzilla below:

```
As far as I can grep through bus drivers, yes, it is called from hard
interrupt handlers only.

i2c_handle_smbus_host_notify() is indeed called from
i2c_slave_host_notify_cb(), which, in its turn, is set to be called as
->slave_cb() via i2c_slave_event() wrapper only.

Also, check [1], slide #9. I'm not sure about that "usually" word
though since I couldn't find any examples of "unusual" usage.

/* not an i2c guru here either, just looking around the code */

[1] https://elinux.org/images/f/f6/ELCE15-WolframSang-ShinyNewI2CSlaveFramework.pdf
```

and also tglx' follow-up question:

```
The question is whether it's guaranteed under all circumstances
including forced irq threading. The i801 driver has assumptions about
this, so I wouldn't be surprised if there are more.
```

Thanks.
Comment 11 Thomas Gleixner 2020-12-05 16:19:22 UTC
On Fri, Dec 04 2020 at 21:19, Oleksandr Natalenko wrote:
> On Thu, Dec 03, 2020 at 07:04:00PM +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
>>    2) Have a wrapper around handle_generic_irq() which ensures that
>>       interrupts are disabled before invoking it.

> The question is whether it's guaranteed under all circumstances
> including forced irq threading. The i801 driver has assumptions about
> this, so I wouldn't be surprised if there are more.

Assuming that a final answer might take some time, the below which
implements #2 will make it at least work for now.

Thanks,

        tglx
---
Subject: genirq, i2c: Provide and use generic_dispatch_irq()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 03 Dec 2020 19:12:24 +0100

Carlos reported that on his system booting with 'threadirqs' on the command
line result in the following warning:

irq 31 handler irq_default_primary_handler+0x0/0x10 enabled interrupts
WARNING: CPU: 2 PID: 989 at kernel/irq/handle.c:153 __handle_irq_event_percpu+0x19f/0x1b0

The reason is in the i2c stack:

    i801_isr()
      i801_host_notify_isr()
        i2c_handle_smbus_host_notify()
          generic_handle_irq()

and that explodes with forced interrupt threading because it's called with
interrupts enabled.

It would be possible to set IRQF_NO_THREAD on the i801 interrupt to exclude
it from force threading, but that would break on RT and require a larger
update.

It's also unclear whether there are other drivers which can reach that code
path via i2c_slave_host_notify_cb(). As there are enough i2c drivers which
use threaded interrupt handlers by default it seems not completely
impossible that this can happen even without force threaded interrupts.

For a quick fix provide a wrapper around generic_handle_irq() which has a
local_irq_save/restore() around the invocation and use it in the i2c code.

Reported-by: Carlos Jimenez <javashin1986@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202453
---
 drivers/i2c/i2c-core-base.c |    2 +-
 include/linux/irqdesc.h     |    1 +
 kernel/irq/irqdesc.c        |   20 ++++++++++++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -1385,7 +1385,7 @@ int i2c_handle_smbus_host_notify(struct
 	if (irq <= 0)
 		return -ENXIO;
 
-	generic_handle_irq(irq);
+	generic_dispatch_irq(irq);
 
 	return 0;
 }
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -153,6 +153,7 @@ static inline void generic_handle_irq_de
 }
 
 int generic_handle_irq(unsigned int irq);
+int generic_dispatch_irq(unsigned int irq);
 
 #ifdef CONFIG_HANDLE_DOMAIN_IRQ
 /*
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -652,6 +652,26 @@ int generic_handle_irq(unsigned int irq)
 }
 EXPORT_SYMBOL_GPL(generic_handle_irq);
 
+/**
+ * generic_dispatch_irq - Dispatch an interrupt from an interrupt handler
+ * @irq:	The irq number to handle
+ *
+ * A wrapper around generic_handle_irq() which ensures that interrupts are
+ * disabled when the primary handler of the dispatched irq is invoked.
+ * This is useful for interrupt handlers with dispatching to be safe for
+ * the forced threaded case.
+ */
+int generic_dispatch_irq(unsigned int irq)
+{
+	unsigned long flags;
+	int ret;
+
+	local_irq_save(&flags);
+	ret = generic_handle_irq(irq);
+	local_irq_restore(&flags);
+	return ret;
+}
+
 #ifdef CONFIG_HANDLE_DOMAIN_IRQ
 /**
  * __handle_domain_irq - Invoke the handler for a HW irq belonging to a domain
Comment 12 Oleksandr Natalenko 2020-12-05 16:24:08 UTC
On Sat, Dec 05, 2020 at 05:19:18PM +0100, Thomas Gleixner wrote:
> On Fri, Dec 04 2020 at 21:19, Oleksandr Natalenko wrote:
> > On Thu, Dec 03, 2020 at 07:04:00PM +0000,
> bugzilla-daemon@bugzilla.kernel.org wrote:
> >>    2) Have a wrapper around handle_generic_irq() which ensures that
> >>       interrupts are disabled before invoking it.
> 
> > The question is whether it's guaranteed under all circumstances
> > including forced irq threading. The i801 driver has assumptions about
> > this, so I wouldn't be surprised if there are more.
> 
> Assuming that a final answer might take some time, the below which
> implements #2 will make it at least work for now.
> 
> Thanks,
> 
>         tglx
> ---
> Subject: genirq, i2c: Provide and use generic_dispatch_irq()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Thu, 03 Dec 2020 19:12:24 +0100
> 
> Carlos reported that on his system booting with 'threadirqs' on the command
> line result in the following warning:
> 
> irq 31 handler irq_default_primary_handler+0x0/0x10 enabled interrupts
> WARNING: CPU: 2 PID: 989 at kernel/irq/handle.c:153
> __handle_irq_event_percpu+0x19f/0x1b0
> 
> The reason is in the i2c stack:
> 
>     i801_isr()
>       i801_host_notify_isr()
>         i2c_handle_smbus_host_notify()
>           generic_handle_irq()
> 
> and that explodes with forced interrupt threading because it's called with
> interrupts enabled.
> 
> It would be possible to set IRQF_NO_THREAD on the i801 interrupt to exclude
> it from force threading, but that would break on RT and require a larger
> update.
> 
> It's also unclear whether there are other drivers which can reach that code
> path via i2c_slave_host_notify_cb(). As there are enough i2c drivers which
> use threaded interrupt handlers by default it seems not completely
> impossible that this can happen even without force threaded interrupts.
> 
> For a quick fix provide a wrapper around generic_handle_irq() which has a
> local_irq_save/restore() around the invocation and use it in the i2c code.
> 
> Reported-by: Carlos Jimenez <javashin1986@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202453
> ---
>  drivers/i2c/i2c-core-base.c |    2 +-
>  include/linux/irqdesc.h     |    1 +
>  kernel/irq/irqdesc.c        |   20 ++++++++++++++++++++
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> --- a/drivers/i2c/i2c-core-base.c
> +++ b/drivers/i2c/i2c-core-base.c
> @@ -1385,7 +1385,7 @@ int i2c_handle_smbus_host_notify(struct
>       if (irq <= 0)
>               return -ENXIO;
>  
> -     generic_handle_irq(irq);
> +     generic_dispatch_irq(irq);
>  
>       return 0;
>  }
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -153,6 +153,7 @@ static inline void generic_handle_irq_de
>  }
>  
>  int generic_handle_irq(unsigned int irq);
> +int generic_dispatch_irq(unsigned int irq);
>  
>  #ifdef CONFIG_HANDLE_DOMAIN_IRQ
>  /*
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -652,6 +652,26 @@ int generic_handle_irq(unsigned int irq)
>  }
>  EXPORT_SYMBOL_GPL(generic_handle_irq);
>  
> +/**
> + * generic_dispatch_irq - Dispatch an interrupt from an interrupt handler
> + * @irq:     The irq number to handle
> + *
> + * A wrapper around generic_handle_irq() which ensures that interrupts are
> + * disabled when the primary handler of the dispatched irq is invoked.
> + * This is useful for interrupt handlers with dispatching to be safe for
> + * the forced threaded case.
> + */
> +int generic_dispatch_irq(unsigned int irq)
> +{
> +     unsigned long flags;
> +     int ret;
> +
> +     local_irq_save(&flags);
> +     ret = generic_handle_irq(irq);
> +     local_irq_restore(&flags);

FWIW, for me &flags explodes build on v5.10-rc6. I had to change it to local_irq_save/restore(flags) (without taking an address via &).

> +     return ret;
> +}
> +
>  #ifdef CONFIG_HANDLE_DOMAIN_IRQ
>  /**
>   * __handle_domain_irq - Invoke the handler for a HW irq belonging to a
>   domain
Comment 13 Thomas Gleixner 2020-12-05 18:59:50 UTC
On Sat, Dec 05 2020 at 17:24, Oleksandr Natalenko wrote:
> On Sat, Dec 05, 2020 at 05:19:18PM +0100, Thomas Gleixner wrote:
>> +/**
>> + * generic_dispatch_irq - Dispatch an interrupt from an interrupt handler
>> + * @irq:    The irq number to handle
>> + *
>> + * A wrapper around generic_handle_irq() which ensures that interrupts are
>> + * disabled when the primary handler of the dispatched irq is invoked.
>> + * This is useful for interrupt handlers with dispatching to be safe for
>> + * the forced threaded case.
>> + */
>> +int generic_dispatch_irq(unsigned int irq)
>> +{
>> +    unsigned long flags;
>> +    int ret;
>> +
>> +    local_irq_save(&flags);
>> +    ret = generic_handle_irq(irq);
>> +    local_irq_restore(&flags);
>
> FWIW, for me &flags explodes build on v5.10-rc6. I had to change it to
> local_irq_save/restore(flags) (without taking an address via &).

That's right. Don't know what I was thinking when writing it and then
compiling with the patch removed (just checked history ...) Oh, well
Comment 14 Wolfram Sang 2021-01-04 20:45:32 UTC
Okay, here is some context about HostNotify. It is a rarely used SMBus extension which allows clients to send a notification to the host. The host must be able to listen to the I2C address 0x08. The only host controller which has implemented this natively is the i801 because the hardware sets a bit when this happened. (Sidenote, the only clients I am aware of which send notifications are some touchscreen controllers.) This is why the i801 driver calls i2c_handle_smbus_host_notify() directly. And only that one, so far.

But recently, Alain Volmat got the idea that we can use the generic slave framework to make host controllers listen on address 0x08 and check for a valid HostNotification. This is why the generic i2c_slave_host_notify_cb() calls i2c_handle_smbus_host_notify(), too. This allows all I2C host controllers which select I2C_SLAVE to use HostNotify. Those are few currently, but their number is steadily increasing.

And as it looks to me, currently all drivers selecting I2C_SLAVE check their interrupts which handle the slave state (i.e. managing I2C address 0x08) in a non-threaded context. But there is no guarantee for that. Unless we formulate one. However, my gut feeling is that option #3 might be not so much churn for this case, but I need to double check if I am overlooking something.

Given that only some touchscreen controllers send HostNotify and you need to enforce threaded irqs for this WARN, this might explain why it went unnoticed for 10 years.

I hope this helps. Thank you everyone for the input provided so far!
Comment 15 Wolfram Sang 2021-01-04 20:58:19 UTC
Created attachment 294495 [details]
signature.asc

> Jean, Wolfram, Benjamin, or someone else, could you please check Thomas'
> questions above and let us know what you think?

Sending this message here again because Bugzilla didn't know about the
mailing lists. Sorry for the noise!

===

Okay, here is some context about HostNotify. It is a rarely used SMBus
extension which allows clients to send a notification to the host. The
host must be able to listen to the I2C address 0x08. The only host
controller which has implemented this natively is the i801 because the
hardware sets a bit when this happened. (Sidenote, the only clients I am
aware of which send notifications are some touchscreen controllers.)
This is why the i801 driver calls i2c_handle_smbus_host_notify()
directly. And only that one, so far.

But recently, Alain Volmat got the idea that we can use the generic
slave framework to make host controllers listen on address 0x08 and
check for a valid HostNotification. This is why the generic
i2c_slave_host_notify_cb() calls i2c_handle_smbus_host_notify(), too.
This allows all I2C host controllers which select I2C_SLAVE to use
HostNotify. Those are few currently, but their number is steadily
increasing.

And as it looks to me, currently all drivers selecting I2C_SLAVE check
their interrupts which handle the slave state (i.e. managing I2C address
0x08) in a non-threaded context. But there is no guarantee for that.
Unless we formulate one. However, my gut feeling is that option #3 might
be not so much churn for this case, but I need to double check if I am
overlooking something.

Given that only some touchscreen controllers send HostNotify and you
need to enforce threaded irqs for this WARN, this might explain why it
went unnoticed for 10 years.

I hope this helps. Thank you everyone for the input provided so far!

Note You need to log in before you can comment on or make changes to this bug.