Bug 212379 - amdpinctrl gpio pins pulled low which causes all i2c devices to stop working
Summary: amdpinctrl gpio pins pulled low which causes all i2c devices to stop working
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C (show other bugs)
Hardware: x86-64 Linux
: P1 blocking
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-22 02:49 UTC by Austin Kilgore
Modified: 2021-07-13 23:11 UTC (History)
8 users (show)

See Also:
Kernel Version: 5.12
Tree: Mainline
Regression: No


Attachments
sysinfo logs from while the i2c devices worked and when they stopped working. (506.84 KB, application/gzip)
2021-03-22 02:49 UTC, Austin Kilgore
Details
interrupts and dmesg log (26.11 KB, application/gzip)
2021-03-26 20:05 UTC, Austin Kilgore
Details
acpidump (652.02 KB, text/plain)
2021-05-03 22:48 UTC, Austin Kilgore
Details
facp.dsl (9.11 KB, text/x-csrc)
2021-05-03 22:56 UTC, Austin Kilgore
Details
fwts version (321.15 KB, text/plain)
2021-05-03 23:06 UTC, Austin Kilgore
Details
[PATCH] pinctrl: amd: Fix handling of PIN_CONFIG_BIAS_PULL_UP/_DOWN settings (3.91 KB, patch)
2021-05-30 14:10 UTC, Hans de Goede
Details | Diff
pins with dsdt overlay (6.13 KB, text/plain)
2021-05-30 17:06 UTC, Austin Kilgore
Details
pins without dsdt overlay (identical) (6.13 KB, text/plain)
2021-05-30 17:07 UTC, Austin Kilgore
Details
gpio with dsdt overlay (56.20 KB, text/plain)
2021-05-30 17:59 UTC, Austin Kilgore
Details
gpio without dsdt overlay (identical) (56.20 KB, text/plain)
2021-05-30 18:00 UTC, Austin Kilgore
Details
gpio with patch (56.20 KB, text/plain)
2021-05-30 21:33 UTC, Austin Kilgore
Details
[PATCH] GPIOLIB/AMD-pinctrl: add some debugging (1.93 KB, patch)
2021-05-31 09:51 UTC, Hans de Goede
Details | Diff
dsdt files (31.63 KB, application/gzip)
2021-05-31 19:22 UTC, Austin Kilgore
Details
acpidumps and dmesg logs (398.92 KB, application/gzip)
2021-06-05 11:52 UTC, Austin Kilgore
Details
dsdt.hex used in kernel as override (371.79 KB, text/x-csrc)
2021-06-24 20:03 UTC, Austin Kilgore
Details
attachment-19114-0.html (313 bytes, text/html)
2021-07-11 02:16 UTC, Nehal Shah
Details

Description Austin Kilgore 2021-03-22 02:49:47 UTC
Created attachment 295983 [details]
sysinfo logs from while the i2c devices worked and when they stopped working.

I have a Lenovo IdeaPad FLEX-14API 81SS000BUS that uses an elan i2c touchpad and wacom i2c touchscreen.

After using my touchpad or touchscreen after a while (it can be at any time, I haven't found a trigger point) both stop working simultaneously.

I have ruled out the device drivers and been in contact with a kernel developer for wacom and he said it's below their driver which points to it being the bus driver.

See https://github.com/linuxwacom/input-wacom/issues/221 for more information on the problem and logs as well as feedback from the dev.
Comment 1 Austin Kilgore 2021-03-23 20:24:38 UTC
I tried contacting Hans de Goede through his email to get his attention on this bug as per the doc outlined for kernel bugs. However, I have not heard back so maybe his github email is no longer being used.
Comment 2 Borislav Petkov 2021-03-23 20:31:21 UTC
I see this in MAINTAINERS:

SYNOPSYS DESIGNWARE I2C DRIVER
M:	Jarkko Nikula <jarkko.nikula@linux.intel.com>
R:	Andy Shevchenko <andriy.shevchenko@linux.intel.com>
R:	Mika Westerberg <mika.westerberg@linux.intel.com>
L:	linux-i2c@vger.kernel.org
S:	Maintained
F:	drivers/i2c/busses/i2c-designware-*
F:	include/linux/platform_data/i2c-designware.h

or which bus do you mean?
Comment 3 Austin Kilgore 2021-03-23 21:34:54 UTC
(In reply to Borislav Petkov from comment #2)
> I see this in MAINTAINERS:
> 
> SYNOPSYS DESIGNWARE I2C DRIVER
> M:    Jarkko Nikula <jarkko.nikula@linux.intel.com>
> R:    Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> R:    Mika Westerberg <mika.westerberg@linux.intel.com>
> L:    linux-i2c@vger.kernel.org
> S:    Maintained
> F:    drivers/i2c/busses/i2c-designware-*
> F:    include/linux/platform_data/i2c-designware.h
> 
> or which bus do you mean?

Okay thanks, I'll try contacting Jarkko.

I only use i2c-designware in my kernel so it's one of the designware files with the problem. I wish I could provide more useful information.
Comment 4 Austin Kilgore 2021-03-23 21:53:29 UTC
Here's the schematic for my laptop: https://drive.google.com/file/d/1fIVByM7ZrC7NUxW1HjCbQIEaOjFs8xm0/view?usp=sharing

Kernel config: https://drive.google.com/file/d/1RwcXcU8tbOuniG98PnvaHRnPHDkC3WBa/view?usp=sharing 

Kernel config pastebin: https://pastebin.com/5FWLN7pB

i2c devices in my laptop:

touchpad ELN4690
touchscreen WCOM51C7
g-sensor BMA250E

I run a custom kernel with all the device drivers with those i2c components enabled as well as the amd i2c bus driver (which is really what I think should work for me, but doesn't) and the synopsys designware bus driver, which is the only one that makes the i2c devices function at all.
Comment 5 Jarkko Nikula 2021-03-24 08:00:43 UTC
Don't see quickly anything obvious that would explain this. E.g. no transfer failures etc in dmesg.

I added also two contacts from AMD that have worked with i2c-designware and are found from bugzilla.
Comment 6 Austin Kilgore 2021-03-24 16:51:43 UTC
(In reply to Jarkko Nikula from comment #5)
> Don't see quickly anything obvious that would explain this. E.g. no transfer
> failures etc in dmesg.
> 
> I added also two contacts from AMD that have worked with i2c-designware and
> are found from bugzilla.

Thanks.

I also should mention that this happened back when I was on Arch. I tried the linux-kernel and linux-zen-kernel. Right now on Gentoo, I've tried the 5.4.97 Gentoo stable kernel and the various custom kernel I've configured since I've been on Gentoo. And it's always been an issue, I have never been able to 100% pin down what or where the problem comes from but I feel like the bus driver is a safe bet since it's been in all the kernels I've tested and all the devices that stop functioning rely on it.

The fact that it's triggered seemingly at random doesn't make much sense to me. Perhaps it's some kind of byte overflow (I think that's what it's called) happening in one of the i2c devices that the driver can't handle.
Comment 7 Hans de Goede 2021-03-25 10:12:34 UTC
I see in the logs that i2c-WCOM51C7:00 sits on the AMDI0010:00/i2c-0 bus, where as i2c-ELN4690:00 sits on the AMDI0010:01/i2c-1 bus.

If they both stop working at the same time, then this feels like an IRQ problem and not an i2c-designware problem at all.

Maybe the 2 i2c-controllers share an IRQ and there is an IRQ handling problem here. E.g. the IRQ is configured as edge, while it should be level or something like that.

Or maybe the GpioInt-s used by the devices to tell they have data ready stop working ?

What is the output of:

cat /proc/interrupts

On this laptop?

Also please do the following when the i2c-hid devices stop working again:

sudo rmmod i2c-hid
sudo modprobe i2c-hid
dmesg > dmesg-after-i2c-hid-reprobe

And attach the dmesg-after-i2c-hid-reprobe file here. This will tell us if i2c-transfers are no longer working, or if there is some other issue.
Comment 8 Austin Kilgore 2021-03-26 20:04:53 UTC
(In reply to Hans de Goede from comment #7)
> I see in the logs that i2c-WCOM51C7:00 sits on the AMDI0010:00/i2c-0 bus,
> where as i2c-ELN4690:00 sits on the AMDI0010:01/i2c-1 bus.
> 
> If they both stop working at the same time, then this feels like an IRQ
> problem and not an i2c-designware problem at all.
> 
> Maybe the 2 i2c-controllers share an IRQ and there is an IRQ handling
> problem here. E.g. the IRQ is configured as edge, while it should be level
> or something like that.
> 
> Or maybe the GpioInt-s used by the devices to tell they have data ready stop
> working ?
> 
> What is the output of:
> 
> cat /proc/interrupts
> 
> On this laptop?
> 
> Also please do the following when the i2c-hid devices stop working again:
> 
> sudo rmmod i2c-hid
> sudo modprobe i2c-hid
> dmesg > dmesg-after-i2c-hid-reprobe
> 
> And attach the dmesg-after-i2c-hid-reprobe file here. This will tell us if
> i2c-transfers are no longer working, or if there is some other issue.

Sorry for the delay, here's the information you asked for:
Comment 9 Austin Kilgore 2021-03-26 20:05:26 UTC
Created attachment 296077 [details]
interrupts and dmesg log
Comment 10 Hans de Goede 2021-03-27 12:48:45 UTC
Thank you for the logs.

So the logs show that on reprobing i2c-hid, the i2c_hid_fetch_hid_descriptor() done during probe works, which means that we still have working i2c communications.

And then during i2c_hid_parse() the i2c_hid_hwreset() calls, this not only needs working i2c-communication, but also a working interrupt from the i2c-hid device to the main CPU.

Another thing which I noticed is that the BMA250E accelerometer also stops working.

All 3 of these use the amd_gpio chip for their interrupt lines and those 3 are the only 3 devices using the amd_gpio chip for interrupts.

So my conclusion is that this is not an i2c problem at all, it is a problem with the  amd_gpio chip no longer delivering interrupts to the main CPU after a while.

I'm afraid that that is as far as I can help you with this because I have no experience with / knowledge of the amd_gpio driver.
Comment 11 Hans de Goede 2021-03-27 12:50:25 UTC
Grumble, I messed up one of the sentences above, that should be:

"And then during i2c_hid_parse() i2c_hid_hwreset() get called, this not only needs working i2c-communication, but also a working interrupt from the i2c-hid device to the main CPU. The i2c_hid_hwreset() call fails, indicating an interrupt problem."
Comment 12 Austin Kilgore 2021-03-27 23:01:57 UTC
I don't know what the next step to fixing this would be but I've edited the title to try to be more accurate now that i2c is ruled out as being the problem.

I looked and I didn't see a irq flag for this bug, so I just set it to "other".

Feel free to improve the description if you want.


I did want to mention still that I think I should be able to use the amd-i2c  bus driver since that's what windows calls the bus driver they are on. Perhaps that doesn't really matter but I just wanted to mention it.
Comment 13 Hans de Goede 2021-03-29 15:26:28 UTC
> I don't know what the next step to fixing this would be

I'm afraid I don't really know either. You could try contacting AMD about this, if they can take a look at this, that would be best.
Comment 14 Austin Kilgore 2021-04-16 20:02:35 UTC
Alright, I added everyone that I could find with an AMD email address on here. I apologize in advance for the mass cc but I just don't know who at AMD to contact about an IRQ problem with an i2c bus.

I need someone to take a look at this because it is a really serious problem. As always, I am at your disposal for any testing or further questions if need be. Thank you for your time.
Comment 15 Austin Kilgore 2021-04-26 21:04:38 UTC
Is this ever going to get looked at?

Did I need to @ mention someone? Honestly asking since I don't know what to do. I don't even know who needs to be told to take a look at this.

Like I said, if this bug report is lacking information just let me know and I'll test, check, or upload whatever is needed. It's really important to me that this gets fixed.
Comment 16 Christian König 2021-04-27 07:03:56 UTC
Well just adding random people to an unrelated bug is not considered very polite nor helpful.

You should rather try to find a maintainer for your driver instead.

Going to ignore this bug from now on.
Comment 17 Jarkko Nikula 2021-04-27 07:37:58 UTC
Since you are answering from amd.com I think you are in better position to help here and find contacts inside AMD why it looks "it is a problem with the  amd_gpio chip no longer delivering interrupts to the main CPU after a while".
Comment 18 Andy Shevchenko 2021-04-27 14:14:13 UTC
(In reply to Jarkko Nikula from comment #17)
> Since you are answering from amd.com I think you are in better position to
> help here and find contacts inside AMD why it looks "it is a problem with
> the  amd_gpio chip no longer delivering interrupts to the main CPU after a
> while".

I have removed all AMD contacts, but two that mentioned in pinctrl-amd.c.
Comment 19 Andy Shevchenko 2021-04-27 14:15:18 UTC
OTOH, added Coiby. Coiby, if you think it's wrong, tell me, I'll remove you from the Cc list.
Comment 20 mario.limonciello 2021-05-03 21:56:23 UTC
Austin,

Can you please attach an acpidump to this issue?
Comment 21 Austin Kilgore 2021-05-03 22:47:56 UTC
(In reply to mario.limonciello from comment #20)
> Austin,
> 
> Can you please attach an acpidump to this issue?

Sure thing.
Comment 22 Austin Kilgore 2021-05-03 22:48:25 UTC
Created attachment 296617 [details]
acpidump
Comment 23 Austin Kilgore 2021-05-03 22:56:45 UTC
Created attachment 296619 [details]
facp.dsl
Comment 24 Austin Kilgore 2021-05-03 23:06:05 UTC
Created attachment 296621 [details]
fwts version
Comment 25 Austin Kilgore 2021-05-07 01:51:24 UTC
I came across this https://github.com/torvalds/linux/blob/master/drivers/platform/x86/i2c-multi-instantiate.c and I was wondering if that's what mine needs? It sounds like it's for devices that use multiple hids on i2c, which is in line with what my laptop has (touchscreen, touchpad, and accel all on i2c).

I could be wrong here too I'm just trying to dig deeper into what it might be.
Comment 26 mario.limonciello 2021-05-07 02:13:07 UTC
I don't think that's likely the case.

I've looked at a few other designs ACPI tables and noted they have the GPIO config set as PullNone not PullUp.  I wonder if this can possibly be a mistake in the ACPI tables.

Are you comfortable with experimenting with a DSDT modification that you load into your initrd?  If so, can you please try to change the _CRS for your touchpad device (TPDB) from PullUp to PullNone?

From:

```
                    GpioInt (Level, ActiveLow, ExclusiveAndWake, PullUp, 0x0000,
                        "\\_SB.GPIO", 0x00, ResourceConsumer, ,
                        )
                        {   // Pin list
                            0x0055
                        }
```

To:

```
                    GpioInt (Level, ActiveLow, ExclusiveAndWake, PullNone, 0x0000,
                        "\\_SB.GPIO", 0x00, ResourceConsumer, ,
                        )
                        {   // Pin list
                            0x0055
                        }
```

See if that that still works at all and if so, does it remain stable?
Comment 27 Austin Kilgore 2021-05-07 02:20:26 UTC
(In reply to mario.limonciello from comment #26)
> I don't think that's likely the case.
> 
> I've looked at a few other designs ACPI tables and noted they have the GPIO
> config set as PullNone not PullUp.  I wonder if this can possibly be a
> mistake in the ACPI tables.
> 
> Are you comfortable with experimenting with a DSDT modification that you
> load into your initrd?  If so, can you please try to change the _CRS for
> your touchpad device (TPDB) from PullUp to PullNone?
> 
> From:
> 
> ```
>                     GpioInt (Level, ActiveLow, ExclusiveAndWake, PullUp,
> 0x0000,
>                         "\\_SB.GPIO", 0x00, ResourceConsumer, ,
>                         )
>                         {   // Pin list
>                             0x0055
>                         }
> ```
> 
> To:
> 
> ```
>                     GpioInt (Level, ActiveLow, ExclusiveAndWake, PullNone,
> 0x0000,
>                         "\\_SB.GPIO", 0x00, ResourceConsumer, ,
>                         )
>                         {   // Pin list
>                             0x0055
>                         }
> ```
> 
> See if that that still works at all and if so, does it remain stable?

Sure I don't mind trying that out. I might need some help with understanding the process though since I've never done that before, could you explain a bit of what I'll need to do or point me to an article that explains it?

I found https://www.kernel.org/doc/html/latest/admin-guide/acpi/initrd_table_override.html but I don't this part: "Add the raw ACPI tables to an uncompressed cpio archive". Maybe it's a bit dated too?
Comment 28 mario.limonciello 2021-05-07 02:44:36 UTC
The way to accomplish that is distribution dependent, so maybe look for some more distribution localized steps for including a DSDT in the initrd.

At one point this also worked to load it from GRUB directly without a kernel recompile:
https://blog.michael.kuron-germany.de/2011/03/patching-dsdt-in-recent-linux-kernels-without-recompiling/

You can confirm whichever method you performed was successful by dumping and comparing the values are what you put in (or if things happen to work and don't break again ;))
Comment 29 Hans de Goede 2021-05-07 09:46:25 UTC
Hi Austin,

(In reply to Austin Kilgore from comment #25)
> I came across this
> https://github.com/torvalds/linux/blob/master/drivers/platform/x86/i2c-multi-
> instantiate.c and I was wondering if that's what mine needs?

No that is for something completely difference sometimes a Device("name") {} entry in the ACPI table describes multiple I2C connected chips in a single Device entry rather then using one Device entry per I2C connected chips. The file you link to deals with this special case and this is not relevant to your laptop since it has one ACPI Device entry per I2C connected device (one for each of the touchpad and touchscreen).

The HID in the case of that code stands for _H_ardware-ID rather then for Human Input Device, this is confusing I know.
Comment 30 Hans de Goede 2021-05-07 09:53:23 UTC
Here is a good guide on how to prepare an initrd with a DSDT overlay:

https://www.kernel.org/doc/html/latest/admin-guide/acpi/initrd_table_override.html

Once you have prepared the initrd, then the easiest way to make grub load it before it loads the actual initrd is to run:

sudo grub2-editenv - set early_initrd="initrd-filename.img"

At least that is how it works under Fedora. You can check if this has worked by doing:

"dmesg | grep DSDT" after rebooting, this should result in a message aboutn the DSDT being overridden.


Another option is to set:

GRUB_EARLY_INITRD_LINUX_CUSTOM="initrd-filename.img"

in /etc/default/grub

And then rerun grub2-mkconfig to generate the grub2.cfg (or grub2-efi.cfg) file. But not that this overrides your old grub.cfg, which was normally also autogenerated so this should be fine, but still...
Comment 31 Austin Kilgore 2021-05-14 05:25:00 UTC
Alright I think I did everything correctly.
Here's what I did:

cat /sys/firmware/acpi/tables/DSDT > dsdt.dat
iasl -d dsdt.dat

I edited the dsdt.dsl to change PullUp to PullNone in the two spots it's used.

I then tried to compile the dsdt with iasl -tc dsdt.dsl but I got an error and a bunch of warnings:

dsdt.dsl   9880:                             Case (Zero)
Error    6022 -         Case value already specified ^ 

    Original Case value below:
    dsdt.dsl   9766:                             Case (Zero)

I decided to try removing that function altogether since nothing else I did would work.
Removing it lead to more warnings:

Compilation successful. 0 Errors, 59 Warnings, 189 Remarks, 135 Optimizations

Since the error was gone and it compiled, I proceeded to add it into my kernel config and compile a custom kernel with it. 
The kernel built fine and I rebooted.
After booting into the new kernel I checked dmesg and saw:

[    0.005321] ACPI: Override [DSDT-CB-01   ], this is unsafe: tainting kernel

I'm guessing now it's just a matter of time to see if the problem still happens.
Comment 32 Austin Kilgore 2021-05-27 16:38:46 UTC
I haven't noticed the problem since correcting the acpi table so feel free to close this off if you want.
Comment 33 Hans de Goede 2021-05-27 17:56:14 UTC
(In reply to Austin Kilgore from comment #32)
> I haven't noticed the problem since correcting the acpi table so feel free
> to close this off if you want.

Thank you for testing this; and great to hear that this fixes things for you, but this is not really a proper fix for all the other users impacted by this.

A while ago the amd-pinctrl driver was modified to not touch the glitch-filter settings on pins used for edge-triggered interrupts, because touching those was breaking interrupts from the touchpad. I wonder if we need to do something similar for the pull-up/down settings for GpioInt resources.

I'm getting the feeling that the Windows AMD GPIO driver relies on the BIOS having setup settings like these and it does not touch them, which means that we (Linux) should be doing the same to avoid bugs like this one.

Austin, I can whip up a hackish kernel-patch which just disables all re-programming of the pull-up/down bits in the amd-pinctrl driver. Would you be able and willing to build a kernel with such a patch and test that as a workaround instead of using the DSDT overlay?
Comment 34 Austin Kilgore 2021-05-27 18:07:05 UTC
(In reply to Hans de Goede from comment #33)
> (In reply to Austin Kilgore from comment #32)
> > I haven't noticed the problem since correcting the acpi table so feel free
> > to close this off if you want.
> 
> Thank you for testing this; and great to hear that this fixes things for
> you, but this is not really a proper fix for all the other users impacted by
> this.
> 
> A while ago the amd-pinctrl driver was modified to not touch the
> glitch-filter settings on pins used for edge-triggered interrupts, because
> touching those was breaking interrupts from the touchpad. I wonder if we
> need to do something similar for the pull-up/down settings for GpioInt
> resources.
> 
> I'm getting the feeling that the Windows AMD GPIO driver relies on the BIOS
> having setup settings like these and it does not touch them, which means
> that we (Linux) should be doing the same to avoid bugs like this one.
> 
> Austin, I can whip up a hackish kernel-patch which just disables all
> re-programming of the pull-up/down bits in the amd-pinctrl driver. Would you
> be able and willing to build a kernel with such a patch and test that as a
> workaround instead of using the DSDT overlay?

Sure, I wouldn't mind testing it.
Comment 35 Coiby 2021-05-28 23:11:57 UTC
(In reply to Andy Shevchenko from comment #19)
> OTOH, added Coiby. Coiby, if you think it's wrong, tell me, I'll remove you
> from the Cc list.

Thanks for adding me to the Cc list! I've been watching this thread but couldn't find a clue. So I haven't replied until knowing there is a workaround for the problem.
Comment 36 Coiby 2021-05-28 23:25:07 UTC
(In reply to Hans de Goede from comment #33)
> (In reply to Austin Kilgore from comment #32)
> > I haven't noticed the problem since correcting the acpi table so feel free
> > to close this off if you want.
> 
> Thank you for testing this; and great to hear that this fixes things for
> you, but this is not really a proper fix for all the other users impacted by
> this.
> 
> A while ago the amd-pinctrl driver was modified to not touch the
> glitch-filter settings on pins used for edge-triggered interrupts, because
> touching those was breaking interrupts from the touchpad. I wonder if we
> need to do something similar for the pull-up/down settings for GpioInt
> resources.
> 
> I'm getting the feeling that the Windows AMD GPIO driver relies on the BIOS
> having setup settings like these and it does not touch them, which means
> that we (Linux) should be doing the same to avoid bugs like this one.
> 
> Austin, I can whip up a hackish kernel-patch which just disables all
> re-programming of the pull-up/down bits in the amd-pinctrl driver. Would you
> be able and willing to build a kernel with such a patch and test that as a
> workaround instead of using the DSDT overlay?

Hi Hans,

There are three other amd-pinctrl related touchpad bugs that I haven't figured out a workaround or the root cause [1] [2] [3]. If you think they are related to this bug, you can Cc me the patch. I'll contact the bug reporters to test your patch.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1887190 
[2] https://forum.manjaro.org/t/random-short-touchpad-freezes/30832/6
[3] https://forum.manjaro.org/t/elan-touchpad-working-in-live-but-not-in-native-os/31860/50
Comment 37 Coiby 2021-05-29 17:29:43 UTC
(In reply to Coiby from comment #36)
> (In reply to Hans de Goede from comment #33)
> > (In reply to Austin Kilgore from comment #32)
> > > I haven't noticed the problem since correcting the acpi table so feel
> free
> > > to close this off if you want.
> > 
> > Thank you for testing this; and great to hear that this fixes things for
> > you, but this is not really a proper fix for all the other users impacted
> by
> > this.
> > 
> > A while ago the amd-pinctrl driver was modified to not touch the
> > glitch-filter settings on pins used for edge-triggered interrupts, because
> > touching those was breaking interrupts from the touchpad. I wonder if we
> > need to do something similar for the pull-up/down settings for GpioInt
> > resources.
> > 
> > I'm getting the feeling that the Windows AMD GPIO driver relies on the BIOS
> > having setup settings like these and it does not touch them, which means
> > that we (Linux) should be doing the same to avoid bugs like this one.
> > 
> > Austin, I can whip up a hackish kernel-patch which just disables all
> > re-programming of the pull-up/down bits in the amd-pinctrl driver. Would
> you
> > be able and willing to build a kernel with such a patch and test that as a
> > workaround instead of using the DSDT overlay?
> 
> Hi Hans,
> 
> There are three other amd-pinctrl related touchpad bugs that I haven't
> figured out a workaround or the root cause [1] [2] [3]. If you think they
> are related to this bug, you can Cc me the patch. I'll contact the bug
> reporters to test your patch.
> 
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1887190 
> [2] https://forum.manjaro.org/t/random-short-touchpad-freezes/30832/6
> [3]
> https://forum.manjaro.org/t/elan-touchpad-working-in-live-but-not-in-native-
> os/31860/50

Btw, [2] has PullUP PinConfig and [3] has PullNone PinConfig. So maybe it's worth looking at [2].

Sorry [1] is inaccurate, I mean [comment #411] (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1887190/comments/411). I have sent an message to that comment's author to ask for the ACPI DSDT.
Comment 38 Hans de Goede 2021-05-30 11:31:20 UTC
Coiby, thank you for your efforts to track all these issues. Note that not all of these are necessary amd_pinctrl related, e.g. this one:
https://forum.manjaro.org/t/elan-touchpad-working-in-live-but-not-in-native-os/31860/50

Sounds like it is something different, as I've explained in a long comment on some other bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1868899#c5

There are multiple root-causes for touchpads not working on various Lenovo models.

Note I've not even listed this bug in the list of known root causes there, because here the touchpad does work initially and then stops working, rather then not being recognized at all. I suspect that:
https://forum.manjaro.org/t/elan-touchpad-working-in-live-but-not-in-native-os/31860/50

Is caused by root-cause 3. or 4. from the linked comment.
Comment 39 Hans de Goede 2021-05-30 11:42:18 UTC
Austin, I just noticed 2 things:

1. The amd-pinctrl driver has debugfs support which should be able to get us some info about the pull-up/down settings with and without the DSDT overlay.

2. I've noticed something fishy wrt pull-up settings in the amd-pinctrl driver and I suspect that what is happening here is that when the kernel tries to enable the pull-up it actually ends up disabling it. Setting the ACPI pull config to none results in the kernel not touching things and thus not messing up. More about this in a separate comment.

Can you please do the following:

cat /sys/kernel/debug/pinctrl/<something>/pins > pins-with-dsdt-overlay

And then (temporirily) disable the dsdt-overlay and do it again:

cat /sys/kernel/debug/pinctrl/<something>/pins > pins-without-dsdt-overlay

With the <something> in both cases likely being in the form of AMDI####, and then attach both files here.
Comment 40 Hans de Goede 2021-05-30 11:51:03 UTC
So as said I believe that the amd-pinctrl driver will actually end up disabling the pull-up when setting it. The reason is this bit of code:

amd_pinconf_set() {
...
                case PIN_CONFIG_BIAS_PULL_UP:
                        pin_reg &= ~BIT(PULL_UP_SEL_OFF);
                        pin_reg |= (arg & BIT(0)) << PULL_UP_SEL_OFF;
                        pin_reg &= ~BIT(PULL_UP_ENABLE_OFF);
                        pin_reg |= ((arg>>1) & BIT(0)) << PULL_UP_ENABLE_OFF;
                        break;

...
}

Notice there are 2 bits, one at PULL_UP_ENABLE_OFF and one at PULL_UP_SEL_OFF.

The PULL_UP_SEL_OFF bit selects if the pull-up is 4k or 8k *if enabled* and the PULL_UP_ENABLE_OFF bit enables/disables the pull-up.

But now take a good look at what the code does when the value of the arg variable is "1". In this case it sets the PULL_UP_SEL_OFF bit to 1 (8k) while leaving the PULL_UP_ENABLE_OFF bit at 0, so even though the arg is 1, the pull-up is being disabled.

I suspect that since the ACPI code ends up specifying 1 as arg and that that results in actually turning the pull-up off (while it was already enabled by the BIOS, which is what makes have a setting of none work).

I suspect this code needs to be changes to something like this:

                case PIN_CONFIG_BIAS_PULL_UP:
                        pin_reg &= ~(BIT(PULL_UP_ENABLE_OFF)|BIT(PULL_UP_SEL_OFF));
                        if (arg)
                                pin_reg |= BIT(PULL_UP_ENABLE_OFF);
                        if (arg > 1)
                                pin_reg |= BIT(PULL_UP_SEL_OFF);
                        break;

And the corresponding amd_pinconf_get() function needs to be updated accordingly.
Comment 41 Hans de Goede 2021-05-30 12:56:08 UTC
So I just checked and drivers/gpio/gpiolib.c: gpio_set_bias() does a set pin config with a pin_config_param of PIN_CONFIG_BIAS_PULL_UP and arg = 1 as I suspected.

So I'm pretty sure that that is what is breaking things in this case, and likely is causing a whole bunch of issues on AMD platforms.

I'll prepare a patch fixing this.
Comment 42 Hans de Goede 2021-05-30 13:41:23 UTC
>
> https://forum.manjaro.org/t/elan-touchpad-working-in-live-but-not-in-native-os/31860/50

About this issue, I just realized that I know what is the likely problem here. The issue is likely the amd-pinctrl is being build as a module, and if i2c-hid then loads before amd-pinctrl then things won't work. The trick is to have amd-pinctrl build into the kernel (and maybe also to have i2c-hid be a module).
Comment 43 Hans de Goede 2021-05-30 14:10:29 UTC
Created attachment 297045 [details]
[PATCH] pinctrl: amd: Fix handling of PIN_CONFIG_BIAS_PULL_UP/_DOWN settings

Here is a patch which I believe should fix this.

Austin if you can give this a try (without the DSDT override) that would be great. Note please first collect the 2 debugfs pinctrl files as mentioned before, I would still like to see those to confirm my theory.
Comment 44 Austin Kilgore 2021-05-30 17:05:26 UTC
So the pinctrl file lists all the gpio pins. And I checked just now and both with and without the dsdt overlay the same pins are listed in that file. Perhaps pinctrl would change when the problem occurs? I'm not sure.

But based on the files I uploaded previously, that is /proc/interrupts, even when the problem happened those pins were unchanged, namely these:

amd_gpio   90  WCOM51C7:00
amd_gpio   85  ELN4690:00
amd_gpio   86  BMA250E:00

They were consistently on those gpio pins and showed up in the /proc/interrupts both when they functioned and when they stopped working.

But if you want me to, I can see what the pinctrl does when they stop working.

I'll be sure to also test your kernel patch and let you know how that goes.
Comment 45 Austin Kilgore 2021-05-30 17:06:12 UTC
Created attachment 297049 [details]
pins with dsdt overlay
Comment 46 Austin Kilgore 2021-05-30 17:07:03 UTC
Created attachment 297051 [details]
pins without dsdt overlay (identical)
Comment 47 Hans de Goede 2021-05-30 17:31:50 UTC
(In reply to Austin Kilgore from comment #46)
> pins without dsdt overlay (identical)

My bad, I was expecting the debug info from the amd_gpio_dbg_show function to show up there, because that is where most of the interesting info is with the Intel pinctrl drivers. But I see now that the info which I'm looking for is actually present in /sys/kernel/debug/gpio. So if you can capture the contents of that file both with and without the DSDT overlay then that will hopefully give us some more interesting results.  Although I'm afraid that GPIOs used as interrupts don't show up there, still worth a try though.

Even without the debug info which I was hoping for I'm still pretty confident that I've found the issue. So I hope that you will have some good news when you report back from testing the kernel patch.
Comment 48 Austin Kilgore 2021-05-30 17:58:51 UTC
So this is weird, both of these files are also the same.

The way I have been switching between the dsdt overlay was by just creating a second kernel which I removed the dsdt overlay file from.

So I have 5.12.8-gentoo (no dsdt) and 5.12.8-gentoo.old (with dsdt overlay).

To be absolutely sure I was successful I also grepped the configs in /boot to make sure they had been changed correctly, as well as checking the currently system's config at /proc/config.gz. So I've been through.
Comment 49 Austin Kilgore 2021-05-30 17:59:38 UTC
Created attachment 297053 [details]
gpio with dsdt overlay
Comment 50 Austin Kilgore 2021-05-30 18:00:15 UTC
Created attachment 297055 [details]
gpio without dsdt overlay (identical)
Comment 51 Hans de Goede 2021-05-30 19:59:01 UTC
Did you perhaps also put the DSDT in the initrd in an earlier attempt?

What does "dmesg | grep DSDT" say with both kernels ? there is a clear msg in dmesg about the DSDT being overridden.

This is weird, I wonder what the contents of /sys/kernel/debug/gpio is going to look like once you are done building a kernel with my patch.
Comment 52 Austin Kilgore 2021-05-30 21:32:31 UTC
(In reply to Hans de Goede from comment #51)
> Did you perhaps also put the DSDT in the initrd in an earlier attempt?
> 
> What does "dmesg | grep DSDT" say with both kernels ? there is a clear msg
> in dmesg about the DSDT being overridden.
> 
> This is weird, I wonder what the contents of /sys/kernel/debug/gpio is going
> to look like once you are done building a kernel with my patch.

No sir, my only method of loading the dsdt file was using this feature in the kernel config: "Custom DSDT Table file to include" and I put the path to the dsdt.hex there.

And I checked the kernels and only the one with the overlay had this dmesg message "[DSDT-CB-01   ], this is unsafe: tainting kernel".

I've got the kernel with your patch built and am testing it out to see if the problem is fixed still.

I'll include the gpio file from it since I compared hashes with the previous 2 and it said it was different, I didn't look super close on its contents but there may be something important there.
Comment 53 Austin Kilgore 2021-05-30 21:33:02 UTC
Created attachment 297059 [details]
gpio with patch
Comment 54 Hans de Goede 2021-05-31 09:51:02 UTC
(In reply to Austin Kilgore from comment #53)
> Created attachment 297059 [details]
> gpio with patch

Thank you. The only difference is one unrelated pin reading as high instead of low, which is just a coincidence.

This is weird. I know you are being pretty thorough with all your tests, still I have to ask, did you perhaps still have the DSDT overlay active while testing with the patch ?

I'm going to attach another patch here, can you add that on top of the previous patch and then after booting a kernel with both patches, collect dmesg and attach it here please?
Comment 55 Hans de Goede 2021-05-31 09:51:35 UTC
Created attachment 297067 [details]
[PATCH] GPIOLIB/AMD-pinctrl: add some debugging
Comment 56 Hans de Goede 2021-05-31 10:01:48 UTC
Ok, never mind, I just found that the path in pinctrl-amd.c which hooks set_config calls from the GPIO subsystem has this:

static int amd_gpio_set_config(struct gpio_chip *gc, unsigned offset,
                               unsigned long config)
{
        u32 debounce;

        if (pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE)
                return -ENOTSUPP;

        debounce = pinconf_to_config_argument(config);
        return amd_gpio_set_debounce(gc, offset, debounce);
}

So pull-up/down requests coming from the GPIO subsystem are ignored, only when they are made through pinctrl API calls will they be honored.

So no need to do another kernel build with the patch adding debugging, because this whole line of thought is going nowhere.

This does beg the question though, why does changing the pull setting in the DSDT help at all ?

Can you provide "diff -u" output between your unpatched DSDT and the patched one?
Comment 57 Austin Kilgore 2021-05-31 19:21:54 UTC
(In reply to Hans de Goede from comment #55)
> Created attachment 297067 [details]
> [PATCH] GPIOLIB/AMD-pinctrl: add some debugging

I did want to go ahead and mention that I didn't have the dsdt overlay on the patched kernel when I tested it.

I do find it odd that there hasn't been a clear indication as to what's being changed in the kernel after editing those two lines in my dsdt for PullUp and switching them to PullNone.

My understanding (which is pretty limited) is that PullUp/PullNone directly affects the state on those gpio pins that the three i2c devices are connected on. And their state is being triggered off by some unknown third party whenever they are set to PullUp.
But while set to PullNone they remain unaffected by that third party.

What that third party is still a mystery to me. I initially guessed it was time-based, and that left on long enough something would cause the pins to be turned off. This is definitely a tough problem to pin down.

I tried diffing the two dsdt.dsl files but it just said "Binary files dsdt-pullup.dat and dsdt-pullnone.dat differ" I'll upload them so you can take a look if you want.
Comment 58 Austin Kilgore 2021-05-31 19:22:51 UTC
Created attachment 297083 [details]
dsdt files
Comment 59 Andy Shevchenko 2021-06-04 17:28:43 UTC
(In reply to Hans de Goede from comment #56)

...

> This does beg the question though, why does changing the pull setting in the
> DSDT help at all ?

Pull bias settings affects the initial pin state of the GPIO when it's output.
It's explained here:
https://www.kernel.org/doc/html/latest/firmware-guide/acpi/gpio-properties.html#dsd-device-properties-related-to-gpio
Comment 60 Hans de Goede 2021-06-04 17:43:31 UTC
(In reply to Andy Shevchenko from comment #59)
> (In reply to Hans de Goede from comment #56)
> 
> ...
> 
> > This does beg the question though, why does changing the pull setting in
> the
> > DSDT help at all ?
> 
> Pull bias settings affects the initial pin state of the GPIO when it's
> output.

I know that, but we are talking about GpioInt resources here, not outputs.
Comment 61 Hans de Goede 2021-06-04 17:47:13 UTC
(In reply to Austin Kilgore from comment #58)
> Created attachment 297083 [details]
> dsdt files

I meant do a diff -u between the source (dsl) files. With that said, I've disassembled the 2 DSDTs and a diff -u on the resulting dsl-s looks weird:

--- original-pullup-dsdt.dsl	2021-06-04 19:41:14.575128707 +0200
+++ modified-pullnone-dsdt.dsl	2021-06-04 19:41:03.274066000 +0200
@@ -5,13 +5,13 @@
  * 
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of original-pullup-dsdt.dat, Fri Jun  4 19:41:14 2021
+ * Disassembly of modified-pullnone-dsdt.dat, Fri Jun  4 19:41:03 2021
  *
  * Original Table Header:
  *     Signature        "DSDT"
- *     Length           0x00009E20 (40480)
+ *     Length           0x00009E26 (40486)
  *     Revision         0x01 **** 32-bit table (V1), no 64-bit math support
- *     Checksum         0xE7
+ *     Checksum         0x84
  *     OEM ID           "LENOVO"
  *     OEM Table ID     "CB-01   "
  *     OEM Revision     0x00000001 (1)
@@ -5911,8 +5911,7 @@
             UT2E,   1, 
                 ,   1, 
             EMMD,   2, 
-            SDMC,   1, 
-                ,   2, 
+                ,   3, 
             XHCE,   1, 
                 ,   1, 
                 ,   1, 
@@ -7064,6 +7063,12 @@
                 0xFEDD5000,         // Address Base
                 0x00001000,         // Address Length
                 )
+            GpioInt (Edge, ActiveLow, SharedAndWake, PullUp, 0x0BB8,
+                "\\_SB.GPIO", 0x00, ResourceConsumer, ,
+                )
+                {   // Pin list
+                    0x0044
+                }
         })
         Name (AHID, "AMDI0040")
         Name (ACID, "AMDI0040")
@@ -7192,11 +7197,6 @@
                 {
                     If ((EMMD == One))
                     {
-                        If ((SDMC == One))
-                        {
-                            Return (Zero)
-                        }
-
                         Return (One)
                     }
 
@@ -10390,14 +10390,9 @@
                 Local0 |= 0x0200
                 If (HOTM)
                 {
-                    Local0 &= 0x0BFF
-                }
-                Else
-                {
                     Local0 |= 0x0400
                 }
 
-                Local0 |= 0x0800
                 Local0 &= 0xEFFF
                 Local0 |= 0x4000
                 If (UCBM)
@@ -10442,15 +10437,15 @@
                 If ((Arg0 == 0x0D)) {}
                 If ((Arg0 == 0x0E))
                 {
-                    HOTM = Zero
-                    WXMS (0x71, 0x20)
+                    HOTM = One
+                    WXMS (0x71, 0x10)
                     WXMS (0x70, 0x55)
                 }
 
                 If ((Arg0 == 0x0F))
                 {
-                    HOTM = One
-                    WXMS (0x71, 0x10)
+                    HOTM = Zero
+                    WXMS (0x71, 0x20)
                     WXMS (0x70, 0x55)
                 }
 

This does not seem to include the PullUp -> None changes and this does include a bunch of other changes. Also the revision of the modified is not bumped up, so it won't work as an override; instead I guess it might end up getting installed as a second DSDT (which will be ignored AFAIK).

Austin, can you provide the output of dmesg after booting the kernel with the DSDT overlay, I would like to check the exact messages related to the DSDT overlay being printend.

Also can you attach 2 acpidumps, 1 with a standard kernel and 1 while running the kernel with the overlay?
Comment 62 Andy Shevchenko 2021-06-04 18:00:55 UTC
(In reply to Hans de Goede from comment #60)
> (In reply to Andy Shevchenko from comment #59)
> > (In reply to Hans de Goede from comment #56)

...

> > > This does beg the question though, why does changing the pull setting in
> > the
> > > DSDT help at all ?
> > 
> > Pull bias settings affects the initial pin state of the GPIO when it's
> > output.
> 
> I know that, but we are talking about GpioInt resources here, not outputs.

Ah, okay. In this case it is only related to hardware behaviour then when GPIO subsystem tries to setup that (but IIUC it should be not the case according to the code in one of the above comments).
Comment 63 Austin Kilgore 2021-06-05 11:52:15 UTC
Created attachment 297175 [details]
acpidumps and dmesg logs
Comment 64 Austin Kilgore 2021-06-05 11:57:43 UTC
I made sure to get those dumps and logs from my pre-patch kernels (dtsdt overlay and the non-dsdt overlay kernel).
Comment 65 Hans de Goede 2021-06-21 14:41:08 UTC
(In reply to Austin Kilgore from comment #64)
> I made sure to get those dumps and logs from my pre-patch kernels (dtsdt
> overlay and the non-dsdt overlay kernel).

Sorry for being so slow to respond.

The 2 acpidumps from the acpidumps-and-dmesg-logs.tar.gz are the same, this seems to have to do with the way you are overriding the DSDT, when using the initrd method the modified DSDT shows up in the acpidump.

FWIW the dsdt.dat from the acpidump is identical to the original-pullup-dsdt.dat from dsdt-files.tar.gz.

So instead I've compared the modified-pullnone-dsdt.dat from dsdt-files.tar.gz with the dsdt.dat from the acpidump, yielding the same results as when I compared the 2 .dat files from dsdt-files.tar.gz (which is expected since dsdt.dat is identical to original-pullup-dsdt.dat), see comment 61.

The reason why I asked for the acpidumps is because the DSDT diff from comment 61 looks suspect, there are some changes not related to pull-up settings at all and there are no actual pull-up changes at all.

So I've checked the length in the dmesg, dmesg-without-dsdt.txt says:

ACPI: DSDT 0x00000000AF7E5000 009E20 (v01 LENOVO CB-01    00000001 ACPI 00040000)

Notice the 009E20 this match with the length of the unmodified DSDT from the acpidump / with original-pullup-dsdt.dat.

But dmesg-dsdt.txt has:
ACPI: DSDT 0xFFFFFFFFB3FB4100 009E77 (v01 LENOVO CB-01    00000001 INTL 20200717)

With a length of 009E77, but modified-pullnone-dsdt.dat has a length of 009E26, so modified-pullnone-dsdt.dat is not the DSDT which you are actually using as override.

Austin, can you please attach the DSDT which you are actually using as override here ?

###

Also given that we have already concluded that the pinctrl-amd.c ignores the pull-up setting I wonder if things are fixed by some other unrelated changed which happened around the same time. Can you try running an identical kernel as the one which you are using with the override (minus the override) and see if the problem reproduces in that case?
Comment 66 Austin Kilgore 2021-06-24 20:01:40 UTC
I have a directory in / that contains the overlay, its full path is /dsdt/dsdt.hex.

/dsdt/dsdt.hex is what I entered in the kernel config "Custom DSDT Table file to include" section.

I'll upload that file for you. I find it odd that the other ones are coming up as identical after all I did to double-check my work. I'll definitely look back at those and see what I did wrong.

As per your last request I have just built a 5.12.13 kernel using my regular config but without the patch or the dsdt overlay to confirm the problem still exists.
Comment 67 Austin Kilgore 2021-06-24 20:03:43 UTC
Created attachment 297595 [details]
dsdt.hex used in kernel as override
Comment 68 Austin Kilgore 2021-07-11 02:16:01 UTC
So far it looks like the newer kernels have been fixed.

I'll try some older ones to see which version changed it.
Comment 69 Nehal Shah 2021-07-11 02:16:15 UTC
Created attachment 297787 [details]
attachment-19114-0.html


Hi

I am out of office .

I will have no access /limited access to mail.For emergency please contact YA Naveen.


Thanks and Regards
Nehal
Comment 70 Austin Kilgore 2021-07-13 23:11:40 UTC
Okay so I just had the problem occur on 5.13.1 same config and everything.

Either it was never corrected or it reverted somewhere in the code recently.

It's been hard for me to test it out in detail because I've got a full-time job now.

I think I'll just go back to patching the kernel when I build it since that fixes it for me.

Note You need to log in before you can comment on or make changes to this bug.