Bug 216426

Summary: USB-C port is incorrectly reporting that it's powered when Dell XPS 15-9500 is unplugged
Product: Platform Specific/Hardware
Reporter: Mattia Orlandi (mattia.orlandi21)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: NEW
Severity: normal
CC: answer2019, b.terrier, benoitg, bugzilla, grzegorz.alibozek, heikki.krogerus, noah, postix, pugonfireyt
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.19.4
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
- dmesg output
- dmesg output
- acpidump after plugging
- acpidump after blacklisting ucsi driver
- dmesg output after blacklisting ucsi driver

Description Mattia Orlandi 2022-08-29 17:04:21 UTC
System
------
- Dell XPS 15 9500
- 5.19.4 kernel

Problem description
-------------------
Whenever I plug and then unplug my laptop from AC power using the USB-C port, the system thinks it is still plugged in (i.e., the KDE applet reports "Plugged in but still discharging").
If I check in Dell's BIOS, it correctly reports when the power supply is plugged/unplugged; `acpi -V` also correctly shows `Adapter 0: off-line`.

On the other hand, `upower -d` incorrectly reports `/org/freedesktop/UPower/devices/line_power_ucsi_source_psy_USBC000o002` as `online: yes`.
Moreover, `journalctl` reports `ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)`.

I'm testing the LTS kernel (5.15.63) and the issue does not occur, so I assume it's a regression bug, possibly introduced in kernel 5.18 (I tried downgrading the kernel to version 5.18.16 and the issue was already present).
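The number in parentheses in `GET_CONNECTOR_STATUS failed (-110)` is a negated errno value. On Linux, 110 is ETIMEDOUT, meaning the command completion wait timed out. The `UCSI_GET_PDOS failed (-5)` message reported later in this bug is -EIO. The helper below pulls the number out of such a line and decodes it with strerror(). It is a small sketch: the "failed (-N)" pattern is only assumed from the two messages quoted in this report, and both function names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract N from a log line containing "failed (-N)".
 * Returns N as a positive errno value, or 0 if the pattern is absent. */
int parse_failed_errno(const char *line)
{
    const char *marker = "failed (-";
    const char *p = strstr(line, marker);
    if (p == NULL)
        return 0;
    p += strlen(marker);

    char *end;
    long val = strtol(p, &end, 10);
    if (end == p || *end != ')')
        return 0;                 /* no digits, or not closed by ')' */
    return (int)val;
}

/* Print the C library's description of the errno found in 'line'. */
void explain_log_line(const char *line)
{
    int err = parse_failed_errno(line);
    if (err == 0)
        printf("no errno found\n");
    else
        printf("-%d = %s\n", err, strerror(err));
}
```

With glibc, calling explain_log_line() on the ucsi_acpi message above prints the ETIMEDOUT description ("Connection timed out").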
Comment 1 Bastien Nocera 2022-09-02 11:16:51 UTC
Duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=210425
Comment 2 Benjamin Terrier 2022-09-02 11:35:21 UTC
Is it really a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=210425?

On the XPS 15 9500, the issue occurs even when the system is not suspended.
Comment 3 Mattia Orlandi 2022-09-02 12:57:43 UTC
I'm quite sure it's not a duplicate of that bug. First of all, the problem on the XPS 9500 occurs right after booting the OS, regardless of suspension. Moreover, from what I've read, the bug you linked affects kernel 5.15, whereas that kernel works fine for me.
Comment 4 Heikki Krogerus 2022-09-05 08:10:04 UTC
Bug 210425 seems to be a symptom of this one, at least in some cases.

Can somebody test whether increasing the command completion timeout value helps?

diff --git a/drivers/usb/typec/ucsi/ucsi_acpi.c b/drivers/usb/typec/ucsi/ucsi_acpi.c
index 8873c1644a295..804b45249c46b 100644
--- a/drivers/usb/typec/ucsi/ucsi_acpi.c
+++ b/drivers/usb/typec/ucsi/ucsi_acpi.c
@@ -78,7 +78,7 @@ static int ucsi_acpi_sync_write(struct ucsi *ucsi, unsigned int offset,
        if (ret)
                goto out_clear_bit;
 
-       if (!wait_for_completion_timeout(&ua->complete, HZ))
+       if (!wait_for_completion_timeout(&ua->complete, 5 * HZ))
                ret = -ETIMEDOUT;
 
 out_clear_bit:
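In the kernel, `HZ` is the number of jiffies in one second. The patch therefore raises the wait in ucsi_acpi_sync_write() from one second to five. A longer timeout can only help if the firmware eventually answers. If the notification never arrives, the same -ETIMEDOUT (-110) is returned, just later.

The userspace sketch below illustrates that. It is a hypothetical model, not kernel code. A plain flag polled once per millisecond stands in for the sleeping kernel completion, and `fake_completion` and `wait_done_timeout` are invented names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical stand-in for a kernel completion: a flag that the
 * notification handler would set once the firmware answers. */
struct fake_completion {
    volatile int done;
};

/* Milliseconds elapsed on the monotonic clock since *start. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll every millisecond until the flag is set or timeout_ms expires.
 * Returns 0 on completion and -ETIMEDOUT otherwise, the same mapping
 * ucsi_acpi_sync_write() applies when wait_for_completion_timeout()
 * returns 0. */
int wait_done_timeout(struct fake_completion *c, long timeout_ms)
{
    const struct timespec tick = { 0, 1000000L };  /* 1 ms */
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (!c->done) {
        if (elapsed_ms(&start) >= timeout_ms)
            return -ETIMEDOUT;
        nanosleep(&tick, NULL);
    }
    return 0;
}
```

If the flag is never set, increasing timeout_ms changes only how long the caller waits, not the result. That would be consistent with the 5 * HZ patch not helping.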
Comment 5 Grzegorz Alibożek 2022-09-05 08:24:18 UTC
I'll try to check today.
Comment 6 Grzegorz Alibożek 2022-09-05 14:06:13 UTC
Currently, the message only appears on the first undocking.
Comment 7 Grzegorz Alibożek 2022-09-05 14:36:00 UTC
On the first undocking or docking, to be precise. But it looks like the charging status is updating in upowerd.
Comment 8 Mattia Orlandi 2022-09-06 15:54:35 UTC
I've just updated the LTS kernel from 5.15.64 to 5.15.65 and the problem started occurring.
Comment 9 Mattia Orlandi 2022-09-15 09:37:59 UTC
(In reply to Heikki Krogerus from comment #4)
> Bug 210425 seems to be a symptom of this one, at least in some cases.
> 
> Can somebody test does it help if we increase the command completion timeout
> value:
> 
> diff --git a/drivers/usb/typec/ucsi/ucsi_acpi.c
> b/drivers/usb/typec/ucsi/ucsi_acpi.c
> index 8873c1644a295..804b45249c46b 100644
> --- a/drivers/usb/typec/ucsi/ucsi_acpi.c
> +++ b/drivers/usb/typec/ucsi/ucsi_acpi.c
> @@ -78,7 +78,7 @@ static int ucsi_acpi_sync_write(struct ucsi *ucsi,
> unsigned int offset,
>         if (ret)
>                 goto out_clear_bit;
>  
> -       if (!wait_for_completion_timeout(&ua->complete, HZ))
> +       if (!wait_for_completion_timeout(&ua->complete, 5 * HZ))
>                 ret = -ETIMEDOUT;
>  
>  out_clear_bit:

I tried this patch on the latest kernel (5.19.8) but the issue is still present.
Comment 10 Heikki Krogerus 2022-09-15 13:19:24 UTC
(In reply to Mattia Orlandi from comment #9)
> I tried this patch on the latest kernel (5.19.8) but the issue is still
> present.

Thanks. This looks like a firmware issue to me. Do you guys have the latest XPS 9500 BIOS?
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=7nn1r&oscode=wt64a&productcode=xps-15-9500-laptop

If you don't have the latest BIOS, then please upgrade that, but if you do have it, then can you test if downgrading it helps?
Comment 11 Mattia Orlandi 2022-09-15 15:55:27 UTC
(In reply to Heikki Krogerus from comment #10)
> (In reply to Mattia Orlandi from comment #9)
> > I tried this patch on the latest kernel (5.19.8) but the issue is still
> > present.
> 
> Thanks. This looks like a firmware issue to me. Do you guys have the latest
> XPS 9500 BIOS?
> https://www.dell.com/support/home/en-us/drivers/
> driversdetails?driverid=7nn1r&oscode=wt64a&productcode=xps-15-9500-laptop
> 
> If you don't have the latest BIOS, then please upgrade that, but if you do
> have it, then can you test if downgrading it helps?

I installed the newest BIOS (1.18.0; previously I was on 1.17.0), but the issue persists. I don't think it's a firmware issue, because the problem does not occur in Windows, in the BIOS UI, or in Linux 5.15.64.

I have noticed a few things:
- in 5.15.64 (the one which works as expected), the system takes about half a second to "realize" it has been connected to AC, and it takes the same time to "realize" it has been disconnected from AC;
- in 5.19.8 (the one affected by the issue), the system "realizes" almost instantaneously it has been connected to AC, but then it never "realizes" it has been disconnected from AC;
- in both cases, when I plug/unplug the charger `dmesg` shows entries like these:
    ACPI Error: Thread X cannot release Mutex [ECMX] acquired by thread Y (20210730/exmutex-378)
    ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20220331/psparse-529)
  although in 5.19.8 (the one affected by the issue) there are many more of them.
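The comparison above is based on how many ECMX errors each kernel logs. Counting them in saved `dmesg` output is more reliable than eyeballing the log. A minimal sketch follows. It assumes the log has already been read into a string, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Count the lines of a captured log (for example saved `dmesg` output)
 * that contain 'needle' at least once. Lines are separated by '\n'. */
size_t count_matching_lines(const char *log, const char *needle)
{
    size_t count = 0;
    size_t n = strlen(needle);
    const char *line = log;

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);

        /* Search for the needle inside this line only. */
        for (size_t i = 0; n <= len && i <= len - n; i++) {
            if (memcmp(line + i, needle, n) == 0) {
                count++;
                break;
            }
        }
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return count;
}
```

Searching for "cannot release Mutex [ECMX]" in logs from 5.15.64 and 5.19.8, each taken after the same plug/unplug sequence, would give directly comparable numbers.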
Comment 12 Benjamin Terrier 2022-09-16 06:00:41 UTC
(In reply to Heikki Krogerus from comment #10)
> (In reply to Mattia Orlandi from comment #9)
> > I tried this patch on the latest kernel (5.19.8) but the issue is still
> > present.
> 
> Thanks. This looks like a firmware issue to me. Do you guys have the latest
> XPS 9500 BIOS?
> https://www.dell.com/support/home/en-us/drivers/
> driversdetails?driverid=7nn1r&oscode=wt64a&productcode=xps-15-9500-laptop
> 
> If you don't have the latest BIOS, then please upgrade that, but if you do
> have it, then can you test if downgrading it helps?

I've tried every firmware update from Dell as soon as it became available. I also tried older versions down to v1.11.0 (Dell forbids downgrading to v1.10 or earlier). None of them made any difference.

The only thing that seems to have an effect is changing the kernel version.
For instance I have the issue with 5.18.19, but I do not have it with 5.18.17 (or at least, it is harder to reproduce).
Comment 13 Heikki Krogerus 2022-09-16 07:05:20 UTC
Okay, I stand corrected. This is a regression in kernel. Can somebody bisect the problem so we know exactly which commit introduced the regression?

Unfortunately, I cannot reproduce this with the systems that I have.
Comment 14 Mattia Orlandi 2022-09-20 14:31:26 UTC
(In reply to Heikki Krogerus from comment #13)
> Okay, I stand corrected. This is a regression in kernel. Can somebody bisect
> the problem so we know exactly which commit introduced the regression?
> 
> I can not reproduce this with the systems that I have unfortunately.

It must be between 5.18.16 and 5.19. Unfortunately, Arch does not provide packages for 5.18.{17, 18, 19}, so I will have to compile them to find out where the problem starts (probably in 5.18.18 or 5.18.19, as Benjamin pointed out). I will give you more information as soon as possible.
Comment 15 Benjamin Terrier 2022-09-21 06:32:24 UTC
(In reply to Mattia Orlandi from comment #14)
> (In reply to Heikki Krogerus from comment #13)
> > Okay, I stand corrected. This is a regression in kernel. Can somebody
> bisect
> > the problem so we know exactly which commit introduced the regression?
> > 
> > I can not reproduce this with the systems that I have unfortunately.
> 
> It must be between 5.18.16 and 5.19. Unfortunately arch does not provide
> packages for 5.18.{17, 18, 19}, so I will have to compile them to verify
> when the problem starts occurring (probably in 5.18.18 or 5.18.19, as
> Benjamin pointed out). I will give you more information as soon as possible.

I have rebuilt 5.18.17 and 5.18.18 from source.

- 5.18.17 seems fine
- 5.18.18 has the issue
Comment 16 Heikki Krogerus 2022-09-21 07:45:51 UTC
(In reply to Benjamin Terrier from comment #15)
> I have rebuilt 5.18.17 and 5.18.18 from source.
> 
> - 5.18.17 seems fine
> - 5.18.18 has the issue

Thanks. A binary search (git-bisect) would give us the exact commit, but I wonder if it could be this one:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/acpi/ec.c?h=v5.18.18&id=0fbb5ce2f426753c94b74d134de4b71402d7fb93

Can you test whether reverting it helps?
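The binary search mentioned above works on the ordered commits between a known-good and a known-bad kernel (here v5.18.17 and v5.18.18). Each build-and-test roughly halves the remaining range.

Below is a minimal sketch of that search. The callback is hypothetical: in practice it would mean "build this commit, boot, plug and unplug the charger, check upower". `bad_from_array` is only a helper for testing against a precomputed list of results.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test: returns true if the kernel at commit index i is bad. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/* Find the first bad commit in [0, n), assuming n >= 2, commit 0 is known
 * good, commit n-1 is known bad, and the bug stays once introduced.
 * Needs about log2(n) calls to is_bad. */
size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example callback: ctx points to an array of recorded test results. */
bool bad_from_array(size_t i, void *ctx)
{
    return ((const bool *)ctx)[i];
}
```

Bisection only works if every test result is reproducible. Comment 22 later shows that the same kernel can behave differently after a shutdown/boot than after a reboot. That kind of noise can steer a bisection to the wrong commit.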
Comment 17 Mattia Orlandi 2022-09-21 10:59:26 UTC
(In reply to Heikki Krogerus from comment #16)
> (In reply to Benjamin Terrier from comment #15)
> > I have rebuilt 5.18.17 and 5.18.18 from source.
> > 
> > - 5.18.17 seems fine
> > - 5.18.18 has the issue
> 
> Thanks. Binary search (git-bisect) would give us the exact commit, but I
> wonder could it be this one:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> drivers/acpi/ec.c?h=v5.18.18&id=0fbb5ce2f426753c94b74d134de4b71402d7fb93
> 
> Can you test does reverting it help?

I have just tested this and I can confirm that it seems to work now. Benjamin, can you double-check just to be sure this is the exact commit?
Comment 18 Benjamin Terrier 2022-09-21 20:39:05 UTC
I've done some tests and it seems it is a bit more complicated.

I've built a kernel at commit 0fbb5ce and a kernel at the commit just before,
and indeed this commit seems to introduce the bug.

By "bug" here I mean (variant A): "booting with the laptop connected to AC, it does not detect later connects/disconnects of AC".

However, I have noticed that for 5.18.17, 5.18.18, 0fbb5ce and 0fbb5ce~1 there is another variant of this bug (variant B):
"booting with the laptop not connected to AC, it detects the first time it is plugged in, but fails to detect further connects/disconnects".

Variant B affects every 5.18 kernel I have tested, but none of the 5.15 kernels I have tested. In particular, I just tested 5.15.64: it suffers from variant A, but not from variant B.
Comment 19 Heikki Krogerus 2022-09-22 08:34:35 UTC
That is another issue, not related to this bug. I'm pretty sure that issue is caused by commit 512df95b9432; however, the problem is a bit more complicated.

In any case, can you create a separate bug for that (please don't forget to CC me)?
Comment 20 Heikki Krogerus 2022-09-22 11:18:48 UTC
I don't understand how upstream commit f7090e0ef360 ("ACPI: EC: Drop the EC_FLAGS_IGNORE_DSDT_GPE quirk") could cause this. That quirk was never used on your board, or on any other Dell system. I still want to debug this a bit more.

Could you attach the acpidump and the complete dmesg output after reproducing the problem, ideally soon after bootup?

    acpidump -o acpi.dump
Comment 21 Benjamin Terrier 2022-09-24 07:55:20 UTC
Created attachment 301859 [details]
dmesg output

It's getting weirder... I was doing more tests and checking dmesg, and today I could consistently reproduce the bug with the kernel built from commit ed733f9, i.e. the one just before f7090e0ef360.

Anyway, when it happens there are ACPI errors in dmesg, see attached file.
Comment 22 Benjamin Terrier 2022-09-24 08:21:03 UTC
After some more testing, I have observed this:

- Shutdown/Boot 5.18.17~ed733f9 > Bug
- Shutdown/Boot 5.15.0-48 > No bug
- Reboot 5.18.17~ed733f9 > No bug
- Shutdown/Boot 5.18.18 > No bug
- Reboot 5.18.18 > Bug

So now it does not seem to be tied to a specific Linux version...

Also, the only difference in dmesg when the bug happens is the following line, which appears when a power disconnection is not detected:

ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)

All the other errors are present on all Linux versions whether the bug happens or not.
Comment 23 Heikki Krogerus 2022-09-27 09:00:46 UTC
Well, that would make this more likely to be a BIOS problem after all.
Can you please get the acpidump?

    acpidump -o xps_acpi.dump

It would also be useful to see the entire dmesg output.
Comment 24 Mattia Orlandi 2022-09-28 12:54:36 UTC
Created attachment 301883 [details]
dmesg output
Comment 25 Mattia Orlandi 2022-09-28 12:55:09 UTC
Created attachment 301884 [details]
acpidump after plugging
Comment 26 Mattia Orlandi 2022-09-28 12:56:38 UTC
(In reply to Heikki Krogerus from comment #23)
> Well, that would make this more likely to be BIOS problem after all.
> Can you please get the acpidump?
> 
>     acpidump -o xps_acpi.dump
> 
> It would also be useful to see the entire dmesg output.

I attached the output of dmesg and acpidump after plugging the power cable, using the latest kernel (5.19.11).
Comment 27 Heikki Krogerus 2022-10-03 08:29:31 UTC
Thank you for the logs.

I took a look at that ECMX mutex in the ACPI tables that causes the error you can see in the dmesg (Thread X cannot release Mutex [ECMX] acquired by thread Y). This is the ASL that causes it:

            Method (_Q66, 0, NotSerialized)
            {
                Acquire (ECMX, 0x0064)
                If ((ECRD == One))
                {
                    NEVT ()
                }

                Release (ECMX)
                Return (Zero)
            }

I'm not sure whether that error is related to this problem. It's an old issue. Rafael (the ACPI maintainer in the kernel) already explained that problem years ago here:
https://bugzilla.kernel.org/show_bug.cgi?id=196415#c7

Can someone try to disable the UCSI driver in the kernel (or blacklist it), and check whether those ACPI errors still appear?
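Below is one possible reading of the ECMX error. It is based on the ACPI specification, not on anything confirmed in this thread.

The AML `Acquire` operator returns True when the wait times out and False when the mutex was obtained. Here the wait is 0x0064, i.e. 100 ms. `_Q66` ignores that return value. So if another thread still holds ECMX when the timeout expires, the method goes on to `Release` a mutex it does not own. ACPICA reports that as AE_AML_NOT_OWNER.

The C model below is hypothetical. It is single-threaded, it is not ACPICA code, and it treats "held by another thread" as an immediate timeout. It only contrasts the method as written with a variant that checks Acquire's result.

```c
#include <stdbool.h>

/* Hypothetical model of an AML mutex: the owning thread id, 0 = free. */
struct aml_mutex {
    int owner;
};

/* Mirrors Acquire's convention: true = timed out, false = acquired. */
bool model_acquire(struct aml_mutex *m, int thread)
{
    if (m->owner != 0 && m->owner != thread)
        return true;            /* someone else holds it: treat as timeout */
    m->owner = thread;
    return false;
}

/* Returns -1 when 'thread' does not own the mutex (the NOT_OWNER case). */
int model_release(struct aml_mutex *m, int thread)
{
    if (m->owner != thread)
        return -1;
    m->owner = 0;
    return 0;
}

/* _Q66 as written: Acquire's result is ignored. */
int q66_as_written(struct aml_mutex *m, int thread)
{
    model_acquire(m, thread);
    /* NEVT() would run here if ECRD == One */
    return model_release(m, thread);
}

/* Variant that only releases what it actually acquired. */
int q66_checked(struct aml_mutex *m, int thread)
{
    if (model_acquire(m, thread))
        return 0;               /* timed out: skip the body and the Release */
    /* NEVT() would run here if ECRD == One */
    return model_release(m, thread);
}
```

Whether this error matters for the UCSI problem is a separate question. Comment 31 concludes that it does not.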
Comment 28 Mattia Orlandi 2022-10-03 17:47:57 UTC
(In reply to Heikki Krogerus from comment #27)
> Thank you for the logs.
> 
> I took a look at that ECMX mutex in the ACPI tables that causes the error
> you can see in the dmesg (Thread X cannot release Mutex [ECMX] acquired by
> thread Y). This is the ASL that causes it:
> 
>             Method (_Q66, 0, NotSerialized)
>             {
>                 Acquire (ECMX, 0x0064)
>                 If ((ECRD == One))
>                 {
>                     NEVT ()
>                 }
> 
>                 Release (ECMX)
>                 Return (Zero)
>             }
> 
> I'm not sure is that error related to this problem. It's an old issue.
> Rafael (ACPI maintainter in kernel) explained that problem already years ago
> here:
> https://bugzilla.kernel.org/show_bug.cgi?id=196415#c7
> 
> Can someone try to disable the UCSI driver in kernel (or blacklist it), and
> check do those ACPI Errors still appear?

I tried blacklisting the `ucsi_acpi` module (I passed the "module_blacklist=ucsi_acpi" parameter to the kernel) and the problem seems to be solved! I tested both "Boot unplugged -> Plug -> Unplug" and "Boot plugged -> Unplug".

I'll re-attach dmesg and acpidump outputs. The ACPI errors "Thread X cannot release Mutex [ECMX] acquired by thread Y" are still present, but the "ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)" error seems gone.
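To confirm that the workaround parameter was actually picked up on a given boot, you can look for it in /proc/cmdline. A simplified sketch follows. It assumes unquoted, space-separated parameters, it does not mirror how the kernel parses its command line, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if 'cmdline' has a module_blacklist= parameter whose
 * comma-separated list contains exactly 'module'. */
bool cmdline_blacklists(const char *cmdline, const char *module)
{
    const char *key = "module_blacklist=";
    size_t klen = strlen(key), mlen = strlen(module);
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* The key must start a parameter: beginning of line or after a space. */
        if (p != cmdline && p[-1] != ' ') {
            p += klen;
            continue;
        }
        const char *item = p + klen;
        /* Walk the comma-separated list until the parameter ends. */
        while (*item != '\0' && *item != ' ' && *item != '\n') {
            const char *end = item;
            while (*end != '\0' && *end != ',' && *end != ' ' && *end != '\n')
                end++;
            if ((size_t)(end - item) == mlen && strncmp(item, module, mlen) == 0)
                return true;
            item = (*end == ',') ? end + 1 : end;
        }
        p = item;
    }
    return false;
}
```

For example, read /proc/cmdline into a buffer and call cmdline_blacklists(buf, "ucsi_acpi"). It returns true after booting with "module_blacklist=ucsi_acpi".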
Comment 29 Mattia Orlandi 2022-10-03 17:48:44 UTC
Created attachment 302929 [details]
acpidump after blacklisting ucsi driver
Comment 30 Mattia Orlandi 2022-10-03 17:49:04 UTC
Created attachment 302930 [details]
dmesg output after blacklisting ucsi driver
Comment 31 Heikki Krogerus 2022-10-07 10:33:09 UTC
(In reply to Mattia Orlandi from comment #28)
> I tried blacklisting the `ucsi_acpi` module (I passed the
> "module_blacklist=ucsi_acpi" parameter to the kernel) and the problem seems
> to be solved! I tested both "Boot unplugged -> Plug -> Unplug" and "Boot
> plugged -> Unplug".
> 
> I'll re-attach dmesg and acpidump outputs. The ACPI errors "Thread X cannot
> release Mutex [ECMX] acquired by thread Y" are still present, but the
> "ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS
> failed (-110)" error seems gone.

Thank you for confirming that. That ECMX mutex warning is unrelated.

I'll try to get one of those XPS 15 9500 systems. I really need to be able to reproduce this one.
Comment 32 Benoit Grégoire 2022-10-11 15:44:54 UTC
If it helps with hardware availability: my Dell XPS 13 9310 exhibits the same problem (firmware 3.10.0, mainline kernel 6.0.0-060000-generic). Blacklisting ucsi_acpi also works around it successfully for me.
Comment 33 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-26 12:25:38 UTC
Anyone still working on this? Reminder, this ideally should be fixed by now, as explained in the kernel docs (see "Prioritize work on fixing regressions" in https://www.kernel.org/doc/html/latest/process/handling-regressions.html )
Comment 34 Heikki Krogerus 2022-10-27 07:27:05 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #33)
> Anyone still working on this? Reminder, this ideally should be fixed by now,
> as explained in the kernel docs (see "Prioritize work on fixing regressions"
> in https://www.kernel.org/doc/html/latest/process/handling-regressions.html )

This bug is not a regression.

I'm still trying to reproduce the problem. I don't have XPS 15 9500. I did get access to an older XPS 9730, but with that the problem does not happen.

There is still a good chance that the problem is caused by the firmware. Ideally Dell would take a look at this, but unfortunately I don't have contacts at Dell anymore. Nevertheless, rest assured, I'm working on this.
Comment 35 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-27 07:32:02 UTC
(In reply to Heikki Krogerus from comment #34)
>
> This bug is not a regression.

Could you quickly describe why not? Then I'll drop it from the list of tracked regressions. The initial report sounded a lot like a regression to me: "I'm testing the LTS kernel (5.15.63) and the issue does not occur, so I assume it's a regression bug, possibly introduced in kernel 5.18".
Comment 36 Heikki Krogerus 2022-10-27 10:08:04 UTC
Based on comment 22, the problem can't be tied to any specific kernel version.
Comment 37 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-27 10:38:57 UTC
(In reply to Heikki Krogerus from comment #36)
> Based on comment 22, the problem can't be tied to any specific kernel
> version.

Many thanks, that helped. With all the regressions I track, I can't watch each and every one closely…
Comment 38 Konstantin 2023-07-09 21:18:36 UTC
Same on the XPS 15 9520:
If I boot with the power supply plugged in, I get this error

USBC000:00: UCSI_GET_PDOS failed (-5)
and plug/unplug detection doesn't work.
If I boot with the power supply unplugged, plug/unplug detection works.
With ucsi_acpi blacklisted, plug/unplug detection works.
BIOS: latest available on Dell's site.
Kernels: 6.0.x to 6.4.2
Comment 39 Waldo 2024-03-30 03:36:21 UTC
When the bug is present, the "online" attribute of one of the USB-C ports in /sys/class/power_supply is stuck on 1, whether plugged in or not. `rmmod ucsi_acpi` makes these entries disappear (so only AC and BAT0 remain) and makes the bug disappear. `modprobe ucsi_acpi` doesn't bring the bug back, and "online" is 0 for both ports whether plugged in or not.

In summary, ucsi_acpi fails to modify "online" when plugging or unplugging the charger.
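This symptom can be checked directly in sysfs, where each power_supply entry exposes an `online` attribute containing 0 or 1.

Below is a minimal sketch with two helpers. Both helper names are hypothetical. The path /sys/class/power_supply/AC/online comes from the AC entry mentioned above. The exact names of the USB-C (UCSI) entries vary between machines, so the caller supplies those paths. The "stuck" condition simply encodes what this report describes: the AC adapter is off-line while a UCSI source still says online.

```c
#include <stdbool.h>
#include <stdio.h>

/* Read a power_supply "online" attribute, e.g.
 * /sys/class/power_supply/AC/online. Returns the integer it holds
 * (normally 0 or 1), or -1 if the file cannot be opened or parsed. */
int read_online(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int value;
    int ok = fscanf(f, "%d", &value);
    fclose(f);
    return ok == 1 ? value : -1;
}

/* Symptom from this report: AC says off-line, a UCSI source says online. */
bool ucsi_online_stuck(int ac_online, int ucsi_online)
{
    return ac_online == 0 && ucsi_online == 1;
}
```

After unplugging the charger, pass read_online() values for the AC entry and for each UCSI entry to ucsi_online_stuck(). It should return true when the bug is present and false once ucsi_acpi has been removed or behaves correctly.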

This is on a Dell Latitude 5440, so it's not just the XPS laptops that exhibit this issue.