System
------
- Dell XPS 15 9500
- 5.19.4 kernel

Problem description
-------------------
Whenever I plug and then unplug my laptop from AC power using the USB-C port, the system thinks it is still plugged in (i.e., the KDE applet reports "Plugged in but still discharging").

If I check in Dell's BIOS, it correctly reports when the power supply is plugged/unplugged; `acpi -V` also correctly shows `Adapter 0: off-line`. On the other hand, `upower -d` incorrectly reports `/org/freedesktop/UPower/devices/line_power_ucsi_source_psy_USBC000o002` as `online: yes`. Moreover, `journalctl` reports `ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)`.

I'm testing the LTS kernel (5.15.63) and the issue does not occur, so I assume it's a regression bug, possibly introduced in kernel 5.18 (I tried downgrading the kernel to version 5.18.16 and the issue was already present).
Duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=210425
Is this really a duplicate of bug 210425 (https://bugzilla.kernel.org/show_bug.cgi?id=210425)? On the XPS 15 9500, the issue occurs even when the system has not been suspended.
I'm quite sure it's not a duplicate of that bug: first of all, the problem on the XPS 9500 occurs right after booting the OS, regardless of suspension. Moreover, from what I've read, the bug you linked affects kernel 5.15, whereas that kernel works fine for me.
Bug 210425 seems to be a symptom of this one, at least in some cases.

Can somebody test whether it helps if we increase the command completion timeout value:

diff --git a/drivers/usb/typec/ucsi/ucsi_acpi.c b/drivers/usb/typec/ucsi/ucsi_acpi.c
index 8873c1644a295..804b45249c46b 100644
--- a/drivers/usb/typec/ucsi/ucsi_acpi.c
+++ b/drivers/usb/typec/ucsi/ucsi_acpi.c
@@ -78,7 +78,7 @@ static int ucsi_acpi_sync_write(struct ucsi *ucsi, unsigned int offset,
        if (ret)
                goto out_clear_bit;
 
-       if (!wait_for_completion_timeout(&ua->complete, HZ))
+       if (!wait_for_completion_timeout(&ua->complete, 5 * HZ))
                ret = -ETIMEDOUT;
 
 out_clear_bit:
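For readers not familiar with the kernel API: `wait_for_completion_timeout()` takes its timeout in jiffies, so `HZ` means one second and `5 * HZ` means five seconds, and a timeout is what turns into `-ETIMEDOUT` (-110). The following is only a user-space analogy in Python, not kernel code: `threading.Event` stands in for the kernel completion, and `sync_write` and `simulate` are hypothetical names chosen for illustration.

```python
import threading

ETIMEDOUT = 110  # Linux errno value for "timed out"


def sync_write(complete: threading.Event, timeout_s: float) -> int:
    """Wait for the completion; return 0 on success, -ETIMEDOUT on timeout."""
    if not complete.wait(timeout_s):
        return -ETIMEDOUT
    return 0


def simulate(response_delay_s: float, timeout_s: float) -> int:
    """Pretend the firmware signals completion after response_delay_s seconds."""
    complete = threading.Event()
    timer = threading.Timer(response_delay_s, complete.set)
    timer.start()
    try:
        return sync_write(complete, timeout_s)
    finally:
        timer.cancel()  # no-op if the timer has already fired
```

If the firmware answers late but does answer, a longer timeout makes the call succeed; if it never answers, no timeout value helps, and that is the case the patch is meant to rule out.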
I'll try to check today.
Currently, the message only appears on the first undocking
...on the first undocking or docking. However, it looks like the charging status is being updated in upowerd.
I've just updated the LTS kernel from 5.15.64 to 5.15.65 and the problem started occurring.
(In reply to Heikki Krogerus from comment #4)
> Bug 210425 seems to be a symptom of this one, at least in some cases.
>
> Can somebody test whether it helps if we increase the command completion
> timeout value:
>
> diff --git a/drivers/usb/typec/ucsi/ucsi_acpi.c
> b/drivers/usb/typec/ucsi/ucsi_acpi.c
> index 8873c1644a295..804b45249c46b 100644
> --- a/drivers/usb/typec/ucsi/ucsi_acpi.c
> +++ b/drivers/usb/typec/ucsi/ucsi_acpi.c
> @@ -78,7 +78,7 @@ static int ucsi_acpi_sync_write(struct ucsi *ucsi,
> unsigned int offset,
>         if (ret)
>                 goto out_clear_bit;
>
> -       if (!wait_for_completion_timeout(&ua->complete, HZ))
> +       if (!wait_for_completion_timeout(&ua->complete, 5 * HZ))
>                 ret = -ETIMEDOUT;
>
>  out_clear_bit:

I tried this patch on the latest kernel (5.19.8) but the issue is still present.
(In reply to Mattia Orlandi from comment #9)
> I tried this patch on the latest kernel (5.19.8) but the issue is still
> present.

Thanks. This looks like a firmware issue to me. Do you guys have the latest XPS 9500 BIOS?
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=7nn1r&oscode=wt64a&productcode=xps-15-9500-laptop

If you don't have the latest BIOS, then please upgrade that, but if you do have it, then can you test if downgrading it helps?
(In reply to Heikki Krogerus from comment #10)
> (In reply to Mattia Orlandi from comment #9)
> > I tried this patch on the latest kernel (5.19.8) but the issue is still
> > present.
>
> Thanks. This looks like a firmware issue to me. Do you guys have the latest
> XPS 9500 BIOS?
> https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=7nn1r&oscode=wt64a&productcode=xps-15-9500-laptop
>
> If you don't have the latest BIOS, then please upgrade that, but if you do
> have it, then can you test if downgrading it helps?

I installed the newest BIOS (1.18.0, previously I was on 1.17.0) but the issue persists. I don't think it's a firmware issue because the problem does not occur in the Windows partition, the BIOS UI, or Linux 5.15.64.

I have noticed a few things:
- in 5.15.64 (the one which works as expected), the system takes about half a second to "realize" it has been connected to AC, and it takes the same time to "realize" it has been disconnected from AC;
- in 5.19.8 (the one affected by the issue), the system "realizes" almost instantaneously it has been connected to AC, but then it never "realizes" it has been disconnected from AC;
- in both cases, when I plug/unplug the charger `dmesg` shows entries like these:

    ACPI Error: Thread X cannot release Mutex [ECMX] acquired by thread Y (20210730/exmutex-378)
    ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20220331/psparse-529)

  although in 5.19.8 (the one affected by the issue) there are many more of them.
(In reply to Heikki Krogerus from comment #10)
> (In reply to Mattia Orlandi from comment #9)
> > I tried this patch on the latest kernel (5.19.8) but the issue is still
> > present.
>
> Thanks. This looks like a firmware issue to me. Do you guys have the latest
> XPS 9500 BIOS?
> https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=7nn1r&oscode=wt64a&productcode=xps-15-9500-laptop
>
> If you don't have the latest BIOS, then please upgrade that, but if you do
> have it, then can you test if downgrading it helps?

I've tried every firmware update from Dell as soon as it became available. I also tried older versions down to v1.11.0 (Dell forbids downgrading to v1.10 or earlier). None of this had any effect.

The only thing that seems to have an effect is changing the kernel version. For instance, I have the issue with 5.18.19, but I do not have it with 5.18.17 (or at least, it is harder to reproduce).
Okay, I stand corrected. This is a regression in the kernel. Can somebody bisect the problem so we know exactly which commit introduced the regression?

I cannot reproduce this with the systems that I have, unfortunately.
(In reply to Heikki Krogerus from comment #13)
> Okay, I stand corrected. This is a regression in the kernel. Can somebody
> bisect the problem so we know exactly which commit introduced the regression?
>
> I cannot reproduce this with the systems that I have, unfortunately.

It must be between 5.18.16 and 5.19. Unfortunately Arch does not provide packages for 5.18.{17, 18, 19}, so I will have to compile them to verify when the problem starts occurring (probably in 5.18.18 or 5.18.19, as Benjamin pointed out). I will give you more information as soon as possible.
(In reply to Mattia Orlandi from comment #14)
> (In reply to Heikki Krogerus from comment #13)
> > Okay, I stand corrected. This is a regression in the kernel. Can somebody
> > bisect the problem so we know exactly which commit introduced the
> > regression?
> >
> > I cannot reproduce this with the systems that I have, unfortunately.
>
> It must be between 5.18.16 and 5.19. Unfortunately Arch does not provide
> packages for 5.18.{17, 18, 19}, so I will have to compile them to verify
> when the problem starts occurring (probably in 5.18.18 or 5.18.19, as
> Benjamin pointed out). I will give you more information as soon as possible.

I have rebuilt 5.18.17 and 5.18.18 from source.

- 5.18.17 seems fine
- 5.18.18 has the issue
(In reply to Benjamin Terrier from comment #15)
> I have rebuilt 5.18.17 and 5.18.18 from source.
>
> - 5.18.17 seems fine
> - 5.18.18 has the issue

Thanks. Binary search (git-bisect) would give us the exact commit, but I wonder whether it could be this one:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/acpi/ec.c?h=v5.18.18&id=0fbb5ce2f426753c94b74d134de4b71402d7fb93

Can you test whether reverting it helps?
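For context, git-bisect is a binary search between a known-good and a known-bad revision (here that would be 5.18.17 and 5.18.18). A minimal sketch of the idea in Python follows; `first_bad` and `is_bad` are hypothetical names, the commit list is a placeholder, and the approach assumes the bug reproduces deterministically on every bad commit.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in an ordered history.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # build, boot and test this commit
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each step halves the remaining range, so about 100 commits between two stable releases would need roughly 7 build-and-boot cycles.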
(In reply to Heikki Krogerus from comment #16)
> (In reply to Benjamin Terrier from comment #15)
> > I have rebuilt 5.18.17 and 5.18.18 from source.
> >
> > - 5.18.17 seems fine
> > - 5.18.18 has the issue
>
> Thanks. Binary search (git-bisect) would give us the exact commit, but I
> wonder whether it could be this one:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/acpi/ec.c?h=v5.18.18&id=0fbb5ce2f426753c94b74d134de4b71402d7fb93
>
> Can you test whether reverting it helps?

I have just tested this and I can confirm that it seems to work now.
Benjamin, can you double-check just to be sure this is the exact commit?
I've done some tests and it seems it is a bit more complicated.

I've built a kernel at commit 0fbb5ce and a kernel at the commit just before, and indeed this commit seems to introduce the bug. By bug here I mean (variant A): "booting with the laptop connected to AC, it does not detect future connect/disconnect of AC".

However, I have noticed that for 5.18.17, 5.18.18, 0fbb5ce and 0fbb5ce~1 there is another variant of this bug (variant B): "booting with the laptop not connected to AC, it will detect the first time it is plugged in, but will fail to detect further connect/disconnect".

Variant B does impact all 5.18 kernels I have tested, but did not impact any 5.15 kernel I have tested. In particular, I just tested 5.15.64 and it suffers from variant A, but not from variant B.
That is another issue, not related to this bug. I'm pretty sure that issue is caused by commit 512df95b9432; however, the problem is a bit more complicated. In any case, can you create a separate bug for that (please don't forget to CC me)?
I don't understand how upstream commit f7090e0ef360 ("ACPI: EC: Drop the EC_FLAGS_IGNORE_DSDT_GPE quirk") could cause this. That quirk was never used on your board, or any other Dell system.

I still want to debug this a bit more. Could you attach the acpidump and the complete dmesg output after reproducing the problem - ideally soon after bootup?

    acpidump -o acpi.dump
Created attachment 301859 [details]
dmesg output

It's getting weirder...

I was doing more tests and checking dmesg. Today I could consistently reproduce the bug with the kernel built from commit ed733f9, i.e. the one just before f7090e0ef360.

Anyway, when it happens there are ACPI errors in dmesg, see attached file.
After some more testing I have observed this:

- Shutdown/Boot 5.18.17~ed733f9 > Bug
- Shutdown/Boot 5.15.0-48 > No bug
- Reboot 5.18.17~ed733f9 > No bug
- Shutdown/Boot 5.18.18 > No bug
- Reboot 5.18.18 > Bug

So now it seems not related to a Linux version...

Also, the only difference in dmesg when the bug happens is the presence of the following line when disconnecting the power is not detected:

    ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)

All the other errors are present on all Linux versions whether the bug happens or not.
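As a side note on the number in that message: -110 is the negated Linux errno ETIMEDOUT, which matches the `ret = -ETIMEDOUT` path in the patch from comment 4. A small sketch for pulling the command and error name out of such log lines is shown below; the regular expression and the `decode` helper are assumptions made for illustration, and the table only lists two Linux errno values rather than the full set.

```python
import re

# Linux (asm-generic) errno values; deliberately incomplete.
LINUX_ERRNO = {5: "EIO", 110: "ETIMEDOUT"}

# Matches e.g. "USBC000:00: <function>: <COMMAND> failed (-110)";
# the function-name part is optional.
UCSI_FAILURE = re.compile(
    r"(?P<dev>USBC\d+:\d+): (?:(?P<func>\w+): )?(?P<cmd>\w+) failed "
    r"\((?P<code>-?\d+)\)"
)


def decode(line):
    """Return (command, errno name) for a UCSI failure line, else None."""
    match = UCSI_FAILURE.search(line)
    if match is None:
        return None
    code = abs(int(match.group("code")))
    return match.group("cmd"), LINUX_ERRNO.get(code, f"errno {code}")
```

Feeding the journal line above through `decode` yields the command name together with `ETIMEDOUT`, which makes it easier to grep for timeouts specifically.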
Well, that would make this more likely to be a BIOS problem after all.
Can you please get the acpidump?

    acpidump -o xps_acpi.dump

It would also be useful to see the entire dmesg output.
Created attachment 301883 [details] dmesg output
Created attachment 301884 [details] acpidump after plugging
(In reply to Heikki Krogerus from comment #23)
> Well, that would make this more likely to be a BIOS problem after all.
> Can you please get the acpidump?
>
>     acpidump -o xps_acpi.dump
>
> It would also be useful to see the entire dmesg output.

I attached the output of dmesg and acpidump after plugging the power cable, using the latest kernel (5.19.11).
Thank you for the logs.

I took a look at that ECMX mutex in the ACPI tables that causes the error you can see in the dmesg (Thread X cannot release Mutex [ECMX] acquired by thread Y). This is the ASL that causes it:

    Method (_Q66, 0, NotSerialized)
    {
        Acquire (ECMX, 0x0064)
        If ((ECRD == One))
        {
            NEVT ()
        }

        Release (ECMX)
        Return (Zero)
    }

I'm not sure whether that error is related to this problem. It's an old issue. Rafael (ACPI maintainer in the kernel) explained that problem already years ago here:
https://bugzilla.kernel.org/show_bug.cgi?id=196415#c7

Can someone try to disable the UCSI driver in the kernel (or blacklist it), and check whether those ACPI errors still appear?
(In reply to Heikki Krogerus from comment #27)
> Thank you for the logs.
>
> I took a look at that ECMX mutex in the ACPI tables that causes the error
> you can see in the dmesg (Thread X cannot release Mutex [ECMX] acquired by
> thread Y). This is the ASL that causes it:
>
>     Method (_Q66, 0, NotSerialized)
>     {
>         Acquire (ECMX, 0x0064)
>         If ((ECRD == One))
>         {
>             NEVT ()
>         }
>
>         Release (ECMX)
>         Return (Zero)
>     }
>
> I'm not sure whether that error is related to this problem. It's an old
> issue. Rafael (ACPI maintainer in the kernel) explained that problem already
> years ago here:
> https://bugzilla.kernel.org/show_bug.cgi?id=196415#c7
>
> Can someone try to disable the UCSI driver in the kernel (or blacklist it),
> and check whether those ACPI errors still appear?

I tried blacklisting the `ucsi_acpi` module (I passed the "module_blacklist=ucsi_acpi" parameter to the kernel) and the problem seems to be solved! I tested both "Boot unplugged -> Plug -> Unplug" and "Boot plugged -> Unplug".

I'll re-attach dmesg and acpidump outputs. The ACPI errors "Thread X cannot release Mutex [ECMX] acquired by thread Y" are still present, but the "ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)" error seems gone.
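For anyone repeating this test, it can be worth confirming that the parameter actually reached the kernel before drawing conclusions. The sketch below parses a kernel command line for `module_blacklist=` entries (a comma-separated list); `blacklisted_modules` is a hypothetical helper name, and reading `/proc/cmdline` is just the usual place to find the running kernel's command line.

```python
from pathlib import Path


def blacklisted_modules(cmdline):
    """Return the module names listed in module_blacklist= parameters."""
    modules = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "module_blacklist" and sep:
            modules.extend(name for name in value.split(",") if name)
    return modules


if __name__ == "__main__":
    try:
        current = Path("/proc/cmdline").read_text()
    except OSError:
        current = ""  # not on Linux, or /proc is not mounted
    print("ucsi_acpi blacklisted:", "ucsi_acpi" in blacklisted_modules(current))
```

Checking that `ucsi_acpi` is also absent from the `lsmod` output gives a second confirmation.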
Created attachment 302929 [details] acpidump after blacklisting ucsi driver
Created attachment 302930 [details] dmesg output after blacklisting ucsi driver
(In reply to Mattia Orlandi from comment #28)
> I tried blacklisting the `ucsi_acpi` module (I passed the
> "module_blacklist=ucsi_acpi" parameter to the kernel) and the problem seems
> to be solved! I tested both "Boot unplugged -> Plug -> Unplug" and "Boot
> plugged -> Unplug".
>
> I'll re-attach dmesg and acpidump outputs. The ACPI errors "Thread X cannot
> release Mutex [ECMX] acquired by thread Y" are still present, but the
> "ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS
> failed (-110)" error seems gone.

Thank you for confirming that. That ECMX mutex warning is unrelated.

I'll try to get one of those XPS 15 9500 systems. I really need to be able to reproduce this one.
If helpful for hardware availability, my Dell XPS 13 9310 exhibits the same problem (Firmware 3.10.0, mainline kernel 6.0.0-060000-generic). Blacklisting ucsi_acpi also works around it successfully for me.
Anyone still working on this? Reminder, this ideally should be fixed by now, as explained in the kernel docs (see "Prioritize work on fixing regressions" in https://www.kernel.org/doc/html/latest/process/handling-regressions.html )
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #33)
> Anyone still working on this? Reminder, this ideally should be fixed by now,
> as explained in the kernel docs (see "Prioritize work on fixing regressions"
> in https://www.kernel.org/doc/html/latest/process/handling-regressions.html )

This bug is not a regression.

I'm still trying to reproduce the problem. I don't have an XPS 15 9500. I did get access to an older XPS 9730, but with that the problem does not happen.

There is still a good chance that the problem is caused by the firmware. Ideally Dell could take a look at this, but unfortunately I don't have contacts at Dell anymore.

Nevertheless, rest assured, I'm working on this.
(In reply to Heikki Krogerus from comment #34)
>
> This bug is not a regression.

Could you quickly describe why not? Then I'll drop it from the list of tracked regressions. The initial report sounded a lot like a regression to me ("I'm testing the LTS kernel (5.15.63) and the issue does not occur, so I assume it's a regression bug, possibly introduced in kernel 5.18").
Based on comment 22, the problem can't be tied to any specific kernel version.
(In reply to Heikki Krogerus from comment #36)
> Based on comment 22, the problem can't be tied to any specific kernel
> version.

Many thanks, that helped. With all the regressions I track, I can't watch each and every one closely…
Same on the XPS 15 9520:

- Booting with the power supply plugged in: I get the error "USBC000:00: UCSI_GET_PDOS failed (-5)" and plug/unplug detection does not work.
- Booting with the power supply unplugged: plug/unplug detection works.
- Blacklisting ucsi_acpi: plug/unplug detection works.

BIOS: latest available on the Dell site.
Kernel: 6.0.x through 6.4.2
When the bug is present, the "online" attribute of one of the USB-C ports in /sys/class/power_supply is stuck on 1, whether plugged in or not.

`rmmod ucsi_acpi` makes these entries disappear (so only AC and BAT0 remain) and makes the bug disappear. `modprobe ucsi_acpi` doesn't bring the bug back, and "online" is 0 for both ports whether plugged in or not.

In summary, ucsi_acpi fails to modify "online" when plugging or unplugging the charger.

This is on a Dell Latitude 5440, so it's not just the XPS laptops that exhibit this issue.
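To make that check easy to repeat, here is a minimal Python sketch that reads the "online" attribute of every entry under /sys/class/power_supply and flags UCSI supplies that still report 1 while the adapter reports 0. The function names are hypothetical, the adapter name "AC" is taken from the description above and may differ on other machines, and matching UCSI entries by a "ucsi" name prefix is an assumption.

```python
from pathlib import Path


def read_online(root="/sys/class/power_supply"):
    """Return {supply name: 'online' value} for supplies that expose it."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for supply in sorted(base.iterdir()):
        try:
            result[supply.name] = (supply / "online").read_text().strip()
        except OSError:
            continue  # no readable 'online' attribute on this supply
    return result


def stuck_usb_c_supplies(online, adapter="AC"):
    """UCSI supplies reporting online=1 although the adapter reports 0."""
    if online.get(adapter) != "0":
        return []
    return [
        name
        for name, value in online.items()
        if name.lower().startswith("ucsi") and value == "1"  # assumed prefix
    ]


if __name__ == "__main__":
    state = read_online()
    print(state)
    print("possibly stuck:", stuck_usb_c_supplies(state))
```

Running it once with the charger plugged in and once after unplugging should show whether a UCSI entry stays at 1. This is only a heuristic: it cannot tell a stuck attribute from a second charger that really is connected to another port.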