Created attachment 240811 [details]
dmesg with lid close and open
On some Dell XPS machines, the touchscreen stops working after the lid is closed and reopened.
The touchscreen works well on the Ubuntu 3.19 kernel; the issue appears after upgrading to kernel 4.4.
Closing the lid to enter S3 and opening it to wake up leaves the touchscreen dead, but with other ways of entering S3, such as pm-suspend or clicking suspend in the GUI, the touchscreen still works after waking up.
The issue can still be reproduced after upgrading to the latest kernel, 4.8.
After some diagnostics, we found that the pinctrl-sunrisepoint driver, introduced in v4.1-rc1, is suspicious. With that driver unloaded, the touchscreen survives lid close/open.
BTW, no new touchscreen interrupts are emitted once the issue has happened.
Created attachment 240821 [details]
/proc/interrupts before lid closed
Created attachment 240831 [details]
/proc/interrupts after lid close and open
The interrupt for touchscreen should be 122
122: 255 44 82 259 IR-PCI-MSI 327680-edge xhci_hcd
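One way to check whether IRQ 122 keeps firing across suspend/resume is to total its per-CPU counters in each saved /proc/interrupts snapshot and compare the two numbers. A minimal sketch; the function name irq_total and the snapshot file names in the usage comment are made up for illustration:

```shell
# Sum the per-CPU counters of one IRQ line in a saved /proc/interrupts snapshot.
# Usage: irq_total FILE IRQ
irq_total() {
    awk -v irq="$2:" '
        $1 == irq {
            total = 0
            # The per-CPU counters are the purely numeric fields that
            # follow the "NNN:" label; stop at the first non-numeric field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
            print total
        }' "$1"
}

# Example (hypothetical file names):
#   before=$(irq_total interrupts-before.txt 122)
#   after=$(irq_total interrupts-after.txt 122)
#   echo "IRQ 122 delta: $((after - before))"
```

A delta of zero while touching the screen would match the "no new interrupts" observation; a growing count would only show that the xHCI controller as a whole is still active.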
It is probably this one
[ 8.771522] input: ELAN Touchscreen as /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4:1.0/0003:04F3:20D0.0001/input/input15
[ 8.771945] hid-multitouch 0003:04F3:20D0.0001: input,hiddev0,hidraw0: USB HID v1.10 Device [ELAN Touchscreen] on usb-0000:00:14.0-4/input0
I can see the interrupt count has increased between suspend/resume so the xHCI controller still seems to work, correct?
If you open the raw input device, do you see anything when you use the touchscreen? Like 'od -x /dev/input/event15'.
Oh, and can you attach contents of /sys/kernel/debug/gpio and /sys/kernel/debug/pinctrl/INT344B:00/pins to this bug, thanks.
Created attachment 240911 [details]
After lid close and open, the od command shows nothing while touching the touchscreen.
But before closing the lid, od prints lots of output.
Created attachment 240921 [details]
/sys/kernel/debug/pinctrl/INT344B:00/pins before lid close
I found that this file changes slightly across the lid close/open, so I attached both versions.
Created attachment 240931 [details]
/sys/kernel/debug/pinctrl/INT344B:00/pins after lid close/open
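For anyone repeating the comparison, the two saved copies of the pins file can be diffed directly. A small sketch, assuming both copies were saved to local files; the function name changed_pins is invented here and nothing below depends on the exact line format of the pins file:

```shell
# Show the lines that differ between two saved copies of the pins file.
# Usage: changed_pins BEFORE AFTER
changed_pins() {
    # diff exits non-zero when the files differ, which is the expected case
    # here, and grep exits non-zero when nothing matches; mask both so the
    # function is safe under "set -e".
    diff "$1" "$2" | grep '^[<>]' || true
}

# Example (hypothetical file names):
#   cat /sys/kernel/debug/pinctrl/INT344B:00/pins > pins-before.txt
#   ... close and reopen the lid ...
#   cat /sys/kernel/debug/pinctrl/INT344B:00/pins > pins-after.txt
#   changed_pins pins-before.txt pins-after.txt
```

Lines starting with "<" come from the first file and lines starting with ">" from the second.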
Created attachment 240951 [details]
Sunrisepoint pinctrl fix
Can you try the attached patch? It is on top of v4.8. Please also set CONFIG_DEBUG_PINCTRL=y in your .config.
Created attachment 241021 [details]
dmesg with patched kernel and debug option
The patch doesn't work; after lid close/open there is still no touchscreen interrupt.
BTW, when I unload/reload the pinctrl-sunrisepoint driver, I get
sunrisepoint-pinctrl INT344B:00: failed to lookup the default state
sunrisepoint-pinctrl INT344B:00: failed to lookup the sleep state
The messages show up both after a clean boot and after S3.
That's a harmless message.
I suspect that the problem still lies somewhere other than the pinctrl driver (although the driver causes something to change, which triggers the problem). Your dmesg shows these:
[ 49.065849] Broke affinity for irq 277
[ 49.065851] Broke affinity for irq 283
Which tells me that the x86 migration code failed to move the IRQ off from the offlined CPU. Do you see these when you disable the pinctrl driver?
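To compare runs with and without the pinctrl driver, the affected IRQ numbers can be pulled out of the log. A rough sketch that only matches the message text quoted above; the function name broke_affinity_irqs is made up:

```shell
# Print the IRQ numbers named in "Broke affinity for irq N" kernel messages.
# Reads a dmesg log on stdin; prints each IRQ once, sorted numerically.
broke_affinity_irqs() {
    sed -n 's/.*Broke affinity for irq \([0-9][0-9]*\).*/\1/p' | sort -n -u
}

# Example:
#   dmesg | broke_affinity_irqs
```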
Also, can you try moving the IRQ to another CPU after resume? Like
# echo 0 > /proc/irq/122/smp_affinity_hint
(and same for 1,2 and 3).
In addition, can you attach acpidump from the machine to this bug, thanks.
Created attachment 241041 [details]
1. Yes, I'm still seeing those Broke affinity messages after the pinctrl-* driver is removed.
2. Do you mean /proc/irq/122/smp_affinity_list?
If this is the case, I can change the value to any combination of 0,1,2,3 without any error messages.
3. The acpidump file is attached.
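The affinity test from item 2 can be scripted roughly as follows. This is a sketch under assumptions: it writes to smp_affinity_list (the file named in item 2), it needs root on a real system, and the optional third argument is a test-only hook invented here so the function can be pointed at a scratch directory instead of /proc.

```shell
# Pin an IRQ to each listed CPU in turn by writing to smp_affinity_list.
# Usage: cycle_irq_affinity IRQ "CPU..." [PROC_ROOT]
# PROC_ROOT defaults to /proc (the override is an assumption of this sketch).
cycle_irq_affinity() {
    irq=$1
    cpus=$2
    root=${3:-/proc}
    file="$root/irq/$irq/smp_affinity_list"
    for cpu in $cpus; do
        if ! echo "$cpu" > "$file"; then
            echo "failed to move IRQ $irq to CPU $cpu" >&2
            return 1
        fi
        # Read the file back to show what is stored after the write.
        echo "IRQ $irq -> CPU $cpu (now: $(cat "$file"))"
    done
}

# Example (as root, after resume):
#   cycle_irq_affinity 122 "0 1 2 3"
```

A write that succeeds without an error only shows that the new mask was accepted; whether touchscreen interrupts arrive afterwards still has to be checked separately.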
BTW, here are some other ways to recover the touchscreen functionality, in case they give you more ideas.
1. Close the lid and open it quickly, before the system enters S3. The system eventually enters S3 anyway; then press the power button to wake it up, and the touchscreen comes back alive.
2. Use a magnet to touch the area near the power button. This mimics the lid close event and lets the system enter S3; then press the power button to wake it up, and the touchscreen works again.
Hmm, so if you run
# echo mem > /sys/power/state
without closing the lid, wait for it to suspend and then press power button to wake it up, does it work?
Oh, and you seem to have "acpi_osi=!Windows 2015" in the kernel command line. Why is that?
This issue is only triggered by lid close and open, so using any kind of command to enter S3 won't lead to the issue
2. echo mem > /sys/power/state
3. click from GUI to enter suspend mode
Those kernel parameters are just for testing: since the touchscreen works on the 3.19 kernel but not on 4.4 and later, I tried removing the Win10 OSI and setting the ACPI version back, but it didn't help.
I just removed all of them, and the test result doesn't change: the touchscreen still doesn't survive with the patched kernel, and the Broke affinity messages in dmesg are the same.
OK, can you try these next:
1. Build the pinctrl driver into the kernel image:
2. Disable I2C HID driver
and see if anything changes. It is possible that the touchpad does not work anymore, but does it have any effect on the touchscreen?
Created attachment 241051 [details]
Skip pin 103 when restoring pad configs
Can you also try the attached patch? It should leave the pin 103 untouched during resume.
Created attachment 241061 [details]
dmesg for comment 17
For comment 17, setting those 2 kernel options doesn't help.
The touchpad falls back to a PS/2 mouse, and the touchscreen still doesn't survive lid close/open.
Created attachment 241071 [details]
dmesg for comment 18
Yes, the patch works.
OK, so the BIOS is using that particular GPIO (103) for something related to suspend/resume, but it does not take ownership of it or lock it. I need to discuss this with the Dell guys and hopefully get a fix ready during next week. Quite possibly we just need to change the pinctrl driver not to restore pins that are not explicitly requested using GPIO APIs.
Just for those following along at home, Mika did discuss this with Dell and submitted this patch as a result of the conversation:
The fix should now be in the mainline kernel.