Bug 176361

Summary: Touchscreen fails to work after lid close and open
Product: Drivers
Reporter: AceLan Kao (acelan)
Component: Input Devices
Assignee: drivers_input-devices
Status: RESOLVED CODE_FIX
Severity: normal
CC: acelan, andy.shevchenko, mika.westerberg, superm1
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.8
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg with lid close and open
/proc/interrupts before lid closed
/proc/interrupts after lid close and open
/sys/kernel/debug/gpio
/sys/kernel/debug/pinctrl/INT344B:00/pins before lid close
/sys/kernel/debug/pinctrl/INT344B:00/pins after lid close/open
Sunrisepoint pinctrl fix
dmesg with patched kernel and debug option
acpidump file
Skip pin 103 when restoring pad configs
dmesg for comment 17
dmesg for comment 18

Description AceLan Kao 2016-10-05 12:41:14 UTC
Created attachment 240811 [details]
dmesg with lid close and open

On some Dell XPS machines, the touchscreen stops working after the lid is closed and opened.
The touchscreen works well on the Ubuntu 3.19 kernel; the issue appears after upgrading to kernel 4.4.

Closing the lid to enter S3 and then opening it to wake up leaves the touchscreen dead. Other ways of entering S3, such as pm-suspend or clicking suspend in the GUI, leave the touchscreen working after wake-up.

The issue can still be reproduced on the latest kernel, 4.8.

After diagnostics, we suspect the pinctrl-sunrisepoint driver, which was introduced in v4.1-rc1. With that driver unloaded, the touchscreen survives lid close/open.

BTW, no new touchscreen interrupts are emitted once the issue has occurred.
Comment 1 AceLan Kao 2016-10-05 12:41:53 UTC
Created attachment 240821 [details]
/proc/interrupts before lid closed
Comment 2 AceLan Kao 2016-10-05 12:44:57 UTC
Created attachment 240831 [details]
/proc/interrupts after lid close and open

The interrupt for the touchscreen should be 122:
 122:        255         44         82        259  IR-PCI-MSI 327680-edge      xhci_hcd
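
One way to compare the two /proc/interrupts snapshots is to sum the per-CPU counters for a single IRQ row and subtract. The following is a minimal sketch, not a tool used in this report; the function names are made up for illustration, and it assumes the counters are the leading all-digit fields after the IRQ label.

```python
# Row for IRQ 122 as quoted in this comment (after lid close and open).
AFTER_ROW = " 122:        255         44         82        259  IR-PCI-MSI 327680-edge      xhci_hcd"


def irq_total(snapshot: str, irq: str) -> int:
    """Sum the per-CPU counters on one IRQ row of /proc/interrupts text."""
    for line in snapshot.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != irq:
            continue  # header line or a different IRQ
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # counters end where the chip/handler description starts
            total += int(field)
        return total
    raise KeyError(f"IRQ {irq} not found in snapshot")


def irq_delta(before: str, after: str, irq: str) -> int:
    """Number of interrupts delivered on `irq` between two snapshots."""
    return irq_total(after, irq) - irq_total(before, irq)
```

Feeding it the contents of the "before" and "after" attachments would show whether the count for IRQ 122 moved at all across the suspend/resume cycle.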
Comment 3 Mika Westerberg 2016-10-05 13:08:39 UTC
It is probably this one

[    8.771522] input: ELAN Touchscreen as /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4:1.0/0003:04F3:20D0.0001/input/input15
[    8.771945] hid-multitouch 0003:04F3:20D0.0001: input,hiddev0,hidraw0: USB HID v1.10 Device [ELAN Touchscreen] on usb-0000:00:14.0-4/input0

I can see the interrupt count has increased between suspend/resume so the xHCI controller still seems to work, correct?

If you open the raw input device, do you see anything when you use the touchscreen? Like 'od -x /dev/input/event15'.
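
As an alternative to reading the od hex dump by eye, the raw event stream can be decoded into records. This is a rough sketch that assumes the 64-bit little-endian layout of struct input_event (two 64-bit time fields, then 16-bit type, 16-bit code and 32-bit value, 24 bytes in total); the helper name is hypothetical.

```python
import struct

# Assumed layout of struct input_event on a 64-bit little-endian kernel:
# tv_sec, tv_usec (8 bytes each), type (u16), code (u16), value (s32).
EVENT = struct.Struct("<qqHHi")


def decode_events(raw: bytes):
    """Split a byte buffer into (sec, usec, type, code, value) tuples.

    Trailing bytes that do not form a complete event are ignored.
    """
    usable = len(raw) - len(raw) % EVENT.size
    return [EVENT.unpack_from(raw, off) for off in range(0, usable, EVENT.size)]
```

For example, reading a few hundred bytes from /dev/input/event15 opened in binary mode (as root) and passing them to decode_events() should yield a non-empty list while the touchscreen is being touched and working.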
Comment 4 Mika Westerberg 2016-10-05 13:10:20 UTC
Oh, and can you attach contents of /sys/kernel/debug/gpio and /sys/kernel/debug/pinctrl/INT344B:00/pins to this bug, thanks.
Comment 5 AceLan Kao 2016-10-05 23:30:14 UTC
Created attachment 240911 [details]
/sys/kernel/debug/gpio

After lid close and open, the od command shows nothing while I touch the touchscreen.
Before closing the lid, it prints lots of output.
Comment 6 AceLan Kao 2016-10-05 23:31:28 UTC
Created attachment 240921 [details]
/sys/kernel/debug/pinctrl/INT344B:00/pins before lid close

This file differs slightly before and after the lid close/open, so I attached both versions.
Comment 7 AceLan Kao 2016-10-05 23:31:50 UTC
Created attachment 240931 [details]
/sys/kernel/debug/pinctrl/INT344B:00/pins after lid close/open
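
To see exactly which pins differ between the two dumps, the files can be diffed line by line. A small sketch using Python's difflib follows; the function name is illustrative, and nothing here depends on the exact debugfs line format.

```python
import difflib


def pin_changes(before: str, after: str):
    """Return only the removed ('-') and added ('+') lines between two dumps."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0))
    # The first two entries are the '---' / '+++' file headers; skip them,
    # then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

Calling it with the text of the "before lid close" and "after lid close/open" attachments lists each changed pin line twice, once with its old value and once with its new one.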
Comment 8 Mika Westerberg 2016-10-06 09:26:27 UTC
Created attachment 240951 [details]
Sunrisepoint pinctrl fix
Comment 9 Mika Westerberg 2016-10-06 09:27:12 UTC
Can you try the attached patch? It is on top of v4.8. Please also set CONFIG_DEBUG_PINCTRL=y in your .config.
Comment 10 AceLan Kao 2016-10-07 01:42:20 UTC
Created attachment 241021 [details]
dmesg with patched kernel and debug option

The patch doesn't work: after lid close/open, there is still no touchscreen interrupt.
Comment 11 AceLan Kao 2016-10-07 02:55:49 UTC
BTW, trying to unload/reload pinctrl-sunrisepoint driver, I get
   sunrisepoint-pinctrl INT344B:00: failed to lookup the default state
   sunrisepoint-pinctrl INT344B:00: failed to lookup the sleep state
 
The messages show up both after a clean boot and after S3.
Comment 12 Mika Westerberg 2016-10-07 06:39:59 UTC
That's a harmless message.

I suspect the problem lies somewhere other than the pinctrl driver (though the driver causes some change that triggers it). Your dmesg shows these:

[   49.065849] Broke affinity for irq 277
[   49.065851] Broke affinity for irq 283

Which tells me that the x86 migration code failed to move the IRQ off from the offlined CPU. Do you see these when you disable the pinctrl driver?
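
For comparing runs with and without the pinctrl driver, the affected IRQ numbers can be pulled out of a saved dmesg. A minimal sketch, assuming the message text is exactly as quoted above; the helper name is made up.

```python
import re

BROKE_AFFINITY = re.compile(r"Broke affinity for irq (\d+)")


def broke_affinity_irqs(dmesg_text: str):
    """Return the IRQ numbers named in 'Broke affinity for irq N' lines, in order."""
    return [int(number) for number in BROKE_AFFINITY.findall(dmesg_text)]
```

Running it over both dmesg captures and comparing the two lists shows whether the same IRQs are affected in each configuration.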

Also, can you try moving the IRQ to another CPU after resume? For example:

 # echo 0 > /proc/irq/122/smp_affinity_hint

(and same for 1,2 and 3).
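
The same experiment can be scripted. The sketch below assumes the intended target is /proc/irq/<irq>/smp_affinity_list, which accepts a CPU number or list; that file choice is an assumption, not something confirmed at this point in the thread. The function name and the proc_root parameter are hypothetical, the latter existing only so the helper can be tried against a scratch directory.

```python
from pathlib import Path


def pin_irq_to_cpu(irq: int, cpu: int, proc_root: str = "/proc") -> Path:
    """Write a single CPU number to the IRQ's smp_affinity_list file.

    Needs root when proc_root is the real /proc. Returns the path written.
    """
    target = Path(proc_root) / "irq" / str(irq) / "smp_affinity_list"
    target.write_text(f"{cpu}\n")
    return target
```

Looping cpu over 0 to 3 for IRQ 122 and testing the touchscreen after each write reproduces the manual steps.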

In addition, can you attach acpidump from the machine to this bug, thanks.
Comment 13 AceLan Kao 2016-10-07 07:19:55 UTC
Created attachment 241041 [details]
acpidump file

1. Yes, I still see those Broke affinity messages after the pinctrl-* driver is removed.

2. Do you mean /proc/irq/122/smp_affinity_list?
If this is the case, I can change the value to any combination of 0,1,2,3 without any error messages.

3. The acpidump file is attached.

BTW, here are some other ways to recover the touchscreen, in case they give you more ideas.
1. Close the lid and open it again quickly, before the system enters S3. The system eventually enters S3 anyway; pressing the power button to wake it up then brings the touchscreen back.
2. Hold a magnet near the power button to mimic the lid close event and let the system enter S3, then press the power button to wake it up. The touchscreen works again.
Comment 14 Mika Westerberg 2016-10-07 07:28:39 UTC
Hmm, so if you run

  # echo mem > /sys/power/state

without closing the lid, wait for it to suspend and then press power button to wake it up, does it work?
Comment 15 Mika Westerberg 2016-10-07 07:33:45 UTC
Oh, and you seem to have "acpi_osi=!Windows 2015" in the kernel command line. Why is that?
Comment 16 AceLan Kao 2016-10-07 07:42:16 UTC
This issue is triggered only by lid close and open; none of the following ways of entering S3 leads to it:
1. pm-suspend
2. echo mem > /sys/power/state
3. click from GUI to enter suspend mode

Those kernel parameters were just for testing: since the touchscreen works on the 3.19 kernel but not from 4.4 onward, I tried removing the Win10 OSI string and setting the ACPI version back, but it didn't help.
I have now removed all of them and the test result is unchanged: the touchscreen still doesn't survive with the patched kernel, and the Broke affinity messages in dmesg are the same.
Comment 17 Mika Westerberg 2016-10-07 08:13:57 UTC
OK, can you try these next:

1. Build the pinctrl driver into the kernel image:

  CONFIG_PINCTRL_SUNRISEPOINT=y

2. Disable I2C HID driver

  CONFIG_I2C_HID=n

and see if anything changes. The touchpad may stop working, but does it have any effect on the touchscreen?
Comment 18 Mika Westerberg 2016-10-07 08:24:13 UTC
Created attachment 241051 [details]
Skip pin 103 when restoring pad configs
Comment 19 Mika Westerberg 2016-10-07 08:25:33 UTC
Can you also try the attached patch? It should leave pin 103 untouched during resume.
Comment 20 AceLan Kao 2016-10-07 08:51:11 UTC
Created attachment 241061 [details]
dmesg for comment 17

For comment 17, setting those 2 kernel options doesn't help.
The touchpad falls back to PS/2 mouse mode, and the touchscreen still doesn't survive lid close/open.
Comment 21 AceLan Kao 2016-10-07 09:09:27 UTC
Created attachment 241071 [details]
dmesg for comment 18

Yes, the patch works.
Comment 22 Mika Westerberg 2016-10-07 09:28:34 UTC
OK, so the BIOS is using that particular GPIO (103) for something related to suspend/resume, but it does not take ownership of it or lock it. I need to discuss this with the Dell guys and hopefully get a fix ready next week. Quite possibly we just need to change the pinctrl driver not to restore pins that are not explicitly requested using GPIO APIs.
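
The policy proposed above can be illustrated with a toy model. This is not the kernel code and not the eventual patch; it only sketches the idea that, on resume, saved pad configurations are written back for pins the OS explicitly requested and left alone otherwise. All names and the string-valued "configs" are invented for illustration.

```python
def restore_pads(saved, current, requested):
    """Return pad state after resume, restoring only explicitly requested pins.

    saved:     pin -> config captured by the driver before suspend
    current:   pin -> config found after the firmware resumed the system
    requested: set of pins the OS has explicitly requested
    """
    restored = dict(current)
    for pin, config in saved.items():
        if pin in requested:
            restored[pin] = config  # OS owns this pin: put its config back
        # Otherwise keep whatever the firmware left there (e.g. pin 103).
    return restored
```

In this model, a pin such as 103 that the firmware reprograms during resume keeps the firmware's value as long as nothing in the OS has requested it.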
Comment 23 Mario Limonciello 2016-10-10 18:32:06 UTC
Just for those following along at home, Mika did discuss this with Dell and submitted this patch as a result of the conversation:
https://marc.info/?l=linux-gpio&m=147610677825233&w=2
Comment 24 Mika Westerberg 2016-10-20 08:15:30 UTC
The fix should now be in the mainline kernel.