Bug 206403 - surface3-spi (touchscreen driver): crash after two touch input
Summary: surface3-spi (touchscreen driver): crash after two touch input
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Input Devices (show other bugs)
Hardware: x86-64 Linux
: P2 normal
Assignee: Jarkko Nikula
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-04 14:07 UTC by Tsuchiya Yuto (kitakar5525)
Modified: 2020-05-29 15:16 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.5.1
Tree: Mainline
Regression: No


Attachments
surface3-spi: workaround: prevent driver from using DMA (2.20 KB, application/mbox)
2020-02-04 14:07 UTC, Tsuchiya Yuto (kitakar5525)
Details
reloading only spi_pxa2xx_platform module after the touch input crash causes NULL pointer dereference (44.05 KB, text/html)
2020-02-04 14:49 UTC, Tsuchiya Yuto (kitakar5525)
Details

Description Tsuchiya Yuto (kitakar5525) 2020-02-04 14:07:17 UTC
Created attachment 287105 [details]
surface3-spi: workaround: prevent driver from using DMA

Overview:
On kernels at least after 4.19, touch input is broken after suspend
(s2idle) then making touch input two times:

        kern  :err   : [  +0.203408] Surface3-spi spi-MSHW0037:00: SPI transfer timed out

On recent stable kernels at least after 5.1, touch input will crash after
making touch input two times:

        kern  :err   : [  +0.203592] Surface3-spi spi-MSHW0037:00: SPI transfer timed out
        kern  :err   : [  +0.000173] spi_master spi1: failed to transfer one message from queue

Steps to Reproduce:
On 4.19.y:
1) boot the system
2) go into s2idle
3) wakeup from s2idle
4) make the first touch input (by both finger or stylus)
5) try to make the second touch input

On 5.5.y:
1) boot the system
2) make the first touch input
3) try to make the second touch input

Actual Results:
Touch input will no longer usable with the aforementioned dmesg log.

Expected Results:
Touch input will still usable after the second touch.

Additional Information:
The touchscreen driver works as expected with PIO. Only broken with
DMA. So, I made a patch to prevent the driver from using DMA (I attached
the patch for reference).

Note that in some kernel config setups, the driver uses PIO by default.
In other kernel config setups, on the other hand, the driver uses DMA
by default.

I confirmed that at least the following distribution kernels with their
config are affected:
- Fedora (5.4.12-200.fc31.x86_64)
- Arch Linux (both linux (5.5.y) and linux-lts (4.19.y) kernels
- Ubuntu (5.3.0-26-generic)

So, questions are:
- Is the driver intended for use with DMA?
- If not, is it possible to make the driver usable with DMA?
Comment 1 Tsuchiya Yuto (kitakar5525) 2020-02-04 14:49:01 UTC
Created attachment 287107 [details]
reloading only spi_pxa2xx_platform module after the touch input crash causes NULL pointer dereference

By the way, interestingly, I can fix the touch input crash with DMA mode
by reloading only spi_pxa2xx_platform module after (and only after) the
crash (with causing kernel NULL pointer dereference, race at shutting
down the spi_pxa2xx_platform module? [1]).

        sudo modprobe -r spi_pxa2xx_platform
        sudo modprobe spi_pxa2xx_platform

When I reload spi_pxa2xx_platform module with surface3_spi like this
(after the touch input crash), I can avoid the kernel NULL pointer 
dereference, but the touch input crash is still reproducible after this
reload.

        sudo modprobe -r surface3_spi
        sudo modprobe -r spi_pxa2xx_platform
        sudo modprobe spi_pxa2xx_platform
        sudo modprobe surface3_spi

[1] Have a look at the work that is done so far: https://github.com/linux-surface/kernel/commit/66c840113c7f7c7ffcaf5e4809ed1074c550ec0b
     I attached the dmesg log here, too.
Comment 2 Andy Shevchenko 2020-02-05 13:52:55 UTC
Reassigned to Jarrko and downgrade priority (it doesn't prevent most of the system working). Jarkko, I have my patch in incubation here: https://github.com/andy-shev/linux/tree/topic/spi/reload, I think I almost nailed it down, need more time to investigate. But it is being developed in order to solve second issue (removal crash), and not main part.
Comment 3 Tsuchiya Yuto (kitakar5525) 2020-02-08 23:19:44 UTC
Changed title to describe more precise situation.
Comment 4 Tsuchiya Yuto (kitakar5525) 2020-05-29 13:36:31 UTC
From 930f5d85e3fbf1c47cede7319cc873b49963a895 Mon Sep 17 00:00:00 2001
From: "Tsuchiya Yuto" <kitakar@gmail.com>
Date: Fri, 29 May 2020 18:15:01 +0900
Subject: [PATCH] spi: pxa2xx: release/setup DMA on runtime_pm

On recent v4.19 series (confirmed on v4.19.125), surface3_spi driver
stops working after resuming from s2idle then making a touch input:

    kern  :err   : [  +0.203408] Surface3-spi spi-MSHW0037:00: SPI transfer timed out

On recent v5.4, v5.6, and v5.7-rc series (confirmed on v5.4.43, v5.6.15,
and v5.7-rc7), that driver stops working after putting (internal)
display off/on then making a touch input. (On earlier versions, it stopped
after making touch input two times, regardless of s2idle/display state)

    kern  :err   : [  +0.203592] Surface3-spi spi-MSHW0037:00: SPI transfer timed out
    kern  :err   : [  +0.000173] spi_master spi1: failed to transfer one message from queue

This issue happens only with DMA mode.

I found that release/setup DMA cycle helps to get that driver working
again with DMA mode but I'm not sure where is the proper place to do so.
So, for now, I put changes into runtime_suspend()/runtime_resume() because
those gets called on every display off/on.

This commit release/setup DMA on runtime_suspend()/runtime_resume() to
avoid surface3_spi driver crashing.

Note: Considering that this issue is reproducible by only display off/on,
the cause of this issue is not related to s2idle. I also tried disabling
runtime_pm by commenting out the following lines [drivers/spi/spi-pxa2xx.c]

		SET_RUNTIME_PM_OPS(pxa2xx_spi_runtime_suspend,
				pxa2xx_spi_runtime_resume, NULL)

but still reproducible. (surface3_spi doesn't use runtime_pm). So,
runtime_pm is also not relevant.

TODO: Can someone guess why this is needed for Surface 3 (or surface3_spi
driver) ? I don't even know where the root cause lies; who/where to ask?
maybe the cause is not in surface3_spi or spi-pxa2xx code.

Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
---
 drivers/spi/spi-pxa2xx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index f6e87344a36c8..d8d03dc5ac9c0 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -1978,6 +1978,13 @@ static int pxa2xx_spi_runtime_suspend(struct device *dev)
 {
 	struct driver_data *drv_data = dev_get_drvdata(dev);
 
+	pr_info("DEBUG: pxa2xx_spi_runtime_suspend() called\n");
+
+	/* Release DMA.
+	 * Code is extracted from pxa2xx_spi_remove(). */
+	if (drv_data->controller_info->enable_dma)
+		pxa2xx_spi_dma_release(drv_data);
+
 	clk_disable_unprepare(drv_data->ssp->clk);
 	return 0;
 }
@@ -1987,6 +1994,23 @@ static int pxa2xx_spi_runtime_resume(struct device *dev)
 	struct driver_data *drv_data = dev_get_drvdata(dev);
 	int status;
 
+	pr_info("DEBUG: pxa2xx_spi_runtime_resume() called\n");
+
+	/* Setup DMA if requested.
+	 * Code is extracted from pxa2xx_spi_probe(). */
+	if (drv_data->controller_info) {
+		status = pxa2xx_spi_dma_setup(drv_data);
+		if (status) {
+			dev_warn(dev, "no DMA channels available, using PIO\n");
+			drv_data->controller_info = false;
+		} else {
+			drv_data->controller->can_dma = pxa2xx_spi_can_dma;
+			drv_data->controller->max_dma_len = MAX_DMA_LEN;
+			drv_data->controller->max_transfer_size =
+				pxa2xx_spi_max_dma_transfer_size;
+		}
+	}
+
 	status = clk_prepare_enable(drv_data->ssp->clk);
 	return status;
 }
-- 
2.26.2
Comment 5 Andy Shevchenko 2020-05-29 14:29:12 UTC
(In reply to Tsuchiya Yuto (kitakar5525) from comment #4)

...

> This issue happens only with DMA mode.
> 
> I found that release/setup DMA cycle helps to get that driver working
> again with DMA mode but I'm not sure where is the proper place to do so.
> So, for now, I put changes into runtime_suspend()/runtime_resume() because
> those gets called on every display off/on.
> 
> This commit release/setup DMA on runtime_suspend()/runtime_resume() to
> avoid surface3_spi driver crashing.

...

> TODO: Can someone guess why this is needed for Surface 3 (or surface3_spi
> driver) ? I don't even know where the root cause lies; who/where to ask?
> maybe the cause is not in surface3_spi or spi-pxa2xx code.

> +     pr_info("DEBUG: pxa2xx_spi_runtime_suspend() called\n");
> +
> +     /* Release DMA.
> +      * Code is extracted from pxa2xx_spi_remove(). */
> +     if (drv_data->controller_info->enable_dma)
> +             pxa2xx_spi_dma_release(drv_data);
> +
>       clk_disable_unprepare(drv_data->ssp->clk);

Ah, looks that's again related to the wrong state machine implementation / handling.

Actually doesn't simple call of dma_terminate_sync() help with this? (we have function called pxa2xx_spi_dma_stop() for that)
Comment 6 Tsuchiya Yuto (kitakar5525) 2020-05-29 15:03:43 UTC
(In reply to Andy Shevchenko from comment #5)
> (In reply to Tsuchiya Yuto (kitakar5525) from comment #4)
> 
> ...
> 
> > This issue happens only with DMA mode.
> > 
> > I found that release/setup DMA cycle helps to get that driver working
> > again with DMA mode but I'm not sure where is the proper place to do so.
> > So, for now, I put changes into runtime_suspend()/runtime_resume() because
> > those gets called on every display off/on.

Oh, what I've said is incorrect. runtime_suspend()/runtime_resume() gets
called rather on every touch input stop/start.

> > This commit release/setup DMA on runtime_suspend()/runtime_resume() to
> > avoid surface3_spi driver crashing.
> 
> ...
> 
> > TODO: Can someone guess why this is needed for Surface 3 (or surface3_spi
> > driver) ? I don't even know where the root cause lies; who/where to ask?
> > maybe the cause is not in surface3_spi or spi-pxa2xx code.
> 
> > +     pr_info("DEBUG: pxa2xx_spi_runtime_suspend() called\n");
> > +
> > +     /* Release DMA.
> > +      * Code is extracted from pxa2xx_spi_remove(). */
> > +     if (drv_data->controller_info->enable_dma)
> > +             pxa2xx_spi_dma_release(drv_data);
> > +
> >       clk_disable_unprepare(drv_data->ssp->clk);
> 
> Ah, looks that's again related to the wrong state machine implementation /
> handling.
> 
> Actually doesn't simple call of dma_terminate_sync() help with this? (we
> have function called pxa2xx_spi_dma_stop() for that)

Like this?

diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index 73d2a65d0b6ef..c8d305cc9a373 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -1977,6 +1977,11 @@ static int pxa2xx_spi_runtime_suspend(struct device *dev)
 {
 	struct driver_data *drv_data = dev_get_drvdata(dev);
 
+	/* Stop the DMA if running.
+	 * Code is extracted from pxa2xx_spi_handle_err(). */
+	if (atomic_read(&drv_data->dma_running))
+		pxa2xx_spi_dma_stop(drv_data);
+
 	clk_disable_unprepare(drv_data->ssp->clk);
 	return 0;
 }

but this didn't work. It seems that we really need to release DMA
(dma_release_channel()) and setup DMA again.
Comment 7 Tsuchiya Yuto (kitakar5525) 2020-05-29 15:16:39 UTC
(In reply to Tsuchiya Yuto (kitakar5525) from comment #6)
> (In reply to Andy Shevchenko from comment #5)
> > (In reply to Tsuchiya Yuto (kitakar5525) from comment #4)
> > > TODO: Can someone guess why this is needed for Surface 3 (or surface3_spi
> > > driver) ? I don't even know where the root cause lies; who/where to ask?
> > > maybe the cause is not in surface3_spi or spi-pxa2xx code.
> > 
> > > +     pr_info("DEBUG: pxa2xx_spi_runtime_suspend() called\n");
> > > +
> > > +     /* Release DMA.
> > > +      * Code is extracted from pxa2xx_spi_remove(). */
> > > +     if (drv_data->controller_info->enable_dma)
> > > +             pxa2xx_spi_dma_release(drv_data);
> > > +
> > >       clk_disable_unprepare(drv_data->ssp->clk);
> > 
> > Ah, looks that's again related to the wrong state machine implementation /
> > handling.
> > 
> > Actually doesn't simple call of dma_terminate_sync() help with this? (we
> > have function called pxa2xx_spi_dma_stop() for that)
> 
> Like this?
> 
> diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
> index 73d2a65d0b6ef..c8d305cc9a373 100644
> --- a/drivers/spi/spi-pxa2xx.c
> +++ b/drivers/spi/spi-pxa2xx.c
> @@ -1977,6 +1977,11 @@ static int pxa2xx_spi_runtime_suspend(struct device
> *dev)
>  {
>       struct driver_data *drv_data = dev_get_drvdata(dev);
>  
> +     /* Stop the DMA if running.
> +      * Code is extracted from pxa2xx_spi_handle_err(). */
> +     if (atomic_read(&drv_data->dma_running))
> +             pxa2xx_spi_dma_stop(drv_data);
> +
>       clk_disable_unprepare(drv_data->ssp->clk);
>       return 0;
>  }
> 
> but this didn't work. It seems that we really need to release DMA
> (dma_release_channel()) and setup DMA again.

Ah, that pxa2xx_spi_dma_stop() didn't get called. So, I tried the following

diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index 73d2a65d0b6ef..a2b848b92b3ae 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -1977,6 +1977,9 @@ static int pxa2xx_spi_runtime_suspend(struct device *dev)
 {
 	struct driver_data *drv_data = dev_get_drvdata(dev);
 
+	pr_info("DEBUG: before pxa2xx_spi_dma_stop()\n");
+	pxa2xx_spi_dma_stop(drv_data);
+
 	clk_disable_unprepare(drv_data->ssp->clk);
 	return 0;
 }

but didn't work

@@ -1986,6 +1989,9 @@ static int pxa2xx_spi_runtime_resume(struct device *dev)
 	struct driver_data *drv_data = dev_get_drvdata(dev);
 	int status;
 
+	pr_info("DEBUG: before pxa2xx_spi_dma_start()\n");
+	pxa2xx_spi_dma_start(drv_data);
+
 	status = clk_prepare_enable(drv_data->ssp->clk);
 	return status;
 }

even if I add pxa2xx_spi_dma_start(), didn't work either.

Note You need to log in before you can comment on or make changes to this bug.