Bug 217448 - BUG: Patch to spi-nor hangs Lenovo ThinkPad X1 Titanium with divide by zero
Summary: BUG: Patch to spi-nor hangs Lenovo ThinkPad X1 Titanium with divide by zero
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Flash/Memory Technology Devices (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: David Woodhouse
URL:
Keywords:
Depends on:
Blocks: 178231
  Show dependency tree
 
Reported: 2023-05-16 19:06 UTC by Todd Brandt
Modified: 2023-05-18 23:54 UTC (History)
2 users (show)

See Also:
Kernel Version: 6.4.0-rc1
Subsystem:
Regression: Yes
Bisected commit-id: 9d6c5d64f0288a814d4435b7da39e360a4c39e40


Attachments
0001-MTD-SPI-NOR-BUG-FIX-of-divide-by-zero-in-new-n_banks.patch (1.45 KB, patch)
2023-05-17 00:29 UTC, Todd Brandt
Details | Diff

Description Todd Brandt 2023-05-16 19:06:30 UTC
This is the system I'm having troubles with:

os-version              : Ubuntu 20.04.2 LTS
baseboard-manufacturer  : LENOVO
baseboard-product-name  : 20QA000FUS
baseboard-serial-number : W1KS11R111D
baseboard-version       : SDK0J40697 WIN
bios-release-date       : 05/20/2021
bios-vendor             : LENOVO
bios-version            : N2MET49W (1.14 )
chassis-manufacturer    : LENOVO
chassis-serial-number   : R911Q4MM
chassis-version         : None
processor-manufacturer  : Intel(R) Corporation
processor-version       : 11th Gen Intel(R) Core(TM) i7-1160G7 @ 1.20GHz
system-manufacturer     : LENOVO
system-product-name     : 20QA000FUS
system-serial-number    : R911Q4MM
system-version          : ThinkPad X1 Titanium Gen 1
cpucount                : 8
memtotal                : 15939728 kB
memfree                 : 8428028 kB

Ever since 6.4-rc1 this system has hung on every S3/S2idle suspend and shutdown during reboot. I've bisected it to a patch to the MTD: SPI-NOR subsystem about introducing the "concept of a bank"

9d6c5d64f0288a814d4435b7da39e360a4c39e40 is the first bad commit
commit 9d6c5d64f0288a814d4435b7da39e360a4c39e40
Author: Miquel Raynal <miquel.raynal@bootlin.com>
Date:   Tue Mar 28 17:40:58 2023 +0200

    mtd: spi-nor: Introduce the concept of bank

    SPI NOR chips are made of pages, which gathered in small groups make
    (erase) sectors. Sectors, gathered together, make banks inside the
    chip. Until now, there was only one bank per device supported, but we
    are about to introduce support for new chips featuring several banks (up
    to 4 so far) where different operations may happen in parallel.

    Let's allow describing these additional bank parameters, and let's do
    this independently of any other value (like the number of sectors) with
    an absolute value.

    By default we consider that all chips have a single bank.

    Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
    Link: https://lore.kernel.org/r/20230328154105.448540-2-miquel.raynal@bootlin.com
    Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>

 drivers/mtd/spi-nor/core.c   |  1 +
 drivers/mtd/spi-nor/core.h   | 16 +++++++++++-----
 drivers/mtd/spi-nor/xilinx.c |  1 +
 3 files changed, 13 insertions(+), 5 deletions(-)

Basically something about the new code is improperly handling the memory size on this machine. It's not complex but I haven't dug any deeper since I want it posted as quickly as possible. To reproduce, run any of these 3 commands and the system will hang:

echo freeze > /sys/power/state
echo mem > /sys/power/state
sudo reboot
Comment 1 Todd Brandt 2023-05-16 21:20:44 UTC
I'm debugging it now, I'm willing to bet that this line is causing a divide by zero:

diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 0a78045ca1d9..7d9ca799f767 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -2583,6 +2583,7 @@ static void spi_nor_init_default_params(struct spi_nor *nor)
    /* Set SPI NOR sizes. */
    params->writesize = 1;
    params->size = (u64)info->sector_size * info->n_sectors;
+   params->bank_size = div64_u64(params->size, info->n_banks);
    params->page_size = info->page_size;
 
    if (!(info->flags & SPI_NOR_NO_FR)) {

I'm testing this now to see if it's the case. There might be some condition under which the new n_banks member is never properly initialized.
Comment 2 Todd Brandt 2023-05-16 22:08:29 UTC
This is indeed the cause, I'm testing a fix where the "n_banks" value is made 0 based, so that "1" is the lowest possible number and is achieved without any additional initialization.
Comment 3 Todd Brandt 2023-05-16 22:23:13 UTC
It appears that in 6.4-rc2 the code already depends on n_banks having a 0 value that means something. So here's an easier fix. We just make sure there's no divide by zero if n_banks is 0. Testing this now...

diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 0bb0ad14a2fc..084117959be4 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -2921,7 +2921,10 @@ static void spi_nor_late_init_params(struct spi_nor *nor)
    if (nor->flags & SNOR_F_HAS_LOCK && !nor->params->locking_ops)
        spi_nor_init_default_locking_ops(nor);
 
-   nor->params->bank_size = div64_u64(nor->params->size, nor->info->n_banks);
+   if (nor->info->n_banks > 0)
+       nor->params->bank_size = div64_u64(nor->params->size, nor->info->n_banks);
+   else
+       nor->params->bank_size = 0;
 }
Comment 4 Todd Brandt 2023-05-16 22:52:41 UTC
Tested and it works, I just submitted the patch to lkml:

https://marc.info/?l=linux-kernel&m=168427739123254&w=2
Comment 5 Todd Brandt 2023-05-17 00:29:28 UTC
Created attachment 304280 [details]
0001-MTD-SPI-NOR-BUG-FIX-of-divide-by-zero-in-new-n_banks.patch

Tested extensively on the Lenovo and works well.
Comment 6 Bagas Sanjaya 2023-05-18 07:24:57 UTC
Hi Todd,

There is a fix landed on linux-next (79a4db50192c19 ("mtd: spi-nor: Delay the initialization of bank_size")). Can you check linux-next and see if your regression is solved?
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-05-18 08:18:41 UTC
(In reply to Bagas Sanjaya from comment #6)
> 
> There is a fix landed on linux-next

Not sure if it's a good idea to ask reporters to test linux-next, as the risk that something goes sideways there is a bit bigger than mainline. 

But well, I guess in this case it doesn't matter much, as Todd likely knowns enough about the kernel to judge for himself.

> (79a4db50192c19 ("mtd: spi-nor: Delay the initialization of bank_size")). 

Huh? That patch is already in 6.4-rc1. Did you want to copy'n'paste something else?
Comment 8 Bagas Sanjaya 2023-05-18 13:22:22 UTC
On 5/18/23 15:18, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217448
> 
> --- Comment #7 from The Linux kernel's regression tracker (Thorsten Leemhuis)
> (regressions@leemhuis.info) ---
> (In reply to Bagas Sanjaya from comment #6)
>>
>> There is a fix landed on linux-next
> 
> Not sure if it's a good idea to ask reporters to test linux-next, as the risk
> that something goes sideways there is a bit bigger than mainline. 
> 
> But well, I guess in this case it doesn't matter much, as Todd likely knowns
> enough about the kernel to judge for himself.
> 
>> (79a4db50192c19 ("mtd: spi-nor: Delay the initialization of bank_size")). 
> 
> Huh? That patch is already in 6.4-rc1. Did you want to copy'n'paste something
> else?
> 

I forgot to `git describe` that fix...
Comment 9 Todd Brandt 2023-05-18 23:54:29 UTC
The patch you mentioned already appears to be in mainline, and on it's own it still has the divide by zero.

Tudor Ambarus just submitted a full patch that I've tested and verified, so hopefully that will get accepted soon.

https://marc.info/?l=linux-kernel&m=168440000825531&w=2

Note You need to log in before you can comment on or make changes to this bug.