This is the system I'm having troubles with: os-version : Ubuntu 20.04.2 LTS baseboard-manufacturer : LENOVO baseboard-product-name : 20QA000FUS baseboard-serial-number : W1KS11R111D baseboard-version : SDK0J40697 WIN bios-release-date : 05/20/2021 bios-vendor : LENOVO bios-version : N2MET49W (1.14 ) chassis-manufacturer : LENOVO chassis-serial-number : R911Q4MM chassis-version : None processor-manufacturer : Intel(R) Corporation processor-version : 11th Gen Intel(R) Core(TM) i7-1160G7 @ 1.20GHz system-manufacturer : LENOVO system-product-name : 20QA000FUS system-serial-number : R911Q4MM system-version : ThinkPad X1 Titanium Gen 1 cpucount : 8 memtotal : 15939728 kB memfree : 8428028 kB Ever since 6.4-rc1 this system has hung on every S3/S2idle suspend and shutdown during reboot. I've bisected it to a patch to the MTD: SPI-NOR subsystem about introducing the "concept of a bank" 9d6c5d64f0288a814d4435b7da39e360a4c39e40 is the first bad commit commit 9d6c5d64f0288a814d4435b7da39e360a4c39e40 Author: Miquel Raynal <miquel.raynal@bootlin.com> Date: Tue Mar 28 17:40:58 2023 +0200 mtd: spi-nor: Introduce the concept of bank SPI NOR chips are made of pages, which gathered in small groups make (erase) sectors. Sectors, gathered together, make banks inside the chip. Until now, there was only one bank per device supported, but we are about to introduce support for new chips featuring several banks (up to 4 so far) where different operations may happen in parallel. Let's allow describing these additional bank parameters, and let's do this independently of any other value (like the number of sectors) with an absolute value. By default we consider that all chips have a single bank. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Link: https://lore.kernel.org/r/20230328154105.448540-2-miquel.raynal@bootlin.com Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> drivers/mtd/spi-nor/core.c | 1 + drivers/mtd/spi-nor/core.h | 16 +++++++++++----- drivers/mtd/spi-nor/xilinx.c | 1 + 3 files changed, 13 insertions(+), 5 deletions(-) Basically something about the new code is improperly handling the memory size on this machine. It's not complex but I haven't dug any deeper since I want it posted as quickly as possible. To reproduce, run any of these 3 commands and the system will hang: echo freeze > /sys/power/state echo mem > /sys/power/state sudo reboot
I'm debugging it now, I'm willing to bet that this line is causing a divide by zero: diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c index 0a78045ca1d9..7d9ca799f767 100644 --- a/drivers/mtd/spi-nor/core.c +++ b/drivers/mtd/spi-nor/core.c @@ -2583,6 +2583,7 @@ static void spi_nor_init_default_params(struct spi_nor *nor) /* Set SPI NOR sizes. */ params->writesize = 1; params->size = (u64)info->sector_size * info->n_sectors; + params->bank_size = div64_u64(params->size, info->n_banks); params->page_size = info->page_size; if (!(info->flags & SPI_NOR_NO_FR)) { I'm testing this now to see if it's the case. There might be some condition under which the new n_banks member is never properly initialized.
This is indeed the cause, I'm testing a fix where the "n_banks" value is made 0 based, so that "1" is the lowest possible number and is achieved without any additional initialization.
It appears that in 6.4-rc2 the code already depends on n_banks having a 0 value that means something. So here's an easier fix. We just make sure there's no divide by zero if n_banks is 0. Testing this now... diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c index 0bb0ad14a2fc..084117959be4 100644 --- a/drivers/mtd/spi-nor/core.c +++ b/drivers/mtd/spi-nor/core.c @@ -2921,7 +2921,10 @@ static void spi_nor_late_init_params(struct spi_nor *nor) if (nor->flags & SNOR_F_HAS_LOCK && !nor->params->locking_ops) spi_nor_init_default_locking_ops(nor); - nor->params->bank_size = div64_u64(nor->params->size, nor->info->n_banks); + if (nor->info->n_banks > 0) + nor->params->bank_size = div64_u64(nor->params->size, nor->info->n_banks); + else + nor->params->bank_size = 0; }
Tested and it works, I just submitted the patch to lkml: https://marc.info/?l=linux-kernel&m=168427739123254&w=2
Created attachment 304280 [details] 0001-MTD-SPI-NOR-BUG-FIX-of-divide-by-zero-in-new-n_banks.patch Tested extensively on the Lenovo and works well.
Hi Todd, There is a fix landed on linux-next (79a4db50192c19 ("mtd: spi-nor: Delay the initialization of bank_size")). Can you check linux-next and see if your regression is solved?
(In reply to Bagas Sanjaya from comment #6) > > There is a fix landed on linux-next Not sure if it's a good idea to ask reporters to test linux-next, as the risk that something goes sideways there is a bit bigger than mainline. But well, I guess in this case it doesn't matter much, as Todd likely knowns enough about the kernel to judge for himself. > (79a4db50192c19 ("mtd: spi-nor: Delay the initialization of bank_size")). Huh? That patch is already in 6.4-rc1. Did you want to copy'n'paste something else?
On 5/18/23 15:18, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=217448 > > --- Comment #7 from The Linux kernel's regression tracker (Thorsten Leemhuis) > (regressions@leemhuis.info) --- > (In reply to Bagas Sanjaya from comment #6) >> >> There is a fix landed on linux-next > > Not sure if it's a good idea to ask reporters to test linux-next, as the risk > that something goes sideways there is a bit bigger than mainline. > > But well, I guess in this case it doesn't matter much, as Todd likely knowns > enough about the kernel to judge for himself. > >> (79a4db50192c19 ("mtd: spi-nor: Delay the initialization of bank_size")). > > Huh? That patch is already in 6.4-rc1. Did you want to copy'n'paste something > else? > I forgot to `git describe` that fix...
The patch you mentioned already appears to be in mainline, and on it's own it still has the divide by zero. Tudor Ambarus just submitted a full patch that I've tested and verified, so hopefully that will get accepted soon. https://marc.info/?l=linux-kernel&m=168440000825531&w=2