Bug 217939 - mtd: Function mtdchar_read() does not return error code when retlen is 0
Summary: mtd: Function mtdchar_read() does not return error code when retlen is 0
Status: NEW
Alias: None
Product: Other
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-23 00:17 UTC by wangzhaolong1
Modified: 2023-09-23 00:57 UTC (History)
0 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description wangzhaolong1 2023-09-23 00:17:35 UTC
~~~c

        while (count) {
                ......
                ......
                        ret = mtd_read(mtd, *ppos, len, &retlen, kbuf);
                 }
                /* Nand returns -EBADMSG on ECC errors, but it returns
                 * the data. For our userspace tools it is important
                 * to dump areas with ECC errors!
                 * For kernel internal usage it also might return -EUCLEAN
                 * to signal the caller that a bitflip has occurred and has
                 * been corrected by the ECC algorithm.
                 * Userspace software which accesses NAND this way
                 * must be aware of the fact that it deals with NAND
                 */
                if (!ret || mtd_is_bitflip_or_eccerr(ret)) {
                        *ppos += retlen;
                        if (copy_to_user(buf, kbuf, retlen)) {
                                kfree(kbuf);
                                return -EFAULT;
                        }
                        else
                                total_retlen += retlen;
 
                        count -= retlen;
                        buf += retlen;
                        if (retlen == 0)
                                count = 0;
                }
                else {
                        kfree(kbuf);
                        return ret;
                }
 
        }
 
        kfree(kbuf);
        return total_retlen;
~~~

In the first while loop, if the mtd_read() function returns -EBADMSG
and 'retlen' returns 0, the loop break and the function returns value
'total_retlen' is 0, not the error code.
    
This problem causes the user-space program to encounter EOF when it has
not finished reading the mtd partion, and this also violates the read
system call standard in POSIX.
Comment 1 wangzhaolong1 2023-09-23 00:36:46 UTC
### Why does retlen=0 occur?

The NAND driver developer directly returns the -EBADMSG error code in ->readpage().

~~~
mtd_read
  mtd_read_oob
    mtd_read_oob_std
      nand_read_oob   ->_read_oob()
        nand_do_read_ops
          ret = chip->ecc.read_page() // return -EBASMSG;
~~~

If the -EBASMSG error code is returned directly, the ops->retlen calculation result is 0.

~~~c
static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
                struct mtd_oob_ops *ops)
    ......
    uint32_t readlen = ops->len;
    ......
    while (1) {
        ......
                ret = chip->ecc.read_page(chip, bufpoi,
                              oob_required, page);
            if (ret < 0) {  // // return -EBASMSG 
                if (use_bounce_buf)
                    /* Invalidate page cache */
                    chip->pagecache.page = -1;
                break;
            }
        ......
    }
    nand_deselect_target(chip);

    ops->retlen = ops->len - (size_t) readlen; // retlen is 0
    ......
    if (ret < 0)
        return ret;  // return -EBASMSG 
~~~

This may be a mistake made by driver developers not understanding the NAND driver framework.

But I think maybe the mtd driver layer should report this error to the user.
Comment 2 wangzhaolong1 2023-09-23 00:57:40 UTC
### Problem reproduction and impact on user space

Modify mtd_read() to simulate the fault.

~~~diff
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 9bd661be3ae9..eb3d692a6dcc 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -1502,6 +1502,14 @@ int mtd_read(struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen,
        };
        int ret;
 
+       /* Simulate a malfunction here.
+          PEB with 256Kb size,
+          Always returns an ECC error on PEB number 64 */
+       if (from >= 16777216 && from < 17039360) {
+               *retlen = 0;
+               return -EBADMSG;
+       }
+
        ret = mtd_read_oob(mtd, from, &ops);
        *retlen = ops.retlen;
 
~~~

kernel CONFIG:

~~~
CONFIG_MTD=y
CONFIG_MTD_PARTITIONED_MASTER=y
CONFIG_MTD_NAND_CORE=y
CONFIG_MTD_RAW_NAND=m
CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND_ECC_SW_HAMMING=y
CONFIG_MTD_UBI=m
~~~

Do the following in user space:

~~~bash
ID="0xec,0xd3,0x51,0x25"  # 2GB (256KB PEB, 2KB page) Nand Flash
# Create a 32 MB MTD partition.
modprobe nandsim id_bytes=$ID overridesize=10 parts=128

# Scenario 1: Only 16 MB data can be read, and no error is returned.
dd if=/dev/mtd1 of=./mtd1.bin

# Scenario 2: An infinite loop occurs during the ubiformat process.
ubiformat /dev/mtd1 -O 2048
~~~

Note You need to log in before you can comment on or make changes to this bug.