Bug 218006
| Field | Value |
|---|---|
| Summary | [ext4] system panic when ext4_writepages:2918: Journal has aborted |
| Product | File System |
| Component | ext4 |
| Reporter | Gary (fengchunguo) |
| Assignee | fs_ext4 (fs_ext4) |
| Status | RESOLVED INVALID |
| Severity | high |
| Priority | P3 |
| CC | tytso |
| Hardware | ARM |
| OS | Linux |
| Kernel Version | |
| Subsystem | |
| Regression | No |
| Bisected commit-id | |
| Attachments | fs/ext4 code |
Description
Gary
2023-10-13 08:48:01 UTC
Looks like your storage is faulty:

mmcblk0: error -110 sending status

Also, note the panic message:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007

This indicates that the init process received a SIGBUS (signal number 7). Given the large number of mmc0 / sdhci errors, it's pretty clear that the storage device is *very* unhappy.

The most common cause, as Artem has stated, is a hardware problem. It's possible that forcing a factory reset might work. If the SD card is removable, you could try reseating it or, if that doesn't work, replacing it. If the eMMC flash device is soldered onto the mainboard, then the probable solution is complete hardware replacement.

(In reply to Theodore Tso from comment #2)
> Also, note the panic message:
>
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
>
> This indicates that the init process received a SIGBUS (signal number 7).
> Given the large number of mmc0 / sdhci errors, it's pretty clear that
> the storage device is *very* unhappy.
>
> The most common cause, as Artem has stated, is a hardware problem.
> It's possible that forcing a factory reset might work. If the SD card is
> removable, you could try reseating it or, if that doesn't work, replacing
> it. If the eMMC flash device is soldered onto the mainboard, then the
> probable solution is complete hardware replacement.

Hi Theodore,

Thanks for your suggestion. We ran the cpuburn and memtest tools for 7*24 hours and did not find any mmc issue, storage part included. We need to use kernel 4.14; could you please suggest a way to debug this?

Thanks,
BRs,
Gary

(In reply to Artem S. Tashkinov from comment #1)
> Looks like your storage is faulty:
>
> mmcblk0: error -110 sending status

Hi Artem S. Tashkinov,

Thanks for your suggestion. It seems that it was not an mmc issue.

Thanks,
BRs,
Gary

Created attachment 305227
fs/ext4 code
Unfortunately the 4.14 kernel was released in 2017, which is over six years ago. Most companies where you can pay $$$ to get support for Linux distributions based on 4.14 are EOL'ing products based on 4.14. As for upstream kernel developers, who are essentially volunteers when people ask them for free help: in general, upstream kernel developers do not support LTS kernels, and certainly not an LTS kernel as old as 4.14.

If there is someone willing to be the ext4 upstream stable backports maintainer, then that person might be willing to provide limited support for LTS kernels, but the 4.14 LTS upstream kernel is planned to be EOL'ed in January 2024, and I stopped running gce-xfstests on 4.14 LTS kernels about a year or so ago. I barely have time to run gce-xfstests on LTS kernels for 6.1, 5.15 and 5.10 every quarter or two, and if someone were to volunteer to become ext4 stable backports maintainer, I'd encourage them to focus on 6.6 and 6.1 LTS kernels, with 5.10 and 5.15 LTS kernels as a lower priority (because most commercial companies are going to be moving off of 5.10 LTS in the near future). But volunteer support for 4.14 LTS? To be honest, that's extremely unlikely.

*If* there is a company that has a misguided business reason to support the 4.14 LTS kernel, then of course an employee of that company can certainly fund an engineer to do all of the support that they need. But quite frankly, I'd be encouraging that company to rethink their business case for supporting the 4.14 kernel. It would probably be far more cost effective to migrate their customers to a non-pre-historic kernel such as the 6.6 LTS kernel.

(In reply to Theodore Tso from comment #6)
> Unfortunately the 4.14 kernel was released in 2017, which is over six years
> ago. Most companies where you can pay $$$ to get support for Linux
> distributions based on 4.14 are EOL'ing products based on 4.14. As for
> upstream kernel developers, who are essentially volunteers when people ask
> them for free help: in general, upstream kernel developers do not support
> LTS kernels, and certainly not an LTS kernel as old as 4.14.
>
> If there is someone willing to be the ext4 upstream stable backports
> maintainer, then that person might be willing to provide limited support for
> LTS kernels, but the 4.14 LTS upstream kernel is planned to be EOL'ed in
> January 2024, and I stopped running gce-xfstests on 4.14 LTS kernels
> about a year or so ago. I barely have time to run gce-xfstests on LTS
> kernels for 6.1, 5.15 and 5.10 every quarter or two, and if someone were to
> volunteer to become ext4 stable backports maintainer, I'd encourage them to
> focus on 6.6 and 6.1 LTS kernels, with 5.10 and 5.15 LTS kernels as a lower
> priority (because most commercial companies are going to be moving off of
> 5.10 LTS in the near future). But volunteer support for 4.14 LTS? To be
> honest, that's extremely unlikely.
>
> *If* there is a company that has a misguided business reason to support the
> 4.14 LTS kernel, then of course an employee of that company can certainly
> fund an engineer to do all of the support that they need. But quite
> frankly, I'd be encouraging that company to rethink their business case for
> supporting the 4.14 kernel. It would probably be far more cost effective
> to migrate their customers to a non-pre-historic kernel such as the 6.6 LTS
> kernel.

Thanks for your reply. We will try to debug this issue.

For this issue, I think that we should focus on the information below. The eMMC error is probably a side effect.
[2023-10-13 02:51:08] [60086.731357] EXT4-fs error (device mmcblk0p44) in ext4_da_write_end:3210: IO failure
[2023-10-13 02:51:09] [60086.739386] EXT4-fs (mmcblk0p44): Delayed block allocation failed for inode 155757 at logical offset 438 with max blocks 25 with error 30
[2023-10-13 02:51:09] [60086.739388] EXT4-fs (mmcblk0p44): This should not happen!! Data will be lost
[2023-10-13 02:51:09] [60086.739388]
[2023-10-13 02:51:09] [60086.739399] EXT4-fs error (device mmcblk0p44) in ext4_writepages:2918: Journal has aborted
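
For anyone trying to read the raw numbers quoted in this thread, the following is a minimal sketch of how they decode. It is not taken from the attachment or from any comment here. The helper names `decode_errno` and `decode_exitcode` are hypothetical, and the lookup tables are a hand-copied subset of the generic Linux errno and signal numbers (assumed to apply to this ARM system), hard-coded so that the result does not depend on the host OS. On Linux, error 110 is ETIMEDOUT, error 30 is EROFS (read-only file system), and signal 7 is SIGBUS.

```python
# Hand-copied subset of generic Linux values (assumption: they apply to the
# ARM kernel in this report). Hard-coded so the result is host-independent.
LINUX_ERRNO = {5: "EIO", 30: "EROFS", 110: "ETIMEDOUT"}
LINUX_SIGNALS = {7: "SIGBUS", 9: "SIGKILL", 11: "SIGSEGV"}


def decode_errno(value):
    """Map a (possibly negative) kernel error number to its symbolic name."""
    number = abs(value)
    return LINUX_ERRNO.get(number, "unknown errno %d" % number)


def decode_exitcode(code):
    """Split a wait-style exit code into (signal name, core flag, exit status).

    The low 7 bits hold the terminating signal, bit 0x80 is the core-dump
    flag, and bits 8-15 hold the normal exit status.
    """
    signum = code & 0x7F
    core_dumped = bool(code & 0x80)
    status = (code >> 8) & 0xFF
    name = None
    if signum:
        name = LINUX_SIGNALS.get(signum, "signal %d" % signum)
    return name, core_dumped, status


print(decode_errno(-110))           # from "mmcblk0: error -110 sending status"
print(decode_errno(30))             # from "Delayed block allocation failed ... error 30"
print(decode_exitcode(0x00000007))  # from "Attempted to kill init! exitcode=0x00000007"
```

Read this way, the "error 30" in the delayed-allocation message only says that the file system had already gone read-only when the write was attempted. It does not by itself show which earlier failure caused the journal to abort.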