Bug 218006 - [ext4] system panic when ext4_writepages:2918: Journal has aborted
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: ARM Linux
Importance: P3 high
Assignee: fs_ext4@kernel-bugs.osdl.org
Reported: 2023-10-13 08:48 UTC by Gary
Modified: 2023-10-17 01:41 UTC

Regression: No


Attachments
fs/ext4 code (370.51 KB, application/x-gzip)
2023-10-16 08:44 UTC, Gary

Description Gary 2023-10-13 08:48:01 UTC
Hi All,

This issue occurs with very low probability. How can we debug it?
mmcblk0p44 is the userdata partition.
mmcblk0p22 holds all the logs.

console:/ $ [60086.159230] EXT4-fs error (device mmcblk0p22): ext4_journal_check_start:61: Detected aborted journal
[2023-10-13 02:51:08]  [60086.171309] EXT4-fs (mmcblk0p22): Remounting filesystem read-only
[2023-10-13 02:51:08]  [60086.185218] EXT4-fs (mmcblk0p22): ext4_writepages: jbd2_start: 10240 pages, ino 16; err -30
[2023-10-13 02:51:08]  [60086.731357] EXT4-fs error (device mmcblk0p44) in ext4_da_write_end:3210: IO failure
[2023-10-13 02:51:09]  [60086.739386] EXT4-fs (mmcblk0p44): Delayed block allocation failed for inode 155757 at logical offset 438 with max blocks 25 with error 30
[2023-10-13 02:51:09]  [60086.739388] EXT4-fs (mmcblk0p44): This should not happen!! Data will be lost
[2023-10-13 02:51:09]  [60086.739388]
[2023-10-13 02:51:09]  [60086.739399] EXT4-fs error (device mmcblk0p44) in ext4_writepages:2918: Journal has aborted
[2023-10-13 02:51:09]  [60086.920057] EXT4-fs error (device mmcblk0p44): ext4_journal_check_start:61: Detected aborted journal
[2023-10-13 02:51:09]  [60086.931781] EXT4-fs (mmcblk0p44): Remounting filesystem read-only
[2023-10-13 02:51:09]  [60086.943848] EXT4-fs (mmcblk0p44): ext4_writepages: jbd2_start: 1024 pages, ino 24635; err -30
[2023-10-13 02:51:09]  [60089.823354] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
[2023-10-13 02:51:12]  [60089.823354]
[2023-10-13 02:51:12]  [60089.832522] CPU: 0 PID: 1 Comm: init Tainted: G           O    4.14.61-00012-g23e8b99bce8b-dirty #136
[2023-10-13 02:51:12]  [60089.841754] Hardware name: Semidrive kunlun x9 REF Board (DT)
[2023-10-13 02:51:12]  [60089.847510] Call trace:
[2023-10-13 02:51:12]  [60089.849983] [<ffff00000808a3cc>] dump_backtrace+0x0/0x3c0
[2023-10-13 02:51:12]  [60089.855400] [<ffff00000808a7a0>] show_stack+0x14/0x1c
[2023-10-13 02:51:12]  [60089.859789] snd_afe_dai_trigger:1085 -- cmd(0)stream(0)name(subdevice #0),cpu_dai name 30600000.i2s
[2023-10-13 02:51:12]  [60089.861223] snd_afe_dai_trigger:1113 -----i2s stop----------
[2023-10-13 02:51:12]  [60089.920459] Exception stack(0xffff00000a293ec0 to 0xffff00000a294000)
[2023-10-13 02:51:12]  [60089.926920] 3ec0: 0000000000000007 0000aaaae2d75358 0000000000000028 0000000000000180
[2023-10-13 02:51:12]  [60089.934768] 3ee0: 0000aaaae2d76067 0000ffff868b0508 00000000706d742e 00000000706d742e
[2023-10-13 02:51:12]  [60089.942603] 3f00: 0000ffffe744a8c0 0000000000000018 0000000000000000 0000ffffe744a850
[2023-10-13 02:51:12]  [60089.950434] 3f20: 0000ffffe744a7d0 0000ffffe744a808 0000000000000001 0000000000008000
[2023-10-13 02:51:12]  [60089.958264] 3f40: 0000ffff87976818 0000ffff8795b82c 0000ffff88056000 0000ffffe744aaa0
[2023-10-13 02:51:12]  [60089.966094] 3f60: 0000ffff8917c188 0000000000000001 0000ffffe744a8d8 0000ffffe744a8d0
[2023-10-13 02:51:12]  [60089.973924] 3f80: 000000000000001e 0000aaaae2e11388 0000aaaae2e111d8 0000aaaae2e111b0
[2023-10-13 02:51:12]  [60089.981753] 3fa0: 0000aaaae2e11200 0000ffffe744a840 0000ffff8906363c 0000ffffe74495f0
[2023-10-13 02:51:12]  [60089.989583] 3fc0: 0000aaaae2dcebe0 0000000020000000 00000000ffffff9c 00000000ffffffff
[2023-10-13 02:51:12]  [60089.997413] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[2023-10-13 02:51:12]  [60090.005244] [<ffff000008083964>] work_pending+0x8/0x10
[2023-10-13 02:51:12]  [60090.010388] SMP: stopping secondary CPUs
[2023-10-13 02:51:12]  [60090.014314] SMP: stopping secondary CPUs
[2023-10-13 02:51:12]  [60091.018237] SMP: failed to stop secondary CPUs 0-3
[2023-10-13 02:51:30]  [60107.853608] Kernel Offset: disabled
[2023-10-13 02:51:30]  [60107.857098] CPU features: 0x0802210
[2023-10-13 02:51:30]  [60107.860585] Memory Limit: none
[2023-10-13 02:51:30]  [60107.866736] flush all cache
[2023-10-13 02:51:30]  [60107.869595] flush all cache done
[2023-10-13 02:51:30]  [60107.872829] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
[2023-10-13 02:51:30]  [60107.872829]
[2023-10-13 02:51:30]  [60107.884731] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:30]  [60107.891134] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:30]  [60107.897530] mmcblk0: error -110 sending status command, aborting
[2023-10-13 02:51:30]  [60107.904353] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:30]  [60107.913487] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:30]  [60107.919932] mmc0: sdhci: Sys addr:  0x00000040 | Version:  0x00000005
[2023-10-13 02:51:30]  [60107.926381] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:30]  [60107.932831] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:30]  [60107.939271] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:30]  [60107.945711] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:30]  [60107.952150] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:30]  [60107.958590] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:30]  [60107.965029] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:30]  [60107.971468] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:30]  [60107.977908] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:30]  [60107.984353] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:30]  [60107.990792] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:30]  [60107.997231] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:30]  [60108.003670] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:30]  [60108.008113] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:30]  [60108.015247] mmc0: sdhci: ============================================
[2023-10-13 02:51:30]  [60108.021736] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:30]  [60108.028121] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:30]  [60108.037254] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:30]  [60108.043699] mmc0: sdhci: Sys addr:  0x00000040 | Version:  0x00000005
[2023-10-13 02:51:30]  [60108.050147] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:30]  [60108.056597] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:30]  [60108.063037] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:30]  [60108.069478] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:30]  [60108.075917] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:30]  [60108.082357] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:30]  [60108.088796] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:30]  [60108.095236] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:30]  [60108.101676] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:30]  [60108.108120] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:30]  [60108.114560] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:30]  [60108.120999] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:30]  [60108.127438] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:30]  [60108.131881] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:30]  [60108.139014] mmc0: sdhci: ============================================
[2023-10-13 02:51:30]  [60108.145494] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:30]  [60108.151880] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:30]  [60108.161013] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:30]  [60108.167458] mmc0: sdhci: Sys addr:  0x00000040 | Version:  0x00000005
[2023-10-13 02:51:30]  [60108.173908] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:30]  [60108.180356] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:30]  [60108.186796] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:30]  [60108.193236] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:30]  [60108.199675] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:30]  [60108.206114] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:30]  [60108.212554] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:30]  [60108.218994] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:30]  [60108.225433] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:30]  [60108.231877] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:30]  [60108.238317] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:30]  [60108.244756] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:30]  [60108.251195] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:30]  [60108.255638] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60108.262771] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60108.269251] mmcblk0: error -110 sending status command, aborting
[2023-10-13 02:51:31]  [60108.276055] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:31]  [60108.285187] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:31]  [60108.291631] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00000005
[2023-10-13 02:51:31]  [60108.298080] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:31]  [60108.304529] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:31]  [60108.310968] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:31]  [60108.317407] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:31]  [60108.323847] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:31]  [60108.330287] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:31]  [60108.336726] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:31]  [60108.343166] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:31]  [60108.349606] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:31]  [60108.356050] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:31]  [60108.362490] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:31]  [60108.368930] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:31]  [60108.375369] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:31]  [60108.379812] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60108.386945] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60108.393418] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:31]  [60108.399807] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:31]  [60108.408941] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:31]  [60108.415385] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00000005
[2023-10-13 02:51:31]  [60108.421834] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:31]  [60108.428282] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:31]  [60108.434722] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:31]  [60108.441162] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:31]  [60108.447600] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:31]  [60108.454040] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:31]  [60108.460479] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:31]  [60108.466919] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:31]  [60108.473358] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:31]  [60108.479802] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:31]  [60108.486242] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:31]  [60108.492682] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:31]  [60108.499121] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:31]  [60108.503564] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60108.510698] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60108.517175] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:31]  [60108.523607] mmcblk0: error -110 sending status command, aborting
[2023-10-13 02:51:31]  [60108.530402] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:31]  [60108.539535] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:31]  [60108.545979] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00000005
[2023-10-13 02:51:31]  [60108.552427] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:31]  [60108.558876] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:31]  [60108.565316] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:31]  [60108.571756] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:31]  [60108.578196] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:31]  [60108.584636] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:31]  [60108.591075] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:31]  [60108.597515] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:31]  [60108.603955] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:31]  [60108.610399] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:31]  [60108.616839] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:31]  [60108.623278] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:31]  [60108.629717] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:31]  [60108.634160] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60108.641293] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60108.647786] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:31]  [60108.654253] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:31]  [60108.663387] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:31]  [60108.669831] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00000005
[2023-10-13 02:51:31]  [60108.676280] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:31]  [60108.682729] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:31]  [60108.689169] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:31]  [60108.695608] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:31]  [60108.702048] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:31]  [60108.708488] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:31]  [60108.714928] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:31]  [60108.721367] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:31]  [60108.727807] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:31]  [60108.734251] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:31]  [60108.740691] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:31]  [60108.747131] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:31]  [60108.753570] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:31]  [60108.758013] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60108.765146] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60108.771628] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:31]  [60108.778023] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:31]  [60108.787155] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:31]  [60108.793599] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00000005
[2023-10-13 02:51:31]  [60108.800049] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:31]  [60108.806499] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:31]  [60108.812939] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:31]  [60108.819378] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:31]  [60108.825818] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:31]  [60108.832258] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:31]  [60108.838698] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:31]  [60108.845137] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:31]  [60108.851577] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:31]  [60108.858021] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:31]  [60108.864460] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:31]  [60108.870900] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:31]  [60108.877339] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:31]  [60108.881782] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60108.888915] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60108.895396] mmcblk0: error -110 sending status command, aborting
[2023-10-13 02:51:31]  [60108.902196] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:31]  [60108.911330] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:31]  [60108.917774] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00000005
[2023-10-13 02:51:31]  [60108.924224] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:31]  [60108.930674] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:31]  [60108.937113] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:31]  [60108.943553] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:31]  [60108.949992] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:31]  [60108.956432] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:31]  [60108.962871] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:31]  [60108.969311] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:31]  [60108.975751] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:31]  [60108.982195] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:31]  [60108.988635] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:31]  [60108.995074] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:31]  [60109.001513] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:31]  [60109.005956] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60109.013089] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60109.019575] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:31]  [60109.025960] mmc0: Got command interrupt 0x00000001 even though no command operation was in progress.
[2023-10-13 02:51:31]  [60109.035093] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[2023-10-13 02:51:31]  [60109.041538] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00000005
[2023-10-13 02:51:31]  [60109.047986] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000000
[2023-10-13 02:51:31]  [60109.054435] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000032
[2023-10-13 02:51:31]  [60109.060875] mmc0: sdhci: Present:   0x03f700f0 | Host ctl: 0x00000001
[2023-10-13 02:51:31]  [60109.067315] mmc0: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[2023-10-13 02:51:31]  [60109.073753] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[2023-10-13 02:51:31]  [60109.080194] mmc0: sdhci: Timeout:   0x00000005 | Int stat: 0x00000000
[2023-10-13 02:51:31]  [60109.086633] mmc0: sdhci: Int enab:  0x01ff1033 | Sig enab: 0x01ff1033
[2023-10-13 02:51:31]  [60109.093072] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[2023-10-13 02:51:31]  [60109.099512] mmc0: sdhci: Caps:      0x3d6dc881 | Caps_1:   0x08008077
[2023-10-13 02:51:31]  [60109.105955] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[2023-10-13 02:51:31]  [60109.112394] mmc0: sdhci: Resp[0]:   0x40ff8080 | Resp[1]:  0x00000000
[2023-10-13 02:51:31]  [60109.118834] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[2023-10-13 02:51:31]  [60109.125274] mmc0: sdhci: Host ctl2: 0x00001808
[2023-10-13 02:51:31]  [60109.129716] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000000000
[2023-10-13 02:51:31]  [60109.136849] mmc0: sdhci: ============================================
[2023-10-13 02:51:31]  [60109.143355] mmcblk0: error -110 sending status command, retrying
[2023-10-13 02:51:31]  [60109.149802] mmcblk0: error -110 sending status command, aborting
Comment 1 Artem S. Tashkinov 2023-10-13 22:44:48 UTC
Looks like your storage is faulty:

mmcblk0: error -110 sending status
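
The numeric codes in the log can be decoded with a short lookup. This is only a sketch for reading the log: the table is limited to three Linux generic errno values, and the helper name `describe_errno` is made up for illustration.

```python
# Linux generic errno values relevant to this log (hand-written table,
# so the result does not depend on the platform running the script).
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    30: ("EROFS", "Read-only file system"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def describe_errno(code: int) -> str:
    """Return 'NAME (description)' for a kernel-style error code.

    Kernel messages usually print errors as negative numbers, so the
    sign is dropped before the lookup.
    """
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{name} ({text})"


# -110 comes from the mmcblk0 lines, -30 from the ext4_writepages lines.
for code in (-110, -30):
    print(code, "->", describe_errno(code))
```

Read this way, `error -110` means the status command to the card timed out, and `err -30` means the write was refused because the filesystem had already been remounted read-only.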
Comment 2 Theodore Tso 2023-10-14 21:44:47 UTC
Also, note the panic message:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007

This indicates that the init process received a SIGBUS (signal number 7). Given the large number of mmc0 / sdhci errors, it's pretty clear that the storage device is *very* unhappy.
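
As a rough illustration of how the exitcode value maps to a signal, the sketch below splits a Linux wait status into its fields. The helper name `decode_wait_status` and the small signal table are assumptions made for this example; the signal numbers shown are the ones used on arm64 and x86 Linux and may differ on other architectures.

```python
def decode_wait_status(status: int) -> dict:
    """Split a Linux wait status into its conventional fields."""
    return {
        "signal": status & 0x7F,             # low 7 bits: terminating signal, 0 if none
        "core_dumped": bool(status & 0x80),  # bit 7: core dump flag
        "exit_code": (status >> 8) & 0xFF,   # next byte: value passed to exit()
    }


# Signal numbers on arm64/x86 Linux (hand-written subset, not exhaustive).
SIGNAL_NAMES = {7: "SIGBUS", 9: "SIGKILL", 11: "SIGSEGV"}

info = decode_wait_status(0x00000007)
print(info, SIGNAL_NAMES.get(info["signal"], "unknown"))
```

For `exitcode=0x00000007` the signal field is 7 and the other fields are zero, which is consistent with init being killed by SIGBUS rather than exiting on its own.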


The most common cause, as Artem has stated, is a hardware problem. It's possible that forcing a factory reset might work. If the SD card is removable, you could try reseating it or, if that doesn't work, replacing it. If the eMMC flash device is soldered onto the mainboard, then the probable solution is complete hardware replacement.
Comment 3 Gary 2023-10-16 08:25:59 UTC
(In reply to Theodore Tso from comment #2)
> Also, note the panic message:
> 
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> 
> This indicates that the init process received a SIGBUS (signal number 7).
> Given the large number of mmc0 / sdhci errors, it's pretty clear that
> the storage device is *very* unhappy.
> 
> 
> The most common cause, as Artem has stated, is a hardware problem.
> It's possible that forcing a factory reset might work.  If the SD card is
> removable, you could try reseating it or, if that doesn't work, replacing
> it.  If the eMMC flash device is soldered onto the mainboard, then the
> probable solution is complete hardware replacement.

Hi Theodore,

Thanks for your suggestion.

We ran the cpuburn and memtest tools for 7*24 hours and found no mmc issue, including in the storage part.

We need to use kernel 4.14. Could you kindly suggest a way to debug this?

Thanks,

BRs,
Gary
Comment 4 Gary 2023-10-16 08:27:01 UTC
(In reply to Artem S. Tashkinov from comment #1)
> Looks like your storage is faulty:
> 
> mmcblk0: error -110 sending status

Hi Artem S. Tashkinov,

Thanks for your suggestion.

It seems that it was not an mmc issue.

Thanks,

BRs,
Gary
Comment 5 Gary 2023-10-16 08:44:12 UTC
Created attachment 305227
fs/ext4 code
Comment 6 Theodore Tso 2023-10-16 15:41:52 UTC
Unfortunately, the 4.14 kernel was released in 2017, which is over six years ago. Most companies where you can pay $$$ to get support for Linux distributions based on 4.14 are EOL'ing products based on 4.14. As for upstream kernel developers, who are essentially volunteers when people ask them for free help: in general, they do not support LTS kernels, and certainly not an LTS kernel as old as 4.14.

If someone is willing to be the ext4 upstream stable backports maintainer, then that person might be willing to provide limited support for LTS kernels, but the 4.14 LTS upstream kernel is planned to be EOL'ed in January 2024, and I stopped running gce-xfstests on 4.14 LTS kernels about a year or so ago. I barely have time to run gce-xfstests on LTS kernels for 6.1, 5.15 and 5.10 every quarter or two, and if someone were to volunteer to become ext4 stable backports maintainer, I'd encourage them to focus on 6.6 and 6.1 LTS kernels, with 5.10 and 5.15 LTS kernels as a lower priority (because most commercial companies are going to be moving off of 5.10 LTS in the near future). But volunteer support for 4.14 LTS? To be honest, that's extremely unlikely.

*If* there is a company that has a misguided business reason to support the 4.14 LTS kernel, then of course an employee of that company can certainly fund an engineer to do all of the support that they need. But quite frankly, I'd be encouraging that company to rethink their business case for supporting the 4.14 kernel. It would probably be far more cost effective to migrate their customers to a non-pre-historic kernel such as the 6.6 LTS kernel.
Comment 7 Gary 2023-10-17 01:41:21 UTC
(In reply to Theodore Tso from comment #6)
> Unfortunately, the 4.14 kernel was released in 2017, which is over six years
> ago.  Most companies where you can pay $$$ to get support for Linux
> distributions based on 4.14 are EOL'ing products based on 4.14.  As for
> upstream kernel developers, who are essentially volunteers when people ask
> them for free help: in general, they do not support LTS kernels, and
> certainly not an LTS kernel as old as 4.14.
> 
> If someone is willing to be the ext4 upstream stable backports maintainer,
> then that person might be willing to provide limited support for LTS
> kernels, but the 4.14 LTS upstream kernel is planned to be EOL'ed in
> January 2024, and I stopped running gce-xfstests on 4.14 LTS kernels
> about a year or so ago.  I barely have time to run gce-xfstests on LTS
> kernels for 6.1, 5.15 and 5.10 every quarter or two, and if someone were to
> volunteer to become ext4 stable backports maintainer, I'd encourage them to
> focus on 6.6 and 6.1 LTS kernels, with 5.10 and 5.15 LTS kernels as a lower
> priority (because most commercial companies are going to be moving off of
> 5.10 LTS in the near future).  But volunteer support for 4.14 LTS?  To be
> honest, that's extremely unlikely.
> 
> *If* there is a company that has a misguided business reason to support the
> 4.14 LTS kernel, then of course an employee of that company can certainly
> fund an engineer to do all of the support that they need.  But quite
> frankly, I'd be encouraging that company to rethink their business case for
> supporting the 4.14 kernel.  It would probably be far more cost effective
> to migrate their customers to a non-pre-historic kernel such as the 6.6 LTS
> kernel.

Thanks for your reply.

We will try to debug this issue. I think we should focus on the information below. The eMMC errors are probably a side effect.

[2023-10-13 02:51:08]  [60086.731357] EXT4-fs error (device mmcblk0p44) in ext4_da_write_end:3210: IO failure
[2023-10-13 02:51:09]  [60086.739386] EXT4-fs (mmcblk0p44): Delayed block allocation failed for inode 155757 at logical offset 438 with max blocks 25 with error 30
[2023-10-13 02:51:09]  [60086.739388] EXT4-fs (mmcblk0p44): This should not happen!! Data will be lost
[2023-10-13 02:51:09]  [60086.739388]
[2023-10-13 02:51:09]  [60086.739399] EXT4-fs error (device mmcblk0p44) in ext4_writepages:2918: Journal has aborted
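
One way to check which partition failed first is to order the EXT4-fs messages by their kernel timestamp. The sketch below is a hypothetical helper (the regular expression and the name `first_event_per_device` are assumptions written for this log format), fed with a few lines copied from the console output in the description.

```python
import re

# Matches the kernel timestamp, the device name and the rest of an EXT4-fs
# message, with or without " error" and the "device " keyword.
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+EXT4-fs(?: error)? \((?:device )?(\w+)\):?\s*(.*)"
)


def first_event_per_device(lines):
    """Map each device to the (timestamp, message) of its earliest EXT4-fs line."""
    first = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an EXT4-fs message with a device name
        ts, dev, msg = float(match.group(1)), match.group(2), match.group(3)
        if dev not in first or ts < first[dev][0]:
            first[dev] = (ts, msg)
    return first


SAMPLE = [
    "[2023-10-13 02:51:08]  [60086.731357] EXT4-fs error (device mmcblk0p44) in ext4_da_write_end:3210: IO failure",
    "[2023-10-13 02:51:09]  [60086.739399] EXT4-fs error (device mmcblk0p44) in ext4_writepages:2918: Journal has aborted",
    "console:/ $ [60086.159230] EXT4-fs error (device mmcblk0p22): ext4_journal_check_start:61: Detected aborted journal",
    "[2023-10-13 02:51:08]  [60086.171309] EXT4-fs (mmcblk0p22): Remounting filesystem read-only",
]

events = first_event_per_device(SAMPLE)
for dev, (ts, msg) in sorted(events.items(), key=lambda item: item[1][0]):
    print(f"{ts:.6f} {dev}: {msg}")
```

On these sample lines the earliest message belongs to mmcblk0p22 (the log partition), about half a second before the first mmcblk0p44 error, so both partitions on the same eMMC device were affected within the same second.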
