Bug 206625 - ubifs: tnc mismatch data|inode node caused by recovery
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-21 13:14 UTC by Zhihao Cheng
Modified: 2020-02-21 13:54 UTC (History)
0 users

See Also:
Kernel Version: 5.6.0-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
bad flash with ecc error (4.73 MB, application/gzip)
2020-02-21 13:28 UTC, Zhihao Cheng
set up ubifs (621 bytes, application/x-shellscript)
2020-02-21 13:29 UTC, Zhihao Cheng
simulate_ecc_corrupt (911 bytes, text/x-csrc)
2020-02-21 13:47 UTC, Zhihao Cheng
flash.tar.gz (4.73 MB, application/gzip)
2020-02-21 13:47 UTC, Zhihao Cheng
reproduce patch (3.44 KB, patch)
2020-02-21 13:48 UTC, Zhihao Cheng

Description Zhihao Cheng 2020-02-21 13:14:24 UTC

    
Comment 1 Zhihao Cheng 2020-02-21 13:27:08 UTC
1. Decompress flash_bad

2. ./setup.sh # You may see some ECC errors, but UBIFS ignores them
[  124.560001] __nand_correct_data: uncorrectable ECC error
[  124.561082] ubi0 warning: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read only 81920 bytes, retry
[  124.563293] __nand_correct_data: uncorrectable ECC error
[  124.564223] ubi0 warning: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read only 81920 bytes, retry
[  124.566420] __nand_correct_data: uncorrectable ECC error
[  124.567387] ubi0 warning: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read only 81920 bytes, retry
[  124.569406] __nand_correct_data: uncorrectable ECC error
[  124.570312] ubi0 error: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read 81920 bytes

# mount succeeds
3. du -h temp
[  234.254701] UBIFS error (ubi0:0 pid 1719): ubifs_read_node [ubifs]: bad node type (255 but expected 2)
[  234.258970] UBIFS error (ubi0:0 pid 1719): ubifs_read_node [ubifs]: bad node at LEB 61:51200, LEB mapping status 0
[  234.261324] Not a node, first 24 bytes:
[  234.261330] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff                          ........................
[  234.264978] CPU: 4 PID: 1719 Comm: du Not tainted 5.6.0-rc2 #131
[  234.266276] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[  234.268682] Call Trace:
[  234.269238]  dump_stack+0x98/0xca
[  234.269986]  ubifs_read_node+0x2f7/0x3d0 [ubifs]
[  234.270989]  ? stack_trace_save+0x53/0x80
[  234.271896]  ubifs_tnc_read_node+0x1e7/0x2b0 [ubifs]
[  234.272976]  ? kmemleak_alloc+0x79/0xe0
[  234.273831]  tnc_read_hashed_node+0x1bc/0x2a0 [ubifs]
[  234.274946]  ubifs_tnc_next_ent+0x370/0x3d0 [ubifs]
[  234.276011]  ? put_object+0x5c/0xa0
[  234.276770]  ? __delete_object+0x55/0xa0
[  234.277619]  ? filldir+0x5e/0x350
[  234.278360]  ubifs_readdir+0x1f4/0x6a0 [ubifs]
[  234.279330]  iterate_dir+0x163/0x240
[  234.280107]  __x64_sys_getdents+0xdc/0x210
[  234.280993]  ? filldir64+0x350/0x350
[  234.281772]  ? do_syscall_64+0xbf/0x440
[  234.282603]  do_syscall_64+0xbf/0x440
[  234.283413]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  234.284500] RIP: 0033:0x7fad1461433b
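A small sketch of why the dump above is reported as "Not a node" with type 255. It decodes the 24-byte UBIFS common header from an all-0xff buffer, which is what an unmapped LEB reads as. The magic value, field layout and node type number are taken from my understanding of the UBIFS on-flash format, and the helper names (parse_common_header, check_node) are made up for illustration; this is not the kernel's code.

```python
import struct
from collections import namedtuple

# Constants as I understand them from the UBIFS on-flash format (assumption).
UBIFS_NODE_MAGIC = 0x06101831  # first 4 bytes of every node
UBIFS_CH_SZ = 24               # common header size in bytes
UBIFS_DENT_NODE = 2            # directory entry node type

CommonHeader = namedtuple(
    "CommonHeader", "magic crc sqnum length node_type group_type")


def parse_common_header(buf):
    """Decode the common header; all fields are little-endian."""
    if len(buf) < UBIFS_CH_SZ:
        raise ValueError("need at least %d bytes" % UBIFS_CH_SZ)
    # magic, crc, sqnum, len, node_type, group_type, then 2 padding bytes
    return CommonHeader(*struct.unpack_from("<IIQIBB2x", buf, 0))


def check_node(buf, expected_type):
    """Return a short verdict string for the header in buf (hypothetical helper)."""
    ch = parse_common_header(buf)
    if ch.magic != UBIFS_NODE_MAGIC:
        return "not a node: magic 0x%08x, type %d (expected %d)" % (
            ch.magic, ch.node_type, expected_type)
    if ch.node_type != expected_type:
        return "bad node type (%d but expected %d)" % (
            ch.node_type, expected_type)
    return "ok"


if __name__ == "__main__":
    erased = b"\xff" * UBIFS_CH_SZ  # the 24 bytes dumped in the log above
    print(check_node(erased, UBIFS_DENT_NODE))
```

Since every byte is 0xff, the one-byte node_type field reads as 255, which matches "bad node type (255 but expected 2)".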
Comment 2 Zhihao Cheng 2020-02-21 13:28:46 UTC
Created attachment 287535
bad flash with ecc error
Comment 3 Zhihao Cheng 2020-02-21 13:29:06 UTC
Created attachment 287537
set up ubifs
Comment 4 Zhihao Cheng 2020-02-21 13:46:34 UTC
How to simulate the bad flash with an ECC error:

1. Start from a good flash with UBIFS mounted on it (see flash.tar.gz in the attachments).
This flash was dumped in a state that satisfies the following conditions (easy to reach by writing randomly to UBIFS with the reproduce patch from the attachments applied):
  A. Valid inode nodes have been moved to GC LEB 42, and the old LEB has been unmapped.
  B. LEB 42 ends with inode nodes.
  C. LEB 42 is the last bud of GC type in the log.
The nodes replayed from LEB 42 are listed below:
[   51.189352] replay 42:45056 0======  1651
[   51.190981] inum 3393, type 0 42:45056 sqnum 42059
[   51.191774] inum 3394, type 0 42:45216 sqnum 42069
...
[   51.224522] inum 3370, type 0 42:54688 sqnum 41726
[   51.225328] inum 3371, type 0 42:54848 sqnum 41736
[   51.226140] inum 3372, type 0 42:55008 sqnum 41746
[   51.226929] inum 3373, type 0 42:55168 sqnum 41756
[   51.227732] inum 3374, type 0 42:55328 sqnum 41766
[   51.228556] inum 3375, type 0 42:55488 sqnum 41776
[   51.229366] inum 3376, type 0 42:55648 sqnum 41786
[   51.230347] ----------

2. Use simulate_ecc_corrupt (see simulate_ecc_corrupt.c in the attachments) to corrupt the CRC of inode node 3374. This produces flash_bad.
   ./simulate_ecc_corrupt
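
To make the link between the replay listing and the corrupted node explicit, here is a rough sketch that parses the "inum ..., type ... LEB:offs sqnum ..." lines quoted above (the lines hidden behind "..." are left out) and looks up where inode 3374 sits in LEB 42. The function names are hypothetical; the log lines are copied from this comment.

```python
import re

# Replay lines copied from the listing above; the elided "..." part is omitted.
REPLAY_LOG = """\
[   51.190981] inum 3393, type 0 42:45056 sqnum 42059
[   51.191774] inum 3394, type 0 42:45216 sqnum 42069
[   51.224522] inum 3370, type 0 42:54688 sqnum 41726
[   51.225328] inum 3371, type 0 42:54848 sqnum 41736
[   51.226140] inum 3372, type 0 42:55008 sqnum 41746
[   51.226929] inum 3373, type 0 42:55168 sqnum 41756
[   51.227732] inum 3374, type 0 42:55328 sqnum 41766
[   51.228556] inum 3375, type 0 42:55488 sqnum 41776
[   51.229366] inum 3376, type 0 42:55648 sqnum 41786
"""

LINE_RE = re.compile(r"inum (\d+), type (\d+) (\d+):(\d+) sqnum (\d+)")


def parse_replay(text):
    """Return one dict per replay line, with integer fields."""
    keys = ("inum", "type", "lnum", "offs", "sqnum")
    return [dict(zip(keys, map(int, m.groups())))
            for m in LINE_RE.finditer(text)]


def nodes_from(entries, inum):
    """Entries located at or after the node of the given inode, by offset."""
    start = next(e["offs"] for e in entries if e["inum"] == inum)
    return [e for e in entries if e["offs"] >= start]


if __name__ == "__main__":
    entries = parse_replay(REPLAY_LOG)
    for e in nodes_from(entries, 3374):
        print("inum %d at %d:%d" % (e["inum"], e["lnum"], e["offs"]))
```

Inode 3374 sits at 42:55328, the same offset that ubifs_scanned_corruption reports in comment 8, and inodes 3375 and 3376 follow it in the same LEB.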
Comment 5 Zhihao Cheng 2020-02-21 13:47:05 UTC
Created attachment 287539
simulate_ecc_corrupt
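
The attachment itself is not reproduced here. As a rough illustration of the idea in step 2 of comment 4 only, the sketch below builds a dummy node in memory, checks its CRC, flips one bit in the stored CRC field and checks again. It does not touch a real flash image and does not model NAND ECC or OOB data. The CRC convention (CRC32 seeded with 0xFFFFFFFF, no final inversion, computed over bytes 8..len) is my understanding of UBIFS, and make_node, crc_ok and corrupt_crc are hypothetical helpers.

```python
import struct
import zlib

UBIFS_NODE_MAGIC = 0x06101831  # assumption: UBIFS node magic
UBIFS_CH_SZ = 24               # assumption: common header size


def ubifs_crc(node):
    """CRC over everything after the magic and crc fields (bytes 8..len).

    zlib.crc32 inverts its result at the end; XOR-ing undoes that so the
    value matches a CRC32 seeded with 0xFFFFFFFF and no final inversion.
    """
    return zlib.crc32(bytes(node[8:])) ^ 0xFFFFFFFF


def make_node(node_type, sqnum, payload):
    """Build a dummy node: common header plus an opaque payload."""
    length = UBIFS_CH_SZ + len(payload)
    body = struct.pack("<QIBB2x", sqnum, length, node_type, 0) + payload
    crc = zlib.crc32(body) ^ 0xFFFFFFFF
    return bytearray(struct.pack("<II", UBIFS_NODE_MAGIC, crc) + body)


def crc_ok(image, offs):
    """True if the node at offs has the right magic and a matching CRC."""
    magic, crc, _sqnum, length = struct.unpack_from("<IIQI", image, offs)
    if magic != UBIFS_NODE_MAGIC:
        return False
    return crc == ubifs_crc(image[offs:offs + length])


def corrupt_crc(image, offs):
    """Flip one bit of the stored CRC field of the node at offs, in place."""
    image[offs + 4] ^= 0x01


if __name__ == "__main__":
    # 160-byte dummy node, type 0 (inode node); the payload is just zeros.
    leb = make_node(0, 41766, b"\x00" * 136)
    print("before:", crc_ok(leb, 0))
    corrupt_crc(leb, 0)
    print("after: ", crc_ok(leb, 0))
```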
Comment 6 Zhihao Cheng 2020-02-21 13:47:26 UTC
Created attachment 287541
flash.tar.gz
Comment 7 Zhihao Cheng 2020-02-21 13:48:09 UTC
Created attachment 287543
reproduce patch
Comment 8 Zhihao Cheng 2020-02-21 13:54:55 UTC
After applying the fix patch, mount returns failure, which prevents the inode nodes from being discarded. Other methods can then be tried to recover UBIFS; at least the important data still exists on the flash.
1. ./setup.sh
mount: /root/temp: mount(2) system call failed: Structure needs cleaning.
[   39.360386] UBIFS error (ubi0:0 pid 7432): ubifs_scanned_corruption [ubifs]: corruption at LEB 42:55328
[   39.361942] UBIFS error (ubi0:0 pid 7432): ubifs_scanned_corruption [ubifs]: first 8192 bytes from LEB 42:55328
[   39.364157] UBIFS error (ubi0:0 pid 7432): ubifs_recover_leb [ubifs]: LEB 42 scanning failed, ecc error detected!
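
A toy model of the behaviour change described in this comment, as I read it: without the fix, recovery keeps only the nodes before the corrupted offset and silently drops the rest; with the fix, an ECC error during the scan makes the mount fail instead. This is not the kernel code and not the fix patch. recover_leb and its parameters are invented for illustration, and the mapping of "Structure needs cleaning" to EUCLEAN (117 on Linux) is my assumption.

```python
import errno

# "Structure needs cleaning" is the message for EUCLEAN (assumed 117 on Linux).
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def recover_leb(nodes, corrupt_offs, ecc_error, fail_on_ecc):
    """Toy model of recovering one LEB.

    nodes        -- list of (offs, inum) tuples found in the LEB
    corrupt_offs -- offset of the first unreadable node, or None
    ecc_error    -- True if the corruption came with an ECC error
    fail_on_ecc  -- True models the behaviour with the fix applied
    """
    if corrupt_offs is None:
        return list(nodes)
    if ecc_error and fail_on_ecc:
        # With the fix: refuse to continue, so nothing is discarded.
        raise OSError(EUCLEAN, "LEB scanning failed, ecc error detected")
    # Without the fix: everything from the corruption onwards is dropped.
    return [n for n in nodes if n[0] < corrupt_offs]


if __name__ == "__main__":
    leb42 = [(55008, 3372), (55168, 3373), (55328, 3374),
             (55488, 3375), (55648, 3376)]
    print("without fix:", recover_leb(leb42, 55328, True, False))
    try:
        recover_leb(leb42, 55328, True, True)
    except OSError as e:
        print("with fix: mount fails, errno", e.errno)
```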
