Bug 206625

Summary: ubifs: tnc mismatch data|inode node caused by recovery
Product: File System
Reporter: Zhihao Cheng (chengzhihao1)
Component: Other
Assignee: fs_other
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.6.0-rc2
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  bad flash with ecc error
  set up ubifs
  simulate_ecc_corrupt
  flash.tar.gz
  reproduce patch

Description Zhihao Cheng 2020-02-21 13:14:24 UTC

    
Comment 1 Zhihao Cheng 2020-02-21 13:27:08 UTC
1. Decompress flash_bad

2. ./setup.sh  # You may see some ECC errors, but UBIFS ignores them
[  124.560001] __nand_correct_data: uncorrectable ECC error
[  124.561082] ubi0 warning: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read only 81920 bytes, retry
[  124.563293] __nand_correct_data: uncorrectable ECC error
[  124.564223] ubi0 warning: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read only 81920 bytes, retry
[  124.566420] __nand_correct_data: uncorrectable ECC error
[  124.567387] ubi0 warning: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read only 81920 bytes, retry
[  124.569406] __nand_correct_data: uncorrectable ECC error
[  124.570312] ubi0 error: ubi_io_read [ubi]: error -74 (ECC error) while reading 81920 bytes from PEB 1651:49152, read 81920 bytes

# mount succeeds
3. du -h temp
[  234.254701] UBIFS error (ubi0:0 pid 1719): ubifs_read_node [ubifs]: bad node type (255 but expected 2)
[  234.258970] UBIFS error (ubi0:0 pid 1719): ubifs_read_node [ubifs]: bad node at LEB 61:51200, LEB mapping status 0
[  234.261324] Not a node, first 24 bytes:
[  234.261330] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff                          ........................
[  234.264978] CPU: 4 PID: 1719 Comm: du Not tainted 5.6.0-rc2 #131
[  234.266276] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[  234.268682] Call Trace:
[  234.269238]  dump_stack+0x98/0xca
[  234.269986]  ubifs_read_node+0x2f7/0x3d0 [ubifs]
[  234.270989]  ? stack_trace_save+0x53/0x80
[  234.271896]  ubifs_tnc_read_node+0x1e7/0x2b0 [ubifs]
[  234.272976]  ? kmemleak_alloc+0x79/0xe0
[  234.273831]  tnc_read_hashed_node+0x1bc/0x2a0 [ubifs]
[  234.274946]  ubifs_tnc_next_ent+0x370/0x3d0 [ubifs]
[  234.276011]  ? put_object+0x5c/0xa0
[  234.276770]  ? __delete_object+0x55/0xa0
[  234.277619]  ? filldir+0x5e/0x350
[  234.278360]  ubifs_readdir+0x1f4/0x6a0 [ubifs]
[  234.279330]  iterate_dir+0x163/0x240
[  234.280107]  __x64_sys_getdents+0xdc/0x210
[  234.280993]  ? filldir64+0x350/0x350
[  234.281772]  ? do_syscall_64+0xbf/0x440
[  234.282603]  do_syscall_64+0xbf/0x440
[  234.283413]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  234.284500] RIP: 0033:0x7fad1461433b
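The "first 24 bytes" dumped above correspond to the UBIFS common node header (struct ubifs_ch). The sketch below decodes such a header, assuming the on-flash layout as we understand it from fs/ubifs/ubifs-media.h: little-endian fields, magic 0x06101831 at byte 0, CRC at byte 4, node type at byte 20. It shows why a region that reads back as all 0xFF (erased flash) yields "bad node type (255 but expected 2)" and no valid magic; type 2 is a directory entry node, which ubifs_readdir is looking up here. The sketch_* names are hypothetical and not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CH_SZ      24           /* size of the UBIFS common header */
#define SKETCH_NODE_MAGIC 0x06101831u  /* UBIFS node magic, as we understand it */
#define SKETCH_DENT_NODE  2            /* directory entry node type */

/* Decoded fields of the common header (trailing padding omitted). */
struct sketch_ch {
	uint32_t magic;
	uint32_t crc;
	uint64_t sqnum;
	uint32_t len;
	uint8_t node_type;
	uint8_t group_type;
};

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a 24-byte header; on-flash fields are little-endian. */
static void parse_ch(const uint8_t *buf, struct sketch_ch *ch)
{
	assert(buf != NULL && ch != NULL);
	ch->magic = get_le32(buf);
	ch->crc = get_le32(buf + 4);
	ch->sqnum = (uint64_t)get_le32(buf + 8) | (uint64_t)get_le32(buf + 12) << 32;
	ch->len = get_le32(buf + 16);
	ch->node_type = buf[20];
	ch->group_type = buf[21];
}

/* Returns 1 if every byte is 0xFF, i.e. the region reads like erased flash. */
static int all_ff(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return 0;
	return 1;
}
```

Decoding the 24 0xFF bytes from the log gives node_type 255 and magic 0xFFFFFFFF, so the index entry for LEB 61:51200 points at space that holds no node at all.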
Comment 2 Zhihao Cheng 2020-02-21 13:28:46 UTC
Created attachment 287535
bad flash with ecc error
Comment 3 Zhihao Cheng 2020-02-21 13:29:06 UTC
Created attachment 287537
set up ubifs
Comment 4 Zhihao Cheng 2020-02-21 13:46:34 UTC
How can the bad flash with the ECC error be simulated?

1. We take a good flash and mount UBIFS on it (see flash.tar.gz in the attachments).
This flash was dumped while the following conditions held (such a flash can easily be created by random writes to UBIFS with the reproduce patch from the attachments applied):
  A. Valid inode nodes have been moved to GC LEB 42, and the old LEB has been unmapped.
  B. LEB 42 ends with inode nodes.
  C. LEB 42 is the last GC-type bud node in the log.
The following shows the nodes replayed from LEB 42:
[   51.189352] replay 42:45056 0======  1651
[   51.190981] inum 3393, type 0 42:45056 sqnum 42059
[   51.191774] inum 3394, type 0 42:45216 sqnum 42069
...
[   51.224522] inum 3370, type 0 42:54688 sqnum 41726
[   51.225328] inum 3371, type 0 42:54848 sqnum 41736
[   51.226140] inum 3372, type 0 42:55008 sqnum 41746
[   51.226929] inum 3373, type 0 42:55168 sqnum 41756
[   51.227732] inum 3374, type 0 42:55328 sqnum 41766
[   51.228556] inum 3375, type 0 42:55488 sqnum 41776
[   51.229366] inum 3376, type 0 42:55648 sqnum 41786
[   51.230347] ----------

2. Use simulate_ecc_corrupt (see simulate_ecc_corrupt.c in the attachments) to corrupt the CRC of inode node 3374, which produces flash_bad:
   ./simulate_ecc_corrupt
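The replay offsets above can be cross-checked, and the corruption step sketched, in C. The inode nodes in LEB 42 sit 160 bytes apart, which matches the size of an inode node without inline data as we understand UBIFS_INO_NODE_SZ, so inode 3374 is at LEB 42:55328, the same offset recovery later reports as corrupted. The image layout is an assumption: a raw dump of consecutive 128 KiB PEBs with LEB data starting 4096 bytes into each PEB, inferred from the 81920-byte read at PEB 1651:49152 in Comment 1 (which would cover LEB 42 from offset 45056 to the end of the PEB). All names are hypothetical, and this is not the attached simulate_ecc_corrupt.c; in particular it does not reproduce the ECC errors, which depend on how the NAND simulator stores OOB/ECC data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry, inferred from the logs (not confirmed by the reporter). */
#define SKETCH_PEB_SIZE  131072u  /* 128 KiB physical eraseblock */
#define SKETCH_LEB_START 4096u    /* EC + VID headers before LEB data */
#define SKETCH_LEB_SIZE  (SKETCH_PEB_SIZE - SKETCH_LEB_START)
#define SKETCH_CRC_OFFS  4u       /* crc follows the 4-byte magic in the header */

/* Tail of the replay log for LEB 42, copied from the output above. */
struct replay_ent {
	int inum;
	uint32_t offs;
};

static const struct replay_ent leb42_tail[] = {
	{3370, 54688}, {3371, 54848}, {3372, 55008}, {3373, 55168},
	{3374, 55328}, {3375, 55488}, {3376, 55648},
};
#define LEB42_TAIL_LEN (sizeof(leb42_tail) / sizeof(leb42_tail[0]))

/* Distance between consecutive nodes, or 0 if it is not constant. */
static uint32_t common_stride(const struct replay_ent *e, size_t n)
{
	uint32_t stride;

	if (n < 2)
		return 0;
	stride = e[1].offs - e[0].offs;
	for (size_t i = 2; i < n; i++)
		if (e[i].offs - e[i - 1].offs != stride)
			return 0;
	return stride;
}

/* LEB offset of the node for inum, or UINT32_MAX if it is not listed. */
static uint32_t find_offs(const struct replay_ent *e, size_t n, int inum)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].inum == inum)
			return e[i].offs;
	return UINT32_MAX;
}

/* Byte offset in a raw image of consecutive PEBs (no OOB data assumed). */
static uint64_t image_offs(uint32_t pnum, uint32_t leb_offs)
{
	assert(leb_offs < SKETCH_LEB_SIZE);
	return (uint64_t)pnum * SKETCH_PEB_SIZE + SKETCH_LEB_START + leb_offs;
}

/* Invert the 4 CRC bytes of the node at node_offs in buf; -1 if out of range. */
static int corrupt_node_crc(uint8_t *buf, size_t buf_len, size_t node_offs)
{
	if (node_offs > buf_len || buf_len - node_offs < SKETCH_CRC_OFFS + 4)
		return -1;
	for (size_t i = 0; i < 4; i++)
		buf[node_offs + SKETCH_CRC_OFFS + i] ^= 0xFF;
	return 0;
}
```

To apply this to an image file, one would pread() the node at image_offs(1651, find_offs(leb42_tail, LEB42_TAIL_LEN, 3374)), call corrupt_node_crc() on it, and pwrite() it back, under the layout assumptions stated above.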
Comment 5 Zhihao Cheng 2020-02-21 13:47:05 UTC
Created attachment 287539
simulate_ecc_corrupt
Comment 6 Zhihao Cheng 2020-02-21 13:47:26 UTC
Created attachment 287541
flash.tar.gz
Comment 7 Zhihao Cheng 2020-02-21 13:48:09 UTC
Created attachment 287543
reproduce patch
Comment 8 Zhihao Cheng 2020-02-21 13:54:55 UTC
After applying the fix patch, mount returns a failure instead of letting the inode nodes be discarded. We can then try other methods to recover UBIFS; at the very least, the important data still exist on the flash.
1. ./setup.sh
mount: /root/temp: mount(2) system call failed: Structure needs cleaning.
[   39.360386] UBIFS error (ubi0:0 pid 7432): ubifs_scanned_corruption [ubifs]: corruption at LEB 42:55328
[   39.361942] UBIFS error (ubi0:0 pid 7432): ubifs_scanned_corruption [ubifs]: first 8192 bytes from LEB 42:55328
[   39.364157] UBIFS error (ubi0:0 pid 7432): ubifs_recover_leb [ubifs]: LEB 42 scanning failed, ecc error detected!
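The post-fix behaviour described in this comment can be modelled as a small decision function. This is a deliberately simplified sketch, not the fix patch (which is not included in this report): it assumes recovery otherwise treats a corrupted tail of the last bud as an interrupted write and cuts it off, and that an uncorrectable ECC error surfaces as -EBADMSG (-74, as in the log). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117  /* Linux value, reported as "Structure needs cleaning" */
#endif

/* Hypothetical summary of scanning the tail of the last bud during recovery. */
struct scan_result {
	int read_err;      /* 0, or a negative errno from reading the LEB */
	bool corrupted;    /* scanning stopped at data that is not a valid node */
	int corrupt_offs;  /* LEB offset where scanning stopped */
};

/*
 * Returns 0 if recovery may continue, storing in *cut_at the LEB offset from
 * which data would be discarded (-1 for none), or -EUCLEAN to fail the mount.
 */
static int recovery_policy(const struct scan_result *res, int *cut_at)
{
	assert(res != NULL && cut_at != NULL);
	*cut_at = -1;
	if (res->read_err == -EBADMSG)
		return -EUCLEAN;  /* ECC error: valid nodes may lie past this point */
	if (res->corrupted)
		*cut_at = res->corrupt_offs;  /* treat as an interrupted write */
	return 0;
}
```

The design point is that an ECC error makes the "interrupted write" interpretation unsafe: as in condition A above, the nodes after the failing page may be the only remaining copies, because GC already unmapped the old LEB.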