Bug 201639

Summary: ext4 and btrfs filesystem corruption
Product: IO/Storage
Reporter: richts
Component: Serial ATA
Assignee: Tejun Heo (tj)
Status: RESOLVED UNREPRODUCIBLE
Severity: normal
CC: dsterba, henrique.rodrigues
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.19.1
Subsystem:
Regression: No
Bisected commit-id:

Description richts 2018-11-08 17:52:46 UTC
I recently installed the 4.19.1 mainline kernel from the Ubuntu kernel team on my Kubuntu 18.10 installation on two systems. Both ran 4.19 before that.

After the upgrade I experienced several filesystem corruptions that I had to fix with a live system (actually, I was able to fix the laptop's ext4 from busybox).

One of the systems is a 9-year-old Dell laptop (Studio 1557) with the root filesystem (ext4) on an SSD. The other machine is a Dell workstation (about 3 years old), whose root filesystem (btrfs) is also on an SSD. The workstation uses UEFI boot.

I have been using the mainline kernels for a long time now and never had any issues. I still believe that I didn't have any problems with 4.19, but with 4.19.1 they appear quickly after login.

Just to mention, both filesystems switched to read-only mode when they detected the issues. When using the same systems with the distro's default kernel (some 4.18 right now), all the problems are gone.
Comment 1 richts 2018-11-09 05:23:51 UTC
The kernel.log.1 of my Dell workstation shows entries like:

Nov  5 09:01:42 TAG0094487167 kernel: [ 1257.844137] BTRFS error (device sda4): bad tree block start, want 14247926824960 have 18374686483949813760
Nov  5 09:01:43 TAG0094487167 kernel: [ 1258.612421] BTRFS error (device sda4): bad tree block start, want 14247586365440 have 2537442050321239272
Nov  5 09:23:43 TAG0094487167 kernel: [ 2578.735971] BTRFS error (device sda4): bad tree block start, want 14249406283776 have 1767

and

Nov  5 21:18:21 TAG0094487167 kernel: [45456.528604] BTRFS error (device sda4): bad tree block start, want 12038601654272 have 0
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537903] INFO: task journal-offline:29867 blocked for more than 120 seconds.
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537907]       Not tainted 4.19.1-041901-generic #201811041431
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537911] journal-offline D    0 29867      1 0x00000120
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537914] Call Trace:
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537921]  __schedule+0x29e/0x840
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537924]  schedule+0x2c/0x80
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537962]  wait_for_commit+0x5e/0x90 [btrfs]
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537965]  ? wait_woken+0x80/0x80
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537984]  btrfs_commit_transaction+0x6ca/0x870 [btrfs]
Nov  5 21:24:01 TAG0094487167 kernel: [45796.537988]  ? dput+0xe/0x10
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538009]  ? btrfs_log_dentry_safe+0x61/0x80 [btrfs]
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538031]  btrfs_sync_file+0x36e/0x3c0 [btrfs]
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538034]  vfs_fsync_range+0x48/0x80
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538036]  ? __fget_light+0x54/0x60
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538038]  do_fsync+0x3d/0x70
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538040]  __x64_sys_fsync+0x14/0x20
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538043]  do_syscall_64+0x5a/0x110
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538047] RIP: 0033:0x7fee80dc2197
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538052] Code: Bad RIP value.
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538053] RSP: 002b:00007fee7da1ccb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538055] RAX: ffffffffffffffda RBX: 0000000000000044 RCX: 00007fee80dc2197
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538056] RDX: 0000000000000000 RSI: 00007fee80bbd9c6 RDI: 0000000000000044
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538057] RBP: 00007fee80bbeec0 R08: 00007fee7da1d700 R09: 00007fee7da1d700
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538058] R10: 0000000000000006 R11: 0000000000000293 R12: 0000000000000002
Nov  5 21:24:01 TAG0094487167 kernel: [45796.538059] R13: 00007ffe6bb30d7f R14: 0000000000000000 R15: 00007fee7da1cdc0
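
For anyone triaging similar logs, the "bad tree block start" entries above can be pulled out and compared programmatically. The following is only a minimal sketch: the regular expression is derived from the message text in this excerpt, and the names `BadTreeBlock` and `parse_line` are made up for illustration and are not part of any btrfs tool. Printing the "have" value in hex can make patterns in the unexpected data easier to spot.

```python
import re
from typing import NamedTuple, Optional

# Pattern based solely on the message format seen in the log excerpt above
# (an assumption; other kernel versions may word the message differently).
PATTERN = re.compile(
    r"BTRFS error \(device (?P<device>[^)]+)\): bad tree block start, "
    r"want (?P<want>\d+) have (?P<have>\d+)"
)


class BadTreeBlock(NamedTuple):
    """Hypothetical record type for one 'bad tree block start' entry."""
    device: str
    want: int  # block start the filesystem expected
    have: int  # value actually found in the block header


def parse_line(line: str) -> Optional[BadTreeBlock]:
    """Return a record if the line is a 'bad tree block start' error, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return BadTreeBlock(m["device"], int(m["want"]), int(m["have"]))


# The four error lines quoted in this comment, plus one unrelated line.
SAMPLE = [
    "Nov  5 09:01:42 TAG0094487167 kernel: [ 1257.844137] BTRFS error (device sda4): bad tree block start, want 14247926824960 have 18374686483949813760",
    "Nov  5 09:01:43 TAG0094487167 kernel: [ 1258.612421] BTRFS error (device sda4): bad tree block start, want 14247586365440 have 2537442050321239272",
    "Nov  5 09:23:43 TAG0094487167 kernel: [ 2578.735971] BTRFS error (device sda4): bad tree block start, want 14249406283776 have 1767",
    "Nov  5 21:18:21 TAG0094487167 kernel: [45456.528604] BTRFS error (device sda4): bad tree block start, want 12038601654272 have 0",
    "Nov  5 21:24:01 TAG0094487167 kernel: [45796.537914] Call Trace:",
]

records = [r for r in map(parse_line, SAMPLE) if r is not None]

for r in records:
    # Hex output makes the mismatch between expected and found values visible.
    print(f"{r.device}: want={r.want:#x} have={r.have:#x}")
```

To run it against a real log, replace `SAMPLE` with the lines of the file, for example `open("kernel.log.1")` (the path is whatever your system uses).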
Comment 2 richts 2018-11-14 18:43:37 UTC
Today I tested the 4.19.2 kernel on the desktop machine (btrfs). The problem happened again after some hours of normal work.
Comment 3 Henrique Rodrigues 2018-11-27 23:57:04 UTC
Could this be the same as 201685?
Comment 4 richts 2018-11-30 10:48:47 UTC
Of course, my ext4 problems could be the same as 201685. In that case, my btrfs problems would be a separate issue.
Comment 5 Artem S. Tashkinov 2018-12-05 20:43:35 UTC
It was discovered that bug 201685 may affect any file system since the issue is in the block layer. Please try this patch (after running fsck): https://bugzilla.kernel.org/show_bug.cgi?id=201685#c255
Comment 6 richts 2018-12-12 17:37:51 UTC
I upgraded the laptop to 4.20.0-rc6 and the problem seems to be gone. By coincidence, I no longer have the other machine, so I can't verify the btrfs part...

Nevertheless, I assume this is a duplicate anyway.