I've got a corrupted XFS file system on a USB hard disk, courtesy of a TV with USB storage capability. This particular TV model is known to corrupt file systems from time to time. The problem has persisted for a long time and occurs regularly.

Usually, I can repair the file system by attaching the USB disk to one of my Gentoo boxes, mounting it, running xfs_repair just for good measure and deleting some metafiles. Those three steps have solved the various problems I encountered so far.

This time they didn't solve anything. Instead, three issues occurred before I could even carry out those steps:

1) I could not mount the corrupted file system; the mount command hangs (nor can I run "xfs_repair", which suggests mounting the partition to replay the journal :)
2) the only solution seems to be 'xfs_repair -L' (but I'd like to keep the data intact)
3) my USB subsystem hung completely; the kernel shows various issues, though it is more or less usable apart from USB

Issue 3) is definitely caused by the PAX features being enabled. With a kernel without PAX/Grsecurity, the USB subsystem shows no problems. I will report that to the guys at grsecurity.net. Not your concern.

Issue 1), on the other hand, might be in your domain. Issue 2) is probably the domain of the TV manufacturer, but maybe you have an idea that avoids using 'xfs_repair -L'.

I encounter the following symptoms:

a) the "sync" command stalls; so do "xfs_repair" and another "mount" applied to the corrupt partition; stalling means that I cannot even kill them with "kill -9"
b) I cannot reboot cleanly anymore; the console hangs right after the "powering down now" message; the system is still running and I can log in at another console, but ultimately I have to flip the power switch to regain full functionality (I think that is because of a stalling "sync")
c) suspend-to-ram is not possible; the sleep indicator LED keeps blinking (again, maybe because of hanging at sync)
d) but I can still mount and unmount a remote sshfs, or any other file system, after the problem occurs

I have similar symptoms on two Gentoo-based systems (one x86, one x86-64); I will attach the dmesg output of both, where the last line in each file is the output of "uname -a". Both systems run a kernel with PAX enabled. When investigating the problem, however, I relied on a non-hardened (vanilla) kernel, on which the USB subsystem has no problems.

With the Gentoo kernel 3.4.67, the mount command does not hang but instead prints the following error message:

> mount: mount /dev/sdb1 on /mnt/standard failed: Die Struktur muss bereinigt werden

In English: "Structure needs cleaning". I will attach that dmesg as well, along with the kernel's configuration. This behaviour is more like what I'd expect, though I had hoped the disk might be salvageable without dropping the journal.

The hanging 'mount' seems to have existed at least since 3.10.1; I tried that kernel, too. With a vanilla 3.12.1 kernel directly from kernel.org, the problem still persists (the mount command hangs). Again, I will attach that dmesg output and the kernel's configuration.

Here is some information I gathered so far:

xfs_repair version 3.1.10

I will attach a trace report as requested in http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F and the 'dmesg' output of 'echo w > /proc/sysrq-trigger'.

Regarding symptom a), I will attach the output of "ps aux | grep \(mount\|xfs\)" showing that both processes are in uninterruptible sleep.

Sadly, because of the erratic nature of the TV, I cannot provide any steps to reproduce the issue. The hard disk is not mine, but I will try to keep it around to perform tests on the file system if requested.
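Symptom a) describes processes stuck in uninterruptible sleep, which `ps` reports with a STAT value starting with "D". As a rough illustration only, the following Python sketch filters `ps aux` style output for such processes. It assumes the standard eleven-column `ps aux` layout; the sample lines and the function name are made up for the example and are not taken from the attached output.

```python
def blocked_processes(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Assumes `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    blocked = []
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 10)  # keep the full command in one field
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unexpected
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("D"):  # D = uninterruptible sleep
            blocked.append((pid, command))
    return blocked


# Hypothetical sample, for illustration only.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1423  0.0  0.0  19788  1020 pts/0    D+   20:31   0:00 mount /dev/sdb1 /mnt/standard
root      1501  0.0  0.0   4320   612 pts/1    D+   20:33   0:00 sync
root      1600  0.0  0.1  21000  3000 pts/2    Ss   20:10   0:00 -bash
"""

if __name__ == "__main__":
    for pid, command in blocked_processes(SAMPLE):
        print(pid, command)
```

Processes in this state ignore signals until the kernel operation they are waiting on completes, which matches the observation that "kill -9" has no effect.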
Created attachment 115391 mount error dmesg; PAX enabled kernel 3.11.2 x86
Created attachment 115401 mount error dmesg; PAX enabled kernel 3.11.2 x86-64
Created attachment 115411 mount error dmesg; Gentoo kernel 3.4.67 x86-64
Created attachment 115421 kernel config; Gentoo kernel 3.4.67 x86-64
Created attachment 115431 mount error dmesg; vanilla kernel 3.12.1 x86-64
Created attachment 115441 kernel config; vanilla kernel 3.12.1 x86-64
Created attachment 115451 XFS trace report; vanilla kernel 3.12.1 x86-64
Created attachment 115461 sysrq shows blocked processes; vanilla kernel 3.12.1 x86-64
Created attachment 115471 output of 'ps aux | grep (xfs|mount)' after error occurs; vanilla kernel 3.12.1 x86-64
If you have a corrupted filesystem that log recovery can't complete on, then your only option for recovery is xfs_repair -L.

The mount hang was caused by the corruption that was detected and by recovery not cleaning up an EFI that was left in the AIL. Unmount will only succeed when the AIL is emptied, and it can't do that with recovery EFIs. I'll have a look at why that is happening.

But quite frankly, we don't know what the TV or PAX has done to your filesystem, so recovery is pretty much a case of run repair and clean up the pieces that are left behind. Normally it does a pretty good job when errors like this are detected, but other than the mount hang there isn't anything that we can really help you fix.

As such, you may as well close this bug because otherwise it's just going to sit here forever and bitrot....

-Dave.
(In reply to Dave Chinner from comment #10)
> when the AIL is emptied, and it can't do that with recovery EFIs. I'll have
> a look at why that is happening.

Thanks.

> Normally it does a pretty good job when errors
> like this are detected, but other than the mount hang there isn't anything
> that we can really help you fix.

Understood. "xfs_repair -L" it is.

> As such, you may as well close this bug because otherwise it's just going to
> sit here forever and bitrot....

Since the main concern of this bug report is that the attempt to mount the corrupt partition hangs and puts my system into a half-usable state, I will leave the bug open until that's resolved. Or until someone tells me it cannot be resolved.

> But quite frankly, we don't know what the TV or PAX has done to your
> filesystem,

I think PAX didn't do anything to the file system. :) I just wanted to give a complete picture. The TV, well, that's a whole different story. I just got another corrupted HDD within a week from the same TV. I doubt that it actually has anything to do with XFS; XFS just has to deal with the consequences. Weird thing though: the TV's OS seems to be based on Linux. I wonder what they did wrong while integrating that into their TV system.

But I digress; thanks again for your help.
Samsung TV I assume?
Yes. Was it that obvious? ;)
Closing, because the bug doesn't really bug me anymore. It's possible that the TV is still corrupting disks, but I haven't heard from the owner and I would have no further input to provide to help with the issue. I don't know whether a current kernel would still hang on the `mount` command. My modus operandi is to connect the hard disk to a machine running a LiveCD I made, which automatically runs the shell commands I used to perform manually to recover the disk.
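The LiveCD script itself is not shown in this thread. Purely as an illustration of the sequence described earlier (mount to replay the journal, unmount, run xfs_repair, and fall back to 'xfs_repair -L' when the log cannot be replayed), here is a hedged Python sketch that only builds the command lines and does not execute anything. The function names, the device and the mount point are hypothetical, and the metafile clean-up step is left out because the thread does not say which files are deleted.

```python
import shlex


def recovery_plan(device, mountpoint, zero_log=False):
    """Build the command sequence as argv lists; nothing is executed here.

    zero_log=True corresponds to the last-resort 'xfs_repair -L', which
    discards the journal and may lose recent changes.
    """
    if zero_log:
        # Log recovery cannot complete, so mounting is skipped entirely.
        return [["xfs_repair", "-L", device]]
    return [
        ["mount", "-t", "xfs", device, mountpoint],  # replays the journal
        ["umount", mountpoint],
        ["xfs_repair", device],
    ]


def render(plan):
    """Render a plan as shell-quoted command strings for review."""
    return [shlex.join(argv) for argv in plan]


if __name__ == "__main__":
    # /dev/sdX1 and /mnt/usb are placeholders, not values from this report.
    for line in render(recovery_plan("/dev/sdX1", "/mnt/usb")):
        print(line)
```

Keeping the plan as data makes it easy to print and review the commands before anything touches the disk, which seems prudent given that the '-L' path is destructive.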
Though I suspect it's borderline off-topic, I'd like to update you on the reason why the TV "broke" the disk: it seems that power fluctuations in the mains supply lead to the TV shutting down improperly. That's why the disk journal was not clean. When the TV is recording at that moment, the journal seems to be in even bigger jeopardy and can get corrupted beyond recovery; dropping it is then the only solution.
FWIW, this is likely a hardware or configuration issue, unless shown otherwise ... xfs doesn't get corrupted on a power loss in general, that is the whole point of the log. It's likely that an external USB disk is not properly handling cache flush requests.
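One way to probe whether a disk honours cache flushes is to write numbered records, call fsync after each one, cut the power or pull the cable mid-run, and later check that every record acknowledged before the cut is actually on disk. The following Python sketch shows the idea under several assumptions: the function names and the record format are invented for the example, and in a real test the last acknowledged sequence number would have to be logged somewhere other than the disk under test (for instance printed to a serial console or sent over the network).

```python
import os

RECORD = "{:08d}\n"  # fixed-width sequence number, one record per line


def write_records(path, count):
    """Append `count` numbered records, fsync after each, return last acked seq."""
    last_acked = -1
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(count):
            os.write(fd, RECORD.format(seq).encode())
            os.fsync(fd)  # should not return before the data is on stable storage
            last_acked = seq
    finally:
        os.close(fd)
    return last_acked


def verify_records(path, last_acked):
    """True if records 0..last_acked are all present and in order."""
    with open(path, "rb") as f:
        lines = f.read().splitlines()
    expected = [RECORD.format(seq).strip().encode() for seq in range(last_acked + 1)]
    return lines[: len(expected)] == expected
```

If `verify_records` fails after an abrupt disconnect even though fsync had returned for those records, that would point at the drive or the USB-SATA bridge dropping flush requests rather than at the file system.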
(In reply to Eric Sandeen from comment #16)
> FWIW, this is likely a hardware or configuration issue, unless shown
> otherwise ... xfs doesn't get corrupted on a power loss in general, that is
> the whole point of the log.

Your pointers are very much appreciated, thank you. Though I am aware of the idea of journaling, it had not occurred to me that a technology as established (in my eyes) as USB-SATA interfaces could be implemented in a "faulty" way.

If I get the chance, I will take a dedicated look at the external hard disk set-up itself and see if I can reproduce the error by disconnecting during a continuous write, using a recent kernel on some Linux desktop machine. Maybe I can confirm your suspicion that the USB-SATA interface is not passing the flush/barrier request through, or that the hard disk is ignoring the request.

In the meantime, I had the pleasure of having one of those USB hard disks again. :) The original/actual problem I described (file system hangs on mount; even `sync` hangs afterwards; clean shutdown impossible) happened with a pretty recent "3.18.0-rc5" without any GrSecurity patch.

I will keep the bug closed unless someone else has the same problem; it seems like an edge case.

[ 0.000000] Linux version 3.18.0-rc5-laptop03 (root@laptop03) (gcc version 4.7.3 (Gentoo Hardened 4.7.3-r1 p1.4, pie-0.5.5) ) #1 SMP PREEMPT Sun Nov 23 20:22:47 GMT-1 2014
[...]
[462151.655334] XFS (sdc1): Mounting V4 Filesystem
[462151.903453] XFS (sdc1): Starting recovery (logdev: internal)
[462152.241261] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1595 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_extent+0xd4/0x110
[462152.241266] CPU: 3 PID: 372924 Comm: mount Tainted: G U W 3.18.0-rc5-laptop03 #1
[462152.241267] Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET91WW (2.51 ) 01/14/2013
[462152.241269] 0000000000000000 000000000140ceb8 ffffffff81ab829b ffff8802a372bb30
[462152.241272] ffffffff81330e89 ffff88040a3ec3c0 0000000000000000 ffff88040b49c000
[462152.241274] ffff88040a075200 0000000100000000 0000000000000000 0000000000000001
[462152.241276] Call Trace:
[462152.241288] [<ffffffff81ab829b>] ? dump_stack+0x49/0x6a
[462152.241291] [<ffffffff81330e89>] ? xfs_free_ag_extent+0x519/0x690
[462152.241293] [<ffffffff81332fb4>] ? xfs_free_extent+0xd4/0x110
[462152.241297] [<ffffffff8138d55c>] ? xlog_recover_process_efi+0x16c/0x1b0
[462152.241299] [<ffffffff8138f065>] ? xlog_recover_process_efis.isra.25+0x55/0xa0
[462152.241302] [<ffffffff813928e3>] ? xlog_recover_finish+0x23/0xb0
[462152.241304] [<ffffffff81387edc>] ? xfs_log_mount_finish+0x2c/0x50
[462152.241307] [<ffffffff8137fc15>] ? xfs_mountfs+0x4f5/0x740
[462152.241310] [<ffffffff81382b3f>] ? xfs_fs_fill_super+0x2cf/0x340
[462152.241312] [<ffffffff81382870>] ? xfs_parseargs+0xbb0/0xbb0
[462152.241315] [<ffffffff8118ea67>] ? mount_bdev+0x1c7/0x210
[462152.241317] [<ffffffff8118ecba>] ? mount_fs+0x1a/0xd0
[462152.241321] [<ffffffff811a93e2>] ? vfs_kern_mount+0x72/0x130
[462152.241323] [<ffffffff811ab013>] ? do_mount+0x1e3/0xa60
[462152.241326] [<ffffffff81180374>] ? __kmalloc_track_caller+0x34/0x140
[462152.241329] [<ffffffff8114aacf>] ? memdup_user+0x3f/0x90
[462152.241331] [<ffffffff811abb68>] ? SyS_mount+0x78/0xc0
[462152.241334] [<ffffffff81ac1e92>] ? system_call_fastpath+0x12/0x17
[462152.241339] XFS (sdc1): Failed to recover EFIs
[462152.241341] XFS (sdc1): log mount finish failed

Another attempt to mount the USB drive, with 3.17.2-hardened:

[ 0.000000] Linux version 3.17.2-hardened-r1-laptop03 (root@laptop03) (gcc version 4.7.3 (Gentoo Hardened 4.7.3-r1 p1.4, pie-0.5.5) ) #1 SMP PREEMPT Wed Nov 12 15:15
[...]
[ 106.519796] XFS (sdb1): Mounting V4 Filesystem
[ 106.764790] XFS (sdb1): Starting recovery (logdev: internal)
[ 107.074952] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1595 of file fs/xfs/libxfs/xfs_alloc.c. Caller ffffffff81377c8d
[ 107.074962] CPU: 0 PID: 2984 Comm: mount Not tainted 3.17.2-hardened-r1-laptop03 #1
[ 107.074965] Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET91WW (2.51 ) 01/14/2013
[ 107.074968] 0000000000000000 000000000140ceb8 ffffffff81badb38 ffff880406a9c000
[ 107.074974] ffffffff81375b39 ffff88040b0e2300 0000000000000000 ffff88040b35b800
[ 107.074978] ffff880403c0b200 0000000100000000 0000000000000000 0000000000000001
[ 107.074983] Call Trace:
[ 107.074995] [<ffffffff81badb38>] ? dump_stack+0x49/0x7c
[ 107.075003] [<ffffffff81375b39>] ? xfs_free_ag_extent+0x519/0x690
[ 107.075009] [<ffffffff813ce400>] ? xfs_parseargs+0xc40/0xc40
[ 107.075014] [<ffffffff81377c8d>] ? xfs_free_extent+0xdd/0x110
[ 107.075022] [<ffffffff813d9c0c>] ? xlog_recover_process_efi+0x16c/0x1b0
[ 107.075028] [<ffffffff813dbc35>] ? xlog_recover_process_efis.isra.25+0x55/0xa0
[ 107.075032] [<ffffffff813df723>] ? xlog_recover_finish+0x23/0xc0
[ 107.075037] [<ffffffff813d404c>] ? xfs_log_mount_finish+0x2c/0x50
[ 107.075043] [<ffffffff813cb5c5>] ? xfs_mountfs+0x515/0x760
[ 107.075048] [<ffffffff813ce6ef>] ? xfs_fs_fill_super+0x2ef/0x390
[ 107.075053] [<ffffffff811b7958>] ? mount_bdev+0x1d8/0x220
[ 107.075058] [<ffffffff811b7bf5>] ? mount_fs+0x25/0xf0
[ 107.075067] [<ffffffff811d72e2>] ? vfs_kern_mount+0x72/0x140
[ 107.075074] [<ffffffff811d93b8>] ? do_mount+0x578/0xbf0
[ 107.075081] [<ffffffff81517a2b>] ? copy_user_enhanced_fast_string+0x1b/0x30
[ 107.075090] [<ffffffff81537bfc>] ? strnlen_user+0xec/0x2a0
[ 107.075097] [<ffffffff811d9fb8>] ? SyS_mount+0x98/0xf0
[ 107.075103] [<ffffffff81bb8bb3>] ? system_call_fastpath+0x16/0x1b
[ 107.075110] [<ffffffff811b9262>] ? SYSC_newstat+0x12/0x40
[ 107.075116] [<ffffffff81bb8bdb>] ? sysret_check+0x1e/0x60
[ 107.075130] XFS (sdb1): Failed to recover EFIs
[ 107.075135] XFS (sdb1): log mount finish failed
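When comparing dmesg output across several kernels, it can help to pull out just the XFS internal-error line and the "Failed to recover EFIs" message. The following Python sketch does that with two regular expressions modelled on the messages quoted in this thread; the function name and the shape of the returned dictionary are choices made for this example, and the patterns are not guaranteed to match the wording of other kernel versions.

```python
import re

# Patterns modelled on the messages quoted in this report.
INTERNAL_ERROR = re.compile(
    r"XFS: Internal error (?P<tag>\S+) at line (?P<line>\d+) "
    r"of file (?P<file>\S+)\.\s+Caller (?P<caller>\S+)"
)
EFI_FAILED = re.compile(r"XFS \((?P<dev>[^)]+)\): Failed to recover EFIs")


def summarize(dmesg_text):
    """Extract XFS internal errors and devices whose EFI recovery failed."""
    errors = [
        (m["tag"], int(m["line"]), m["file"], m["caller"])
        for m in INTERNAL_ERROR.finditer(dmesg_text)
    ]
    devices = [m["dev"] for m in EFI_FAILED.finditer(dmesg_text)]
    return {"internal_errors": errors, "efi_recovery_failed": devices}


# Three lines copied from the 3.18.0-rc5 log in this report.
SAMPLE = """\
[462152.241261] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1595 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_extent+0xd4/0x110
[462152.241339] XFS (sdc1): Failed to recover EFIs
[462152.241341] XFS (sdc1): log mount finish failed
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against each attached dmesg, this would show at a glance whether different kernels fail at the same source line and in the same caller.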
Created attachment 181651 xfs trace on kernel 4.0.5
This problem is still present in kernel 4.0.5 of openSUSE Tumbleweed (4.0.5-3-default #1 SMP), and Debian jessie with kernel 3.16 (3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux) is also affected by the hanging mount.

I have a dd dump of the filesystem and can reproduce the hanging mount as often as needed to find the issue. xfs_repair -L (version 3.2.2) repairs it well enough that it can be mounted afterwards.

I attached a trace report of it. You can find a metadata dump at "http://ftp.innogames.de/~juergen/metadata_dump", but I'm not sure whether it is complete, as it terminated with "xfs_metadump: invalid dqblk inode number (-1)".

With the debug kernel, mount does not hang but dies with a segmentation fault after 1 second.

Thanks for looking at it,
Jürgen

openSUSE kernel 4.0.5-debug:

[ 198.572472] XFS (vdb): Mounting V4 Filesystem
[ 198.644457] XFS (vdb): Starting recovery (logdev: internal)
[ 199.975722] XFS: Assertion failed: fs_is_ok, file: ../fs/xfs/libxfs/xfs_alloc.c, line: 1594
[ 199.975757] ------------[ cut here ]------------
[ 199.975762] kernel BUG at ../fs/xfs/xfs_message.c:106!
[ 199.975768] invalid opcode: 0000 [#1] SMP
[ 199.975775] Modules linked in: xfs libcrc32c nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit dell_rbu iscsi_ibft iscsi_boot_sysfs af_packet ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw dcdbas ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables ppdev pcspkr serio_raw virtio_balloon qxl ttm drm_kms_helper drm i2c_piix4 pvpanic parport_pc 8250_fintek parport acpi_cpufreq button processor sr_mod cdrom ata_generic ata_piix virtio_blk virtio_console virtio_net uhci_hcd ehci_hcd virtio_pci virtio_ring virtio usbcore usb_common floppy sg
[ 199.975894] CPU: 0 PID: 1423 Comm: mount Not tainted 4.0.5-3-debug #1
[ 199.975900] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 199.975905] task: ffff88003b9c2410 ti: ffff88003b5dc000 task.ti: ffff88003b5dc000
[ 199.975911] RIP: 0010:[<ffffffffa0345872>] [<ffffffffa0345872>] assfail+0x22/0x30 [xfs]
[ 199.975945] RSP: 0018:ffff88003b5dfb08 EFLAGS: 00010296
[ 199.975950] RAX: 000000000000004f RBX: ffff880036cacc48 RCX: 0000000000000000
[ 199.975955] RDX: 000000000000004f RSI: ffff88003fc0da58 RDI: ffff88003fc0da58
[ 199.975960] RBP: 0000000000045f20 R08: 0000000000000000 R09: 0000000000000267
[ 199.975964] R10: 0000000000000000 R11: 0000000000000267 R12: ffff88003728b300
[ 199.975969] R13: ffff8800371d3000 R14: 0000000000000001 R15: 0000000000000002
[ 199.975975] FS: 00007f91e8ce7840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 199.975980] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 199.975985] CR2: 00007f662325e000 CR3: 000000003718e000 CR4: 00000000001007f0
[ 199.975993] Stack:
[ 199.975997] ffff88003728b300 ffffffffa02e1586 ffff8800371d3200 ffff88003728b300
[ 199.976012] 000000003b404000 ffff8800371d3000 ffff88003b404000 0000000000000000
[ 199.976021] 0000000000000002 0000000000000001 0000000100000000 0000000000000000
[ 199.976033] Call Trace:
[ 199.976059] [<ffffffffa02e1586>] xfs_free_ag_extent+0x3a6/0xa00 [xfs]
[ 199.976079] [<ffffffffa02e4b65>] xfs_free_extent+0x115/0x150 [xfs]
[ 199.976104] [<ffffffffa035e014>] xlog_recover_process_efi+0x194/0x1e0 [xfs]
[ 199.976128] [<ffffffffa035e0d1>] xlog_recover_process_efis+0x71/0xf0 [xfs]
[ 199.976150] [<ffffffffa035f19c>] xlog_recover_finish+0x1c/0xc0 [xfs]
[ 199.976173] [<ffffffffa034f0ef>] xfs_log_mount_finish+0x4f/0x70 [xfs]
[ 199.976198] [<ffffffffa0346a5e>] xfs_mountfs+0x5be/0x840 [xfs]
[ 199.976222] [<ffffffffa034ae98>] xfs_fs_fill_super+0x2d8/0x350 [xfs]
[ 199.976232] [<ffffffff811db41a>] mount_bdev+0x1ba/0x1f0
[ 199.976239] [<ffffffff811dbd76>] mount_fs+0x36/0x190
[ 199.976247] [<ffffffff811f8042>] vfs_kern_mount+0x62/0x120
[ 199.976253] [<ffffffff811fa8c6>] do_mount+0x206/0xb60
[ 199.976260] [<ffffffff811fb54d>] SyS_mount+0x8d/0xe0
[ 199.976269] [<ffffffff8164760d>] system_call_fastpath+0x16/0x1b
[ 199.976279] [<00007f91e83d3eda>] 0x7f91e83d3eda
[ 199.976283] Code: 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f1 41 89 d0 48 83 ec 08 48 89 fa 48 c7 c6 a0 76 38 a0 31 ff 31 c0 e8 3e fb ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83
[ 199.976471] RIP [<ffffffffa0345872>] assfail+0x22/0x30 [xfs]
[ 199.976565] RSP <ffff88003b5dfb08>
[ 199.976645] ---[ end trace 9fc0fa797f96f95b ]---

Stack trace on Debian:

[1118957.081386] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1595 of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_alloc.c. Caller xfs_free_extent+0xb9/0xf0 [xfs]
[1118957.081429] CPU: 6 PID: 5297 Comm: mount Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
[1118957.081430] Hardware name: Dell Inc. PowerEdge M620/0T36VK, BIOS 2.5.2 02/03/2015
[1118957.081432] 0000000000000002 ffffffff8150b405 0000000000045f20 ffffffffa030c1b6
[1118957.081435] ffff880dd9c07dc0 ffff880fecbd3dc0 ffff8809660c2c48 0000000000000002
[1118957.081436] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
[1118957.081438] Call Trace:
[1118957.081444] [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[1118957.081453] [<ffffffffa030c1b6>] ? xfs_free_ag_extent+0x1f6/0x7e0 [xfs]
[1118957.081460] [<ffffffffa030d729>] ? xfs_free_extent+0xb9/0xf0 [xfs]
[1118957.081469] [<ffffffffa03409bc>] ? xlog_recover_process_efi+0x16c/0x1b0 [xfs]
[1118957.081477] [<ffffffffa0306270>] ? xfs_parseargs+0xb80/0xb80 [xfs]
[1118957.081484] [<ffffffffa0342812>] ? xlog_recover_process_efis.isra.27+0x62/0xb0 [xfs]
[1118957.081491] [<ffffffffa0306270>] ? xfs_parseargs+0xb80/0xb80 [xfs]
[1118957.081499] [<ffffffffa03458ec>] ? xlog_recover_finish+0x1c/0xb0 [xfs]
[1118957.081507] [<ffffffffa034a16c>] ? xfs_log_mount_finish+0x2c/0x50 [xfs]
[1118957.081514] [<ffffffffa0303419>] ? xfs_mountfs+0x469/0x700 [xfs]
[1118957.081521] [<ffffffffa0306503>] ? xfs_fs_fill_super+0x293/0x310 [xfs]
[1118957.081525] [<ffffffff811ab456>] ? mount_bdev+0x1a6/0x1e0
[1118957.081527] [<ffffffff811abce4>] ? mount_fs+0x34/0x1a0
[1118957.081530] [<ffffffff811c5562>] ? vfs_kern_mount+0x62/0x110
[1118957.081533] [<ffffffff811c7dfa>] ? do_mount+0x23a/0xb00
[1118957.081536] [<ffffffff8115811d>] ? memdup_user+0x3d/0x70
[1118957.081538] [<ffffffff811c89b1>] ? SyS_mount+0x81/0xc0
[1118957.081540] [<ffffffff815115cd>] ? system_call_fast_compare_end+0x10/0x15
[1118957.081547] XFS (dm-5): Failed to recover EFIs
[1118957.081559] XFS (dm-5): log mount finish failed
Created attachment 181761 4.0.5-default kernel config
Created attachment 181771 4.0.5-debug kernel config