Bug 13770

Summary: System freeze on XFS filesystem recovery on an external disk
Product: IO/Storage
Reporter: Jean-Luc Coulon (jean.luc.coulon)
Component: IDE
Assignee: io_ide (io_ide)
Status: CLOSED UNREPRODUCIBLE
Severity: normal
CC: florian, hch, rjw, sandeen
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31-rc2-git9
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 13070

Description Jean-Luc Coulon 2009-07-14 10:31:42 UTC
Distribution: Debian Sid
System: Athlon 64 x2 4200+
Architecture: X86_64
Latest kernel without the problem: 2.6.30.1
I never got it on 2.6.30. It seems that it was the latest kernel without this problem.
Latest tested kernel where the problem occurred: 2.6.31-rc2-git9 (not yet tested on rc3).

Hi,

I have an external (SATA) disk connected via USB.
If, for some reason, the disk was not unmounted cleanly, the system freezes when something tries to access it.
Usually that access comes from a cron job that runs a backup.

The last messages in the syslog before the crash are always:

Jul 14 10:36:10 tangerine kernel: Starting XFS recovery on filesystem: sdd4 (logdev: internal)
Jul 14 10:36:10 tangerine kernel: Ending XFS recovery on filesystem: sdd4 (logdev: internal)

At this point, the device is being mounted for a backup (with backup2l).

I've filed this under File system -> XFS but, as the disk is attached via USB, the cause could be something else...

Regards

Jean-Luc
Comment 1 Eric Sandeen 2009-07-14 14:42:58 UTC
Does the system actually crash (BUG or oops) or just hang?

If the former, can you try to get the info from the console, and if it's a hang, can you try sysrq W to get the stuck thread info?

-Eric
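For reference, SysRq W can also be sent without the keyboard by writing the key to /proc/sysrq-trigger as root; the list of blocked (D-state) tasks is then printed to the kernel log, where dmesg can read it even from an X session. The minimal Python sketch below assumes a kernel built with CONFIG_MAGIC_SYSRQ; the function name trigger_sysrq and its path parameter are hypothetical, the latter added only so the function can be tested against an ordinary file.

```python
def trigger_sysrq(key: str, path: str = "/proc/sysrq-trigger") -> None:
    """Send one SysRq command key, e.g. 'w' to dump blocked (D-state) tasks.

    Writing to /proc/sysrq-trigger requires root. The resulting report is
    printed to the kernel log (read it with dmesg), not to this process.
    `path` is a hypothetical parameter so the function can be exercised
    against an ordinary file instead of the real trigger.
    """
    if len(key) != 1:
        raise ValueError("a SysRq command is a single character")
    # The kernel treats the written character as the SysRq command key.
    with open(path, "w") as trigger:
        trigger.write(key)
```

As root, call trigger_sysrq("w") and then run dmesg. This only helps while the machine still responds; when it is completely frozen, the keyboard combination (Alt+SysRq+W) is the only option, and its output is only visible on a text console.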
Comment 2 Jean-Luc Coulon 2009-07-14 15:30:28 UTC
No BUG, no oops, just completely frozen.

I will try SysRq W next time (I will try to trigger the problem, but I have data to preserve first :) ).

J-L
Comment 3 Jean-Luc Coulon 2009-07-26 08:39:38 UTC
I hit the problem several times again, but I didn't manage to get SysRq W to show anything: I was in X Window, and I think the output is only displayed on a text console.

This time the filesystem was clean but, as usual, the last messages in the log are related to this device (there is a recovery message, but the filesystem is marked clean at mount).

I also noticed that syslog restarted just before the crash (log rotation via anacron).


Jul 26 09:27:07 tangerine kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 26 09:27:07 tangerine rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="4718" x-info="http://www.rsyslog.com"] (re)start
Jul 26 09:27:07 tangerine rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="4718" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'restart'.
Jul 26 09:27:07 tangerine kernel: Kernel logging (proc) stopped.
Jul 26 09:27:07 tangerine kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 26 09:27:07 tangerine rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="4718" x-info="http://www.rsyslog.com"] (re)start
Jul 26 09:27:16 tangerine kernel: XFS mounting filesystem sdh4
Jul 26 09:27:16 tangerine kernel: Ending clean XFS mount for filesystem: sdh4
Jul 26 09:27:20 tangerine kernel: XFS mounting filesystem dm-16
Jul 26 09:27:20 tangerine kernel: Starting XFS recovery on filesystem: dm-16 (logdev: internal)
Jul 26 09:27:20 tangerine kernel: Ending XFS recovery on filesystem: dm-16 (logdev: internal)
Jul 26 09:27:26 tangerine ntpd[5746]: synchronized to 88.191.80.132, stratum 2
Jul 26 09:29:56 tangerine kernel: hdc: lost interrupt
Jul 26 09:52:45 tangerine kernel: imklog 4.2.0, log source = /proc/kmsg started.
Comment 4 Jean-Luc Coulon 2009-07-29 11:18:35 UTC
Hi,

I thought it was a regression, but during fairly intensive testing I hit the problem with 2.6.30 as well, and eventually the system froze.

Last syslog messages were:
Jul 29 12:45:10 tangerine kernel: I/O error in filesystem ("sdh4") meta-data dev sdh4 block 0xae4598       ("xfs_trans_read_buf") error 5 buf count 4096
Jul 29 12:45:12 tangerine kernel: XFS: Filesystem sdi4 has duplicate UUID - can't mount
Jul 29 12:45:12 tangerine kernel: I/O error in filesystem ("sdh4") meta-data dev sdh4 block 0xae4598       ("xfs_trans_read_buf") error 5 buf count 4096
Jul 29 12:45:12 tangerine kernel: end_request: I/O error, dev sdh, sector 87747168
Jul 29 12:45:12 tangerine kernel: I/O error in filesystem ("sdh4") meta-data dev sdh4 block 0x53aea60       ("xlog_iodone") error 5 buf count 1024
Jul 29 12:45:12 tangerine kernel: xfs_force_shutdown(sdh4,0x2) called from line 1043 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa0225b13
Jul 29 12:45:12 tangerine kernel: Filesystem "sdh4": Log I/O Error Detected.  Shutting down filesystem: sdh4
Jul 29 12:45:12 tangerine kernel: Please umount the filesystem, and rectify the problem(s)
Jul 29 12:45:12 tangerine kernel: XFS: Unable to update superblock counters. Freespace may not be correct on next mount.
Jul 29 12:45:13 tangerine kernel: hdc: lost interrupt
----

BTW, /dev/hdc is a CD-ROM drive with no media in it.

Regards

Jean-Luc
Comment 5 Rafael J. Wysocki 2009-08-03 20:54:10 UTC
Moving to the list of post-2.6.29 regressions.
Comment 6 Christoph Hellwig 2010-02-09 22:16:03 UTC
The "I/O error in filesystem ("sdh4") meta-data dev ..." messages mean XFS got I/O errors from the underlying device.  Together with the "hdc: lost interrupt" messages, this looks a lot like an IDE problem to me.
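The diagnosis above rests on recognising which layer each message comes from. The Python sketch below is a hypothetical helper (the name classify and its regexes are assumptions written against the lines quoted in this report, not a kernel or syslog API) that sorts kernel log lines into XFS metadata I/O errors, block-layer end_request errors, and IDE "lost interrupt" messages, and decodes the hex block number XFS prints.

```python
import re
from typing import Optional

# Hypothetical patterns, matched against the exact message formats quoted above.
XFS_IO_ERR = re.compile(
    r'I/O error in filesystem \("(?P<fs>[^"]+)"\) meta-data dev (?P<dev>\S+) '
    r'block (?P<block>0x[0-9a-fA-F]+)\s+\("(?P<func>[^"]+)"\) error (?P<errno>\d+)'
)
BLK_IO_ERR = re.compile(r"end_request: I/O error, dev (?P<dev>\S+), sector (?P<sector>\d+)")
IDE_LOST_IRQ = re.compile(r"(?P<dev>hd[a-z]): lost interrupt")


def classify(line: str) -> Optional[dict]:
    """Return the message category and its fields of interest, or None."""
    m = XFS_IO_ERR.search(line)
    if m:
        return {"kind": "xfs_io_error", "dev": m["dev"],
                "block": int(m["block"], 16),  # hex block number as printed by XFS
                "func": m["func"], "errno": int(m["errno"])}
    m = BLK_IO_ERR.search(line)
    if m:
        return {"kind": "block_io_error", "dev": m["dev"], "sector": int(m["sector"])}
    m = IDE_LOST_IRQ.search(line)
    if m:
        return {"kind": "ide_lost_interrupt", "dev": m["dev"]}
    return None  # not one of the message types discussed in this bug
```

Fed the Comment 4 excerpt, it reports XFS errors on sdh4 (error 5 is EIO), a block-layer error on sdh and a lost interrupt on hdc. It only groups the messages; it does not decide which layer is actually at fault.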
Comment 7 Florian Mickler 2010-11-30 10:30:17 UTC
Jean, is this still a problem on current mainline kernels?
Comment 8 Jean-Luc Coulon 2010-12-02 12:16:02 UTC
I'm now running 2.6.36 and I've not had this problem since.

J-L
Comment 9 Florian Mickler 2011-02-06 14:18:12 UTC
Thx. I'm closing this as unreproducible.