Bug 13770
Summary: | System freeze on XFS filesystem recovery on an external disk | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Jean-Luc Coulon (jean.luc.coulon) |
Component: | IDE | Assignee: | io_ide (io_ide) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | florian, hch, rjw, sandeen |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.31-rc2-git9 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 13070 |
Description
Jean-Luc Coulon
2009-07-14 10:31:42 UTC
Does the system actually crash (BUG or oops) or just hang? If the former, can you try to get the info from the console, and if it's a hang, can you try sysrq W to get the stuck thread info?
-Eric

No BUG, no oops, just completely frozen. I will try sysrq W next time (I will try to trigger the problem, but I have data to back up first :) ).
J-L

I hit the problem several more times, but I didn't manage to get SysRq W to show anything: I was in X, and I think the output can only be displayed on a console.
This time the filesystem was clean but, as usual, the last messages in the log relate to this device (there is a recovery message, yet the filesystem is reported clean at mount). I also noticed that syslog was restarting just before the crash (log rotation via anacron).

Jul 26 09:27:07 tangerine kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 26 09:27:07 tangerine rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="4718" x-info="http://www.rsyslog.com"] (re)start
Jul 26 09:27:07 tangerine rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="4718" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'restart'.
Jul 26 09:27:07 tangerine kernel: Kernel logging (proc) stopped.
Jul 26 09:27:07 tangerine kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 26 09:27:07 tangerine rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="4718" x-info="http://www.rsyslog.com"] (re)start
Jul 26 09:27:16 tangerine kernel: XFS mounting filesystem sdh4
Jul 26 09:27:16 tangerine kernel: Ending clean XFS mount for filesystem: sdh4
Jul 26 09:27:20 tangerine kernel: XFS mounting filesystem dm-16
Jul 26 09:27:20 tangerine kernel: Starting XFS recovery on filesystem: dm-16 (logdev: internal)
Jul 26 09:27:20 tangerine kernel: Ending XFS recovery on filesystem: dm-16 (logdev: internal)
Jul 26 09:27:26 tangerine ntpd[5746]: synchronized to 88.191.80.132, stratum 2
Jul 26 09:29:56 tangerine kernel: hdc: lost interrupt
Jul 26 09:52:45 tangerine kernel: imklog 4.2.0, log source = /proc/kmsg started.

Hi,

I thought it was a regression, but after fairly intensive testing I hit a problem with 2.6.30 as well, and eventually the system froze. The last syslog messages were:

Jul 29 12:45:10 tangerine kernel: I/O error in filesystem ("sdh4") meta-data dev sdh4 block 0xae4598 ("xfs_trans_read_buf") error 5 buf count 4096
Jul 29 12:45:12 tangerine kernel: XFS: Filesystem sdi4 has duplicate UUID - can't mount
Jul 29 12:45:12 tangerine kernel: I/O error in filesystem ("sdh4") meta-data dev sdh4 block 0xae4598 ("xfs_trans_read_buf") error 5 buf count 4096
Jul 29 12:45:12 tangerine kernel: end_request: I/O error, dev sdh, sector 87747168
Jul 29 12:45:12 tangerine kernel: I/O error in filesystem ("sdh4") meta-data dev sdh4 block 0x53aea60 ("xlog_iodone") error 5 buf count 1024
Jul 29 12:45:12 tangerine kernel: xfs_force_shutdown(sdh4,0x2) called from line 1043 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa0225b13
Jul 29 12:45:12 tangerine kernel: Filesystem "sdh4": Log I/O Error Detected. Shutting down filesystem: sdh4
Jul 29 12:45:12 tangerine kernel: Please umount the filesystem, and rectify the problem(s)
Jul 29 12:45:12 tangerine kernel: XFS: Unable to update superblock counters. Freespace may not be correct on next mount.
Jul 29 12:45:13 tangerine kernel: hdc: lost interrupt

----

BTW, /dev/hdc is a CD-ROM drive with no media in it.

Regards,
Jean-Luc

Moving to the list of post-2.6.29 regressions.

The "I/O error in filesystem ("sdh4") meta-data dev ..." messages mean XFS got I/O errors from the underlying device. Together with the "hdc: lost interrupt" messages, this looks a lot like an IDE problem to me.

Jean, is this still a problem on current mainline kernels?

I'm now running 2.6.36 and I haven't had this problem since.
J-L

Thx. I'm closing this as unreproducible.
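The reading of the XFS "I/O error in filesystem" messages can be made concrete with a small log filter. This is a minimal sketch, not an existing tool. It assumes the exact message format quoted in this report's logs, which other kernel versions may word differently. The names `parse_xfs_io_errors` and `XfsIoError` are hypothetical.

For each matching line, the sketch extracts the device, block, failing XFS function and error code, and names the errno. On Linux, error 5 is `EIO`. That is consistent with the "end_request: I/O error, dev sdh" line reported by the block layer.

```python
import errno
import re
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical pattern modelled on the XFS messages quoted in this report
# (2.6.30/2.6.31 era); other kernel versions may format this differently.
XFS_IO_ERROR = re.compile(
    r'I/O error in filesystem \("(?P<fs>[^"]+)"\) meta-data dev (?P<dev>\S+) '
    r'block (?P<block>0x[0-9a-fA-F]+) \("(?P<func>[^"]+)"\) '
    r'error (?P<err>\d+) buf count (?P<count>\d+)'
)


@dataclass
class XfsIoError:
    """One parsed XFS metadata I/O error (hypothetical helper type)."""
    fs: str
    dev: str
    block: int
    func: str
    errno_value: int
    errno_name: str
    buf_count: int


def parse_xfs_io_errors(lines: Iterable[str]) -> List[XfsIoError]:
    """Return the XFS metadata I/O errors found in the given syslog lines.

    Lines that do not match the assumed format (for example the block
    layer's "end_request: I/O error" line) are skipped.
    """
    found: List[XfsIoError] = []
    for line in lines:
        m = XFS_IO_ERROR.search(line)
        if m is None:
            continue
        err = int(m.group("err"))
        found.append(XfsIoError(
            fs=m.group("fs"),
            dev=m.group("dev"),
            block=int(m.group("block"), 16),  # int() accepts the 0x prefix with base 16
            func=m.group("func"),
            errno_value=err,
            # Symbolic name from the host's errno table; 5 is EIO on Linux.
            errno_name=errno.errorcode.get(err, "UNKNOWN"),
            buf_count=int(m.group("count")),
        ))
    return found
```

Feeding it the Jul 29 excerpt yields the `xfs_trans_read_buf` and `xlog_iodone` failures on `sdh4`, all with `EIO`. The `errno.errorcode` lookup uses the table of the machine running the script, so run it on Linux to get Linux names.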