Bug 14714 - intermittent xfs crash
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: All Linux
Importance: P1 high
Assignee: Eric Sandeen
Reported: 2009-12-02 00:32 UTC by Kevin Mitchell
Modified: 2009-12-07 02:58 UTC
Kernel Version: 2.6.32-rc8
Tree: Mainline
Regression: Yes


Description Kevin Mitchell 2009-12-02 00:32:53 UTC
After upgrading from 2.6.32-rc7 to rc8 I have experienced a number of unexpected crashes that appear to be related to my xfs filesystems. When these occur, dmesg says:

    [33091.758929] xfs_force_shutdown(sda6,0x2) called from line 1043 of file fs/xfs/xfs_log.c.  Return address = 0xffffffff8114c5d3
    [33091.758951] Filesystem "sda6": Log I/O Error Detected.  Shutting down filesystem: sda6
    [33091.758956] Please umount the filesystem, and rectify the problem(s)
    ...
    [33171.850058] Filesystem "sda6": xfs_log_force: error 5 returned.
    [33191.540124] No probe response from AP 00:16:b6:26:41:29 after 500ms, disconnecting.

Any further attempts to access files on the affected filesystem hang; however, anything else that has already been loaded into memory continues to work fine. An ls -l shows the mountpoint in red with question marks in the metadata fields. Attempts to unmount the filesystem fail, saying that it is in use, while an fuser on the mount point hangs like everything else.

This appears to occur approximately once daily with the typical usage patterns for my laptop. No particular action on my part appears to trigger it, so unfortunately it will be difficult to reproduce consistently. It can occur quite suddenly under only mild disk usage, such as saving and compiling a LaTeX document and then rereading the PostScript file. I recall that it also occurred once after a suspend to RAM.

Let me know if there is any additional information I can provide.

Kevin
Comment 1 Eric Sandeen 2009-12-02 01:10:48 UTC
What was in the system logs 10 lines or so before "Filesystem "sda6": Log I/O Error Detected"?

This looks like XFS is getting an IO error from the storage beneath it, and doing the right thing in the face of that ...
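
One way to pull that context out of a saved log, as a minimal sketch: the helper name `lines_before` and the default 10-line window are illustrative assumptions, not part of any XFS or kernel tooling. It takes the text of saved `dmesg` output and returns the lines immediately preceding the first line that contains a marker string.

```python
def lines_before(log_text, marker, count=10):
    """Return up to `count` lines preceding the first line containing `marker`.

    Hypothetical helper for illustration only; returns an empty list if
    the marker never appears in the log.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            # Clamp at 0 so a marker near the top of the log still works.
            start = max(0, index - count)
            return lines[start:index]
    return []


if __name__ == "__main__":
    # Three lines taken from the log in this report, used as sample input.
    sample = "\n".join([
        "[33091.748295] ata1: EH complete",
        "[33091.758884] end_request: I/O error, dev sda, sector 223677343",
        '[33091.758951] Filesystem "sda6": Log I/O Error Detected.  '
        "Shutting down filesystem: sda6",
    ])
    for line in lines_before(sample, "Log I/O Error Detected"):
        print(line)
```

In practice the input would be the full output of `dmesg` saved to a file, and the returned lines are what should accompany a report like this one.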
Comment 2 Kevin Mitchell 2009-12-02 19:48:54 UTC
Sorry, you're right. Not sure where my head was at:

[33090.932246] wlan0: direct probe to AP 00:03:52:e7:07:f0 (try 1)
[33090.992157] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[33090.992165] ata1.00: irq_stat 0x00000040, connection status changed
[33090.992176] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[33090.992178]          res 50/00:08:4f:10:e5/00:00:12:00:00/40 Emask 0x10 (ATA bus error)
[33090.992187] ata1: hard resetting link
[33091.130129] wlan0: direct probe to AP 00:03:52:e7:07:f0 (try 2)
[33091.330107] wlan0: direct probe to AP 00:03:52:e7:07:f0 (try 3)
[33091.530125] wlan0: direct probe to AP 00:03:52:e7:07:f0 timed out
[33091.740128] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[33091.741504] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (unknown) succeeded
[33091.741512] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (unknown) filtered out
[33091.741518] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (unknown) filtered out
[33091.744278] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (unknown) succeeded
[33091.744286] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (unknown) filtered out
[33091.744292] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (unknown) filtered out
[33091.745531] ata1.00: configured for UDMA/133
[33091.748270] ata1.00: configured for UDMA/133
[33091.748295] ata1: EH complete
[33091.758884] end_request: I/O error, dev sda, sector 223677343
[33091.758921] I/O error in filesystem ("sda6") meta-data dev sda6 block 0x74158f0       ("xlog_iodone") error 5 buf count 5632
[33091.758929] xfs_force_shutdown(sda6,0x2) called from line 1043 of file fs/xfs/xfs_log.c.  Return address = 0xffffffff8114c5d3
[33091.758951] Filesystem "sda6": Log I/O Error Detected.  Shutting down filesystem: sda6
[33091.758956] Please umount the filesystem, and rectify the problem(s)

Reassign as necessary. I've switched back to rc7 and have yet to have a problem, so I'm thinking (hoping) this is not a hardware issue. I'll report back if I find otherwise.
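
For reference, the `error 5` in the XFS messages above is an errno value, and on Linux 5 is `EIO` (I/O error), which fits a failure reported by the layer beneath the filesystem. A small sketch that extracts the number from a log line and looks it up in Python's standard `errno` table follows; the function name `decode_log_error` and the regular expression are illustrative assumptions, and the lookup reflects the errno table of the machine running the script.

```python
import errno
import os
import re

# Matches "error <number>" as it appears in the XFS lines quoted above.
ERROR_PATTERN = re.compile(r"\berror (\d+)\b")


def decode_log_error(line):
    """Return (code, symbolic name, description) for a log line, or None.

    Hypothetical helper for illustration only; the symbolic name falls
    back to "UNKNOWN" if the local errno table has no entry for the code.
    """
    match = ERROR_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    line = '[33171.850058] Filesystem "sda6": xfs_log_force: error 5 returned.'
    print(decode_log_error(line))
```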
Comment 3 Eric Sandeen 2009-12-02 20:08:08 UTC
That's ok, we should probably make the error message say:

printk("^^^ look up there before filing a bug ^^^\n");

;)

-Eric
Comment 4 Tejun Heo 2009-12-07 02:58:46 UTC
This looks like a similar problem to the following one:

  http://bugzilla.kernel.org/show_bug.cgi?id=14543

However, I can't think of anything that could make recent kernels newly prone to this type of problem. Are you sure this is a regression? Can you please check again?

The workaround for the above bug will be merged during this merge window, but the change is a bit pervasive, so I'll wait a bit before backporting it to -stable.

Thanks.
