Bug 5914

Summary: recent davej kernels (2.6.15-git*-base) eat my filesystem
Product: IO/Storage
Reporter: Nicolas Mailhot (Nicolas.Mailhot)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: CLOSED CODE_FIX
Severity: blocking
CC: agk, akpm, io_md
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.15-git*
Subsystem:
Regression: ---
Bisected commit-id:

Description Nicolas Mailhot 2006-01-18 02:21:22 UTC
Most recent kernel where this bug did not occur: 2.6.15
Distribution: Fedora Core Devel (Rawhide)
Hardware Environment: AMD64 + SATA, nforce4 chipset + sil chipset, 2 GiB mem
Software Environment: raid 1 + lvm
Problem Description:
Booting any recent davej kernel results in lots of system errors (md, sata)
and rapid filesystem corruption.

Some examples :
ata5: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata5: status=0x51 { DriveReady SeekComplete Error }
ata5: error=0x04 { DriveStatusError }
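
For readers unfamiliar with these register values, the following is a minimal sketch of how the status and error bytes in the log above break down into the names the kernel prints. The bit table is an assumption based on the standard ATA status register layout and the labels visible in this log; the function names are hypothetical and this is not the kernel's own code.

```python
# Assumed mapping of ATA status register bits to the labels seen in the log.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Bit 2 of the ATA error register is the "command aborted" bit, which the
# log above reports as DriveStatusError. Other error bits are not decoded here.
ATA_ERROR_ABORTED = 0x04


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def decode_error(error):
    """Return a label for the aborted-command bit, if it is set."""
    return ["DriveStatusError"] if error & ATA_ERROR_ABORTED else []


if __name__ == "__main__":
    print(decode_status(0x51))
    print(decode_error(0x04))
```

Under these assumptions, 0x51 is the drive reporting itself ready with the error bit set, and 0x04 means the drive aborted the command rather than reporting a media defect.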

sd 4:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 4192831
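
The SCSI return code above can be split into its component bytes. This is a rough sketch that assumes the usual Linux layout of the result word (driver byte, host byte, message byte, status byte, from most to least significant); the helper name is hypothetical.

```python
def split_scsi_result(result):
    """Split a 32-bit SCSI result word into its four bytes.

    Assumed layout: driver << 24 | host << 16 | msg << 8 | status.
    """
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
    }


if __name__ == "__main__":
    print(split_scsi_result(0x8000002))
```

With that layout, 0x8000002 has driver byte 0x08 (sense data available) and status byte 0x02 (CHECK CONDITION), which is consistent with the sense key line that follows it in the log.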

(no idea if the problem is in the SATA layer or somewhere deeper)

More material (logs, lspci...) available in:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951
Comment 1 Nicolas Mailhot 2006-01-22 11:34:47 UTC
I'm a fool - retested yesterday with the latest davej kernel (some git versions
later), crashed again, and spent some hours cleaning up the system and worrying
about damage.

Thankfully 2.6.15 "only" has a scsi memleak, so if I reboot it often enough
it works fine.
Comment 2 Nicolas Mailhot 2006-01-24 01:12:33 UTC
Smart info was attached to the Fedora bug
(basically smart thinks the drives are fine)
Comment 3 Nicolas Mailhot 2006-01-26 13:10:40 UTC
2.6.15-1.1872_FC5 (2.6.16-rc1-git4), patched to disable FUA as suggested by
Tejun Heo (http://marc.theaimsgroup.com/?l=linux-ide&m=113825474609128),
boots fine.
Comment 4 Nicolas Mailhot 2006-02-13 12:23:19 UTC
Closing, as a workaround for this firmware was merged by Linus.