Bug 8445

Summary: SI3112/3512 + Nvidia - Filesystem silent corrupt on SATA
Product: IO/Storage
Reporter: juhis99 (juhis99)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: REJECTED DUPLICATE
Severity: high
CC: alan
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:

Description juhis99 2007-05-07 09:19:26 UTC
Distribution:
Tested on Debian systems

Hardware Environment:
Tested on two PCs: 1. P3 600MHz on Abit BH6 and 2. new AMD X2 3600+ on ASUS
M2NPV-VM. SATA hardware tested: 1. Silicon Image SIL3512, 2. SIL 3112A, 3.
NVidia nForce 430. The hard drive is a Samsung HD501LJ 500GB SATA-II.

Software Environment:
Tested with recent Debian Etch AMD64 (2.6.18) and Debian Sarge i386 with
self-compiled kernels up to 2.6.20.

Problem Description:
Silent data corruption occurs on the systems described above. Tested SATA_SIL
based PCI controllers on both PCs; changed controllers, cables, everything BUT
the disk. Setting the disk's 1.5G jumper did not help.

Steps to reproduce:
My most recent test was to install Debian Etch AMD64 (comes with 2.6.18) on the
NVIDIA computer on an IDE drive first (clean installation, no binary modules
from nvidia installed, just the base system). Then I tried to copy the system
to the SATA disk: fdisk, mkfs, cp -xa / /newdisk. Now umount /newdisk and run a
filesystem check: fsck.ext3 -f /dev/sda1 produces a lot of errors! Even if I
just mkfs /dev/sda1 and then force fsck on it, there are errors! Partition size
does not matter either.

The same filesystem copy was also done on the older Debian Sarge i386 computer.
The most recent kernel I tried was 2.6.20.

The error can also be reproduced on both computers, with all SATA
controllers, by just diff'ing files:
dd if=/dev/urandom of=test count=.. (like 100MB files or so)
cp test test1
cp test test2
...
diff test test1
diff test test2
Maybe 50% of the tested files are corrupted, but after remounting the partition
and testing the files again, 100% are corrupted.
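
The copy-and-compare test above could be sketched as a small script. This is
only an illustration, not the reporter's actual procedure: the function names
make_test_set and count_corrupt, the use of sha256sum instead of diff, and the
test.sha256 file name are all assumptions. Recording a checksum allows the same
check to be repeated after a remount, when the files have to be read back from
the disk again.

```shell
#!/bin/sh
# Sketch of the copy-and-compare test. All names are illustrative.

# make_test_set DIR SIZE_MB COPIES
# Writes DIR/test from /dev/urandom, copies it COPIES times, and records
# the checksum of the source file in DIR/test.sha256.
make_test_set() {
    dir=$1; size_mb=$2; copies=$3
    dd if=/dev/urandom of="$dir/test" bs=1M count="$size_mb" 2>/dev/null || return 1
    sha256sum "$dir/test" | cut -d' ' -f1 > "$dir/test.sha256"
    i=1
    while [ "$i" -le "$copies" ]; do
        cp "$dir/test" "$dir/test$i" || return 1
        i=$((i + 1))
    done
    sync
}

# count_corrupt DIR
# Prints how many files in the set (source plus copies) no longer match
# the recorded checksum.
count_corrupt() {
    dir=$1
    want=$(cat "$dir/test.sha256")
    bad=0
    for f in "$dir"/test "$dir"/test[0-9]*; do
        [ -f "$f" ] || continue
        got=$(sha256sum "$f" | cut -d' ' -f1)
        [ "$got" = "$want" ] || bad=$((bad + 1))
    done
    echo "$bad"
}
```

A possible use would be: make_test_set /newdisk 100 2, then count_corrupt
/newdisk, then umount and mount the partition and run count_corrupt /newdisk
again. Note that the checksum file itself sits on the suspect disk in this
sketch; copying it to a known-good disk would make the comparison more
trustworthy.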

The kernel option noapic helped with the older computer BUT only once (copying
filesystems and running fsck did not find errors), but when I tried to boot the
system from the SATA drive, the boot-time fsck found errors immediately at the
first boot!!

I'm really confused and frustrated about this. All I can think is that the
problem must be the disk? Really, I tested with three SATA controllers on two
computers, even with a clean Debian installation. There are no errors in dmesg.
(The SATA disk must be "OK": I tried to install XP on it - no problems.)

Here is the beginning of one filesystem check:
e2fsck 1.40-WIP (14-Nov-2006)
Pass 1: Checking inodes, blocks, and sizes
Inode 4864421 is in use, but has dtime set.  Fix<y>? no

Inode 4864421 has imagic flag set.  Clear<y>? no

Inode 4864421 has compression flag set on filesystem without compression 
support.  Clear<y>? no

Inode 4864421 has INDEX_FL flag set but is not a directory.
Clear HTree index<y>? no

HTREE directory inode 4864421 has an invalid root node.
Clear HTree index<y>? no

Error reading block 4294967295 (Invalid argument) while doing inode 
scan.  Ignore error<y>? no

HTREE directory inode 4864421 has an invalid root node.
Clear HTree index<y>? no

HTREE directory inode 4864421 has an invalid root node.
Clear HTree index<y>?

Inode 4864421, i_blocks is 4294967295, should be 0.  Fix<y>? no

Inode 6395301 is in use, but has dtime set.  Fix<y>? no

Inode 6395301 has compression flag set on filesystem without compression 
support.  Clear<y>? no

Inode 6395301, i_blocks is 1295860160, should be 0.  Fix<y>? no

Inode 7379405 is in use, but has dtime set.  Fix<y>? no

Inode 7379405 has compression flag set on filesystem without compression 
support.  Clear<y>? no

Inode 7379405 has INDEX_FL flag set but is not a directory.
Clear HTree index<y>? no

HTREE directory inode 7379405 has an invalid root node.
Clear HTree index<y>? no

Error reading block 1583312129 (Invalid argument) while doing inode 
scan.  Ignore error<y>?

HTREE directory inode 7379405 has an invalid root node.
Clear HTree index<y>? no

HTREE directory inode 7379405 has an invalid root node.
Clear HTree index<y>? no

Inode 7379405, i_blocks is 1581473345, should be 0.  Fix<y>? no

Inode 9786929 is in use, but has dtime set.  Fix<y>? no

Inode 9786929 has imagic flag set.  Clear<y>? no

Inode 9786929 has compression flag set on filesystem without compression 
support.  Clear<y>? no

Inode 9786929 has INDEX_FL flag set but is not a directory.
Clear HTree index<y>? no

HTREE directory inode 9786929 has an invalid root node.
Clear HTree index<y>?

Comment 1 Alan 2007-06-05 08:09:12 UTC
Suspect this is controller or controller + Nvidia combination. There seem to be
several similar reports and I'm going to round them all up in a bit to see if
there are commonalities.

Comment 2 Alan 2007-06-05 10:07:16 UTC

*** This bug has been marked as a duplicate of 6845 ***