Bug 2561 - NFORCE2 PCI code corrupts data
Summary: NFORCE2 PCI code corrupts data
Status: REJECTED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: i386 Linux
Importance: P2 high
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-04-21 19:39 UTC by Patola
Modified: 2004-04-23 06:50 UTC
0 users

See Also:
Kernel Version: 2.6.5
Subsystem:
Regression: ---
Bisected commit-id:



Description Patola 2004-04-21 19:39:42 UTC
Distribution: Debian Unstable
keywords: nforce amd athlon pci IDE
Hardware Environment: Athlon XP 2400+ with 2x256 MB DDR 166 memory modules,
an A7N8X motherboard (with the nForce2 PCI controller), two hard drives
(40 GB and 120 GB) and one DVD-RW (LG GSA-4040B). More details are given in
the problem description.

Software Environment:
Linux patola 2.6.5-1-k7 #1 Wed Apr 7 03:36:30 EST 2004 i686 GNU/Linux
  
Gnu C                  3.3.3
Gnu make               3.80
binutils               2.14.90.0.7
util-linux             2.12
mount                  2.12
module-init-tools      3.0-pre10
e2fsprogs              1.35
xfsprogs               2.6.5
pcmcia-cs              3.2.5
quota-tools            3.11.
PPP                    2.4.2
isdn4k-utils           3.3
nfs-utils              1.0.6
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.2.1
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.0.91
Modules Loaded         snd_pcm_oss ide_cd cdrom ppp_deflate zlib_deflate
bsd_comp nvidia eth1394 sata_sil ohci1394 ieee1394 pci_hotplug hid ohci_hcd
snd_mixer_oss ds yenta_socket ppp_async ipv6 ppp_generic slhc i810_audio
ac97_codec forcedeth tsdev mousedev amd74xx evdev psmouse reiserfs w83l785ts
asb100 i2c_sensor i2c_nforce2 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm
snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device
iptable_nat ip_conntrack ip_tables pcmcia_core parport_pc lp parport snd
soundcore ehci_hcd autofs4 af_packet nls_iso8859_1 nls_cp437 3c59x stv680
usbcore videodev apm rtc xfs isofs vfat fat ext2 ext3 jbd mbcache ide_disk
ide_generic siimage atiixp ide_core sd_mod ata_piix libata scsi_mod unix font
cfbcopyarea cfbimgblt cfbfillrect

Steps to reproduce:
Please see the description.

Problem Description:
This problem is really strange, but it has been happening consistently for
several weeks now.
It occurs on an Athlon XP 2400+ with 2x256 MB DDR 166 modules and an A7N8X
Deluxe board with two hard drives (40 GB and 120 GB) and one DVD-RW
(LG GSA-4040B). The A7N8X Deluxe uses the nForce2 PCI bridge and IDE controller.

I was running kernel 2.4.22 with the XFS patch. Yesterday I upgraded to 2.6.5
to see whether the problem was fixed, but it behaves the same way.

The problem is: when I read a large file, say, larger than 200 MB, some bytes
are read incorrectly. It seems to happen on writes too. It only happens when
the file is read quickly, as with cp or md5sum. If I download a file from the
Internet, it seems to come out fine.

If I run md5sum 4 times on the same file, I get a different checksum every time:

$ md5sum 1GB_file.rar
bd17afc743b1d69d7458553cc5971145 1GB_file.rar
$ md5sum 1GB_file.rar
2b17bc4e5d7609b5fddbf67b5c84b869 1GB_file.rar
$ md5sum 1GB_file.rar
3ed1c36bed43f355f53df2ac763b7ea2 1GB_file.rar
$ md5sum 1GB_file.rar
0f6e73894382bca04326f4e4abbca4d3 1GB_file.rar
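The repeated-checksum test can be scripted. This is a minimal sketch, assuming bash and GNU coreutils; the function name `count_distinct_md5` is hypothetical. It hashes the same file several times and prints how many distinct digests came back, so on healthy hardware the answer should be 1. With only 512 MB of RAM, a 1 GB file cannot stay entirely in the page cache, so most of each pass should be read from the disk again.

```bash
#!/bin/bash
# Hash FILE RUNS times (default 4) and print the number of distinct
# MD5 digests seen. A reliable disk/memory path should always print 1.
count_distinct_md5() {
    local file=$1 runs=${2:-4} i
    for ((i = 0; i < runs; i++)); do
        md5sum "$file" | cut -d' ' -f1   # keep only the digest column
    done | sort -u | wc -l
}

# Example (hypothetical usage): count_distinct_md5 1GB_file.rar 4
```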

I ran an experiment to find the pattern. I wrote a shell script that builds a
file of about 1 GB with 16777216 lines, each containing 64 'a' characters:

# bash: append 2^24 = 16777216 lines of 64 'a' characters to a.txt
x=0
while [ $x -lt 16777216 ]
do
    let x+=1
    echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa >> a.txt
done
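The loop above reopens a.txt once per line, which is slow for 16777216 lines. A minimal sketch of a faster equivalent, assuming bash and the standard `yes`, `head` and `tr` utilities; `make_pattern_file` is a hypothetical helper, and its optional third argument picks the fill character (so the same helper also covers a file full of 'c'). The result is 16777216 lines × 65 bytes (64 characters plus a newline) = 1090519040 bytes.

```bash
#!/bin/bash
# Write COUNT lines (default 16777216) of 64 copies of CHAR (default 'a')
# to FILE (default a.txt), overwriting any existing file.
make_pattern_file() {
    local file=${1:-a.txt} count=${2:-16777216} ch=${3:-a}
    local line
    line=$(printf '%64s' '' | tr ' ' "$ch")   # 64 spaces -> 64 x CHAR
    yes "$line" | head -n "$count" > "$file"  # repeat the line COUNT times
}

# Example (hypothetical usage): make_pattern_file a.txt
```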

It turns out that running 'grep -nv' with the 64-'a' pattern on the file
reports random line numbers, like this:

$ grep -nv aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a.txt
1485924: aaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4722747: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaa
5001213: aaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
8292837: aaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
9013827: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaa
11832018: aaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
11918307: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaa
14193810: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaa
16530182: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaa
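`grep -nv` only says which lines differ. A minimal sketch, assuming bash and POSIX awk (`find_corrupt_bytes` is a hypothetical name), that also reports the column and the wrong character for each bad byte. In the listing above, each bad line contains exactly one wrong byte.

```bash
#!/bin/bash
# Print "LINE COLUMN CHAR" for every byte in FILE that is not the expected
# fill character (default 'a').
find_corrupt_bytes() {
    local file=$1 ch=${2:-a}
    awk -v ch="$ch" '{
        for (i = 1; i <= length($0); i++)
            if (substr($0, i, 1) != ch)
                print NR, i, substr($0, i, 1)
    }' "$file"
}

# Example (hypothetical usage): find_corrupt_bytes a.txt a
```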

Some of these corrupted lines were already wrong when the file was written:
they show up every time I rerun the same grep. In every case the same bit is
changed: an 'a' turns into a 'c'.
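'a' is 0x61 (binary 0110 0001) and 'c' is 0x63 (0110 0011), so the two characters differ only in bit 1 (mask 0x02). A minimal bash sketch, using a hypothetical `xor_chars` helper, that computes the differing-bit mask for any pair of characters:

```bash
#!/bin/bash
# Print the XOR of two characters' byte values as a hex mask.
# A single set bit in the result means exactly one bit differs.
xor_chars() {
    local a b
    a=$(printf '%d' "'$1")   # numeric code of the expected character
    b=$(printf '%d' "'$2")   # numeric code of the observed character
    printf '0x%02x\n' $(( a ^ b ))
}

# Example: xor_chars a c   prints 0x02
```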

Oh, and this problem occurs in Linux on every partition and drive I test: the
VFAT partition on drive 1, the XFS partition on drive 1 and the ReiserFS
partition on drive 2.

This looks like a hardware problem, but then I booted into Windows (which uses
the VFAT partition on drive 1) and ran the same tests using md5sum for Windows
and MD5summer. This time, after more than a dozen tries with the same large
files, the checksums were all identical: no flaws, no changed bits. All disk
optimizations were enabled.

I even tried, in Linux, disabling DMA, readahead, multcount, 32-bit I/O support
and unmaskirq, with no good results. With everything disabled, in single-user
mode, the file sometimes returned the right md5sum, *but* when I repeated the
operation it no longer returned the correct one. Maybe it was right the first
time only because the operation was slower than with every feature enabled.
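The features listed above correspond to hdparm flags: `-d` (using_dma), `-a` (readahead), `-m` (multcount), `-c` (IO_support) and `-u` (unmaskirq). A minimal sketch, assuming bash and root access; `ide_safe_mode` and the `DRY_RUN` switch are hypothetical, added so the command can be checked before it touches a drive.

```bash
#!/bin/bash
# Disable DMA, readahead, multiple-sector I/O, 32-bit I/O and IRQ unmasking
# on one IDE drive. Set DRY_RUN=1 to print the command instead of running it.
ide_safe_mode() {
    local dev=$1
    local cmd=(hdparm -d0 -a0 -m0 -c0 -u0 "$dev")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"   # needs root
    fi
}

# Example (hypothetical usage): DRY_RUN=1 ide_safe_mode /dev/hda
```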

My main suspect here is the nForce2 PCI bridge or IDE controller. dmesg reports
something about the cable bits being set incorrectly; could this be related?
Here is the configuration of my machine: lspci -vv, and hdparm output for hda
and hdb.

0000:00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?)
(rev c1)
Subsystem: Asustek Computer, Inc.: Unknown device 80ac
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0
Region 0: Memory at d0000000 (32-bit, prefetchable)
Capabilities: [40] AGP version 2.0
Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3-
Rate=x1,x2,x4
Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=x1
Capabilities: [60] #08 [2001]

0000:00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev c1)
Subsystem: nVidia Corporation: Unknown device 0c17
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-

0000:00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev c1)
Subsystem: nVidia Corporation: Unknown device 0c17
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-

0000:00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev c1)
Subsystem: nVidia Corporation: Unknown device 0c17
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-

0000:00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev c1)
Subsystem: nVidia Corporation: Unknown device 0c17
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-

0000:00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev c1)
Subsystem: nVidia Corporation: Unknown device 0c17
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-

0000:00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a4)
Subsystem: Asustek Computer, Inc. A7N8X Mainboard
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0
Capabilities: [48] #08 [01e1]

0000:00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
Subsystem: Asustek Computer, Inc.: Unknown device 0c11
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Interrupt: pin A routed to IRQ 5
Region 0: I/O ports at e000
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
(prog-if 10 [OHCI])
Subsystem: Asustek Computer, Inc. A7N8X Mainboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin A routed to IRQ 5
Region 0: Memory at ee087000 (32-bit, non-prefetchable)
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
(prog-if 10 [OHCI])
Subsystem: Asustek Computer, Inc. A7N8X Mainboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin B routed to IRQ 11
Region 0: Memory at ee082000 (32-bit, non-prefetchable)
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:02.2 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
(prog-if 20 [EHCI])
Subsystem: Asustek Computer, Inc. A7N8X Mainboard
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin C routed to IRQ 5
Region 0: Memory at ee083000 (32-bit, non-prefetchable)
Capabilities: [44] #0a [2080]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:04.0 Ethernet controller: nVidia Corporation nForce2 Ethernet Controller
(rev a1)
Subsystem: Asustek Computer, Inc.: Unknown device 80a7
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (250ns min, 5000ns max)
Interrupt: pin A routed to IRQ 11
Region 0: Memory at ee086000 (32-bit, non-prefetchable)
Region 1: I/O ports at e400 [size=8]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable+ DSel=0 DScale=0 PME-

0000:00:05.0 Multimedia audio controller: nVidia Corporation nForce MultiMedia
audio [Via VT82C686B] (rev a2)
Subsystem: Asustek Computer, Inc.: Unknown device 0c11
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (250ns min, 3000ns max)
Interrupt: pin A routed to IRQ 5
Region 0: Memory at ee000000 (32-bit, non-prefetchable)
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:06.0 Multimedia audio controller: nVidia Corporation nForce2 AC97 Audio
Controler (MCP) (rev a1)
Subsystem: Asustek Computer, Inc.: Unknown device 8095
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (500ns min, 1250ns max)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at d000
Region 1: I/O ports at d400 [size=128]
Region 2: Memory at ee080000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
I/O behind bridge: 0000a000-0000bfff
Memory behind bridge: ec000000-edffffff
Expansion ROM at 0000a000 [disabled] [size=8K]
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-

0000:00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2) (prog-if 8a
[Master SecP PriP])
Subsystem: Asustek Computer, Inc.: Unknown device 0c11
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (750ns min, 250ns max)
Region 4: I/O ports at f000 [size=16]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:0c.0 PCI bridge: nVidia Corporation nForce2 PCI Bridge (rev a3) (prog-if
00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: e8000000-e9ffffff
Expansion ROM at 0000c000 [disabled] [size=4K]
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-

0000:00:0d.0 FireWire (IEEE 1394): nVidia Corporation nForce2 FireWire (IEEE
1394) Controller (rev a3) (prog-if 10 [OHCI])
Subsystem: Asustek Computer, Inc.: Unknown device 809a
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin A routed to IRQ 11
Region 0: Memory at ee084000 (32-bit, non-prefetchable)
Region 1: Memory at ee085000 (32-bit, non-prefetchable) [size=64]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME+

0000:00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1) (prog-if 00
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 32
Bus: primary=00, secondary=03, subordinate=03, sec-latency=32
Memory behind bridge: ea000000-ebffffff
Prefetchable memory behind bridge: d8000000-e7ffffff
BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-

0000:01:0b.0 RAID bus controller: CMD Technology Inc Silicon Image SiI 3112
SATARaid Controller (rev 01)
Subsystem: CMD Technology Inc: Unknown device 6112
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 32, Cache Line Size: 0x01 (4 bytes)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at a000
Region 1: I/O ports at a400 [size=4]
Region 2: I/O ports at a800 [size=8]
Region 3: I/O ports at ac00 [size=4]
Region 4: I/O ports at b000 [size=16]
Region 5: Memory at ed000000 (32-bit, non-prefetchable) [size=512]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-

0000:02:01.0 Ethernet controller: 3Com Corporation 3C920B-EMB Integrated Fast
Ethernet Controller (rev 40)
Subsystem: Asustek Computer, Inc.: Unknown device 80ab
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 32 (2500ns min, 2500ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 5
Region 0: I/O ports at c000
Region 1: Memory at e9000000 (32-bit, non-prefetchable) [size=128]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-

0000:03:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX
440] (rev a3) (prog-if 00 [VGA])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 248 (1250ns min, 250ns max)
Interrupt: pin A routed to IRQ 5
Region 0: Memory at ea000000 (32-bit, non-prefetchable)
Region 1: Memory at d8000000 (32-bit, prefetchable) [size=128M]
Region 2: Memory at e0000000 (32-bit, prefetchable) [size=512K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [44] AGP version 2.0
Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA- ITACoh- GART64- HTrans- 64bit- FW+ AGP3-
Rate=x1,x2,x4
Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>


$ hdparm /dev/hda

/dev/hda:
multcount = 0 (off)
IO_support = 3 (32-bit w/sync)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 65535/16/63, sectors = 80043264, start = 0

$ hdparm -i /dev/hda

/dev/hda:

Model=Maxtor 4D040H2, FwRev=DAH017K0, SerialNo=D239668E
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=off
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=80043264
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-6 T13 1410D revision 0:

* signifies the current active mode

$ hdparm -I /dev/hda

/dev/hda:

ATA device, with non-removable media
Model Number: Maxtor 4D040H2
Serial Number: D239668E
Firmware Revision: DAH017K0
Standards:
Used: ATA/ATAPI-6 T13 1410D revision 0
Supported: 6 5 4 3
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 80043264
device size with M = 1024*1024: 39083 MBytes
device size with M = 1000*1000: 40982 MBytes (40 GB)
Capabilities:
LBA, IORDY(can be disabled)
bytes avail on r/w long: 57 Queue depth: 1
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 0
Advanced power management level: unknown setting (0x0000)
Recommended acoustic management value: 192, current value: 192
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* NOP cmd
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* Look-ahead
* Write cache
* Power Management feature set
* SMART feature set
* Device Configuration Overlay feature set
* Automatic Acoustic Management feature set
SET MAX security extension
Advanced Power Management feature set
* DOWNLOAD MICROCODE cmd
* SMART self-test
* SMART error logging
HW reset results:
CBLID- above Vih
Device num = 0 determined by the jumper
Checksum: correct

$ hdparm /dev/hdb
/dev/hdb:
multcount = 0 (off)
IO_support = 3 (32-bit w/sync)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 65535/16/63, sectors = 234441648, start = 0

$ hdparm -i /dev/hdb

/dev/hdb:

Model=ST3120023A, FwRev=3.33, SerialNo=3KA1YADZ
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=off
CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=234441648
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:

* signifies the current active mode

$ hdparm -I /dev/hdb

/dev/hdb:

ATA device, with non-removable media
Model Number: ST3120023A
Serial Number: 3KA1YADZ
Firmware Revision: 3.33
Standards:
Used: ATA/ATAPI-6 T13 1410D revision 2
Supported: 6 5 4 3
Configuration:
Logical max current
cylinders 16383 4047
heads 16 16
sectors/track 63 255
--
CHS current addressable sectors: 16511760
LBA user addressable sectors: 234441648
device size with M = 1024*1024: 114473 MBytes
device size with M = 1000*1000: 120034 MBytes (120 GB)
Capabilities:
LBA, IORDY(can be disabled)
bytes avail on r/w long: 4 Queue depth: 1
Standby timer values: spec'd by Standard
R/W multiple sector transfer: Max = 16 Current = ?
Recommended acoustic management value: 128, current value: 128
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=240ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* Look-ahead
* Write cache
* Power Management feature set
Security Mode feature set
* SMART feature set
* Mandatory FLUSH CACHE command
* Device Configuration Overlay feature set
* Automatic Acoustic Management feature set
SET MAX security extension
* DOWNLOAD MICROCODE cmd
* SMART self-test
* SMART error logging
Security:
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
HW reset results:
CBLID- above Vih
Device num = 1 determined by the jumper
Checksum: correct


-----------------------

I think I have found the culprit. It's the amd74xx module, the one that prints
this on init:

NFORCE2: IDE controller at PCI slot 0000:00:09.0
NFORCE2: chipset revision 162
NFORCE2: not 100% native mode: will probe irqs later
NFORCE2: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE2: 0000:00:09.0 (rev a2) UDMA133 controller

I rebooted into single-user mode, ran rmmod -f amd74xx and
hdparm -d 0 /dev/hda /dev/hdb, and after that I could md5sum all files
correctly. All file operations succeeded. If I compile the kernel without this
module, though, it doesn't recognize /dev/hda or /dev/hdb at all. =(

It looks like the DMA handling code for this driver is buggy and leads to
corruption and lockups. It is full of & and | bit operations, and since I have
no idea how nForce2 DMA works, I can't make sense of it for now.

But if, for the time being, I could get /dev/hda, hdb and hdc recognized
without this module (and without it compiled into the kernel), I think that
would be a good workaround, since I would still have DMA, just not optimized
for nForce2. For now I am using my computer without DMA. =(
------------------
And just for the record...
PLEASE, someone help me! =( It's no fun being stuck with a computer you can't
rely on, much less your beloved free operating system...
Comment 1 Patola 2004-04-21 19:43:19 UTC
Also, this bug is related to bug 2252. Although this bug is about data
corruption, it also seems to lead to lockups. I'll try the temporary workaround
from bug 2252 (upgrading to an SMP kernel) to see if this gets any better.
Comment 2 Patola 2004-04-21 20:40:34 UTC
OK, the tests with the SMP kernel failed: SMP doesn't help.

As a side note, I made an initrd without amd74xx and siimage, and the system
boots normally without enabling DMA for the drives. Everything works normally,
but with no DMA, which means: sloooooooooooooooooow!

I am going to run md5sum on the same files all day to be sure of this.
Comment 3 Patola 2004-04-22 14:59:59 UTC
DAMN!

That was too good to be true.
amd74xx is not loaded and not built into the kernel either, so this is not a
bug in that module. Yet I still have data corruption:

$ rm a.txt
$ x=0
$ while [ $x -lt 16777216 ]
> do
> let x+=1
> echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa >> a.txt
> done
$ grep -nv aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a.txt
908281:aaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2655130:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaa
4707226:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaa
6425530:aaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
8729120:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaa
10186666:aaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
12751833:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaa
$ grep -nv aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a.txt
869904:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaa
908281:aaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4685486:aaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4707226:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaa
8521421:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaa
8729120:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaa
12381996:aaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
12751833:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaa
16098328:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaa
$ grep -nv aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a.txt
908281:aaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3178221:aaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4707226:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaa
7031297:aaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
8729120:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaa
10908003:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaa
12751833:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaa
14633599:aaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
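Comparing the three runs separates two kinds of damage: lines 908281, 4707226, 8729120 and 12751833 appear every time (presumably bad data already written to disk), while the other line numbers come and go between runs (presumably corrupted while being read back). A minimal sketch, assuming bash and that each grep output was saved to its own file (for example `grep -nv ... a.txt > run1.txt`); `classify_corruption` is a hypothetical helper.

```bash
#!/bin/bash
# Given one saved "grep -nv" output per run, label each reported line number
# "persistent" if it appears in every run, otherwise "transient".
classify_corruption() {
    local runs=$#
    cut -d: -f1 "$@" | sort -n | uniq -c |
        awk -v runs="$runs" '{ print ($1 == runs ? "persistent" : "transient"), $2 }'
}

# Example (hypothetical usage): classify_corruption run1.txt run2.txt run3.txt
```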
Comment 4 Patola 2004-04-22 15:21:32 UTC
Additional information...
Running the same script with 'c' instead of 'a', I noticed that the corruption
doesn't flip bits, it only sets them, so a file full of 'c's stays
uncorrupted.

$ grep -nv cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc c.txt
$ grep -nv cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc c.txt
$ grep -nv cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc c.txt
$ 
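That result fits a bit that gets set but never cleared ("stuck at 1"): OR-ing in mask 0x02 turns 'a' (0x61) into 'c' (0x63) but leaves 'c' unchanged, because 'c' already has that bit set. A minimal bash sketch of this fault model; `stuck_bit` is a hypothetical helper that only illustrates the arithmetic.

```bash
#!/bin/bash
# Model a stuck-at-1 fault on bit 1 (mask 0x02): force the bit on and
# print the resulting character.
stuck_bit() {
    local code
    code=$(printf '%d' "'$1")                 # byte value of CHAR
    code=$(( code | 0x02 ))                   # the bit can only be set
    printf "\\$(printf '%03o' "$code")\n"     # print the byte via an octal escape
}

# Example: stuck_bit a   prints c;   stuck_bit c   prints c
```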
Comment 5 Patola 2004-04-23 06:50:09 UTC
OK, I just tested it in Windows for the fourth time, and this time the error
happened in that stupid operating system too. So it's a hardware problem.
Sorry to waste anybody's time. I am rejecting this bug as invalid.

Goodbye and thanks for all the fish.
