Bug 2271

Summary: Serious disk corruption with HPT374
Product: IO/Storage
Reporter: Olaf Boehm (olaf.boehm)
Component: IDE
Assignee: Bartlomiej Zolnierkiewicz (bzolnier)
Status: CLOSED CODE_FIX
Severity: blocking
CC: carlojpisani
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.3
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: patch to solve both kernel panic and dma data corruption

Description Olaf Boehm 2004-03-08 05:48:30 UTC
I set up my disks like this:

hpt 374 (kernel hpt366 driver)
8 drives connected
all 8 drives as raid5
created a loop device with blowfish-128
created a reiser 3.6 on that loop device

This setup produces a CRC error roughly every 200 MB.

For testing I do:

file1 = 2 GB test file
cp file1.rar file2.rar
cmp file1.rar file2.rar
and I always get a CRC error.


Is the hpt driver in the kernel seriously broken?

If so, disable it, as data corruption is by far the worst error that can happen!

Here is a report from another user:
"I once encountered data corruption when I was copying about 80 gigs at 
once from disk to disk, but the machine was overclocked by that time so 
I can't be sure if it was the HDD controller, CPU, chipset or memory. 
After that, I returned the frequencies to nominal values and the machine 
seems to work without hassle, as a ftp server, and also for some video 
processing tasks, which both are moderately hdd-intensive."

He got data corruption too.
Comment 1 Olaf Boehm 2004-03-10 12:20:21 UTC
I have now done extensive tests.

I disabled all unneeded kernel options and mainboard features.

My RAM is ECC and the CPU is not overclocked.

When I build a raid with drives 1-6, everything is OK.
When I use drives 1-7, I get CRC errors.
When I use drives 1-8, I get CRC errors.
When I use only drive 7, it is OK.

This looks like a PCI load / DMA issue.

The kernel driver is seriously broken: under load, your filesystem gets corrupted.

I tried raid 0 and raid 5, with reiserfs and ext3; the result is always the same.

You can reproduce it pretty easily: just dd a random 2 GB file, then
cp file 
cmp file
gaia root # cmp /tmp/testfile.bla /mnt/crypt/test.bla
/tmp/testfile.bla /mnt/crypt/test.bla differ: char 768258045, line 3001530
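
The copy-and-compare test above can be sketched as a small script. This is only an illustration, not the reporter's actual procedure: the helper names (make_random_file, first_mismatch, copy_and_verify) and the 1 MiB chunk size are assumptions made for this example. It writes random data, copies it, and reports the first differing byte, much like cmp does.

```python
import os
import shutil
import tempfile

CHUNK = 1 << 20  # compare 1 MiB at a time (arbitrary choice for this sketch)


def make_random_file(path, size, chunk=CHUNK):
    """Write `size` bytes of random data to `path`."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def first_mismatch(path_a, path_b, chunk=CHUNK):
    """Return the 0-based offset of the first differing byte, or None if equal."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if a != b:
                # Find the exact byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte found: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended without a difference
            offset += len(a)


def copy_and_verify(src, dst):
    """Copy src to dst, then compare them byte for byte."""
    shutil.copyfile(src, dst)
    return first_mismatch(src, dst)


if __name__ == "__main__":
    # Small demo in a temporary directory; for a real test, point dst at the
    # array under test and use a file size comparable to the 2 GB in the report.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "testfile.bin")
        dst = os.path.join(d, "copy.bin")
        make_random_file(src, 4 * CHUNK)
        bad = copy_and_verify(src, dst)
        print("files identical" if bad is None else f"differ at byte offset {bad}")
```

Note that cmp counts bytes from 1, while this sketch returns a 0-based offset. Also, a freshly written copy may be read back from the page cache rather than from the disks, so a file larger than RAM (or a remount between copy and compare) is probably needed to exercise the controller.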

I would mark the hpt driver as experimental or add a big warning.
Data corruption is serious!
Comment 2 Olaf Boehm 2004-03-11 13:11:24 UTC
I did even more tests.

First, the <9c test in hpt366.c should be changed to <a0 to avoid a kernel panic 
on hpt374.

THIS CAUSES A KERNEL PANIC, SO PLEASE CHANGE IT IN NEW KERNEL RELEASES.


Second:

The hpt374 is more or less two controllers with 4 drives each.
As long as you copy to one controller, e.g. raid 0 with drives 1-4 or 5-8, 
everything is OK. When you copy to drives on both controller A and controller B, 
e.g. 1-8 or 1-7 with raid 5 or raid 0, you get file corruption.

A first fix would be to disable the second controller in that driver and change 
the <9c to <a0, to solve the kernel panic and the drive corruption until HighPoint 
provides a working 2.6 driver.

Comment 3 Jindrich Makovicka 2004-03-14 02:41:01 UTC
Created attachment 2327 [details]
patch to solve both kernel panic and dma data corruption

The patch is based on
http://www.ussg.iu.edu/hypermail/linux/kernel/0403.1/0889.html .
Additional changes are the wider range for 33MHz timing and the PLL setup for
hpt374 (using the 370a timing table, as it is the same as the one used in the
"opensource" driver by HighPoint). Olaf has confirmed that the test_irq patch
resolved the data corruption issues he experienced with the 8-drive raid.
Comment 4 Bartlomiej Zolnierkiewicz 2004-04-05 10:38:45 UTC
Fixed in 2.6.5.
Comment 5 carlojpisani 2019-05-03 21:17:55 UTC
Hi,
I experienced the same problem on kernel 4.16.
The problem only happens when I transfer files bigger than 500 MB.