Bug 8816 - ST340823A IDE disk problems with ide-disk.c, libata-core.c
Summary: ST340823A IDE disk problems with ide-disk.c, libata-core.c
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: IDE
Hardware: All Linux
Importance: P1 normal
Assignee: Bartlomiej Zolnierkiewicz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-27 02:07 UTC by Mikko Rapeli
Modified: 2007-09-12 05:14 UTC
CC List: 0 users

See Also:
Kernel Version: 2.6.22.1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
kernel config for 2.6.22.1 (77.48 KB, text/plain)
2007-07-27 02:09 UTC, Mikko Rapeli
dmesg with Debian kernel 2.6.8-4 (6.81 KB, text/plain)
2007-07-27 02:09 UTC, Mikko Rapeli
dmesg with Debian kernel 2.6.10-1 (7.45 KB, text/plain)
2007-07-27 02:10 UTC, Mikko Rapeli
dmesg with with 2.6.22.1 (92.10 KB, text/plain)
2007-07-27 02:11 UTC, Mikko Rapeli
lshw output (9.69 KB, text/plain)
2007-07-27 02:13 UTC, Mikko Rapeli
smartctl -a /dev/hdd, the drive seems ok (4.89 KB, text/plain)
2007-07-31 12:59 UTC, Mikko Rapeli
dirty but working fix for 2.6.22.1 (503 bytes, patch)
2007-07-31 13:16 UTC, Mikko Rapeli
dmesg with 2.6.22.1 when drivers/ide/* compiled with -DDEBUG (259.17 KB, text/plain)
2007-08-02 12:57 UTC, Mikko Rapeli
libata log with 2.6.22.1 (7.64 KB, text/plain)
2007-08-05 11:43 UTC, Mikko Rapeli
libsata log with 2.6.22.1 and hpa disable by default (static int ata_ignore_hpa = 1;) (21.68 KB, text/plain)
2007-08-05 11:45 UTC, Mikko Rapeli
blacklist hpa buggy devices in libata with 2.6.22.1 (1.63 KB, patch)
2007-08-05 11:49 UTC, Mikko Rapeli
log with 2.6.22.1 plus hpa enabled by default and the previous blacklist patch (7.79 KB, text/plain)
2007-08-05 11:50 UTC, Mikko Rapeli
Patch against 2.6.23-rc5 to add ST320413A to the black list in ide-disk.c (317 bytes, patch)
2007-09-12 05:14 UTC, Jorge Juan

Description Mikko Rapeli 2007-07-27 02:07:24 UTC
Most recent kernel where this bug did not occur: 

2.6.8-4 from Debian sarge works and reports the smaller size; 2.6.12-1 from Debian works and reports the larger size. Debian kernels 2.6.13-1 and 2.6.14-2 hang silently during boot at the point where older kernels print the HPA messages. Debian kernels 2.6.16-2 and 2.6.18-4, and 2.6.22.1 from kernel.org, fail loudly.

Distribution: 

Debian etch

Hardware Environment:

           *-ide:1
                description: IDE Channel 1
                physical id: 1
                bus info: ide@1
                logical name: ide1
                clock: 33MHz
              *-disk
                   description: ATA Disk
                   product: ST340823A
                   vendor: Seagate
                   physical id: 1
                   bus info: ide@1.1
                   logical name: /dev/hdd
                   version: 3.07
                   serial: 7EF1EYYG
                   size: 37GB
                   capacity: 37GB
                   capabilities: ata dma lba iordy smart security pm partitioned partitioned:dos
                   configuration: mode=udma2 smart=on
                 *-volume:0
                      description: Linux LVM Physical Volume partition
                      physical id: 1
                      bus info: ide@1.1,1
                      logical name: /dev/hdd1
                      serial: YwheMf-s0pv-97XS-sAOB-oGSk-WcHA-ZEo8Ag
                      size: 9601MB
                      capacity: 9601MB
                      capabilities: primary multi lvm2
                 *-volume:1
                      description: Linux LVM Physical Volume partition
                      physical id: 2
                      bus info: ide@1.1,2
                      logical name: /dev/hdd2
                      serial: TEN6QM-MVlA-LDlO-Moiu-O7mU-VTfZ-pinLz6
                      size: 15GB
                      capacity: 15GB
                      capabilities: primary multi lvm2
                 *-volume:2
                      description: Linux LVM Physical Volume partition
                      physical id: 3
                      bus info: ide@1.1,3
                      logical name: /dev/hdd3
                      serial: b9Xviu-XSOP-9lz4-4gcY-tE9W-fqxA-k2amBh
                      size: 12GB
                      capacity: 12GB
                      capabilities: primary multi lvm2

Software Environment:

Debian etch with 2.6.22.1 kernel

Problem Description:

The ST340823A IDE drive ( http://www.seagate.com/support/disc/ata/st340823a.html ) does not work with Debian kernels after 2.6.12-1 or with kernel.org 2.6.22.1:

Probing IDE interface ide1...
hdd: ST340823A, ATA DISK drive
hdd: selected mode 0x42
ide1 at 0x170-0x177,0x376 on irq 15
...
hdd: max request size: 128KiB
hdd: Host Protected Area detected.
	current capacity is 78165360 sectors (40020 MB)
	native  capacity is 78165361 sectors (40020 MB)
hdd: Host Protected Area disabled.
hdd: 78165361 sectors (40020 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(33)
hdd: cache flushes not supported
 hdd: hdd1 hdd2 hdd3
hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78165360, sector=78165360
ide: failed opcode was: unknown
hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78165360, sector=78165360
ide: failed opcode was: unknown
hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78165360, sector=78165360
ide: failed opcode was: unknown
hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78165360, sector=78165360
ide: failed opcode was: unknown
hdd: DMA disabled
ide1: reset: master: error (0x00?)
hdd: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdd: task_in_intr: error=0x10 { SectorIdNotFound }, LBAsect=78230639, sector=78165360
ide: failed opcode was: unknown
hdd: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdd: task_in_intr: error=0x10 { SectorIdNotFound }, LBAsect=78230639, sector=78165360
ide: failed opcode was: unknown
hdd: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdd: task_in_intr: error=0x10 { SectorIdNotFound }, LBAsect=78230639, sector=78165360
ide: failed opcode was: unknown
hdd: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdd: task_in_intr: error=0x10 { SectorIdNotFound }, LBAsect=78230639, sector=78165360
ide: failed opcode was: unknown
ide1: reset: master: error (0x00?)
end_request: I/O error, dev hdd, sector 78165360
Buffer I/O error on device hdd, logical block 78165360
end_request: I/O error, dev hdd, sector 78165360
Buffer I/O error on device hdd, logical block 78165360
end_request: I/O error, dev hdd, sector 78165360
Buffer I/O error on device hdd, logical block 78165360
end_request: I/O error, dev hdd, sector 78165360
Buffer I/O error on device hdd, logical block 78165360
end_request: I/O error, dev hdd, sector 78165296
Buffer I/O error on device hdd, logical block 78165296
end_request: I/O error, dev hdd, sector 78165297
Buffer I/O error on device hdd, logical block 78165297
end_request: I/O error, dev hdd, sector 78165298
...
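
To see at a glance which sectors the errors cluster around, a log like the one above can be summarised with a short script. This is only a sketch: the regular expressions are guesses based on the lines quoted in this report, and failing_sectors is a hypothetical helper, not part of any kernel tool.

```python
import re

# Sample lines copied from the 2.6.22.1 log quoted in this report.
LOG = """\
hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78165360, sector=78165360
hdd: task_in_intr: error=0x10 { SectorIdNotFound }, LBAsect=78230639, sector=78165360
end_request: I/O error, dev hdd, sector 78165360
end_request: I/O error, dev hdd, sector 78165296
end_request: I/O error, dev hdd, sector 78165297
"""

# Driver-level error lines, e.g. "... LBAsect=78165360, sector=78165360".
DRIVER_ERR = re.compile(r"error=0x[0-9a-f]+ \{ (\w+) \}, LBAsect=(\d+), sector=(\d+)")
# Block-layer lines, e.g. "end_request: I/O error, dev hdd, sector 78165360".
BLOCK_ERR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log: str) -> dict:
    """Return the distinct sectors named in driver and block-layer errors."""
    driver = sorted({int(m.group(3)) for m in DRIVER_ERR.finditer(log)})
    block = sorted({int(m.group(2)) for m in BLOCK_ERR.finditer(log)})
    return {"driver": driver, "block": block}


summary = failing_sectors(LOG)
print(summary)
```

With the sample lines above, every driver-level error names sector 78165360, while the block layer also reports a few sectors just below it.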

The disk's filesystem is fsck clean under 2.6.8-4, and badblocks reports no errors with read and non-destructive write tests for the whole drive.

Many users have been complaining about these drives, and the problem seems to be the disk size in sectors, which Linux for some odd reason sees as being too big by one sector:

https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/26119
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=401035
http://groups.google.fi/group/fa.linux.kernel/browse_thread/thread/68eaef2909603238/fd5574cfb1fb43a0?lnk=st&q=linux+ST340823A&rnum=13&hl=en&fwc=1

I think at least three factors contribute to this:

1) If the kernel code is correct, the disk reports two different sizes and the larger one is incorrect. From dmesg on 2.6.8-4:

        current capacity is 78165360 sectors (40020 MB)
        native  capacity is 78165361 sectors (40020 MB)

The disk manual ( http://www.seagate.com/support/disc/manuals/ata/u5pmb01.pdf ) says that size n = 78165360 and that blocks 0 to n-1 are accessible in LBA mode, whatever that means.

2) Kernels from 2.6.10 onward use the larger size as the disk size. Diff between the 2.6.8-4 and 2.6.10-1 Debian kernel dmesgs:

+Probing IDE interface ide1...
+ide1: Wait for ready failed before probe !
 hdd: ST340823A, ATA DISK drive
 ide1 at 0x170-0x177,0x376 on irq 15
 hdd: max request size: 128KiB
 hdd: Host Protected Area detected.
        current capacity is 78165360 sectors (40020 MB)
        native  capacity is 78165361 sectors (40020 MB)
-hdd: 78165360 sectors (40020 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(33)
+hdd: Host Protected Area disabled.
+hdd: 78165361 sectors (40020 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(33)
+hdd: cache flushes not supported
  /dev/ide/host0/bus1/target1/lun0: p1 p2 p3
+Probing IDE interface ide2...
+Probing IDE interface ide3...
+Probing IDE interface ide4...
+Probing IDE interface ide5...

This alone does not seem to be a problem, although a file system accessing the last sector might wreak havoc; at least I have not run into this. The size increase is due to "enable stroke by default" ( http://linux.bkbits.net:8080/linux-2.6.13-stable/?PAGE=cset&REV=1.1938.270.9 ), which has had some issues before ( http://marc.info/?l=linux-kernel&m=113060651817474&w=2 ).

3) Presumably (I'm speculating here), the partition checking code starts reading the last sectors without error checking with 2.6.13 and with error checking after 2.6.16, which results in the disk failing loudly, since the last sector does not exist while the kernel thinks it should. 2.6.22.1 shouts:

hdd: Host Protected Area detected.
	current capacity is 78165360 sectors (40020 MB)
	native  capacity is 78165361 sectors (40020 MB)
hdd: Host Protected Area disabled.
hdd: 78165361 sectors (40020 MB) w/1024KiB Cache, CHS=65535/16/63, UDMA(33)
hdd: cache flushes not supported
 hdd: hdd1 hdd2 hdd3
hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78165360, sector=78165360
ide: failed opcode was: unknown
hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=78165360, sector=78165360
ide: failed opcode was: unknown
...
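
The off-by-one in point 1 can be checked with a little arithmetic. The sketch below assumes 512-byte sectors and decimal megabytes (assumptions that happen to reproduce the 40020 MB figure in the log); the helper names are made up for illustration.

```python
SECTOR_BYTES = 512  # assumed sector size; the log does not state it

CURRENT = 78165360  # "current capacity" from dmesg, also n in the drive manual
NATIVE = 78165361   # "native capacity" from dmesg
FAILING = 78165360  # sector named in the SectorIdNotFound errors


def last_lba(sectors: int) -> int:
    """Highest addressable LBA for a disk of `sectors` sectors (0-based)."""
    return sectors - 1


def in_range(lba: int, sectors: int) -> bool:
    """True if `lba` lies inside a disk of `sectors` sectors."""
    return 0 <= lba <= last_lba(sectors)


def size_mb(sectors: int) -> int:
    """Size in decimal megabytes, truncated."""
    return sectors * SECTOR_BYTES // 1_000_000


print(last_lba(CURRENT))                  # last LBA the manual allows
print(in_range(FAILING, CURRENT))         # is the failing sector valid for n?
print(in_range(FAILING, NATIVE))          # is it valid for n + 1?
print(size_mb(CURRENT), size_mb(NATIVE))  # both round to the same MB figure
```

With the manual's n, the last valid LBA is 78165359, so the failing sector 78165360 is only "valid" if the kernel believes the native capacity of 78165361 sectors. Both sizes truncate to the same MB value, which is why the dmesg lines look identical apart from the sector count.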

Steps to reproduce:

Presumably, try to use this drive with a kernel newer than 2.6.12.

Workaround(s):

My first trial with 2.6.22.1 resulted in pata+libata being used, and the drive worked fine: the lvm partitions were found and the filesystem was clean. After that, 2.6.22.1 slowed down until it was unusable, but that may be a different issue.

hdd=noprobe does not help, and there seem to be no boot options which might help.

Dumb questions:

Why did the size increase by one sector between 2.6.8 and 2.6.10?
Stroke was enabled, but why?

http://linux.bkbits.net:8080/linux-2.6.13-stable/drivers/ide/ide-disk.c?PAGE=diffs&REV=1.108
http://linux.bkbits.net:8080/linux-2.6.13-stable/drivers/ide/ide-disk.c?PAGE=anno&REV=1.108

What's the difference between HPA current and native capacity?
Does the drive report these current and native capacities correctly?
How to force the smaller capacity if the drive reports it wrong?

Why do kernels after 2.6.12 access the end of the disk?
To support GUID/GPT/EFI partition type(s)?
How can these scans be disabled?
Could a filled-up filesystem access the last sector and fail miserably even if the partition scan didn't?
Comment 1 Mikko Rapeli 2007-07-27 02:09:08 UTC
Created attachment 12178 [details]
kernel config for 2.6.22.1
Comment 2 Mikko Rapeli 2007-07-27 02:09:46 UTC
Created attachment 12179 [details]
dmesg with Debian kernel 2.6.8-4
Comment 3 Mikko Rapeli 2007-07-27 02:10:17 UTC
Created attachment 12180 [details]
dmesg with Debian kernel 2.6.10-1
Comment 4 Mikko Rapeli 2007-07-27 02:11:53 UTC
Created attachment 12181 [details]
dmesg with with 2.6.22.1
Comment 5 Mikko Rapeli 2007-07-27 02:13:18 UTC
Created attachment 12182 [details]
lshw output
Comment 6 Mikko Rapeli 2007-07-31 12:59:20 UTC
Created attachment 12213 [details]
smartctl -a /dev/hdd, the drive seems ok
Comment 7 Mikko Rapeli 2007-07-31 13:16:43 UTC
Created attachment 12214 [details]
dirty but working fix for 2.6.22.1

This disables the old stroke feature, whatever it did. Linus suggested this
as a dirty fix for silent reboot hangs that were reported to exist between
2.6.10 and 2.6.14, http://marc.info/?l=linux-kernel&m=113060651817474&w=2 .
The hang is very similar to what I described on the Debian kernels between
2.6.12 and 2.6.16.
Comment 8 Mikko Rapeli 2007-08-02 12:57:05 UTC
Created attachment 12235 [details]
dmesg with 2.6.22.1 when drivers/ide/* compiled with -DDEBUG
Comment 9 Mikko Rapeli 2007-08-03 00:08:11 UTC
Discussion about this bug:
http://marc.info/?t=118596132600016&r=1&w=2

And the fix for 2.6.23-rc1:
http://marc.info/?l=linux-ide&m=118609784802744&w=2

(Though 2.6.18 and 2.6.22.1 would probably like http://marc.info/?l=linux-ide&m=118609784802744&w=2 better.)
Comment 10 Mikko Rapeli 2007-08-05 11:43:42 UTC
Created attachment 12251 [details]
libata log with 2.6.22.1
Comment 11 Mikko Rapeli 2007-08-05 11:45:32 UTC
Created attachment 12252 [details]
libsata log with 2.6.22.1 and hpa disable by default (static int ata_ignore_hpa = 1;)
Comment 12 Mikko Rapeli 2007-08-05 11:49:52 UTC
Created attachment 12253 [details]
blacklist hpa buggy devices in libata with 2.6.22.1

Tested; this gives far fewer warnings than a normal hpa enabled kernel, though
the lvm partitions are found even without this patch, as the previous log shows.
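
The idea of blacklisting drives with a buggy HPA report can be illustrated roughly as follows. This is not the attached patch, whose contents are not reproduced here; it is a sketch of one plausible rule, with a hypothetical model list and function name. ST340823A is the drive from this report and ST320413A is the one reported in Comment 14.

```python
# Hypothetical list of models whose native capacity is one sector too large.
HPA_OFF_BY_ONE_MODELS = {"ST340823A", "ST320413A"}


def usable_capacity(model: str, current: int, native: int) -> int:
    """Pick the sector count to use when a Host Protected Area is detected.

    Default in this sketch: trust the native capacity (HPA disabled).
    For listed models, a native value exactly one sector above the current
    capacity is treated as bogus and the current capacity is kept.
    """
    if native <= current:
        return current
    if model in HPA_OFF_BY_ONE_MODELS and native == current + 1:
        return current
    return native


print(usable_capacity("ST340823A", 78165360, 78165361))
print(usable_capacity("SOME-OTHER-DISK", 78165360, 78165361))
```

Restricting the rule to an exact one-sector difference is a design assumption: it leaves a genuine, larger HPA on a listed drive untouched, while unlisted drives keep the default behaviour.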
Comment 13 Mikko Rapeli 2007-08-05 11:50:56 UTC
Created attachment 12254 [details]
log with 2.6.22.1 plus hpa enabled by default and the previous blacklist patch
Comment 14 Jorge Juan 2007-09-12 05:14:03 UTC
Created attachment 12803 [details]
Patch against 2.6.23-rc5 to add ST320413A to the black list in ide-disk.c

ST320413A has the same problem as ST340823A. Please see 
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/26119

Thanks.
