Bug 9397

Summary: HPA isn't being unlocked (probably distro problem)
Product: SCSI Drivers
Reporter: klaus.schneider
Component: Other
Assignee: Tejun Heo (htejun)
Status: RESOLVED CODE_FIX
Severity: low
CC: htejun, jgarzik
Priority: P1
Hardware: All
OS: Linux
Kernel Version: since commit d7a80dad2fe19a2b8c119c8e9cba605474a75a2b
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: Log of boot with kernel 2.6.23
boot log with libata ignore_hpa=1
sata_sis-use-HRST.patch
bug9397-sata_sis-use-HRST-dbg0.patch
Kernel log of 2.6.24-rc4 with applied patch
bug9397-sata_sis-use-HRST-dbg1.patch
Kernel log with 2.6.24-rc4 and dbg1.patch
bug9397-sata_sis-use-HRST-dbg2.patch

Description klaus.schneider 2007-11-16 10:53:56 UTC
Most recent kernel where this bug did not occur: see "git bisect" below
Distribution: Debian
Hardware Environment: 
sata_sis 0000:00:05.0: version 0.5
ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI17 (level, low) -> IRQ 209
sata_sis 0000:00:05.0: Detected SiS 180/181 chipset in SATA mode
ata1: SATA max UDMA/133 cmd 0x18B0 ctl 0x18A6 bmdma 0x1890 irq 209
ata2: SATA max UDMA/133 cmd 0x18A8 ctl 0x18A2 bmdma 0x1898 irq 209
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3c69 86:3f01 87:4003 88:00ff
ata1: dev 0 ATA-7, max UDMA7, 156368016 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : sata_sis
ata2: SATA link down (SStatus 0)
scsi1 : sata_sis
  Vendor: ATA       Model: SAMSUNG SP0812C  Rev: SU10
  Type:   Direct-Access  ANSI SCSI revision: 05

Problem Description:
The kernel detects the wrong hard disk size, which leads
to "attempt to access beyond end of device" errors and makes the system
unusable. The affected partition cannot be mounted. Even cfdisk
does not run, because it detects a corrupted partition table.

System log:
kernel: SCSI device sda: 139968963 512-byte hdwr sectors (71664 MB)
kernel:  sda: p7 exceeds device capacity
kernel: attempt to access beyond end of device
kernel: sda: rw=0, want=146833964, limit=139968963
...and many more "attempt to access beyond end of device" errors

The drive has 80 GB, as stated by the manufacturer and correctly reported by
2.6.17 and earlier kernels (see "git bisect" below):
SCSI device sda: 156368016 512-byte hdwr sectors (80060 MB).

The disk was partitioned when the kernel saw the whole disk, i.e. 156368016
sectors. Newer kernels see only 139968963 sectors (16399053 sectors are missing), but the last partition ends at sector 146834099, beyond that limit.

Partition table:
               First       Last
 # Type       Sector      Sector   Offset    Length   Filesystem Type (ID) Flag
-- ------- ----------- ----------- ------ ----------- -------------------- ----
 1 Primary           0   146834099     63   146834100 W95 Ext'd (LBA) (0F) Boot
 5 Logical          63*    1108484     63     1108422*Linux swap / So (82) None
 6 Logical     1108485    32579819     63    31471335 Linux (83)           None
 7 Logical    32579820   146834099     63   114254280 Linux (83)           None
   Pri/Log   146834100   156360644      0     9526545 Free Space           None

git bisect:

d7a80dad2fe19a2b8c119c8e9cba605474a75a2b is first bad commit
commit d7a80dad2fe19a2b8c119c8e9cba605474a75a2b
Author: Tejun Heo <htejun@gmail.com>
Date:   Fri Jun 16 15:00:18 2006 +0900

    [PATCH] libata: convert several bmdma-style controllers to new EH, take #3

    Convert sata_sis, svw, uli and vsc drivers to new EH.  All the drivers
    used to specify ATA_FLAG_SATA_RESET to tell libata to use SATA
    hardreset instead of SRST.  This patch makes all the converted drivers
    use the standard bmdma error handler which uses both SRST and SATA
    hardreset.

    All the controllers should be able to perform SRST but still needs
    verification.  If some of the controllers can't do SRST, it will be
    very easy to spot as it will show up during boot probing.

    Signed-off-by: Tejun Heo <htejun@gmail.com>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

:040000 040000 2973aec2d849baac212ea19fefc76e7ff551882a c2179659d3f5375f3ed03e687510e38f1058333f M      drivers
Comment 1 Tejun Heo 2007-11-16 18:59:10 UTC
That sounds like HPA not being unlocked.  Does passing "libata.ignore_hpa=1" change anything?  Can you post the failing kernel boot log?
Comment 2 klaus.schneider 2007-11-23 00:16:05 UTC
Created attachment 13703 [details]
Log of boot with kernel 2.6.23
Comment 3 klaus.schneider 2007-11-23 00:17:39 UTC
Created attachment 13704 [details]
boot log with libata ignore_hpa=1
Comment 4 klaus.schneider 2007-11-23 00:18:37 UTC
As Debian uses an initramfs, I put
libata ignore_hpa=1
into /etc/initramfs-tools/modules; however, I did not see a significant change. Does the kernel print a message when it receives the option?
See the attached logs of a normal boot and of a boot with ignore_hpa=1.
Comment 5 Tejun Heo 2007-11-23 18:42:54 UTC
The parameter is not being passed to the module.  When it is, libata says something like "HPA unlocked: 139968963 -> 156368016, native 156368016".  If you can't persuade the initrd to pass the parameter, you can change the default by editing drivers/ata/libata-core.c so that ata_ignore_hpa is initialized to 1 instead of 0.
Comment 6 klaus.schneider 2007-11-30 01:58:20 UTC
By inserting some more ata_dev_printk()s in libata-core.c, I found out that
hpa_sectors = ata_set_native_max_address_ext(dev, hpa_sectors);
in libata-core.c:999 (of version 2.6.23.1) returns 0. Therefore, nothing changes when setting ignore_hpa=1.
Comment 7 Tejun Heo 2007-11-30 02:05:30 UTC
Can you please give 2.6.24-rc3 a shot?  There has been a major update to HPA unlocking.  The previous implementation was faulty because some drives don't return the new size after a successful SET MAX.  Thanks.
Comment 8 klaus.schneider 2007-12-03 02:23:00 UTC
Linux version 2.6.24-rc3 (kschneid@werckmeister) (gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)) #1 SMP Fri Nov 30 12:58:27 CET 2007
...
ata1.00: failed to set max address (err_mask=0x1)
ata1.00: device aborted resize (139968963 -> 156368016), skipping HPA handling
Comment 9 klaus.schneider 2007-12-03 02:25:25 UTC
BTW, static int ata_ignore_hpa in libata-core.c is not explicitly initialised in 2.6.24-rc3. Is this on purpose? I initialised it to 1 for my test.
Comment 10 Tejun Heo 2007-12-03 16:51:41 UTC
Yeap, that's intentional.  HPA isn't unlocked by default on the vanilla kernel.  You need to specify libata.ignore_hpa=1.

Does specifying "libata.ignore_hpa=1 libata.noacpi=1" make any difference?
Comment 11 klaus.schneider 2007-12-05 04:23:50 UTC
No, no difference.

In 2.6.23.1, ata_ignore_hpa was explicitly initialised to 0, and the explicit initialisation was removed in 2.6.24-rc3. I stumbled over that. Later I realised that static variables are automatically initialised to zero, so it makes no difference, although the explicit initialisation was less confusing.
Comment 12 Tejun Heo 2007-12-05 04:48:57 UTC
Created attachment 13862 [details]
sata_sis-use-HRST.patch

OIC now.  I'm embarrassed it took me this long to realize even though you kindly bisected it for me.  The BIOS is freeze locking HPA setting so only HRST can clear it.  Please test the attached patch.
Comment 13 klaus.schneider 2007-12-06 01:05:31 UTC
No difference. Was I supposed to set ata_ignore_hpa to zero again? I left it set to 1 and set noacpi back to 0.
Comment 14 Tejun Heo 2007-12-06 02:27:12 UTC
libata.ignore_hpa=1 should be enough.  Weird, I'll prep another debug patch.
Comment 15 Tejun Heo 2007-12-06 02:29:39 UTC
Created attachment 13886 [details]
bug9397-sata_sis-use-HRST-dbg0.patch

Please apply the attached patch on top of -rc4 and report kernel log.  Thanks.
Comment 16 klaus.schneider 2007-12-07 05:20:09 UTC
Created attachment 13902 [details]
Kernel log of 2.6.24-rc4 with applied patch
Comment 17 Tejun Heo 2007-12-07 15:16:51 UTC
Created attachment 13909 [details]
bug9397-sata_sis-use-HRST-dbg1.patch

Weird, I wonder why it's not hardresetting.  Can you please revert dbg0 patch (patch -R -p1 < dbg0.patch) and try this one?  Thanks.
Comment 18 klaus.schneider 2007-12-11 03:56:02 UTC
Created attachment 13971 [details]
Kernel log with 2.6.24-rc4 and dbg1.patch
Comment 19 Tejun Heo 2007-12-11 23:01:44 UTC
Created attachment 13989 [details]
bug9397-sata_sis-use-HRST-dbg2.patch

Okay, this should do it.  Thanks.
Comment 20 klaus.schneider 2008-09-11 01:03:04 UTC
As of kernel 2.6.26.3 at the latest, it works again; the bug is solved. Thanks!
Comment 21 Tejun Heo 2008-09-11 01:08:15 UTC
Thanks.  Resolving.