Most recent kernel where this bug did not occur: see "git bisect" below
Distribution: Debian
Hardware Environment:

  sata_sis 0000:00:05.0: version 0.5
  ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI17 (level, low) -> IRQ 209
  sata_sis 0000:00:05.0: Detected SiS 180/181 chipset in SATA mode
  ata1: SATA max UDMA/133 cmd 0x18B0 ctl 0x18A6 bmdma 0x1890 irq 209
  ata2: SATA max UDMA/133 cmd 0x18A8 ctl 0x18A2 bmdma 0x1898 irq 209
  ata1: SATA link up 1.5 Gbps (SStatus 113)
  ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3c69 86:3f01 87:4003 88:00ff
  ata1: dev 0 ATA-7, max UDMA7, 156368016 sectors: LBA48
  ata1: dev 0 configured for UDMA/133
  scsi0 : sata_sis
  ata2: SATA link down (SStatus 0)
  scsi1 : sata_sis
  Vendor: ATA Model: SAMSUNG SP0812C Rev: SU10
  Type: Direct-Access ANSI SCSI revision: 05

Problem Description:

The kernel detects the wrong hard disk size, which leads to "access beyond end of device" errors and makes the system unusable. The affected partition cannot be mounted. Even cfdisk does not run, because it detects a corrupted partition table.

System log:

  kernel: SCSI device sda: 139968963 512-byte hdwr sectors (71664 MB)
  kernel: sda: p7 exceeds device capacity
  kernel: attempt to access beyond end of device
  kernel: sda: rw=0, want=146833964, limit=139968963

followed by many more "attempt to access beyond end of device" errors.

The drive has 80 GB, as stated by the manufacturer and as correctly reported by 2.6.17 and earlier kernels (see "git bisect" below):

  SCSI device sda: 156368016 512-byte hdwr sectors (80060 MB)

The disk was partitioned when the kernel saw the whole disk, i.e. 156368016 blocks. Newer kernels see only 139968963 blocks (16399053 blocks are missing), but the last partition ends after block 139968963.

Partition table:

                    First        Last
   # Type          Sector      Sector  Offset      Length  Filesystem Type (ID)  Flag
  -- -------  -----------  ----------  ------  ----------  --------------------  ----
   1 Primary            0   146834099      63   146834100  W95 Ext'd (LBA) (0F)  Boot
   5 Logical          63*     1108484      63    1108422*  Linux swap / So (82)  None
   6 Logical      1108485    32579819      63    31471335  Linux (83)            None
   7 Logical     32579820   146834099      63   114254280  Linux (83)            None
     Pri/Log    146834100   156360644       0     9526545  Free Space            None

git bisect:

  d7a80dad2fe19a2b8c119c8e9cba605474a75a2b is first bad commit
  commit d7a80dad2fe19a2b8c119c8e9cba605474a75a2b
  Author: Tejun Heo <htejun@gmail.com>
  Date: Fri Jun 16 15:00:18 2006 +0900

    [PATCH] libata: convert several bmdma-style controllers to new EH, take #3

    Convert sata_sis, svw, uli and vsc drivers to new EH. All the drivers used to specify ATA_FLAG_SATA_RESET to tell libata to use SATA hardreset instead of SRST. This patch makes all the converted drivers use the standard bmdma error handler which uses both SRST and SATA hardreset. All the controllers should be able to perform SRST but still needs verification. If some of the controllers can't do SRST, it will be very easy to spot as it will show up during boot probing.

    Signed-off-by: Tejun Heo <htejun@gmail.com>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

  :040000 040000 2973aec2d849baac212ea19fefc76e7ff551882a c2179659d3f5375f3ed03e687510e38f1058333f M drivers
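The sector arithmetic in the report can be checked with a short script. This is only a sketch: the sector counts and partition boundaries are copied from the report, while the helper names (size_mb, partitions_beyond) are made up for illustration. It assumes, as the kernel messages suggest, that the MB figures are decimal megabytes.

```python
SECTOR_BYTES = 512

full_sectors = 156368016   # size reported by 2.6.17 and earlier
seen_sectors = 139968963   # size reported by the affected kernels

# (first sector, last sector) per partition number, from the partition table
partitions = {
    1: (0, 146834099),         # extended container
    5: (63, 1108484),
    6: (1108485, 32579819),
    7: (32579820, 146834099),
}


def size_mb(sectors):
    """Size in decimal megabytes, rounded down (hypothetical helper)."""
    return sectors * SECTOR_BYTES // 1_000_000


def partitions_beyond(limit_sectors, table):
    """Partition numbers whose last sector is not below the device limit."""
    return sorted(n for n, (_first, last) in table.items() if last >= limit_sectors)


missing_sectors = full_sectors - seen_sectors

if __name__ == "__main__":
    print("full size:", size_mb(full_sectors), "MB")
    print("seen size:", size_mb(seen_sectors), "MB")
    print("missing sectors:", missing_sectors)
    print("beyond clipped size:", partitions_beyond(seen_sectors, partitions))
    print("beyond full size:", partitions_beyond(full_sectors, partitions))
```

With the clipped size, both the extended container (1) and the last logical partition (7) reach past the end of the device; with the full size, every partition fits.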
That sounds like HPA not being unlocked. Does passing "libata.ignore_hpa=1" change anything? Can you post the failing kernel boot log?
Created attachment 13703 [details] Log of boot with kernel 2.6.23
Created attachment 13704 [details] boot log with libata ignore_hpa=1
As Debian uses an initramfs, I put "libata ignore_hpa=1" in /etc/initramfs-tools/modules; however, I did not see a significant change. Does the kernel print that it got the option? See the attachments for the two logs: the normal boot and the one with ignore_hpa=1.
The parameter is not being passed to the module. If it were, libata would say something like "HPA unlocked: 139968963 -> 156368016, native 156368016". If you can't persuade the initrd to specify the parameter, you can modify the default value by editing drivers/ata/libata-core.c such that ata_ignore_hpa is initialized to 1 instead of zero.
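One way to check whether the option actually reached the module is to read its value back from sysfs once the system is up. The sketch below assumes that libata exports ignore_hpa under /sys/module/libata/parameters/, which depends on the kernel version and on how the parameter is declared; the function name read_module_param is made up for illustration.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the parameter value as a string, or None if it is not readable.

    Assumption: the module exposes its parameters under
    <sysfs_root>/<module>/parameters/<param>.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    value = read_module_param("libata", "ignore_hpa")
    if value is None:
        print("libata.ignore_hpa is not visible in sysfs")
    else:
        print("libata.ignore_hpa =", value)
```

A value of 0 here would suggest that the initramfs did not hand the option to the module.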
By inserting some more ata_dev_printk() calls in libata-core.c, I found out that the call

  hpa_sectors = ata_set_native_max_address_ext(dev, hpa_sectors);

at libata-core.c:999 (version 2.6.23.1) returns 0. Therefore, nothing changes when setting ignore_hpa=1.
Can you please give 2.6.24-rc3 a shot? There has been a major update to HPA unlocking. The previous implementation was faulty because some drives don't return the new size after a successful SET_MAX. Thanks.
Linux version 2.6.24-rc3 (kschneid@werckmeister) (gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)) #1 SMP Fri Nov 30 12:58:27 CET 2007
...
ata1.00: failed to set max address (err_mask=0x1)
ata1.00: device aborted resize (139968963 -> 156368016), skipping HPA handling
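For anyone comparing several boot logs, the HPA outcome can be pulled out mechanically. The sketch below matches only the two message shapes quoted in this thread; the "HPA unlocked" wording was given as "something like", so that pattern in particular is an assumption and may need adjusting for a real kernel. The function name hpa_status is made up for illustration.

```python
import re

# Patterns modelled on the messages quoted in this thread (assumed formats).
UNLOCKED = re.compile(r"HPA unlocked: (\d+) -> (\d+), native (\d+)")
ABORTED = re.compile(r"device aborted resize \((\d+) -> (\d+)\), skipping HPA handling")


def hpa_status(log_text):
    """Return (outcome, current_sectors, target_sectors) or None.

    outcome is "unlocked" or "aborted"; the first matching line wins.
    """
    for line in log_text.splitlines():
        match = UNLOCKED.search(line)
        if match:
            return ("unlocked", int(match.group(1)), int(match.group(2)))
        match = ABORTED.search(line)
        if match:
            return ("aborted", int(match.group(1)), int(match.group(2)))
    return None


if __name__ == "__main__":
    sample = (
        "ata1.00: failed to set max address (err_mask=0x1)\n"
        "ata1.00: device aborted resize (139968963 -> 156368016), skipping HPA handling\n"
    )
    print(hpa_status(sample))
```

Run against the 2.6.24-rc3 excerpt, this reports the aborted resize together with the clipped and native sector counts.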
BTW, static int ata_ignore_hpa in libata-core.c is uninitialised in 2.6.24-rc3, is this on purpose? I initialised it to 1 for my test.
Yep, that's intentional. HPA isn't unlocked by default on the vanilla kernel. You need to specify libata.ignore_hpa=1. Does specifying "libata.ignore_hpa=1 libata.noacpi=1" make any difference?
No, no difference. In 2.6.23.1, ata_ignore_hpa was explicitly initialised to 0, and the explicit initialisation was removed in 2.6.24-rc3. I stumbled on that. Later I realised that it is a static variable and those are automatically initialised to zero, so it makes no difference, but it was less confusing to explicitly initialise it to zero.
Created attachment 13862 [details] sata_sis-use-HRST.patch

Oh, I see now. I'm embarrassed it took me this long to realize, even though you kindly bisected it for me. The BIOS is freeze-locking the HPA setting, so only a hardreset (HRST) can clear it. Please test the attached patch.
No difference. Was I supposed to set ata_ignore_hpa to zero again? I left it set to 1 and set noacpi back to 0.
libata.ignore_hpa=1 should be enough. Weird, I'll prep another debug patch.
Created attachment 13886 [details] bug9397-sata_sis-use-HRST-dbg0.patch

Please apply the attached patch on top of -rc4 and report the kernel log. Thanks.
Created attachment 13902 [details] Kernel log of 2.6.24-rc4 with applied patch
Created attachment 13909 [details] bug9397-sata_sis-use-HRST-dbg1.patch

Weird, I wonder why it's not hardresetting. Can you please revert the dbg0 patch (patch -R -p1 < dbg0.patch) and try this one? Thanks.
Created attachment 13971 [details] Kernel log with 2.6.24-rc4 and dbg1.patch
Created attachment 13989 [details] bug9397-sata_sis-use-HRST-dbg2.patch

Okay, this should do it. Thanks.
Kernel version 2.6.26.3, at least, works again; the bug is solved. Thanks!
Thanks. Resolving.