Bug 9412

Summary: commit a878539ef994787c447a98c2e3ba0fe3dad984ec breaks boot on SB600 AHCI
Product: Platform Specific/Hardware
Reporter: Srihari Vijayaraghavan (sriharivijayaraghavan)
Component: x86-64
Assignee: Shane Huang (shane.huang)
Status: CLOSED CODE_FIX
Severity: normal
CC: alan, henry.su, jgarzik, mingo, shane.huang, sriharivijayaraghavan, tj
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24-rc
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9832
Attachments: lspci -xxx info of the hardware
ahci: SB600 workaround is suspect... play it safe for now
Only disable 64 bit DMA for old revisions as a temp workaround
lspci dump for ASUS M2A-VM
dmidecode for M2A-VM with BIOS rev.2001
dmesg for M2A-VM with BIOS rev.2001
dmidecode for M2A-VM with BIOS rev.0901
dmesg for M2A-VM with BIOS rev.0901
Restore SB600 SATA controller 64 bit DMA
asus_m2a_vm_quirk.patch
asus_m2a_vm_quirk.patch (updated)
dmidecode for M2A-VM with BIOS rev.1501

Description Srihari Vijayaraghavan 2007-11-19 14:43:46 UTC
Most recent kernel where this bug did not occur: 2.6.23
Distribution: Fedora 8
Hardware Environment: Athlon AM2, AMD690 chipset, 4 GB RAM, SB600 AHCI controller, Seagate SATA hard drives
Software Environment: Linux 2.6.24-rc1+
Problem Description: Ever since that commit, Linux fails to boot: it constantly resets the drives during bootup, so a power reset is the only option. With mem=3500M, the same kernel image works fine.

Steps to reproduce:
1. Boot Linux-2.6.24-rc1+
2. Observe it constantly resets the hard drives & doesn't successfully boot

For more info, please refer to this thread on Linux-IDE (incl. comments from the contributors of this changeset):  http://marc.info/?l=linux-ide&m=119456211111119&w=2

Thanks

Srihari
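
Editorial sketch (not part of the original report): the mem=3500M observation suggests the failure only shows up when RAM above the 4 GiB boundary is in use. The Python snippet below is a minimal illustration of that arithmetic, using three BIOS-e820 lines copied from the boot log in this report. The function name usable_above_4g and the treatment of mem= as a simple cap on the highest usable address are assumptions made for illustration, not kernel code.

```python
import re

# Matches lines such as:
#  BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
E820_LINE = re.compile(r"BIOS-e820:\s+([0-9a-f]{16}) - ([0-9a-f]{16}) \((.+)\)")
FOUR_GIB = 1 << 32

# Three entries copied from the boot log in this report.
E820_SAMPLE = """
 BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
"""


def usable_above_4g(log_text, mem_limit=None):
    """Return the number of usable RAM bytes at or above 4 GiB.

    mem_limit (hypothetical simplification of mem=): highest address
    the kernel is allowed to use, in bytes.
    """
    total = 0
    for match in E820_LINE.finditer(log_text):
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        if match.group(3) != "usable":
            continue
        if mem_limit is not None:
            end = min(end, mem_limit)
        start = max(start, FOUR_GIB)
        if end > start:
            total += end - start
    return total


# Without a limit, 512 MiB of RAM sits above 4 GiB.
print(usable_above_4g(E820_SAMPLE) // (1 << 20))
# With the 3500 MiB cap, nothing above 4 GiB remains in use.
print(usable_above_4g(E820_SAMPLE, mem_limit=3500 << 20))
```

Under these assumptions, a controller limited to 32-bit DMA addresses would never be handed a buffer it cannot reach when booted with mem=3500M, which is consistent with the workaround described above.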
Comment 1 H. Peter Anvin 2007-11-19 14:55:34 UTC
Please post your complete kernel bootup messages, as well as the content of /proc/mtrr.
Comment 2 Srihari Vijayaraghavan 2007-11-19 16:41:18 UTC
(copied from the archives)

Of the failure:
Linux version 2.6.24-rc2 (hari@desktop.localdomain) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #3 SMP Tue Nov 13 21:35:09 EST 2007
Command line: ro root=LABEL=/ 1 console=ttyS0,115200 console=tty0
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
 BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
 BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
 BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
 BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
end_pfn_map = 1179648
DMI 2.4 present.
ACPI: RSDP 000F8130, 0024 (r2 ATI   )
ACPI: XSDT DFEE3100, 004C (r1 ATI    ASUSACPI 42302E31 AWRD        0)
ACPI: FACP DFEE8400, 00F4 (r3 ATI    ASUSACPI 42302E31 AWRD        0)
ACPI: DSDT DFEE3280, 5136 (r1 ATI    ASUSACPI     1000 MSFT  3000000)
ACPI: FACS DFEE0000, 0040
ACPI: SSDT DFEE8600, 02CC (r1 PTLTD  POWERNOW        1  LTP        1)
ACPI: HPET DFEE8940, 0038 (r1 ATI    ASUSACPI 42302E31 AWRD       98)
ACPI: MCFG DFEE89C0, 003C (r1 ATI    ASUSACPI 42302E31 AWRD        0)
ACPI: APIC DFEE8540, 0068 (r1 ATI    ASUSACPI 42302E31 AWRD        0)
Scanning NUMA topology in Northbridge 24
CPU has 2 num_cores
No NUMA configuration found
Faking a node at 0000000000000000-0000000120000000
Bootmem setup node 0 0000000000000000-0000000120000000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1179648
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0:        0 ->      159
    0:      256 ->   917216
    0:  1048576 ->  1179648
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Setting APIC routing to flat
ACPI: HPET id: 0x8200 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at f1000000 (gap: f0000000:ec00000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 32848 bytes of per cpu data
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1030058
Policy zone: Normal
Kernel command line: ro root=LABEL=/ 1 console=ttyS0,115200 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
TSC calibrated against PM_TIMER
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 2799.888 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:    8
... MAX_LOCK_DEPTH:          30
... MAX_LOCKDEP_KEYS:        2048
... CLASSHASH_SIZE:           1024
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      16384
... CHAINHASH_SIZE:          8192
 memory used by lock dependency info: 1648 kB
 per task-struct memory footprint: 1680 bytes
Checking aperture...
CPU 0: aperture @ c000000 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ c000000
Memory: 4055984k/4718592k available (2146k kernel code, 136780k reserved, 1273k data, 296k init)
SLUB: Genslabs=12, HWalign=64, Order=0-1, MinObjects=4, CPUs=2, Nodes=1
Calibrating delay using timer specific routine.. 5604.88 BogoMIPS (lpj=11209778)
Security Framework initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys cpuacct
Initializing cgroup subsys ns
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/0 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
lockdep: not fixing up alternatives.
ACPI: Core revision 20070126
Using local APIC timer interrupts.
Detected 12.499 MHz APIC timer.
lockdep: not fixing up alternatives.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5599.85 BogoMIPS (lpj=11199710)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ stepping 03
Brought up 2 CPUs
net_namespace: 144 bytes
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at e0000000 - efffffff
PCI: No mmconfig possible on device 00:18
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Transparent bridge - 0000:00:14.4
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 10 *11)
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 12 devices
ACPI: ACPI bus type pnp unregistered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
DMAR:No DMAR devices found
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ c000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
ACPI: RTC can wake from S4
system 00:01: ioport range 0x4100-0x411f has been reserved
system 00:01: ioport range 0x228-0x22f has been reserved
system 00:01: ioport range 0x40b-0x40b has been reserved
system 00:01: ioport range 0x4d6-0x4d6 has been reserved
system 00:01: ioport range 0xc00-0xc01 has been reserved
system 00:01: ioport range 0xc14-0xc14 has been reserved
system 00:01: ioport range 0xc50-0xc52 has been reserved
system 00:01: ioport range 0xc6c-0xc6d has been reserved
system 00:0a: iomem range 0xe0000000-0xefffffff could not be reserved
system 00:0b: iomem range 0xf0000-0xf3fff could not be reserved
system 00:0b: iomem range 0xf4000-0xf7fff could not be reserved
system 00:0b: iomem range 0xf8000-0xfbfff could not be reserved
system 00:0b: iomem range 0xfc000-0xfffff could not be reserved
PCI: Bridge: 0000:00:02.0
  IO window: d000-dfff
Time: acpi_pm clocksource has been installed.
  MEM window: fdc00000-fdcfffff
  PREFETCH window: f0000000-f7ffffff
PCI: Bridge: 0000:00:07.0
  IO window: e000-efff
  MEM window: fdb00000-fdbfffff
  PREFETCH window: fdf00000-fdffffff
PCI: Bridge: 0000:00:14.4
  IO window: c000-cfff
  MEM window: fde00000-fdefffff
  PREFETCH window: fdd00000-fddfffff
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 9, 3670016 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
checking if image is initramfs... it is
Freeing initrd memory: 3213k freed
Initializing RT-Tester: OK
audit: initializing netlink socket (disabled)
audit(1194950790.015:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
io scheduler noop registered
io scheduler cfq registered (default)
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12ac
hpet_acpi_add: no address or irqs in _CRS
Non-volatile memory driver v1.2
Linux agpgart interface v0.102
Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize
usbcore: registered new interface driver libusual
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 1
Freeing unused kernel memory: 296k freed
ACPI: PCI Interrupt 0000:00:13.5[D] -> GSI 19 (level, low) -> IRQ 19
ehci_hcd 0000:00:13.5: EHCI Host Controller
ehci_hcd 0000:00:13.5: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:13.5: debug port 1
ehci_hcd 0000:00:13.5: irq 19, io mem 0xfe029000
ehci_hcd 0000:00:13.5: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 10 ports detected
ACPI: PCI Interrupt 0000:00:13.0[A] -> GSI 16 (level, low) -> IRQ 16
ohci_hcd 0000:00:13.0: OHCI Host Controller
ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:13.0: irq 16, io mem 0xfe02e000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.1[B] -> GSI 17 (level, low) -> IRQ 17
ohci_hcd 0000:00:13.1: OHCI Host Controller
ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:13.1: irq 17, io mem 0xfe02d000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.2[C] -> GSI 18 (level, low) -> IRQ 18
ohci_hcd 0000:00:13.2: OHCI Host Controller
ohci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 4
ohci_hcd 0000:00:13.2: irq 18, io mem 0xfe02c000
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.3[B] -> GSI 17 (level, low) -> IRQ 17
ohci_hcd 0000:00:13.3: OHCI Host Controller
ohci_hcd 0000:00:13.3: new USB bus registered, assigned bus number 5
ohci_hcd 0000:00:13.3: irq 17, io mem 0xfe02b000
usb 3-2: new low speed USB device using ohci_hcd and address 2
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.4[C] -> GSI 18 (level, low) -> IRQ 18
ohci_hcd 0000:00:13.4: OHCI Host Controller
ohci_hcd 0000:00:13.4: new USB bus registered, assigned bus number 6
ohci_hcd 0000:00:13.4: irq 18, io mem 0xfe02a000
usb 3-2: configuration #1 chosen from 1 choice
input: Logitech USB Receiver as /class/input/input0
input: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:13.1-2
usb usb6: configuration #1 chosen from 1 choice
hub 6-0:1.0: USB hub found
Fixing up Logitech keyboard report descriptor
hub 6-0:1.0: 2 ports detected
input: Logitech USB Receiver as /class/input/input1
input,hiddev96: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:13.1-2
USB Universal Host Controller Interface driver v3.0
SCSI subsystem initialized
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22
ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit
ahci 0000:00:12.0: controller can't do PMP, turning off CAP_PMP
ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:00:12.0: flags: ncq sntf ilck pm led clo pio slum part 
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
ata1: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f100 irq 22
ata2: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f180 irq 22
ata3: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f200 irq 22
ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f280 irq 22
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133
ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATAPI: ASUS    DRW-1814BLT, 1.04, max UDMA/66, ATAPI AN
ata2.00: configured for UDMA/66
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133
ata3.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access     ATA      ST3250620NS      3.AE PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 1:0:0:0: CD-ROM            ASUS     DRW-1814BLT      1.04 PQ: 0 ANSI: 5
scsi 2:0:0:0: Direct-Access     ATA      ST3250620NS      3.AE PQ: 0 ANSI: 5
sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2 sdb3 sdb4
sd 2:0:0:0: [sdb] Attached SCSI disk
ACPI: PCI Interrupt 0000:00:14.1[A] -> GSI 16 (level, low) -> IRQ 16
scsi4 : pata_atiixp
scsi5 : pata_atiixp
ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf900 irq 14
ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf908 irq 15
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: soft resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: soft resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: soft resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata3.00: NCQ disabled due to excessive errors
ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: soft resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting link
... the errors continue for many minutes, at which point I did a power reset ...
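
Editorial sketch (not part of the original comment): the repeated "cmd 60/..." line in the failure log can be decoded to see which request keeps timing out. The Python snippet below assumes the field order that libata's error handler is understood to print (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), and that command 0x60 is READ FPDMA QUEUED, where the sector count travels in the feature fields and the tag in the upper bits of nsect. The function name decode_ncq_read is hypothetical.

```python
import re

# cmd CC/f:n:l:m:h/F:N:L:M:H/DD  (all fields two hex digits)
TASKFILE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)

READ_FPDMA_QUEUED = 0x60


def decode_ncq_read(line, sector_size=512):
    """Decode LBA, length and tag from a libata 'cmd 60/...' log line."""
    match = TASKFILE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    command = int(match.group(1), 16)
    if command != READ_FPDMA_QUEUED:
        raise ValueError("not a READ FPDMA QUEUED command")
    feature, nsect, lbal, lbam, lbah = (
        int(x, 16) for x in match.group(2).split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in match.group(3).split(":"))
    # For NCQ reads the sector count is split across the feature fields.
    sectors = (hob_feature << 8) | feature
    # 48-bit LBA: high-order bytes first, then the current bytes.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "tag": nsect >> 3,
    }


line = ("ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 "
        "tag 0 cdb 0x0 data 131072 in")
print(decode_ncq_read(line))
```

Under these assumptions the failing request is a 256-sector (131072-byte) read with tag 0, which agrees with the "data 131072 in" and "tag 0" fields printed on the same log line.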

Of the success:
Linux version 2.6.24-rc2 (hari@desktop.localdomain) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #3 SMP Tue Nov 13 21:35:09 EST 2007
Command line: ro root=LABEL=/ 1 mem=3500M console=ttyS0,115200 console=tty0
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
 BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
 BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
 BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
 BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F8130, 0024 (r2 ATI   )
ACPI: XSDT DFEE3100, 004C (r1 ATI    ASUSACPI 42302E31 AWRD        0)
ACPI: FACP DFEE8400, 00F4 (r3 ATI    ASUSACPI 42302E31 AWRD        0)
ACPI: DSDT DFEE3280, 5136 (r1 ATI    ASUSACPI     1000 MSFT  3000000)
ACPI: FACS DFEE0000, 0040
ACPI: SSDT DFEE8600, 02CC (r1 PTLTD  POWERNOW        1  LTP        1)
ACPI: HPET DFEE8940, 0038 (r1 ATI    ASUSACPI 42302E31 AWRD       98)
ACPI: MCFG DFEE89C0, 003C (r1 ATI    ASUSACPI 42302E31 AWRD        0)
ACPI: APIC DFEE8540, 0068 (r1 ATI    ASUSACPI 42302E31 AWRD        0)
Scanning NUMA topology in Northbridge 24
CPU has 2 num_cores
No NUMA configuration found
Faking a node at 0000000000000000-00000000dac00000
Bootmem setup node 0 0000000000000000-00000000dac00000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   896000
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Setting APIC routing to flat
ACPI: HPET id: 0x8200 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at f1000000 (gap: f0000000:ec00000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 32848 bytes of per cpu data
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 881649
Policy zone: DMA32
Kernel command line: ro root=LABEL=/ 1 mem=3500M console=ttyS0,115200 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
TSC calibrated against PM_TIMER
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 2799.897 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:    8
... MAX_LOCK_DEPTH:          30
... MAX_LOCKDEP_KEYS:        2048
... CLASSHASH_SIZE:           1024
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      16384
... CHAINHASH_SIZE:          8192
 memory used by lock dependency info: 1648 kB
 per task-struct memory footprint: 1680 bytes
Checking aperture...
CPU 0: aperture @ c000000 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Memory: 3520612k/3584000k available (2146k kernel code, 63000k reserved, 1273k data, 296k init)
SLUB: Genslabs=12, HWalign=64, Order=0-1, MinObjects=4, CPUs=2, Nodes=1
Calibrating delay using timer specific routine.. 5604.91 BogoMIPS (lpj=11209836)
Security Framework initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys cpuacct
Initializing cgroup subsys ns
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/0 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
lockdep: not fixing up alternatives.
ACPI: Core revision 20070126
Using local APIC timer interrupts.
Detected 12.499 MHz APIC timer.
lockdep: not fixing up alternatives.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5599.85 BogoMIPS (lpj=11199710)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ stepping 03
Brought up 2 CPUs
net_namespace: 144 bytes
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at e0000000 - efffffff
PCI: No mmconfig possible on device 00:18
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Transparent bridge - 0000:00:14.4
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 10 *11)
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 12 devices
ACPI: ACPI bus type pnp unregistered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
DMAR:No DMAR devices found
ACPI: RTC can wake from S4
system 00:01: ioport range 0x4100-0x411f has been reserved
system 00:01: ioport range 0x228-0x22f has been reserved
system 00:01: ioport range 0x40b-0x40b has been reserved
system 00:01: ioport range 0x4d6-0x4d6 has been reserved
system 00:01: ioport range 0xc00-0xc01 has been reserved
system 00:01: ioport range 0xc14-0xc14 has been reserved
system 00:01: ioport range 0xc50-0xc52 has been reserved
system 00:01: ioport range 0xc6c-0xc6d has been reserved
system 00:0a: iomem range 0xe0000000-0xefffffff could not be reserved
system 00:0b: iomem range 0xf0000-0xf3fff could not be reserved
system 00:0b: iomem range 0xf4000-0xf7fff could not be reserved
system 00:0b: iomem range 0xf8000-0xfbfff could not be reserved
system 00:0b: iomem range 0xfc000-0xfffff could not be reserved
PCI: Bridge: 0000:00:02.0
Time: acpi_pm clocksource has been installed.
  IO window: d000-dfff
  MEM window: fdc00000-fdcfffff
  PREFETCH window: f0000000-f7ffffff
PCI: Bridge: 0000:00:07.0
  IO window: e000-efff
  MEM window: fdb00000-fdbfffff
  PREFETCH window: fdf00000-fdffffff
PCI: Bridge: 0000:00:14.4
  IO window: c000-cfff
  MEM window: fde00000-fdefffff
  PREFETCH window: fdd00000-fddfffff
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 9, 3670016 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
checking if image is initramfs... it is
Freeing initrd memory: 3213k freed
Initializing RT-Tester: OK
audit: initializing netlink socket (disabled)
audit(1194951795.951:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
io scheduler noop registered
io scheduler cfq registered (default)
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12ac
hpet_acpi_add: no address or irqs in _CRS
Non-volatile memory driver v1.2
Linux agpgart interface v0.102
Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize
usbcore: registered new interface driver libusual
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 1
Freeing unused kernel memory: 296k freed
ACPI: PCI Interrupt 0000:00:13.5[D] -> GSI 19 (level, low) -> IRQ 19
ehci_hcd 0000:00:13.5: EHCI Host Controller
ehci_hcd 0000:00:13.5: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:13.5: debug port 1
ehci_hcd 0000:00:13.5: irq 19, io mem 0xfe029000
ehci_hcd 0000:00:13.5: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 10 ports detected
ACPI: PCI Interrupt 0000:00:13.0[A] -> GSI 16 (level, low) -> IRQ 16
ohci_hcd 0000:00:13.0: OHCI Host Controller
ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:13.0: irq 16, io mem 0xfe02e000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.1[B] -> GSI 17 (level, low) -> IRQ 17
ohci_hcd 0000:00:13.1: OHCI Host Controller
ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:13.1: irq 17, io mem 0xfe02d000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.2[C] -> GSI 18 (level, low) -> IRQ 18
ohci_hcd 0000:00:13.2: OHCI Host Controller
ohci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 4
ohci_hcd 0000:00:13.2: irq 18, io mem 0xfe02c000
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.3[B] -> GSI 17 (level, low) -> IRQ 17
ohci_hcd 0000:00:13.3: OHCI Host Controller
ohci_hcd 0000:00:13.3: new USB bus registered, assigned bus number 5
ohci_hcd 0000:00:13.3: irq 17, io mem 0xfe02b000
usb 3-2: new low speed USB device using ohci_hcd and address 2
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:13.4[C] -> GSI 18 (level, low) -> IRQ 18
ohci_hcd 0000:00:13.4: OHCI Host Controller
ohci_hcd 0000:00:13.4: new USB bus registered, assigned bus number 6
ohci_hcd 0000:00:13.4: irq 18, io mem 0xfe02a000
usb 3-2: configuration #1 chosen from 1 choice
input: Logitech USB Receiver as /class/input/input0
input: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:13.1-2
usb usb6: configuration #1 chosen from 1 choice
hub 6-0:1.0: USB hub found
Fixing up Logitech keyboard report descriptor
hub 6-0:1.0: 2 ports detected
input: Logitech USB Receiver as /class/input/input1
input,hiddev96: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:13.1-2
USB Universal Host Controller Interface driver v3.0
SCSI subsystem initialized
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22
ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit
ahci 0000:00:12.0: controller can't do PMP, turning off CAP_PMP
ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:00:12.0: flags: ncq sntf ilck pm led clo pio slum part 
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
ata1: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f100 irq 22
ata2: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f180 irq 22
ata3: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f200 irq 22
ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f280 irq 22
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133
ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATAPI: ASUS    DRW-1814BLT, 1.04, max UDMA/66, ATAPI AN
ata2.00: configured for UDMA/66
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133
ata3.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access     ATA      ST3250620NS      3.AE PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 1:0:0:0: CD-ROM            ASUS     DRW-1814BLT      1.04 PQ: 0 ANSI: 5
scsi 2:0:0:0: Direct-Access     ATA      ST3250620NS      3.AE PQ: 0 ANSI: 5
sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2 sdb3 sdb4
sd 2:0:0:0: [sdb] Attached SCSI disk
ACPI: PCI Interrupt 0000:00:14.1[A] -> GSI 16 (level, low) -> IRQ 16
scsi4 : pata_atiixp
scsi5 : pata_atiixp
ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf900 irq 14
ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf908 irq 15
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
audit(1194951807.473:2): enforcing=1 old_enforcing=0 auid=4294967295
SELinux: policy loaded with handle_unknown=allow
audit(1194951807.722:3): policy loaded auid=4294967295

/proc/mtrr (from 2.6.24-rc3):
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
reg03: base=0xdff00000 (3583MB), size=   1MB: uncachable, count=1
reg04: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg05: base=0xf0000000 (3840MB), size= 128MB: write-combining, count=2
Comment 3 H. Peter Anvin 2007-11-19 16:48:03 UTC
MTRRs are consistent with e820, so it's not a reflection of the MTRR problem.
Had to check...
Comment 4 Ingo Molnar 2007-11-30 07:02:08 UTC
We have reverted that commit:

 commit bc84cf17b50ca5b49bec0a5fef63c58c1526d46b
 Author: Ingo Molnar <mingo@elte.hu>
 Date:   Mon Nov 26 20:42:19 2007 +0100

    x86: turn off iommu merge by default
Comment 5 Srihari Vijayaraghavan 2008-03-12 17:15:18 UTC
Ever since the following commit in 2.6.25-rc5, SB600 has the same problem as before (i.e., it does not boot without mem=3500M):

changeset:   87398:88dc09676798
user:        Jeff Garzik <jeff@garzik.org>
date:        Wed Mar 05 07:53:06 2008 -0500
files:       drivers/ata/ahci.c
description:
ahci: work around ATI SB600 h/w quirk

This addresses the recent ATI SB600 errata, where the hardware does
not like 256-length PRD entries during FPDMA (aka NCQ).

It hurts performance on SB600, but it is more important to get a
correct patch eliminating the data corruption/lockups, and then later
on tune for performance.

We simply limit each command to a maximum of 255 sectors, on SB600.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

committer: Jeff Garzik <jeff@garzik.org>


diff -r 6e30d34a2bfe -r 88dc09676798 drivers/ata/ahci.c
--- a/drivers/ata/ahci.c        Wed Mar 05 07:46:34 2008 -0500
+++ b/drivers/ata/ahci.c        Wed Mar 05 07:53:06 2008 -0500
@@ -186,6 +186,7 @@ enum {
        AHCI_HFLAG_NO_MSI               = (1 << 5), /* no PCI MSI */
        AHCI_HFLAG_NO_PMP               = (1 << 6), /* no PMP */
        AHCI_HFLAG_NO_HOTPLUG           = (1 << 7), /* ignore PxSERR.DIAG.N */
+       AHCI_HFLAG_SECT255              = (1 << 8), /* max 255 sectors */

        /* ap->flags bits */

@@ -255,6 +256,7 @@ static void ahci_p5wdh_error_handler(str
 static void ahci_p5wdh_error_handler(struct ata_port *ap);
 static void ahci_post_internal_cmd(struct ata_queued_cmd *qc);
 static int ahci_port_resume(struct ata_port *ap);
+static void ahci_dev_config(struct ata_device *dev);
 static unsigned int ahci_fill_sg(struct ata_queued_cmd *qc, void *cmd_tbl);
 static void ahci_fill_cmd_slot(struct ahci_port_priv *pp, unsigned int tag,
                               u32 opts);
@@ -293,6 +295,8 @@ static const struct ata_port_operations
        .check_status           = ahci_check_status,
        .check_altstatus        = ahci_check_status,
        .dev_select             = ata_noop_dev_select,
+
+       .dev_config             = ahci_dev_config,

        .tf_read                = ahci_tf_read,

@@ -425,7 +429,7 @@ static const struct ata_port_info ahci_p
        /* board_ahci_sb600 */
        {
                AHCI_HFLAGS     (AHCI_HFLAG_IGN_SERR_INTERNAL |
-                                AHCI_HFLAG_32BIT_ONLY | AHCI_HFLAG_NO_PMP),
+                                AHCI_HFLAG_SECT255 | AHCI_HFLAG_NO_PMP),
                .flags          = AHCI_FLAG_COMMON,
                .link_flags     = AHCI_LFLAG_COMMON,
                .pio_mask       = 0x1f, /* pio0-4 */
@@ -1174,6 +1178,14 @@ static void ahci_init_controller(struct
        writel(tmp | HOST_IRQ_EN, mmio + HOST_CTL);
        tmp = readl(mmio + HOST_CTL);
        VPRINTK("HOST_CTL 0x%x\n", tmp);
+}
+
+static void ahci_dev_config(struct ata_device *dev)
+{
+       struct ahci_host_priv *hpriv = dev->link->ap->host->private_data;
+
+       if (hpriv->flags & AHCI_HFLAG_SECT255)
+               dev->max_sectors = 255;
 }

 static unsigned int ahci_dev_classify(struct ata_port *ap)


Upon reverting the change, the system boots & works just fine, with no mem= parameter.

Thanks
Comment 6 Jeff Garzik 2008-03-21 19:45:59 UTC
I'm a bit confused...

Please confirm:   you have to revert commit

==============================================
a878539ef994787c447a98c2e3ba0fe3dad984ec
Author: Jeff Garzik <jeff@garzik.org>
Date:   Thu Feb 28 15:43:48 2008 -0500

    ahci: work around ATI SB600 h/w quirk
==============================================

in order to get things working?
Comment 7 Srihari Vijayaraghavan 2008-03-23 10:18:38 UTC
Correct. Despite AMD/ATI claiming the hardware is capable of more than 32 bits, either it is not (contrary to the H/W designers' opinion) or there is some other underlying problem in Linux that prevents it from operating fully.

Yes, by reverting the above patch, Linux indeed boots & works normally.

(The patch itself is curious: it not only adds AHCI_HFLAG_SECT255, which seems to be the intention of the patch going by its description, but it also quietly removes AHCI_HFLAG_32BIT_ONLY. Strange. It's not clear whether AHCI_HFLAG_SECT255 and AHCI_HFLAG_32BIT_ONLY are incompatible with each other; if so, I apologise for this analysis.)

Thanks.
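
For reference, on whether the two flags could coexist: they appear to be independent bits in one mask, so nothing in the encoding itself would stop a board entry from carrying both. A minimal standalone sketch, not the driver code; the bit positions for SECT255 and NO_PMP come from the diff in comment 5, the position for 32BIT_ONLY is an assumed placeholder, and the sb600_flags_* helpers are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* SECT255 and NO_PMP bit positions are taken from the diff in comment 5.
 * The 32BIT_ONLY position is an assumption made for illustration only. */
enum {
	HFLAG_32BIT_ONLY = (1u << 3),	/* assumed value, not from this thread */
	HFLAG_NO_PMP     = (1u << 6),
	HFLAG_SECT255    = (1u << 8),
};

/* Flags left on the SB600 entry by the patch (IGN_SERR_INTERNAL omitted). */
static inline uint32_t sb600_flags_after_patch(void)
{
	return HFLAG_SECT255 | HFLAG_NO_PMP;
}

/* Hypothetical variant that keeps the 32-bit DMA limit as well. */
static inline uint32_t sb600_flags_keep_32bit(void)
{
	return sb600_flags_after_patch() | HFLAG_32BIT_ONLY;
}

/* Test a single flag bit in a mask. */
static inline bool has_flag(uint32_t flags, uint32_t bit)
{
	return (flags & bit) != 0;
}
```

Whether the hardware behaves correctly with both set is a separate question that the sketch cannot answer.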
Comment 8 Richard Zhao 2008-03-23 20:14:26 UTC
It seems AHCI_HFLAG_SECT255 doesn't work. It still uses 256 sectors (131072 bytes).
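
For reference, the 131072 figure corresponds to 256 sectors of 512 bytes each; a working 255-sector cap would give 130560 bytes per command instead. A minimal sketch of that arithmetic, assuming the 512-byte sector size reported in the dmesg in comment 2 (cmd_bytes and clamp_sectors are hypothetical helper names, not libata functions):

```c
#include <stdint.h>

#define SECTOR_SIZE 512u	/* "512-byte hardware sectors" per the dmesg */

/* Bytes transferred by one command of the given sector count. */
static inline uint32_t cmd_bytes(uint32_t sectors)
{
	return sectors * SECTOR_SIZE;
}

/* Clamp a request to a per-command sector limit, as a max_sectors cap would. */
static inline uint32_t clamp_sectors(uint32_t requested, uint32_t max_sectors)
{
	return requested > max_sectors ? max_sectors : requested;
}
```

With these helpers, cmd_bytes(256) is 131072 and cmd_bytes(clamp_sectors(256, 255)) is 130560.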
Comment 9 Richard Zhao 2008-03-23 23:37:35 UTC
Hi, Srihari Vijayaraghavan
If you still have the issue after the AHCI_HFLAG_SECT255 commit, could you let us know your log messages and hardware configuration?
Thanks
Comment 10 Richard Zhao 2008-03-24 01:23:09 UTC
Hi, Srihari Vijayaraghavan
If your error message is 
"ata3.00: failed to IDENTIFY (I/O error," ,
could you please attach the full output of lspci -xxx ?
Comment 11 Srihari Vijayaraghavan 2008-03-24 14:02:27 UTC
Correct, indeed it goes in a loop throwing the above message:
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1: failed to recover some devices, retrying in 5 secs

Same messages appear for ata2 as well (second identical hard drive in the system).

The lspci -xxx output is attached as a text file.

Thanks

PS: the same Kernel boots & works if I use mem=3500m.
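
A likely reason mem=3500M helps is that it keeps the kernel from using any RAM above the 4 GB boundary, so no buffer address needs more than 32 bits. A minimal sketch of that check, assuming reg04 in the /proc/mtrr listing in comment 2 (base 0x100000000, 512 MB) is the RAM remapped above 4 GB; needs_64bit_dma and mem_limit_top are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

#define MB (1024ull * 1024ull)

/* True if any byte of [addr, addr + len) lies above the 32-bit limit. */
static inline bool needs_64bit_dma(uint64_t addr, uint64_t len)
{
	return len != 0 && (addr + len - 1) > 0xffffffffull;
}

/* One past the highest RAM address usable with mem=<limit_mb>M. */
static inline uint64_t mem_limit_top(uint64_t limit_mb)
{
	return limit_mb * MB;
}
```

Under these assumptions, everything below mem_limit_top(3500) is reachable with 32-bit addresses, while any buffer in the 512 MB region starting at 0x100000000 is not.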
Comment 12 Srihari Vijayaraghavan 2008-03-24 14:03:59 UTC
Created attachment 15418 [details]
lspci -xxx info of the hardware

lspci -xxx info of the hardware
Comment 13 Jeff Garzik 2008-03-24 19:50:17 UTC
Created attachment 15420 [details]
ahci: SB600 workaround is suspect... play it safe for now


To be on the safe side, I committed this patch as a /temporary/ fix, while we work to find the root cause of this problem.  This does two things:

1) restores the 32-bit limit (temporary, while we investigate this further)

2) causes ahci_dev_config() to emit a message when the SB600 workaround is enabled, permitting us to see clearly from 'dmesg' whether or not the workaround was activated for a particular user.
Comment 14 Richard Zhao 2008-03-25 01:52:34 UTC
Thanks Jeff!
It seems Srihari's SB600 is an earlier one. I will check the difference with the hw engineer.
Comment 15 Rafael J. Wysocki 2008-03-26 15:17:24 UTC
Regressions list annotation:

Submitter  : Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au>
Date       : 2008-03-12 17:15
Handled-By : Jeff Garzik <jgarzik@pobox.com>
Handled-By : Richard Zhao <richard.zhao@amd.com>

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 16 Jeff Garzik 2008-03-26 15:27:32 UTC
Should be fixed by 4cde32fc4b32e96a99063af3183acdfd54c563f0
Comment 17 Shane Huang 2009-05-19 02:45:09 UTC
Jeff and Tejun,

I'm NOT able to reproduce this issue with my SB600 boards, neither can I
find the SB600 SATA errata on 64 bit DMA capability, so it seems that the
commit 4cde32fc4b32e96a99063af3183acdfd54c563f0 is NOT the root cause.

Can you REOPEN this BZ first for further debug and root cause? Thanks.
Comment 18 Tejun Heo 2009-05-19 06:31:38 UTC
Okay, reopened and reassigned to you.  :-)

Please note that "lack of evidence of the problem" isn't enough to back out the commit.  We need to actually track down the problem and find out what happened and fix the problem to back out the original commit which either fixed or somehow worked around the problem.

Srihari, do you still own the system?
Comment 19 Shane Huang 2009-05-19 07:36:24 UTC
Created attachment 21424 [details]
Only disable 64 bit DMA for old revisions as a temp workaround
Comment 20 Tejun Heo 2009-05-19 07:42:52 UTC
We really need to either reproduce the problem with the same board or get an actual report to confirm or get a valid explanation from hardware engineering for the behavior change between revisions.

Thanks.
Comment 21 Shane Huang 2009-05-19 08:07:26 UTC
Tejun & Jeff,

I get error messages when I send mail to Srihari; it seems that his mail
address is no longer valid...

As I mentioned before, I'm NOT able to reproduce this issue with my
SB600 boards, whose revisions are newer than Srihari's (0x14 vs. 0x13).

I have also checked with our HW design engineer again and got this feedback:
"I didn't see any SB600 64bit DMA design issue"
So there is no erratum on a SATA 64bit DMA flaw or on a revision difference.

Quoting Jeff,
> To be on the safe side, I committed this patch as a /temporary/ fix,
> while we work to find the root cause of this problem.
> This does two things:

Even if the community needs a temp fix, we should NOT disable 64 bit DMA
for all SB600 revisions, right? Srihari was using just one of them, yet
the temp fix affects the performance of all SB600 chips.

My suggestion is that we take two steps to resolve this issue:
1. Enable 64bit DMA first, either for new revisions with my above patch
   or for ALL revisions except 0x13;
2. Continue debugging once we find a board which can reproduce it.

Step 1 might expose similar issues, if there are any. Actually, given the
information we have, we should withdraw commit
4cde32fc4b32e96a99063af3183acdfd54c563f0 directly until someone complains again.


Thanks
Shane
Comment 22 Tejun Heo 2009-05-19 09:06:04 UTC
I don't really want to risk breaking an unknown number of machines.  Please note that this bug might lead to actual data corruption if the user is unlucky enough.  It does suck to have IOs bouncing unnecessarily, but for most users the slowdown wouldn't be too bad, whereas boot failure and possible data corruption are far more grave, so I really think we should take the safe path here.  Identify the actual problem, reproduce it or get an actual reporter to verify the issue, and then proceed with a proper fix.

Thanks.
Comment 23 Shane Huang 2009-05-19 09:53:29 UTC
Tejun,

But the situation is that we are not able to reproduce it yet, and Srihari's
mail address is invalid for me (can you ping it too?).
Without withdrawing the commit, nobody else will report the issue, if
there is one, so that it can be debugged...

Thanks.
Comment 24 Tejun Heo 2009-05-19 09:58:57 UTC
Any chance you can acquire the same board (ASUS M2A-VM), flash it to the same BIOS version and try to reproduce the problem?  The board is still on sale here in Korea and I can also find it easily on ebay, so it shouldn't be too difficult to get hold of one.  I'm feeling quite unsure about going forward without knowing what happened w/ the original reporter and we don't have anything to back up the only-old-revisions-are-broken theory.  :-(

Thanks.
Comment 25 Shane Huang 2009-05-20 02:45:20 UTC
Tejun,

As I mentioned before, I borrowed an ASUS M2A-VM HDMI board from another site,
which can NOT reproduce the bug either.
Purchasing a new board needs approval and a tedious process, so it is not my
first choice yet. Anyway, I will check again whether I can find an M2A-VM
at this site.

BTW, which BIOS version was Srihari using? I cannot find it.

Thanks.
Comment 26 Tejun Heo 2009-05-20 02:56:05 UTC
Aiee.. right, I should have asked for dmidecode output when the bug reporter was around.  :-(  The following page carries all the BIOS versions for the board.

  http://support.asus.com/download/download.aspx?SLanguage=en-us&model=M2A-VM

Given the dates, it gotta be 1404 or anything before that.  I suppose trying 1404 should be good enough.  Thanks.
Comment 27 Shane Huang 2009-05-21 02:38:30 UTC
Tejun,

Luckily I borrowed an M2A-VM board from another building, whose default
BIOS revision is 2001. I'm still NOT able to reproduce this boot failure
after enabling 64 bit DMA (4G system memory).
The overnight SATA stress test with bonnie++ also works well; please check
the dmidecode, lspci and dmesg files first.
Next, I will try BIOS revision 1404...
Comment 28 Shane Huang 2009-05-21 02:40:19 UTC
Created attachment 21459 [details]
lspci dump for ASUS M2A-VM
Comment 29 Shane Huang 2009-05-21 02:42:12 UTC
Created attachment 21460 [details]
dmidecode for M2A-VM with BIOS rev.2001
Comment 30 Shane Huang 2009-05-21 02:43:37 UTC
Created attachment 21461 [details]
dmesg for M2A-VM with BIOS rev.2001
Comment 31 Tejun Heo 2009-05-21 02:59:32 UTC
Great. :-)

Just to be sure, can you please repeatedly run "find / -type f -exec cat \{\} \; > /dev/null" along with fsck on an unmounted filesystem?  That will fill the page cache so that >4G pages are used and fsck will complain if the data it's seeing is inconsistent.

Thanks.
Comment 32 Shane Huang 2009-05-21 07:18:26 UTC
Tejun,

I failed to downgrade the BIOS to an older revision (like 1404 or 0901)
after trying three different methods provided by the ASUS M2A-VM manual:
AWDFlash, EZ Flash 2, and ASUS Update for Windows (for M2A-VM-1404.exe).

Do you have any suggestions?

BTW, I will verify data integrity later.
Comment 33 Tejun Heo 2009-05-21 07:26:04 UTC
I'd really like some confirmation of the original issue but it looks like you've tried pretty much everything you can to reproduce the problem.  I'm still a bit worried.  Also, it looks like the SMBus controller revision is different - yours is newer.  Argh...

If the data integrity test works out fine then well I guess we can lift the restriction on newer revisions (btw, how do we tell?) and see how it goes.

Thanks.
Comment 34 Shane Huang 2009-05-21 07:37:48 UTC
Tejun,

Yes, the revision on my board is newer than Srihari's (0x14 vs. 0x13).
I also noticed that, but this is the only M2A-VM I can get.

BTW, the BIOS has been downgraded to rev. 0901 with another AWDFLASH
which was not from the ASUS website.

Rev. 1404 needs a Windows environment to flash, so it will NOT be tried
at this time.
Comment 35 Shane Huang 2009-05-21 08:33:10 UTC
Aha, I know the root cause: it is an ASUS BIOS bug in old revisions like 0901.
With rev. 0901 this issue can be reproduced easily, and forcing 32bit
DMA does fix it as a workaround.
I will then submit a patch to withdraw AHCI_HFLAG_32BIT_ONLY for SB600.
Comment 36 Shane Huang 2009-05-21 08:34:57 UTC
Created attachment 21463 [details]
dmidecode for M2A-VM with BIOS rev.0901
Comment 37 Shane Huang 2009-05-21 08:36:13 UTC
Created attachment 21464 [details]
dmesg for M2A-VM with BIOS rev.0901
Comment 38 Tejun Heo 2009-05-21 09:49:18 UTC
Great, thanks a lot for tracking it down.  Is there anything other than dmi version string which can match these older BIOSen?  The version string is a bit unwieldy and might be formatted differently for other versions.  Well, at any rate, we can replace general blacklisting of SB600 with more specific one.  :-)
Comment 39 Shane Huang 2009-05-21 09:54:29 UTC
> Is there anything other than dmi version string which can match
> these older BIOSen?

I do not know...

I will submit a patch for this issue later, after a small double check
on an old-revision SB600 board (like 0x13), if I can find one.
Comment 40 Shane Huang 2009-05-22 03:57:48 UTC
Created attachment 21480 [details]
Restore SB600 SATA controller 64 bit DMA
Comment 41 Shane Huang 2009-05-22 04:02:35 UTC
Tejun,

I have checked that our SB600 reference board(SMBus rev. 0x13) can NOT
reproduce Srihari's issue with 64 bit DMA enabled plus 4G system memory,
so it is NOT related to SB600 revisions.

Please review my above patch, which will be submitted to mailing list soon.

Thanks.
Comment 42 Tejun Heo 2009-05-22 05:18:45 UTC
The patch looks fine in itself but we can't leave the current users who would be benefiting from the wide blacklisting out in the cold.  Can you please implement asus board and bios revision match?  Maybe we can match using release date?

Thanks.
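
One way the release-date match could look, as a standalone sketch rather than kernel code: convert the DMI "MM/DD/YYYY" BIOS date into a sortable number and apply the 32-bit workaround only when it predates a cutoff. The function names are hypothetical, the cutoff is a caller-supplied placeholder rather than the real release date of any BIOS revision, and a real quirk would also have to match the board name:

```c
#include <stdbool.h>
#include <stdio.h>

/* Convert a DMI-style "MM/DD/YYYY" date to a sortable YYYYMMDD number.
 * Returns 0 if the string does not parse. */
static int dmi_date_key(const char *date)
{
	int month, day, year;

	if (!date || sscanf(date, "%2d/%2d/%4d", &month, &day, &year) != 3)
		return 0;
	if (month < 1 || month > 12 || day < 1 || day > 31)
		return 0;
	return year * 10000 + month * 100 + day;
}

/* Apply the 32-bit workaround when the BIOS predates the cutoff.
 * Unparseable dates are treated as old, which is the safe default. */
static bool bios_needs_32bit_only(const char *bios_date, const char *cutoff)
{
	int key = dmi_date_key(bios_date);

	return key == 0 || key < dmi_date_key(cutoff);
}
```

Treating an unreadable date as "old" keeps the restriction in place for boards that cannot be identified, at the cost of leaving some fixed systems on 32-bit DMA.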
Comment 43 Shane Huang 2009-05-22 06:05:06 UTC
Tejun,

> Can you please implement asus board and bios revision match?

I'm sorry, but I'm not able to implement it because I have to switch to
other tasks, and the borrowed ASUS M2A-VM also has to be returned.

I still believe end users should upgrade the system BIOS; rev. 2001
and 2302 (the latest one) have been verified to work well.
Comment 44 Tejun Heo 2009-05-22 06:07:34 UTC
Aiee... Can you then please keep it for a few more hours?  I'll try to brew something.
Comment 45 Shane Huang 2009-05-22 06:17:52 UTC
OK. Can I submit my above patch first, and you add another patch for
ASUS old BIOS revisions?
Comment 46 Tejun Heo 2009-05-22 06:22:46 UTC
I hope it can be done the other way around so that M2A isn't broken in between.  :-)
Comment 47 Tejun Heo 2009-05-22 07:52:50 UTC
Created attachment 21483 [details]
asus_m2a_vm_quirk.patch

Can you please verify the attached workaround triggers for older BIOS while 64bit is enabled for newer ones?  Thanks.
Comment 48 Shane Huang 2009-05-25 05:42:03 UTC
Created attachment 21525 [details]
asus_m2a_vm_quirk.patch (updated)
Comment 49 Shane Huang 2009-05-25 05:49:57 UTC
Tejun,

Some comments on your patch:
1. AHCI_HFLAG_32BIT_ONLY should be removed from board_ahci_sb600 by default;
2. I tried more BIOS revisions and found that the first BIOS rev. which
   works is 1501 instead of 2001 (the last bad one is 1404);
3. M2A-VM HDMI contains the same bug, but the quirk was also covered by our
   patch.

Please check the above updated one, which already passed my test,
and mark yours as obsolete. Thanks.
Comment 50 Shane Huang 2009-05-25 05:52:02 UTC
Created attachment 21526 [details]
dmidecode for M2A-VM with BIOS rev.1501
Comment 51 Tejun Heo 2009-05-26 06:07:49 UTC
Yeah, the patch was supposed to come before your patch to lift the wide 64bit restriction, so doesn't contain the change itself.  If you've tested the combined patch, can you please take the patch and send it upstream with me cc'd?  Please feel free to take the credit as you did most of heavy lifting here.

Thanks.
Comment 52 Shane Huang 2009-05-26 06:57:54 UTC
Tejun, please submit the patch, since you wrote the correct quirk for
ASUS M2A-VM and we still think it's not necessary... :-)
You can add my name to the CC list.

Thanks
Comment 53 Tejun Heo 2009-05-26 07:00:24 UTC
Ah... that was my not-so-sneaky attempt to push the work your way.  Pretty please. :-)
Comment 54 Shane Huang 2009-05-27 07:09:36 UTC
Tejun, the patch was sent just now; please check it.
Thanks
Comment 55 Tejun Heo 2009-05-27 07:26:55 UTC
Thanks.  Much appreciated.  :-)