Most recent kernel where this bug did not occur: 2.6.23
Distribution: Fedora 8
Hardware Environment: Athlon AM2, AMD690 chipset, 4 GB RAM, SB600 AHCI controller, Seagate SATA hard drives
Software Environment: Linux 2.6.24-rc1+
Problem Description: Ever since that commit, Linux fails to boot: it constantly resets the drives during bootup, so a power reset is the only option. With mem=3500M, the same kernel image works fine.

Steps to reproduce:
1. Boot Linux-2.6.24-rc1+
2. Observe that it constantly resets the hard drives and does not boot successfully

For more info, please refer to this thread on Linux-IDE (incl. comments from the contributors of this changeset):
http://marc.info/?l=linux-ide&m=119456211111119&w=2

Thanks
Srihari
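As a rough aid for reading the mem=3500M workaround, the sketch below converts that limit to a physical address and a page frame number and compares it with the 4 GiB boundary. It is illustrative arithmetic only: the helper name mem_limit is made up for this note, the 4 KiB page size is the x86-64 base page size, and the e820 top address is copied from the boot messages posted in this report. It does not by itself show why capping memory avoids the resets.

```python
PAGE_SIZE = 4096      # x86-64 base page size
FOUR_GIB = 1 << 32    # 4 GiB boundary


def mem_limit(megabytes):
    """Return (top physical address, end PFN) for a mem=<N>M limit (hypothetical helper)."""
    top = megabytes << 20
    return top, top // PAGE_SIZE


# mem=3500M as used in the working boot
top, end_pfn = mem_limit(3500)
print(hex(top), end_pfn, top < FOUR_GIB)

# End of the last usable e820 range, as reported by the BIOS in the boot log
e820_top = 0x120000000
pfn_without_limit = e820_top // PAGE_SIZE
mib_above_4g = (e820_top - FOUR_GIB) >> 20
print(pfn_without_limit, mib_above_4g)
```

The results line up with the posted logs: the capped boot reports a node ending at 00000000dac00000 with active PFNs up to 896000, while the failing boot reports end_pfn_map = 1179648, i.e. 512 MB of RAM mapped above 4 GiB.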
Please post your complete kernel bootup messages, as well as the content of /proc/mtrr.
(copied from the archives) Of the failure: Linux version 2.6.24-rc2 (hari@desktop.localdomain) (gcc version 4.1.2 20070925 (Red \ Hat 4.1.2-33)) #3 SMP Tue Nov 13 21:35:09 EST 2007 Command line: ro root=LABEL=/ 1 \ console=ttyS0,115200 console=tty0 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f000 (usable) BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable) BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS) BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data) BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved) BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) BIOS-e820: 0000000100000000 - 0000000120000000 (usable) end_pfn_map = 1179648 DMI 2.4 present. ACPI: RSDP 000F8130, 0024 (r2 ATI ) ACPI: XSDT DFEE3100, 004C (r1 ATI ASUSACPI 42302E31 AWRD 0) ACPI: FACP DFEE8400, 00F4 (r3 ATI ASUSACPI 42302E31 AWRD 0) ACPI: DSDT DFEE3280, 5136 (r1 ATI ASUSACPI 1000 MSFT 3000000) ACPI: FACS DFEE0000, 0040 ACPI: SSDT DFEE8600, 02CC (r1 PTLTD POWERNOW 1 LTP 1) ACPI: HPET DFEE8940, 0038 (r1 ATI ASUSACPI 42302E31 AWRD 98) ACPI: MCFG DFEE89C0, 003C (r1 ATI ASUSACPI 42302E31 AWRD 0) ACPI: APIC DFEE8540, 0068 (r1 ATI ASUSACPI 42302E31 AWRD 0) Scanning NUMA topology in Northbridge 24 CPU has 2 num_cores No NUMA configuration found Faking a node at 0000000000000000-0000000120000000 Bootmem setup node 0 0000000000000000-0000000120000000 Zone PFN ranges: DMA 0 -> 4096 DMA32 4096 -> 1048576 Normal 1048576 -> 1179648 Movable zone start PFN for each node early_node_map[3] active PFN ranges 0: 0 -> 159 0: 256 -> 917216 0: 1048576 -> 1179648 ATI board detected. Disabling timer routing over 8254. 
ACPI: PM-Timer IO Port: 0x4008 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) Setting APIC routing to flat ACPI: HPET id: 0x8200 base: 0xfed00000 Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at f1000000 (gap: f0000000:ec00000) SMP: Allowing 2 CPUs, 0 hotplug CPUs PERCPU: Allocating 32848 bytes of per cpu data Built 1 zonelists in Node order, mobility grouping on. Total pages: 1030058 Policy zone: Normal Kernel command line: ro root=LABEL=/ 1 console=ttyS0,115200 console=tty0 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) TSC calibrated against PM_TIMER Marking TSC unstable due to TSCs unsynchronized time.c: Detected 2799.888 MHz processor. Console: colour VGA+ 80x25 console [tty0] enabled console [ttyS0] enabled Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar ... MAX_LOCKDEP_SUBCLASSES: 8 ... MAX_LOCK_DEPTH: 30 ... MAX_LOCKDEP_KEYS: 2048 ... CLASSHASH_SIZE: 1024 ... MAX_LOCKDEP_ENTRIES: 8192 ... MAX_LOCKDEP_CHAINS: 16384 ... CHAINHASH_SIZE: 8192 memory used by lock dependency info: 1648 kB per task-struct memory footprint: 1680 bytes Checking aperture... 
CPU 0: aperture @ c000000 size 32 MB Aperture too small (32 MB) No AGP bridge found Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Mapping aperture over 65536 KB of RAM @ c000000 Memory: 4055984k/4718592k available (2146k kernel code, 136780k reserved, 1273k data, \ 296k init) SLUB: Genslabs=12, HWalign=64, Order=0-1, MinObjects=4, CPUs=2, Nodes=1 Calibrating delay using timer specific routine.. 5604.88 BogoMIPS (lpj=11209778) Security Framework initialized SELinux: Initializing. selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) Mount-cache hash table entries: 256 Initializing cgroup subsys cpuacct Initializing cgroup subsys ns CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 0/0 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 lockdep: not fixing up alternatives. ACPI: Core revision 20070126 Using local APIC timer interrupts. Detected 12.499 MHz APIC timer. lockdep: not fixing up alternatives. Booting processor 1/2 APIC 0x1 Initializing CPU#1 Calibrating delay using timer specific routine.. 
5599.85 BogoMIPS (lpj=11199710) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 1/1 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ stepping 03 Brought up 2 CPUs net_namespace: 144 bytes NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using MMCONFIG at e0000000 - efffffff PCI: No mmconfig possible on device 00:18 ACPI: Interpreter enabled ACPI: (supports S0 S5) ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Transparent bridge - 0000:00:14.4 ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 10 *11) ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11) *0, disabled. Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init ACPI: bus type pnp registered pnp: PnP ACPI: found 12 devices ACPI: ACPI bus type pnp unregistered usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default DMAR:No DMAR devices found PCI-DMA: Disabling AGP. PCI-DMA: aperture base @ c000000 size 65536 KB PCI-DMA: using GART IOMMU. 
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture ACPI: RTC can wake from S4 system 00:01: ioport range 0x4100-0x411f has been reserved system 00:01: ioport range 0x228-0x22f has been reserved system 00:01: ioport range 0x40b-0x40b has been reserved system 00:01: ioport range 0x4d6-0x4d6 has been reserved system 00:01: ioport range 0xc00-0xc01 has been reserved system 00:01: ioport range 0xc14-0xc14 has been reserved system 00:01: ioport range 0xc50-0xc52 has been reserved system 00:01: ioport range 0xc6c-0xc6d has been reserved system 00:0a: iomem range 0xe0000000-0xefffffff could not be reserved system 00:0b: iomem range 0xf0000-0xf3fff could not be reserved system 00:0b: iomem range 0xf4000-0xf7fff could not be reserved system 00:0b: iomem range 0xf8000-0xfbfff could not be reserved system 00:0b: iomem range 0xfc000-0xfffff could not be reserved PCI: Bridge: 0000:00:02.0 IO window: d000-dfff Time: acpi_pm clocksource has been installed. MEM window: fdc00000-fdcfffff PREFETCH window: f0000000-f7ffffff PCI: Bridge: 0000:00:07.0 IO window: e000-efff MEM window: fdb00000-fdbfffff PREFETCH window: fdf00000-fdffffff PCI: Bridge: 0000:00:14.4 IO window: c000-cfff MEM window: fde00000-fdefffff PREFETCH window: fdd00000-fddfffff NET: Registered protocol family 2 IP route cache hash table entries: 131072 (order: 8, 1048576 bytes) TCP established hash table entries: 524288 (order: 11, 8388608 bytes) TCP bind hash table entries: 65536 (order: 9, 3670016 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered checking if image is initramfs... 
it is Freeing initrd memory: 3213k freed Initializing RT-Tester: OK audit: initializing netlink socket (disabled) audit(1194950790.015:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) io scheduler noop registered io scheduler cfq registered (default) assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability pci_hotplug: PCI Hot Plug PCI Core version: 0.5 Real Time Clock Driver v1.12ac hpet_acpi_add: no address or irqs in _CRS Non-volatile memory driver v1.2 Linux agpgart interface v0.102 Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing enabled ÿserial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize usbcore: registered new interface driver libusual PNP: No PS/2 controller found. Probing ports directly. serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver TCP cubic registered Initializing XFRM netlink socket NET: Registered protocol family 1 Freeing unused kernel memory: 296k freed ACPI: PCI Interrupt 0000:00:13.5[D] -> GSI 19 (level, low) -> IRQ 19 ehci_hcd 0000:00:13.5: EHCI Host Controller ehci_hcd 0000:00:13.5: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:13.5: debug port 1 ehci_hcd 0000:00:13.5: irq 19, io mem 0xfe029000 ehci_hcd 0000:00:13.5: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 10 ports detected ACPI: PCI Interrupt 0000:00:13.0[A] -> GSI 16 (level, low) -> IRQ 16 ohci_hcd 0000:00:13.0: OHCI Host Controller ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 2 
ohci_hcd 0000:00:13.0: irq 16, io mem 0xfe02e000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.1[B] -> GSI 17 (level, low) -> IRQ 17 ohci_hcd 0000:00:13.1: OHCI Host Controller ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 3 ohci_hcd 0000:00:13.1: irq 17, io mem 0xfe02d000 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.2[C] -> GSI 18 (level, low) -> IRQ 18 ohci_hcd 0000:00:13.2: OHCI Host Controller ohci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 4 ohci_hcd 0000:00:13.2: irq 18, io mem 0xfe02c000 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.3[B] -> GSI 17 (level, low) -> IRQ 17 ohci_hcd 0000:00:13.3: OHCI Host Controller ohci_hcd 0000:00:13.3: new USB bus registered, assigned bus number 5 ohci_hcd 0000:00:13.3: irq 17, io mem 0xfe02b000 usb 3-2: new low speed USB device using ohci_hcd and address 2 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found hub 5-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.4[C] -> GSI 18 (level, low) -> IRQ 18 ohci_hcd 0000:00:13.4: OHCI Host Controller ohci_hcd 0000:00:13.4: new USB bus registered, assigned bus number 6 ohci_hcd 0000:00:13.4: irq 18, io mem 0xfe02a000 usb 3-2: configuration #1 chosen from 1 choice input: Logitech USB Receiver as /class/input/input0 input: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:13.1-2 usb usb6: configuration #1 chosen from 1 choice hub 6-0:1.0: USB hub found Fixing up Logitech keyboard report descriptor hub 6-0:1.0: 2 ports detected input: Logitech USB Receiver as /class/input/input1 input,hiddev96: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:13.1-2 USB Universal Host Controller Interface driver v3.0 SCSI subsystem initialized 
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22 ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit ahci 0000:00:12.0: controller can't do PMP, turning off CAP_PMP ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode ahci 0000:00:12.0: flags: ncq sntf ilck pm led clo pio slum part scsi0 : ahci scsi1 : ahci scsi2 : ahci scsi3 : ahci ata1: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f100 irq 22 ata2: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f180 irq 22 ata3: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f200 irq 22 ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f280 irq 22 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata1.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133 ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32) ata1.00: configured for UDMA/133 ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: ATAPI: ASUS DRW-1814BLT, 1.04, max UDMA/66, ATAPI AN ata2.00: configured for UDMA/66 ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133 ata3.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32) ata3.00: configured for UDMA/133 ata4: SATA link down (SStatus 0 SControl 300) scsi 0:0:0:0: Direct-Access ATA ST3250620NS 3.AE PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sda: sda1 sda2 sda3 sda4 sd 0:0:0:0: [sda] Attached SCSI disk scsi 1:0:0:0: CD-ROM ASUS DRW-1814BLT 1.04 PQ: 0 ANSI: 5 scsi 2:0:0:0: Direct-Access ATA ST3250620NS 3.AE PQ: 0 ANSI: 5 sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB) sd 2:0:0:0: [sdb] Write Protect is off 
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB) sd 2:0:0:0: [sdb] Write Protect is off sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sdb: sdb1 sdb2 sdb3 sdb4 sd 2:0:0:0: [sdb] Attached SCSI disk ACPI: PCI Interrupt 0000:00:14.1[A] -> GSI 16 (level, low) -> IRQ 16 scsi4 : pata_atiixp scsi5 : pata_atiixp ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf900 irq 14 ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf908 irq 15 kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata3.00: status: { DRDY } ata3: soft resetting link ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: qc timeout (cmd 0xec) ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata3.00: revalidation failed (errno=-5) ata3: failed to recover some devices, retrying in 5 secs ata3: hard resetting link ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: configured for UDMA/133 ata3: EH complete sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB) sd 2:0:0:0: [sdb] Write Protect is off sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata3.00: status: { DRDY } ata3: soft resetting link ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: qc timeout (cmd 0xec) ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata3.00: revalidation failed (errno=-5) ata3: failed to recover some devices, retrying in 5 secs ata3: 
hard resetting link ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: configured for UDMA/133 ata3: EH complete sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB) sd 2:0:0:0: [sdb] Write Protect is off sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata3.00: status: { DRDY } ata3: soft resetting link ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: qc timeout (cmd 0xec) ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata3.00: revalidation failed (errno=-5) ata3: failed to recover some devices, retrying in 5 secs ata3: hard resetting link ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: configured for UDMA/133 ata3: EH complete sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB) sd 2:0:0:0: [sdb] Write Protect is off sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA ata3.00: NCQ disabled due to excessive errors ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen ata3.00: cmd 60/00:00:25:71:d2/01:00:0e:00:00/40 tag 0 cdb 0x0 data 131072 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata3.00: status: { DRDY } ata3: soft resetting link ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: qc timeout (cmd 0xec) ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata3.00: revalidation failed (errno=-5) ata3: failed to recover some devices, retrying in 5 secs ata3: hard resetting link ... and errors continue for many minutes at which point did a power reset ... 
Of the success: Linux version 2.6.24-rc2 (hari@desktop.localdomain) (gcc version 4.1.2 20070925 (Red \ Hat 4.1.2-33)) #3 SMP Tue Nov 13 21:35:09 EST 2007 Command line: ro root=LABEL=/ 1 \ mem=3500M console=ttyS0,115200 console=tty0 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f000 (usable) BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable) BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS) BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data) BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved) BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) BIOS-e820: 0000000100000000 - 0000000120000000 (usable) end_pfn_map = 1048576 DMI 2.4 present. ACPI: RSDP 000F8130, 0024 (r2 ATI ) ACPI: XSDT DFEE3100, 004C (r1 ATI ASUSACPI 42302E31 AWRD 0) ACPI: FACP DFEE8400, 00F4 (r3 ATI ASUSACPI 42302E31 AWRD 0) ACPI: DSDT DFEE3280, 5136 (r1 ATI ASUSACPI 1000 MSFT 3000000) ACPI: FACS DFEE0000, 0040 ACPI: SSDT DFEE8600, 02CC (r1 PTLTD POWERNOW 1 LTP 1) ACPI: HPET DFEE8940, 0038 (r1 ATI ASUSACPI 42302E31 AWRD 98) ACPI: MCFG DFEE89C0, 003C (r1 ATI ASUSACPI 42302E31 AWRD 0) ACPI: APIC DFEE8540, 0068 (r1 ATI ASUSACPI 42302E31 AWRD 0) Scanning NUMA topology in Northbridge 24 CPU has 2 num_cores No NUMA configuration found Faking a node at 0000000000000000-00000000dac00000 Bootmem setup node 0 0000000000000000-00000000dac00000 Zone PFN ranges: DMA 0 -> 4096 DMA32 4096 -> 1048576 Normal 1048576 -> 1048576 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0: 0 -> 159 0: 256 -> 896000 ATI board detected. Disabling timer routing over 8254. 
ACPI: PM-Timer IO Port: 0x4008 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) Setting APIC routing to flat ACPI: HPET id: 0x8200 base: 0xfed00000 Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at f1000000 (gap: f0000000:ec00000) SMP: Allowing 2 CPUs, 0 hotplug CPUs PERCPU: Allocating 32848 bytes of per cpu data Built 1 zonelists in Node order, mobility grouping on. Total pages: 881649 Policy zone: DMA32 Kernel command line: ro root=LABEL=/ 1 mem=3500M console=ttyS0,115200 console=tty0 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) TSC calibrated against PM_TIMER Marking TSC unstable due to TSCs unsynchronized time.c: Detected 2799.897 MHz processor. Console: colour VGA+ 80x25 console [tty0] enabled console [ttyS0] enabled Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar ... MAX_LOCKDEP_SUBCLASSES: 8 ... MAX_LOCK_DEPTH: 30 ... MAX_LOCKDEP_KEYS: 2048 ... CLASSHASH_SIZE: 1024 ... MAX_LOCKDEP_ENTRIES: 8192 ... MAX_LOCKDEP_CHAINS: 16384 ... CHAINHASH_SIZE: 8192 memory used by lock dependency info: 1648 kB per task-struct memory footprint: 1680 bytes Checking aperture... CPU 0: aperture @ c000000 size 32 MB Aperture too small (32 MB) No AGP bridge found Memory: 3520612k/3584000k available (2146k kernel code, 63000k reserved, 1273k data, \ 296k init) SLUB: Genslabs=12, HWalign=64, Order=0-1, MinObjects=4, CPUs=2, Nodes=1 Calibrating delay using timer specific routine.. 5604.91 BogoMIPS (lpj=11209836) Security Framework initialized SELinux: Initializing. 
selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) Mount-cache hash table entries: 256 Initializing cgroup subsys cpuacct Initializing cgroup subsys ns CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 0/0 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 lockdep: not fixing up alternatives. ACPI: Core revision 20070126 Using local APIC timer interrupts. Detected 12.499 MHz APIC timer. lockdep: not fixing up alternatives. Booting processor 1/2 APIC 0x1 Initializing CPU#1 Calibrating delay using timer specific routine.. 5599.85 BogoMIPS (lpj=11199710) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 1/1 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ stepping 03 Brought up 2 CPUs net_namespace: 144 bytes NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using MMCONFIG at e0000000 - efffffff PCI: No mmconfig possible on device 00:18 ACPI: Interpreter enabled ACPI: (supports S0 S5) ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Transparent bridge - 0000:00:14.4 ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 10 *11) ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11) *0, disabled. 
Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init ACPI: bus type pnp registered pnp: PnP ACPI: found 12 devices ACPI: ACPI bus type pnp unregistered usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default DMAR:No DMAR devices found ACPI: RTC can wake from S4 system 00:01: ioport range 0x4100-0x411f has been reserved system 00:01: ioport range 0x228-0x22f has been reserved system 00:01: ioport range 0x40b-0x40b has been reserved system 00:01: ioport range 0x4d6-0x4d6 has been reserved system 00:01: ioport range 0xc00-0xc01 has been reserved system 00:01: ioport range 0xc14-0xc14 has been reserved system 00:01: ioport range 0xc50-0xc52 has been reserved system 00:01: ioport range 0xc6c-0xc6d has been reserved system 00:0a: iomem range 0xe0000000-0xefffffff could not be reserved system 00:0b: iomem range 0xf0000-0xf3fff could not be reserved system 00:0b: iomem range 0xf4000-0xf7fff could not be reserved system 00:0b: iomem range 0xf8000-0xfbfff could not be reserved system 00:0b: iomem range 0xfc000-0xfffff could not be reserved PCI: Bridge: 0000:00:02.0 Time: acpi_pm clocksource has been installed. 
IO window: d000-dfff MEM window: fdc00000-fdcfffff PREFETCH window: f0000000-f7ffffff PCI: Bridge: 0000:00:07.0 IO window: e000-efff MEM window: fdb00000-fdbfffff PREFETCH window: fdf00000-fdffffff PCI: Bridge: 0000:00:14.4 IO window: c000-cfff MEM window: fde00000-fdefffff PREFETCH window: fdd00000-fddfffff NET: Registered protocol family 2 IP route cache hash table entries: 131072 (order: 8, 1048576 bytes) TCP established hash table entries: 524288 (order: 11, 8388608 bytes) TCP bind hash table entries: 65536 (order: 9, 3670016 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered checking if image is initramfs... it is Freeing initrd memory: 3213k freed Initializing RT-Tester: OK audit: initializing netlink socket (disabled) audit(1194951795.951:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) io scheduler noop registered io scheduler cfq registered (default) assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability pci_hotplug: PCI Hot Plug PCI Core version: 0.5 Real Time Clock Driver v1.12ac hpet_acpi_add: no address or irqs in _CRS Non-volatile memory driver v1.2 Linux agpgart interface v0.102 Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing enabled ÿserial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize usbcore: registered new interface driver libusual PNP: No PS/2 controller found. Probing ports directly. 
serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver TCP cubic registered Initializing XFRM netlink socket NET: Registered protocol family 1 Freeing unused kernel memory: 296k freed ACPI: PCI Interrupt 0000:00:13.5[D] -> GSI 19 (level, low) -> IRQ 19 ehci_hcd 0000:00:13.5: EHCI Host Controller ehci_hcd 0000:00:13.5: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:13.5: debug port 1 ehci_hcd 0000:00:13.5: irq 19, io mem 0xfe029000 ehci_hcd 0000:00:13.5: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 10 ports detected ACPI: PCI Interrupt 0000:00:13.0[A] -> GSI 16 (level, low) -> IRQ 16 ohci_hcd 0000:00:13.0: OHCI Host Controller ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 2 ohci_hcd 0000:00:13.0: irq 16, io mem 0xfe02e000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.1[B] -> GSI 17 (level, low) -> IRQ 17 ohci_hcd 0000:00:13.1: OHCI Host Controller ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 3 ohci_hcd 0000:00:13.1: irq 17, io mem 0xfe02d000 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.2[C] -> GSI 18 (level, low) -> IRQ 18 ohci_hcd 0000:00:13.2: OHCI Host Controller ohci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 4 ohci_hcd 0000:00:13.2: irq 18, io mem 0xfe02c000 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.3[B] -> GSI 17 (level, low) -> IRQ 17 ohci_hcd 0000:00:13.3: OHCI Host Controller ohci_hcd 0000:00:13.3: new 
USB bus registered, assigned bus number 5 ohci_hcd 0000:00:13.3: irq 17, io mem 0xfe02b000 usb 3-2: new low speed USB device using ohci_hcd and address 2 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found hub 5-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:13.4[C] -> GSI 18 (level, low) -> IRQ 18 ohci_hcd 0000:00:13.4: OHCI Host Controller ohci_hcd 0000:00:13.4: new USB bus registered, assigned bus number 6 ohci_hcd 0000:00:13.4: irq 18, io mem 0xfe02a000 usb 3-2: configuration #1 chosen from 1 choice input: Logitech USB Receiver as /class/input/input0 input: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:13.1-2 usb usb6: configuration #1 chosen from 1 choice hub 6-0:1.0: USB hub found Fixing up Logitech keyboard report descriptor hub 6-0:1.0: 2 ports detected input: Logitech USB Receiver as /class/input/input1 input,hiddev96: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:13.1-2 USB Universal Host Controller Interface driver v3.0 SCSI subsystem initialized ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level, low) -> IRQ 22 ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit ahci 0000:00:12.0: controller can't do PMP, turning off CAP_PMP ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode ahci 0000:00:12.0: flags: ncq sntf ilck pm led clo pio slum part scsi0 : ahci scsi1 : ahci scsi2 : ahci scsi3 : ahci ata1: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f100 irq 22 ata2: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f180 irq 22 ata3: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f200 irq 22 ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f280 irq 22 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata1.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133 ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32) ata1.00: configured for UDMA/133 ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: ATAPI: ASUS DRW-1814BLT, 1.04, max 
UDMA/66, ATAPI AN ata2.00: configured for UDMA/66 ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata3.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133 ata3.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32) ata3.00: configured for UDMA/133 ata4: SATA link down (SStatus 0 SControl 300) scsi 0:0:0:0: Direct-Access ATA ST3250620NS 3.AE PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sda: sda1 sda2 sda3 sda4 sd 0:0:0:0: [sda] Attached SCSI disk scsi 1:0:0:0: CD-ROM ASUS DRW-1814BLT 1.04 PQ: 0 ANSI: 5 scsi 2:0:0:0: Direct-Access ATA ST3250620NS 3.AE PQ: 0 ANSI: 5 sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB) sd 2:0:0:0: [sdb] Write Protect is off sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB) sd 2:0:0:0: [sdb] Write Protect is off sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or \ FUA sdb: sdb1 sdb2 sdb3 sdb4 sd 2:0:0:0: [sdb] Attached SCSI disk ACPI: PCI Interrupt 0000:00:14.1[A] -> GSI 16 (level, low) -> IRQ 16 scsi4 : pata_atiixp scsi5 : pata_atiixp ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf900 irq 14 ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf908 irq 15 kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. 
audit(1194951807.473:2): enforcing=1 old_enforcing=0 auid=4294967295
SELinux: policy loaded with handle_unknown=allow
audit(1194951807.722:3): policy loaded auid=4294967295

/proc/mtrr (from 2.6.24-rc3):
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
reg03: base=0xdff00000 (3583MB), size=   1MB: uncachable, count=1
reg04: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg05: base=0xf0000000 (3840MB), size= 128MB: write-combining, count=2
MTRRs are consistent with e820, so it's not a reflection of the MTRR problem. Had to check...
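For anyone wanting to repeat that check by hand, here is a rough sketch of it in C. The two tables are copied from the e820 map and the /proc/mtrr listing quoted in this report; the type and function names are made up for illustration, and the rule used (every usable RAM range must be fully covered by write-back registers and must not touch an uncachable or write-combining one) is a simplification of real MTRR overlap semantics, not what the kernel does.

```c
#include <stddef.h>
#include <stdint.h>

enum mtrr_type { MTRR_UNCACHABLE, MTRR_WRITE_COMBINING, MTRR_WRITE_BACK };

struct mem_range { uint64_t start, end; };            /* half-open: [start, end) */
struct mtrr_reg  { uint64_t base, size; enum mtrr_type type; };

#define MB(x) ((uint64_t)(x) << 20)

/* "usable" entries copied from the BIOS-e820 map in the boot log */
const struct mem_range e820_usable[] = {
	{ 0x0000000000000000ULL, 0x000000000009f000ULL },
	{ 0x0000000000100000ULL, 0x00000000dfee0000ULL },
	{ 0x0000000100000000ULL, 0x0000000120000000ULL },
};

/* reg00..reg05 copied from the /proc/mtrr listing */
const struct mtrr_reg mtrr_regs[] = {
	{ 0x00000000ULL,  MB(2048), MTRR_WRITE_BACK },
	{ 0x80000000ULL,  MB(1024), MTRR_WRITE_BACK },
	{ 0xc0000000ULL,  MB(512),  MTRR_WRITE_BACK },
	{ 0xdff00000ULL,  MB(1),    MTRR_UNCACHABLE },
	{ 0x100000000ULL, MB(512),  MTRR_WRITE_BACK },
	{ 0xf0000000ULL,  MB(128),  MTRR_WRITE_COMBINING },
};

/* Returns 1 if range r is entirely write-back under the simplified rule. */
int ram_range_is_write_back(struct mem_range r, const struct mtrr_reg *m, size_t n)
{
	uint64_t cursor = r.start;
	size_t i;

	/* Any overlap with a non-write-back register disqualifies the range. */
	for (i = 0; i < n; i++) {
		if (m[i].type != MTRR_WRITE_BACK &&
		    m[i].base < r.end && m[i].base + m[i].size > r.start)
			return 0;
	}

	/* Walk the range, hopping from one covering write-back register to the next. */
	while (cursor < r.end) {
		int advanced = 0;

		for (i = 0; i < n; i++) {
			if (m[i].type == MTRR_WRITE_BACK &&
			    m[i].base <= cursor && cursor < m[i].base + m[i].size) {
				cursor = m[i].base + m[i].size;
				advanced = 1;
				break;
			}
		}
		if (!advanced)
			return 0;   /* hole: nothing covers this address */
	}
	return 1;
}

/* Returns 1 if every usable e820 range passes the check above. */
int mtrr_consistent_with_e820(void)
{
	size_t i;

	for (i = 0; i < sizeof(e820_usable) / sizeof(e820_usable[0]); i++) {
		if (!ram_range_is_write_back(e820_usable[i], mtrr_regs,
					     sizeof(mtrr_regs) / sizeof(mtrr_regs[0])))
			return 0;
	}
	return 1;
}
```

With the values above, usable RAM ends at 0xdfee0000, just below the 1MB uncachable register at 0xdff00000, and the RAM above 4 GB is covered by reg04, which is why the tables look consistent.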
We have reverted that commit:

commit bc84cf17b50ca5b49bec0a5fef63c58c1526d46b
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Nov 26 20:42:19 2007 +0100

    x86: turn off iommu merge by default
Ever since the following commit on 2.6.25-rc5, SB600 has same problems as before (ie, no booting without mem=3500M):

changeset: 87398:88dc09676798
user: Jeff Garzik <jeff@garzik.org>
date: Wed Mar 05 07:53:06 2008 -0500
files: drivers/ata/ahci.c
description:
ahci: work around ATI SB600 h/w quirk

This addresses the recent ATI SB600 errata, where the hardware does not like 256-length PRD entries during FPDMA (aka NCQ).

It hurts performance on SB600, but it is more important to get a correct patch eliminating the data corruption/lockups, and then later on tune for performance.

We simply limit each command to a maximum of 255 sectors, on SB600.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

committer: Jeff Garzik <jeff@garzik.org>

diff -r 6e30d34a2bfe -r 88dc09676798 drivers/ata/ahci.c
--- a/drivers/ata/ahci.c Wed Mar 05 07:46:34 2008 -0500
+++ b/drivers/ata/ahci.c Wed Mar 05 07:53:06 2008 -0500
@@ -186,6 +186,7 @@ enum {
 	AHCI_HFLAG_NO_MSI = (1 << 5), /* no PCI MSI */
 	AHCI_HFLAG_NO_PMP = (1 << 6), /* no PMP */
 	AHCI_HFLAG_NO_HOTPLUG = (1 << 7), /* ignore PxSERR.DIAG.N */
+	AHCI_HFLAG_SECT255 = (1 << 8), /* max 255 sectors */
 
 	/* ap->flags bits */
 
@@ -255,6 +256,7 @@ static void ahci_p5wdh_error_handler(str
 static void ahci_p5wdh_error_handler(struct ata_port *ap);
 static void ahci_post_internal_cmd(struct ata_queued_cmd *qc);
 static int ahci_port_resume(struct ata_port *ap);
+static void ahci_dev_config(struct ata_device *dev);
 static unsigned int ahci_fill_sg(struct ata_queued_cmd *qc, void *cmd_tbl);
 static void ahci_fill_cmd_slot(struct ahci_port_priv *pp, unsigned int tag,
 			       u32 opts);
@@ -293,6 +295,8 @@ static const struct ata_port_operations
 	.check_status = ahci_check_status,
 	.check_altstatus = ahci_check_status,
 	.dev_select = ata_noop_dev_select,
+
+	.dev_config = ahci_dev_config,
 
 	.tf_read = ahci_tf_read,
 
@@ -425,7 +429,7 @@ static const struct ata_port_info ahci_p
 	/* board_ahci_sb600 */
 	{
 		AHCI_HFLAGS (AHCI_HFLAG_IGN_SERR_INTERNAL |
-				 AHCI_HFLAG_32BIT_ONLY | AHCI_HFLAG_NO_PMP),
+				 AHCI_HFLAG_SECT255 | AHCI_HFLAG_NO_PMP),
 		.flags = AHCI_FLAG_COMMON,
 		.link_flags = AHCI_LFLAG_COMMON,
 		.pio_mask = 0x1f, /* pio0-4 */
@@ -1174,6 +1178,14 @@ static void ahci_init_controller(struct
 	writel(tmp | HOST_IRQ_EN, mmio + HOST_CTL);
 	tmp = readl(mmio + HOST_CTL);
 	VPRINTK("HOST_CTL 0x%x\n", tmp);
+}
+
+static void ahci_dev_config(struct ata_device *dev)
+{
+	struct ahci_host_priv *hpriv = dev->link->ap->host->private_data;
+
+	if (hpriv->flags & AHCI_HFLAG_SECT255)
+		dev->max_sectors = 255;
 }
 
 static unsigned int ahci_dev_classify(struct ata_port *ap)

Upon reverting the change, the system boots & works just fine, with no mem= parameter.

Thanks
I'm a bit confused... Please confirm: you have to revert commit

==============================================
a878539ef994787c447a98c2e3ba0fe3dad984ec
Author: Jeff Garzik <jeff@garzik.org>
Date: Thu Feb 28 15:43:48 2008 -0500

    ahci: work around ATI SB600 h/w quirk
==============================================

in order to get things working?
Correct. Despite AMD/ATI claiming the hardware is capable of >32-bit addressing, either it is not (contrary to the H/W designers' opinion), or there is some other underlying problem in Linux that prevents it from operating fully. Yes, by reverting the above patch, Linux indeed boots & works normally. (The patch itself is curious: it not only adds AHCI_HFLAG_SECT255, which seems to be the intention of the patch (from the description), but it also quietly removes AHCI_HFLAG_32BIT_ONLY. Strange. It's not clear whether AHCI_HFLAG_SECT255 & AHCI_HFLAG_32BIT_ONLY are incompatible with each other; if so, I apologise for this analogy.) Thanks.
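On the question of whether the two flags can coexist: as plain bit flags they occupy different bits, so nothing at the C level stops a board entry from carrying both. The sketch below only illustrates that point. The values of AHCI_HFLAG_NO_PMP and AHCI_HFLAG_SECT255 are taken from the quoted diff; the values of the other two flags are assumptions (they are not shown in the diff), and the helper functions are hypothetical, not part of ahci.c.

```c
enum {
	AHCI_HFLAG_IGN_SERR_INTERNAL = (1 << 2), /* assumed value, not shown in the diff */
	AHCI_HFLAG_32BIT_ONLY        = (1 << 3), /* assumed value, not shown in the diff */
	AHCI_HFLAG_NO_PMP            = (1 << 6), /* from the diff */
	AHCI_HFLAG_SECT255           = (1 << 8), /* from the diff */
};

/* SB600 flags before the quoted change: 32-bit DMA forced, no sector limit. */
unsigned int sb600_hflags_before(void)
{
	return AHCI_HFLAG_IGN_SERR_INTERNAL | AHCI_HFLAG_32BIT_ONLY | AHCI_HFLAG_NO_PMP;
}

/* SB600 flags after the quoted change: sector limit added, 32-bit limit gone. */
unsigned int sb600_hflags_after(void)
{
	return AHCI_HFLAG_IGN_SERR_INTERNAL | AHCI_HFLAG_SECT255 | AHCI_HFLAG_NO_PMP;
}

/* Hypothetical combination keeping both restrictions. */
unsigned int sb600_hflags_both(void)
{
	return sb600_hflags_after() | AHCI_HFLAG_32BIT_ONLY;
}

/* Returns 1 if the given flag bit is set. */
int has_hflag(unsigned int flags, unsigned int bit)
{
	return (flags & bit) != 0;
}
```

Whether the hardware behaves with both restrictions active is a separate question that this does not answer.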
It seems AHCI_HFLAG_SECT255 doesn't work. It still uses 256 sectors (131072 bytes).
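For reference, the arithmetic behind those numbers, as a small sketch: with 512-byte sectors, 256 sectors is exactly 131072 bytes, while a 255-sector cap would give 130560. The helper names here are hypothetical and this is not the libata request path, only the intended effect of a max_sectors limit.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u   /* bytes per sector, as reported for sda/sdb in the boot log */

/* Size in bytes of a transfer of the given number of sectors. */
uint32_t sectors_to_bytes(uint32_t sectors)
{
	return sectors * SECTOR_SIZE;
}

/* What a per-device sector limit is meant to do to a request size. */
uint32_t clamp_sectors(uint32_t requested, uint32_t max_sectors)
{
	return requested > max_sectors ? max_sectors : requested;
}
```

So seeing 131072-byte commands suggests the 255 limit is not being applied to those requests.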
Hi, Srihari Vijayaraghavan. If you still have the issue after the AHCI_HFLAG_SECT255 commit, could you let us know your log messages and hardware configuration? Thanks
Hi, Srihari Vijayaraghavan If your error message is "ata3.00: failed to IDENTIFY (I/O error," , could you please attach the full output of lspci -xxx ?
Correct, indeed it goes in a loop throwing the above message:

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1: failed to recover some devices, retrying in 5 secs

The same messages appear for ata2 as well (second identical hard drive in the system). The lspci -xxx output is attached as a text file.

Thanks

PS: the same kernel boots & works if I use mem=3500m.
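A rough sketch of why mem=3500M makes a difference, using the usable ranges from the e820 map in this report: capping RAM at 3500 MB drops everything above 4 GB, so no buffer can end up at an address that needs more than 32 bits. The function names are made up, and the clipping below is a simplification of how the kernel really handles mem=.

```c
#include <stddef.h>
#include <stdint.h>

struct ram_range { uint64_t start, end; };   /* half-open: [start, end) */

/* "usable" entries copied from the BIOS-e820 map in the boot log */
const struct ram_range usable_ram[] = {
	{ 0x0000000000000000ULL, 0x000000000009f000ULL },
	{ 0x0000000000100000ULL, 0x00000000dfee0000ULL },
	{ 0x0000000100000000ULL, 0x0000000120000000ULL },
};

#define FOUR_GIB 0x100000000ULL

/*
 * Highest usable physical address (exclusive) once all RAM at or above
 * mem_limit is dropped. Pass UINT64_MAX for "no mem= option".
 */
uint64_t highest_usable_end(const struct ram_range *map, size_t n, uint64_t mem_limit)
{
	uint64_t top = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t end = map[i].end;

		if (map[i].start >= mem_limit)
			continue;            /* range lies entirely above the limit */
		if (end > mem_limit)
			end = mem_limit;     /* range is cut at the limit */
		if (end > top)
			top = end;
	}
	return top;
}

/* Returns 1 if some RAM lies above 4 GB, i.e. DMA there needs >32-bit addresses. */
int ram_needs_64bit_dma(uint64_t top)
{
	return top > FOUR_GIB;
}
```

Without a limit the top of RAM is 0x120000000; with a 3500 MB limit it is 0xdac00000, which fits in 32 bits.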
Created attachment 15418 [details] lspci -xxx info of the hardware
Created attachment 15420 [details] ahci: SB600 workaround is suspect... play it safe for now To be on the safe side, I committed this patch as a /temporary/ fix, while we work to find the root cause of this problem. This does two things: 1) restores the 32-bit limit (temporary, while we investigate this further) 2) causes ahci_dev_config() to emit a message when the SB600 workaround is enabled, permitting us to see clearly from 'dmesg' whether or not the workaround was activated for a particular user.
Thanks Jeff! It seems Srihari's SB600 is an earlier one. I will check the difference with the hw engineer.
Regressions list annotation: Submitter : Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au> Date : 2008-03-12 17:15 Handled-By : Jeff Garzik <jgarzik@pobox.com> Handled-By : Richard Zhao <richard.zhao@amd.com> This entry is being used for tracking a regression from 2.6.24. Please don't close it until the problem is fixed in the mainline.
Should be fixed by 4cde32fc4b32e96a99063af3183acdfd54c563f0
Jeff and Tejun, I'm NOT able to reproduce this issue with my SB600 boards, neither can I find the SB600 SATA errata on 64 bit DMA capability, so it seems that the commit 4cde32fc4b32e96a99063af3183acdfd54c563f0 is NOT the root cause. Can you REOPEN this BZ first for further debug and root cause? Thanks.
Okay, reopened and reassigned to you. :-) Please note that "lack of evidence of the problem" isn't enough to back out the commit. We need to actually track down the problem and find out what happened and fix the problem to back out the original commit which either fixed or somehow worked around the problem. Srihari, do you still own the system?
Created attachment 21424 [details] Only disable 64 bit DMA for old revisions as a temp workaround
We really need to either reproduce the problem with the same board or get an actual report to confirm or get a valid explanation from hardware engineering for the behavior change between revisions. Thanks.
Tejun & Jeff, I get error messages when I send mail to Srihari; it seems that his mail address is no longer valid... As I mentioned before, I'm NOT able to reproduce this issue with my SB600 boards, whose revisions are newer than Srihari's (0x14 vs. 0x13). I have also checked with our HW design engineer again, and got this feedback: "I didn't see any SB600 64bit DMA design issue". So, there is no erratum on a SATA 64bit DMA flaw or a revision difference.

Quoting Jeff,
> To be on the safe side, I committed this patch as a /temporary/ fix,
> while we work to find the root cause of this problem.
> This does two things:

Even if the community needs a temp fix, we should NOT disable 64 bit DMA for all SB600 revisions, right? Srihari was using just one of them, yet the temp fix affects the performance of all SB600 chips.

My suggestion is to take two steps to resolve this issue:
1. Enable 64bit DMA first for new revisions with my above patch, or for ALL revisions except 0x13;
2. Continue debugging when we find a board which can reproduce it.

Step 1 might expose similar issues if there are any. Actually, with the information we have, we should withdraw commit 4cde32fc4b32e96a99063af3183acdfd54c563f0 directly until someone complains again.

Thanks
Shane
I don't really want to risk breaking unknown number of machines. Please note that this bug might lead to actual data corruption if the user is unlucky enough. It does suck to have IOs bouncing unnecessarily but for most users the slow down wouldn't be too bad whereas boot failure and possible data corruption is far more grave, so I really think we should take the safe path here. Identify the actual problem, reproduce or get an actual reporter to verify the issue and then proceed with proper fix. Thanks.
Tejun, But the situation is that we are not able to reproduce it yet, and Srihari's mail address is invalid for me (can you ping it too?). Without withdrawing the commit, nobody else will report the issue, if it exists, so that it can be debugged... Thanks.
Any chance you can acquire the same board (ASUS M2A-VM), flash it to the same BIOS version and try to reproduce the problem? The board is still on sale here in Korea and I can also find it easily on ebay, so it shouldn't be too difficult to get hold of one. I'm feeling quite unsure about going forward without knowing what happened w/ the original reporter and we don't have anything to back up the only-old-revisions-are-broken theory. :-( Thanks.
Tejun, As I mentioned before, I borrowed an ASUS M2A-VM HDMI board from another site, which can NOT reproduce the bug either. Purchasing a new board needs approval and a tedious process, which is not my first choice yet. Anyway, I will check again whether I can find an M2A-VM at this site. BTW, which BIOS version was Srihari using? I cannot find it. Thanks.
Aiee.. right, I should have asked for dmidecode output when the bug reporter was around. :-( The following page carries all the BIOS versions for the board. http://support.asus.com/download/download.aspx?SLanguage=en-us&model=M2A-VM Given the dates, it gotta be 1404 or anything before that. I suppose trying 1404 should be good enough. Thanks.
Tejun, Luckily I borrowed an M2A-VM board from another building, whose default BIOS revision is 2001. I'm still NOT able to reproduce this boot failure after enabling 64 bit DMA (4G system memory). The overnight SATA stress test with bonnie++ also works well; please check the dmidecode, lspci and dmesg files first. Next, I will try BIOS revision 1404...
Created attachment 21459 [details] lspci dump for ASUS M2A-VM
Created attachment 21460 [details] dmidecode for M2A-VM with BIOS rev.2001
Created attachment 21461 [details] dmesg for M2A-VM with BIOS rev.2001
Great. :-) Just to be sure, can you please repeatedly run "find / -type f -exec cat \{\} \; > /dev/null" along with fsck on an unmounted filesystem? That will fill the page cache so that >4G pages are used and fsck will complain if the data it's seeing is inconsistent. Thanks.
Tejun, I failed to downgrade the BIOS to an older revision (like 1404 or 0901) after trying three different methods provided by the ASUS M2A-VM manual: AWDFlash, EZ Flash 2, and ASUS Update for Windows (for M2A-VM-1404.exe). Do you have any suggestions? BTW, I will verify data integrity later.
I'd really like some confirmation of the original issue but it looks like you've tried pretty much everything you can to reproduce the problem. I'm still a bit worried. Also, it looks like the SMBus controller revision is different - yours is newer. Argh... If the data integrity test works out fine then well I guess we can lift the restriction on newer revisions (btw, how do we tell?) and see how it goes. Thanks.
Tejun, Yes, the revision on my board is newer than Srihari's (0x14 vs. 0x13); I also noticed that, but this is the only M2A-VM I can get. BTW, the BIOS has been downgraded to rev. 0901 with another AWDFLASH, which was not from the ASUS website. Upgrading to rev. 1404 needs a Windows environment, so it will NOT be tried at this time.
Aha, I know the root cause: it is an ASUS BIOS bug in old revisions like 0901. With rev. 0901, this issue can be reproduced easily, and forcing 32bit DMA does fix it as a workaround. Then I will submit a patch to withdraw AHCI_HFLAG_32BIT_ONLY for SB600.
Created attachment 21463 [details] dmidecode for M2A-VM with BIOS rev.0901
Created attachment 21464 [details] dmesg for M2A-VM with BIOS rev.0901
Great, thanks a lot for tracking it down. Is there anything other than dmi version string which can match these older BIOSen? The version string is a bit unwieldy and might be formatted differently for other versions. Well, at any rate, we can replace general blacklisting of SB600 with more specific one. :-)
> Is there anything other than dmi version string which can match > these older BIOSen? I do not know... I will submit a patch for this issue later, after some small double check on SB600 old revision board(like 0x13), if I can find one.
Created attachment 21480 [details] Restore SB600 SATA controller 64 bit DMA
Tejun, I have checked that our SB600 reference board(SMBus rev. 0x13) can NOT reproduce Srihari's issue with 64 bit DMA enabled plus 4G system memory, so it is NOT related to SB600 revisions. Please review my above patch, which will be submitted to mailing list soon. Thanks.
The patch looks fine in itself but we can't leave the current users who would be benefiting from the wide blacklisting out in the cold. Can you please implement asus board and bios revision match? Maybe we can match using release date? Thanks.
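To make the release-date idea concrete, here is a small user-space sketch of such a match. It assumes the BIOS date comes as an "MM/DD/YYYY" string (the form dmidecode prints) and that the board name string is exactly "M2A-VM"; both are assumptions, as are the function names and the cutoff date passed in by the caller. It is not the quirk from any attached patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert an "MM/DD/YYYY" BIOS date string to a comparable integer YYYYMMDD.
 * Returns -1 if the string is missing or does not parse.
 */
int dmi_date_to_int(const char *s)
{
	int month, day, year;

	if (s == NULL || sscanf(s, "%2d/%2d/%4d", &month, &day, &year) != 3)
		return -1;
	if (month < 1 || month > 12 || day < 1 || day > 31 || year < 1980)
		return -1;
	return year * 10000 + month * 100 + day;
}

/*
 * Returns 1 if the 32-bit DMA workaround should stay on: the board matches
 * and the BIOS is older than first_good_date (YYYYMMDD, chosen by the caller).
 * An unparsable date on a matching board is treated as "old", to stay safe.
 */
int needs_32bit_workaround(const char *board, const char *bios_date, int first_good_date)
{
	int date;

	if (board == NULL || strcmp(board, "M2A-VM") != 0)
		return 0;

	date = dmi_date_to_int(bios_date);
	return date < 0 || date < first_good_date;
}
```

Comparing dates as integers avoids depending on how the version string is formatted, which was the concern with matching on it directly. Treating an unparsable date as old is a deliberately conservative choice here.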
Tejun, > Can you please implement asus board and bios revision match? I'm sorry that I'm not able to implement it, because I have to switch to other tasks, the borrowed ASUS M2A-VM will also have to be returned. I still believe end users should upgrade the system BIOS, the rev. 2001 and 2302(latest one) have been verified to work well.
Aiee... Can you then please keep it for a few more hours? I'll try to brew something.
OK. Can I submit my above patch first, and you add another patch for ASUS old BIOS revisions?
I hope it can be done the other way around so that M2A isn't broken inbetween. :-)
Created attachment 21483 [details] asus_m2a_vm_quirk.patch Can you please verify the attached workaround triggers for older BIOS while 64bit is enabled for newer ones? Thanks.
Created attachment 21525 [details] asus_m2a_vm_quirk.patch (updated)
Tejun, Some comments on your patch:
1. AHCI_HFLAG_32BIT_ONLY should be removed for board_ahci_sb600 by default;
2. I tried more BIOS revisions, and found that the first BIOS rev. which works is 1501 instead of 2001 (the last bad one is 1404);
3. M2A-VM HDMI contains the same bug, but the quirk was also covered by our patch.

Please check the above updated one, which already passed my test, and mark yours as obsolete. Thanks.
Created attachment 21526 [details] dmidecode for M2A-VM with BIOS rev.1501
Yeah, the patch was supposed to come before your patch to lift the wide 64bit restriction, so doesn't contain the change itself. If you've tested the combined patch, can you please take the patch and send it upstream with me cc'd? Please feel free to take the credit as you did most of heavy lifting here. Thanks.
Tejun, please submit the patch, since you wrote the correct quirk for ASUS M2A-VM and we still think it's not necessary... :-) You can add my name to the CC list. Thanks
Ah... that was my not-so-sneaky attempt to push the work your way. Pretty please. :-)
Tejun, Patch was sent just now, please check it. Thanks
Thanks. Much appreciated. :-)