Bug 10738
Summary: | sata_via hang and data corruption | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Matteo Croce (rootkit85) |
Component: | Serial ATA | Assignee: | Tejun Heo (htejun) |
Status: | REJECTED INVALID | ||
Severity: | high | CC: | bunk, devzero |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | the full syslog; the lspci -nnvvv output; bootlog with libata.force=1:udma33 | ||
Description
Matteo Croce
2008-05-17 19:11:47 UTC
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Sat, 17 May 2008 19:11:48 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10738
>
> Summary: sata_via hang and data corruption
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.25
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Serial ATA
> AssignedTo: jgarzik@pobox.com
> ReportedBy: rootkit85@yahoo.it
>
>
> Latest working kernel version: 2.6.22 (?)
> Earliest failing kernel version: 2.6.23 (?)

So it's a regression?

> Distribution: Ubuntu
> Hardware Environment: VIA motherboard (ASROCK P4V88), Maxtor SATA HD
> Software Environment: Linux 2.6.25 i386
> Problem Description: After a few minutes of usage the kernel hangs while
> saying:
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 25/00:08:85:75:25/00:00:25:00:00/e0 tag 0 dma 4096 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1.00: status: { DRDY }
> ata1: soft resetting link
> ata1.00: configured for UDMA/133
> ata1: EH complete
> sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 25/00:20:6d:ae:1d/00:00:24:00:00/e0 tag 0 dma 16384 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1.00: status: { DRDY }
> ata1: soft resetting link
> ata1.00: configured for UDMA/133
> ata1: EH complete
> sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> Steps to reproduce: power on and use the disk
>
> Any hint on what it could be? I can't remember the last working kernel; I will
> try some old ones soon.

I often have trouble working out which device driver is being used when a bug report comes past. This is one such case.

* Does 'irqpoll' work around the problem?
* What do you mean by data corruption?
* Please post the full boot log.

Sorry, I don't have the full bootlog now. The machine is remote and we powered it down because it was unusable. It was using the sata_via driver on a vanilla 2.6.25.4 kernel, not tainted. I will provide a full bootlog as soon as I can. Another thing: I _have_ to boot with "noapic", otherwise the HDD locks up very early.

akpm: sorry for not providing more details, but I don't have the machine locally.

Tejun: by corruption I mean that e2fsck can't repair the FS at boot time. I have to boot from CD and run e2fsck manually, and it fixes a LOT of errors. In fact I have to use the -y option to avoid spending an hour pressing the y key.

hint: you can easily test whether you are getting online data corruption:

dd if=/dev/urandom of=100M.dat bs=1M count=100; while true; do cp 100M.dat 100M.tmp; rm 100M.dat; mv 100M.tmp 100M.dat; sync; md5sum 100M.dat; done

Let that run for a while. If the checksums being printed change from time to time, you have a data-corruption problem.

Created attachment 16196
the full syslog
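The copy-and-checksum loop from the hint above can also be written as a bounded script that records every checksum instead of printing forever. Below is a minimal sketch in Python. It assumes a Unix system (for `os.sync`). The function names `md5_of` and `copy_checksum_loop` are hypothetical, and the 100 MiB default only mirrors the `dd ... count=100` of the shell version.

```python
from __future__ import annotations

import hashlib
import os
import shutil


def md5_of(path: str, chunk: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def copy_checksum_loop(path: str, size_mib: int = 100, rounds: int = 10) -> list[str]:
    """Mimic the shell hint: create a random file, then cp/rm/mv/sync/md5sum.

    Returns the digest observed after each round. On healthy hardware every
    entry is identical; a change between rounds points at data corruption.
    """
    # Like `dd if=/dev/urandom of=PATH bs=1M count=SIZE_MIB`.
    with open(path, "wb") as f:
        for _ in range(size_mib):
            f.write(os.urandom(1 << 20))

    tmp = path + ".tmp"
    digests = []
    for _ in range(rounds):
        shutil.copyfile(path, tmp)    # cp PATH PATH.tmp
        os.remove(path)               # rm PATH
        os.rename(tmp, path)          # mv PATH.tmp PATH
        os.sync()                     # sync: flush dirty data to the disk
        digests.append(md5_of(path))  # md5sum PATH
    return digests


if __name__ == "__main__":
    results = copy_checksum_loop("100M.dat")
    for i, digest in enumerate(results):
        print(i, digest)
    print("checksums stable" if len(set(results)) == 1 else "CHECKSUM CHANGED")
```

Like the shell loop, this may read the file back from the page cache instead of the disk. A file larger than RAM makes it more likely that each round actually exercises the disk.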
Created attachment 16197
the lspci -nnvvv output
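Each timeout report in this thread prints two sets of registers. The failed command's taskfile follows `cmd`, and the registers read back afterwards follow `res`. The Python sketch below decodes that notation. It assumes libata's field order of command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. `decode_taskfile`, `status_flags` and the small opcode and status tables are hypothetical helpers, not part of any kernel tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# Opcodes relevant to this report; anything else is treated as unknown.
ATA_COMMANDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}

# A few well-known ATA status register bits, highest bit first.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]


@dataclass
class Taskfile:
    command: int  # command opcode (cmd line) or status register (res line)
    feature: int
    count: int    # 16-bit sector count: (hob_nsect << 8) | nsect
    lba: int      # 48-bit LBA assembled from the six LBA bytes
    device: int


def decode_taskfile(text: str) -> Taskfile:
    """Decode 'CC/FF:NN:L0:L1:L2/HF:HN:H0:H1:H2/DD' as printed after cmd/res."""
    cmd, low, high, dev = text.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    # For LBA48 commands a raw count of 0 means 65536 sectors; not special-cased here.
    count = (hob_nsect << 8) | nsect
    return Taskfile(int(cmd, 16), feature, count, lba, int(dev, 16))


def status_flags(status: int) -> list[str]:
    """Names of the known status bits that are set in a status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


if __name__ == "__main__":
    cmd = decode_taskfile("25/00:08:85:75:25/00:00:25:00:00/e0")
    res = decode_taskfile("40/00:00:00:00:00/00:00:00:00:00/00")
    print(ATA_COMMANDS.get(cmd.command, "unknown"), "LBA", cmd.lba,
          cmd.count, "sectors =", cmd.count * 512, "bytes")
    print("status:", status_flags(res.command))
```

Decoded this way, the first report is a READ DMA EXT of 8 sectors. At 512 bytes per sector that is 4096 bytes, which matches the logged `dma 4096 in`. The `res` status byte 0x40 is DRDY alone, which matches `status: { DRDY }`.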
Andrew: here's the info, I hope it helps.

Tejun: I'll try in a few and let you know.

Here is the bootlog with irqpoll:

Linux version 2.6.25.1 (root@tracy-desktop) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #3 Fri May 16 00:22:49 CEST 2008
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003ff30000 (usable)
BIOS-e820: 000000003ff30000 - 000000003ff40000 (ACPI data)
BIOS-e820: 000000003ff40000 - 000000003fff0000 (ACPI NVS)
BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
Warning only 896MB will be used.
Use a HIGHMEM enabled kernel.
896MB LOWMEM available.
Entering add_active_range(0, 0, 229376) 0 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 229376
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 229376
On node 0 totalpages: 229376
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4064 pages, LIFO batch:0
Normal zone: 1760 pages used for memmap
Normal zone: 223520 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
DMI 2.3 present.
ACPI: RSDP 000F9BC0, 0014 (r0 ACPIAM)
ACPI: RSDT 3FF30000, 0030 (r1 A M I OEMRSDT 6000524 MSFT 97)
ACPI: FACP 3FF30200, 0081 (r2 A M I OEMFACP 6000524 MSFT 97)
ACPI: DSDT 3FF30360, 3417 (r1 P4V88 P4V88001 1 INTL 2002026)
ACPI: FACS 3FF40000, 0040
ACPI: APIC 3FF30300, 0052 (r1 A M I OEMAPIC 6000524 MSFT 97)
ACPI: OEMB 3FF40040, 003F (r1 A M I OEMBIOS 6000524 MSFT 97)
ACPI: PM-Timer IO Port: 0x808
Allocating PCI resources starting at 50000000 (gap: 40000000:bffc0000)
PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e4000
PM: Registered nosave memory: 00000000000e4000 - 0000000000100000
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 227584
Kernel command line: ro noapic irqpoll vga=773 quiet
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0349000 soft=c0348000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 3191.757 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 906512k/917504k available (1545k kernel code, 10508k reserved, 602k data, 172k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xffff6000 - 0xfffff000 ( 36 kB)
vmalloc : 0xf8800000 - 0xffff4000 ( 119 MB)
lowmem : 0xc0000000 - 0xf8000000 ( 896 MB)
.init : 0xc031a000 - 0xc0345000 ( 172 kB)
.data : 0xc02824cd - 0xc0318f08 ( 602 kB)
.text : 0xc0100000 - 0xc02824cd (1545 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
CPA: page pool initialized 1 of 1 pages preallocated
SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 6390.97 BogoMIPS (lpj=12781953)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
Compat vDSO mapped to ffffe000.
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 01
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 0k freed
ACPI: Core revision 20070126
ACPI: setting ELCR to 0200 (from 0c20)
net_namespace: 152 bytes
NET: Registered protocol family 16
No dock devices found.
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=1
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S3 S4 S5)
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 *10 11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 7 10 11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] - F0, should be E8 [20070126]
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 10 devices
ACPI: ACPI bus type pnp unregistered
SCSI subsystem initialized
libata version 3.00 loaded.
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
system 00:06: ioport range 0x295-0x296 has been reserved
system 00:06: ioport range 0x3e0-0x3e7 has been reserved
system 00:06: ioport range 0x4d0-0x4d1 has been reserved
system 00:06: ioport range 0x800-0x87f has been reserved
system 00:06: ioport range 0x400-0x41f has been reserved
system 00:07: iomem range 0xfec80000-0xfec800ff has been reserved
system 00:07: iomem range 0xfec00000-0xfec00fff has been reserved
system 00:07: iomem range 0xfee00000-0xfee00fff has been reserved
system 00:07: iomem range 0xfff80000-0xffffffff could not be reserved
system 00:09: iomem range 0x0-0x9ffff could not be reserved
system 00:09: iomem range 0xc0000-0xdffff could not be reserved
system 00:09: iomem range 0xe0000-0xfffff could not be reserved
system 00:09: iomem range 0x100000-0x3fffffff could not be reserved
system 00:09: iomem range 0x0-0x0 could not be reserved
PCI: Bridge: 0000:00:01.0
IO window: b000-bfff
MEM window: 0xffd00000-0xffdfffff
PREFETCH window: 0x00000000b7f00000-0x00000000f7efffff
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
Machine check exception polling timer started.
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler cfq registered (default)
PCI: VIA PCI bridge detected. Disabling DAC.
pci 0000:01:00.0: Boot video device
vesafb: framebuffer at 0xe0000000, mapped to 0xf8880000, using 1536k, total 16384k
vesafb: mode is 1024x768x8, linelength=1024, pages=20
vesafb: protected mode interface info at c000:57cb
vesafb: pmi: set display start = c00c585f, set palette = c00c58ab
vesafb: pmi: ports = b810 b816 b854 b838 b83c b85c b800 b804 b8b0 b8b2 b8b4
vesafb: scrolling: redraw
vesafb: Pseudocolor: size=8:8:8:8, shift=0:0:0:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
Real Time Clock Driver v1.12ac
Driver 'sd' needs updating - please use bus_type methods
sata_via 0000:00:0f.0: version 2.3
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:0f.0[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
sata_via 0000:00:0f.0: routed to hard irq line 10
scsi0 : sata_via
scsi1 : sata_via
ata1: SATA max UDMA/133 cmd 0xec00 ctl 0xe800 bmdma 0xdc00 irq 10
ata2: SATA max UDMA/133 cmd 0xe400 ctl 0xe000 bmdma 0xdc08 irq 10
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
ata1.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
Switched to high resolution mode on CPU 0
ata2: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access ATA MAXTOR STM332082 3.AA PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
cpuidle: using governor ladder
cpuidle: using governor menu
TCP cubic registered
NET: Registered protocol family 1
Using IPI Shortcut mode
input: AT Translated Set 2 keyboard as /class/input/input0
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 172k freed
ACPI: ACPI0007:00 is registered as cooling_device0
ACPI: Processor [CPU1] (supports 16 throttling states)
input: Power Button (FF) as /class/input/input1
ACPI: Power Button (FF) [PWRF]
input: Power Button (CM) as /class/input/input2
ACPI: Power Button (CM) [PWRB]
Linux agpgart interface v0.103
pata_via 0000:00:0f.1: version 0.3.3
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0f.1[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
pata_via 0000:00:0f.1: VIA VLink IRQ fixup, from 255 to 11
scsi2 : pata_via
scsi3 : pata_via
ata3: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xfc00 irq 14
ata4: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xfc08 irq 15
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
input: PC Speaker as /class/input/input3
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
sd 0:0:0:0: Attached scsi generic sg0 type 0
USB Universal Host Controller Interface driver v3.0
via-rhine.c:v1.10-LK1.4.3 2007-03-06 Written by Donald Becker
ata3.01: NODEV after polling detection
ata3.00: ATAPI: ATAPI DVD+RW 4X4X12, B1HY, max UDMA/33
ata3.00: configured for UDMA/33
ata4.00: ATAPI: CRD-8481B, 1.01, max MWDMA2
ata4.00: configured for MWDMA2
scsi 2:0:0:0: CD-ROM ATAPI DVD+RW 4X4X12 B1HY PQ: 0 ANSI: 5
scsi 2:0:0:0: Attached scsi generic sg1 type 5
scsi 3:0:0:0: CD-ROM LG CD-ROM CRD-8481B 1.01 PQ: 0 ANSI: 5
scsi 3:0:0:0: Attached scsi generic sg2 type 5
ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:10.0: UHCI Host Controller
uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:10.0: irq 11, io base 0x0000c000
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
Driver 'sr' needs updating - please use bus_type methods
sr0: scsi3-mmc drive: 12x/40x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 2:0:0:0: Attached scsi CD-ROM sr0
sr1: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
sr 3:0:0:0: Attached scsi CD-ROM sr1
ACPI: PCI Interrupt 0000:00:10.1[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:10.1: UHCI Host Controller
uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:10.1: irq 11, io base 0x0000c400
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:10.2[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.2: UHCI Host Controller
uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:10.2: irq 10, io base 0x0000c800
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:10.3[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.3: UHCI Host Controller
uhci_hcd 0000:00:10.3: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:10.3: irq 10, io base 0x0000cc00
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
PCI: setting IRQ 5 as level-triggered
ACPI: PCI Interrupt 0000:00:10.4[C] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
ehci_hcd 0000:00:10.4: EHCI Host Controller
ehci_hcd 0000:00:10.4: new USB bus registered, assigned bus number 5
ehci_hcd 0000:00:10.4: irq 5, io mem 0xffeff800
ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 8 ports detected
ACPI: PCI Interrupt 0000:00:12.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
eth0: VIA Rhine II at 0xffeffc00, 00:0b:6a:ad:8f:48, IRQ 11.
eth0: MII PHY found at address 1, status 0x7849 advertising 05e1 Link 0000.
ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
PCI: Setting latency timer of device 0000:00:11.5 to 64
eth0: link down
agpgart: Detected VIA PT880 chipset
agpgart: AGP aperture is 64M @ 0xf8000000
fuse init (API version 7.9)
Adding 996020k swap on /dev/sda3. Priority:-1 extents:1 across:996020k
hub 5-0:1.0: unable to enumerate USB device on port 5
usb 3-1: new low speed USB device using uhci_hcd and address 2
EXT3 FS on sda2, internal journal
usb 3-1: configuration #1 chosen from 1 choice
input: Acrox USB & PS/2 Mouse as /class/input/input4
input: USB HID v1.10 Mouse [Acrox USB & PS/2 Mouse] on usb-0000:00:10.2-1
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
IA-32 Microcode Update Driver: v1.14a <tigran@aivazian.fsnet.co.uk>
warning: `dnsmasq' uses 32-bit capabilities (legacy support in use)
Bluetooth: Core ver 2.11
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP ver 2.9
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[drm] Initialized radeon 1.28.0 20060524 on minor 0
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 8x mode
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:08:fd:6d:86/00:00:24:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:f0:0d:4f:55/00:00:24:00:00/e0 tag 0 dma 122880 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:08:55:bf:4d/00:00:1e:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
usb 5-7: new high speed USB device using ehci_hcd and address 3
usb 5-7: configuration #1 chosen from 1 choice
Initializing USB Mass Storage driver...
scsi4 : SCSI emulation for USB Mass Storage devices
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usb-storage: device found at 3
usb-storage: waiting for device to settle before scanning
usb-storage: device scan complete
scsi 4:0:0:0: Direct-Access Kingston DataTraveler 2.0 PMAP PQ: 0 ANSI: 0 CCS
sd 4:0:0:0: [sdb] 2015232 512-byte hardware sectors (1032 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 23 00 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] 2015232 512-byte hardware sectors (1032 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 23 00 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sdb: unknown partition table
sd 4:0:0:0: [sdb] Attached SCSI removable disk
sd 4:0:0:0: Attached scsi generic sg3 type 0

Hmmm... Strange. Data should never be corrupt, as all failed commands are retried. Does specifying libata.force=1:udma33 make any difference? Also, you said that e2fsck complains a lot during boot. Do you see a lot of IO errors while e2fsck is running?

I'll have physical access to the machine on Saturday. I'll check, thanks.

No, e2fsck doesn't give any errors or warnings. I run it from the Ubuntu live CD, which uses a 2.6.24 kernel IIRC.

It boots, then hangs, saying:
BUG: soft lockup - CPU0 stuck for 61s!

Should I remove noapic and irqpoll when booting with libata.force=1:udma33? I'm attaching the last boot log, in which I have the FS corruption.

Created attachment 16266
bootlog with libata.force=1:udma33
Here I have FS corruption.
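The bootlog above was captured with `libata.force=1:udma33`. The kernel's parameter documentation describes `libata.force` as a comma-separated list of `[ID:]VAL` entries. ID is `PORT[.DEVICE]`, matching the `ata1` / `ata1.00` IDs that libata prints in the log, and an entry without an ID reuses the previous port and device. Read that way, `1:udma33` applies to port 1 (`ata1`, the Maxtor disk here) and everything behind it, and forces UDMA/33. The hypothetical `parse_libata_force` below is a minimal Python sketch of that documented rule, not libata's own parser, and it does not check whether VAL is a valid setting.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ForceEntry:
    port: Optional[int]    # None: applies to all ports
    device: Optional[int]  # None: the port and everything behind it
    value: str             # e.g. "udma33", "1.5Gbps"


def parse_libata_force(param: str) -> list[ForceEntry]:
    """Split a libata.force value into entries following the "[ID:]VAL" rule.

    ID is PORT[.DEVICE]. An entry without an ID reuses the last PORT/DEVICE
    seen; before any ID appears, entries apply to every port and device.
    """
    entries: list[ForceEntry] = []
    port: Optional[int] = None
    device: Optional[int] = None
    for item in param.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            ident, value = item.split(":", 1)
            port_s, _, dev_s = ident.partition(".")
            port = int(port_s)
            device = int(dev_s) if dev_s else None
        else:
            value = item  # no ID: keep the previous port/device
        entries.append(ForceEntry(port, device, value))
    return entries


if __name__ == "__main__":
    for entry in parse_libata_force("1:udma33"):
        print(entry)
```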
tracy@tracy-desktop:~$ uptime
19:44:04 up 27 min, 1 user, load average: 0.20, 0.13, 0.17

I guess it's stable now, with UDMA33 forced and NOT irqpoll.

If the machine crashes without properly unmounting the filesystem, journal recovery is expected. e2fsck shouldn't be necessary, though. Without UDMA33, how reproducible is the problem?

It happens a few minutes after the system boots, sometimes when Xorg starts, sometimes before.

Sorry, it happens with UDMA33 too, although the system doesn't lock up completely. It stalls for a while, then the kernel prints the error, then it continues to work. e2fsck IS necessary: the ext3 driver detects an IO error and remounts the drive read-only.

Hmmm... Can you please post the kernel log here? You can plug in a USB stick before boot and keep it mounted. After the hard disk fails, run dmesg > MOUNTPOINT/dmesg.out. You probably want to run dmesg regularly while the hard disk is still working normally, so that the dmesg binary stays in memory.

I'm not at the PC now, but the error was the same: "action 0x2 frozen" followed by:
ata1.00: status: { DRDY }
Could it be a damaged HD? But then why can I run fsck for a long time without errors?

Can you please post the log when you have access to the PC? I really need to look at the full log to learn more about the failure. Thanks.

Ethernet and IDE stopped working too, so the motherboard (ASRock) is probably just crap. Closing the bug (and trashing the motherboard).
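The suggestion above can be automated: keep a USB stick mounted and refresh a copy of the kernel log while the disk still works, so the log is still there after the disk fails. Below is a minimal Python sketch, assuming the stick is already mounted. `capture_dmesg`, `capture_loop`, the 30-second interval and the `.new` temporary file are illustrative choices, not part of the original advice. `MOUNTPOINT` is the same placeholder used in the suggestion. Depending on the system, reading the kernel log may require root.

```python
from __future__ import annotations

import os
import subprocess
import time


def capture_dmesg(out_path: str, cmd: list[str] | None = None) -> int:
    """Run dmesg once and atomically replace out_path with its output.

    Returns the number of bytes written.
    """
    result = subprocess.run(cmd or ["dmesg"], capture_output=True, check=True)
    tmp = out_path + ".new"
    with open(tmp, "wb") as f:
        f.write(result.stdout)
        f.flush()
        os.fsync(f.fileno())  # push the copy out to the USB stick now
    os.replace(tmp, out_path)  # readers never see a half-written file
    return len(result.stdout)


def capture_loop(out_path: str, interval_s: float = 30.0,
                 rounds: int | None = None, cmd: list[str] | None = None) -> None:
    """Refresh out_path every interval_s seconds (forever if rounds is None)."""
    done = 0
    while rounds is None or done < rounds:
        try:
            capture_dmesg(out_path, cmd)
        except (OSError, subprocess.CalledProcessError) as exc:
            # Keep going: one failed capture should not stop later ones.
            print(f"dmesg capture failed: {exc}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)


if __name__ == "__main__":
    # Replace MOUNTPOINT with the USB stick's mount point.
    capture_loop("MOUNTPOINT/dmesg.out")
```

Writing to a temporary file and renaming it keeps the previous good copy intact if the machine hangs in the middle of a write.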