Bug 10738 - sata_via hang and data corruption
Summary: sata_via hang and data corruption
Status: REJECTED INVALID
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 high
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-17 19:11 UTC by Matteo Croce
Modified: 2008-09-09 12:20 UTC

See Also:
Kernel Version: 2.6.25
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
the full syslog (14.65 KB, text/plain)
2008-05-19 07:09 UTC, Matteo Croce
the lspci -nnvvv output (11.87 KB, text/plain)
2008-05-19 07:11 UTC, Matteo Croce
bootlog with libata.force=1:udma33 (17.47 KB, text/plain)
2008-05-24 08:49 UTC, Matteo Croce

Description Matteo Croce 2008-05-17 19:11:47 UTC
Latest working kernel version: 2.6.22 (?)
Earliest failing kernel version: 2.6.23 (?)
Distribution: Ubuntu
Hardware Environment: VIA motherboard (ASROCK P4V88), Maxtor SATA HD
Software Environment: Linux 2.6.25 i386
Problem Description: After a few minutes of usage the kernel hangs while saying:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:08:85:75:25/00:00:25:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:20:6d:ae:1d/00:00:24:00:00/e0 tag 0 dma 16384 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Steps to reproduce: power on and use the disk

Any hint on what it could be? I can't remember the last working kernel; I will try some older ones soon.
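
For readers unfamiliar with the "cmd" lines in the log above, here is a minimal Python sketch that decodes one. It assumes the field order libata uses in its error report (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and an LBA48 command with 512-byte sectors. The function name `decode_cmd` and the small opcode table are illustrative, not part of any kernel tool.

```python
import re

# Small illustrative subset of ATA opcodes (not a complete table).
ATA_COMMANDS = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
}

# Twelve hex bytes separated by '/' or ':' following the word "cmd".
TF_RE = re.compile(r"cmd\s+((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd(line):
    """Decode the taskfile of a libata 'cmd' log line (LBA48 assumed)."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    fields = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    # 48-bit LBA: the "hob" (high order byte) registers hold bits 24..47.
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) \
        | (lbah << 16) | (lbam << 8) | lbal
    # In LBA48 commands a sector count of 0 means 65536 sectors.
    sectors = ((hob_nsect << 8) | nsect) or 65536
    return {
        "opcode": command,
        "name": ATA_COMMANDS.get(command, "unknown"),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte sectors, as the sd log reports
        "device": device,
    }
```

Applied to the first failing command, `decode_cmd("ata1.00: cmd 25/00:08:85:75:25/00:00:25:00:00/e0 tag 0 dma 4096 in")` gives a READ DMA EXT of 8 sectors, which agrees with the "dma 4096 in" printed on the same line. In other words, the timeouts reported here are on ordinary DMA reads.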
Comment 1 Anonymous Emailer 2008-05-17 22:00:08 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 17 May 2008 19:11:48 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10738
> 
>            Summary: sata_via hang and data corruption
>            Product: IO/Storage
>            Version: 2.5
>      KernelVersion: 2.6.25
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Serial ATA
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: rootkit85@yahoo.it
> 
> 
> Latest working kernel version: 2.6.22 (?)
> Earliest failing kernel version: 2.6.23 (?)

So it's a regression?

> Distribution: Ubuntu
> Hardware Environment: VIA motherboard (ASROCK P4V88), Maxtor SATA HD
> Software Environment: Linux 2.6.25 i386
> Problem Description: After a few minutes of usage the kernel hangs while
> saying:
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 25/00:08:85:75:25/00:00:25:00:00/e0 tag 0 dma 4096 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1.00: status: { DRDY }
> ata1: soft resetting link
> ata1.00: configured for UDMA/133
> ata1: EH complete
> sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support
> DPO or FUA
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 25/00:20:6d:ae:1d/00:00:24:00:00/e0 tag 0 dma 16384 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1.00: status: { DRDY }
> ata1: soft resetting link
> ata1.00: configured for UDMA/133
> ata1: EH complete
> sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support
> DPO or FUA
> 
> Steps to reproduce: power on and use the disk
> 
> Any hint on what it could be? I can't remember the last working kernel; I will
> try some older ones soon
> 

I often have trouble working out which device driver is being used when
a bug report comes past.  This is one such case.
Comment 2 Tejun Heo 2008-05-17 22:24:27 UTC
* Does 'irqpoll' work around the problem?

* What do you mean by data corruption?

* Please post full boot log.
Comment 3 Matteo Croce 2008-05-18 07:07:58 UTC
Sorry, I don't have the full boot log right now; the machine is remote and we powered it down as it was unusable.
It was using the sata_via driver on a vanilla 2.6.25.4 kernel, not tainted.
I will provide a full boot log as soon as I can.

Another thing: I _have_ to boot with "noapic", otherwise the HDD locks up very early.

akpm: sorry for not providing many details, but I don't have the machine locally.
Comment 4 Matteo Croce 2008-05-18 07:09:44 UTC
Tejun:
By corruption I mean that e2fsck can't repair the FS at boot time, so I have to boot from CD and run e2fsck manually. It then fixes a LOT of errors; in fact I have to use the -y option to avoid spending an hour pressing the y key.
Comment 5 Roland Kletzing 2008-05-18 07:38:44 UTC
Hint: you can easily test whether you are getting online data corruption.

dd if=/dev/urandom of=100M.dat bs=1M count=100;while true;do cp 100M.dat 100M.tmp;rm 100M.dat;mv 100M.tmp 100M.dat;sync;md5sum 100M.dat;done

Let that run for a while.

If the checksums being printed change from time to time, you have a data corruption problem.
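
The copy-and-checksum loop above can also be sketched in Python, which makes it easier to record exactly which round produced a different checksum. This is only an illustration of the same idea: the function names `md5_of` and `copy_and_verify` are made up for this example, and the file names simply mirror the ones in the one-liner.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def copy_and_verify(directory, size_mb=100, rounds=10):
    """Write random data, copy it repeatedly, and report checksum changes.

    Returns (reference_digest, mismatches), where mismatches is a list of
    (round_number, digest) pairs for rounds whose digest differed.
    """
    data = os.path.join(directory, "100M.dat")
    tmp = os.path.join(directory, "100M.tmp")
    with open(data, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1 << 20))
        f.flush()
        os.fsync(f.fileno())
    reference = md5_of(data)
    mismatches = []
    for i in range(rounds):
        shutil.copyfile(data, tmp)   # cp 100M.dat 100M.tmp
        os.replace(tmp, data)        # rm 100M.dat; mv 100M.tmp 100M.dat
        if hasattr(os, "sync"):      # os.sync() is not available on every platform
            os.sync()
        digest = md5_of(data)
        if digest != reference:
            mismatches.append((i, digest))
    return reference, mismatches
```

One caveat that applies to both versions: a freshly written file is usually read back from the page cache rather than from the disk, so a clean run does not prove the data on the platters is intact. Using a file larger than RAM, or dropping caches between rounds, makes the test more meaningful.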
Comment 6 Matteo Croce 2008-05-19 07:09:56 UTC
Created attachment 16196
the full syslog
Comment 7 Matteo Croce 2008-05-19 07:11:22 UTC
Created attachment 16197
the lspci -nnvvv output
Comment 8 Matteo Croce 2008-05-19 07:32:50 UTC
Andrew: here's the info; I hope it helps.
Tejun: I'll try in a bit and let you know.
Comment 9 Matteo Croce 2008-05-19 10:56:41 UTC
Here is the boot log with irqpoll:

Linux version 2.6.25.1 (root@tracy-desktop) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #3 Fri May 16 00:22:49 CEST 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003ff30000 (usable)
 BIOS-e820: 000000003ff30000 - 000000003ff40000 (ACPI data)
 BIOS-e820: 000000003ff40000 - 000000003fff0000 (ACPI NVS)
 BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
Warning only 896MB will be used.
Use a HIGHMEM enabled kernel.
896MB LOWMEM available.
Entering add_active_range(0, 0, 229376) 0 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->   229376
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0:        0 ->   229376
On node 0 totalpages: 229376
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 1760 pages used for memmap
  Normal zone: 223520 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
DMI 2.3 present.
ACPI: RSDP 000F9BC0, 0014 (r0 ACPIAM)
ACPI: RSDT 3FF30000, 0030 (r1 A M I  OEMRSDT   6000524 MSFT       97)
ACPI: FACP 3FF30200, 0081 (r2 A M I  OEMFACP   6000524 MSFT       97)
ACPI: DSDT 3FF30360, 3417 (r1  P4V88 P4V88001        1 INTL  2002026)
ACPI: FACS 3FF40000, 0040
ACPI: APIC 3FF30300, 0052 (r1 A M I  OEMAPIC   6000524 MSFT       97)
ACPI: OEMB 3FF40040, 003F (r1 A M I  OEMBIOS   6000524 MSFT       97)
ACPI: PM-Timer IO Port: 0x808
Allocating PCI resources starting at 50000000 (gap: 40000000:bffc0000)
PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e4000
PM: Registered nosave memory: 00000000000e4000 - 0000000000100000
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 227584
Kernel command line: ro noapic irqpoll vga=773 quiet
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0349000 soft=c0348000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 3191.757 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 906512k/917504k available (1545k kernel code, 10508k reserved, 602k data, 172k init, 0k highmem)
virtual kernel memory layout:
    fixmap  : 0xffff6000 - 0xfffff000   (  36 kB)
    vmalloc : 0xf8800000 - 0xffff4000   ( 119 MB)
    lowmem  : 0xc0000000 - 0xf8000000   ( 896 MB)
      .init : 0xc031a000 - 0xc0345000   ( 172 kB)
      .data : 0xc02824cd - 0xc0318f08   ( 602 kB)
      .text : 0xc0100000 - 0xc02824cd   (1545 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
CPA: page pool initialized 1 of 1 pages preallocated
SLUB: Genslabs=12, HWalign=128, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 6390.97 BogoMIPS (lpj=12781953)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
Compat vDSO mapped to ffffe000.
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 01
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 0k freed
ACPI: Core revision 20070126
ACPI: setting ELCR to 0200 (from 0c20)
net_namespace: 152 bytes
NET: Registered protocol family 16
No dock devices found.
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=1
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S3 S4 S5)
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 *10 11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 7 10 11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] -  F0, should be E8 [20070126]
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 10 devices
ACPI: ACPI bus type pnp unregistered
SCSI subsystem initialized
libata version 3.00 loaded.
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
system 00:06: ioport range 0x295-0x296 has been reserved
system 00:06: ioport range 0x3e0-0x3e7 has been reserved
system 00:06: ioport range 0x4d0-0x4d1 has been reserved
system 00:06: ioport range 0x800-0x87f has been reserved
system 00:06: ioport range 0x400-0x41f has been reserved
system 00:07: iomem range 0xfec80000-0xfec800ff has been reserved
system 00:07: iomem range 0xfec00000-0xfec00fff has been reserved
system 00:07: iomem range 0xfee00000-0xfee00fff has been reserved
system 00:07: iomem range 0xfff80000-0xffffffff could not be reserved
system 00:09: iomem range 0x0-0x9ffff could not be reserved
system 00:09: iomem range 0xc0000-0xdffff could not be reserved
system 00:09: iomem range 0xe0000-0xfffff could not be reserved
system 00:09: iomem range 0x100000-0x3fffffff could not be reserved
system 00:09: iomem range 0x0-0x0 could not be reserved
PCI: Bridge: 0000:00:01.0
  IO window: b000-bfff
  MEM window: 0xffd00000-0xffdfffff
  PREFETCH window: 0x00000000b7f00000-0x00000000f7efffff
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
Machine check exception polling timer started.
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler cfq registered (default)
PCI: VIA PCI bridge detected. Disabling DAC.
pci 0000:01:00.0: Boot video device
vesafb: framebuffer at 0xe0000000, mapped to 0xf8880000, using 1536k, total 16384k
vesafb: mode is 1024x768x8, linelength=1024, pages=20
vesafb: protected mode interface info at c000:57cb
vesafb: pmi: set display start = c00c585f, set palette = c00c58ab
vesafb: pmi: ports = b810 b816 b854 b838 b83c b85c b800 b804 b8b0 b8b2 b8b4 
vesafb: scrolling: redraw
vesafb: Pseudocolor: size=8:8:8:8, shift=0:0:0:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
Real Time Clock Driver v1.12ac
Driver 'sd' needs updating - please use bus_type methods
sata_via 0000:00:0f.0: version 2.3
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:0f.0[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
sata_via 0000:00:0f.0: routed to hard irq line 10
scsi0 : sata_via
scsi1 : sata_via
ata1: SATA max UDMA/133 cmd 0xec00 ctl 0xe800 bmdma 0xdc00 irq 10
ata2: SATA max UDMA/133 cmd 0xe400 ctl 0xe000 bmdma 0xdc08 irq 10
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
ata1.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
Switched to high resolution mode on CPU 0
ata2: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access     ATA      MAXTOR STM332082 3.AA PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
cpuidle: using governor ladder
cpuidle: using governor menu
TCP cubic registered
NET: Registered protocol family 1
Using IPI Shortcut mode
input: AT Translated Set 2 keyboard as /class/input/input0
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 172k freed
ACPI: ACPI0007:00 is registered as cooling_device0
ACPI: Processor [CPU1] (supports 16 throttling states)
input: Power Button (FF) as /class/input/input1
ACPI: Power Button (FF) [PWRF]
input: Power Button (CM) as /class/input/input2
ACPI: Power Button (CM) [PWRB]
Linux agpgart interface v0.103
pata_via 0000:00:0f.1: version 0.3.3
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0f.1[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
pata_via 0000:00:0f.1: VIA VLink IRQ fixup, from 255 to 11
scsi2 : pata_via
scsi3 : pata_via
ata3: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xfc00 irq 14
ata4: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xfc08 irq 15
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
input: PC Speaker as /class/input/input3
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
sd 0:0:0:0: Attached scsi generic sg0 type 0
USB Universal Host Controller Interface driver v3.0
via-rhine.c:v1.10-LK1.4.3 2007-03-06 Written by Donald Becker
ata3.01: NODEV after polling detection
ata3.00: ATAPI: ATAPI   DVD+RW 4X4X12, B1HY, max UDMA/33
ata3.00: configured for UDMA/33
ata4.00: ATAPI: CRD-8481B, 1.01, max MWDMA2
ata4.00: configured for MWDMA2
scsi 2:0:0:0: CD-ROM            ATAPI    DVD+RW 4X4X12    B1HY PQ: 0 ANSI: 5
scsi 2:0:0:0: Attached scsi generic sg1 type 5
scsi 3:0:0:0: CD-ROM            LG       CD-ROM CRD-8481B 1.01 PQ: 0 ANSI: 5
scsi 3:0:0:0: Attached scsi generic sg2 type 5
ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:10.0: UHCI Host Controller
uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:10.0: irq 11, io base 0x0000c000
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
Driver 'sr' needs updating - please use bus_type methods
sr0: scsi3-mmc drive: 12x/40x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 2:0:0:0: Attached scsi CD-ROM sr0
sr1: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
sr 3:0:0:0: Attached scsi CD-ROM sr1
ACPI: PCI Interrupt 0000:00:10.1[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:10.1: UHCI Host Controller
uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:10.1: irq 11, io base 0x0000c400
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:10.2[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.2: UHCI Host Controller
uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:10.2: irq 10, io base 0x0000c800
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:10.3[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.3: UHCI Host Controller
uhci_hcd 0000:00:10.3: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:10.3: irq 10, io base 0x0000cc00
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
PCI: setting IRQ 5 as level-triggered
ACPI: PCI Interrupt 0000:00:10.4[C] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
ehci_hcd 0000:00:10.4: EHCI Host Controller
ehci_hcd 0000:00:10.4: new USB bus registered, assigned bus number 5
ehci_hcd 0000:00:10.4: irq 5, io mem 0xffeff800
ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 8 ports detected
ACPI: PCI Interrupt 0000:00:12.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
eth0: VIA Rhine II at 0xffeffc00, 00:0b:6a:ad:8f:48, IRQ 11.
eth0: MII PHY found at address 1, status 0x7849 advertising 05e1 Link 0000.
ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
PCI: Setting latency timer of device 0000:00:11.5 to 64
eth0: link down
agpgart: Detected VIA PT880 chipset
agpgart: AGP aperture is 64M @ 0xf8000000
fuse init (API version 7.9)
Adding 996020k swap on /dev/sda3.  Priority:-1 extents:1 across:996020k
hub 5-0:1.0: unable to enumerate USB device on port 5
usb 3-1: new low speed USB device using uhci_hcd and address 2
EXT3 FS on sda2, internal journal
usb 3-1: configuration #1 chosen from 1 choice
input: Acrox USB & PS/2 Mouse as /class/input/input4
input: USB HID v1.10 Mouse [Acrox USB & PS/2 Mouse] on usb-0000:00:10.2-1
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
IA-32 Microcode Update Driver: v1.14a <tigran@aivazian.fsnet.co.uk>
warning: `dnsmasq' uses 32-bit capabilities (legacy support in use)
Bluetooth: Core ver 2.11
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP ver 2.9
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[drm] Initialized radeon 1.28.0 20060524 on minor 0
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 8x mode
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:08:fd:6d:86/00:00:24:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:f0:0d:4f:55/00:00:24:00:00/e0 tag 0 dma 122880 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:08:55:bf:4d/00:00:1e:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
usb 5-7: new high speed USB device using ehci_hcd and address 3
usb 5-7: configuration #1 chosen from 1 choice
Initializing USB Mass Storage driver...
scsi4 : SCSI emulation for USB Mass Storage devices
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usb-storage: device found at 3
usb-storage: waiting for device to settle before scanning
usb-storage: device scan complete
scsi 4:0:0:0: Direct-Access     Kingston DataTraveler 2.0 PMAP PQ: 0 ANSI: 0 CCS
sd 4:0:0:0: [sdb] 2015232 512-byte hardware sectors (1032 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 23 00 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] 2015232 512-byte hardware sectors (1032 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 23 00 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
 sdb: unknown partition table
sd 4:0:0:0: [sdb] Attached SCSI removable disk
sd 4:0:0:0: Attached scsi generic sg3 type 0
Comment 10 Tejun Heo 2008-05-22 00:45:53 UTC
Hmmm... Strange.  Data should never be corrupt as all failed commands are retried.

Does specifying libata.force=1:udma33 make any difference?
Comment 11 Tejun Heo 2008-05-22 00:47:24 UTC
Also, you said that e2fsck complains a lot during boot.  Do you see a lot of IO errors while e2fsck is running?
Comment 12 Matteo Croce 2008-05-22 17:17:02 UTC
I'll have physical access to the machine on Saturday; I'll check then, thanks.

No, e2fsck doesn't give any error or warning.
I run it from the Ubuntu live CD, which uses a 2.6.24 kernel IIRC.
Comment 13 Matteo Croce 2008-05-24 08:42:57 UTC
It boots, then hangs saying:
BUG: soft lockup - CPU0 stuck for 61s!

Should I remove noapic and irqpoll when using libata.force=1:udma33?
Comment 14 Matteo Croce 2008-05-24 08:48:38 UTC
I am attaching the latest boot log, from a boot in which I got the FS corruption.
Comment 15 Matteo Croce 2008-05-24 08:49:52 UTC
Created attachment 16266
bootlog with libata.force=1:udma33

Here I have FS corruption.
Comment 16 Matteo Croce 2008-05-24 10:45:14 UTC
tracy@tracy-desktop:~$ uptime
 19:44:04 up 27 min,  1 user,  load average: 0.20, 0.13, 0.17

I guess it's stable now, with UDMA33 forced and WITHOUT irqpoll.
Comment 17 Tejun Heo 2008-05-25 20:38:16 UTC
If the machine crashes w/o properly unmounting the filesystem, journal recovery is expected.  e2fsck shouldn't be necessary though.

W/o UDMA33, how reproducible is the problem?
Comment 18 Matteo Croce 2008-05-26 04:23:50 UTC
It happens a few minutes after the system has booted, sometimes when Xorg starts, sometimes before.
Comment 19 Matteo Croce 2008-05-26 11:12:23 UTC
Sorry, it happens with UDMA33 too, although the system doesn't hang permanently.
It hangs for a while, then the kernel reports the error, then it continues to work.
e2fsck IS necessary: the ext3 driver detects an IO error and remounts the drive read-only.
Comment 20 Tejun Heo 2008-05-26 16:20:12 UTC
Hmmm... Can you please post the kernel log here?  You can stick in a USB stick before boot and keep it mounted.  After the harddisk fails, do dmesg > MOUNTPOINT/dmesg.out.  You probably want to run dmesg regularly to keep it in memory while the harddisk is running normally.
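
The suggestion above (run dmesg regularly and save its output to the USB stick) could be automated along these lines. This is a hedged sketch, not a tested recovery tool: the function names are invented for illustration, and `/mnt/usb` in the usage note stands in for the MOUNTPOINT placeholder. Running the command on a timer also keeps the dmesg binary cached in memory, which is the point of running it regularly.

```python
import os
import subprocess
import time


def snapshot(out_path, cmd=("dmesg",)):
    """Run cmd and atomically write its stdout to out_path.

    Returns the number of bytes captured.
    """
    result = subprocess.run(list(cmd), stdout=subprocess.PIPE, check=True)
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(result.stdout)
        f.flush()
        os.fsync(f.fileno())        # push the data out to the USB stick
    os.replace(tmp_path, out_path)  # never leave a half-written log behind
    return len(result.stdout)


def snapshot_loop(out_path, interval_s=30.0, iterations=None, cmd=("dmesg",)):
    """Take a snapshot every interval_s seconds.

    Runs forever when iterations is None; returns the number of snapshots taken.
    """
    done = 0
    while iterations is None or done < iterations:
        snapshot(out_path, cmd)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
    return done
```

Started before the failure with something like `snapshot_loop("/mnt/usb/dmesg.out")`, the most recent kernel log should survive on the stick even after the hard disk stops responding, provided the interpreter itself stays resident in memory.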
Comment 21 Matteo Croce 2008-05-26 16:34:50 UTC
I'm not at the PC now, but the error was the same: "action 0x2 frozen" followed by:
ata1.00: status: { DRDY }

Could it be a damaged HD?
But then why can I run fsck for a long time without errors?
Comment 22 Tejun Heo 2008-05-26 16:48:17 UTC
Can you please post the log when you have access to the pc?  I really need to look at the full log to learn more about the failure.  Thanks.
Comment 23 Matteo Croce 2008-09-09 12:20:59 UTC
Ethernet and IDE stopped working too; probably the motherboard (ASRock) is just crap.
Closing the bug (and trashing the motherboard).
