Most recent kernel where this bug did not occur: N/A
Distribution: Slackware 10.2
Hardware Environment:
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
        Subsystem: ASUSTeK Computer Inc. A7V600/K8V Deluxe/K8V-X motherboard
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Interrupt: pin B routed to IRQ 11
        Region 0: I/O ports at e400 [size=8]
        Region 1: I/O ports at e000 [size=4]
        Region 2: I/O ports at d800 [size=8]
        Region 3: I/O ports at d400 [size=4]
        Region 4: I/O ports at d000 [size=16]
        Region 5: I/O ports at c800 [size=256]
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Software Environment: raid1, if it matters
Problem Description:

After a few days of uptime, I got the following in my log:

May 6 18:40:20 k kernel: ata2: command 0xea timeout, stat 0xd0 host_stat 0x0
May 6 18:40:20 k kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 6 18:40:20 k kernel: ata2: status=0xd0 { Busy }
May 6 18:40:20 k kernel: ATA: abnormal status 0xD0 on port 0xD807
May 6 18:40:20 k last message repeated 2 times
May 6 18:40:50 k kernel: ata2: command 0x35 timeout, stat 0xd0 host_stat 0x1
May 6 18:40:50 k kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 6 18:40:50 k kernel: ata2: status=0xd0 { Busy }
May 6 18:40:50 k kernel: end_request: I/O error, dev sdb, sector 58604991
May 6 18:40:50 k kernel: ATA: abnormal status 0xD0 on port 0xD807
May 6 18:40:50 k last message repeated 2 times
May 6 18:41:20 k kernel: ata2: command 0xea timeout, stat 0xd0 host_stat 0x0
May 6 18:41:20 k kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 6 18:41:20 k kernel: ata2: status=0xd0 { Busy }
May 6 18:41:20 k kernel: raid1: Disk failure on sdb1, disabling device.
May 6 18:41:20 k kernel: ^IOperation continuing on 1 devices

Today I noticed this, so I tried smartctl -d ata -a /dev/sdb and fdisk /dev/sdb, and got the following oops:

May 9 06:47:07 k kernel: ATA: abnormal status 0xD0 on port 0xD807
May 9 06:47:07 k last message repeated 3 times
May 9 06:47:17 k kernel: ata2: command 0xec timeout, stat 0xd0 host_stat 0x0
May 9 06:47:17 k kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 9 06:47:17 k kernel: ata2: status=0xd0 { Busy }
May 9 06:47:17 k kernel: ATA: abnormal status 0xD0 on port 0xD807
May 9 06:47:17 k last message repeated 2 times
May 9 06:47:27 k kernel: ata2: command 0xa1 timeout, stat 0xd0 host_stat 0x0
May 9 06:47:27 k kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 9 06:47:27 k kernel: ata2: status=0xd0 { Busy }
May 9 06:47:27 k kernel: Assertion failed! qc != NULL,/root/linux-2.6.16.9/drivers/scsi/libata-core.c,ata_pio_poll,line=2897
May 9 06:47:47 k last message repeated 1253 times
May 9 06:47:47 k kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000008c
May 9 06:47:47 k kernel: printing eip:
May 9 06:47:47 k kernel: c025e96c
May 9 06:47:47 k kernel: *pde = 00000000
May 9 06:47:47 k kernel: Oops: 0002 [#1]
May 9 06:47:47 k kernel: Modules linked in: bsd_comp ppp_synctty ppp_async crc_ccitt ppp_generic slhc ipv6 ohci_hcd amd64_agp shpchp 8139too uhci_hcd ehci_hcd via_rhine mii ext3 jbd w83627ehf hwmon eeprom i2c_isa i2c_viapro i2c_core ide_scsi agpgart quota_v2
May 9 06:47:47 k kernel: CPU: 0
May 9 06:47:47 k kernel: EIP: 0060:[<c025e96c>] Not tainted VLI
May 9 06:47:47 k kernel: EFLAGS: 00010297 (2.6.16.9 #4)
May 9 06:47:47 k kernel: EIP is at ata_pio_poll+0x8c/0x100
May 9 06:47:47 k kernel: eax: 0de5fe1d ebx: f7487280 ecx: c03f4bcc edx: 0000d807
May 9 06:47:47 k kernel: esi: 00000000 edi: 00000004 ebp: 00000002 esp: f7563eec
May 9 06:47:47 k kernel: ds: 007b es: 007b ss: 0068
May 9 06:47:47 k kernel: Process ata/0 (pid: 801, threadinfo=f7562000 task=f7f6f550)
May 9 06:47:47 k kernel: Stack: <0>f7487280 c03b081d c03c4140 c037fb37 00000b51 c048b5b0 f7487280 00000000
May 9 06:47:47 k kernel:        f7f96440 f7487280 c025f30c f7487280 f7487830 00000296 c0128342 f7487280
May 9 06:47:47 k kernel:        00000000 a1e3e100 f7f96458 f7f96448 c025f2c0 f7f96450 f7563f88 f7f96448
May 9 06:47:47 k kernel: Call Trace:
May 9 06:47:47 k kernel: [<c025f30c>] ata_pio_task+0x4c/0x80
May 9 06:47:47 k kernel: [<c0128342>] run_workqueue+0x62/0xd0
May 9 06:47:47 k kernel: [<c025f2c0>] ata_pio_task+0x0/0x80
May 9 06:47:47 k kernel: [<c01284f0>] worker_thread+0x140/0x170
May 9 06:47:47 k kernel: [<c0115f20>] default_wake_function+0x0/0x20
May 9 06:47:47 k kernel: [<c0115f20>] default_wake_function+0x0/0x20
May 9 06:47:47 k kernel: [<c01283b0>] worker_thread+0x0/0x170
May 9 06:47:47 k kernel: [<c012b831>] kthread+0xb1/0xc0
May 9 06:47:47 k kernel: [<c012b780>] kthread+0x0/0xc0
May 9 06:47:47 k kernel: [<c0101391>] kernel_thread_helper+0x5/0x14
May 9 06:47:47 k kernel: Code: 78 1c 89 bb dc 05 00 00 31 c0 8b 5c 24 18 8b 74 24 1c 8b 7c 24 20 8b 6c 24 24 83 c4 28 c3 a1 f8 5c 3f c0 39 83 e0 05 00 00 79 13 <83> 8e 8c 00 00 00 04 c7 83 dc 05 00 00 03 00 00 00 eb ca b8 04

Steps to reproduce:
This bug is fixed by the following commit, which is in libata-dev #upstream and Linus's tree. I don't know whether backporting the fix to -stable is necessary though.

Author: Tejun Heo <htejun@gmail.com>
Date: Sun Feb 12 23:32:59 2006 +0900

[tj@htj:~/os/linux-2.6]$ git-cat-file commit 86e45b6
tree 5b86ebd0b0b17d05bdfdd07b7683f7348577b52a
parent d7fc3ca1cd0ecce82263299c6b1631fc83b0ec79
author Tejun Heo <htejun@gmail.com> 1141540149 +0900
committer Jeff Garzik <jeff@garzik.org> 1142117840 -0500

[PATCH] libata: implement port_task

Implement port_task. LLDD's can schedule a function to be executed with context after specified delay. libata core takes care of synchronization against EH. This is generalized form of pio_task and packet_task which are tied to PIO hsm implementation.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Oh, only looked into latest -stable. Sorry for the noise. Thanks!
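
For anyone reading the log in the original report: a minimal sketch of how the raw values in the "command 0x.. timeout, stat 0xd0" lines can be decoded by hand. This is not taken from libata. The bit names and opcode names follow the ATA command set as I understand it, and the helper names (decode_status, describe_status, describe_command) are made up for illustration. The "Busy only" behaviour in describe_status is an assumption that mirrors what the log prints ("status=0xd0 { Busy }"): while BSY is set, the other status bits are not considered meaningful.

```python
# Illustrative helpers for reading the log above; names are hypothetical,
# bit and opcode meanings are taken from the ATA command set.

ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected error (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]

# Only the opcodes that appear in the report.
ATA_COMMANDS = {
    0x35: "WRITE DMA EXT",
    0xA1: "IDENTIFY PACKET DEVICE",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}


def decode_status(stat):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if stat & mask]


def describe_status(stat):
    """Summarise a status byte; report only 'Busy' while BSY is set
    (assumption: matches what the kernel log shows for 0xd0)."""
    if stat & 0x80:
        return "Busy"
    names = decode_status(stat)
    return " ".join(names) if names else "none"


def describe_command(opcode):
    """Map an opcode from the log to a command name, if known here."""
    return ATA_COMMANDS.get(opcode, "unknown (0x%02x)" % opcode)


if __name__ == "__main__":
    print(decode_status(0xD0))
    print(describe_status(0xD0))
    for op in (0xEA, 0x35, 0xEC, 0xA1):
        print(hex(op), describe_command(op))
```

Read this way, the first timeouts are a cache flush and a write issued by the raid1 layer, and the later ones are the identify commands triggered by smartctl and fdisk, all against a device that never dropped BSY.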