Bug 15537

Summary: 2.6.34-rc1 hangs for 30 seconds when trying to access the disk
Product: IO/Storage Reporter: Andrew Benton (b3nton)
Component: Serial ATA Assignee: Jeff Garzik (jgarzik)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, maciej.rutecki, petr.uzel, rjw, tj
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.34-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 15310    
Attachments: Kernel config
Proposed patch from Tejun Heo

Description Andrew Benton 2010-03-14 23:26:05 UTC
The system works fine with 2.6.33.
When compiling 2.6.34-rc1 I accepted the defaults for all the new options (except the maximum number of GPUs). The kernel boots OK, but when I try to use the system and access the disk, it freezes for about 30 seconds with no disk activity, and then I get a message like this in the system log:

Mar 11 14:57:57 eccles kernel: ata2: clearing spurious IRQ
Mar 11 14:58:27 eccles kernel: ata2: lost interrupt (Status 0x50)
Mar 11 14:58:27 eccles kernel: ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 frozen
Mar 11 14:58:27 eccles kernel: ata2.01: failed command: WRITE DMA
Mar 11 14:58:27 eccles kernel: ata2.01: cmd ca/00:08:4c:37:6f/00:00:00:00:00/fd tag 0 dma 4096 out
Mar 11 14:58:27 eccles kernel:          res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Mar 11 14:58:27 eccles kernel: ata2.01: status: { DRDY }
Mar 11 14:58:27 eccles kernel: ata2.00: hard resetting link
Mar 11 14:58:28 eccles kernel: ata2.01: hard resetting link
Mar 11 14:58:28 eccles kernel: ata2.00: SATA link down (SStatus 0 SControl 300)
Mar 11 14:58:28 eccles kernel: ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 11 14:58:28 eccles kernel: ata2.01: configured for UDMA/133
Mar 11 14:58:28 eccles kernel: ata2.01: device reported invalid CHS sector 0
Mar 11 14:58:28 eccles kernel: ata2: EH complete

Or this:

Mar 14 22:59:33 eccles kernel: ata2: clearing spurious IRQ
Mar 14 23:00:03 eccles kernel: ata2: lost interrupt (Status 0x50)
Mar 14 23:00:03 eccles kernel: ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 frozen
Mar 14 23:00:03 eccles kernel: ata2.01: failed command: READ DMA
Mar 14 23:00:03 eccles kernel: ata2.01: cmd c8/00:08:cc:46:37/00:00:00:00:00/fc tag 0 dma 4096 in
Mar 14 23:00:03 eccles kernel:          res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Mar 14 23:00:03 eccles kernel: ata2.01: status: { DRDY }
Mar 14 23:00:03 eccles kernel: ata2.00: hard resetting link
Mar 14 23:00:04 eccles kernel: ata2.01: hard resetting link
Mar 14 23:00:04 eccles kernel: ata2.00: SATA link down (SStatus 0 SControl 300)
Mar 14 23:00:04 eccles kernel: ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 14 23:00:04 eccles kernel: ata2.01: configured for UDMA/133
Mar 14 23:00:04 eccles kernel: ata2.01: device reported invalid CHS sector 0
Mar 14 23:00:04 eccles kernel: ata2: EH complete
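
The `cmd` and `res` lines above are libata's dump of the ATA taskfile registers. As a reading aid, here is a minimal Python sketch that decodes them. It assumes the field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device used by libata's error report, 28-bit LBA addressing (as used by READ DMA and WRITE DMA), 512-byte sectors, and the classic ATA status-register bit layout. The function names are hypothetical and not part of any kernel tool.

```python
import re

# Classic ATA status-register bits (DSC, CORR and IDX are obsolete in later specs).
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]

# Only the two opcodes that appear in this report.
OPCODES = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

# Assumed dump layout: command/feature:nsect:lbal:lbam:lbah/<5 hob bytes>/device
TASKFILE_RE = re.compile(
    r"(?:cmd|res) ([0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"([0-9a-f]{2})"
)


def decode_status(status):
    """Return the names of the status-register bits set in `status`."""
    return [name for bit, name in STATUS_BITS if status & bit]


def decode_taskfile(line):
    """Decode a libata 'cmd ...' taskfile dump, assuming a 28-bit LBA command."""
    m = TASKFILE_RE.search(line)
    if m is None:
        raise ValueError("no taskfile dump found")
    command = int(m.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    device = int(m.group(4), 16)
    # In 28-bit commands the low nibble of the device register holds LBA[27:24].
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # A sector count of 0 means 256 sectors for 28-bit commands.
    sectors = nsect or 256
    return {
        "command": OPCODES.get(command, f"0x{command:02x}"),
        "feature": feature,
        "lba_mode": bool(device & 0x40),
        "dev": (device >> 4) & 1,  # 0 = master (.00), 1 = slave (.01)
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,
    }


if __name__ == "__main__":
    cmd = "cmd ca/00:08:4c:37:6f/00:00:00:00:00/fd tag 0 dma 4096 out"
    print(decode_taskfile(cmd))
    print(decode_status(0x50))  # status from "lost interrupt (Status 0x50)"
```

Applied to the first log, this reports a WRITE DMA of 8 sectors (4096 bytes, matching `dma 4096`) to device 1, which is why the error is attributed to `ata2.01`. The `lost interrupt` status 0x50 decodes to DRDY plus the obsolete DSC bit, so the drive was apparently not busy when the 30-second timeout fired, which fits a missed-interrupt symptom rather than a slow disk.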

I've tried compiling the kernel with different options, such as enabling CONFIG_PATA_MPIIX or CONFIG_PATA_SCH (even though they weren't needed before). They made the bug more difficult to trigger, which makes me think it may be some sort of race condition.

Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz

lspci:

00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2
01:00.0 VGA compatible controller: ATI Technologies Inc RV770 [Radeon HD 4870]
01:00.1 Audio device: ATI Technologies Inc HD48x0 audio
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
04:00.0 Multimedia video controller: Conexant Systems, Inc. CX23880/1/2/3 PCI Video and Audio Decoder (rev 05)
04:00.2 Multimedia controller: Conexant Systems, Inc. CX23880/1/2/3 PCI Video and Audio Decoder [MPEG Port] (rev 05)
04:00.4 Multimedia controller: Conexant Systems, Inc. CX23880/1/2/3 PCI Video and Audio Decoder [IR Port] (rev 05)
04:05.0 Network controller: RaLink RT2561/RT61 802.11g PCI
Comment 1 Andrew Benton 2010-03-14 23:27:51 UTC
Created attachment 25515
Kernel config
Comment 2 Jeff Garzik 2010-03-14 23:55:20 UTC
Created attachment 25516
Proposed patch from Tejun Heo


Does this patch help?
Comment 3 Andrew Benton 2010-03-15 00:55:37 UTC
Yes it does. It seems to fix the bug. 
Thank you very much for your prompt help.

Andy
Comment 4 Andrew Benton 2010-03-15 01:05:11 UTC
Looking at the sys.log (or dmesg), the only remaining trace of the problem is the message:

ata2: clearing spurious IRQ

I've just googled that message and found http://lkml.org/lkml/2010/3/9/352, which looks like the same bug to me.
Comment 5 Rafael J. Wysocki 2010-03-15 21:29:29 UTC
Handled-By : Jeff Garzik <jgarzik@pobox.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=25516
Comment 6 Jeff Garzik 2010-03-17 17:21:31 UTC
*** Bug 15556 has been marked as a duplicate of this bug. ***
Comment 7 Petr Uzel 2010-03-21 10:44:49 UTC
I had the same issue with 2.6.34-rc2, and after applying the patch it seems to be fixed. Thanks.
Comment 8 Andrew Benton 2010-03-21 22:43:18 UTC
Why's this been changed to resolved? The bug is still in the kernel; the patch hasn't been checked in. The bug is not resolved.
Comment 9 Rafael J. Wysocki 2010-03-21 23:07:59 UTC
It was marked as 'resolved' (which quite obviously didn't mean 'closed'), because the patch had been provided.  I'll mark it as 'closed' when the patch is in.
Comment 10 Tejun Heo 2010-03-22 08:08:50 UTC
Patch posted.

  http://article.gmane.org/gmane.linux.ide/45543
Comment 11 Rafael J. Wysocki 2010-03-22 11:09:12 UTC
Patch : http://article.gmane.org/gmane.linux.ide/45543
Comment 12 Rafael J. Wysocki 2010-03-22 21:19:59 UTC
*** Bug 15549 has been marked as a duplicate of this bug. ***
Comment 13 Rafael J. Wysocki 2010-04-07 20:18:39 UTC
Fixed by commit 332ac7ff77cdc6a183d78ab129545d7b14a1d57c.
Comment 14 Rafael J. Wysocki 2010-04-08 20:08:32 UTC
*** Bug 15716 has been marked as a duplicate of this bug. ***