Bug 7412 - libata-eh.c when doing "hdparm -W0 ..."
Summary: libata-eh.c when doing "hdparm -W0 ..."
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: i386 Linux
Importance: P2 normal
Assignee: Tejun Heo
URL:
Keywords:
Duplicates: 7486
Depends on:
Blocks:
 
Reported: 2006-10-25 02:35 UTC by Jochen Barth
Modified: 2007-01-11 02:05 UTC (History)
CC List: 3 users

See Also:
Kernel Version: 2.6.18.1
Subsystem:
Regression: ---
Bisected commit-id:



Description Jochen Barth 2006-10-25 02:35:05 UTC
Most recent kernel where this bug did not occur:
2.6.18.1 (two almost identical machines without a Promise controller)
2.6.15.4 (one identical machine)
Distribution:
Debian Sarge, self compiled kernel (source from kernel.org)
Hardware Environment:
Intel Server Board (details on request),
Promise Ultra100(133?)TX2
Software Environment:
Problem Description:
I ran
find /dev -type b \( -name "/dev/sd?" -o -name "/dev/hd?" \) -exec hdparm -W0 \{\} \;
to disable the disk write cache, because write caching can compromise filesystem consistency.
The kernel then logged:
Oct 25 10:23:29 rack04 kernel: SCSI device sdb: drive cache: write through
Oct 25 10:23:29 rack04 kernel: BUG: warning at
drivers/scsi/libata-eh.c:520/ata_port_schedule_eh()
Oct 25 10:23:29 rack04 kernel:  [ata_port_schedule_eh+78/80]
ata_port_schedule_eh+0x4e/0x50
Oct 25 10:23:29 rack04 kernel:  [ata_scsi_qc_complete+194/199]
ata_scsi_qc_complete+0xc2/0xc7
Oct 25 10:23:29 rack04 kernel:  [__ata_qc_complete+77/156]
__ata_qc_complete+0x4d/0x9c
Oct 25 10:23:29 rack04 kernel:  [pdc_interrupt+391/446] pdc_interrupt+0x187/0x1be
Oct 25 10:23:29 rack04 kernel:  [handle_IRQ_event+38/86] handle_IRQ_event+0x26/0x56
Oct 25 10:23:29 rack04 kernel:  [__do_IRQ+135/240] __do_IRQ+0x87/0xf0
Oct 25 10:23:29 rack04 kernel:  [do_IRQ+49/105] do_IRQ+0x31/0x69
Oct 25 10:23:29 rack04 kernel:  [common_interrupt+26/32] common_interrupt+0x1a/0x20
Oct 25 10:23:29 rack04 kernel:  [default_idle+49/79] default_idle+0x31/0x4f
Oct 25 10:23:29 rack04 kernel:  [cpu_idle+120/129] cpu_idle+0x78/0x81
Oct 25 10:23:29 rack04 kernel:  [start_kernel+398/475] start_kernel+0x18e/0x1db
Oct 25 10:23:29 rack04 kernel:  [unknown_bootoption+0/416]
unknown_bootoption+0x0/0x1a0
Oct 25 10:23:29 rack04 kernel: BUG: warning at
drivers/scsi/libata-eh.c:321/ata_scsi_error()
Oct 25 10:23:29 rack04 kernel:  [ata_scsi_error+653/927] ata_scsi_error+0x28d/0x39f
Oct 25 10:23:29 rack04 kernel:  [scsi_error_handler+0/153]
scsi_error_handler+0x0/0x99
Oct 25 10:23:29 rack04 kernel:  [scsi_error_handler+109/153]
scsi_error_handler+0x6d/0x99
Oct 25 10:23:29 rack04 kernel:  [kthread+163/167] kthread+0xa3/0xa7
Oct 25 10:23:29 rack04 kernel:  [kthread+0/167] kthread+0x0/0xa7
Oct 25 10:23:29 rack04 kernel:  [kernel_thread_helper+5/11]
kernel_thread_helper+0x5/0xb
Oct 25 10:23:29 rack04 kernel: BUG: unable to handle kernel NULL pointer
dereference at virtual address 00000014
Oct 25 10:23:29 rack04 kernel:  printing eip:
Oct 25 10:23:29 rack04 kernel: c028d104
Oct 25 10:23:29 rack04 kernel: *pde = 00000000
Oct 25 10:23:29 rack04 kernel: Oops: 0000 [#1]
Oct 25 10:23:29 rack04 kernel: SMP 
Oct 25 10:23:29 rack04 kernel: Modules linked in: e1000
Oct 25 10:23:29 rack04 kernel: CPU:    0
Oct 25 10:23:29 rack04 kernel: EIP:    0060:[pdc_eng_timeout+74/372]    Not
tainted VLI
Oct 25 10:23:29 rack04 kernel: EFLAGS: 00010046   (2.6.18.1 #1) 
Oct 25 10:23:29 rack04 kernel: EIP is at pdc_eng_timeout+0x4a/0x174
Oct 25 10:23:29 rack04 kernel: eax: fafbfcfd   ebx: c19e02d8   ecx: c03866f0  
edx: 00000000
Oct 25 10:23:29 rack04 kernel: esi: c19e02d8   edi: 00000000   ebp: 00000000  
esp: f7b09f5c
Oct 25 10:23:29 rack04 kernel: ds: 007b   es: 007b   ss: 0068
Oct 25 10:23:29 rack04 kernel: Process scsi_eh_5 (pid: 802, ti=f7b08000
task=c19c3a70 task.ti=f7b08000)
Oct 25 10:23:29 rack04 kernel: Stack: c01c45a4 00000001 c19e02d8 00000000
00000296 c1a08d80 c19e02d8 00000000 
Oct 25 10:23:29 rack04 kernel:        c023ab96 00000000 c028a212 c032f5ae
c0349540 00000141 c0323550 c19e2330 
Oct 25 10:23:29 rack04 kernel:        c180f3e0 00000000 0000004d b262ce22
00000005 c19e0000 c19e0000 c19e0000 
Oct 25 10:23:29 rack04 kernel: Call Trace:
Oct 25 10:23:29 rack04 kernel:  [__next_cpu+18/31] __next_cpu+0x12/0x1f
Oct 25 10:23:29 rack04 kernel:  [scsi_error_handler+0/153]
scsi_error_handler+0x0/0x99
Oct 25 10:23:29 rack04 kernel:  [ata_scsi_error+607/927] ata_scsi_error+0x25f/0x39f
Oct 25 10:23:29 rack04 kernel:  [scsi_error_handler+0/153]
scsi_error_handler+0x0/0x99
Oct 25 10:23:29 rack04 kernel:  [scsi_error_handler+109/153]
scsi_error_handler+0x6d/0x99
Oct 25 10:23:29 rack04 kernel:  [kthread+163/167] kthread+0xa3/0xa7
Oct 25 10:23:29 rack04 kernel:  [kthread+0/167] kthread+0x0/0xa7
Oct 25 10:23:29 rack04 kernel:  [kernel_thread_helper+5/11]
kernel_thread_helper+0x5/0xb
Oct 25 10:23:29 rack04 kernel: Code: 86 a4 1f 00 00 83 f8 1f 77 0d 69 c0 ac 00
00 00 8d 94 30 1c 0a 00 00 85 d2 74 0e 8b 46 04
 8b 40 5c 85 c0 0f 85 17 01 00 00 89 d7 <0f> b6 47 14 83 f8 01 0f 84 84 00 00 00
83 f8 03 bb e8 03 00 00 
Oct 25 10:23:29 rack04 kernel: EIP: [pdc_eng_timeout+74/372]
pdc_eng_timeout+0x4a/0x174 SS:ESP 0068:f7b09f5c

Steps to reproduce:
Not retried today because this is a production environment; perhaps tomorrow.
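
Note for anyone retrying the command from the problem description: with GNU find, -name is compared against the basename only, so a pattern such as "/dev/sd?" would not be expected to match, and the command that actually ran may have differed slightly from what was pasted. The sketch below is one possible basename-based variant, not the reporter's exact command. The helper names list_disks and disable_write_cache and their optional arguments are hypothetical, and the second function is a dry run that only prints the hdparm commands.

```shell
#!/bin/sh
# Hypothetical helper names (list_disks, disable_write_cache); they are not
# part of find or hdparm.

# list_disks [DIR] [KIND]: print nodes of find type KIND (default: b, block
# device) directly under DIR (default: /dev) whose basename is sd? or hd?.
list_disks() {
    dir=${1:-/dev}
    kind=${2:-b}
    # -name is compared against the basename only, so the patterns carry no
    # directory prefix. -maxdepth is a GNU/BSD find extension.
    find "$dir" -maxdepth 1 -type "$kind" \( -name 'sd?' -o -name 'hd?' \) | sort
}

# disable_write_cache [DIR] [KIND]: dry run that prints one hdparm command per
# matching node. Remove the "echo" to actually run hdparm (needs root).
disable_write_cache() {
    list_disks "$@" | while read -r dev; do
        echo hdparm -W0 "$dev"
    done
}
```

Calling disable_write_cache with no arguments prints the commands for /dev without touching any drive, which is the safer first step on a production machine.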
Comment 1 Jochen Barth 2006-10-25 02:43:17 UTC
The CORRECT controller name is:
0000:02:04.0 Unknown mass storage controller: Promise Technology, Inc.: Unknown
device 3d18 (rev 02)
= Promise SataII 150 TX4

I'm not sure whether the error happened on the onboard Intel controller
0000:00:1f.2 IDE interface: Intel Corp. 6300ESB SATA Storage Controller (rev 02)
or on the Promise controller.
Comment 2 Jochen Barth 2006-10-25 22:56:48 UTC
I tried again this morning, running these commands manually:
sync && hdparm -W0 /dev/sda (Intel onboard -> ok)
sync && hdparm -W0 /dev/sdb (Intel onboard -> ok)
sync && hdparm -W0 /dev/sdc (Promise controller -> crash)

The drive on which the crash occurs is:
scsi5 : sata_promise
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: ATA-6, max UDMA/133, 234441648 sectors: LBA48 
ata6.00: ata6: dev 0 multi count 0
ata6.00: configured for UDMA/133
  Vendor: ATA       Model: WDC WD1200JD-00H  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05

I think the problem is either the disk or the Promise controller.
Comment 3 Tejun Heo 2006-11-21 19:40:38 UTC
Fixed by the following patch.

http://article.gmane.org/gmane.linux.ide/14188
Comment 4 martin f. krafft 2007-01-11 02:05:26 UTC
*** Bug 7486 has been marked as a duplicate of this bug. ***
