Most recent kernel where this bug did not occur: 2.6.18.1 (two almost identical machines without a Promise controller), 2.6.15.4 (one identical machine)
Distribution: Debian Sarge, self-compiled kernel (source from kernel.org)
Hardware Environment: Intel Server Board (details on request), Promise Ultra100(133?)TX2
Software Environment:
Problem Description:
I ran
find /dev -type b \( -name "/dev/sd?" -o -name "/dev/hd?" \) -exec hdparm -W0 \{\} \;
because the disk write cache is not a good idea if you want to ensure filesystem consistency. Then:

Oct 25 10:23:29 rack04 kernel: SCSI device sdb: drive cache: write through
Oct 25 10:23:29 rack04 kernel: BUG: warning at drivers/scsi/libata-eh.c:520/ata_port_schedule_eh()
Oct 25 10:23:29 rack04 kernel: [ata_port_schedule_eh+78/80] ata_port_schedule_eh+0x4e/0x50
Oct 25 10:23:29 rack04 kernel: [ata_scsi_qc_complete+194/199] ata_scsi_qc_complete+0xc2/0xc7
Oct 25 10:23:29 rack04 kernel: [__ata_qc_complete+77/156] __ata_qc_complete+0x4d/0x9c
Oct 25 10:23:29 rack04 kernel: [pdc_interrupt+391/446] pdc_interrupt+0x187/0x1be
Oct 25 10:23:29 rack04 kernel: [handle_IRQ_event+38/86] handle_IRQ_event+0x26/0x56
Oct 25 10:23:29 rack04 kernel: [__do_IRQ+135/240] __do_IRQ+0x87/0xf0
Oct 25 10:23:29 rack04 kernel: [do_IRQ+49/105] do_IRQ+0x31/0x69
Oct 25 10:23:29 rack04 kernel: [common_interrupt+26/32] common_interrupt+0x1a/0x20
Oct 25 10:23:29 rack04 kernel: [default_idle+49/79] default_idle+0x31/0x4f
Oct 25 10:23:29 rack04 kernel: [cpu_idle+120/129] cpu_idle+0x78/0x81
Oct 25 10:23:29 rack04 kernel: [start_kernel+398/475] start_kernel+0x18e/0x1db
Oct 25 10:23:29 rack04 kernel: [unknown_bootoption+0/416] unknown_bootoption+0x0/0x1a0
Oct 25 10:23:29 rack04 kernel: BUG: warning at drivers/scsi/libata-eh.c:321/ata_scsi_error()
Oct 25 10:23:29 rack04 kernel: [ata_scsi_error+653/927] ata_scsi_error+0x28d/0x39f
Oct 25 10:23:29 rack04 kernel: [scsi_error_handler+0/153] scsi_error_handler+0x0/0x99
Oct 25 10:23:29 rack04 kernel: [scsi_error_handler+109/153] scsi_error_handler+0x6d/0x99
Oct 25 10:23:29 rack04 kernel: [kthread+163/167] kthread+0xa3/0xa7
Oct 25 10:23:29 rack04 kernel: [kthread+0/167] kthread+0x0/0xa7
Oct 25 10:23:29 rack04 kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
Oct 25 10:23:29 rack04 kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000014
Oct 25 10:23:29 rack04 kernel: printing eip:
Oct 25 10:23:29 rack04 kernel: c028d104
Oct 25 10:23:29 rack04 kernel: *pde = 00000000
Oct 25 10:23:29 rack04 kernel: Oops: 0000 [#1]
Oct 25 10:23:29 rack04 kernel: SMP
Oct 25 10:23:29 rack04 kernel: Modules linked in: e1000
Oct 25 10:23:29 rack04 kernel: CPU: 0
Oct 25 10:23:29 rack04 kernel: EIP: 0060:[pdc_eng_timeout+74/372] Not tainted VLI
Oct 25 10:23:29 rack04 kernel: EFLAGS: 00010046 (2.6.18.1 #1)
Oct 25 10:23:29 rack04 kernel: EIP is at pdc_eng_timeout+0x4a/0x174
Oct 25 10:23:29 rack04 kernel: eax: fafbfcfd ebx: c19e02d8 ecx: c03866f0 edx: 00000000
Oct 25 10:23:29 rack04 kernel: esi: c19e02d8 edi: 00000000 ebp: 00000000 esp: f7b09f5c
Oct 25 10:23:29 rack04 kernel: ds: 007b es: 007b ss: 0068
Oct 25 10:23:29 rack04 kernel: Process scsi_eh_5 (pid: 802, ti=f7b08000 task=c19c3a70 task.ti=f7b08000)
Oct 25 10:23:29 rack04 kernel: Stack: c01c45a4 00000001 c19e02d8 00000000 00000296 c1a08d80 c19e02d8 00000000
Oct 25 10:23:29 rack04 kernel:        c023ab96 00000000 c028a212 c032f5ae c0349540 00000141 c0323550 c19e2330
Oct 25 10:23:29 rack04 kernel:        c180f3e0 00000000 0000004d b262ce22 00000005 c19e0000 c19e0000 c19e0000
Oct 25 10:23:29 rack04 kernel: Call Trace:
Oct 25 10:23:29 rack04 kernel: [__next_cpu+18/31] __next_cpu+0x12/0x1f
Oct 25 10:23:29 rack04 kernel: [scsi_error_handler+0/153] scsi_error_handler+0x0/0x99
Oct 25 10:23:29 rack04 kernel: [ata_scsi_error+607/927] ata_scsi_error+0x25f/0x39f
Oct 25 10:23:29 rack04 kernel: [scsi_error_handler+0/153] scsi_error_handler+0x0/0x99
Oct 25 10:23:29 rack04 kernel: [scsi_error_handler+109/153] scsi_error_handler+0x6d/0x99
Oct 25 10:23:29 rack04 kernel: [kthread+163/167] kthread+0xa3/0xa7
Oct 25 10:23:29 rack04 kernel: [kthread+0/167] kthread+0x0/0xa7
Oct 25 10:23:29 rack04 kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
Oct 25 10:23:29 rack04 kernel: Code: 86 a4 1f 00 00 83 f8 1f 77 0d 69 c0 ac 00 00 00 8d 94 30 1c 0a 00 00 85 d2 74 0e 8b 46 04 8b 40 5c 85 c0 0f 85 17 01 00 00 89 d7 <0f> b6 47 14 83 f8 01 0f 84 84 00 00 00 83 f8 03 bb e8 03 00 00
Oct 25 10:23:29 rack04 kernel: EIP: [pdc_eng_timeout+74/372] pdc_eng_timeout+0x4a/0x174 SS:ESP 0068:f7b09f5c

Steps to reproduce: Not tried again today (production environment). Perhaps tomorrow...
The correct controller name is:
0000:02:04.0 Unknown mass storage controller: Promise Technology, Inc.: Unknown device 3d18 (rev 02)
i.e. a Promise SATAII150 TX4.
I'm not sure whether the error happened on the onboard (Intel) controller
0000:00:1f.2 IDE interface: Intel Corp. 6300ESB SATA Storage Controller (rev 02)
or on the Promise controller.
Tried again this morning. Ran manually:
sync && hdparm -W0 /dev/sda   (Intel onboard -> OK)
sync && hdparm -W0 /dev/sdb   (Intel onboard -> OK)
sync && hdparm -W0 /dev/sdc   (Promise controller -> crash)
The drive on which the crash occurs:
scsi5 : sata_promise
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: ATA-6, max UDMA/133, 234441648 sectors: LBA48
ata6.00: ata6: dev 0 multi count 0
ata6.00: configured for UDMA/133
  Vendor: ATA       Model: WDC WD1200JD-00H  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
I think the problem is either the disk or the Promise controller.
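A note on the find command in the original report: -name compares only the last path component, so -name "/dev/sd?" never matches anything (GNU find prints a warning for -name patterns containing a slash). Below is a hedged sketch of the intended loop, using 'sd?' and 'hd?' as patterns and running sync before each hdparm -W0, as in the manual test. The function name disable_write_cache and the DRY_RUN switch are hypothetical; the sketch assumes Linux, an installed hdparm, and root privileges for the real run. By default it only prints the commands it would execute.

```sh
#!/bin/sh
# Hypothetical helper (sketch, not from the report). Assumes Linux, hdparm
# installed, and root privileges when actually executing.
# Finds whole-disk block devices named sd? or hd? under a directory
# (default /dev) and disables the drive write cache on each one.
# DRY RUN by default: only prints the hdparm commands. Set DRY_RUN=0 to run them.
#
# find's -name matches only the basename, so the pattern is 'sd?',
# not '/dev/sd?' (a pattern containing '/' never matches).
disable_write_cache() {
    dir=${1:-/dev}
    find "$dir" -type b \( -name 'sd?' -o -name 'hd?' \) 2>/dev/null | sort |
    while IFS= read -r dev; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            # Flush dirty buffers, then disable the write cache on this disk.
            sync && hdparm -W0 "$dev"
        else
            # Dry run: show what would be executed.
            echo "hdparm -W0 $dev"
        fi
    done
}
```

Calling disable_write_cache lists one hdparm -W0 line per matching disk; DRY_RUN=0 disable_write_cache executes them, one disk at a time.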
Fixed by the following patch. http://article.gmane.org/gmane.linux.ide/14188
*** Bug 7486 has been marked as a duplicate of this bug. ***