Bug 13345

Summary: panic when reading data from IDE CDROM with >= 2.6.29 kernels
Product: IO/Storage
Reporter: Modestas Vainius (modestas)
Component: IDE
Assignee: io_ide (io_ide)
Status: CLOSED CODE_FIX
Severity: normal
CC: alan, bp, eekee57
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30-rc5
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  Backtrace of the OOPS with 2.6.30-rc5
  Some information about my system/kernel configuration
  dmesg of 2.6.30-rc6-git5 + patch with CDROM as hdc and hdd

Description Modestas Vainius 2009-05-19 21:29:00 UTC
Created attachment 21439
Backtrace of the OOPS with 2.6.30-rc5

Hello,

I get a kernel panic when data is read from an IDE CDROM with 2.6.29 or later kernels (testing 2.6.30-rc5 at the moment). I have never had the issue with 2.6.28 and earlier kernels.

I have a suspicion about the cause, though. My CD/DVD drive is pretty old and might be a bit faulty (or the kernel might sometimes be misinterpreting my drive). Kernels <= 2.6.28 would sometimes turn off DMA due to seek errors (if I recall correctly). However, I was able to turn DMA back on with hdparm, and I never really had read() failures. Now with >= 2.6.29 I have never seen DMA being turned off on my CD/DVD drive, but I guess the kernel just panics instead. So I suspect there is a regression in the latest kernels which triggers an unrecoverable panic instead of a drive reset.

I got the attached backtrace (sorry, I couldn't do better than this picture) with 2.6.30-rc5 on Debian unstable amd64 box while doing a simple:

$ dd if=/dev/cdrom of=/dev/null bs=1M
Comment 1 Modestas Vainius 2009-05-19 21:43:34 UTC
Created attachment 21440
Some information about my system/kernel configuration

uname -a
lsmod
lspci -vv

and kernel config
Comment 2 Bartlomiej Zolnierkiewicz 2009-05-19 21:53:26 UTC
On Tuesday 19 May 2009 23:44:32 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13345

Please test the patch from:

	http://patchwork.kernel.org/patch/24790/
Comment 3 Modestas Vainius 2009-05-20 07:30:29 UTC
Unfortunately, the patch applied on top of 2.6.30-rc5 does not fix the problem.
Comment 4 Borislav Petkov 2009-05-20 09:18:16 UTC
Hi,

can you also send the boot logs of a working kernel (<= 2.6.28) and of the borked one (30-rc5)? For the latter, it would be very useful to see the whole of the initial OOPS, which is cut off in the photo. Can you catch the output with a serial console (netconsole might do too if it's not too early in the boot process and the machine doesn't die before some data can be transferred over the network)?

Also, on the 30-rc5 do 

objdump -d drivers/ide/ide-io.o > ide-io.dsm

and

make drivers/ide/ide-io.s

and send me the .dsm and .s files.

Thanks,
Boris.
Comment 5 Bartlomiej Zolnierkiewicz 2009-05-20 14:35:05 UTC
On Wednesday 20 May 2009 09:30:29 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13345
> 
> --- Comment #3 from Modestas Vainius <modestas@vainius.eu>  2009-05-20
> 07:30:29 ---
> Unfortunately, the patch applied on top of 2.6.30-rc5 does not fix the
> problem.

Just to be sure:

Is the OOPS identical to the previous one or is it something new?
Comment 6 Modestas Vainius 2009-05-20 18:19:31 UTC
I'm really sorry, but I realized that I had not rebuilt the initrd when I installed the new kernel with the patch applied. So I did that, and I can confirm that I no longer get the OOPS. Instead, the DMA turn-offs are back, just as they used to happen with <= 2.6.28 kernels:

[  663.442788] hdd: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
[  663.445040] hdd: cdrom_decode_status: error=0x40 <3>{ LastFailedSense=0x04 }
[  663.446768] ide: failed opcode was: unknown
[  663.454549] hdd: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
[  663.456681] hdd: cdrom_decode_status: error=0x40 <3>{ LastFailedSense=0x04 }
[  663.458541] ide: failed opcode was: unknown
[  663.464783] hdd: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
[  663.466847] hdd: cdrom_decode_status: error=0x40 <3>{ LastFailedSense=0x04 }
[  663.468764] ide: failed opcode was: unknown
[  663.473414] hdd: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
[  663.475365] hdd: cdrom_decode_status: error=0x40 <3>{ LastFailedSense=0x04 }
[  663.477259] ide: failed opcode was: unknown
[  663.477406] hdd: DMA disabled
[  663.520043] hdd: ATAPI reset complete
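
For readers unfamiliar with the status=0x51 and error=0x40 values in the log above, here is a minimal Python sketch of how those bytes break down. It assumes the standard ATA status register bit layout and the ATAPI convention that the upper four bits of the error register carry the sense key. The function names and the labels for bits that do not appear in the log are my own choices for illustration, not taken from the kernel source.

```python
# Hypothetical helper names; bit positions follow the standard ATA
# status register layout. Labels for DriveReady, SeekComplete and Error
# match the log above; the remaining labels are illustrative.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


def atapi_sense_key(error):
    """Extract the sense key from the upper nibble of an ATAPI error byte."""
    return (error >> 4) & 0x0F


if __name__ == "__main__":
    print(decode_status(0x51))          # ['DriveReady', 'SeekComplete', 'Error']
    print(hex(atapi_sense_key(0x40)))   # 0x4
```

Sense key 0x04 is the SCSI/ATAPI "hardware error" key, which matches the LastFailedSense=0x04 printed by the driver.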

So yes, I can confirm that the patch fixes the OOPS with 2.6.30-rc5 and 2.6.30-rc6-git5. I'm looking forward to seeing the patch included in 2.6.30-rc7 and 2.6.29.x; it looks pretty important to me.

What is more, I "played" a bit with IDE cabling and switched my CD/DVD drive to IDE master and I could not even reproduce the errors above (and hence annoying DMA turn off is gone too) any more. So it is good news on all fronts for me.

I leave it for you to close the bug but it is fully RESOLVED as far as I'm concerned.
Comment 7 Borislav Petkov 2009-05-21 07:27:02 UTC
Can you send us the dmesg of the working kernel? I'd like to see exactly which drive model is reported when it gets identified. Also, are you using a 40-wire or an 80-wire IDE cable? You can recognize the 80-wire cable by the blue connector on the host side (the end that goes into the motherboard).

@Bart: from what I've seen so far, and looking at Martin's error messages, this sounds like another case of b0rked drive-side cable detection.

Thanks,
Boris.
Comment 8 Modestas Vainius 2009-05-21 20:25:02 UTC
Created attachment 21473
dmesg of 2.6.30-rc6-git5 + patch with CDROM as hdc and hdd

No errors with CDROM as hdc plugged to middle connector of the cable.
DMA problems with CDROM as hdd plugged to end connector of the cable.
I have not tested other combinations.

My cable is 40-wire (all connectors are black, I counted the wires too).
Comment 9 Bartlomiej Zolnierkiewicz 2009-05-21 20:48:33 UTC
On Thursday 21 May 2009 22:25:02 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13345
> 
> --- Comment #8 from Modestas Vainius <modestas@vainius.eu>  2009-05-21
> 20:25:02 ---
> Created an attachment (id=21473)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=21473)
> dmesg of 2.6.30-rc6-git5 + patch with CDROM as hdc and hdd
> 
> No errors with CDROM as hdc plugged to middle connector of the cable.
> DMA problems with CDROM as hdd plugged to end connector of the cable.
> I have not tested other combinations.
> 
> My cable is 40-wire (all connectors are black, I counted the wires too).

Sigh...  it is detected as 80-wire...

This is nVidia PATA controller with broken cable detection.

In ide we just use BIOS data, libata looks at both BIOS and ACPI data
(though it may not help at all actually)...

Anyway the following change needs to be ported to amd74xx.c one day:

"pata_amd: update mode selection for NV PATAs"
(commit ce54d1616302117fa98513ae916bb3333e1c02ea)
Comment 10 Alan 2009-05-21 20:53:44 UTC
The discussion with Nvidia some time back established that for on-board Nvidia devices the only method that would be reliable was the ACPI one.
Comment 11 Modestas Vainius 2009-05-21 21:04:19 UTC
In both cases? Or am I just lucky that I have not seen problems with the CD/DVD drive as hdc yet?
Comment 12 Bartlomiej Zolnierkiewicz 2009-05-23 11:48:01 UTC
On Thursday 21 May 2009 23:04:19 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13345
> 
> --- Comment #11 from Modestas Vainius <modestas@vainius.eu>  2009-05-21
> 21:04:19 ---
> In both cases? Or I'm just lucky that I have not seen problems with CD/DVD
> drive as hdc yet?

In both cases -- the drive is always tuned to UDMA/66.
Comment 13 Modestas Vainius 2009-05-24 06:29:35 UTC
The BIOS setup tells me "UltraDMA Mode 2" (UDMA/33) for the CD drive, so Linux is obviously misdetecting here. I'll probably have to get an 80-conductor cable, as hdparm -X udma2 does not seem to work :/
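
To make the cable constraint discussed above concrete, here is a minimal Python sketch. It relies on the general ATA rule that UDMA modes above 2 (UDMA/33) need an 80-conductor cable, so a 40-wire cable should cap the drive at UDMA mode 2 rather than UDMA/66 (mode 4). The function names are hypothetical. The "64 + mode" numeric form for hdparm -X is how I understand the hdparm manual page and should be treated as an assumption. This does not model what amd74xx.c or pata_amd actually do.

```python
# Nominal UDMA transfer rates in MB/s, indexed by UDMA mode number.
UDMA_RATE_MB_S = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.0}


def max_safe_udma_mode(cable_wires, drive_max_mode):
    """Highest UDMA mode that both the cable and the drive allow.

    A 40-wire cable is limited to UDMA mode 2 (UDMA/33); an 80-wire
    cable does not add a limit of its own in this sketch.
    """
    if cable_wires not in (40, 80):
        raise ValueError("expected a 40- or 80-wire cable")
    if drive_max_mode not in UDMA_RATE_MB_S:
        raise ValueError("unknown UDMA mode")
    cable_limit = 2 if cable_wires == 40 else 6
    return min(cable_limit, drive_max_mode)


def hdparm_x_value(udma_mode):
    """Numeric -X argument for a UDMA mode (assumed to be 64 + mode)."""
    return 64 + udma_mode


if __name__ == "__main__":
    mode = max_safe_udma_mode(40, 4)   # 40-wire cable, drive capable of UDMA/66
    print(mode, UDMA_RATE_MB_S[mode], hdparm_x_value(mode))  # 2 33.3 66
```

With a 40-wire cable the sketch picks mode 2, which agrees with the "UltraDMA Mode 2" value shown by the BIOS setup.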
Comment 14 Ethan Grammatikidis 2009-06-09 21:04:05 UTC
Hullo. I ran into this the other day, & can report the bug is still present in 2.6.30-rc8.

I tried to capture the console output on panic but:
FATAL: Error inserting netconsole (/lib/modules/2.6.30-rc8/kernel/drivers/net/netconsole.ko): Operation not permitted

(Yes, I was root.)

I tried 2 drives, only one triggered the bug. This is a Lite-On SOHD-16P9S28C made in January 2005.
Comment 15 Ethan Grammatikidis 2009-06-09 21:24:49 UTC
Ah, I can give chipset info:

 $ lspci|grep 'IDE\|SATA'
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA IDE Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02)
03:00.0 SATA controller: JMicron Technologies, Inc. 20360/20363 Serial ATA Controller (rev 03)
03:00.1 IDE interface: JMicron Technologies, Inc. 20360/20363 Serial ATA Controller (rev 03)

The cable is 40-wire.

I haven't tried the patch. If I feel like opening up my computer again in a few days I may try it.
Comment 16 Bartlomiej Zolnierkiewicz 2009-06-10 11:15:27 UTC
On Tuesday 09 June 2009 23:04:07 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13345
> 
> 
> Ethan Grammatikidis <eekee57@fastmail.fm> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |eekee57@fastmail.fm
> 
> 
> 
> 
> --- Comment #14 from Ethan Grammatikidis <eekee57@fastmail.fm>  2009-06-09
> 21:04:05 ---
> Hullo. I ran into this the other day, & can report the bug is still present
> in
> 2.6.30-rc8.

Could it be bug #13399 instead of this one?

[ This one should really be already fixed. ]
Comment 17 Ethan Grammatikidis 2009-06-10 14:34:06 UTC
> Could it be bug #13399 instead of this one?
> 
> [ This one should really be already fixed. ]

It could well be, by the symptoms. I can't tell the difference. It's not fixed for that old drive of mine though. :)

A different problem with the DVD-ROM drives was fixed between rc7 and rc8. Anything accessing my newer drive would freeze in an uninterruptible wait for IO under rc7, but works fine with rc8.
Comment 18 Ethan Grammatikidis 2009-06-12 00:49:17 UTC
Correction: the newer drive does not work fine. After 2 days of uptime, processes accessing the drive freeze in an uninterruptible wait for IO. It's a buggy drive to be sure, but in 2.6.24/26 the worst it did was spam dmesg, & back in 2.6.15 it would stop working but not cause processes to freeze in an unkillable state.
Comment 19 Ethan Grammatikidis 2009-06-20 19:58:00 UTC
2.6.30 final fixes my issue with my newer drive.