Bug 11498 - Teac DV28EV stops working with ide-cd
Summary: Teac DV28EV stops working with ide-cd
Status: CLOSED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: IDE (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Borislav Petkov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-09-04 10:06 UTC by Ben Hutchings
Modified: 2012-05-22 13:49 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.27-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
config for 2.6.27-rc5 (72.92 KB, text/plain)
2008-09-05 07:42 UTC, Ben Hutchings
Details
dump failing packet command (655 bytes, patch)
2008-09-06 06:08 UTC, Borislav Petkov
Details | Diff
The tryme script, runs the test program and monitors the results. (289 bytes, text/plain)
2008-10-08 01:09 UTC, Darren Lavender
Details
The test program (1.34 KB, text/x-csrc)
2008-10-08 01:10 UTC, Darren Lavender
Details
.config (62.93 KB, application/octet-stream)
2009-03-23 02:50 UTC, Balint Reczey
Details

Description Ben Hutchings 2008-09-04 10:06:04 UTC
Latest working kernel version: 2.6.18
Earliest failing kernel version: 2.6.26-rc2
Distribution: RHEL 5
Hardware Environment: Dell PowerEdge 6950, TEAC DVD-ROM DV28EV
Software Environment:
Problem Description:
The DVD-ROM drive starts returning status 0xd0 after the machine has been running for some time. The drive is not being used, but HAL is installed and appears to open the device every 5 seconds, so at least its status is being read. This could be a hardware/firmware problem but has only been observed when running recent kernels (2.6.26-rc2 and 2.6.27-rc5). It has also been observed on a Dell PowerEdge 2950 with the same model of DVD-ROM drive.


Steps to reproduce:
Comment 1 Ben Hutchings 2008-09-05 07:32:55 UTC
The kernel config is based on the RHEL 5 PAE kernel config; I will attach the config we used for 2.6.27-rc5. I'm not sure it's feasible to bisect this as the error has only been seen after the machine has been running for several hours.

We saw the following kernel error messages:

hda: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command

The last three lines were repeated at 5-second intervals, apparently whenever the device was opened.
Comment 2 Ben Hutchings 2008-09-05 07:42:15 UTC
Created attachment 17635 [details]
config for 2.6.27-rc5
Comment 3 Borislav Petkov 2008-09-06 06:06:25 UTC
Ok, attached is an inquisitive patch. Ben, do you care to apply it ontop of 27-rc5 and then send me the output so that we know for which packet command the irq times out and thus see what hald is saying to the drive? 

Thanks.
Comment 4 Borislav Petkov 2008-09-06 06:08:02 UTC
Created attachment 17648 [details]
dump failing packet command
Comment 5 Darren Lavender 2008-10-08 01:03:39 UTC
I started to investigate this problem when it was also reported to me. The issue seems to pertain to the TEAC DV*28* family of devices only.  The HAL 'hald-addon-storage' polling daemon seemed to be the process that perturbed the problem, as a result I extracted the relevant syscalls from an strace and was able to develop a simple stand-alone reproducer script and C-program.

Here are the results of a test run, showing the problem:

[root@gtr01 tests]# ./tryme
Problem seen 14 times in 500000 iterations
Comment 6 Darren Lavender 2008-10-08 01:09:18 UTC
Created attachment 18210 [details]
The tryme script, runs the test program and monitors the results.
Comment 7 Darren Lavender 2008-10-08 01:10:07 UTC
Created attachment 18211 [details]
The test program
Comment 8 Borislav Petkov 2008-10-08 03:01:11 UTC
Hi Darren,

care to apply the patch from comment #4, recompile the kernel and send me the dmesg output after you run your reproducer program?

Thanks,
Boris.
Comment 9 Bartlomiej Zolnierkiewicz 2008-10-15 13:38:36 UTC
Darren, any chance of bisecting it now that you have a reliable way to reproduce the problem?
Comment 10 Darren Lavender 2008-10-16 05:27:42 UTC
Hi Boris

Sorry for the delay in responding.  The system I originally had access to is no longer available to me, so I have been trying to reproduce this on several other platforms with other TEAC DV*28* drives.  Unfortunately, thus far, I have been unable to reproduce the problem.

I will continue to try and work on this as time allows, if I can reproduce the problem then I will add the patch you mentioned.

Regards

Darren
Comment 11 Balint Reczey 2009-03-23 02:42:11 UTC
Hi,

I was able to reproduce the problem after applying the patch:

Mar 20 18:12:38 mgwt041 kernel: Failed packet cmd: 0x0
Mar 20 18:12:38 mgwt041 kernel: Pid: 0, comm: swapper Not tainted 2.6.26-custom-wake-patch-1.11-4g #1
Mar 20 18:12:38 mgwt041 kernel: [<f896fa10>] cdrom_newpc_intr+0x0/0x750 [ide_cd_mod]
Mar 20 18:12:38 mgwt041 kernel: [<f889af2b>] ide_timer_expiry+0x26b/0x300 [ide_core]
Mar 20 18:12:38 mgwt041 kernel: [<c012ec6b>] run_timer_softirq+0x10b/0x1d0
Mar 20 18:12:38 mgwt041 kernel: [<f889acc0>] ide_timer_expiry+0x0/0x300 [ide_core]
Mar 20 18:12:38 mgwt041 kernel: [<c012ad42>] __do_softirq+0x82/0xf0
Mar 20 18:12:38 mgwt041 kernel: [<c01014b0>] default_idle+0x0/0x60
Mar 20 18:12:38 mgwt041 kernel: [<c012aded>] do_softirq+0x3d/0x50
Mar 20 18:12:38 mgwt041 kernel: [<c012afe5>] irq_exit+0x45/0x50
Mar 20 18:12:38 mgwt041 kernel: [<c0112c16>] smp_apic_timer_interrupt+0x56/0x90
Mar 20 18:12:38 mgwt041 kernel: [<c01014b0>] default_idle+0x0/0x60
Mar 20 18:12:38 mgwt041 kernel: [<c0103ccc>] apic_timer_interrupt+0x28/0x30
Mar 20 18:12:38 mgwt041 kernel: [<c01014b0>] default_idle+0x0/0x60
Mar 20 18:12:38 mgwt041 kernel: [<c01014b0>] default_idle+0x0/0x60
Mar 20 18:12:38 mgwt041 kernel: [<c01014fa>] default_idle+0x4a/0x60
Mar 20 18:12:38 mgwt041 kernel: [<c010216e>] cpu_idle+0x8e/0xd0
Mar 20 18:12:38 mgwt041 kernel: =======================
Mar 20 18:12:38 mgwt041 kernel: hda: irq timeout: status=0xd0 { Busy }
Mar 20 18:12:38 mgwt041 kernel: ide: failed opcode was: unknown
Mar 20 18:12:38 mgwt041 kernel: hda: status timeout: status=0xd0 { Busy }
Mar 20 18:12:38 mgwt041 kernel: ide: failed opcode was: unknown

Regards,
Balint
Comment 12 Balint Reczey 2009-03-23 02:50:56 UTC
Created attachment 20635 [details]
.config

The kernel is compiled from the Debian package linux-source-2.6.26 2.6.26-13
and with reverted 33b0c4217dcd67b788318c3192a2912b530e4eef.
Comment 13 Borislav Petkov 2009-03-24 02:12:26 UTC
Hi,

can you try a newer kernel, for example the freshly released 2.6.29?

Thanks.
Comment 14 Balint Reczey 2009-03-24 07:43:20 UTC
Hi,

The patch does not compile on 2.9.29:

drivers/ide/ide-io.c: In function \u2018ide_timer_expiry\u2019:
drivers/ide/ide-io.c:967: error: \u2018struct hwif_s\u2019 has no member named \u2018hwgroup\u2019
make[4]: *** [drivers/ide/ide-io.o] Error 1

So i omitted the cmd code from the printout, probably it's 0 again.:-)

Failed packet cmd: 0x?
Pid: 0, comm: swapper Not tainted 2.6.29 #1
Call Trace:
 [<c031b67c>] ? printk+0x18/0x1c
 [<f8192a00>] ? cdrom_newpc_intr+0x0/0xab0 [ide_cd_mod]
 [<f807d201>] ide_timer_expiry+0x271/0x2b0 [ide_core]
 [<c0133ea9>] run_timer_softirq+0x109/0x1d0
 [<c014a9d3>] ? tick_dev_program_event+0x33/0xb0
 [<f807cf90>] ? ide_timer_expiry+0x0/0x2b0 [ide_core]
 [<c012fa17>] __do_softirq+0x97/0x160
 [<c012fb25>] do_softirq+0x45/0x50
 [<c012fd45>] irq_exit+0x45/0x50
 [<c011525d>] smp_apic_timer_interrupt+0x5d/0x90
 [<c0103ac0>] apic_timer_interrupt+0x28/0x30
 [<c010a00a>] ? default_idle+0x4a/0x50
 [<c010a162>] c1e_idle+0x32/0xf0
 [<c0102350>] cpu_idle+0x80/0xa0
 [<c0319284>] start_secondary+0x16f/0x1eb
hda: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
ide-cd: hda: weird block size 0
ide-cd: hda: default to 2kb block size
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }
Comment 15 Borislav Petkov 2009-03-25 08:46:42 UTC
Hi,

I meant try newer kernel _without_ the patch. Sorry for not expressing this more clearly.

Thanks,
Boris.
Comment 16 Balint Reczey 2009-04-03 14:16:08 UTC
Hi,

Is there anything i can do to help?
The log for 2.6.29 is usable, I think.

Regards,
Balint

Note You need to log in before you can comment on or make changes to this bug.