Bug 11581 - CDRW not detected on boot. 2.6.18 worked ok.
Status: CLOSED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: IDE
Hardware: All Linux
Importance: P1 normal
Assignee: Borislav Petkov
Reported: 2008-09-17 04:46 UTC by sanvila
Modified: 2012-05-22 14:08 UTC
Kernel Version: 2.6.26.5
Tree: Mainline
Regression: Yes


Attachments
output of lspci -v (4.37 KB, text/plain)
2008-09-17 04:49 UTC, sanvila
output of dmesg (15.89 KB, text/plain)
2008-09-17 04:50 UTC, sanvila
spit failing command patch (573 bytes, patch)
2008-09-18 02:00 UTC, Borislav Petkov
dmesg after patching ide-io.c (15.94 KB, text/plain)
2008-09-18 03:25 UTC, sanvila
More dmesg output (5.54 KB, text/plain)
2008-09-18 04:00 UTC, sanvila
enable full debugging info (21.88 KB, patch)
2008-09-18 04:39 UTC, Borislav Petkov
dmesg with full debug patch (84.87 KB, text/plain)
2008-09-18 06:40 UTC, sanvila
oops after cat /dev/hdc (5.55 KB, text/plain)
2008-09-19 05:19 UTC, sanvila
ide-cd.s (106.69 KB, text/plain)
2008-09-19 09:22 UTC, sanvila
ide-cd.dsm (162.10 KB, text/plain)
2008-09-19 09:23 UTC, sanvila
unrelated NULL ptr fix (1.27 KB, patch)
2008-09-19 10:34 UTC, Borislav Petkov
new oops after cat /dev/hdc (6.37 KB, text/plain)
2008-09-22 03:11 UTC, sanvila

Description sanvila 2008-09-17 04:46:44 UTC
Latest working kernel version: 2.6.18
Earliest failing kernel version: 2.6.19
Distribution: 2.6.26.5 from kernel.org
Hardware Environment: AMD Duron 800. VIA KT133A motherboard
Software Environment: Debian testing
Problem Description: The CD-RW unit at /dev/hdc is not properly detected, which makes udev wait for 180 seconds at boot.

Steps to reproduce:

Boot using 2.6.26.5. The kernel boots, shows a few messages, and then seems to stop. If I wait for 180 seconds, a message like this is shown and the boot process then completes:

After the udevadm settle timeout, the events queue contains:

1180: /block/hdc

At this point, the eject button of the CD-RW unit at /dev/hdc does not work.
Surprisingly, trying to access the unit via "less -f /dev/hdc" seems to fix it.

The problem was introduced somewhere between 2.6.18 and 2.6.19.
This is the result of git-bisect:

4aff5e2333c9a1609662f2091f55c3f6fffdad36 is first bad commit
commit 4aff5e2333c9a1609662f2091f55c3f6fffdad36
Author: Jens Axboe <axboe@suse.de>
Date:   Thu Aug 10 08:44:47 2006 +0200

    [PATCH] Split struct request ->flags into two parts
    
    Right now ->flags is a bit of a mess: some are request types, and
    others are just modifiers. Clean this up by splitting it into
    ->cmd_type and ->cmd_flags. This allows introduction of generic
    Linux block message types, useful for sending generic Linux commands
    to block devices.
    
    Signed-off-by: Jens Axboe <axboe@suse.de>

:040000 040000 ff931af25471578be78885d8e27e9e0df829b49d f5edcbd2a9424828cfb4f1579d672c95bba7a4a0 M      block
:040000 040000 5e4d7235fa7d0a48cb3b23399905dc3d472d738e 05a44d13e66ce6e6bdc6e9d697675c32799c70e3 M      drivers
:040000 040000 887303b2f4077cc43bd23e42d0b104cab05655b1 50c82dbe8394b6b8e5bd169c182e0b4cc3d71963 M      include

Will include dmesg and lspci output as soon as I find the attach button.
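
For readers who want to repeat the bisection described above, the manual procedure can be sketched roughly as follows. This is a minimal sketch, assuming a clone of the mainline tree that carries the v2.6.18 and v2.6.19 tags. The `bisect_plan` helper is hypothetical and only prints the commands instead of running them, because every step needs a kernel build, a reboot, and a manual check of /dev/hdc in between.

```shell
#!/bin/sh
# Hypothetical helper: print the manual bisect steps for a good/bad tag pair.
# Nothing is executed here; each step needs a kernel build and a reboot.
bisect_plan() {
    good=$1
    bad=$2
    printf '%s\n' \
        "git bisect start" \
        "git bisect bad $bad" \
        "git bisect good $good" \
        "# build and boot the kernel git checks out, test /dev/hdc, then:" \
        "git bisect good   # if the CD-RW is detected" \
        "git bisect bad    # if udev stalls on /block/hdc" \
        "# repeat until git prints '<sha1> is first bad commit', then:" \
        "git bisect reset"
}

# Tags taken from the report: 2.6.18 works, 2.6.19 fails.
bisect_plan v2.6.18 v2.6.19
```

Printing the plan keeps the sketch safe to run anywhere; in a real session the printed commands would be typed one at a time inside the kernel tree.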
Comment 1 sanvila 2008-09-17 04:49:39 UTC
Created attachment 17834 [details]
output of lspci -v
Comment 2 sanvila 2008-09-17 04:50:07 UTC
Created attachment 17835 [details]
output of dmesg
Comment 3 Andrew Morton 2008-09-17 09:37:38 UTC
Marked as a regression, reassigned to IDE, cc'ed Jens.

Jens, please note that this was bisected down to a block layer change.
Comment 4 Jens Axboe 2008-09-17 11:28:00 UTC
Hmm, interesting. So the drive is actually detected, but commands issued later by udev are timing out. Could you double-check whether 2.6.27-rc6 is broken or not?

CC'ing Bart and Borislav.
Comment 5 sanvila 2008-09-18 00:51:53 UTC
2.6.27-rc6 is also broken.

Note: This time I've had to use the .config from the Debian package for 2.6.26, as the one provided by "make defconfig" didn't detect the hard disk (!).

The behaviour is the same: a long wait while udev is trying to detect hdc,
a timeout of "udevadm settle", the eject button does not work, and a simple
"less -f /dev/hdc" makes the eject actually happen.
Comment 6 Borislav Petkov 2008-09-18 01:58:15 UTC
Hi,

can you please try the attached patch and send me the dmesg output?

Thanks.
Comment 7 Borislav Petkov 2008-09-18 02:00:24 UTC
Created attachment 17848 [details]
spit failing command patch
Comment 8 Sergei Shtylyov 2008-09-18 02:17:44 UTC
(In reply to comment #7)
> Created an attachment (id=17848) [details]
> spit failing command patch

Could you tick the "patch" checkbox on this attachment?
Comment 9 sanvila 2008-09-18 03:25:10 UTC
Created attachment 17854 [details]
dmesg after patching ide-io.c
Comment 10 sanvila 2008-09-18 03:59:22 UTC
More info, which may or may not be relevant. If I leave the system alone and don't try to wake up the cdrom by doing "less -f /dev/hdc", then the following messages are appended to dmesg (see the next attachment).
Comment 11 sanvila 2008-09-18 04:00:29 UTC
Created attachment 17855 [details]
More dmesg output
Comment 12 Borislav Petkov 2008-09-18 04:38:25 UTC
OK, those are follow-up traces from the soft lockup detector code, showing that we're stuck trying to revalidate the disk after reading the TOC. There are also some ioctls which come from somewhere else, so we'll have to enable full debugging output in order to see exactly what happens. Here's a debugging patch. It is pretty big; please recompile with it and send me the whole boot log. The dmesg output might not be complete, since the debug output is going to be a lot more verbose and will overflow the ring buffer, so try to copy it from /var/log/syslog or similar. Thanks.
Comment 13 Borislav Petkov 2008-09-18 04:39:06 UTC
Created attachment 17856 [details]
enable full debugging info
Comment 14 sanvila 2008-09-18 06:38:27 UTC
Here it is. Notes:

* This is 2.6.27-rc6, as the patch didn't apply cleanly to 2.6.26.5.

* I've modified the file

/usr/share/initramfs-tools/scripts/init-premount/udev

which is used in my system to build the initramfs so that "udevadm settle" takes
only 30 seconds instead of the default 180.
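
The shorter wait mentioned above can be sketched with the `--timeout` option of `udevadm settle`. This is only an illustration: the actual contents of the Debian initramfs script are not shown in this report, and the `settle_cmd` helper and `SETTLE_TIMEOUT` variable are hypothetical names.

```shell
#!/bin/sh
# Value taken from the comment above: 30 seconds instead of the default 180.
SETTLE_TIMEOUT=30

# Hypothetical helper: build the settle command line for a given timeout.
settle_cmd() {
    printf 'udevadm settle --timeout=%s\n' "$1"
}

# Print the command rather than running it, since udevadm needs a live udev.
settle_cmd "$SETTLE_TIMEOUT"
```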
Comment 15 sanvila 2008-09-18 06:40:09 UTC
Created attachment 17858 [details]
dmesg with full debug patch
Comment 16 sanvila 2008-09-18 10:51:40 UTC
Using 2.6.27-rc6, "cat /dev/hdc" produces a kernel panic and kdb is started.
[ Would love to cut and paste but there is no bash anymore ].
Comment 17 Borislav Petkov 2008-09-18 16:53:30 UTC
You can catch the output with a serial console or a netconsole.
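
A netconsole setup along these lines might look like the sketch below. It assumes the parameter layout from the kernel's netconsole documentation, netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. The `netconsole_param` helper is hypothetical, and every address, port, and interface name is a placeholder rather than a value from this report.

```shell
#!/bin/sh
# Hypothetical helper: assemble the netconsole= parameter in the form
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values: sender port/IP/interface, then receiver port/IP/MAC.
param=$(netconsole_param 6665 192.168.0.2 eth0 6666 192.168.0.1 00:11:22:33:44:55)

# On the crashing machine (as root), load the module with that parameter:
echo "modprobe netconsole $param"
# On the receiving machine, listen for the UDP log stream. The exact
# netcat flags differ between variants; this is the traditional form:
echo "nc -u -l -p 6666"
```

The two commands are printed rather than executed, since they need root on one machine and a listener on another.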
Comment 18 sanvila 2008-09-19 05:18:17 UTC
Ok. This is what netconsole was able to catch. It's an oops.
Comment 19 sanvila 2008-09-19 05:19:41 UTC
Created attachment 17879 [details]
oops after cat /dev/hdc
Comment 20 Borislav Petkov 2008-09-19 08:23:16 UTC
Can you now do 

objdump -d drivers/ide/ide-cd.o > drivers/ide/ide-cd.dsm

and

make drivers/ide/ide-cd.s

and send me the .s and .dsm files?

Thanks.
Comment 21 sanvila 2008-09-19 09:22:36 UTC
Created attachment 17885 [details]
ide-cd.s
Comment 22 sanvila 2008-09-19 09:23:40 UTC
Created attachment 17886 [details]
ide-cd.dsm
Comment 23 Borislav Petkov 2008-09-19 10:33:47 UTC
This is a NULL ptr access in the debugging printk, here's a fix.
Comment 24 Borislav Petkov 2008-09-19 10:34:20 UTC
Created attachment 17887 [details]
unrelated NULL ptr fix
Comment 25 Borislav Petkov 2008-09-21 22:22:18 UTC
Hi,

Does the above patch fix the oops you get? I'm still working on the main problem, but it looks pretty hairy...

Thanks.
Comment 26 sanvila 2008-09-22 03:10:13 UTC
[ Sorry for the delay, the computer is the one I use at work ].

The patch seems to fix the previous oops, but now there is a new one.
The netconsole output for this one follows.
Comment 27 sanvila 2008-09-22 03:11:13 UTC
Created attachment 17937 [details]
new oops after cat /dev/hdc
Comment 28 lars.winterfeld 2008-11-23 15:32:36 UTC
I opened a similar (possibly the same) bug at http://bugzilla.kernel.org/show_bug.cgi?id=10216
Comment 29 Alan 2012-05-22 14:08:07 UTC
drivers/ide is now obsolete.
