Bug 11899
Summary: | sometime boot failed on T61 laptop | | |
---|---|---|---|
Product: | IO/Storage | Reporter: | Alex Shi (alex.shi) |
Component: | Serial ATA | Assignee: | Tejun Heo (tj) |
Status: | CLOSED CODE_FIX | | |
Severity: | blocking | CC: | albcamus, rjw, yanmin_zhang |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 2.6.28-rc1/rc2 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | | | |
Bug Blocks: | 11808 | | |

Attachments:
kernel config file
new kernel config file
Patch to fix
register a blkext block device for MAJOR 259
patch against nash to add blkext into table proc_dev_info
New patch against FC9's nash
New patch against FC9's nash: against nash to add blkext into table proc_dev_info
New patch for kernel to register MAJOR 259 as a block device
Description
Alex Shi
2008-10-30 02:04:59 UTC
Created attachment 18547
kernel config file
Kernel config file for this bug.
Still exists in rc3.

How does it fail? And is this a regression?

Ah.. okay, missed the first few lines. 2.6.27 worked. Strange, 2.6.28-rc1 doesn't change much, although the slave link change might be causing the problem. Can you please set up a net or serial console and capture the failing log? Thanks.

The problem machine is a product laptop, so it is hard to add a serial line to capture the boot log. Besides, the failure message has no useful info: everything is fine during kernel boot, and then it moves on to the initrd. In initrd.img, inserting the relevant modules all goes fine, and then the system tries to do switchroot, fails, and reports "boot failed". The console shows disk info up to "sda1 sda2 sda3 sda4 <sda5>". I also compiled in all the relevant modules, but the bug still exists.

The boot messages on the console of 2.6.27 are quite similar to those of the problem kernel. I mean, the previous comments followed by "###" do not in fact appear on the console for the 2.6.27 kernel either, but they do show up in dmesg. The one time 2.6.28-rc2 succeeded in booting, the console and dmesg were quite similar to 2.6.27.

Oh, somewhere during boot it prints "resume from /dev/sda5 ..." on the console. With the 2.6.27 kernel, resume reports that it found sda5, but with the problem kernel it reports that finding sda5 failed.

The only thing I can think of is HPA, but there are no reset or HPA related changes between 2.6.27 and 2.6.28-rc for ahci. Does libata.ignore_hpa=1 do any good?

I modified the "init" file in initrd.img to add this parameter:
insmod /lib/libata.ko ignore_hpa=1
but it does not seem to help with this issue.

We found several machines were affected, like the ThinkPad R52 etc.

Can you post .config?

It was posted as "kernel config file", as below. The creator is "alexs". Don't you see it?

Heh, yeah, I missed that. I was wondering whether block extended devt was turned on. I don't have much experience with fc initrd. Can you drop into the emergency shell from initrd and see whether you can mount filesystems manually?

I'm not sure where the problem is. I have another bug report which is caused by a recent block change (along with the extended block devt changes) where md returns garbage values, and what you're seeing could be related. If you can't mount manually and you see error messages from filesystem code, it's likely to be the same problem. I'll investigate the problem and let you know when I know more. Thanks.

I have added CONFIG_DEBUG_BLOCK_EXT_DEVT in the kernel, but no more messages are printed.

CONFIG_DEBUG_BLOCK_EXT_DEVT doesn't add any messages, it only changes device number allocation.

Ah.. strange. Can you please build the kernel w/ all the necessary things built-in and skip initrd altogether? Does the kernel still fail to mount root?

Created attachment 18699
new kernel config file
I rebuilt kernel 2.6.28-rc3 with the disk driver (AHCI) and some other drivers configured as built-in. The kernel then booted without hanging; I booted it 4 times.
I had also built in these drivers, and sometimes it still hangs. Another time, I put "reboot" in /etc/rc.local; the system kept rebooting for 1.5 hours but finally still hung.

Thanks for testing. Hmm.. Where does the kernel hang? What are the final words w/o initrd?

Then I built ahci (the root disk driver) as a module and rebooted 5 times. No hang. I will switch to the first kernel config file.

Sorry, the laptop is not in my hands right now. As I remember, in initrd the first failure is that /dev/root cannot be found, and then setup root and switchroot all fail.

I added some debugging to the init script in initrd. mkrootdev doesn't create /dev/root even though the root partition /dev/sda5 does exist. Tomorrow I will download the nash source code and add some debugging code to trace the mkrootdev command.

Thanks, Yanmin. It probably is a good idea to cross post to rh bugzilla?

The grub.conf configures the root partition as LABEL=/1. nash debug shows it fails to get the root device from the LABEL (name conversion failed). If I change the root partition to /dev/sda1 in the kernel boot parameter line, the system could always boot (I tried 5 times).

(In reply to comment #19)
> Thanks, Yanmin. It probably is a good idea to cross post to rh bugzilla?
>
Because of comment #20, perhaps we need to stick to the kernel and nash before posting to rh bugzilla.

Well, it's more about who knows rh initrd better rather than whose fault it is. We could be spending hours here trying to find out why nash has trouble reading the label off the disk when rh's initrd might already know it.

Yes, you are right. The key is that the system could boot with the old kernel. I suspect the new kernel doesn't prepare data for /etc/blkid/blkid.tab well before initrd reads it, or the kernel and initrd don't cooperate well sometimes. I'm checking who initializes /etc/blkid/blkid.tab and how.

Well, more debugging shows the block layer adds /sys/block/sda, but forgets to add it to /proc/devices. nash uses /proc/devices to find the __type__ (here it should be disk for sda) of the device. When nash can't find the type, it just omits the device. So the kernel block layer might not add the device to /proc/devices in time.

/proc/devices? That file only lists major -> driver relationship. If you have DEBUG_BLOCK_EXT_DEVT off, it shouldn't behave any differently. Does nash still fail at the same point with DEBUG_BLOCK_EXT_DEVT off?

(In reply to comment #25)
> /proc/devices? That file only lists major -> driver relationship.
I instrumented nash and it gets the device type from /proc/devices.
> If you have DEBUG_BLOCK_EXT_DEVT off, it shouldn't behave any differently.
> Does nash still fail at the same point with DEBUG_BLOCK_EXT_DEVT off?
My initial failing kernel had DEBUG_BLOCK_EXT_DEVT off. The current debugging kernel enables it. Both kernels fail randomly.

I found now that sometimes the kernel also fails when root=/dev/sda1 is in the boot parameter line. I think it's related to the BLOCK_EXT feature. When it fails, the kernel allocates 259 as the major number for the disk. This is just BLOCK_EXT_MAJOR. If I set root=/dev/sda1, sometimes it can boot and 259 is added to /proc/devices. With my other kernel, the root device major number is 8.

I don't know how nash handles block device discovery but the behavior being indeterministic is confusing. If DEBUG_BLOCK_EXT_DEVT is set, all ide and scsi devices will populate major 259; if it's not turned on, nothing should change as long as the number of partitions doesn't go over the current limits.

How does the kernel fail when root=/dev/sda1 is specified? Does it fail less frequently than the initrd case? Do the failures have correlation with DEBUG_BLOCK_EXT_DEVT enabledness? Thanks.

(In reply to comment #27)
> I don't know how nash handles block device discovery
I just went through the nash code today. nash collects devices under /sys/block and finds the device type from /proc/devices. If it cannot, its subcommand mkblkdevs won't create /dev/root, so the later switchroot fails. I got a headache and need to take a rest. Then I will double-check it.
> but the behavior being indeterministic is confusing. If DEBUG_BLOCK_EXT_DEVT
> is set, all ide and scsi devices will populate major 259, if it's not turned
> on, nothing should change as long as the number of partitions don't go over
> the current limits.
>
> How does the kernel fail when root=/dev/sda1 is specified?
It fails rarely. Mostly, it could boot. But I did hit a boot failure.
> Does it fail less frequently than the initrd case? Do the failures have
> correlation with DEBUG_BLOCK_EXT_DEVT enabledness?
I don't think so.

Notify-Also : Yanmin Zhang <yanmin_zhang@linux.intel.com>
Handled-By : Tejun Heo <tj@kernel.org>

Just an update in case you guys lose patience. It looks like the issue consists of at least 2 problems:
1) nash doesn't create device node for later devices sometimes;
2) If devices are created before nash searches /sys/block/XXX, and the major device number is 259, nash reports failure to check the device type from /proc/devices. nash has a table definition to probe disk devices for filesystem label (if we define root=LABEL=XXX).
I am busy in a couple of tasks. Pls. let me arrange them. I might communicate with Peter Jones <pjones@redhat.com>, the nash developer.

Hello,

(In reply to comment #30)
> 1) nash doesn't create device node for later devices sometimes;

Hmm...

> 2) If devices are created before nash searches /sys/block/XXX, and the major
> device number is 259, nash reports failure to check the device type from
> /proc/devices. nash has a table definition to probe disk devices for
> filesystem label (if we define root=LABEL=XXX).

If CONFIG_DEBUG_BLOCK_EXT_DEVT is not set, maj 259 is used iff the device has more than 15 partitions. I don't think nash makes any difference which major number a device gets.

> I am busy in a couple of tasks. Pls. let me arrange them. I might communicate
> with Peter Jones <pjones@redhat.com>, the nash developer.

Thanks.

(In reply to comment #31)
> > 2) If devices are created before nash searches /sys/block/XXX, and the
> > major device number is 259, nash reports failure to check the device type
> > from /proc/devices. nash has a table definition to probe disk devices for
> > filesystem label (if we define root=LABEL=XXX).
>
> If CONFIG_DEBUG_BLOCK_EXT_DEVT is not set, maj 259 is used iff the device has
> more than 15 partitions. I don't think nash makes any difference which major
> number a device gets.
Here is an execution path:
1) nash reads /sys/block/sda/dev; assume the major is 8 (on my desktop);
2) nash queries /proc/devices with the major number; it finds the line "8 sd";
3) nash uses 'sd' to search its own probe table to find the DISK type for the device and adds it to its own list;
4) Later on, it probes all devices in its list to get filesystem labels.
scsi always registers "8 sd". When the major is 259, nash fails to find the DISK type. Let me do more instrumentation.

(In reply to comment #20)
> The grub.conf configures the root partition as LABEL=/1. nash debug shows it
> fails to get the root device from the LABEL(name convertion failed). If I
> change the root partition to /dev/sda1 in the kernel boot parameter line,
> system could always boot (I tried for 5 times).
>

Hi Yanmin,

I have the very problem with vanilla 2.6.28-rc4, Fedora 8 x86-64, HPC NX6325 laptop. I replaced `LABEL=/1' with `/dev/sda8' in grub.conf, and then Linux boots.

But sometimes resuming from disk still fails, even after replacing `LABEL=SWAP-sda6' with `/dev/sda6' as well as replacing `LABEL=/1' with `/dev/sda8' in /etc/fstab.

Any clues?

(In reply to comment #33)
> Hi Yanmin,
>
> I have the very problem with vanilla 2.6.28-rc4, Fedora 8 x86-64, HPC NX6325
> laptop. I replaced `LABEL=/1' with `/dev/sda8' in grub.conf, and then Linux
> boots.
Can you try to boot for many times with root=/dev/sda8?

With root=LABEL=/1, my T61 can't boot. After replacing with root=/dev/sda1, sometimes it can boot, sometimes it can't. Mostly try 5 times and hit once.

I instrumented the kernel and nash. The sda1 uevent was sent to the socket queue, but later on nash just gets some other uevent and can't get the sda1 ADD uevent. I don't know who stole the sda1 uevent in the kernel.
> But sometimes resuming from disk still fails, even after replacing
> `LABEL=SWAP-sda6' with `/dev/sda6' as well as replacing `LABEL=/1' with
> `/dev/sda8' in /etc/fstab.
>
> Any clues?
>

On Wed, Nov 12, 2008 at 2:26 PM, <bugme-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11899
>
> ------- Comment #34 from yanmin_zhang@linux.intel.com 2008-11-11 22:26 -------
> Can you try to boot for many times with root=/dev/sda8?
>
> With root=LABEL=/1, my T61 can't boot. After replacing with root=/dev/sda1,
> sometimes it can boot, sometimes it can't. Mostly try 5 times and hit once.
>
Yes, I did boot with root=/dev/sda8, and still get random failures. The same as you ;-)

Well, I think I found the root cause of why the system randomly fails to boot when root=/dev/sda1. It's a bug in nash, and the latest kernel triggers it.

The statements below are from the nash source, in function nashBdevIterNext:

    case POLLING:
        timeout = iter->timeout;
        if (block_process_one_uevent(iter->nc, &timeout, &node) > 0 && node) {
            *dev = node->bdev;
            return 1;
        }
        if (speczero(&timeout))
            iter->state = DONE;
        continue;

block_process_one_uevent might process an event which is not for a block device, but timeout is also reset to 0, so the loop (both the inner one and the caller) will stop and all other uevents are left in the kernel queues.

As for why timeout becomes 0 when the uevent isn't for a block device, I think the current scheduler in 2.6.28-rc rescheduled the nash process. When nash is scheduled back, the timeout is used up (15usec when mkblkdevs is executed).

Created attachment 18816
Patch to fix
Here is the patch against mkinitrd-6.0.19 to fix it.
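
The failure mode described above for nashBdevIterNext can be modelled in isolation. The following is only a minimal sketch under stated assumptions: it is neither nash's code nor the attached patch. `struct event`, `struct queue`, `process_one` and both `count_block_events_*` functions are hypothetical names, and the time lost to rescheduling is simulated by a per-event cost. The "drain" variant is one assumed way to avoid the problem, shown for contrast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not nash's real types. */
enum ev_kind { EV_OTHER, EV_BLOCK };

struct event {
    enum ev_kind kind;
    long cost_usec;              /* simulated time spent receiving this event */
};

struct queue {
    const struct event *ev;
    size_t len;
    size_t next;                 /* index of the next unread event */
};

/*
 * Read one event and charge its cost against *timeout_usec (clamped at 0).
 * Returns 1 for a block event, 0 for any other event, and -1 when the queue
 * is empty, in which case the whole timeout is treated as spent waiting.
 */
static int process_one(struct queue *q, long *timeout_usec)
{
    const struct event *e;

    assert(q != NULL && timeout_usec != NULL);
    if (q->next >= q->len) {
        *timeout_usec = 0;
        return -1;
    }
    e = &q->ev[q->next++];
    *timeout_usec = (*timeout_usec > e->cost_usec)
                        ? *timeout_usec - e->cost_usec : 0;
    return e->kind == EV_BLOCK;
}

/* Same shape as the quoted loop: give up once the timeout reads zero. */
static size_t count_block_events_buggy(struct queue *q, long timeout_usec)
{
    size_t found = 0;

    for (;;) {
        long t = timeout_usec;          /* like timeout = iter->timeout */
        if (process_one(q, &t) > 0) {
            found++;                    /* the original returns the device here */
            continue;
        }
        if (t == 0)
            break;                      /* like iter->state = DONE */
    }
    return found;
}

/* Assumed remedy: stop only when the queue itself has nothing left. */
static size_t count_block_events_drain(struct queue *q, long timeout_usec)
{
    size_t found = 0;

    for (;;) {
        long t = timeout_usec;
        int r = process_one(q, &t);

        if (r < 0)
            break;
        if (r > 0)
            found++;
    }
    return found;
}
```

With a queue holding a non-block event that costs 20 usec followed by a block event, and a 15 usec timeout, `count_block_events_buggy` returns 0 (the block event is never read) while `count_block_events_drain` returns 1.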
I will fix the other issue, the one with root=LABEL=/1.

(In reply to comment #38)
> Created an attachment (id=18816)
> Patch to fix
>
> Here is the patch against mkinitrd-6.0.19 to fix it.
>

Yanmin,

I'm afraid that this fix doesn't work for me. I downloaded mkinitrd-6.0.19-4.fc8.src.rpm, installed and patched it with your fix (as well as the 3 patches that come with the .src rpm), and called mkinitrd again:

# mkinitrd -v --preload libata /boot/initrd-2.6.28-rc4.img 2.6.28

After that, booting always fails whether with `root=/dev/sda8' or `root=LABEL=/1'.

(In reply to comment #40)
> I'm afraid that this fix doesn't work for me. I downloaded
> mkinitrd-6.0.19-4.fc8.src.rpm, install and patch it with your fix(as well as
> 3 patches along with the .src rpm), and call mkinitrd again:
>
> # mkinitrd -v --preload libata /boot/initrd-2.6.28-rc4.img 2.6.28
It seems the command line isn't correct. You should use 2.6.28-rc4 as the kernel version number.

# mkinitrd -v --preload libata /boot/initrd-2.6.28-rc4.img 2.6.28-rc4
>
> After that, booting always fails whether with `root=/dev/sda8' or
> `root=LABEL=/1'.
Pls. don't use root=LABEL=/1 now. I have a kernel patch and a nash patch to fix it.

Sure, it's a typo of my post. In fact '2.6.28' is 'nonexistent' now ;-)

I'm sure I call mkinitrd correctly.

Created attachment 18837
register a blkext block device for MAJOR 259
As for the boot failure with root=LABEL=/1, I worked out 2 patches.
Here is the first patch, for the kernel, which just registers a blkext block device for
major 259.
The next patch is against nash, to add blkext to the table proc_dev_info.
Created attachment 18838
patch against nash to add blkext into table proc_dev_info
The patch should be applied to nash.
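
Why both a kernel-side and a nash-side change are involved can be illustrated with a small model of the lookup path described earlier in this report (major from /sys/block/<dev>/dev, driver name from /proc/devices, then a probe table). This is a hedged sketch, not nash's implementation: `parse_major`, `block_driver_name`, `name_in_table` and `device_is_disk` are hypothetical helpers, the table passed in stands in for proc_dev_info, and the inputs are plain strings rather than the real files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the text of /sys/block/<dev>/dev ("MAJOR:MINOR"); -1 on bad input. */
static int parse_major(const char *dev_attr)
{
    int major, minor;

    assert(dev_attr != NULL);
    if (sscanf(dev_attr, "%d:%d", &major, &minor) != 2)
        return -1;
    return major;
}

/*
 * Look up the driver name for a block major in /proc/devices-style text.
 * Only lines after the "Block devices:" header are considered.
 * Returns 1 and fills name on success, 0 if the major is not listed.
 */
static int block_driver_name(const char *proc_devices, int major,
                             char *name, size_t name_len)
{
    const char *p = strstr(proc_devices, "Block devices:");

    if (p == NULL)
        return 0;
    for (p = strchr(p, '\n'); p != NULL && p[1] != '\0'; p = strchr(p, '\n')) {
        int m;
        char buf[64];

        p++;                                  /* step past the newline */
        if (sscanf(p, "%d %63s", &m, buf) == 2 && m == major) {
            snprintf(name, name_len, "%s", buf);
            return 1;
        }
    }
    return 0;
}

/* Hypothetical probe table check: is this driver name known as a disk? */
static int name_in_table(const char *name, const char *const *table, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(name, table[i]) == 0)
            return 1;
    return 0;
}

/* Steps 1-3 of the described path: sysfs major -> driver name -> table. */
static int device_is_disk(const char *dev_attr, const char *proc_devices,
                          const char *const *table, size_t n)
{
    char name[64];
    int major = parse_major(dev_attr);

    if (major < 0 ||
        !block_driver_name(proc_devices, major, name, sizeof(name)))
        return 0;
    return name_in_table(name, table, n);
}
```

In this model a device reporting `259:0` is only classified as a disk when the /proc/devices text contains a `259 blkext` line and the table contains `blkext`; if either is missing, the lookup fails, which matches the behaviour reported for major 259.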
(In reply to comment #42)
> Sure, it's a typo of my post. In fact '2.6.28' is 'nonexistent' now ;-)
>
> I'm sure I call mkinitrd correctly.
>
> >> I'm afraid that this fix doesn't work for me. I downloaded
> >> mkinitrd-6.0.19-4.fc8.src.rpm, install and patch it with your fix(as well
> >> as 3 patches along with the .src rpm), and call mkinitrd again:
I tried both manual cpio and the mkinitrd command. Both work well to boot my T61.

What's your OS version? Fedora Core 8? I have a couple of patches for both the kernel and nash to debug it. I could send them to you if you want to debug it.

(In reply to comment #45)
> What's your os version? FedoraCore 8? I have a coulple of patches of both
> kernel and nash to debug it. I might send you if you want to debug it.
>
Aha, with Yanmin's patch applied, I can always boot 2.6.28-rc4 with root=/dev/sda8. Will check the patch for kernel shortly.

(In reply to comment #46)
> Aha, with Yanmin's patch applied, I can always boot 2.6.28-rc4 with
> root=/dev/sda8. Will check the patch for kernel shortly.
Thanks Jike.

I created a bug report at https://bugzilla.redhat.com/show_bug.cgi?id=471517. I ported the 2 nash patches to FC9's nash version and posted them to the redhat bugzilla.

I reproduced the issue on another Nehalem machine with FC9. My patches do fix it.

Tejun Heo,

What's your opinion on these patches?

Yanmin

The kernel part looks fine to me. Please send it to Jens Axboe <jens.axboe@oracle.com> and cc lkml and me. For the nash part, I don't have the slightest idea. Thanks.

Created attachment 18858
New patch against FC9's nash
Created attachment 18859
New patch against FC9's nash: against nash to add blkext into table proc_dev_info
Created attachment 18860
New patch for kernel to register MAJOR 259 as a block device
Thanks to Tejun Heo and Alexey Dobriyan for their good comments. The new patch moves
the registration to genhd_device_init.
Handled-By : Yanmin Zhang <yanmin_zhang@linux.intel.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=18860&action=view
Notify-Also : Tejun Heo <tj@kernel.org>
Notify-Also : Jens Axboe <jens.axboe@oracle.com>

Fixed by commit 561ec68e4de7947167937c49c451728e6b19e63b.