Most recent kernel where this bug did not occur: (pre-NO_HZ)
Distribution: Debian
Hardware Environment: Dell Latitude D620
Intel Core2 Duo T7200 2GHz
Nvidia Quadro NVS 110M / GeForce Go 7300 (rev a1)
Software Environment: Boot from sata, no initramfs, no binary modules
Kernel built with gcc-4.2. Normally I patch in tuxonice, although that doesn't seem to affect the sequence of boot messages.

Problem Description: During boot, messages
.
...
nvidiafb: CRTC1 analog not found
i2c-adapter i2c-0: unable to read EDID block
i2c-adapter i2c-0: unable to read EDID block
i2c-adapter i2c-0: unable to read EDID block
i2c-adapter i2c-1: unable to read EDID block
i2c-adapter i2c-1: unable to read EDID block
i2c-adapter i2c-1: unable to read EDID block
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
.
and then it hangs, with no response to keyboard actions.

Adding nohz=off to the boot line allows boot to continue, leading to
.
IP route cache hash table entries 32768 (order: 5, 131072 bytes)
.
after the point where it had frozen.

Steps to reproduce: Just boot.
(Behavior was the same in 2.6.22.9 at that time, although not often previously.)
(Is there any point in using acpi_irq_balance any more?)

However, I have been unable to reproduce this failure today despite unchanged hardware and unchanged kernel images. (I.e., the failure is intermittent, on a fairly long time scale.) When I boot from power-off today, the order of messages produced is very different. In particular, the "i2c-adapter" EDID messages come long after the switch to high resolution mode on each CPU. I don't know why switching to high resolution mode is delayed so long in the failed boot.
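The ordering difference described above (EDID failures logged before the switch to high resolution mode in a failed boot, and after it in a good boot) can be checked mechanically on a saved boot log. The following is only a minimal sketch: the helper names `first_index` and `edid_before_highres` are hypothetical, and the sample lines are copied from the transcript in this report, not from a complete dmesg.

```python
def first_index(lines, needle):
    """Return the index of the first line containing needle, or None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


def edid_before_highres(lines):
    """Report whether an EDID failure is logged before the first
    'Switched to high resolution mode' message.

    Returns True or False, or None if either message is missing.
    """
    edid = first_index(lines, "unable to read EDID block")
    hres = first_index(lines, "Switched to high resolution mode")
    if edid is None or hres is None:
        return None
    return edid < hres


# Lines taken from the failed-boot transcript in this report.
failed_boot = [
    "nvidiafb: CRTC1 analog not found",
    "i2c-adapter i2c-0: unable to read EDID block",
    "Switched to high resolution mode on CPU 1",
    "Switched to high resolution mode on CPU 0",
]

print(edid_before_highres(failed_boot))
```

Running the same check on the dmesg of a successful boot would show whether the ordering really differs between the two cases.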
Can you please provide dmesg output of a successful boot along with the output of "lspci -vvv" and your .config file. Thanks, tglx
> Can you please provide dmesg output of a successful boot along with
> the output of "lspci -vvv" and your .config file.

a one-stop-shop would be to run this script:

http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

and to attach the resulting file.

[ Note, if you don't have this set in your .config:

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y

then /proc/config.gz won't be included in the output file and you'll need to attach it separately to this bugzilla. ]
> [ Note, if you don't have this set in your .config:
>
> CONFIG_IKCONFIG=y
> CONFIG_IKCONFIG_PROC=y
>
> then /proc/config.gz won't be included in the output file and you'll
> need to attach it extra to this bugzilla. ]

ok, i've updated cfs-debug-info.sh so that it retrieves the currently running kernel package's config automatically. So in theory it should be plug & play.
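Whether a given .config has the two options from the note can be checked before building. This is a minimal sketch, assuming the usual `NAME=value` layout of a kernel .config; the function name `missing_options` and the sample config text are hypothetical and are not part of cfs-debug-info.sh.

```python
REQUIRED = ("CONFIG_IKCONFIG", "CONFIG_IKCONFIG_PROC")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as '# CONFIG_FOO is not set' and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Hypothetical fragment of a .config, for illustration only.
sample = """\
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
"""

print(missing_options(sample))
```

An empty result means /proc/config.gz should be available on a kernel built from that config; otherwise the .config has to be attached by hand.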
Nathan, any news on this one ? Is the problem still present with current mainline ? Thanks, tglx
I'm sorry for going hors de combat on this; [cue usual excuses]. I didn't send output immediately because it seemed important to verify that it occurred on a vanilla (no TUXONICE) kernel. It did, finally, but [cue usual excuses]. Anyway, the hang has not recurred since I switched to 2.6.24.3. If you are still interested in details, I can try to produce them. I imagine there are still users of 2.6.23 who might benefit from a fix.

I should mention, too, that the transcription from my notes in the original report had an error: there were no lines "i2c-adapter i2c-1: unable to read EDID block" (i.e., only "i2c-0"). Also, the boot sequence differed considerably between failed and successful boots: the "switch to high resolution mode" attempt really did occur much, much later in the failed boots, as suggested by the transcript above with "nvidiafb: CRTC1 analog not found".

I'll attach various files next.
Created attachment 15403
lspci -vvv output

This lspci output is current, as reported from a 2.6.24.3 kernel, but with no hardware changes since the failures.
Created attachment 15404
successful boot using a kernel that sometimes hangs as described
Created attachment 15405
The .config for kernel 2.6.23.11 that sometimes failed
I'm really no expert with this, and I'm not sure whether commenting here will help, but I would have checked whether highres=off vs. highres=on, or booting with nosmp or maxcpus=1, makes any difference. I also would have tried a kernel without those i2c features built in (or without loading the corresponding modules on boot).
I'm experiencing the same problem on my Dell Inspiron 530n. Passing highres=off or maxcpus=1 to the kernel solves the problem. Using maxcpus=1 is a pity on a dual core, but I can live with highres=off.

Please also note that (without kernel options) the boot continues after about 10 minutes of freeze (no HDD or screen activity, keyboard dead with the Num Lock LED on).

This problem has been submitted to Red Hat (https://bugzilla.redhat.com/show_bug.cgi?id=241249), where you can see a photo of a frozen boot. I've also submitted this problem to Debian (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=481125).
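Since the hang is intermittent, it may help to record which of the options mentioned in this thread (nohz=, highres=, maxcpus=, nosmp) were active for each boot. The following is a minimal sketch: the function name `workaround_options` is hypothetical, and the example command line (including its root= value) is made up for illustration. On a running system the string could be read from /proc/cmdline.

```python
# Boot options discussed in this thread.
WORKAROUNDS = ("nohz", "highres", "maxcpus", "nosmp")


def workaround_options(cmdline):
    """Return a dict of the thread's boot options found in cmdline.

    Options given as name=value map to the value string;
    bare flags such as 'nosmp' map to None.
    """
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name in WORKAROUNDS:
            found[name] = value if sep else None
    return found


# Hypothetical command line, for illustration only.
example = "root=/dev/sda1 ro highres=off maxcpus=1"

print(workaround_options(example))
```

Logging this result next to a note on whether the boot hung would make it easier to judge whether an option is a real workaround or just coincided with a good boot.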
Created attachment 16870
cfs-debug-info.sh results
Update : I have lately had many boot freezes with the option "highres=off", so this is not a valid workaround.
Update : I can't reproduce this bug anymore (I'm using 2.6.26-1-amd64), so it's fixed for me.