Bug 17261

Summary: Freezes on bootup
Product: Other Reporter: Dan Dart (dandart)
Component: Other Assignee: other_other
Severity: normal CC: akpm, alan, bjorn.helgaas, florian, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35.x Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16055    
Attachments: Output of sudo lshw
.config for
dmesg output (plugging trick)
dmesg output (pci=nocrs)
dmesg output (boots faster for some reason)
dmesg output (23/07, shows problem still occurring)
dmesg output (no usb devices)

Description Dan Dart 2010-08-29 09:00:06 UTC
Created attachment 28281 [details]
Output of sudo lshw

Kernel freezes on bootup with message:

init: bridge-network-interface (lo) pre-start process (335) terminated with status 2

at which point it hangs. I used to be able to get past this screen by alerting the computer (usually Alt+SysRq alone would do the trick), but that is harder now that the PS2 keyboard is broken. After that message there are similar ones for eth0 and wlan1, but they don't freeze the bootup and have been harmless.

When both the USB and the (broken) PS2 keyboard were connected this morning, I could not get past the freeze, nor could I get Grub2 to notice any USB keyboard. Using only the USB keyboard didn't work well either. With just the PS2 keyboard connected, it managed to get past the freeze, but I had to connect the USB one afterward anyway to get Linux to recognise that I had a keyboard attached.

Note at that early stage the Magic SysRq keys did not work.

I first noticed this problem in 2.6.35.
Comment 1 Dan Dart 2010-08-29 09:02:05 UTC
Created attachment 28291 [details]
.config for
Comment 2 Dan Dart 2010-08-29 09:57:29 UTC
This could be due to the number of USB devices attached and the kernel not correctly detecting the number of ports I have on my mainboard (it is fairly new). After about 5 minutes with both the USB and PS2 keyboards plugged in (it seems to be a toss-up whether they work), lots of messages appeared about udevd worker threads failing. I removed my phone from the USB port to take a photo, and the system booted before I could.
Comment 3 Dan Dart 2010-09-03 10:28:29 UTC
Yes, this is more likely to be related to USB devices, because I can make the system boot by plugging and unplugging a device a couple of times. Also, when the system is in use, and too many devices are connected, one device (such as the wi-fi card) would be kicked off. Is this the correct behaviour?
Comment 4 Bjorn Helgaas 2010-09-14 15:59:10 UTC
If you can get the system to boot, please attach a dmesg log.  If you
can't get it to boot, please boot with  "ignore_loglevel vga=0x0f07"
and attach a picture of the console when it's hung.  You might also
try "pci=nocrs".  If that helps, please attach the dmesg log.
Comment 5 Dan Dart 2010-09-14 16:18:19 UTC
Yes, it boots if I try the plugging trick or if I wait a long time. Some other odd messages sometimes pop up after a few minutes; I'll post a dmesg of those when that happens again. Here's my "plugging" boot dmesg:
Comment 6 Dan Dart 2010-09-14 16:19:15 UTC
Created attachment 30022 [details]
dmesg output (plugging trick)
Comment 7 Dan Dart 2010-09-14 16:30:23 UTC
Adding pci=nocrs seems to let me use the keyboard when it wouldn't usually be able to, after the "ureadahead" message appears. Attaching new dmesg.
Comment 8 Dan Dart 2010-09-14 16:31:15 UTC
Created attachment 30042 [details]
dmesg output (pci=nocrs)
Comment 9 Bjorn Helgaas 2010-09-14 18:12:02 UTC
Dan, can you double-check that "pci=nocrs" really makes a difference?
I don't see any real difference in the dmesg logs you posted, other than
the resources available on bus 04, which doesn't have any devices on it.
Comment 10 Dan Dart 2010-09-20 12:25:26 UTC
It seems to. But recently it's been starting up much faster - perhaps a particular USB device is bogging it down? I didn't have to mess around with USB devices this time (uploading log).
Comment 11 Dan Dart 2010-09-20 12:26:13 UTC
Created attachment 30692 [details]
dmesg output (boots faster for some reason)
Comment 12 Andrew Morton 2010-09-23 22:42:36 UTC
(In reply to comment #0)

> I first noticed this problem in 2.6.35.

So would you say this is a regression?  If so, since which kernel version?
Comment 13 Dan Dart 2010-09-23 22:45:47 UTC
Yes, this hasn't happened before 2.6.35 I think. I don't recall seeing it in 2.6.34 - and I don't think in Ubuntu's 2.6.32 kernel.
Comment 14 Bjorn Helgaas 2010-09-23 23:20:27 UTC
I'm confused.  Is this still a problem?  In comments 10 and 11 (with
no special boot options) it sounds like things are working normally --
you didn't have to mess around with USB devices.

Maybe the problem would be clearer if you could take a video of the
boot with "ignore_loglevel".
Comment 15 Dan Dart 2010-09-23 23:32:59 UTC
It still happens - I had to do it this evening. (uploading log)
Comment 16 Dan Dart 2010-09-23 23:34:02 UTC
Created attachment 31142 [details]
dmesg output (23/07, shows problem still occurring)
Comment 17 Bjorn Helgaas 2010-09-23 23:45:25 UTC
So the problem seems intermittent.  From attachment 31142 [details]:

[    5.693079] hub 6-0:1.0: Cannot enable port 3.  Maybe the USB cable is bad?
[    6.484122] udev: starting version 151
[  329.607825] cfg80211: Calling CRDA to update world regulatory domain
[  329.668289] ACPI: resource piix4_smbus [io  0x0b00-0x0b07] conflicts with ACPI region SOR1 [io  0x0b00-0x0b0f pref disabled]

I notice the USB cable question ... is there any possibility there is a
problem with an unreliable cable or hub?  Does it make any difference if
you boot with no USB devices attached at all?
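The roughly five-minute silence between the udev line and the cfg80211 line in the excerpt above is easy to miss when reading a long log by eye. A minimal sketch for spotting such stalls, assuming the usual "[seconds.microseconds] message" dmesg prefix; the helper names `parse_dmesg` and `largest_gaps` and the 10-second threshold are illustrative choices, not anything from this report:

```python
import re

# Matches the "[  329.607825] message" prefix that the kernel puts on each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_dmesg(lines):
    """Return (seconds, message) pairs for every line carrying a timestamp."""
    entries = []
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def largest_gaps(entries, threshold=10.0):
    """Return (gap_seconds, message_before, message_after), longest gap first.

    The threshold is an arbitrary cut-off chosen for this sketch.
    """
    gaps = []
    for (t0, m0), (t1, m1) in zip(entries, entries[1:]):
        if t1 - t0 >= threshold:
            gaps.append((t1 - t0, m0, m1))
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)


# The four lines quoted from attachment 31142 in the comment above.
SAMPLE = """\
[    5.693079] hub 6-0:1.0: Cannot enable port 3.  Maybe the USB cable is bad?
[    6.484122] udev: starting version 151
[  329.607825] cfg80211: Calling CRDA to update world regulatory domain
[  329.668289] ACPI: resource piix4_smbus [io  0x0b00-0x0b07] conflicts with ACPI region SOR1 [io  0x0b00-0x0b0f pref disabled]
""".splitlines()

for gap, before, after in largest_gaps(parse_dmesg(SAMPLE)):
    print(f"{gap:.1f}s of silence between {before!r} and {after!r}")
```

On a full log, the same functions can be fed the lines of a saved dmesg file. Since dmesg only records kernel messages, a long gap right after "udev: starting" points at user space rather than at a specific driver.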
Comment 18 Dan Dart 2010-09-24 00:03:30 UTC
Created attachment 31152 [details]
dmesg output (no usb devices)

Uploaded dmesg.nousb.log, the dmesg with no USB devices attached. I let it run and it still froze, and I had to plug the keyboard in to get it to respond. (I suppose plugging in a device means "wake up" somewhere.)

By the way, I'm now on - and .3 and .4 haven't helped, if that's useful to anyone.

Will now try the video suggestion.
Comment 19 Dan Dart 2010-09-24 00:24:34 UTC
Where would be the best place to put it? I could youtube->tinyogg and link?
Comment 20 Dan Dart 2010-09-24 01:31:44 UTC
Hope this helps. http://www.tinyogg.com/watch/k88KB/
Comment 21 Bjorn Helgaas 2010-09-24 16:16:07 UTC
Ok, let me back up a bit.  Please correct any misconceptions below:

1) The problem is a hang after we've started running user-mode init
   scripts (udev, etc).
2) It never happens with 2.6.34.
3) It sometimes happens with 2.6.35, but not always.
4) When the system is hung, plugging in a USB device gets it going again.

Let's see if we can figure out what is hanging.  In /etc/udev/udev.conf,
set udev_log to "debug".  The user-mode output doesn't go to dmesg, so
you'll have to dig around in /var/log or capture it with video.

Both 2.6.34 and 2.6.35 have "pci=use_crs" turned on by default. If the
problem never happens with 2.6.34, I'm even more confused about how
"pci=nocrs" can make a difference in 2.6.35.  2.6.34 and 2.6.35 should
be basically the same in that area.  Maybe it'd be worth collecting a
dmesg log from 2.6.34 and comparing it with the 2.6.35 one.

If the problem is reproducible enough, I suppose bisection between
2.6.34 and 2.6.35 would be one option.
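For the dmesg comparison suggested above, a plain diff is noisy because every timestamp differs between boots. A minimal sketch that strips the timestamps first and then diffs the messages, using Python's standard difflib; the function names and the default file labels are hypothetical, and the sample messages in the usage lines are placeholders, not real kernel output:

```python
import difflib
import re

# Leading "[    6.484122] " prefix; removed so only the message text is compared.
TIMESTAMP_PREFIX = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamps(lines):
    """Drop the timestamp prefix and trailing newline from each dmesg line."""
    return [TIMESTAMP_PREFIX.sub("", line.rstrip("\n")) for line in lines]


def diff_dmesg(old_lines, new_lines,
               old_name="dmesg-2.6.34.log", new_name="dmesg-2.6.35.log"):
    """Return unified-diff lines for two dmesg logs, ignoring timestamps."""
    return list(difflib.unified_diff(
        strip_timestamps(old_lines),
        strip_timestamps(new_lines),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    ))


# Placeholder input: same first message at different times, different second one.
old = ["[    1.000000] placeholder message A", "[    2.000000] placeholder message B"]
new = ["[    1.250000] placeholder message A", "[    2.750000] placeholder message C"]
for line in diff_dmesg(old, new):
    print(line)
```

With real logs, the two lists would come from reading the saved dmesg files for each kernel. Messages that legitimately reorder between boots will still show up as differences, so the output is a starting point for reading, not a verdict.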
Comment 22 Dan Dart 2010-10-09 10:18:04 UTC
Not sure about 2) - I'll have to test it - but I don't recall it. (2nd to last paragraph.)
3) - I think it depends on the power the system is using.
4) - Well, after a few goes, anyway.

Going to boot now with pci=nocrs ignore_loglevel and film it.
Comment 23 Dan Dart 2010-10-09 10:29:03 UTC
Strange - without pci=nocrs, making the change to the udev.conf file just made it boot sensibly! No idea what's up with that.
Comment 24 Florian Mickler 2011-01-12 08:21:51 UTC
Dan, do you still experience problems with current kernels? 
Can you still reproduce the hang or did you find out what made udev hang at bootup?
Comment 25 Dan Dart 2011-01-26 18:47:33 UTC
Still no idea - but it's fixed in 2.6.37 at least. Possibly 2.6.36 too. I'm still puzzled how making udev more verbose actually helped.