Bug 6011
Summary: | Booting hangs up after "io scheduler cfq registered" | ||
---|---|---|---|
Product: | Drivers | Reporter: | Michal Ludvig (michal) |
Component: | USB | Assignee: | Greg Kroah-Hartman (greg) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | akpm, bugs, bugzilla, greg, kernel, stern, totya |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.16-rc2 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 5089 | ||
Attachments: |
config of my 2.6.16-rc2 build
dmesg
disable the SMI-setting; better diagnostics |
Description
Michal Ludvig
2006-02-04 18:10:07 UTC
Created attachment 7244
config of my 2.6.16-rc2 build
Please add "initcall_debug" to the kernel boot command line and let's see how far it gets. Make sure that CONFIG_KALLSYMS is set.

With debugging enabled I get this on the serial console:
[...]
io scheduler cfq registered
Calling initcall 0xc024fd30: pci_init+0x0/0x30()
BUG: soft lockup detected on CPU#1!
Pid: 1, comm: swapper
EIP: 0060:[<c0110fa0>] CPU: 1
EIP is at smp_call_function+0xb0/0x180
EFLAGS: 00000297 Not tainted (2.6.16-rc2)
EAX: 00000000 EBX: 00000001 ECX: 000c0000 EDX: 000000fb
ESI: c0110d40 EDI: 00000000 EBP: 00000001 DS: 007b ES: 007b
CR0: 8005003b CR2: 00000000 CR3: 004a2000 CR4: 00000690
[<c0110d40>] do_flush_tlb_all+0x0/0x70
[<c030a4e6>] pci_conf1_write+0x96/0xd0
[<c01110bb>] flush_tlb_all+0x1b/0x30
[<c0152125>] __remove_vm_area+0x25/0x60
[<c0152174>] remove_vm_area+0x14/0x30
[<c01174ed>] iounmap+0x7d/0x140
[<c02e7a5a>] quirk_usb_early_handoff+0x2fa/0x420
[<c024731f>] kobject_get+0xf/0x20
[<c02a4c90>] get_device+0x10/0x20
[<c0250a37>] pci_fixup_device+0x47/0xa0
[<c024fd41>] pci_init+0x11/0x30
[<c010044a>] init+0x12a/0x360
[<c024fd30>] pci_init+0x0/0x30
[<c0100320>] init+0x0/0x360
[<c0100ed5>] kernel_thread_helper+0x5/0x10
What other tests or information should I provide?

Alan, Greg: this looks like another hang in the USB PCI quirk handling. Where did we end up with that?

What kind of hardware does this machine have? OHCI, UHCI, and/or EHCI USB host controllers? (You can see this by running 'lspci'.)

lspci says UHCI, but in the BIOS I could enable EHCI mode. Attaching dmesg as well.
~# lspci
00:00.0 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:08.0 Network controller: Intersil Corporation Prism 2.5 Wavelan chipset (rev 01)
00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 10)
00:0a.0 Ethernet controller: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter (rev 11)
00:0f.0 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
01:00.0 VGA compatible controller: VIA Technologies, Inc. S3 Unichrome Pro VGA Adapter (rev 02)

Created attachment 7267
dmesg
Try turning on CONFIG_USB_DEBUG and booting 2.6.16-rc2 with "debug" as well as "initcall_debug" on the kernel command line.

With USB_DEBUG I got:
Calling initcall 0xc024e210: pci_init+0x0/0x30()
0000:00:10.0: uhci_check_and_reset_hc: legsup = 0x0010
0000:00:10.0: Performing full reset
0000:00:10.1: uhci_check_and_reset_hc: legsup = 0x0010
0000:00:10.1: Performing full reset
0000:00:10.2: uhci_check_and_reset_hc: legsup = 0x0010
0000:00:10.2: Performing full reset
0000:00:10.3: uhci_check_and_reset_hc: legsup = 0x0010
0000:00:10.3: Performing full reset
0000:00:10.4 EHCI: BIOS handoff
... and then the BUG as in Comment #3.

Try following the advice in http://marc.theaimsgroup.com/?l=linux-usb-devel&m=113938405418889&w=2 and the preceding messages in that thread. Your problem might be exactly the same as the one Carlo Prelz has.

After commenting out this block in drivers/usb/host/pci-quirks.c:quirk_usb_disable_ehci():
#if 0
if ((cap & EHCI_USBLEGSUP_BIOS)) {
    [...]
    pci_read_config_dword(pdev, offset + EHCI_USBLEGCTLSTS, &val);
    pci_write_config_dword(pdev, offset + EHCI_USBLEGCTLSTS, val | EHCI_USBLEGCTLSTS_SOOE);
}
#endif
it boots fine and USB works with no apparent problems (with a keyboard, a mouse and a USB storage device plugged in). BTW, I see this message after pci_init() is called:
0000:00:10.4 EHCI: BIOS handoff failed (BIOS bug ?)
This is exactly as in the referenced linux-kernel thread.

Created attachment 7284
disable the SMI-setting; better diagnostics
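The workaround and the "BIOS handoff failed" message above both concern the EHCI legacy-support handshake. The OS sets the OS-owned semaphore in USBLEGSUP, then polls until the BIOS clears its BIOS-owned semaphore. The old code could also force "SMI on OS Ownership Enable" (SOOE) in USBLEGCTLSTS first, which is the step the attached patch stops doing.

Below is a user-space simulation of that handshake. It is a sketch under stated assumptions, not the kernel's pci-quirks.c code:
- The bit positions follow the EHCI specification.
- `EHCI_USBLEGSUP_OS`, `struct sim_ehci`, `read_legsup()` and `ehci_bios_handoff()` are hypothetical names introduced here.
- The 5000 ms timeout and 10 ms poll step are assumed values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Legacy-support bits as laid out in the EHCI specification. */
#define EHCI_USBLEGSUP_BIOS    (1u << 16)  /* HC BIOS Owned Semaphore */
#define EHCI_USBLEGSUP_OS      (1u << 24)  /* HC OS Owned Semaphore (name assumed) */
#define EHCI_USBLEGCTLSTS_SOOE (1u << 13)  /* SMI on OS Ownership Enable */

#define POLL_STEP_MS       10    /* assumed poll interval */
#define HANDOFF_TIMEOUT_MS 5000  /* assumed timeout, not taken from the thread */

/* Hypothetical simulated controller standing in for PCI config space. */
struct sim_ehci {
    uint32_t usblegsup;
    uint32_t usblegctlsts;
    int bios_release_after;  /* polls before the BIOS lets go; -1 = never */
    int polls;
};

/* Hypothetical stand-in for reading USBLEGSUP. Once the OS semaphore is
 * set, a well-behaved "BIOS" clears its own semaphore after a few polls;
 * a buggy one (bios_release_after == -1) never does. */
static uint32_t read_legsup(struct sim_ehci *hc)
{
    if ((hc->usblegsup & EHCI_USBLEGSUP_OS) && hc->bios_release_after >= 0) {
        hc->polls++;
        if (hc->polls >= hc->bios_release_after)
            hc->usblegsup &= ~EHCI_USBLEGSUP_BIOS;
    }
    return hc->usblegsup;
}

/* Request ownership and poll until the BIOS releases it or time runs out.
 * Returns the simulated wait in ms, or -1 if the handoff failed. */
static int ehci_bios_handoff(struct sim_ehci *hc, int force_sooe)
{
    assert(hc != NULL);
    uint32_t cap = read_legsup(hc);
    int msec = 0;

    if (cap & EHCI_USBLEGSUP_BIOS) {
        if (force_sooe)  /* the "force SMI on" step the patch drops */
            hc->usblegctlsts |= EHCI_USBLEGCTLSTS_SOOE;
        hc->usblegsup |= EHCI_USBLEGSUP_OS;  /* ask the BIOS to hand over */
    }
    while ((cap & EHCI_USBLEGSUP_BIOS) && msec < HANDOFF_TIMEOUT_MS) {
        msec += POLL_STEP_MS;  /* a real driver would sleep here */
        cap = read_legsup(hc);
    }
    if (cap & EHCI_USBLEGSUP_BIOS) {
        fprintf(stderr, "EHCI: BIOS handoff failed (BIOS bug ?) %08x\n",
                (unsigned)cap);
        return -1;
    }
    return msec;
}
```

With `bios_release_after = -1` (a BIOS that never lets go), the loop runs for the whole assumed timeout before it prints the failure message. A timeout of that kind might account for the multi-second pause reported later in this bug, but the thread does not confirm it.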
Looks like this patch should be merged then;
I was never happy with that "force SMI on" bit.
Looks good to me; want to forward it to me in email?

*** Bug 5935 has been marked as a duplicate of this bug. ***

*** Bug 5932 has been marked as a duplicate of this bug. ***

Bug 5932 should not be rejected as a duplicate of this one. Although both are related to the USB handoff in the pci-quirks file, they have different causes: this bug is related to the EHCI handoff, whereas bug 5932 is related to the OHCI handoff.

A very similar bug has been reported at the Gentoo bugzilla: http://bugs.gentoo.org/show_bug.cgi?id=122277
2.6.14 works; 2.6.15 and 2.6.16-rc2 hang after "io scheduler cfq registered". This only happens on *warm* boots (i.e. it does not happen when you cold-boot the machine, only on reboots). Additionally, it doesn't happen on *every* warm boot, but it does happen most of the time. We have been diagnosing this on IRC and have found that the patch posted in comment #12 does not solve the issue. We followed the debug process described here; this is a screenshot of the crashed system with "initcall_debug debug" and CONFIG_USB_DEBUG: http://www.evolutions.za.net/debug/crash.JPG
The problem seems to occur during UHCI initialization. As this bug seems to be about EHCI, should we file a new bug report?

Oops, that image URL won't be permanent. Here's the relevant info from the screenshot:
Calling initcall cfq_init
io scheduler cfq registered
Calling initcall pci_init
0000:00:1d.0: uhci_check_and_reset_hc: legsup = 0x0030
0000:00:1d.0: Performing full reset
0000:00:1d.1: uhci_check_and_reset_hc: legsup = 0x0030
0000:00:1d.1: Performing full reset
0000:00:1d.2: uhci_check_and_reset_hc: legsup = 0x0030
0000:00:1d.2: Performing full reset
0000:00:1d.3: uhci_check_and_reset_hc: legsup = 0x0030
0000:00:1d.3: Performing full reset

Dan: Better to start a new bug report in any case. Are you sure your problem is associated with UHCI? It looks like the code succeeded at least twice before the crash. Maybe the crash was caused by something that happened after the third reset completed. It would be a good idea to sprinkle some printk statements at the beginning and end of each of these quirk routines (UHCI and EHCI both), so you can tell for certain when they start and when they complete.

FWIW, the patch in Comment #12 helps in my case. However, there is now a noticeable delay after "io scheduler cfq registered" (i.e. where I expect the USB initialization to happen). It's at least 5 seconds; it looks like it's waiting for something to time out. But it does boot eventually, and the USB keyboard and mouse work fine.

Based on the problems with this BIOS, I think we should be happy it works properly now. I am closing this, then, as the patch has already hit Linus's tree.

Opened bug 6098 with the issue described in comment 17. It is EHCI after all.

*** Bug 6118 has been marked as a duplicate of this bug. ***
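Two debugging aids appear in this bug: the printk advice for the UHCI and EHCI quirk routines, and initcall_debug, which logs each initcall before calling it. Both rest on one idea. You bracket each routine with a "start" marker and a "done" marker, so that after a hang the last "start" with no matching "done" names the culprit.

Below is a user-space sketch of that idea in C. It makes these assumptions:
- `log_event()`, `TRACED_CALL`, `last_unfinished()` and the `fake_*_quirk()` routines are hypothetical names.
- An in-memory log stands in for the console.
- A kernel version would call printk() instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 32
#define EVENT_LEN  64

/* Hypothetical in-memory stand-in for the serial console. */
static char event_log[MAX_EVENTS][EVENT_LEN];
static int n_events;

/* Record one marker line and echo it, as a console would show it. */
static void log_event(const char *what, const char *fn)
{
    if (n_events >= MAX_EVENTS)
        return;  /* log full: drop the event */
    snprintf(event_log[n_events], EVENT_LEN, "%s %s", what, fn);
    printf("%s\n", event_log[n_events]);
    n_events++;
}

/* Bracket one routine with "start"/"done" markers. */
#define TRACED_CALL(fn, arg)       \
    do {                           \
        log_event("start", #fn);   \
        fn(arg);                   \
        log_event("done", #fn);    \
    } while (0)

/* Routines run one after another, so only the most recent "start" can
 * lack a "done". Returns that routine's name, or NULL if it completed. */
static const char *last_unfinished(void)
{
    for (int i = n_events - 1; i >= 0; i--) {
        if (strncmp(event_log[i], "start ", 6) != 0)
            continue;
        const char *name = event_log[i] + 6;
        int done = 0;
        for (int j = i + 1; j < n_events && !done; j++)
            done = strncmp(event_log[j], "done ", 5) == 0 &&
                   strcmp(event_log[j] + 5, name) == 0;
        return done ? NULL : name;
    }
    return NULL;
}

/* Hypothetical quirk routines standing in for the UHCI/EHCI handlers. */
static void fake_uhci_quirk(int dev) { (void)dev; }
static void fake_ehci_quirk(int dev) { (void)dev; }
```

If both fake quirks run through `TRACED_CALL`, `last_unfinished()` returns NULL. If a routine hangs, its "start" line is the last thing on the console. You can simulate that with a bare `log_event("start", ...)`, and `last_unfinished()` then reports the routine by name.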