Bug 6011 - Booting hangs up after "io scheduler cfq registered"
Status: RESOLVED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: USB
i386 Linux
Importance: P2 high
Assigned To: Greg Kroah-Hartman
Duplicates: 5932 5935 6118
Depends on:
Blocks: USB
Reported: 2006-02-04 18:10 UTC by Michal Ludvig
Modified: 2006-02-22 09:51 UTC (History)
7 users

See Also:
Kernel Version: 2.6.16-rc2
Tree: Mainline
Regression: ---


Attachments
config of my 2.6.16-rc2 build (36.53 KB, text/plain)
2006-02-04 18:11 UTC, Michal Ludvig
dmesg (13.29 KB, text/plain)
2006-02-07 12:31 UTC, Michal Ludvig
disable the SMI-setting; better diagnostics (2.01 KB, patch)
2006-02-09 09:59 UTC, David Brownell

Description Michal Ludvig 2006-02-04 18:10:07 UTC
Most recent kernel where this bug did not occur: 2.6.14.5, haven't tested
anything newer yet.
Hardware Environment: VIA VT-310DP SMP board with 2x VIA C3, 512MB RAM
Software Environment: OpenSUSE 10.0, gcc version 4.0.2 20050901 (prerelease)
(SUSE Linux)
Problem Description:
Booting hangs after printing "io scheduler cfq registered". In 2.6.14.5 it
continues with:
io scheduler cfq registered
loop: loaded (max 8 devices)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:0f.0
ACPI: PCI Interrupt Link [ALKA] disabled and referenced, BIOS bug.
ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 20
ACPI: PCI Interrupt 0000:00:0f.0[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 17
PCI: Via IRQ fixup for 0000:00:0f.0, from 255 to 1
VP_IDE: chipset revision 6
[...]
Comment 1 Michal Ludvig 2006-02-04 18:11:30 UTC
Created attachment 7244 [details]
config of my 2.6.16-rc2 build
Comment 2 Andrew Morton 2006-02-04 18:31:02 UTC
Please add `initcall_debug' to the kernel boot commandline and let's see
how far it got.

Make sure that CONFIG_KALLSYMS is set.
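Background on the CONFIG_KALLSYMS request: with the kernel symbol table built in, initcall_debug lines and oops traces show addresses as name+offset/size (for example "pci_init+0x0/0x30" in Comment 3); without it only raw addresses appear. The user-space sketch below models that lookup. The table is hypothetical, with start addresses and sizes back-computed from the Comment 3 trace, and the lookup is a plain linear scan rather than the kernel's compressed kallsyms implementation.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>   /* strcmp() for callers that check the formatted result */

/* One entry of a (hypothetical) kernel symbol table. */
struct ksym {
    unsigned long start;  /* first address of the function */
    unsigned long size;   /* length of the function in bytes */
    const char *name;
};

/* Start/size values back-computed from the Comment 3 trace. */
static const struct ksym ksyms[] = {
    { 0xc0100320UL, 0x360, "init" },
    { 0xc0110d40UL, 0x070, "do_flush_tlb_all" },
    { 0xc01110a0UL, 0x030, "flush_tlb_all" },
    { 0xc0117470UL, 0x140, "iounmap" },
    { 0xc024fd30UL, 0x030, "pci_init" },
};

/*
 * Format addr the way the traces in this report show it: "name+0xoff/0xsize".
 * Falls back to a bare hex address when no entry covers addr, which is
 * roughly all a kernel without symbol information can print.
 */
static int format_symbol(unsigned long addr, char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof ksyms / sizeof ksyms[0]; i++) {
        const struct ksym *s = &ksyms[i];
        if (addr >= s->start && addr - s->start < s->size)
            return snprintf(buf, len, "%s+0x%lx/0x%lx",
                            s->name, addr - s->start, s->size);
    }
    return snprintf(buf, len, "0x%08lx", addr);
}
```

For example, 0xc024fd41 resolves to pci_init+0x11/0x30, matching the corresponding trace line. Note the explicit "0x%lx" instead of "%#lx": the latter would print a zero offset as "0" rather than "0x0".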

Comment 3 Michal Ludvig 2006-02-04 20:25:57 UTC
With debugging enabled I get this on the serial console:
[...]
io scheduler cfq registered
Calling initcall 0xc024fd30: pci_init+0x0/0x30()
BUG: soft lockup detected on CPU#1!

Pid: 1, comm:              swapper
EIP: 0060:[<c0110fa0>] CPU: 1
EIP is at smp_call_function+0xb0/0x180
 EFLAGS: 00000297    Not tainted  (2.6.16-rc2)
EAX: 00000000 EBX: 00000001 ECX: 000c0000 EDX: 000000fb
ESI: c0110d40 EDI: 00000000 EBP: 00000001 DS: 007b ES: 007b
CR0: 8005003b CR2: 00000000 CR3: 004a2000 CR4: 00000690
 [<c0110d40>] do_flush_tlb_all+0x0/0x70
 [<c030a4e6>] pci_conf1_write+0x96/0xd0
 [<c01110bb>] flush_tlb_all+0x1b/0x30
 [<c0152125>] __remove_vm_area+0x25/0x60
 [<c0152174>] remove_vm_area+0x14/0x30
 [<c01174ed>] iounmap+0x7d/0x140
 [<c02e7a5a>] quirk_usb_early_handoff+0x2fa/0x420
 [<c024731f>] kobject_get+0xf/0x20
 [<c02a4c90>] get_device+0x10/0x20
 [<c0250a37>] pci_fixup_device+0x47/0xa0
 [<c024fd41>] pci_init+0x11/0x30
 [<c010044a>] init+0x12a/0x360
 [<c024fd30>] pci_init+0x0/0x30
 [<c0100320>] init+0x0/0x360
 [<c0100ed5>] kernel_thread_helper+0x5/0x10

What other tests or information should I provide?
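The trace shows CPU#1 inside smp_call_function(), reached via iounmap() → flush_tlb_all(). As we understand the i386 code of this era, smp_call_function() sends an IPI and then busy-waits until every other online CPU has acknowledged it, so a CPU that never services the IPI leaves the caller spinning, which the soft-lockup detector then reports. The trace alone does not say why the other CPU failed to respond. A minimal user-space model of that wait, with a hypothetical max_spins cap added so it terminates:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * User-space model of the acknowledgement wait in smp_call_function():
 * after sending the IPI, the caller spins until every other CPU has
 * bumped the shared "started" counter.
 * max_spins is a hypothetical cap added so the model terminates; the
 * loop being modelled has no such cap, hence a hang rather than an error.
 */
static bool wait_for_started(atomic_int *started, int other_cpus, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (atomic_load(started) == other_cpus)
            return true;   /* every other CPU entered the IPI handler */
        /* the kernel loop would execute cpu_relax() here */
    }
    return false;          /* some CPU never answered: a soft lockup in the kernel */
}

/* What a responsive remote CPU does on receiving the IPI (modelled as a call). */
static void remote_cpu_ack(atomic_int *started)
{
    atomic_fetch_add(started, 1);
}
```

The cap is the only intended departure from the behaviour being modelled.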
Comment 4 Andrew Morton 2006-02-04 21:23:21 UTC
Alan, Greg: this looks like another hang in the USB PCI quirk
handling.   Where did we end up with that?
Comment 5 Greg Kroah-Hartman 2006-02-07 08:41:23 UTC
What kind of hardware does this machine have?  OHCI, UHCI, and/or EHCI USB host
controllers?  (you can see this by running 'lspci').
Comment 6 Michal Ludvig 2006-02-07 12:30:19 UTC
lspci says UHCI but in the BIOS I could enable EHCI mode. Attaching dmesg as well.

~# lspci
00:00.0 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. CN400/PM880 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:08.0 Network controller: Intersil Corporation Prism 2.5 Wavelan chipset (rev 01)
00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 10)
00:0a.0 Ethernet controller: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter (rev 11)
00:0f.0 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
01:00.0 VGA compatible controller: VIA Technologies, Inc. S3 Unichrome Pro VGA Adapter (rev 02)

Comment 7 Michal Ludvig 2006-02-07 12:31:27 UTC
Created attachment 7267 [details]
dmesg
Comment 8 Alan Stern 2006-02-07 13:03:55 UTC
Try turning on CONFIG_USB_DEBUG and booting 2.6.16-rc2 with "debug" as well as
"initcall_debug" on the kernel command line.
Comment 9 Michal Ludvig 2006-02-07 19:43:30 UTC
With USB_DEBUG I got:

Calling initcall 0xc024e210: pci_init+0x0/0x30()
 0000:00:10.0: uhci_check_and_reset_hc: legsup = 0x0010
 0000:00:10.0: Performing full reset
 0000:00:10.1: uhci_check_and_reset_hc: legsup = 0x0010
 0000:00:10.1: Performing full reset
 0000:00:10.2: uhci_check_and_reset_hc: legsup = 0x0010
 0000:00:10.2: Performing full reset
 0000:00:10.3: uhci_check_and_reset_hc: legsup = 0x0010
 0000:00:10.3: Performing full reset
0000:00:10.4 EHCI: BIOS handoff

... and then the BUG as in Comment #3
Comment 10 Alan Stern 2006-02-08 09:24:39 UTC
Try following the advice in 

    http://marc.theaimsgroup.com/?l=linux-usb-devel&m=113938405418889&w=2

and the preceding messages in that thread.  Your problem might be exactly the
same as the one Carlo Prelz has.
Comment 11 Michal Ludvig 2006-02-08 21:49:28 UTC
After commenting out the block in
drivers/usb/host/pci-quirks.c:quirk_usb_disable_ehci():

#if 0
if ((cap & EHCI_USBLEGSUP_BIOS)) {
        [...]
        pci_read_config_dword(pdev,
                        offset + EHCI_USBLEGCTLSTS,
                        &val);
        pci_write_config_dword(pdev,
                        offset + EHCI_USBLEGCTLSTS,
                        val | EHCI_USBLEGCTLSTS_SOOE);
}
#endif

it boots fine and USB works with no apparent problems (having keyboard, mouse
and usbstorage plugged in).
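For reference, the disabled block reads the EHCI USBLEGCTLSTS register (the dword 4 bytes after USBLEGSUP in the legacy-support extended capability) and writes it back with SOOE ("SMI on OS Ownership Enable") set, i.e. it asks the BIOS to take an SMI whenever OS ownership changes. The sketch below decodes the relevant bits. The bit positions follow the EHCI 1.0 specification and the macro names mirror the quoted code, but the helper functions are hypothetical illustrations, not a copy of pci-quirks.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the EHCI 1.0 spec; names mirror the quoted pci-quirks.c block. */
#define EHCI_USBLEGSUP_BIOS     (1u << 16)  /* HC BIOS Owned Semaphore */
#define EHCI_USBLEGSUP_OS       (1u << 24)  /* HC OS Owned Semaphore */
#define EHCI_USBLEGCTLSTS       4           /* USBLEGCTLSTS offset from USBLEGSUP */
#define EHCI_USBLEGCTLSTS_SOOE  (1u << 13)  /* SMI on OS Ownership Enable */

/* Capability ID lives in bits 7:0 of USBLEGSUP; 1 means "legacy support". */
static unsigned ehci_cap_id(uint32_t legsup) { return legsup & 0xffu; }
static bool ehci_bios_owned(uint32_t legsup) { return (legsup & EHCI_USBLEGSUP_BIOS) != 0; }
static bool ehci_os_owned(uint32_t legsup)   { return (legsup & EHCI_USBLEGSUP_OS) != 0; }

/* Config-space offset of USBLEGCTLSTS, as in "offset + EHCI_USBLEGCTLSTS". */
static int legctlsts_offset(int legsup_offset) { return legsup_offset + EHCI_USBLEGCTLSTS; }

/* The value the disabled block wrote back: old contents with SOOE forced on. */
static uint32_t ehci_force_sooe(uint32_t legctlsts)
{
    return legctlsts | EHCI_USBLEGCTLSTS_SOOE;
}
```

A USBLEGSUP value of 0x00010001, for instance, decodes as a legacy-support capability that the BIOS still owns.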

BTW I observe this message after pci_init() is called:
0000:00:10.4 EHCI: BIOS handoff failed (BIOS bug ?)

Exactly as in the referenced thread on linux-kernel.
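The "BIOS handoff failed" message comes from the EHCI ownership handshake. Assuming the sequence described in the EHCI specification (the OS sets HC OS Owned, then waits for the BIOS to clear HC BIOS Owned, and gives up after a timeout), a minimal model with a simulated controller looks like this. The read/write callbacks, the fake_ehci type and the timeout/step parameters are hypothetical stand-ins, not interfaces or values taken from pci-quirks.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define USBLEGSUP_BIOS (1u << 16)   /* HC BIOS Owned Semaphore */
#define USBLEGSUP_OS   (1u << 24)   /* HC OS Owned Semaphore */

/* Hypothetical config-space accessors so the model runs without hardware. */
typedef uint32_t (*legsup_read_fn)(void *ctx);
typedef void (*legsup_write_fn)(void *ctx, uint32_t val);

/*
 * Request ownership, then poll until the BIOS drops its semaphore or the
 * timeout budget is spent. Returns true on a clean handoff; *waited_ms
 * reports how much of the budget was consumed.
 */
static bool ehci_bios_handoff(void *ctx, legsup_read_fn rd, legsup_write_fn wr,
                              int timeout_ms, int step_ms, int *waited_ms)
{
    wr(ctx, rd(ctx) | USBLEGSUP_OS);        /* set HC OS Owned */
    *waited_ms = 0;
    uint32_t cap = rd(ctx);
    while ((cap & USBLEGSUP_BIOS) && *waited_ms < timeout_ms) {
        /* a real driver would sleep step_ms here */
        *waited_ms += step_ms;
        cap = rd(ctx);
    }
    return (cap & USBLEGSUP_BIOS) == 0;     /* false: "BIOS handoff failed" */
}

/* Simulated controller: the fake BIOS releases after a number of polls. */
struct fake_ehci {
    uint32_t legsup;
    int release_after;  /* polls left before release; -1 = never release */
};

static uint32_t fake_read(void *ctx)
{
    struct fake_ehci *hc = ctx;
    if ((hc->legsup & USBLEGSUP_OS) && (hc->legsup & USBLEGSUP_BIOS)
        && hc->release_after >= 0) {
        if (hc->release_after == 0)
            hc->legsup &= ~USBLEGSUP_BIOS;  /* BIOS lets go */
        else
            hc->release_after--;
    }
    return hc->legsup;
}

static void fake_write(void *ctx, uint32_t val)
{
    ((struct fake_ehci *)ctx)->legsup = val;
}
```

With a BIOS that never lets go, the model spends its entire budget before reporting failure: with a 5000 ms timeout polled in 10 ms steps, that is 500 polls.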
Comment 12 David Brownell 2006-02-09 09:59:20 UTC
Created attachment 7284 [details]
disable the SMI-setting; better diagnostics

Looks like this patch should be merged then;
I was never happy with that "force SMI on" bit.
Comment 13 Greg Kroah-Hartman 2006-02-09 13:21:36 UTC
Looks good to me, want to forward it to me in email?
Comment 14 Greg Kroah-Hartman 2006-02-09 13:48:01 UTC
*** Bug 5935 has been marked as a duplicate of this bug. ***
Comment 15 Greg Kroah-Hartman 2006-02-09 13:48:15 UTC
*** Bug 5932 has been marked as a duplicate of this bug. ***
Comment 16 Alan Stern 2006-02-09 13:59:50 UTC
Bug 5932 should not be rejected as a duplicate of this one.  Although they are
both related to the USB handoff in the pci-quirks file, they have different
causes.  This bug is related to the EHCI handoff, whereas bug 5932 is related to
the OHCI handoff.
Comment 17 Daniel Drake 2006-02-11 08:18:30 UTC
A very similar bug has been reported at the Gentoo bugzilla:

http://bugs.gentoo.org/show_bug.cgi?id=122277

2.6.14 works; 2.6.15 and 2.6.16-rc2 hang after "io scheduler cfq registered".
This only happens on *warm* boots (i.e. does not happen when you cold-boot the
machine, only on reboots). Additionally, it doesn't happen on *every* warm boot,
but does happen most of the time.

We have been diagnosing this on IRC and have found that the patch posted in
comment #12 does not solve the issue.

We followed the debugging process described here. Here is a screenshot of the
crashed system with "initcall_debug debug" and CONFIG_USB_DEBUG:

http://www.evolutions.za.net/debug/crash.JPG

The problem seems to occur during UHCI initialization. As this bug seems to be
about EHCI, should we file a new bug report?
Comment 18 Daniel Drake 2006-02-11 08:35:03 UTC
Oops, that image URL won't be permanent. Here's the relevant info from the
screenshot:

Calling initcall cfq_init
io scheduler cfq registered
Calling initcall pci_init
 0000:00:1d.0: uhci_check_and_reset_hc: legsup = 0x0030
 0000:00:1d.0: Performing full reset
 0000:00:1d.1: uhci_check_and_reset_hc: legsup = 0x0030
 0000:00:1d.1: Performing full reset
 0000:00:1d.2: uhci_check_and_reset_hc: legsup = 0x0030
 0000:00:1d.2: Performing full reset
 0000:00:1d.3: uhci_check_and_reset_hc: legsup = 0x0030
 0000:00:1d.3: Performing full reset
Comment 19 Alan Stern 2006-02-11 09:16:08 UTC
Dan: Better to start a new bug report in any case.

Are you sure your problem is associated with UHCI?  It looks like the code
succeeded at least twice before the crash.  Maybe the crash was caused by
something that happened after the third reset completed.  It would be a good
idea to sprinkle some printk statements at the beginning and end of each of these
quirk routines (UHCI and EHCI both) so you can tell for certain when they start
up and when they complete.
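The suggestion above amounts to bracketing each quirk with an entry and an exit message, so that a "start" line with no matching "done" names the routine that hung. A user-space sketch follows; in the kernel the messages would be printk() calls, and run_quirk_traced(), quirk_stub() and the unsigned devfn argument are hypothetical (real PCI quirks take a struct pci_dev *).

```c
#include <stdio.h>

/* Hypothetical quirk signature; real PCI quirks take a struct pci_dev *. */
typedef void (*quirk_fn)(unsigned devfn);

/* Last routine entered and last routine that returned. */
static const char *last_started;
static const char *last_finished;

/*
 * Bracket a quirk with entry/exit messages (printk() in the kernel,
 * stderr here). After a hang, a "start" line with no matching "done"
 * identifies the culprit.
 */
static void run_quirk_traced(const char *name, quirk_fn fn, unsigned devfn)
{
    last_started = name;
    fprintf(stderr, "%s: start (devfn 0x%02x)\n", name, devfn);
    fn(devfn);
    fprintf(stderr, "%s: done\n", name);
    last_finished = name;
}

/* Stand-in for a UHCI or EHCI handoff routine. */
static void quirk_stub(unsigned devfn)
{
    (void)devfn;
}
```

For 00:10.0 in the lspci listing from Comment 6, devfn would be 0x80 (device 0x10 shifted left by 3, function 0).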
Comment 20 Michal Ludvig 2006-02-14 16:42:08 UTC
FWIW the patch in Comment #12 helps in my case. However, now there is a noticeable
delay after "io scheduler cfq registered" (i.e. in the place where I expect the
USB initialization). It's at least 5 seconds; it looks like it's waiting for
something to timeout. But finally it boots and USB keyboard/mouse work fine.
Comment 21 Greg Kroah-Hartman 2006-02-14 16:58:40 UTC
Based on the problems with this BIOS, I think we should be happy it works
properly now.

Am closing this then, as the patch has hit Linus's tree already.
Comment 22 Daniel Drake 2006-02-18 08:04:35 UTC
Opened bug 6098 with the issue described in comment 17. It is EHCI after all.
Comment 23 Greg Kroah-Hartman 2006-02-22 09:51:48 UTC
*** Bug 6118 has been marked as a duplicate of this bug. ***
