Bug 8249
Summary: | NMI watchdog detected LOCKUP when plugging USB devices with acpi=off | ||
---|---|---|---|
Product: | Drivers | Reporter: | Lucas Nussbaum (lucas) |
Component: | USB | Assignee: | Greg Kroah-Hartman (greg) |
Status: | REJECTED UNREPRODUCIBLE | ||
Severity: | high | CC: | protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.21-rc4 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 5089 |
Description
Lucas Nussbaum
2007-03-21 07:18:37 UTC
Reply-To: akpm@linux-foundation.org
On Wed, 21 Mar 2007 07:18:45 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=8249
>
> Summary: NMI watchdog detected LOCKUP when plugging USB devices
> with acpi=off
> Kernel Version: 2.6.21-rc4
> Status: NEW
> Severity: high
> Owner: greg@kroah.com
> Submitter: lucas@lucas-nussbaum.net
>
>
> Most recent kernel where this bug did *NOT* occur: none, as far as I know
> Problem Description:
>
> My kernel is running with acpi=off to workaround another bug. When I turn on my
> USB printer (a Canon PIXMA MP750), I get this on the serial console:
>
> usb 1-1: new full speed USB device using ohci_hcd and address 2
> usb 1-1: configuration #1 chosen from 1 choice
> BUG: NMI Watchdog detected LOCKUP on CPU0, eip 00007db7, registers:
> Modules linked in: nfs nfsd exportfs lockd nfs_acl sunrpc ppdev lp ipv6
> dm_snapshot dm_mirror dm_mod apm eeprom i2c_i801 snd_es1938 snd_opl3_lib
> snd_hwdep snd_mpu401_uart snd_pcm_oss snd_pcm snd_page_alloc snd_mixer_oss
> snd_seq_dummy snd_seq_oss bt878 snd_seq_midi snd_rawmidi snd_seq_midi_event
> tuner bttv snd_seq video_buf firmware_class snd_timer snd_seq_device ir_common
> compat_ioctl32 i2c_algo_bit snd btcx_risc tveeprom videodev v4l2_common
> v4l1_compat soundcore i2c_amd756 i2c_core nvidia_agp agpgart shpchp pci_hotplug
> parport_pc parport evdev analog rtc gameport pcspkr ext3 jbd mbcache ide_cd
> cdrom ide_disk generic amd74xx e100 mii ide_core ohci_hcd usbcore
> CPU: 0
> EIP: 00b8:[<00007db7>] Not tainted VLI
> EFLAGS: 00000006 (2.6.21-rc4 #3)
> EIP is at 0x7db7
> eax: 00000ac2 ebx: 00000001 ecx: 00000000 edx: 0000e42e
> esi: dfe2bf2e edi: 0000530a ebp: 00000000 esp: dfe2be66
> ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0068
> Process hald (pid: 2198, ti=dfe2a000 task=dff94a70 task.ti=dfe2a000)
> Stack: c2fb0001 0060ef9e 00010000 530a0000 007b0000 007b0000 f7200000 5de0deb8
> 0000eefd 00d80000 c0400000 0033c15e 02820000 00010000 00000000 00000000
> bf2a0000 bf2edfe2 bf2cdfe2 0400dfe2 c5cb0000 bee0ef9e bedcdfe2 bed8dfe2
> Call Trace:
> =======================
> Code: Bad EIP value.
>
> I could also reproduce the same crash using two another USB peripherals (a
> remote controller and a mouse).
>
> Without acpi=off, it works fine.
>

(Please respond via reply-to-all, not via bugzilla web forms)

Are you able to get a cleaner kernel trace than that? This one seems a bit
mangled.

Please raise a bugzilla entry against acpi for the "another bug", if you
haven't done so.

I assume this indicates a bug in USB (as well) but with this trace it is
hard to tell.

Thanks.

On 21/03/07 at 08:41 -0800, Andrew Morton wrote:
> On Wed, 21 Mar 2007 07:18:45 -0700 bugme-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=8249
> >
> > Summary: NMI watchdog detected LOCKUP when plugging USB devices
> > with acpi=off
> > Kernel Version: 2.6.21-rc4
> > Status: NEW
> > Severity: high
> > Owner: greg@kroah.com
> > Submitter: lucas@lucas-nussbaum.net
> >
> >
> > Most recent kernel where this bug did *NOT* occur: none, as far as I know
> > Problem Description:
> >
> > My kernel is running with acpi=off to workaround another bug. When I turn on my
> > USB printer (a Canon PIXMA MP750), I get this on the serial console:
> >
> > usb 1-1: new full speed USB device using ohci_hcd and address 2
> > usb 1-1: configuration #1 chosen from 1 choice
> > BUG: NMI Watchdog detected LOCKUP on CPU0, eip 00007db7, registers:
> > Modules linked in: nfs nfsd exportfs lockd nfs_acl sunrpc ppdev lp ipv6
> > dm_snapshot dm_mirror dm_mod apm eeprom i2c_i801 snd_es1938 snd_opl3_lib
> > snd_hwdep snd_mpu401_uart snd_pcm_oss snd_pcm snd_page_alloc snd_mixer_oss
> > snd_seq_dummy snd_seq_oss bt878 snd_seq_midi snd_rawmidi snd_seq_midi_event
> > tuner bttv snd_seq video_buf firmware_class snd_timer snd_seq_device ir_common
> > compat_ioctl32 i2c_algo_bit snd btcx_risc tveeprom videodev v4l2_common
> > v4l1_compat soundcore i2c_amd756 i2c_core nvidia_agp agpgart shpchp pci_hotplug
> > parport_pc parport evdev analog rtc gameport pcspkr ext3 jbd mbcache ide_cd
> > cdrom ide_disk generic amd74xx e100 mii ide_core ohci_hcd usbcore
> > CPU: 0
> > EIP: 00b8:[<00007db7>] Not tainted VLI
> > EFLAGS: 00000006 (2.6.21-rc4 #3)
> > EIP is at 0x7db7
> > eax: 00000ac2 ebx: 00000001 ecx: 00000000 edx: 0000e42e
> > esi: dfe2bf2e edi: 0000530a ebp: 00000000 esp: dfe2be66
> > ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0068
> > Process hald (pid: 2198, ti=dfe2a000 task=dff94a70 task.ti=dfe2a000)
> > Stack: c2fb0001 0060ef9e 00010000 530a0000 007b0000 007b0000 f7200000 5de0deb8
> > 0000eefd 00d80000 c0400000 0033c15e 02820000 00010000 00000000 00000000
> > bf2a0000 bf2edfe2 bf2cdfe2 0400dfe2 c5cb0000 bee0ef9e bedcdfe2 bed8dfe2
> > Call Trace:
> > =======================
> > Code: Bad EIP value.
> >
> > I could also reproduce the same crash using two another USB peripherals (a
> > remote controller and a mouse).
> >
> > Without acpi=off, it works fine.
> >
>
> (Please respond via reply-to-all, not via bugzilla web forms)
>
> Are you able to get a cleaner kernel trace than that? This one seems a bit
> mangled.

No. Initially, I didn't have debugging symbols compiled in, so I rebuilt
the kernel in the hope that I would get something useful. But I got the
same trace.

> Please raise a bugzilla entry against acpi for the "another bug", if you
> haven't done so.

It's http://bugzilla.kernel.org/show_bug.cgi?id=7937 (wake on lan
doesn't work when acpi is enabled (works if disabled).

> I assume this indicates a bug in USB (as well) but with this trace it is
> hard to tell.

How could I get a more useful output ?

Reply-To: akpm@linux-foundation.org
On Wed, 21 Mar 2007 18:56:24 +0100
Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:
> On 21/03/07 at 08:41 -0800, Andrew Morton wrote:
> > On Wed, 21 Mar 2007 07:18:45 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> >
> > > http://bugzilla.kernel.org/show_bug.cgi?id=8249
> > >
> > > Summary: NMI watchdog detected LOCKUP when plugging USB devices
> > > with acpi=off
> > > Kernel Version: 2.6.21-rc4
> > > Status: NEW
> > > Severity: high
> > > Owner: greg@kroah.com
> > > Submitter: lucas@lucas-nussbaum.net
> > >
> > >
> > > Most recent kernel where this bug did *NOT* occur: none, as far as I know
> > > Problem Description:
> > >
> > > My kernel is running with acpi=off to workaround another bug. When I turn on my
> > > USB printer (a Canon PIXMA MP750), I get this on the serial console:
> > >
> > > usb 1-1: new full speed USB device using ohci_hcd and address 2
> > > usb 1-1: configuration #1 chosen from 1 choice
> > > BUG: NMI Watchdog detected LOCKUP on CPU0, eip 00007db7, registers:
> > > Modules linked in: nfs nfsd exportfs lockd nfs_acl sunrpc ppdev lp ipv6
> > > dm_snapshot dm_mirror dm_mod apm eeprom i2c_i801 snd_es1938 snd_opl3_lib
> > > snd_hwdep snd_mpu401_uart snd_pcm_oss snd_pcm snd_page_alloc snd_mixer_oss
> > > snd_seq_dummy snd_seq_oss bt878 snd_seq_midi snd_rawmidi snd_seq_midi_event
> > > tuner bttv snd_seq video_buf firmware_class snd_timer snd_seq_device ir_common
> > > compat_ioctl32 i2c_algo_bit snd btcx_risc tveeprom videodev v4l2_common
> > > v4l1_compat soundcore i2c_amd756 i2c_core nvidia_agp agpgart shpchp pci_hotplug
> > > parport_pc parport evdev analog rtc gameport pcspkr ext3 jbd mbcache ide_cd
> > > cdrom ide_disk generic amd74xx e100 mii ide_core ohci_hcd usbcore
> > > CPU: 0
> > > EIP: 00b8:[<00007db7>] Not tainted VLI
> > > EFLAGS: 00000006 (2.6.21-rc4 #3)
> > > EIP is at 0x7db7
> > > eax: 00000ac2 ebx: 00000001 ecx: 00000000 edx: 0000e42e
> > > esi: dfe2bf2e edi: 0000530a ebp: 00000000 esp: dfe2be66
> > > ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0068
> > > Process hald (pid: 2198, ti=dfe2a000 task=dff94a70 task.ti=dfe2a000)
> > > Stack: c2fb0001 0060ef9e 00010000 530a0000 007b0000 007b0000 f7200000 5de0deb8
> > > 0000eefd 00d80000 c0400000 0033c15e 02820000 00010000 00000000 00000000
> > > bf2a0000 bf2edfe2 bf2cdfe2 0400dfe2 c5cb0000 bee0ef9e bedcdfe2 bed8dfe2
> > > Call Trace:
> > > =======================
> > > Code: Bad EIP value.
> > >
> > > I could also reproduce the same crash using two another USB peripherals (a
> > > remote controller and a mouse).
> > >
> > > Without acpi=off, it works fine.
> > >
> >
> > (Please respond via reply-to-all, not via bugzilla web forms)
> >
> > Are you able to get a cleaner kernel trace than that? This one seems a bit
> > mangled.
>
> No. Initially, I didn't have debugging symbols compiled in, so I rebuilt
> the kernel in the hope that I would get something useful. But I got the
> same trace.
>
> > Please raise a bugzilla entry against acpi for the "another bug", if you
> > haven't done so.
>
> It's http://bugzilla.kernel.org/show_bug.cgi?id=7937 (wake on lan
> doesn't work when acpi is enabled (works if disabled).
>
> > I assume this indicates a bug in USB (as well) but with this trace it is
> > hard to tell.
>
> How could I get a more useful output ?

ugly.

Have you tried it with the NMI watchdog disabled? The machine might just
lock up dead. Or it might do something useful&interesting, dunno.

On 21/03/07 at 11:07 -0700, Andrew Morton wrote:
> On Wed, 21 Mar 2007 18:56:24 +0100
> Lucas Nussbaum <lucas@lucas-nussbaum.net> wrote:
>
> > On 21/03/07 at 08:41 -0800, Andrew Morton wrote:
> > > On Wed, 21 Mar 2007 07:18:45 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> > >
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=8249
> > > >
> > > > Summary: NMI watchdog detected LOCKUP when plugging USB devices
> > > > with acpi=off
> > > > Kernel Version: 2.6.21-rc4
> > > > Status: NEW
> > > > Severity: high
> > > > Owner: greg@kroah.com
> > > > Submitter: lucas@lucas-nussbaum.net
> > > >
> > > >
> > > > Most recent kernel where this bug did *NOT* occur: none, as far as I know
> > > > Problem Description:
> > > >
> > > > My kernel is running with acpi=off to workaround another bug. When I turn on my
> > > > USB printer (a Canon PIXMA MP750), I get this on the serial console:
> > > >
> > > > usb 1-1: new full speed USB device using ohci_hcd and address 2
> > > > usb 1-1: configuration #1 chosen from 1 choice
> > > > BUG: NMI Watchdog detected LOCKUP on CPU0, eip 00007db7, registers:
> > > > Modules linked in: nfs nfsd exportfs lockd nfs_acl sunrpc ppdev lp ipv6
> > > > dm_snapshot dm_mirror dm_mod apm eeprom i2c_i801 snd_es1938 snd_opl3_lib
> > > > snd_hwdep snd_mpu401_uart snd_pcm_oss snd_pcm snd_page_alloc snd_mixer_oss
> > > > snd_seq_dummy snd_seq_oss bt878 snd_seq_midi snd_rawmidi snd_seq_midi_event
> > > > tuner bttv snd_seq video_buf firmware_class snd_timer snd_seq_device ir_common
> > > > compat_ioctl32 i2c_algo_bit snd btcx_risc tveeprom videodev v4l2_common
> > > > v4l1_compat soundcore i2c_amd756 i2c_core nvidia_agp agpgart shpchp pci_hotplug
> > > > parport_pc parport evdev analog rtc gameport pcspkr ext3 jbd mbcache ide_cd
> > > > cdrom ide_disk generic amd74xx e100 mii ide_core ohci_hcd usbcore
> > > > CPU: 0
> > > > EIP: 00b8:[<00007db7>] Not tainted VLI
> > > > EFLAGS: 00000006 (2.6.21-rc4 #3)
> > > > EIP is at 0x7db7
> > > > eax: 00000ac2 ebx: 00000001 ecx: 00000000 edx: 0000e42e
> > > > esi: dfe2bf2e edi: 0000530a ebp: 00000000 esp: dfe2be66
> > > > ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0068
> > > > Process hald (pid: 2198, ti=dfe2a000 task=dff94a70 task.ti=dfe2a000)
> > > > Stack: c2fb0001 0060ef9e 00010000 530a0000 007b0000 007b0000 f7200000 5de0deb8
> > > > 0000eefd 00d80000 c0400000 0033c15e 02820000 00010000 00000000 00000000
> > > > bf2a0000 bf2edfe2 bf2cdfe2 0400dfe2 c5cb0000 bee0ef9e bedcdfe2 bed8dfe2
> > > > Call Trace:
> > > > =======================
> > > > Code: Bad EIP value.
> > > >
> > > > I could also reproduce the same crash using two another USB peripherals (a
> > > > remote controller and a mouse).
> > > >
> > > > Without acpi=off, it works fine.
> > > >
> > >
> > > (Please respond via reply-to-all, not via bugzilla web forms)
> > >
> > > Are you able to get a cleaner kernel trace than that? This one seems a bit
> > > mangled.
> >
> > No. Initially, I didn't have debugging symbols compiled in, so I rebuilt
> > the kernel in the hope that I would get something useful. But I got the
> > same trace.
> >
> > > Please raise a bugzilla entry against acpi for the "another bug", if you
> > > haven't done so.
> >
> > It's http://bugzilla.kernel.org/show_bug.cgi?id=7937 (wake on lan
> > doesn't work when acpi is enabled (works if disabled).
> >
> > > I assume this indicates a bug in USB (as well) but with this trace it is
> > > hard to tell.
> >
> > How could I get a more useful output ?
>
> ugly.
>
> Have you tried it with the NMI watchdog disabled? The machine might just
> lock up dead. Or it might do something useful&interesting, dunno.
When booting with nmi_watchdog=0, it just locks up...
On Wed, 21 Mar 2007, Lucas Nussbaum wrote:
> On 21/03/07 at 08:41 -0800, Andrew Morton wrote:
> > On Wed, 21 Mar 2007 07:18:45 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> >
> > > http://bugzilla.kernel.org/show_bug.cgi?id=8249
> > >
> > > Summary: NMI watchdog detected LOCKUP when plugging USB devices
> > > with acpi=off
> > > Kernel Version: 2.6.21-rc4
> > > Status: NEW
> > > Severity: high
> > > Owner: greg@kroah.com
> > > Submitter: lucas@lucas-nussbaum.net
> > >
> > >
> > > Most recent kernel where this bug did *NOT* occur: none, as far as I know
> > > Problem Description:
> > >
> > > My kernel is running with acpi=off to workaround another bug. When I turn on my
> > > USB printer (a Canon PIXMA MP750), I get this on the serial console:
It looks like your problem occurs either as a result of the USB printer
driver's actions or as a result of HAL. You could try turning off the HAL
daemon and renaming the usblp.ko file so that it won't get loaded
automatically. Then after plugging in the printer you could insmod the
renamed driver file by hand, and see whether that causes the oops. Then
start HAL by hand and see whether that causes the oops.
Alan Stern
On 22/03/07 at 10:49 -0400, Alan Stern wrote:
> On Wed, 21 Mar 2007, Lucas Nussbaum wrote:
>
> > On 21/03/07 at 08:41 -0800, Andrew Morton wrote:
> > > On Wed, 21 Mar 2007 07:18:45 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> > >
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=8249
> > > >
> > > > Summary: NMI watchdog detected LOCKUP when plugging USB devices
> > > > with acpi=off
> > > > Kernel Version: 2.6.21-rc4
> > > > Status: NEW
> > > > Severity: high
> > > > Owner: greg@kroah.com
> > > > Submitter: lucas@lucas-nussbaum.net
> > > >
> > > >
> > > > Most recent kernel where this bug did *NOT* occur: none, as far as I know
> > > > Problem Description:
> > > >
> > > > My kernel is running with acpi=off to workaround another bug. When I turn on my
> > > > USB printer (a Canon PIXMA MP750), I get this on the serial console:
>
> It looks like your problem occurs either as a result of the USB printer
> driver's actions or as a result of HAL. You could try turning off the HAL
> daemon and renaming the usblp.ko file so that it won't get loaded
> automatically. Then after plugging in the printer you could insmod the
> renamed driver file by hand, and see whether that causes the oops. Then
> start HAL by hand and see whether that causes the oops.

Ah, good idea.

A problem specific to usblp is unlikely, since it also fails with other
USB devices (mouse, remote controller).

If I kill hald before turning on the printer, the kernel doesn't deadlock.
If then, I modprobe usblp, the module gets inserted ok.
If then, I start hald (or if I start it with the module unloaded), it
deadlocks.

I straced hald. The output is on http://blop.info/strace-hald.log, but
it's probably not very interesting, except for the end:
[pid 2619] close(14) = 0
[pid 2619] munmap(0xb7f19000, 4096) = 0
[pid 2619] readlink("/sys/block/ram0/device", 0x8150f30, 256) = -1 ENOENT (No such file or directory)
[pid 2619] open("/sys/block/ram0/slaves", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 14
[pid 2619] fstat64(14, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 2619] fcntl64(14, F_SETFD, FD_CLOEXEC) = 0
[pid 2619] getdents64(14, /* 2 entries */, 4096) = 48
[pid 2619] getdents64(14, /* 0 entries */, 4096) = 0
[pid 2619] close(14) = 0
[pid 2619] open("//proc/apm", O_RDONLY|O_LARGEFILE) = 14

I also confirmed that "cat /proc/apm" works fine when the USB printer is
off. But it deadlocks when the USB printer is on.

With the printer off:
# cat /proc/apm
1.16ac 1.2 0x03 0x01 0xff 0x80 -1% -1 ?

The kernel is in the APM BIOS when the lockup is detected. That's why
there is no useful information in the OOPS message.

Hald is known for misbehaving - but it also means that kernel interfaces
that it uses have bugs, or it runs into other hidden bugs in the kernel.
I'd say information about how system gets configured with and without
ACPI would be important. Can you please provide boot trace with and
without acpi? Having debug boot parameter would probably make it more
informative.
Thanks.

Hi,
I am sorry, but the machine that caused this broke a few months ago. So I
am unable to reproduce this, or to provide more information.

Can you provide for the record what system/motherboard was that?

I'm not 100% sure as I don't have the hardware anymore, but I'm pretty
sure it was an Asus A7N266-E.
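
For readers following the diagnosis above, here is a minimal Python sketch that decodes the two pieces of evidence quoted in this thread: the `/proc/apm` line and the `00b8` code segment selector from the `EIP:` line of the oops. It is an illustration, not part of the original report. The field order and value tables are assumptions based on the APM driver's documented `/proc/apm` format as best recalled, and the function names are hypothetical. The selector decoding is plain x86 arithmetic; whether GDT entry 23 was the APM BIOS code segment on i386 kernels of that era is also an assumption, though it would be consistent with the comment that the kernel was in the APM BIOS.

```python
# Sketch under stated assumptions: the tables below follow the /proc/apm
# format as documented in the kernel's APM driver, quoted from memory.
AC_LINE_STATUS = {0x00: "off-line", 0x01: "on-line",
                  0x02: "on backup power", 0xFF: "unknown"}
BATTERY_STATUS = {0x00: "high", 0x01: "low", 0x02: "critical",
                  0x03: "charging", 0x04: "selected battery not present",
                  0xFF: "unknown"}


def parse_proc_apm(line):
    """Split one /proc/apm line into named fields (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 fields, got %d" % len(fields))
    driver, bios, flags, ac, status, batt_flag, percent, remaining, units = fields
    batt_flag = int(batt_flag, 16)
    percent = int(percent.rstrip("%"))
    remaining = int(remaining)
    return {
        "driver_version": driver,
        "bios_version": bios,
        "apm_flags": int(flags, 16),
        "ac_line": AC_LINE_STATUS.get(int(ac, 16), "reserved"),
        "battery_status": BATTERY_STATUS.get(int(status, 16), "reserved"),
        # Bit 7 of the battery flag is assumed to mean "no system battery";
        # 0xff as a whole means "unknown", so it is excluded here.
        "no_system_battery": batt_flag != 0xFF and bool(batt_flag & 0x80),
        # -1 means the value is unknown.
        "battery_percent": None if percent < 0 else percent,
        "battery_remaining": None if remaining < 0 else remaining,
        "remaining_units": units,
    }


def decode_selector(selector):
    """Decode an x86 segment selector into table index, table and RPL."""
    return {
        "index": selector >> 3,                       # descriptor table entry
        "table": "LDT" if selector & 0x4 else "GDT",  # table indicator bit
        "rpl": selector & 0x3,                        # requested privilege level
    }


if __name__ == "__main__":
    # Values copied from the thread: the /proc/apm output and the CS in "EIP: 00b8:".
    print(parse_proc_apm("1.16ac 1.2 0x03 0x01 0xff 0x80 -1% -1 ?"))
    print(decode_selector(0x00B8))
```

Decoding a saved copy of the line this way avoids reading `/proc/apm` on the affected machine, which is the operation that deadlocked with the printer on.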