Bug 15376

Summary: regression (oops) with usb in 2.6.33-rc8
Product: Drivers
Reporter: Christophe Fergeau (cfergeau)
Component: USB
Assignee: Greg Kroah-Hartman (greg)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, florian, maciej.rutecki, marti, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33-rc8 bee415ce427d1eab6cfb30221461c7d20cbf1903
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14885
Attachments: lsusb -v before running the program that triggers the oops
output of lspci |grep USB

Description Christophe Fergeau 2010-02-23 10:58:01 UTC
Created attachment 25172
lsusb -v before running the program that triggers the oops

While "playing" (i.e. sending it random stuff with libusb) with an iPod nano on a recent kernel (tested with 2.6.33-rc8 and git master from a few hours ago), I'm getting a kernel oops:


BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa00e7079>] usb_altnum_to_altsetting+0x9/0x60 [usbcore]
PGD 6b802067 PUD 6b9ad067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1a.7/usb1/1-2/devnum
CPU 1 
Pid: 4349, comm: python Not tainted 2.6.33-desktop-0.rc8.1mnb #1 Mac-F22788A9/MacBook4,1
RIP: 0010:[<ffffffffa00e7079>]  [<ffffffffa00e7079>] usb_altnum_to_altsetting+0x9/0x60 [usbcore]
RSP: 0018:ffff880068011d18  EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff88006f8905f0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880068011d18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007c573800
R13: ffff880068042c00 R14: 0000000000000000 R15: 00000000ffffffb5
FS:  00007f224c9f0700(0000) GS:ffff880001b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 000000006b815000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process python (pid: 4349, threadinfo ffff880068010000, task ffff88006b8c4350)
Stack:
 ffff880068011d88 ffffffffa00f4213 0000000000000000 ffff880000000000
<0> ffff880000001388 ffffffff810df769 ffff88006da3e400 ffff88007c377d50
<0> ffff880068011d88 0000000000000000 ffff88007c9a3240 00007fff620c8c8c
Call Trace:
 [<ffffffffa00f4213>] usb_reset_configuration+0x123/0x250 [usbcore]
 [<ffffffff810df769>] ? filemap_fault+0xb9/0x450
 [<ffffffffa00ff34d>] usbdev_do_ioctl+0xcdd/0x12c0 [usbcore]
 [<ffffffff810f9a59>] ? __do_fault+0x3b9/0x4b0
 [<ffffffff813b8e07>] ? _lock_kernel+0x47/0xad
 [<ffffffffa00ff9f8>] usbdev_ioctl+0x48/0x80 [usbcore]
 [<ffffffff81137ffd>] vfs_ioctl+0x3d/0xd0
 [<ffffffff8113858a>] do_vfs_ioctl+0x8a/0x5a0
 [<ffffffff811f4166>] ? __up_read+0xa6/0xd0
 [<ffffffff8107bcde>] ? up_read+0xe/0x10
 [<ffffffff81138b21>] sys_ioctl+0x81/0xa0
 [<ffffffff8100a002>] system_call_fastpath+0x16/0x1b
Code: 0f b6 7f 02 39 f7 74 c1 83 c2 01 44 39 c2 7c e2 31 c0 eb b5 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 <44> 8b 47 10 45 85 c0 74 35 48 8b 07 31 d2 0f b6 48 03 39 f1 75 
RIP  [<ffffffffa00e7079>] usb_altnum_to_altsetting+0x9/0x60 [usbcore]
 RSP <ffff880068011d18>
CR2: 0000000000000010
---[ end trace ded3cae37b595f91 ]---

I bisected it to http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f0479e0
With this commit, the oops happens. Right before this commit, my test program just fails with:

Traceback (most recent call last):
  File "./ipoddfu.py", line 32, in <module>
    dev = libipoddfu.ipoddfu()
  File "/home/teuf/hack/ipoddfu/snapshot-201002150047/tools/libipoddfu.py", line 76, in __init__
    self.handle.setConfiguration(1)
usb.USBError: Numerical result out of range

(which is an acceptable result to me since the iPod is in a pretty bad state at that point).
Comment 1 Christophe Fergeau 2010-02-23 10:58:43 UTC
Created attachment 25173
output of lspci |grep USB
Comment 2 Christophe Fergeau 2010-02-23 10:59:34 UTC
The oops doesn't happen with 2.6.31.6 either; I get the same error message as mentioned earlier.
Comment 3 Andrew Morton 2010-02-24 22:12:20 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Recent regression, bisected to

commit 3f0479e00a3fca9590ae8d9edc4e9c47b7fa0610
Author:     Sarah Sharp <sarah.a.sharp@linux.intel.com>
AuthorDate: Thu Dec 3 09:44:36 2009 -0800
Commit:     Greg Kroah-Hartman <gregkh@suse.de>
CommitDate: Fri Dec 11 11:55:27 2009 -0800

    USB: Check bandwidth when switching alt settings.


Comment 4 Anonymous Emailer 2010-02-25 01:05:52 UTC
Reply-To: sarah.a.sharp@linux.intel.com

Random stuff, hmmm?  Maybe you asked it to change alternate settings to
one that the device doesn't provide?  Can you post your python script?

The only way you would get a null pointer dereference when
usb_altnum_to_altsetting() is called in usb_reset_configuration() is if
usb_host_config->interface[i] is null.  I think that's only true if the
configuration is not active, meaning the device was in the addressed
state.
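
To make the arithmetic behind that diagnosis concrete, here is a small user-space Python sketch (not kernel code). It assumes that struct usb_interface in kernels of this era begins with two pointers (altsetting, cur_altsetting) followed by an unsigned num_altsetting; the UsbInterfacePrefix class is a hypothetical stand-in for that layout. If the assumption holds, reading num_altsetting through a NULL interface pointer on a 64-bit build would fault at address 0x10, which would match the CR2 value and the zero RDI in the oops.

```python
import ctypes


class UsbInterfacePrefix(ctypes.Structure):
    # Hypothetical mirror of only the leading fields of struct usb_interface;
    # the field order is an assumption, not taken from this thread.
    _fields_ = [
        ("altsetting", ctypes.c_void_p),      # struct usb_host_interface *
        ("cur_altsetting", ctypes.c_void_p),  # struct usb_host_interface *
        ("num_altsetting", ctypes.c_uint),    # unsigned
    ]


# Offset of num_altsetting inside the structure on the machine running this.
offset = UsbInterfacePrefix.num_altsetting.offset

# A NULL interface pointer has base address 0, so the faulting address
# is simply the field offset.
fault_address = 0x0 + offset

print(hex(fault_address))
```

On a machine with 8-byte pointers the two pointer fields occupy 16 bytes, so the printed address lines up with the 0000000000000010 reported in the oops.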

Alan, does this sound correct?

I think the fix is to dig the interfaces out of the interface cache
instead.  I'll post a patch in a bit.

Sarah
Comment 5 Christophe Fergeau 2010-02-25 10:49:02 UTC
Hi,

On 25/02/2010 00:50, Sarah Sharp wrote:
>
> Random stuff, hmmm?  Maybe you asked it to change alternate settings to
> one that the device doesn't provide?  Can you post your python script?
>
>    

Yep, really random stuff :) The attached script puts the iPod in DFU mode 
and then uploads data from a file to it. The data I'm sending comes from 
/dev/urandom or a randomly chosen file in /bin. If I send the right 
amount of data, the iPod ends up in a state where I can run my script a 
second time, and it is during this second run that I get 
this oops. So it's truly happening when sending random stuff to an iPod 
in a weird state ;)
If I send valid data with this script, or a small amount of data, I can 
do it again and again with no crash, so I don't think it's directly 
related to the script.

Hope that helps,

Christophe
Comment 6 Alan Stern 2010-02-25 15:40:01 UTC
On Wed, 24 Feb 2010, Sarah Sharp wrote:

> The only way you would get a null pointer dereference when
> usb_altnum_to_altsetting() is called in usb_reset_configuration() is if
> usb_host_config->interface[i] is null.  I think that's only true if the
> configuration is not active, meaning the device was in the addressed
> state.
> 
> Alan, does this sound correct?

It does.  However usb_reset_configuration() doesn't get called if 
actconfig is NULL.

> I think the fix is to dig the interfaces out of the interface cache
> instead.  I'll post a patch in a bit.

No, that's not right.  The interface cache isn't needed; 
usb_reset_configuration() should never be called if there isn't an 
active configuration -- it would make no sense.

Instead find out exactly what's going wrong.  Write a debugging patch
to print actconfig and actconfig->desc.bNumInterfaces in
proc_setconfig(), and print intf in usb_reset_configuration().

Alan Stern
Comment 7 Christophe Fergeau 2010-02-25 16:52:15 UTC
On 25/02/2010 16:37, Sarah Sharp wrote:
>
> Would you recompile your kernel with CONFIG_USB_DEBUG turned on, and
> send me the dmesg for the crash, and a log with the patch reverted?
>    

Yeah, sure, I probably won't be able to do that before the week-end though.

For the record, I haven't reverted bee415ce427 from master, since it 
couldn't be reverted without a conflict, and I was bound to fail 
if I tried to fix the conflict. So I tested that the bug occurs with 
master and with bee415ce427, and not with bee415ce427^. The logs I can 
send you are one from bee415ce427 and one from bee415ce427^ (unless you 
can provide me with a patch that cleanly reverts this commit from master). 
I hope that's OK?

Christophe
Comment 8 Anonymous Emailer 2010-02-25 16:53:35 UTC
Reply-To: sarah.a.sharp@linux.intel.com

On Thu, Feb 25, 2010 at 11:14:05AM +0100, Christophe Fergeau wrote:
> Hi,
> 
> On 25/02/2010 00:50, Sarah Sharp wrote:
> >
> >Random stuff, hmmm?  Maybe you asked it to change alternate settings to
> >one that the device doesn't provide?  Can you post your python script?
> >
> 
> Yep, really random stuff :) The attached script puts the iPod in DFU
> mode and then uploads data from a file to it. The data I'm sending
> comes from /dev/urandom or a randomly chosen file in /bin. If I send
> the right amount of data, the iPod ends up in a state where I can run
> my script a second time, and it is during this second run
> that I get this oops. So it's truly happening when sending
> random stuff to an iPod in a weird state ;)
> If I send valid data with this script, or a small amount of data, I
> can do it again and again with no crash, so I don't think it's
> directly related to the script.

Would you recompile your kernel with CONFIG_USB_DEBUG turned on, and
send me the dmesg for the crash, and a log with the patch reverted?

Thanks,
Sarah Sharp
Comment 9 Anonymous Emailer 2010-02-26 02:14:20 UTC
Reply-To: sarah.a.sharp@linux.intel.com

Christophe,

Along with sending me dmesg with CONFIG_USB_DEBUG enabled against
2.6.33-rc8, can you send me dmesg with this patch applied?  I think this
will avert the oops and let me see what's really going on with the
device.  I can wait until the weekend. :)

Sarah Sharp


diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index 9bc95fe..13ba115 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -1437,6 +1437,7 @@ int usb_reset_configuration(struct usb_device *dev)
 	int			i, retval;
 	struct usb_host_config	*config;
 	struct usb_hcd *hcd = bus_to_hcd(dev->bus);
+	int			bad_config = 0;
 
 	if (dev->state == USB_STATE_SUSPENDED)
 		return -EHOSTUNREACH;
@@ -1452,6 +1453,30 @@ int usb_reset_configuration(struct usb_device *dev)
 	}
 
 	config = dev->actconfig;
+	printk(KERN_DEBUG "Dev in state %u.\n", dev->state);
+	printk(KERN_DEBUG "Config %u has %u interfaces.\n",
+			config->desc.bConfigurationValue,
+			config->desc.bNumInterfaces);
+	for (i = 0; i < config->desc.bNumInterfaces; i++) {
+		if (config->interface[i] == NULL) {
+			printk(KERN_DEBUG "config->interface[%i] is NULL.\n",
+					i);
+			bad_config = 1;
+			continue;
+		}
+		printk(KERN_DEBUG "Config %u intf %u has %u altsettings.\n",
+				config->desc.bConfigurationValue,
+				i,
+				config->interface[i]->num_altsetting);
+		if (config->interface[i]->altsetting == NULL) {
+			printk(KERN_DEBUG "interface %i "
+					"altsetting array is NULL.\n", i);
+			bad_config = 1;
+		}
+	}
+	if (bad_config)
+		return -EPIPE;
+
 	retval = 0;
 	mutex_lock(&hcd->bandwidth_mutex);
 	/* Make sure we have enough bandwidth for each alternate setting 0 */
Comment 10 Christophe Fergeau 2010-03-01 09:26:45 UTC
Hi Sarah,

On 26/02/2010 02:20, Sarah Sharp wrote:
> Along with sending me dmesg with CONFIG_USB_DEBUG enabled against
> 2.6.33-rc8, can you send me dmesg with this patch applied?  I think this
> will avert the oops and let me see what's really going on with the
> device.  I can wait until the weekend. :)
>    

Here are the files you asked for. I appended the corresponding git hash 
to the name of each dmesg.

* dmesg-3f0479e is the commit when it stopped working, so you've got an 
oops in the dmesg
* dmesg-91017f9 is the commit right before 3f0479e
* dmesg-06a79b8 is linus's git master from Saturday with your patch 
applied (and the oops is still triggered)

I wasn't sure if you wanted to see the output of your patch on a recent 
kernel or on one of the older kernels I'm using for the tests, let me 
know if you want me to do some more tests.

Christophe
Comment 11 Marti Raudsepp 2010-03-10 18:55:46 UTC
I am also experiencing a somewhat similar bug in 2.6.33 -- this time through libusb.

I'm developing firmware for an ARM microcontroller with an embedded USB bootloader for flashing. In Linux I'm using sam7utils (again libusb) to program the device. As you can imagine, this requires lots of USB plugs/unplugs; so much that I have a button on my board for "disconnecting" USB. :)

The bug seems to occur whenever I reset the board and run sam7utils too quickly. When I give it some time to settle after plugging in, everything works as expected. The interesting thing is, after pushing reset, the "USB disconnect" message appears AFTER the oops.

After this oops occurs, I can not access the device again until I reboot (the process simply hangs in an unkillable state).

Here's the oops report:

16:57:48 usb 1-3.2.2: USB disconnect, address 16
16:57:50 usb 1-3.2.2: new full speed USB device using ehci_hcd and address 17
17:36:53 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
17:36:53 IP: [<ffffffffa0214071>] usb_altnum_to_altsetting+0x1/0x50 [usbcore]
17:36:53 PGD 6dbe8067 PUD 6d816067 PMD 0 
17:36:53 Oops: 0000 [#1] PREEMPT SMP 
17:36:53 last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT0/charge_full
17:36:53 CPU 0 
17:36:53 Pid: 4752, comm: sam7 Not tainted 2.6.33-ARCH #1 0KU184/Latitude D630                   
17:36:53 RIP: 0010:[<ffffffffa0214071>]  [<ffffffffa0214071>] usb_altnum_to_altsetting+0x1/0x50 [usbcore]
17:36:53 RSP: 0018:ffff88007c5ddd18  EFLAGS: 00010246
17:36:53 RAX: 0000000000000002 RBX: 0000000000000002 RCX: ffff88006db4c078
17:36:53 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
17:36:53 RBP: ffff88007c5ddd88 R08: ffff88007c5dc000 R09: dead000000200200
17:36:53 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007bae2800
17:36:53 R13: ffff88007c0d6c00 R14: 0000000000000000 R15: 00000000ffffffe0
17:36:53 FS:  00007f01df4db700(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
17:36:53 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
17:36:53 CR2: 0000000000000010 CR3: 000000006dbe9000 CR4: 00000000000006f0
17:36:53 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
17:36:53 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
17:36:53 Process sam7 (pid: 4752, threadinfo ffff88007c5dc000, task ffff88006c458e40)
17:36:53 Stack:
17:36:53 ffff88007c5ddd88 ffffffffa021e8e3 0000000000000000 ffff880000000000
17:36:53 <0> ffff880000001388 ffffffff8111fb07 ffff88007b6d8c00 ffff88007ad84d50
17:36:53 <0> ffff88007c5ddd88 0000000000000000 ffff88007c72a0c0 00007fffe337f58c
17:36:53 Call Trace:
17:36:53 [<ffffffffa021e8e3>] ? usb_reset_configuration+0x123/0x250 [usbcore]
17:36:53 [<ffffffff8111fb07>] ? __dentry_open+0x287/0x3f0
17:36:53 [<ffffffffa0229097>] usbdev_do_ioctl+0xcb7/0x1270 [usbcore]
17:36:53 [<ffffffff8111fd74>] ? nameidata_to_filp+0x54/0x70
17:36:53 [<ffffffff811302e0>] ? do_filp_open+0x980/0xd80
17:36:53 [<ffffffffa0220bd7>] ? usb_suspend_both+0x277/0x300 [usbcore]
17:36:53 [<ffffffffa0229703>] usbdev_ioctl+0x43/0x70 [usbcore]
17:36:53 [<ffffffff81131978>] vfs_ioctl+0x38/0xd0
17:36:53 [<ffffffff81131b20>] do_vfs_ioctl+0x80/0x560
17:36:53 [<ffffffff81132081>] sys_ioctl+0x81/0xa0
17:36:53 [<ffffffff8100a002>] system_call_fastpath+0x16/0x1b
17:36:53 Code: 00 48 83 c1 08 48 8b 38 0f b6 7f 02 39 f7 74 bc 83 c2 01 44 39 c2 7c e2 31 c0 eb b0 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 <44> 8b 47 10 48 89 e5 45 85 c0 74 32 48 8b 07 31 d2 0f b6 48 03 
17:36:53 RIP  [<ffffffffa0214071>] usb_altnum_to_altsetting+0x1/0x50 [usbcore]
17:36:53 RSP <ffff88007c5ddd18>
17:36:53 CR2: 0000000000000010
17:36:53 ---[ end trace 03286e9a42890582 ]---
17:36:56 usb 1-3.2.2: USB disconnect, address 17
Comment 12 Christophe Fergeau 2010-03-19 15:18:21 UTC
Hello,

On 01/03/2010 10:25, Christophe Fergeau wrote:
> On 26/02/2010 02:20, Sarah Sharp wrote:
>> Along with sending me dmesg with CONFIG_USB_DEBUG enabled against
>> 2.6.33-rc8, can you send me dmesg with this patch applied?  I think this
>> will avert the oops and let me see what's really going on with the
>> device.  I can wait until the weekend. :)
>
> Here are the files you asked for. I appended the corresponding git 
> hash to the name of each dmesg.
>
> * dmesg-3f0479e is the commit when it stopped working, so you've got 
> an oops in the dmesg
> * dmesg-91017f9 is the commit right before 3f0479e
> * dmesg-06a79b8 is linus's git master from Saturday with your patch 
> applied (and the oops is still triggered)
>
> I wasn't sure if you wanted to see the output of your patch on a 
> recent kernel or on one of the older kernels I'm using for the tests, 
> let me know if you want me to do some more tests.

Do you need additional information or are you just lacking time to get 
back to this issue?

Thanks,

Christophe
Comment 13 Anonymous Emailer 2010-03-20 14:15:18 UTC
Reply-To: sarah.a.sharp@linux.intel.com

On Fri, Mar 19, 2010 at 04:17:09PM +0100, Christophe Fergeau wrote:
> Hello,
> 
> On 01/03/2010 10:25, Christophe Fergeau wrote:
> >On 26/02/2010 02:20, Sarah Sharp wrote:
> >>Along with sending me dmesg with CONFIG_USB_DEBUG enabled against
> >>2.6.33-rc8, can you send me dmesg with this patch applied?  I think this
> >>will avert the oops and let me see what's really going on with the
> >>device.  I can wait until the weekend. :)
> >
> >Here are the files you asked for. I appended the corresponding git
> >hash to the name of each dmesg.
> >
> >* dmesg-3f0479e is the commit when it stopped working, so you've
> >got an oops in the dmesg
> >* dmesg-91017f9 is the commit right before 3f0479e
> >* dmesg-06a79b8 is linus's git master from Saturday with your
> >patch applied (and the oops is still triggered)
> >
> >I wasn't sure if you wanted to see the output of your patch on a
> >recent kernel or on one of the older kernels I'm using for the
> >tests, let me know if you want me to do some more tests.
> 
> Do you need additional information or are you just lacking time to
> get back to this issue?

Lacking time, sorry. :(  Thanks for reminding me.

Sarah Sharp
Comment 14 Rafael J. Wysocki 2010-03-21 19:24:43 UTC
Handled-By : Sarah Sharp <sarah.a.sharp@linux.intel.com>
Comment 15 Rafael J. Wysocki 2010-03-22 21:22:52 UTC
On Monday 22 March 2010, Sarah Sharp wrote:
> On Sun, Mar 21, 2010 at 09:30:49PM +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.32 and 2.6.33.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.32 and 2.6.33.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=15376
> > Subject             : regression (oops) with usb in 2.6.33-rc8
> > Submitter   : Christophe Fergeau <cfergeau@mandriva.com>
> > Date                : 2010-02-23 10:58 (27 days old)
> > Handled-By  : Sarah Sharp <sarah.a.sharp@linux.intel.com>
> 
> Yes, this should still be included.
Comment 16 Anonymous Emailer 2010-05-04 21:43:42 UTC
Reply-To: sarah.a.sharp@linux.intel.com

Christophe, I remembered that you had a problem similar to Michael
Buesch with resetting the configuration on your iPod.  I think this commit
might fix your problem:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e4a3d94658b5760fc947d7f7185c57db47ca362a

It's applied against 2.6.34, but the patch should be back ported to
2.6.33.  Can you try the patch out?

Sarah Sharp


On Sun, May 02, 2010 at 05:51:49PM -0400, Alan Stern wrote:
> On Sun, 2 May 2010, Michael Buesch wrote:
> 
> > On Sunday 02 May 2010 21:24:49 Alan Stern wrote:
> > > On Sun, 2 May 2010, Michael Buesch wrote:
> > > 
> > > > This fixes a NULL pointer dereference triggered by an off-by-one
> > > > error, if the USB_REQ_SET_CONFIGURATION request to the device in
> > > > usb_reset_configuration() fails.
> > > > 
> > > > Signed-off-by: Michael Buesch <mb@bu3sch.de>
> > > > Cc: stable@kernel.org
> > > > 
> > > > ---
> > > > 
> > > > Alan, this fixes the crash.
> > > 
> > > Can you explain why?  I don't see any off-by-one errors.
> > > 
> > > > Index: linux-2.6.33/drivers/usb/core/message.c
> > > > ===================================================================
> > > > --- linux-2.6.33.orig/drivers/usb/core/message.c        2010-05-02 19:41:58.000000000 +0200
> > > > +++ linux-2.6.33/drivers/usb/core/message.c     2010-05-02 19:42:46.000000000 +0200
> > > > @@ -1489,8 +1489,10 @@ reset_old_alts:
> > > >                         USB_REQ_SET_CONFIGURATION, 0,
> > > >                         config->desc.bConfigurationValue, 0,
> > > >                         NULL, 0, USB_CTRL_SET_TIMEOUT);
> > > > -       if (retval < 0)
> > > > +       if (retval < 0) {
> > > > +               i--;
> > > >                 goto reset_old_alts;
> > > > +       }
> > > 
> > > This extra decrement should not be needed, because the "for" loop at 
> > > reset_old_alts starts out by decrementing i.
> > 
> > No it does not. The initialization area of the for loop is empty.
> > Which is correct for the case where the loop is entered through the
> > first if (retval < 0) without a goto. Because in that case the previous
> > for-loop was terminated with a break.
> 
> No, it's wrong in that case as well.  It will attempt to deallocate 
> bandwidth that didn't get allocated previously.
> 
> >  However for the second case, the first
> > for loop will have looped through all of its elements. Leaving i at bNumInterfaces.
> > So just before goto reset_old_alts i is bNumInterfaces. Which is an invalid index
> > as it is one beyond the array.
> Ah, I see the problem.  The bug has already been fixed in 2.6.34-rc by:
> 
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e4a3d94658b5760fc947d7f7185c57db47ca362a
> 
> Evidently this patch needs to be applied to the stable trees.  Sarah,
> this would explain why other people using 2.6.33 observe the same bug.
> 
> Greg, can you take care of this?
> 
> Alan Stern
> 
>
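
The off-by-one discussed above can be modelled in a few lines of user-space Python. This is only a sketch of the loop-counter behaviour described in the thread, not the kernel code itself; rollback_indices and NUM_INTERFACES are hypothetical names. It assumes, as Michael describes, that the rollback loop in 2.6.33 has an empty initializer, and, as Alan describes, that the loop in 2.6.34-rc decrements i before its first iteration.

```python
def rollback_indices(i, predecrement):
    """Return the interface indices the rollback loop would visit.

    i            -- loop counter value when the rollback loop is entered
    predecrement -- True models a loop that decrements i before its first
                    iteration, False models an empty initializer
    """
    if predecrement:
        i -= 1
    visited = []
    while i >= 0:
        visited.append(i)
        i -= 1
    return visited


NUM_INTERFACES = 2  # hypothetical device with interfaces 0 and 1

# The first loop ran to completion, so i == NUM_INTERFACES when
# SET_CONFIGURATION fails and control jumps to the rollback loop.
without_predecrement = rollback_indices(NUM_INTERFACES, predecrement=False)
with_predecrement = rollback_indices(NUM_INTERFACES, predecrement=True)

print(without_predecrement)  # starts at index 2, one past the last interface
print(with_predecrement)     # starts at index 1, the last valid interface
```

In the model without the pre-decrement, the first index visited equals NUM_INTERFACES. In the kernel that would presumably correspond to the config->interface[] slot one past the last interface, a NULL entry that usb_altnum_to_altsetting() then dereferences. The pre-decrement also covers Alan's other point: when the first loop breaks at a failing interface, the rollback skips that interface, whose bandwidth was never allocated.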
Comment 17 Florian Mickler 2010-12-08 06:53:09 UTC
Fixed by: e4a3d94658b5760fc947d7f7185c57db47ca362a

If this is not correct, please reopen/shout.