Bug 216543
Summary: | kernel NULL pointer dereference usb_hcd_alloc_bandwidth | ||
---|---|---|---|
Product: | Drivers | Reporter: | Nazar Mokrynskyi (nazar) |
Component: | USB | Assignee: | Default virtual assignee for Drivers/USB (drivers_usb) |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.19.10 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | kernel log from latest crash; kernel log from first crash; Diagnostic patch for uvcvideo driver; Kernel log with uvc-trace patch applied | ||
Description
Nazar Mokrynskyi
2022-09-29 18:53:46 UTC
On Thu, Sep 29, 2022 at 06:53:46PM +0000, bugzilla-daemon@kernel.org wrote:
> With a flaky USB 3.0 cable (3m extension + 2m cable + 90 degree adapter) and
> Logitech BRIO webcam I got exactly the same null pointer dereference twice
> already.

That's really an unstable and unsupported system, sorry. If you fix your
cable it should work properly, right?

> Here are two instances (from different boots):
> [64977.148098] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [64977.148101] #PF: supervisor read access in kernel mode
> [64977.148102] #PF: error_code(0x0000) - not-present page
> [64977.148103] PGD 101370067 P4D 101370067 PUD 0
> [64977.148105] Oops: 0000 [#1] SMP NOPTI
> [64977.148107] CPU: 14 PID: 27951 Comm: VideoCapture Not tainted 5.19.10-xanmod1-x64v2 #0~20220920.git017c598

What about any kernel log messages from right before this crashed?
There should be some disconnect or other USB messages, right? Specifics
here would be good to see.

> [64977.148109] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> [64977.148109] RIP: 0010:usb_ifnum_to_if+0x34/0x60
> [64977.148113] Code: 74 33 0f b6 4a 04 84 c9 74 33 83 e9 01 48 8d 82 98 00 00 00 48 8d bc ca a0 00 00 00 eb 09 48 83 c0 08 48 39 f8 74 16 48 8b 10 <48> 8b 0a 0f b6 49 02 39 f1 75 e9 48 89 d0 c3 cc cc cc cc 31 d2 48
> [64977.148114] RSP: 0018:ffffb20951407bb0 EFLAGS: 00010202
> [64977.148115] RAX: ffff8cfbbc618098 RBX: ffff8ceb844cc800 RCX: 0000000000000004
> [64977.148116] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8cfbbc6180c0
> [64977.148117] RBP: 0000000000000000 R08: 0000000080000000 R09: ffffffff8f590de8
> [64977.148117] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8cf67c70f398
> [64977.148118] R13: 0000000000000000 R14: ffff8cf67c70f208 R15: ffff8ceb8123c000
> [64977.148119] FS: 00007f5f51379640(0000) GS:ffff8d0a3ed80000(0000) knlGS:0000000000000000
> [64977.148120] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64977.148120] CR2: 0000000000000000 CR3: 000000023b842000 CR4: 0000000000750ee0
> [64977.148121] PKRU: 55555554
> [64977.148122] Call Trace:
> [64977.148123]  <TASK>
> [64977.148124]  usb_hcd_alloc_bandwidth+0x241/0x360
> [64977.148127]  usb_set_interface+0x11d/0x340
> [64977.148130]  uvc_video_start_transfer+0x17b/0x4b0 [uvcvideo]

This isn't good, we shouldn't crash when a device is removed, but this
might be an issue with some reference counting. More log messages might
help out here.

thanks,

greg k-h
Created attachment 301905
kernel log from latest crash
> That's really an unstable and unsupported system, sorry. If you fix your
> cable it should work properly, right?

Yes. And I totally understand that it is not supported; the only reason I posted this is that it seemed to uncover a race condition in the code that might be worth fixing.

> What about any kernel log messages from right before this crashed?
> There should be some disconnect or other USB messages, right? Specifics here would be good to see.

I've attached the full kernel log.

> This isn't good, we shouldn't crash when a device is removed, but this might be an issue with some reference counting. More log messages might help out here.

Yes, that is the reason I decided to create a bug report; I'm just hoping it is useful.
Created attachment 301906
kernel log from first crash
The previously uploaded file is from the second log snippet; this one is from the first, included for completeness since the stack traces are slightly different there.
On Fri, Sep 30, 2022 at 11:38:46AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216543
>
> --- Comment #4 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> Created attachment 301906
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301906&action=edit
> kernel log from first crash
>
> Previously uploaded file is from second log snippet, this is the first one for
> completeness since stack traces are slightly different there.

The log file is full of warnings and other messages way before USB is ever
involved. You might want to resolve those first.

Anyway, yes, the device disconnects itself from the USB bus which is an
electrical event and the video driver fails trying to send data to it, and
then things blow up again.

As there is a real solution for this (fix the cable), I recommend doing that
first :)

thanks,

greg k-h
Created attachment 301908
Diagnostic patch for uvcvideo driver
This looks like a race in the uvcvideo driver, possibly between disconnect and video start.
You might be able to trace this down more by running with the attached patch.
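For readers following the oops: the fault is inside usb_ifnum_to_if(), reached via usb_set_interface() from uvc_video_start_transfer(), and the faulting instruction (<48> 8b 0a, a load through RDX) together with RDX=0 and CR2=0 is consistent with a NULL slot in the active configuration's interface array being dereferenced. The C sketch below is a user-space model of that lookup and of a possible disconnect-versus-start window; it is not kernel code. Assumptions, all hypothetical: the model_* structs and functions are simplified stand-ins invented for illustration; the teardown order (interface slots cleared before actconfig is set to NULL) is our reading of usb_disable_device() and should be checked against the actual source; and the NULL check in model_ifnum_to_if() exists only so the window can be observed without faulting. It is not a proposed fix.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct usb_interface,
 * struct usb_host_config and struct usb_device. Not kernel types. */
#define MODEL_MAX_INTERFACES 4

struct model_interface {
	unsigned char bInterfaceNumber;
};

struct model_config {
	unsigned char bNumInterfaces;
	struct model_interface *interface[MODEL_MAX_INTERFACES];
};

struct model_device {
	struct model_config *actconfig;
};

/* Same shape as the usb_ifnum_to_if() lookup: scan the active
 * configuration for a matching interface number. The kernel loop
 * has no NULL check on interface[i]; this model adds one so the
 * race window can be observed without crashing. */
static struct model_interface *model_ifnum_to_if(const struct model_device *dev,
						 unsigned int ifnum)
{
	const struct model_config *config = dev->actconfig;
	int i;

	if (!config)
		return NULL;
	for (i = 0; i < config->bNumInterfaces && i < MODEL_MAX_INTERFACES; i++) {
		/* An unguarded loop would dereference NULL here
		 * (compare RDX=0 and CR2=0 in the oops). */
		if (!config->interface[i])
			return NULL;
		if (config->interface[i]->bInterfaceNumber == ifnum)
			return config->interface[i];
	}
	return NULL;
}

/* Assumed teardown step 1: every interface slot is cleared while
 * actconfig is still set. */
static void model_clear_interfaces(struct model_device *dev)
{
	int i;

	if (!dev->actconfig)
		return;
	for (i = 0; i < dev->actconfig->bNumInterfaces && i < MODEL_MAX_INTERFACES; i++)
		dev->actconfig->interface[i] = NULL;
}

/* Assumed teardown step 2: only afterwards is actconfig dropped. */
static void model_clear_actconfig(struct model_device *dev)
{
	dev->actconfig = NULL;
}

/* Returns 1 while dev is between the two steps: actconfig is still
 * set but at least one slot the lookup would scan is NULL. */
static int model_in_race_window(const struct model_device *dev)
{
	int i;

	if (!dev->actconfig)
		return 0;
	for (i = 0; i < dev->actconfig->bNumInterfaces && i < MODEL_MAX_INTERFACES; i++)
		if (!dev->actconfig->interface[i])
			return 1;
	return 0;
}
```

In this model, a lookup issued after model_clear_interfaces() but before model_clear_actconfig() is the case an unguarded loop cannot survive. Where a real fix belongs (for example, synchronization between uvcvideo's disconnect handling and its stream-start path) is not something the model settles.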
Created attachment 303022
Kernel log with uvc-trace patch applied
I'm on 6.0.2 and seem to hit this even more frequently, with a good cable and no extra adapters. So I patched 6.0.2 with the uvc-trace patch above and reproduced the crash within a few minutes.
The USB device seems to reset; often the camera stops or freezes in the browser, but the light on the camera itself stays on. Sometimes I can get the camera to restart by enabling, disabling, and re-enabling it, but the last time I did that (captured in the attached log) I hit the NULL pointer dereference again.
Please let me know if there is any other information I can provide, and what the root cause of this annoying behavior could be.