Bug 207045

Summary: uvcvideo: properly handle UVC payloads that occur in the middle of bulk URBs
Product: Drivers Reporter: Julian Meyer (jucmeyer)
Component: USBAssignee: Default virtual assignee for Drivers/USB (drivers_usb)
Status: NEW ---    
Severity: normal CC: jucmeyer, louise, luciomf, rpgm2k
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: v5.5 Subsystem:
Regression: No Bisected commit-id:

Description Julian Meyer 2020-03-31 22:11:09 UTC
The integrated webcam on the Razer Blade (2017) does not work.

I found this thread from 2014 having issues with the same webcam, but it's unresolved: https://www.spinics.net/lists/linux-media/msg73476.html

At first, I turned on the trace parameter for the uvcvideo module and found the following logs as mentioned in the above thread first:

[   80.490256] uvcvideo: Frame complete (FID bit toggled).
[   80.490262] uvcvideo: frame 2080 stats: 0/0/1 packets, 0/0/0 pts (!early !initial), 0/1 scr, last pts/stc/sof 0/2069725020/32605
[   80.490264] uvcvideo: Marking buffer as bad (error bit set).
[   80.490606] uvcvideo: Frame complete (EOF found).
[   80.490942] uvcvideo: Marking buffer as bad (error bit set).
[   80.490947] uvcvideo: Dropping payload (out of sync).
[   80.492920] uvcvideo: Marking buffer as bad (error bit set).

After more investigation, I found this line at the end of uvc_video_decode_bulk that was causing the problem:

if (buf->bytesused == stream->queue.buf_used ||
  stream->bulk.payload_size == stream->bulk.max_payload_size) {
   // assume end of frame
}

I also ran a USBMon on the webcam and decoded it using the python script in the thread linked above. This yielded the following setup URB:

bmHint                         0x01
bFormatIndex                      1
bFrameIndex                       1
dwFrameInterval              333333
wKeyFrameRate                     0
wPFrameRate                       0
wCompQuality                      0
wCompWindowSize                   0
wDelay                           32
dwMaxVideoFrameSize          614400
dwMaxPayloadTransferSize     119296

I also calculated the size of a video frame by adding up the lengths of each URB sent. So, after this, it sends the UVC header, which looks correct:

0c8d6018 3bbcfa9b    .`.;¼ú›
3cbc3304 727a7082    <¼3.rzp‚
717c717d 6f7e7089    q|q}o~p‰
6e816f89 6e826e88    no‰n‚nˆ

In total sends 37 16384 byte URBs and one 8264 byte URB or 614472 bytes total (greater than the 614400 specified in the URB by 72 bytes). I tried removing the second condition from the if statement above and making it consider the end of frame based on finding a URB with size less than dwMaxVideoFrameSize and that seemed to change the problem at least.

Now, it was overflowing the V4L plane size which was set to 614400. It seems this is specified somewhere in uvc_queue_setup:

struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
struct uvc_streaming *stream;
unsigned int size;
switch (vq->type) {
case V4L2_BUF_TYPE_META_CAPTURE:
	size = UVC_METADATA_BUF_SIZE;
	break;

default:
	stream = uvc_queue_to_stream(queue);
	size = stream->ctrl.dwMaxVideoFrameSize; // + 72
	break;
}

I'm not really sure where to go from here. It's not always 72 bytes over either. I've found that if I switch the resolution/format, it can be a different number.
Comment 1 Julian Meyer 2020-04-01 15:31:07 UTC
Narrowing this issue further, it seems that if the max video frame size is less than the max video transfer size, it works, but if it's over the max video transfer size, it does not work:

WORKING (low res)
-------

bmHint                         0x01
bFormatIndex                      1
bFrameIndex                       3
dwFrameInterval              333333
wKeyFrameRate                     0
wPFrameRate                       0
wCompQuality                      0
wCompWindowSize                   0
wDelay                           32
dwMaxVideoFrameSize           50688
dwMaxPayloadTransferSize     129024


NOT WORKING (high res)
-----------
bmHint                         0x01
bFormatIndex                      1
bFrameIndex                       1
dwFrameInterval              333333
wKeyFrameRate                     0
wPFrameRate                       0
wCompQuality                      0
wCompWindowSize                   0
wDelay                           32
dwMaxVideoFrameSize          614400
dwMaxPayloadTransferSize     119296

In the non-working case, first, I see this in the log when it gets set up:

[41790.164920] uvcvideo: Allocated 5 URB buffers of 32x512 bytes each.

I'm not sure why it's 5, but we know that each URB buffer is 16384 bytes large and then I see this message in the log a bunch:

[41790.355807] uvcvideo: payload size is: 131072, max is: 119296

My guess here is that it's sending dwMaxVideoFrameSize and assuming that is the dwMaxPayloadTransferSize. So, it tries to send 614400 in a single payload (not sure about the terminology here), but the buffer is only 119296 bytes large.

This means that it would not work if the max video frame size is greater than the max payload transfer size (buffer "overflows" and UVC assumes it's the end of the frame), but it would work if the maxVideoFrameSize is smaller than dwMaxPayloadTransferSize.

Also, it makes sense because if I run it at a high resolution, the bottom 1/3 of the picture is cut off/corrupted whereas the top part is just fine. In this case, dwMaxVideoFrameSize is probably about 30% larger than dwMaxPayloadTransferSize.
Comment 2 Julian Meyer 2020-04-01 22:34:57 UTC
I did more research and I found that the problem is that the camera is sending the next UVC payload immediately after finishing the previous one instead of waiting for the next URB.

I ran my camera at 640x480@30fps and found the following information:

First, the camera reports the max payload size and max frame size as follows:

Max Payload Size: 119296
Max Frame Size: 614400

Actual Received Payload Size (sum of all payloads): 614472

This number makes sense because it includes (frame size/payload size =) 6 payload headers of 12 bytes each, so 614472 bytes total.

The last URB is truncated because it fits in 37 full 16K URBs + one 8264 byte URB.

uvc_video_decode_bulk calls the following functions:
1. uvc_video_decode_start - find header length and decode it
2. uvc_video_decode_meta - decode the rest of the header metadata
3. uvc_video_decode_data - decodes data up to the max payload size and if it overflows, assume the frame is ended
4. uvc_video_decode_end - if this payload was end-of-frame, start the next frame

uvc_video_decode_bulk is called from uvc_video_complete which is the completion handler for each URB. This makes it so that each URB must contain one or less payloads because we only process at most one payload per call to the completion handler.

In the case of my camera, the driver hits the max payload size and assumes it's finished (the next payload is in the next URB). Then, it tries to parse the header for the next URB assuming it's a new payload even though it's actually in the middle of a payload.

This causes FID mismatches and all sorts of other issues.

Proposed Fix: When we see a URB that's larger than the maximum payload size, assume it's another payload and start processing the new payload.
Comment 3 Julian Meyer 2020-04-02 02:09:36 UTC
I submitted a patch here: https://lore.kernel.org/patchwork/patch/1219476/
Comment 4 Lucio MF 2020-10-27 14:49:38 UTC
Hi Julian,
I've been testing this for a few days now, mostly with google meets at 720p. Working like a charm. I'll post if something goes wrong.
Comment 5 Lucio MF 2020-11-20 14:50:32 UTC
(In reply to Lucio MF from comment #4)
> Hi Julian,
> I've been testing this for a few days now, mostly with google meets at 720p.
> Working like a charm. I'll post if something goes wrong.

So I've been testing a little more this patch and although it seems to been working properly (it fixes the issues with broken video image on higher resolution) there are some situations in witch the system freezes completely. I couldn't find a pattern just using it and don't really know how to debug this, I don't have to much experience in kernel development or debugging.

Could it be posible that in some situation len = urb->actual_length; will keep getting 0 or other value minor to 0, that will break the while condition?
Comment 6 Julian Meyer 2020-11-20 16:45:47 UTC
Hmm... are you using this patch?

https://lore.kernel.org/patchwork/patch/1227775/

I think I fixed that bug in v2/3.
Comment 7 Lucio MF 2020-11-25 23:00:50 UTC
(In reply to Julian Meyer from comment #6)
> Hmm... are you using this patch?
> 
> https://lore.kernel.org/patchwork/patch/1227775/
> 
> I think I fixed that bug in v2/3.

Thanks! I was definitely running an older version of your patch. Now freezes are gone :-D
Comment 8 Gabriel Gravel 2021-01-24 22:30:02 UTC
I've been using this patch on Ubuntu 20.04 with kernel 5.4 and now 5.8 without issue. Many thanks to you! Any chance this could be merged eventually?
Comment 9 Lucio MF 2021-03-02 14:58:42 UTC
Hi Julian,

I keep using this patch almost everyday and it mostly works fine. Although I find out that there are still some "stress" situations where it freezes the whole system. One can almost "force" the freeze in three situations:
- Using camera filters in Google Meets for longs periods.
- Closing a call and starting one right away
- Opening two programs that requires the camera

Let me know if you're interested in me running some specific test to help solve this situation.