Bug 60699

Summary: Stalled state of endpoint will not be cleared
Product: Drivers
Reporter: Florian Wolter (wolly84)
Component: USB
Assignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX
Severity: normal
CC: sarah, stern, zonque
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.2.0-4 (Debian wheezy kernel); 3.9.11 (from kernel.org); 3.11-rc5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Logfiles and Patch; Rebased Patch to Linux-3.11-rc5

Description Florian Wolter 2013-08-06 09:37:48 UTC
Created attachment 107117: Logfiles and Patch

After a device stalls an endpoint, the resulting halt state cannot be cleared from a user process.

Latest working kernel version: 3.2.0-4 (Debian wheezy kernel) (problem must exist since 3.0)
Earliest working kernel version: 3.9.11 (mainline)
Distribution: Fedora 18 x86_64


Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux

"lspci"
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:16.3 Serial controller: Intel Corporation 6 Series/C200 Series Chipset Family KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a4)
00:1f.0 ISA bridge: Intel Corporation Q67 Express Chipset Family LPC Controller (rev 04)
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA Controller [RAID mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cedar PRO [Radeon HD 5450/6350]
04:00.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
05:01.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
05:05.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
05:07.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
05:09.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
06:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02)
07:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02)
08:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02)
09:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02)
0a:02.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)



The communication sequence is as follows (communication runs over a bulk endpoint):

At first, communication with the device works perfectly. After a while the device stalls the endpoint, and the xHCI HCD detects this correctly (line 2051 in logfile1.txt).
The user process then sends a CLEAR_HALT to that endpoint, which calls the xhci_endpoint_reset() function of the xhci implementation. This function only prints the message "Endpoint ... not halted, refusing to reset.", so it looks as if the driver considers the halt state already cleared.
When a new transfer to the device is started, xhci-hcd logs the message "WARN halted endpoint, queueing URB anyway", which in my opinion means that the halt state has not been cleared.
From this point on, communication with the device over this endpoint always fails. The only way to resume communication is to physically unplug and replug the device.
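
For reference, the CLEAR_HALT step in the sequence above can be issued from user space roughly as follows. The report does not say which API the user process uses, so this is only a minimal sketch that assumes the process talks to usbfs directly through the USBDEVFS_CLEAR_HALT ioctl; the helper name clear_endpoint_halt and the descriptor handling are illustrative, not taken from the report.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/usbdevice_fs.h>

/*
 * Hypothetical helper (not from the report): ask the kernel to clear the
 * halt condition on one endpoint of an already opened usbfs device node
 * (for example a file under /dev/bus/usb/).
 *
 * endpoint_address is the bEndpointAddress value, e.g. 0x81 for IN endpoint 1.
 * Returns 0 on success or a negative errno value on failure.
 */
static int clear_endpoint_halt(int usbfs_fd, unsigned int endpoint_address)
{
    if (ioctl(usbfs_fd, USBDEVFS_CLEAR_HALT, &endpoint_address) < 0)
        return -errno;
    return 0;
}
```

With the behavior described in this report, such a call would be expected to reach xhci_endpoint_reset(), which then refuses to reset the endpoint.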

I analysed the xhci code and found that when a stall is detected, ep_state is set to "EP_HALTED" and, in the function finish_td(), the variable ep->stopped_td is set to the TD that caused the stall.
Then the function handle_stopped_endpoint() is called, which cleans up the TDs and checks whether a TD is equal to the TD that caused the stall. However, this function always overwrites ep->stopped_td and ep->stopped_trb with "NULL" (see line 2325 in logfile1.txt), so when xhci_endpoint_reset() is called afterwards, ep->stopped_td is always "NULL" and the reset is refused.
As a result the endpoint stays stalled, and no communication with the device is possible over this endpoint.

I think the function handle_stopped_endpoint() should only overwrite ep->stopped_td and ep->stopped_trb if the endpoint is not in a halted state, so that xhci_endpoint_reset() can still clear the halted TD. This change can be seen in the attached patch (based on 3.2.0-4 [Debian wheezy kernel]).
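
The interaction described above can be illustrated with a small stand-alone model. This is not the kernel code and not the attached patch: every model_ name, the flag value, and the reduced structures are assumptions made for illustration, and ep->stopped_trb is left out because it is handled the same way as ep->stopped_td in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the driver's EP_HALTED flag; the value is arbitrary here. */
#define MODEL_EP_HALTED (1u << 1)

struct model_td { int id; };

struct model_ep {
    unsigned int ep_state;
    struct model_td *stopped_td;
};

/* Stall detected: mark the endpoint halted and remember the stalled TD. */
static void model_on_stall(struct model_ep *ep, struct model_td *td)
{
    assert(td != NULL);
    ep->ep_state |= MODEL_EP_HALTED;
    ep->stopped_td = td;
}

/*
 * Stop-endpoint handling. With guard_halted == false, stopped_td is always
 * cleared (the reported behavior). With guard_halted == true, it is cleared
 * only if the endpoint is not halted (the proposed change).
 */
static void model_on_stop_endpoint(struct model_ep *ep, bool guard_halted)
{
    if (!guard_halted || !(ep->ep_state & MODEL_EP_HALTED))
        ep->stopped_td = NULL;
}

/* Reset request after CLEAR_HALT: refused when no stalled TD is recorded. */
static bool model_endpoint_reset(struct model_ep *ep)
{
    if (ep->stopped_td == NULL)
        return false; /* corresponds to "not halted, refusing to reset" */
    ep->ep_state &= ~MODEL_EP_HALTED;
    ep->stopped_td = NULL;
    return true;
}

/* Runs stall -> stop endpoint -> reset; returns true if still halted. */
static bool model_still_halted_after_sequence(bool guard_halted)
{
    struct model_td td = { 1 };
    struct model_ep ep = { 0, NULL };

    model_on_stall(&ep, &td);
    model_on_stop_endpoint(&ep, guard_halted);
    model_endpoint_reset(&ep);
    return (ep.ep_state & MODEL_EP_HALTED) != 0;
}
```

In this model, model_still_halted_after_sequence(false) returns true (the endpoint stays halted, as in logfile1.txt), while model_still_halted_after_sequence(true) returns false because the recorded TD survives until the reset request.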


The attached logfile1.txt shows the fault, and logfile2.txt shows the behavior after applying the patch xhci-ring.patch.
Comment 1 Daniel Mack 2013-08-10 12:45:22 UTC
Please check again with the most recent kernel.

If the problem persists, please rebase your patch and see if it still fixes the issue for you. Then attach the new patch to this bug report.
Comment 2 Florian Wolter 2013-08-12 08:30:53 UTC
Created attachment 107184: Rebased Patch to Linux-3.11-rc5
Comment 3 Florian Wolter 2013-08-12 08:35:11 UTC
The problem also exists in the current kernel tree (3.11-rc5). The attached patch fixes this issue.
Comment 4 Sarah Sharp 2013-08-12 19:40:16 UTC
Hi Florian,

You said you rebased this patch to 3.11-rc5.  Where did you get this patch?  Is it your patch or someone else's patch?

If it's your patch, please submit this patch as a standard kernel patch (following Documentation/SubmittingPatches), and send it to me <sarah.a.sharp@linux.intel.com> and Cc the Linux USB mailing list <linux-usb@vger.kernel.org>.

Sarah Sharp
Comment 5 Sarah Sharp 2013-09-24 18:24:38 UTC
Patch has been sent to Greg to be queued for 3.12:

https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?id=526867c3ca0caa2e3e846cb993b0f961c33c2abb