Bug 43333

Summary: option-based modem unhappy on USB 3.0 port: "Transfer event TRB DMA ptr not part of current TD", "rejecting I/O to offline device"
Product: Drivers
Reporter: James Ettle (james)
Component: USB
Assignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX
Severity: normal
CC: alan, florian, sarah
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Output of lspci -vvv
Output of lspci -vvv -n
Possible bug fix.
dmesg output with debugging enabled, patch applied

Description James Ettle 2012-06-02 21:12:27 UTC
I have a Huawei e1550 mobile broadband dongle (which also doubles up as a virtual CD-ROM drive with drivers on it). My notebook has a NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04), running kernel 3.4. Shortly after attaching the modem to a USB 3.0 port, the following shows up in dmesg:


[28980.250828] usb 3-2: new high-speed USB device number 7 using xhci_hcd
[28980.275290] usb 3-2: New USB device found, idVendor=12d1, idProduct=1001
[28980.275293] usb 3-2: New USB device strings: Mfr=2, Product=1, SerialNumber=0
[28980.275295] usb 3-2: Product: HUAWEI Mobile
[28980.275297] usb 3-2: Manufacturer: HUAWEI Technology
[28980.275496] usb 3-2: ep 0x85 - rounding interval to 32768 microframes, ep desc says 0 microframes
[28980.275501] usb 3-2: ep 0x4 - rounding interval to 32768 microframes, ep desc says 0 microframes
[28980.275505] usb 3-2: ep 0x5 - rounding interval to 32768 microframes, ep desc says 0 microframes
[28980.275509] usb 3-2: ep 0x86 - rounding interval to 32768 microframes, ep desc says 0 microframes
[28980.281264] option 3-2:1.0: GSM modem (1-port) converter detected
[28980.281497] usb 3-2: GSM modem (1-port) converter now attached to ttyUSB0
[28980.281653] option 3-2:1.1: GSM modem (1-port) converter detected
[28980.281722] usb 3-2: GSM modem (1-port) converter now attached to ttyUSB1
[28980.281840] option 3-2:1.2: GSM modem (1-port) converter detected
[28980.281942] usb 3-2: GSM modem (1-port) converter now attached to ttyUSB2
[28980.283049] scsi32 : usb-storage 3-2:1.3
[28980.284026] scsi33 : usb-storage 3-2:1.4
[28981.289053] scsi 32:0:0:0: CD-ROM            HUAWEI   Mass Storage     2.31 PQ: 0 ANSI: 2
[28981.291440] scsi 33:0:0:0: Direct-Access     HUAWEI   MMC Storage      2.31 PQ: 0 ANSI: 2
[28981.299955] sr1: scsi-1 drive
[28981.300181] sr 32:0:0:0: Attached scsi CD-ROM sr1
[28981.300498] sr 32:0:0:0: Attached scsi generic sg2 type 5
[28981.301134] sd 33:0:0:0: Attached scsi generic sg3 type 0
[28981.309991] sd 33:0:0:0: [sdb] Attached SCSI removable disk


A short time afterwards the following appears:


[29065.953248] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
[29101.105619] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
[29121.074940] sr 32:0:0:0: Device offlined - not ready after error recovery
[29123.207488] sr 32:0:0:0: rejecting I/O to offline device


The final line repeats indefinitely until the modem is unplugged. The modem otherwise appears to work. This doesn't happen when connected to a USB 2.0 port.
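
For anyone comparing logs across kernels, here is a minimal Python sketch that tallies the two messages quoted above in saved dmesg output. The message strings are copied verbatim from the log lines in this report; the names `count_signatures`, `SIGNATURES`, and `LINE_RE` are hypothetical and chosen only for illustration.

```python
import re

# Matches a dmesg line such as "[29065.953248] xhci_hcd 0000:03:00.0: ERROR ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Substrings taken verbatim from the log excerpts in this report.
SIGNATURES = (
    "ERROR Transfer event TRB DMA ptr not part of current TD",
    "rejecting I/O to offline device",
)


def count_signatures(dmesg_text):
    """Return {signature: [timestamps]} for each known message."""
    hits = {sig: [] for sig in SIGNATURES}
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a "[seconds.micros]" prefix
        timestamp, message = float(match.group(1)), match.group(2)
        for sig in SIGNATURES:
            if sig in message:
                hits[sig].append(timestamp)
    return hits
```

Feeding it the text of a saved `dmesg` capture gives the timestamps of each xHCI error and of each rejected I/O, which makes it easier to see whether the offline messages only start after the TRB errors.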
Comment 1 Sarah Sharp 2012-07-02 23:30:36 UTC
James, can you try compiling my for-usb-linus branch and see if this takes care of your issue?

git clone -b for-usb-linus git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci.g

There was a bug that got fixed that could cause the ERROR message to appear.
Comment 2 James Ettle 2012-07-05 20:08:14 UTC
(In reply to comment #1)

> git clone -b for-usb-linus
> git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci.g
> 
> There was a bug that got fixed that could cause the ERROR message to appear.

Hmm... every time I try that, I get "fatal: The remote end hung up unexpectedly". Are there some plain patches for 3.4.4 I could try?
Comment 3 Sarah Sharp 2012-07-24 21:26:52 UTC
James, can you retest with 3.4.5?  There was a patch that fixed the "ERROR Transfer event TRB DMA ptr not part of current TD" message for some other devices.  I suspect commit b62d32b9166b085a487916eca514b59b5ffdf2b7 may help your device as well.
Comment 4 James Ettle 2012-07-25 17:48:32 UTC
(In reply to comment #3)
> James, can you retest with 3.4.5?  There was a patch that fixed the "ERROR
> Transfer event TRB DMA ptr not part of current TD" message for some other
> devices.  I suspect commit b62d32b9166b085a487916eca514b59b5ffdf2b7 may help
> your device as well.

I'm on 3.4.6 and I still get:


[107173.234727] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
[107210.252645] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
[107230.221164] sr 74:0:0:0: Device offlined - not ready after error recovery
[107232.607962] sr 74:0:0:0: rejecting I/O to offline device
[107234.651880] sr 74:0:0:0: rejecting I/O to offline device
...
Comment 5 Sarah Sharp 2012-07-25 19:53:16 UTC
Ok, James, thanks for trying that.  Can you recompile your kernel with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on, and post the resulting dmesg with the failure?  I'll need to see some of the log before the failure as well.
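
Before rebuilding, it can help to confirm whether the two options are already set in the kernel configuration. The following is a minimal Python sketch, assuming the standard `.config` text format (`CONFIG_FOO=y`, or `# CONFIG_FOO is not set` for disabled options). The function name `option_states` is hypothetical, and where the config file lives (the build tree's `.config`, or a distro-provided copy) is an assumption that depends on the system.

```python
def option_states(config_text, options):
    """Map each option name to its value in a kernel .config, or None if unset."""
    states = {name: None for name in options}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines leave the option as None
        name, sep, value = line.partition("=")
        if sep and name in states:
            states[name] = value
    return states


# Option names as requested in this comment.
WANTED = ["CONFIG_USB_DEBUG", "CONFIG_USB_XHCI_HCD_DEBUGGING"]
```

Calling `option_states(open(".config").read(), WANTED)` in the kernel build directory returns a dictionary; any entry that is `None` still needs to be enabled before the rebuild.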
Comment 6 Sarah Sharp 2012-07-25 19:54:11 UTC
Also, will you please post the output of `sudo lspci -vvv` and `sudo lspci -vvv -n`?  I need to see which host controller you're having issues with.
Comment 7 James Ettle 2012-07-25 20:39:09 UTC
Created attachment 76091
Output of lspci -vvv
Comment 8 James Ettle 2012-07-25 20:40:00 UTC
Created attachment 76101
Output of lspci -vvv -n

lspci outputs attached. Debug kernel will take a little longer, will post output in due course.
Comment 9 Sarah Sharp 2012-07-26 20:00:46 UTC
I think I have fixed the bug.  Please apply the attached patch and let me know if it fixes the ERROR messages.
Comment 10 Sarah Sharp 2012-07-26 20:01:32 UTC
Created attachment 76151
Possible bug fix.
Comment 11 James Ettle 2012-07-26 20:28:19 UTC
(In reply to comment #9)
> I think I have fixed the bug.  Please apply the attached patch and let me
> know if it fixes the ERROR messages.

Do you want the CONFIG_[..]_DEBUG options of Comment #5 enabled and the corresponding trace when built against this patch?
Comment 12 Sarah Sharp 2012-07-26 23:37:16 UTC
(In reply to comment #11)
> (In reply to comment #9)
> > I think I have fixed the bug.  Please apply the attached patch and let me
> > know if it fixes the ERROR messages.
> 
> Do you want the CONFIG_[..]_DEBUG options of Comment #5 enabled and the
> corresponding trace when built against this patch?

Yes.
Comment 13 James Ettle 2012-07-27 06:57:30 UTC
OK, I'll build a trace-enabled version soon. That said, things are looking good so far with the patch of Comment #10 applied (to a non-debug kernel).
Comment 14 James Ettle 2012-07-28 08:12:17 UTC
Created attachment 76391
dmesg output with debugging enabled, patch applied
Comment 15 Florian Mickler 2012-08-26 10:48:23 UTC
A patch referencing this bug report has been merged in Linux v3.6-rc3:

commit 50d0206fcaea3e736f912fd5b00ec6233fb4ce44
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Thu Jul 26 12:03:59 2012 -0700

    xhci: Fix bug after deq ptr set to link TRB.
Comment 16 James Ettle 2012-09-06 20:55:36 UTC
For some reason, I'm now seeing this with the modem attached to a USB 2.0 port. I know it's appeared on at least kernel 3.5.2 with the patch from Comment #15, and on 3.5.3 (from my distro).
Comment 17 Sarah Sharp 2012-09-07 17:09:45 UTC
James: can you please open a separate bugzilla request?  It could be a different issue.