Bug 56811 - xhci_hcd spams log with "Not enough room on ring" messages, bringing system to its knees
Summary: xhci_hcd spams log with "Not enough room on ring" messages, bringing system t...
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-18 15:12 UTC by Mark
Modified: 2014-04-09 15:42 UTC (History)
1 user (show)

See Also:
Kernel Version: 3.0.61
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Mark 2013-04-18 15:12:00 UTC
$ uname -a
Linux lubuntu-usb 3.0.0-31-generic #48-Ubuntu SMP Mon Feb 4 13:22:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I noticed some problems when writing to a USB 3.0 hard drive. Specifically a Seagate GoFlex 3TB drive with USB 3.0 adapter. The adapter reports that the drive sector size is 4KB, so as far as the computer is concerned it really does have 4KB sectors. (Not sure whether that's relevant but thought I would mention it.)

The problem occurs when I copy files to the drive, after increasing the maximum transfer size to 252KB (63 4KB sectors) or more by doing e.g.
  echo 504 >/sys/class/scsi_disk/7:0:0:0/device/max_sectors
(It seems that max_sectors is considered a multiple of 512 bytes, even if the actual sector size is larger.)

On copying files to the drive using cp, errors like this appear over and over in the log:
  xhci_hcd 0000:05:00.0: ERROR no room on ep ring
  xhci_hcd 0000:05:00.0: Not enough room on ring; need 63 TRBs, 62 TRBs left

Those lines repeat hundreds of thousands of times, bringing the system to its knees. Log files become huge and looking at CPU usage in top, usb-storage and rsyslogd use 99% each.

Eventually there is output like this in the log:
  usb 9-2: reset SuperSpeed USB device number 2 using xhci_hcd
  xhci_hcd 0000:05:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88013524df80
  xhci_hcd 0000:05:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88013524dfc0
  sd 7:0:0:0: [sdd] Unhandled error code
  sd 7:0:0:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
  sd 7:0:0:0: [sdd] CDB: Write(10): 2a 00 1d 3e fe 15 00 00 3f 00
  end_request: I/O error, dev sdd, sector 3925340328

I don't know whether there is an underlying "actual" bug (should the block layer know not to issue too-large I/O requests if there is a limitation in the xhci driver?), but the xhci_hcd log messages should definitely be rate-limited or suppressed after a small number.


I'm using a USB 3.0 ExpressCard which has a Renesas chip:

# lspci -nn -vvv -s 05:00.0
05:00.0 USB Controller [0c03]: Renesas Technology Corp. Device [1912:0015] (rev 02) (prog-if 30 [XHCI])
	Physical Slot: 1
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00001000
		PBA: BAR=0 offset=00001080
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend+
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
			ClockPM+ Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [150 v1] #18
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci-hcd
Comment 1 Alan 2014-04-09 15:42:50 UTC
Please try 3.14 - it has a lot of xhci fixes

Note You need to log in before you can comment on or make changes to this bug.