Most recent kernel where this bug did not occur: 2.6.16 Distribution: Debian Hardware Environment: Speedtocuh USB ADSL modem Software Environment: ATM routed IP Problem Description: From 2.6.17 the speedtouch driver blocks the conection. Steps to reproduce: Download at the maximun transfer rate with a big file apt-get -y install tex --> 106K At some point in the download process, it stop. wave:/usr/src/bughunting# atmdiag Itf TX_okay TX_err RX_okay RX_err RX_drop 0 AAL0 0 0 0 0 0 AAL5 2429 0 6936 0 0 wave:/usr/src/bughunting# atmdiag Itf TX_okay TX_err RX_okay RX_err RX_drop 0 AAL0 0 0 0 0 0 AAL5 2433 0 6936 0 0 Only send ATM cells Bisection: wave:/usr/src/bughunting# git bisect bad dccf4a48d47120a42382ba526f1a0848c13ba2a4 is first bad commit commit dccf4a48d47120a42382ba526f1a0848c13ba2a4 Author: Alan Stern <stern@rowland.harvard.edu> Date: Sat Dec 17 17:58:46 2005 -0500 [PATCH] UHCI: use one QH per endpoint, not per URB This patch (as623) changes the uhci-hcd driver to make it use one QH per device endpoint, instead of a QH per URB as it does now. Numerous areas of the code are affected by this. For example, the distinction between "queued" URBs and non-"queued" URBs no longer exists; all URBs belong to a queue and some just happen to be at the queue's head. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
wave:/usr/src/linux-2.6.17# cat .config|grep USB_DEBUG CONFIG_USB_DEBUG=y I can't find any debug warnings/errors from USB yet. Dmesg 2.6.17 and procedure to reproduce: klogd 1.4.1#18, log source = /proc/kmsg started. ing PCI hardware (bus 00) ACPI: Assume root bridge [\_SB_.PCI0] bus is 0 Boot video device is 0000:00:02.0 PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1 PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI2._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI3._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI4._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 15) ACPI: PCI Interrupt Link [LNKC] (IRQs *3 4 5 6 7 9 10 11 12 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 12 15) *0, disabled. ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 *5 6 7 9 10 11 12 15) ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 *9 10 11 12 15) ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 9 10 11 12 15) ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 *10 11 12 15) Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 11 devices PnPBIOS: Disabled by ACPI PNP SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca) pnp: 00:01: ioport range 0x800-0x85f could not be reserved pnp: 00:01: ioport range 0xc00-0xc7f has been reserved pnp: 00:01: ioport range 0x860-0x8ff has been reserved pnp: 00:09: ioport range 0x100-0x1fe has been reserved pnp: 00:09: ioport range 0x200-0x277 has been reserved pnp: 00:09: ioport range 0x280-0x2e7 has been reserved pnp: 00:09: ioport range 0x2f0-0x2f7 has been reserved pnp: 00:09: ioport range 0x300-0x377 has been reserved pnp: 00:09: ioport range 0x380-0x3bb has been reserved pnp: 00:09: ioport range 0x3c0-0x3e7 could not be reserved pnp: 00:09: ioport range 0x3f6-0x3f7 has been reserved PCI: Ignore bogus resource 6 [0:0] of 0000:00:02.0 PCI: Bridge: 0000:00:01.0 IO window: disabled. MEM window: dfd00000-dfdfffff PREFETCH window: disabled. PCI: Bridge: 0000:00:1c.0 IO window: disabled. MEM window: dfc00000-dfcfffff PREFETCH window: disabled. PCI: Bridge: 0000:00:1c.1 IO window: disabled. MEM window: dfb00000-dfbfffff PREFETCH window: disabled. PCI: Bridge: 0000:00:1e.0 IO window: d000-dfff MEM window: disabled. PREFETCH window: disabled. ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:00:01.0 to 64 ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 16 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:00:1c.0 to 64 ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1c.1 to 64 PCI: Setting latency timer of device 0000:00:1e.0 to 64 NET: Registered protocol family 2 IP route cache hash table entries: 4096 (order: 2, 16384 bytes) TCP established hash table entries: 16384 (order: 5, 196608 bytes) TCP bind hash table entries: 8192 (order: 4, 98304 bytes) TCP: Hash tables configured (established 16384 bind 8192) TCP reno registered Simple Boot Flag value 0x87 read from CMOS RAM was invalid Simple Boot Flag at 0x7a set to 0x1 Initializing Cryptographic API io scheduler noop registered io scheduler anticipatory registered (default) io scheduler deadline registered io scheduler cfq registered 0000:00:1d.0: uhci_check_and_reset_hc: legsup = 0x001b 0000:00:1d.0: Performing full reset 0000:00:1d.1: uhci_check_and_reset_hc: legsup = 0x0010 0000:00:1d.1: Performing full reset 0000:00:1d.2: uhci_check_and_reset_hc: legsup = 0x0010 0000:00:1d.2: Performing full reset 0000:00:1d.3: uhci_check_and_reset_hc: legsup = 0x0010 0000:00:1d.3: Performing full reset pci_hotplug: PCI Hot Plug PCI Core version: 0.5 ACPI: Power Button (FF) [PWRF] ACPI: Power Button (CM) [VBTN] Using specific hotkey driver isapnp: Scanning for PnP cards... isapnp: No Plug & Play device found lp: driver loaded but no devices found Real Time Clock Driver v1.12ac ppdev: user-space parallel port driver Linux agpgart interface v0.101 (c) Dave Jones agpgart: Detected an Intel 915G Chipset. agpgart: Detected 7932K stolen memory. agpgart: AGP aperture is 256M @ 0xc0000000 [drm] Initialized drm 1.0.1 20051102 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A parport: PnPBIOS parport detected. parport0: PC-style at 0x378 (0x778), irq 7, using FIFO [PCSPP,TRISTATE,COMPAT,ECP] lp0: using parport0 (interrupt-driven). lp0: console ready Floppy drive(s): fd0 is 1.44M floppy0: no floppy controllers found RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize loop: loaded (max 8 devices) nbd: registered device at major 43 tg3.c:v3.59 (June 8, 2006) ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:02:00.0 to 64 eth0: Tigon3 [partno(BCM95751) rev 4001 PHY(5750)] (PCI Express) 10/100/1000BaseT Ethernet 00:0f:1f:e2:38:46 eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[1] eth0: dma_rwctrl[76180000] dma_mask[64-bit] Linux video capture interface: v1.00 bttv: driver version 0.9.16 loaded bttv: using 8 buffers with 2080k (520 pages) each for capture Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ICH6: IDE controller at PCI slot 0000:00:1f.1 ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 16 (level, low) -> IRQ 16 ICH6: chipset revision 3 ICH6: not 100%% native mode: will probe irqs later ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:DMA Probing IDE interface ide0... hda: ST3160021A, ATA DISK drive hdb: HL-DT-ST CD-RW GCE-8524B, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Probing IDE interface ide1... hda: max request size: 512KiB hda: 312581808 sectors (160041 MB) w/2048KiB Cache, CHS=19457/255/63, UDMA(100) hda: cache flushes supported hda: hda1 hdb: ATAPI 52X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33) Uniform CD-ROM driver Revision: 3.20 ide-floppy driver 0.99.newide libata version 1.20 loaded. ata_piix 0000:00:1f.2: version 1.05 ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt 0000:00:1f.2[C] -> GSI 20 (level, low) -> IRQ 18 PCI: Setting latency timer of device 0000:00:1f.2 to 64 ata1: SATA max UDMA/133 cmd 0xFE00 ctl 0xFE12 bmdma 0xFEA0 irq 18 ata2: SATA max UDMA/133 cmd 0xFE20 ctl 0xFE32 bmdma 0xFEA8 irq 18 ata1: dev 0 cfg 49:2f00 82:3469 83:5b01 84:4003 85:3469 86:1801 87:4003 88:207f ata1: dev 0 ATA-6, max UDMA/133, 78125000 sectors: LBA ata1: dev 0 configured for UDMA/133 scsi0 : ata_piix ata2: SATA port has no device. scsi1 : ata_piix Vendor: ATA Model: WDC WD400JD-75HK Rev: 14.0 Type: Direct-Access ANSI SCSI revision: 05 SCSI device sda: 78125000 512-byte hdwr sectors (40000 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back SCSI device sda: 78125000 512-byte hdwr sectors (40000 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back sda: sda1 sda2 sd 0:0:0:0: Attached scsi disk sda usbmon: debugfs is not available ehci_hcd: block sizes: qh 128 qtd 96 itd 192 sitd 96 ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 21 (level, low) -> IRQ 19 PCI: Setting latency timer of device 0000:00:1d.7 to 64 ehci_hcd 0000:00:1d.7: EHCI Host Controller drivers/usb/core/inode.c: creating file 'devices' drivers/usb/core/inode.c: creating file '001' ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:1d.7: reset hcs_params 0x104208 dbg=1 cc=4 pcc=2 ordered !ppc ports=8 ehci_hcd 0000:00:1d.7: reset hcc_params 6871 thresh 7 uframes 1024 64 bit addr ehci_hcd 0000:00:1d.7: debug port 1 PCI: cache line size of 128 is not supported by device 0000:00:1d.7 ehci_hcd 0000:00:1d.7: supports USB remote wakeup ehci_hcd 0000:00:1d.7: irq 19, io mem 0xffa80800 ehci_hcd 0000:00:1d.7: reset command 000002 (park)=0 ithresh=0 period=1024 Reset HALT ehci_hcd 0000:00:1d.7: init command 010001 (park)=0 ithresh=1 period=1024 RUN ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: default language 0x0409 usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb1: Product: EHCI Host Controller usb usb1: Manufacturer: Linux 2.6.17 ehci_hcd usb usb1: SerialNumber: 0000:00:1d.7 usb usb1: uevent usb usb1: configuration #1 chosen from 1 choice usb usb1: adding 1-0:1.0 (config #1, interface 0) usb 1-0:1.0: uevent hub 1-0:1.0: usb_probe_interface hub 1-0:1.0: usb_probe_interface - got id hub 1-0:1.0: USB hub found hub 1-0:1.0: 8 ports detected hub 1-0:1.0: standalone hub hub 1-0:1.0: no power switching (usb 1.0) hub 1-0:1.0: individual port over-current protection hub 1-0:1.0: Single TT hub 1-0:1.0: TT requires at most 8 FS bit times (666 ns) hub 1-0:1.0: power on to power good time: 20ms hub 1-0:1.0: local power source is good hub 1-0:1.0: state 7 ports 8 chg 0000 evt 0000 drivers/usb/core/inode.c: creating file '001' ehci_hcd 0000:00:1d.7: GetStatus port 1 status 001803 POWER sig=j CSC CONNECT hub 1-0:1.0: port 1, status 0501, change 0001, 480 Mb/s ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) ohci_hcd: block sizes: ed 64 td 64 USB Universal Host Controller Interface driver v3.0 ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 21 (level, low) -> IRQ 19 PCI: Setting latency timer of device 0000:00:1d.0 to 64 uhci_hcd 0000:00:1d.0: UHCI Host Controller drivers/usb/core/inode.c: creating file '002' uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:1d.0: detected 2 ports uhci_hcd 0000:00:1d.0: uhci_check_and_reset_hc: cmd = 0x0000 uhci_hcd 0000:00:1d.0: Performing full reset uhci_hcd 0000:00:1d.0: irq 19, io base 0x0000ff80 usb usb2: default language 0x0409 usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb2: Product: UHCI Host Controller usb usb2: Manufacturer: Linux 2.6.17 uhci_hcd usb usb2: SerialNumber: 0000:00:1d.0 usb usb2: uevent usb usb2: configuration #1 chosen from 1 choice usb usb2: adding 2-0:1.0 (config #1, interface 0) usb 2-0:1.0: uevent hub 2-0:1.0: usb_probe_interface hub 2-0:1.0: usb_probe_interface - got id hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected hub 2-0:1.0: standalone hub hub 2-0:1.0: no power switching (usb 1.0) hub 2-0:1.0: individual port over-current protection hub 2-0:1.0: power on to power good time: 2ms hub 2-0:1.0: local power source is good drivers/usb/core/inode.c: creating file '001' ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 22 (level, low) -> IRQ 20 PCI: Setting latency timer of device 0000:00:1d.1 to 64 uhci_hcd 0000:00:1d.1: UHCI Host Controller drivers/usb/core/inode.c: creating file '003' uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3 uhci_hcd 0000:00:1d.1: detected 2 ports uhci_hcd 0000:00:1d.1: uhci_check_and_reset_hc: cmd = 0x0000 uhci_hcd 0000:00:1d.1: Performing full reset uhci_hcd 0000:00:1d.1: irq 20, io base 0x0000ff60 usb usb3: default language 0x0409 usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb3: Product: UHCI Host Controller usb usb3: Manufacturer: Linux 2.6.17 uhci_hcd usb usb3: SerialNumber: 0000:00:1d.1 usb usb3: uevent usb usb3: configuration #1 chosen from 1 choice usb usb3: adding 3-0:1.0 (config #1, interface 0) usb 3-0:1.0: uevent hub 3-0:1.0: usb_probe_interface hub 3-0:1.0: usb_probe_interface - got id hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected hub 3-0:1.0: standalone hub hub 3-0:1.0: no power switching (usb 1.0) hub 3-0:1.0: individual port over-current protection hub 3-0:1.0: power on to power good time: 2ms hub 3-0:1.0: local power source is good hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x501 ehci_hcd 0000:00:1d.7: port 1 full speed --> companion ehci_hcd 0000:00:1d.7: GetStatus port 1 status 003801 POWER OWNER sig=j CONNECT ehci_hcd 0000:00:1d.7: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT hub 1-0:1.0: port 2, status 0501, change 0001, 480 Mb/s drivers/usb/core/inode.c: creating file '001' ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 21 PCI: Setting latency timer of device 0000:00:1d.2 to 64 uhci_hcd 0000:00:1d.2: UHCI Host Controller drivers/usb/core/inode.c: creating file '004' uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4 uhci_hcd 0000:00:1d.2: detected 2 ports uhci_hcd 0000:00:1d.2: uhci_check_and_reset_hc: cmd = 0x0000 uhci_hcd 0000:00:1d.2: Performing full reset uhci_hcd 0000:00:1d.2: irq 21, io base 0x0000ff40 usb usb4: default language 0x0409 usb usb4: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb4: Product: UHCI Host Controller usb usb4: Manufacturer: Linux 2.6.17 uhci_hcd usb usb4: SerialNumber: 0000:00:1d.2 usb usb4: uevent usb usb4: configuration #1 chosen from 1 choice usb usb4: adding 4-0:1.0 (config #1, interface 0) usb 4-0:1.0: uevent hub 4-0:1.0: usb_probe_interface hub 4-0:1.0: usb_probe_interface - got id hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected hub 4-0:1.0: standalone hub hub 4-0:1.0: no power switching (usb 1.0) hub 4-0:1.0: individual port over-current protection hub 4-0:1.0: power on to power good time: 2ms hub 4-0:1.0: local power source is good hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501 drivers/usb/core/inode.c: creating file '001' ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 23 (level, low) -> IRQ 22 PCI: Setting latency timer of device 0000:00:1d.3 to 64 uhci_hcd 0000:00:1d.3: UHCI Host Controller drivers/usb/core/inode.c: creating file '005' uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5 uhci_hcd 0000:00:1d.3: detected 2 ports uhci_hcd 0000:00:1d.3: uhci_check_and_reset_hc: cmd = 0x0000 uhci_hcd 0000:00:1d.3: Performing full reset uhci_hcd 0000:00:1d.3: irq 22, io base 0x0000ff20 usb usb5: default language 0x0409 usb usb5: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb5: Product: UHCI Host Controller usb usb5: Manufacturer: Linux 2.6.17 uhci_hcd usb usb5: SerialNumber: 0000:00:1d.3 usb usb5: uevent usb usb5: configuration #1 chosen from 1 choice usb usb5: adding 5-0:1.0 (config #1, interface 0) usb 5-0:1.0: uevent hub 5-0:1.0: usb_probe_interface hub 5-0:1.0: usb_probe_interface - got id hub 5-0:1.0: USB hub found hub 5-0:1.0: 2 ports detected hub 5-0:1.0: standalone hub hub 5-0:1.0: no power switching (usb 1.0) hub 5-0:1.0: individual port over-current protection hub 5-0:1.0: power on to power good time: 2ms hub 5-0:1.0: local power source is good ehci_hcd 0000:00:1d.7: port 2 full speed --> companion ehci_hcd 0000:00:1d.7: GetStatus port 2 status 003801 POWER OWNER sig=j CONNECT ehci_hcd 0000:00:1d.7: GetStatus port 5 status 001403 POWER sig=k CSC CONNECT hub 1-0:1.0: port 5, status 0501, change 0001, 480 Mb/s drivers/usb/core/inode.c: creating file '001' hub 1-0:1.0: debounce: port 5: total 100ms stable 100ms status 0x501 ehci_hcd 0000:00:1d.7: port 5 low speed --> companion ehci_hcd 0000:00:1d.7: GetStatus port 5 status 003002 POWER OWNER sig=se0 CSC ehci_hcd 0000:00:1d.7: GetStatus port 7 status 001403 POWER sig=k CSC CONNECT hub 1-0:1.0: port 7, status 0501, change 0001, 480 Mb/s hub 1-0:1.0: debounce: port 7: total 100ms stable 100ms status 0x501 ehci_hcd 0000:00:1d.7: port 7 low speed --> companion ehci_hcd 0000:00:1d.7: GetStatus port 7 status 003002 POWER OWNER sig=se0 CSC hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0006 uhci_hcd 0000:00:1d.0: port 1 portsc 009b,00 hub 2-0:1.0: port 1, status 0101, change 0003, 12 Mb/s hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x101 usb 2-1: new full speed USB device using uhci_hcd and address 2 usb 2-1: ep0 maxpacket = 8 usb 2-1: default language 0x0409 usb 2-1: new device strings: Mfr=1, Product=2, SerialNumber=3 usb 2-1: Product: Speed Touch 330 usb 2-1: Manufacturer: THOMSON usb 2-1: SerialNumber: 0090D0CFF11B usb 2-1: uevent usb 2-1: configuration #1 chosen from 1 choice usb 2-1: adding 2-1:1.0 (config #1, interface 0) usb 2-1:1.0: uevent usb 2-1: adding 2-1:1.1 (config #1, interface 1) usb 2-1:1.1: uevent usb 2-1: adding 2-1:1.2 (config #1, interface 2) usb 2-1:1.2: uevent drivers/usb/core/inode.c: creating file '002' uhci_hcd 0000:00:1d.0: port 2 portsc 009b,00 hub 2-0:1.0: port 2, status 0101, change 0003, 12 Mb/s uhci_hcd 0000:00:1d.1: suspend_rh (auto-stop) hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x101 usb 2-2: new full speed USB device using uhci_hcd and address 3 usb 2-2: ep0 maxpacket = 8 usb 2-2: default language 0x0409 usb 2-2: new device strings: Mfr=1, Product=2, SerialNumber=3 usb 2-2: Product: PSC 1400 series usb 2-2: Manufacturer: HP usb 2-2: SerialNumber: CN5862B0B504BM usb 2-2: uevent usb 2-2: configuration #1 chosen from 1 choice usb 2-2: adding 2-2:1.0 (config #1, interface 0) usb 2-2:1.0: uevent usb 2-2: adding 2-2:1.1 (config #1, interface 1) usb 2-2:1.1: uevent usb 2-2: adding 2-2:1.2 (config #1, interface 2) usb 2-2:1.2: uevent drivers/usb/core/inode.c: creating file '003' hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000 hub 1-0:1.0: state 7 ports 8 chg 0000 evt fe00 hub 4-0:1.0: state 7 ports 2 chg 0000 evt 0002 uhci_hcd 0000:00:1d.2: port 1 portsc 01ab,00 hub 4-0:1.0: port 1, status 0301, change 0003, 1.5 Mb/s hub 4-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x301 usb 4-1: new low speed USB device using uhci_hcd and address 2 usb 4-1: skipped 1 descriptor after interface usb 4-1: new device strings: Mfr=0, Product=0, SerialNumber=0 usb 4-1: uevent usb 4-1: configuration #1 chosen from 1 choice usb 4-1: adding 4-1:1.0 (config #1, interface 0) usb 4-1:1.0: uevent drivers/usb/core/inode.c: creating file '002' hub 5-0:1.0: state 7 ports 2 chg 0000 evt 0002 uhci_hcd 0000:00:1d.3: port 1 portsc 01ab,00 hub 5-0:1.0: port 1, status 0301, change 0003, 1.5 Mb/s hub 5-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x301 usb 5-1: new low speed USB device using uhci_hcd and address 2 usb 5-1: skipped 1 descriptor after interface usb 5-1: skipped 1 descriptor after interface usb 5-1: default language 0x0409 usb 5-1: new device strings: Mfr=1, Product=2, SerialNumber=0 usb 5-1: Product: USB Receiver usb 5-1: Manufacturer: Logitech usb 5-1: uevent usb 5-1: configuration #1 chosen from 1 choice usb 5-1: adding 5-1:1.0 (config #1, interface 0) usb 5-1:1.0: uevent usb 5-1: adding 5-1:1.1 (config #1, interface 1) usb 5-1:1.1: uevent drivers/usb/core/inode.c: creating file '002' hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000 usbcore: registered new driver libusual usbcore: registered new driver hiddev usbhid 4-1:1.0: usb_probe_interface usbhid 4-1:1.0: usb_probe_interface - got id input: HID 413c:3010 as /class/input/input0 input: USB HID v1.00 Mouse [HID 413c:3010] on usb-0000:00:1d.2-1 usbhid 5-1:1.0: usb_probe_interface usbhid 5-1:1.0: usb_probe_interface - got id input: Logitech USB Receiver as /class/input/input1 input: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:1d.3-1 usbhid 5-1:1.1: usb_probe_interface usbhid 5-1:1.1: usb_probe_interface - got id input: Logitech USB Receiver as /class/input/input2 input: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:1d.3-1 usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver PNP: No PS/2 controller found. Probing ports directly. serio: i8042 AUX port at 0x60,0x64 irq 12 serio: i8042 KBD port at 0x60,0x64 irq 1 mice: PS/2 mouse device common for all mice i2c /dev entries driver md: linear personality registered for level -1 md: raid0 personality registered for level 0 md: raid1 personality registered for level 1 md: raid10 personality registered for level 10 md: raid5 personality registered for level 5 md: raid4 personality registered for level 4 raid5: automatically using best checksumming function: pIII_sse pIII_sse : 4159.000 MB/sec raid5: using function: pIII_sse (4159.000 MB/sec) raid6: int32x1 687 MB/s raid6: int32x2 701 MB/s raid6: int32x4 550 MB/s raid6: int32x8 506 MB/s raid6: mmxx1 1615 MB/s raid6: mmxx2 1625 MB/s raid6: sse1x1 807 MB/s raid6: sse1x2 1072 MB/s raid6: sse2x1 1871 MB/s raid6: sse2x2 1777 MB/s raid6: using algorithm sse2x1 (1871 MB/s) md: raid6 personality registered for level 6 md: multipath personality registered for level -4 md: faulty personality registered for level -5 md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 4.39 device-mapper: 4.6.0-ioctl (2006-02-17) initialised: dm-devel@redhat.com device-mapper: dm-multipath version 1.0.4 loaded device-mapper: dm-round-robin version 1.0.0 loaded EISA: Probing bus 0 at eisa.0 dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-2) Advanced Linux Sound Architecture Driver Version 1.0.11rc4 (Wed Mar 22 10:27:24 2006 UTC). no UART detected at 0x1 ACPI: PCI Interrupt 0000:04:01.0[A] -> GSI 17 (level, low) -> IRQ 17 gameport: ES137x is pci0000:04:01.0/gameport0, io 0x200, speed 648kHz ACPI: PCI Interrupt 0000:00:1e.2[A] -> GSI 23 (level, low) -> IRQ 22 PCI: Setting latency timer of device 0000:00:1e.2 to 64 input: Analog 3-axis 4-button joystick as /class/input/input3 intel8x0_measure_ac97_clock: measured 58265 usecs intel8x0: clocking to 48000 ALSA device list: #0: Intel ICH6 with AD1981B at 0xdfebfe00, irq 22 #1: Ensoniq AudioPCI ENS1370 at 0xdcc0, irq 17 #2: Virtual MIDI Card 1 TCP bic registered NET: Registered protocol family 1 NET: Registered protocol family 10 lo: Disabled Privacy Extensions IPv6 over IPv4 tunneling driver NET: Registered protocol family 17 NET: Registered protocol family 15 Using IPI Shortcut mode md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. ReiserFS: sda2: found reiserfs format "3.6" with standard journal ReiserFS: sda2: using ordered data mode ReiserFS: sda2: journal params: device sda2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 ReiserFS: sda2: checking transaction log (sda2) ReiserFS: sda2: Using r5 hash to sort names VFS: Mounted root (reiserfs filesystem) readonly. Freeing unused kernel memory: 308k freed usb 5-1:1.1: uevent usb 5-1:1.0: uevent usb 5-1: uevent usb 4-1:1.0: uevent usb 4-1: uevent usb 2-2:1.2: uevent usb 2-2:1.1: uevent usb 2-2:1.0: uevent usb 2-2: uevent usb 2-1:1.2: uevent usb 2-1:1.1: uevent usb 2-1:1.0: uevent usb 2-1: uevent usb 5-0:1.0: uevent usb usb5: uevent usb 4-0:1.0: uevent usb usb4: uevent usb 3-0:1.0: uevent usb usb3: uevent usb 2-0:1.0: uevent usb usb2: uevent usb 1-0:1.0: uevent usb usb1: uevent usblp 2-2:1.1: usb_probe_interface usblp 2-2:1.1: usb_probe_interface - got id drivers/usb/core/file.c: looking for a minor, starting at 0 drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 3 if 1 alt 0 proto 2 vid 0x03F0 pid 0x4D11 usbcore: registered new driver usblp drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver NET: Registered protocol family 8 NET: Registered protocol family 20 drivers/usb/atm/usbatm.c: usbatm_usb_init: driver version 1.10 drivers/usb/atm/speedtch.c: speedtch_usb_init: driver version 1.10 speedtch 2-1:1.0: usb_probe_interface speedtch 2-1:1.0: usb_probe_interface - got id speedtch 2-1:1.0: usbatm_usb_probe: trying driver speedtch with vendor=06b9, product=4061, ifnum 0 speedtch 2-1:1.0: speedtch_bind entered usb 2-1: modprobe timed out on ep0in len=0/1 speedtch 2-1:1.0: speedtch_bind: firmware not loaded usb 2-1: reset full speed USB device using uhci_hcd and address 2 usb 2-1: ep0 maxpacket = 8 speedtch 2-1:1.0: usbatm_usb_probe: using 3392 byte buffer for rx channel 0xde86c2b8 speedtch 2-1:1.0: usbatm_usb_probe: using 3392 byte buffer for tx channel 0xde86c308 usbcore: registered new driver speedtch speedtch 2-1:1.0: speedtch_find_firmware: looking for speedtch-1.bin.4.00 hda: CHECK for good STATUS hdb: CHECK for good STATUS speedtch 2-1:1.0: speedtch_find_firmware: looking for speedtch-1.bin.4 speedtch 2-1:1.0: speedtch_find_firmware: looking for speedtch-1.bin speedtch 2-1:1.0: found stage 1 firmware speedtch-1.bin speedtch 2-1:1.0: speedtch_find_firmware: looking for speedtch-2.bin.4.00 speedtch 2-1:1.0: speedtch_find_firmware: looking for speedtch-2.bin.4 speedtch 2-1:1.0: speedtch_find_firmware: looking for speedtch-2.bin speedtch 2-1:1.0: found stage 2 firmware speedtch-2.bin speedtch 2-1:1.0: speedtch_upload_firmware entered speedtch 2-1:1.0: speedtch_upload_firmware: BLOCK1 uploaded (935 bytes) speedtch 2-1:1.0: speedtch_upload_firmware: BLOCK2 downloaded (511 bytes) hdb: CHECK for good STATUS Adding 995988k swap on /dev/sda1. Priority:-1 extents:1 across:995988k speedtch 2-1:1.0: speedtch_upload_firmware: BLOCK3 uploaded (775545 bytes) speedtch 2-1:1.0: speedtch_upload_firmware: BLOCK4 downloaded (511 bytes) ATM dev 0: speedtch_atm_start entered ATM dev 0: speedtch_start_synchro entered ATM dev 0: speedtch_start_synchro: modem prodded. 2 bytes returned: 00 00 drivers/usb/atm/usbatm.c: usbatm_get_instance nbd0: Attempted send on closed socket end_request: I/O error, dev nbd0, sector 0 Buffer I/O error on device nbd0, logical block 0 Buffer I/O error on device nbd0, logical block 1 Buffer I/O error on device nbd0, logical block 2 Buffer I/O error on device nbd0, logical block 3 nbd0: Attempted send on closed socket end_request: I/O error, dev nbd0, sector 0 Buffer I/O error on device nbd0, logical block 0 nbd0: Attempted send on closed socket end_request: I/O error, dev nbd0, sector 4294965120 nbd0: Attempted send on closed socket end_request: I/O error, dev nbd0, sector 0 nbd1: Attempted send on closed socket end_request: I/O error, dev nbd1, sector 4294965120 nbd1: Attempted send on closed socket end_request: I/O error, dev nbd1, sector 0 nbd2: Attempted send on closed socket end_request: I/O error, dev nbd2, sector 4294965120 nbd2: Attempted send on closed socket end_request: I/O error, dev nbd2, sector 0 nbd3: Attempted send on closed socket end_request: I/O error, dev nbd3, sector 4294965120 nbd3: Attempted send on closed socket end_request: I/O error, dev nbd3, sector 0 nbd4: Attempted send on closed socket end_request: I/O error, dev nbd4, sector 4294965120 nbd4: Attempted send on closed socket end_request: I/O error, dev nbd4, sector 0 nbd5: Attempted send on closed socket end_request: I/O error, dev nbd5, sector 4294965120 nbd5: Attempted send on closed socket end_request: I/O error, dev nbd5, sector 0 nbd6: Attempted send on closed socket end_request: I/O error, dev nbd6, sector 4294965120 nbd6: Attempted send on closed socket end_request: I/O error, dev nbd6, sector 0 nbd7: Attempted send on closed socket end_request: I/O error, dev nbd7, sector 4294965120 nbd7: Attempted send on closed socket end_request: I/O error, dev nbd7, sector 0 nbd8: Attempted send on closed socket end_request: I/O error, dev nbd8, sector 4294965120 nbd8: Attempted send on closed socket end_request: I/O error, dev nbd8, sector 0 nbd9: Attempted send on closed socket end_request: I/O error, dev nbd9, sector 4294965120 nbd9: Attempted send on closed socket end_request: I/O error, dev nbd9, sector 0 nbd10: Attempted send on closed socket end_request: I/O error, dev nbd10, sector 4294965120 nbd10: Attempted send on closed socket end_request: I/O error, dev nbd10, sector 0 nbd11: Attempted send on closed socket end_request: I/O error, dev nbd11, sector 4294965120 nbd11: Attempted send on closed socket end_request: I/O error, dev nbd11, sector 0 nbd12: Attempted send on closed socket end_request: I/O error, dev nbd12, sector 4294965120 nbd12: Attempted send on closed socket end_request: I/O error, dev nbd12, sector 0 nbd13: Attempted send on closed socket end_request: I/O error, dev nbd13, sector 4294965120 nbd13: Attempted send on closed socket end_request: I/O error, dev nbd13, sector 0 nbd14: Attempted send on closed socket end_request: I/O error, dev nbd14, sector 4294965120 nbd14: Attempted send on closed socket end_request: I/O error, dev nbd14, sector 0 nbd15: Attempted send on closed socket end_request: I/O error, dev nbd15, sector 4294965120 nbd15: Attempted send on closed socket end_request: I/O error, dev nbd15, sector 0 nbd0: Attempted send on closed socket end_request: I/O error, dev nbd0, sector 0 nbd1: Attempted send on closed socket end_request: I/O error, dev nbd1, sector 0 nbd2: Attempted send on closed socket end_request: I/O error, dev nbd2, sector 0 nbd3: Attempted send on closed socket end_request: I/O error, dev nbd3, sector 0 nbd4: Attempted send on closed socket end_request: I/O error, dev nbd4, sector 0 nbd5: Attempted send on closed socket end_request: I/O error, dev nbd5, sector 0 nbd6: Attempted send on closed socket end_request: I/O error, dev nbd6, sector 0 nbd7: Attempted send on closed socket end_request: I/O error, dev nbd7, sector 0 nbd8: Attempted send on closed socket end_request: I/O error, dev nbd8, sector 0 nbd9: Attempted send on closed socket end_request: I/O error, dev nbd9, sector 0 nbd10: Attempted send on closed socket end_request: I/O error, dev nbd10, sector 0 nbd11: Attempted send on closed socket end_request: I/O error, dev nbd11, sector 0 nbd12: Attempted send on closed socket end_request: I/O error, dev nbd12, sector 0 nbd13: Attempted send on closed socket end_request: I/O error, dev nbd13, sector 0 nbd14: Attempted send on closed socket end_request: I/O error, dev nbd14, sector 0 nbd15: Attempted send on closed socket end_request: I/O error, dev nbd15, sector 0 ATM dev 0: speedtch_check_status: line state 0x10 ATM dev 0: ADSL line is synchronising ReiserFS: hda1: found reiserfs format "3.6" with standard journal ReiserFS: hda1: using ordered data mode ReiserFS: hda1: journal params: device hda1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 ReiserFS: hda1: checking transaction log (hda1) ReiserFS: hda1: Using r5 hash to sort names ATM dev 0: usbatm_atm_open: vpi 8, vci 32 ATM dev 0: usbatm_atm_open: allocated vcc data 0xc1691340 ip_tables: (C) 2000-2006 Netfilter Core Team Netfilter messages via NETLINK v0.30. ip_conntrack version 2.4 (4020 buckets, 32160 max) - 236 bytes per conntrack ip6_tables: (C) 2000-2006 Netfilter Core Team ADDRCONF(NETDEV_UP): eth0: link is not ready r6to4: Disabled Privacy Extensions tg3: eth0: Link is up at 100 Mbps, full duplex. tg3: eth0: Flow control is on for TX and on for RX. ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready eth0: no IPv6 routers present ATM dev 0: speedtch_check_status: line state 0x20 ATM dev 0: ADSL line is up (1024 kb/s down | 320 kb/s up) tun: Universal TUN/TAP device driver, 1.6 tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com> Ssh and low speed traffic works well Yes, until i try to get a file at higher ADSL speed example with 2.6.17: devel:~# LC_ALL=C apt-get -y install tetex-base tetex-extra Reading Package Lists... Done Building Dependency Tree... Done The following packages will be upgraded: tetex-base tetex-extra 2 upgraded, 0 newly installed, 0 to remove and 1078 not upgraded. Need to get 33.3MB of archives. After unpacking 8192B disk space will be freed. Get:1 http://ftp2.de.debian.org unstable/main tetex-base 3.0-18 [22.4MB] 64% [1 tetex-base 21609000/22.4MB 96%] 106kB/s 59s An then, is when it is frozen, and the atmdiag don't receive nothing. wave:/usr/src/bughunting# atmdiag Itf TX_okay TX_err RX_okay RX_err RX_drop 0 AAL0 0 0 0 0 0 AAL5 1435 0 3436 0 0 wave:/usr/src/bughunting# atmdiag Itf TX_okay TX_err RX_okay RX_err RX_drop 0 AAL0 0 0 0 0 0 AAL5 1443 0 3436 0 0 No messages from USB debug still. Later 1 minute or two the moden blinks the red and green light No messages too.
Maybe Juampe should build with CONFIG_DEBUG_FS. That way he should be able to take a snapshot of the uhci schedule next time the connection goes down. I haven't used this feature since it was moved out of /proc, so I'm not sure where the debug file is now. Alan?
Juan, the next time you have a large amount of data to post (like that system log), _attach_ it instead of dumping the whole thing into the bug report. Yes, definitely you should set CONFIG_DEBUG_FS. You can mount the filesystem anywhere you like; a conventional spot is /sys/kernel/debug. So you would do mount -t debufs none /sys/kernel/debug cd /sys/kernel/debug/uhci Then you want to look at the 0000:00:1d.0 file.
Created attachment 8431 [details] demsg Dmesg 2.6.17
Created attachment 8432 [details] debug/uhci/usbmon/2t log Information during high rate transfer and cut due to uhci problem debug/uhci/usbmon/2t
Created attachment 8442 [details] 0000:00:1d.0 before net block 0000:00:1d.0 before net block
Created attachment 8443 [details] 0000:00:1d.0 after net block 0000:00:1d.0 after net block
Duncan, the information in the first two attachments might mean something to you. I don't know what it's supposed to look like, so it's hard to tell what's wrong. Juan, I forgot to mention one thing earlier, sorry. Before copying the /sys/kernel/debug/uhci/0000:00:1d.0 file, you should do this: echo 2 >/sys/module/uhci_hcd/parameters/debug Then the debugging file will display the contents of the USB schedule as well as the controller information (which is all you see now).
Created attachment 8444 [details] 0000:00:1d.0 before net block 0000:00:1d.0 before net block
Created attachment 8445 [details] 0000:00:1d.0 after net block 0000:00:1d.0 after net block
Created attachment 8446 [details] dmesg dmesg
I have a question about dccf4a48d47120a42382ba526f1a0848c13ba2a4 and uhci_hcd_endpoint_disable. "/* Wait until a particular device/endpoint's QH is idle, and free it */" This are speculations. I'm not faliliarized with this code. The detection of idle estate change, can this new apporach cause problems under heavy load, and erroneusly free the endpoint? I'll try to put some printk to check my theory.
I don't know what all those errors in the dmesg log mean. The uhci debugging files indicate that there is a period-32 interrupt-IN URB with length=16 waiting for the device to send data on endpoint 1, and there are 4 bulk-IN URBs with length approximately 3392 waiting for the device to send data on endpoint 7. The second log ("after net block") shows that the current bulk-IN URB is active; the device has sent the first 64-byte data packet and the computer is waiting for more. I don't think there's anything wrong with the idle-detection code. If there is, it's a bug!
Created attachment 8457 [details] 0000:00:1d.0 during heavy load 0000:00:1d.0 during heavy load, i have 9 files dumped before block if is necessary I'll upload it
The "during heavy load" dump doesn't show anything different from the "before net block" dump. Let's see what Duncan has to say.
> I don't know what all those errors in the dmesg log mean. They don't seem to have anything to do with the speedtouch driver which hasn't even started network activity at that point. > The uhci debugging files indicate that there is a period-32 interrupt-IN URB > with length=16 waiting for the device to send data on endpoint 1, and there are > 4 bulk-IN URBs with length approximately 3392 waiting for the device to send > data on endpoint 7. The second log ("after net block") shows that the current > bulk-IN URB is active; the device has sent the first 64-byte data packet and the > computer is waiting for more. I don't see anything wrong. I note that there are no OUT urbs - but you don't expect there to be unless a network packet happens to be on the way at this time. Juampe, does the number of transmitted packets keep going up, for example if you ping someone? Short transfers are ok for this driver - could it be that the modem is sending us exactly 64 bytes, but the uhci hcd hasn't understood that the transfer is complete? Weren't there some bugs of this kind in the past? Duncan.
Juampe, which LED starts flashing red? Is it the "USB" one (the one that flashes red when the modem is first connected)? Thanks, Duncan.
> Short transfers are ok for this driver - could it be that the modem is > sending > us exactly 64 bytes, but the uhci hcd hasn't understood that the transfer is > complete? Weren't there some bugs of this kind in the past? If the modem wants to send a short message containing only 64 bytes, then it has to send a second packet with 0 bytes to indicate that the message is over. Otherwise uhci-hcd has no way to know that the message isn't going to continue. The dump indicated that no such zero-length packet had been received.
Duncan the LED that flash red is the "USB" one, A note abut how is flashing, is that it appears to flash when i try to ping and each icmp does a red flash in USB LED. Regards, Juampe
> Juampe, does the number of transmitted packets keep going up, for example if > you > ping someone? Alan, As i exposed at the end of comment #1 (really sorry for the full dump) TX counter increments but RX not, if i send ICMP only grow the TX counter. (always spoken in in ATM stack of course) One symthon of the problem is that the RX counter is frozen. Regards, Juampe.
Juampe, can you please do several (eg: three) dumps after things get stuck. I'd like to see if anything is changing in the schedule. Thanks, Duncan.
Created attachment 8477 [details] uhci after 1 uhci after 1
Created attachment 8478 [details] uhci after 2 uhci after 2
Created attachment 8479 [details] uhci after 3 uhci after 3
Created attachment 8480 [details] uhci after 4 uhci after 4
OK, the dumps show that absolutely nothing is changing in the schedule. Alan, a question: where does the link for the last TD in an urb point nowadays? Juampe, can you please load the usbatm module with option rcv_buf_bytes=256 and do another dump after the connection gets stuck. Also, can you please send the output of lspci.
The last TD in an URB points to the first TD in the next URB for that endpoint. If there are no more URBs for the endpoint, it points to a "Dummy" TD that has the "active" flag cleared. You can see two examples of this in the attachments, at the ends of the queues for the period-32 interrupt-in endpoint and the bulk-in endpoint. It certainly looks like the modem has sent a 64-byte packet and the HCD is waiting for more. The usbatm driver _could_ protect against this sort of thing by using timeouts and unlinking URBs when necessary, but I have no idea whether this would be the right thing to do.
> The last TD in an URB points to the first TD in the next URB for that > endpoint. > If there are no more URBs for the endpoint, it points to a "Dummy" TD that > has > the "active" flag cleared. You can see two examples of this in the > attachments, > at the ends of the queues for the period-32 interrupt-in endpoint and the > bulk-in endpoint. I suppose an "out" urb on the same endpoint will be queued the same way as the "in" urbs? > It certainly looks like the modem has sent a 64-byte packet and the HCD is > waiting for more. The usbatm driver _could_ protect against this sort of > thing > by using timeouts and unlinking URBs when necessary, but I have no idea > whether > this would be the right thing to do. Well, that would at least be premature: this problem was never seen for these modems before your patch, and I still consider the most likely explanation to be that there is a bug in your patch. I find it interesting that the uhci schedule is perfectly sensible at each snapshot. That suggests that whatever is wrong is transient, for example a race between your queue updating and the host controller. The fact that the problem only seems to occur when load is heavy suggests this kind of thing too. I've been trying to reproduce the problem of course, but with no luck so far. Duncan.
I've finally managed to provoke an error, by up- and down-loading huge files via the speedtouch modem, while simultaneously streaming images from a USB webcam: [ 6344.101269] uhci_hcd 0000:00:10.2: host controller process error, something bad happened! [ 6344.125817] uhci_hcd 0000:00:10.2: host controller halted, very bad! [ 6344.144902] uhci_hcd 0000:00:10.2: HC died; cleaning up Of course this could be a different problem to the one juampe is seeing. However it is what you would expect if the host controller noticed something wrong with the schedule. I have no time to look further into this now - maybe tomorrow.
> I suppose an "out" urb on the same endpoint will be queued the same way as > the > "in" urbs? Apart from the fact that you can't queue an OUT URB to an IN endpoint, yes. Have we gotten accurate confirmation that reverting the patch (and the following ones which also affect uhci-hcd) fixes the problem? I don't trust git-bisect to do this correctly. The "host controller process error" message does indicate a real problem, possibly connected with handling of ISO transactions. If you can duplicate the fault, it would help to load uhci-hcd beforehand with "debug=2". Then when the fault occurs the driver will automatically dump the schedule to the system log.
Duncan, Alan sorry for waiting, the CPU of my server ("the speedtouch machine") was overheated and burn, while i try to fix this and I change to an older computer (P4 from 2.66 to 1.5 Ghz).The initial enviroment of testing is not available. I'll test the problem with a new enviroment.
Created attachment 8559 [details] undo as623 Does this fix the problem?
Yes, the fix works for me. I had the same problem with 2.6.17 - the connection would barely last 5 minutes every time and then would just stop working. Fortunate that when I was searching for a solution the fix had just been posted! How long will it take to get the fix in a released kernel? Many thanks for your work.
Andrew, the question was whether the patch affects Juan's problem, not whether it fixes yours. At the moment we have no way of knowing whether the two problems are really the same. Your question about how long it will take for the fix to appear in a released kernel is ironic. That so-called "fix" is already present in 2.6.16! The patch you tested actually _removes_ a change which was added in 2.6.17. That change is never going to be taken out of the kernel, although it may need some adjustments in order to work perfectly. At this point, the real job is to find out what's wrong with the code in 2.6.17. Bear in mind, it's entirely possible that if you used a later development kernel, the problem would be gone. I suggest you try out 2.6.18-rc1-mm2 kernel (don't forget to apply the hotfix patches).
Ok, well I can now use the latest stable kernel without my ADSL stopping working! If that means winding some code back to 2.6.16 then so be it. I will be more than happy to run some tests if it helps fix the problem properly - as said in my first post my connection always stops working within a few minutes of being established. I will try the mm kernel suggested when I have a moment.
We'll have to rely on Duncan to suggest a good method for determining what's going wrong with the 2.6.17 kernel; that is, what requests or results aren't getting through. I don't know enough about how the speedtouch driver works to give any insights. Perhaps comparing a UHCI schedule dump with a usbmon trace would be a start. Can you follow the instructions in the earlier parts of this bug report to capture a usbmon trace starting from when you plug in/turn on the speedtouch modem together with a UHCI schedule dump after the stoppage occurs? It's important that the usbmon trace should include the time when you copy the schedule dump.
> We'll have to rely on Duncan to suggest a good method for determining what's > going wrong with the 2.6.17 kernel; that is, what requests or results aren't > getting through. I don't know enough about how the speedtouch driver works to > give any insights. Here's how the speedtouch works: being a modem, it transmits packets and receives packets. By default, 4 pre-allocated urbs are used for transmission, and 4 for reception. Reception: all 4 urbs are queued on the endpoint used for reception (a bulk in endpoint; you can see these urbs in the schedule dumps). When one completes, the packets it contains are extracted and the urb is immediately requeued on the endpoint. Transmission: when a packet arrives for transmission, it is placed in an urb and queued on the transmission endpoint (a bulk out endpoint). If lots of packets are coming in fast, this can result in all 4 urbs being queued on the endpoint, and additional packets being buffered while waiting for an urb to become available (complete). Most of the time you don't expect to see any urbs on this endpoint when dumping the schedule (indeed, you don't see any in Juampe's schedule dumps), you would only expect to see some if packets are being transmitted at a high rate. Nonetheless, such urbs are doubtless going out all the time. State monitoring: in addition, the state of the modem is monitored. Mostly what is monitored is whether the line is up or not. There are two methods, each of which works with different modem versions: (a) an interrupt urb sits on an endpoint, and completes when there is a state change [you can see this urb in Juampe's schedule dumps]; or (b) the modem is polled every few seconds by sending a control urb. What's going wrong is that the urbs queued for reception are no longer completing. As you can see from the schedule dumps, the urb is sitting there waiting for more, having received 64 bytes. Why is data no longer coming in? From what Juampe says, packets still get transmitted ok.
> At this point, the real job is to find out what's wrong with the code in > 2.6.17. > Bear in mind, it's entirely possible that if you used a later development > kernel, the problem would be gone. I suggest you try out 2.6.18-rc1-mm2 > kernel > (don't forget to apply the hotfix patches). I've been able to reproduce this problem, or something very like it, with 2.6.17 but not with 2.6.18-rc1. So please do as Alan asks: try a recent development kernel like 2.6.18-rc1 and let us know if the problem goes away.
Created attachment 8588 [details] as679 plus dependencies Can you please check if this patch fixes the problem (it certainly helps on my machine, but it is not clear if I'm seeing exactly the same problem). Alan, I haven't yet gone through the patch, but from the description it sounds like it wasn't intended to introduce any functional changes. Yet, on my machine, without it I see modem hangs with 2.6.17 (requires heavy disk + network load), but with it I can't make the network connection collapse!
I'm glad to hear that the later version of the driver is unbreakable! You're right that the patch was not intended to change the behavior. But that was under the assumption that the unpatched version was working correctly, so it's entirely possible that whatever bug is in 2.6.17 was inadvertently fixed by as675, as676, or as679. (I assume those are the patches you combined. Or did you include even more?) This gives me a clue where to look. If it's a simple thing to fix, I may be able to submit an update for the 2.6.17.x stable series.
I have tried 2.6.18-rc2, and there are no problems so far - connection has been on for 7 hours without failing. I will try to do the other things suggested this weekend.
> ...it's entirely possible that whatever bug is in 2.6.17 was inadvertently > fixed by > as675, as676, or as679. (I assume those are the patches you combined. Or > did > you include even more?) The patch contains as675 and as679 only. > This gives me a clue where to look. If it's a simple thing to fix, I may be > able to submit an update for the 2.6.17.x stable series. Great - thanks for looking into it. This doesn't just effect speedtouch modems, I'm seeing it with a different modem and an intel chipset (most of the bug reporters have VIA chipsets), so a 2.6.17 stable patch would be good.
Created attachment 8591 [details] Fix for when last TD is short I think this patch for 2.6.17 should fix the problem. The driver needs to treat short TDs specially when they are the last TD in the URB. The as679 patch did add that special treatment.
> I think this patch for 2.6.17 should fix the problem. The driver needs to > treat short TDs specially when they are the last TD in the URB. The as679 > patch did add that special treatment. Well, that fixes things for me, for a non-speedtouch modem. Unfortunately, I never managed to get a failure using a speedtouch modem, so it may be a different issue that's being solved here. So can anyone who's been having trouble with a speedtouch please test this patch!
Created attachment 8599 [details] Schedule dump on host-controller halt I've finally managed to reproduce the speedtouch modem problem. As far as I can see, it is most likely NOT fixed in 2.6.18-rc2, in spite of the fact I can't reproduce it in 2.6.18-rc2: the patch that makes the problem go away is as707: [PATCH] UHCI: Improve FSBR-off timing (see http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c5e3b741a3fec6077a480aa65ded29d79ded8898) So most likely the problem is simply being hidden by the reduction in FSBR. That's the bad news. The good news is that on my machine the problem causes the host-controller to halt (most likely due to memory poisoning, see below), with an informative schedule dump (attached file). Points of interest: Dummy TD [d5335030] link (a9a9a9a9) e0 Length=0 MaxLen=7ff DT0 EndPt=0 Dev=0, PID=e1(OUT) (buf=00000000) The link a9a9a9a9 looks like POOL_POISON_ALLOCATED (!initted, dmapool). Same thing with the other dummy TD. Also, [d54fc6c0] BLK QH link (154fc4e2) element (14cdd720) Element != First TD
That line: [d54fc6c0] BLK QH link (154fc4e2) element (14cdd720) Element != First TD is where the problem lies. Not the "Element != First TD" part; that's allowed to happen. But the 14cdd720 (or its virtual equivalent d4cdd720) doesn't occur anywhere in the dump. Probably it's one of the three TDs before the current first one, one which has already been deallocated, and somehow the driver mistakenly set qh->element to that value. There aren't too many places where that can happen. Can you try adding a printk statement (display the values of qh, urbp, and td) near the assignment to qh->element in uhci_advance_check()? That line should never trigger on a modern controller; only really old ones have the bug it works around. The other place is the second assignment to qh->element in uhci_fixup_short_transfer(). (The first assignment affects only control URBs.) That one will trigger whenever you get a short transfer, so instrumenting it will likely produce a lot of noise. Still, maybe it will help find the problem.
The last few posts aside, the patch (id=8591) appears to fix the speedtouch problem I was having with 2.6.17. The connection no longer hangs. Many thanks!
> ... Can you try adding a printk ... Hi Alan, thanks for thinking about this problem. I will add the printk's when I get back from travelling, in a few days time.
> There aren't too many places where that can happen. Can you try adding a > printk > statement (display the values of qh, urbp, and td) near the assignment to > qh->element in uhci_advance_check()? That line should never trigger on a > modern > controller; only really old ones have the bug it works around. It doesn't trigger, so that's not it. > The other place is the second assignment to qh->element in > uhci_fixup_short_transfer(). (The first assignment affects only control > URBs.) > That one will trigger whenever you get a short transfer, so instrumenting it > will likely produce a lot of noise. Still, maybe it will help find the > problem. I put a minimal printk here (write one character), and that's enough to make the problem go away! This suggests that timing is important to triggering the problem, so probably it's due to a race. It also suggests that this code path is significant in the race. Investigating further...
Interesting... At that point in uhci_fixup_short_transfer (just before the assignment to qh->element) it should always be true that qh->element == cpu_to_le32(qh->post_td->dma_handle) && (td_token(qh->post_td) & TD_CTRL_SPD) != 0 as well as a few other things. Try testing this condition; if it ever fails print out the values of qh->element, qh->post_td, qh->post_td->token, and qh->post_td->status.
Created attachment 8634 [details] Schedule dump on invariant failure Hi Alan, the condition qh->element == cpu_to_le32(qh->post_td->dma_handle) && (td_token(qh->post_td) & TD_CTRL_SPD) != 0 doesn't always hold. I modified the code so that it dumps the schedule and then prints qh->element, qh->post_td and so forth. The result is in the attached dump. For reference, the code I hacked in was (QQ is a counter, to limit the number of dumps to 3): ... if (list_empty(&urbp->td_list)) td = qh->post_td; if (unlikely (QQ>0 && !( qh->element == cpu_to_le32(qh->post_td->dma_handle) && (td_token(qh->post_td) & TD_CTRL_SPD) != 0 ))) { --QQ; if (errbuf) { uhci_sprint_schedule(uhci, errbuf, ERRBUF_LEN); lprintk(errbuf); } printk("qh->element %08x, cpu_to_le32(qh->post_td->dma_handle) %08x, qh->post_td %p, td_token(qh->post_td) %08x\n", qh->element, cpu_to_le32(qh->post_td->dma_ handle), qh->post_td, td_token(qh->post_td)); } qh->element = td->link; tmp = urbp->td_list.prev; ...
Created attachment 8640 [details] Don't test SPD bit; the controller may have changed it My mistake; the invariant should have said (td_status(qh->post_td) & TD_CTRL_SPD) != 0 instead of td_token. But never mind; I think I know what's going wrong. Apparently your controller is turning on the SPD bit in the status word. (Unless you set the URB_SHORT_NOT_OK flag when you submitted the request, in which case the SPD bit would have been 1 all along -- presumably you don't use that flag. If you do then there's a hardware bug in the controller.) I never expected the controller to mess up that bit. Forunately it's easy enough to avoid checking it. This patch should do the job. Not coincidentally, it's very similar to the earlier patch for 2.6.17.
Apologies, but a couple of stupid questions: 1. Should I use *both* patches (8640 & 8591) on a 2.6.17 kernel? 2. Are these likely to appear for the next 2.6.17 release? Many thanks.
Patch 8640 is intended for 2.6.18, not 2.6.17. Patch 8591 has been submitted for inclusion in the stable 2.6.17.x series. I don't know how quickly it will be accepted.
> Apparently your controller is turning on the SPD bit in the status word. > (Unless you set the URB_SHORT_NOT_OK flag when you submitted the request, in > which case the SPD bit would have been 1 all along -- presumably you don't use > that flag. If you do then there's a hardware bug in the controller.) > > I never expected the controller to mess up that bit. Forunately it's easy > enough to avoid checking it. This patch should do the job. Not > coincidentally, it's very similar to the earlier patch for 2.6.17. Hi Alan, this seems to fix the problem. Can everyone please check whether http://bugzilla.kernel.org/attachment.cgi?id=8640&action=view works for them too. If so, let's all give three cheers for Alan Stern!
Patch 8640 is in 2.6.18-rc3-git4, and works for me in Gentoo - thanks :)