Most recent kernel where this bug did not occur: 2.6.14-rc5 x86_64
Distribution: SLES9 x86_64
Hardware Environment: 2 x Dual Core AMD Opteron 280
Software Environment:
Problem Description:

With PowerNow enabled on the system, powernow_k8 and cpufreq_userspace loaded, and the CPU frequency changed, unrecoverable OHCI errors and fatal EHCI errors occur.

Steps to reproduce:

1) Either boot the system with powersaved turned on; the USB errors then happen when a load is put on the system that causes the CPU frequency to change.

2) Or boot the system with powersaved off and manually do:

    modprobe powernow_k8
    modprobe cpufreq_userspace
    cd /sys/devices/system/cpu
    echo userspace > cpu0/cpufreq/scaling_governor
    echo userspace > cpu2/cpufreq/scaling_governor
    echo 2200000 > cpu0/cpufreq/scaling_setspeed
    echo 2200000 > cpu2/cpufreq/scaling_setspeed
    sleep 1
    echo 2000000 > cpu0/cpufreq/scaling_setspeed
    echo 2000000 > cpu2/cpufreq/scaling_setspeed
    sleep 1
    echo 1800000 > cpu0/cpufreq/scaling_setspeed
    echo 1800000 > cpu2/cpufreq/scaling_setspeed
    sleep 1
    echo 1000000 > cpu0/cpufreq/scaling_setspeed
    echo 1000000 > cpu2/cpufreq/scaling_setspeed
    sleep 1
    echo 1800000 > cpu0/cpufreq/scaling_setspeed
    echo 1800000 > cpu2/cpufreq/scaling_setspeed
    sleep 1
    echo 2000000 > cpu0/cpufreq/scaling_setspeed
    echo 2000000 > cpu2/cpufreq/scaling_setspeed
    sleep 1
    echo 2200000 > cpu0/cpufreq/scaling_setspeed
    echo 2200000 > cpu2/cpufreq/scaling_setspeed
    sleep 1
    echo 2400000 > cpu0/cpufreq/scaling_setspeed
    echo 2400000 > cpu2/cpufreq/scaling_setspeed

The USB errors happen after one of the frequency changes. The errors are:

    ohci_hcd 0000:00:03.1: OHCI Unrecoverable Error, disabled

and

    ehci_hcd 0000:00:03.2: fatal error
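For anyone trying to reproduce this, the manual sequence in step 2 can be condensed into a loop. This is only a sketch: the function name `sweep_cpufreq` and its two arguments (sysfs CPU directory, delay in seconds) are made up for illustration and are not part of the report. It assumes powernow_k8 and cpufreq_userspace are already loaded and that cpu0 and cpu2 are the CPUs to drive, as in the report.

```shell
#!/bin/sh
# Hypothetical helper, not from the original report.
# Usage: sweep_cpufreq [SYSFS_CPU_DIR] [DELAY_SECONDS]
sweep_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    delay="${2:-1}"

    # Select the userspace governor on the same two CPUs the report uses.
    for cpu in cpu0 cpu2; do
        echo userspace > "$root/$cpu/cpufreq/scaling_governor" || return 1
    done

    # Same frequency sequence (in kHz) as the manual steps.
    # Unlike the manual steps, this also sleeps after the final change.
    for khz in 2200000 2000000 1800000 1000000 1800000 2000000 2200000 2400000; do
        for cpu in cpu0 cpu2; do
            echo "$khz" > "$root/$cpu/cpufreq/scaling_setspeed" || return 1
        done
        sleep "$delay"
    done
}
```

Run as root after the two modprobe commands, calling `sweep_cpufreq` with no arguments should walk the real sysfs files with a one-second pause between steps; watching dmesg in another terminal should show after which change the USB errors appear.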
Created attachment 6386: part of dmesg containing USB error and debug messages

I turned on debugging in USB. The attachment contains the resulting messages.
That's not something I'd expect to see. Are you sure that the BIOS is ignoring this? (Enable the "usb-handoff" kernel parameter, at least for kernels with the latest USB patches.)

For OHCI, the "Unrecoverable Error" usually means some sort of DMA pointer was smashed (or acts smashed). Were there any USB devices connected at the time? If not, that narrows down the potential options enormously.

Likewise with EHCI, a fatal error is normally a DMA access problem. It is very odd that _both_ get such issues at the same time. (Another possible error causing this: something smashed the pages those drivers use for their DMA descriptor chains.)

Were you maybe re-clocking the memory while one of the USB controllers was performing DMA? Or goofed on the cache coherency guarantees that you must follow to re-clock all four processors at the same time?

The "leak ed" message shows up sometimes. It's a "should not happen", but maybe there's an SMP issue. The leak is minor and AFAIK harmless, though I'd be glad to see a patch.

Is it possible that this is one of the kinds of system that needs its I/O peripherals to be idled before changing clocks? Or one that needs to adjust some of the I/O clocks (e.g. the 48 MHz USB PLL) after adjusting the clock from which the CPU clock is derived? I hadn't thought PC chipsets needed such things.

"lspci -vv" output should be included in the bug report, as well as records of the OHCI and EHCI controllers initializing. And "lsusb -v" output too (for lsusb v0.71 or newer).
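One possible way to pull the host controller records out of the kernel log is to filter it by driver name. The helper below is a sketch only: `usb_hc_lines` is a made-up name, and it assumes the relevant lines carry the "ohci_hcd" or "ehci_hcd" prefix seen in the error messages above.

```shell
#!/bin/sh
# Hypothetical helper: reads a kernel log on stdin and keeps only the
# lines that mention the OHCI or EHCI host controller drivers.
usb_hc_lines() {
    grep -E 'ohci_hcd|ehci_hcd'
}
```

Something like `dmesg | usb_hc_lines` could then be attached together with the "lspci -vv" and "lsusb -v" output.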
Maybe this is also related to 4744?
Is this bug still present in 2.6.20-rc7?
Please reopen this bug if it's still present with kernel 2.6.20.