Bug 5495 - changing cpu frequency causes fatal USB errors
Summary: changing cpu frequency causes fatal USB errors
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: i386 Linux
Importance: P2 high
Assignee: cpufreq
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-10-25 14:07 UTC by Max Asbock
Modified: 2007-03-08 21:25 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.14-rc5
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
part of dmesg containing USB error and debug messages (3.52 KB, text/plain)
2005-10-25 14:11 UTC, Max Asbock

Description Max Asbock 2005-10-25 14:07:20 UTC
Most recent kernel where this bug did not occur:
2.6.14-rc5 x86_64
Distribution:
SLES9 x86_64
Hardware Environment:
2 x Dual Core AMD Opteron 280
Software Environment:
Problem Description:
With PowerNow enabled on the system and powernow_k8 and cpufreq_userspace loaded,
changing the CPU frequency triggers unrecoverable OHCI errors and fatal EHCI errors.

Steps to reproduce:
1) Either boot the system with powersaved turned on; the USB errors occur when
load on the system causes the CPU frequency to change.

2) Or boot the system with powersaved off and run the following manually as root:
modprobe powernow_k8
modprobe cpufreq_userspace
cd /sys/devices/system/cpu
echo userspace > cpu0/cpufreq/scaling_governor
echo userspace > cpu2/cpufreq/scaling_governor
echo 2200000 > cpu0/cpufreq/scaling_setspeed
echo 2200000 > cpu2/cpufreq/scaling_setspeed
sleep 1
echo 2000000 > cpu0/cpufreq/scaling_setspeed
echo 2000000 > cpu2/cpufreq/scaling_setspeed
sleep 1
echo 1800000 > cpu0/cpufreq/scaling_setspeed
echo 1800000 > cpu2/cpufreq/scaling_setspeed
sleep 1
echo 1000000 > cpu0/cpufreq/scaling_setspeed
echo 1000000 > cpu2/cpufreq/scaling_setspeed
sleep 1
echo 1800000 > cpu0/cpufreq/scaling_setspeed
echo 1800000 > cpu2/cpufreq/scaling_setspeed
sleep 1
echo 2000000 > cpu0/cpufreq/scaling_setspeed
echo 2000000 > cpu2/cpufreq/scaling_setspeed
sleep 1
echo 2200000 > cpu0/cpufreq/scaling_setspeed
echo 2200000 > cpu2/cpufreq/scaling_setspeed
sleep 1
echo 2400000 > cpu0/cpufreq/scaling_setspeed
echo 2400000 > cpu2/cpufreq/scaling_setspeed
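The manual sequence in step 2 can be written as a loop. The shell function below is a sketch that is not part of the original report: it assumes powernow_k8 and cpufreq_userspace are already loaded, that it runs as root, and that the frequencies (in kHz) listed in the report are valid for these CPUs. Driving only cpu0 and cpu2 presumably works because the two cores of each dual-core Opteron share one frequency setting. The function name cycle_freqs and its two optional arguments are hypothetical.

```shell
# Frequencies (kHz) and CPUs copied from the manual reproduction steps.
# Other machines may support different values; see
# cpuN/cpufreq/scaling_available_frequencies.
FREQS="2200000 2000000 1800000 1000000 1800000 2000000 2200000 2400000"
CPUS="cpu0 cpu2"

# cycle_freqs [SYSFS_ROOT] [DELAY]   (hypothetical helper)
#   SYSFS_ROOT: defaults to /sys/devices/system/cpu
#   DELAY:      seconds to pause after each step, defaults to 1
# Both arguments are conveniences added for dry runs, not part of the report.
cycle_freqs() {
    local root=${1:-/sys/devices/system/cpu}
    local delay=${2:-1}
    local cpu freq

    # Hand frequency control to userspace on each listed CPU.
    for cpu in $CPUS; do
        echo userspace > "$root/$cpu/cpufreq/scaling_governor" || return 1
    done

    # Step every listed CPU through the same frequency sequence.
    for freq in $FREQS; do
        for cpu in $CPUS; do
            echo "$freq" > "$root/$cpu/cpufreq/scaling_setspeed" || return 1
        done
        echo "set $CPUS to $freq kHz"
        sleep "$delay"
    done
}
```

Run `cycle_freqs` with no arguments on the affected machine. Passing a scratch directory as the first argument and 0 as the delay allows a dry run of the write sequence without touching real hardware.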

The USB errors happen after one of the frequency changes.
The errors are:
ohci_hcd 0000:00:03.1: OHCI Unrecoverable Error, disabled
and
ehci_hcd 0000:00:03.2: fatal error
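To find out which frequency step triggers the failure, the kernel log can be checked for these two messages after each step. A minimal sketch; the function name usb_fatal_errors is hypothetical, and the patterns match only the exact messages quoted above, so other OHCI/EHCI failure messages would be missed.

```shell
# usb_fatal_errors (hypothetical helper)
# Reads kernel log text on stdin, prints any line matching the two
# host-controller errors from this report, and exits 0 if at least one
# was found (1 otherwise), following grep's exit status.
usb_fatal_errors() {
    grep -E 'ohci_hcd .*: OHCI Unrecoverable Error, disabled|ehci_hcd .*: fatal error'
}
```

For example, `dmesg | usb_fatal_errors` prints the matching lines and succeeds if either error was logged. Running `dmesg -c > /dev/null` as root before each step clears the kernel ring buffer, so a match can be attributed to the most recent frequency change.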
Comment 1 Max Asbock 2005-10-25 14:11:44 UTC
Created attachment 6386 [details]
part of dmesg containing USB error and debug messages

I turned on USB debugging. The attachment contains the resulting messages.
Comment 2 David Brownell 2005-10-25 20:03:54 UTC
That's not something I'd expect to see.  Are you sure the BIOS
is ignoring this?  (Enable the "usb-handoff" kernel parameter,
at least for kernels with the latest USB patches.)
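Whether a running kernel was actually booted with usb-handoff can be checked from its command line. A sketch; has_usb_handoff is a hypothetical helper, and since the parameter is only recognized by kernels carrying the relevant USB patches, its presence on the command line does not prove that the handoff code ran.

```shell
# has_usb_handoff [CMDLINE_FILE]   (hypothetical helper)
# Succeeds if "usb-handoff" appears as a whole word on the kernel command
# line. CMDLINE_FILE defaults to /proc/cmdline; the argument exists only
# so a saved command line can be checked instead.
has_usb_handoff() {
    local cmdline=${1:-/proc/cmdline}
    # Put one parameter per line, then look for an exact-line match.
    tr -s ' ' '\n' < "$cmdline" | grep -qx 'usb-handoff'
}
```

For example, `has_usb_handoff && echo "usb-handoff is set"`. If it is not set, the parameter can typically be appended to the kernel line in the boot loader configuration, followed by a reboot.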

For OHCI, the "Unrecoverable Error" usually means some sort of
DMA pointer was smashed (or acts smashed).  Were there any USB
devices connected at the time?  If not, that narrows down the
potential options enormously.  Likewise with EHCI: that is also
normally a DMA access problem.  It's very odd that _both_ get such
errors at the same time.  (Another possible cause: something
smashed the pages those drivers use for their DMA descriptor
chains.)

Were you maybe re-clocking the memory while one of the USB
controllers was performing DMA?  Or goofing on the cache-coherency
guarantees that must be followed to re-clock all four processors
at the same time?

The "leak ed" message shows up sometimes; it's a "should not happen"
case, but maybe there's an SMP issue.  The leak is minor and AFAIK
harmless, though I'd be glad to see a patch.

Is it possible that this is one of those systems that need their
I/O peripherals to be idled before changing clocks?  Or that need
to adjust some of the I/O clocks (e.g. the 48 MHz USB PLL) after
adjusting the clock from which the CPU clock is derived?  I hadn't
thought PC chipsets needed such things.

Please include "lspci -vv" output in the bug report, as well as
the kernel log messages recorded when the OHCI and EHCI controllers
initialize, and "lsusb -v" output too (from lsusb v0.71 or newer).
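The information requested above could be gathered in one pass with something like the following sketch. It assumes pciutils (lspci) and usbutils (lsusb) are installed and that it runs as root, so that lspci -vv can read the full PCI configuration space; the function name, the default directory, and the output file names are hypothetical. Filtering the kernel log for "ohci" and "ehci" only approximates the controller initialization records and may also pick up unrelated lines.

```shell
# collect_usb_report [OUTDIR]   (hypothetical helper)
# Writes the requested diagnostics into OUTDIR (default: ./usb-report) and
# prints the directory path. A missing tool or failing command leaves its
# error message in the corresponding file instead of aborting the run.
# Note: this does not check that lsusb is v0.71 or newer.
collect_usb_report() {
    local out=${1:-usb-report}
    mkdir -p "$out" || return 1
    lspci -vv > "$out/lspci-vv.txt" 2>&1
    lsusb -v  > "$out/lsusb-v.txt"  2>&1
    # OHCI/EHCI initialization messages (and any later errors) from the kernel log.
    dmesg | grep -Ei 'ohci|ehci' > "$out/dmesg-usb-hc.txt" 2>&1
    echo "$out"
}
```

For example, `collect_usb_report bug5495` writes the three files into ./bug5495 and prints that path.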
Comment 3 Dominik Brodowski 2005-11-15 13:55:46 UTC
Maybe this is also related to bug 4744?
Comment 4 Andrew Morton 2007-01-31 01:21:00 UTC
Is this bug still present in 2.6.20-rc7?
Comment 5 Adrian Bunk 2007-03-08 21:25:58 UTC
Please reopen this bug if it's still present with kernel 2.6.20.
