I've also booted 2.6.[11,12,13,14,15,16] and they all behave identically.

_System_

The notebook is an Acer Aspire 1520 (1524) WLMi with an AMD Athlon64 Processor 3400+ on a VIA K8M800 (VT8237 PCI bridge [K8T800 South] according to lspci, but I've read the actual chip markings and the southbridge is a VT8235), with 2 GB of memory. Different lspci dumps can be found as attachments in bug 6072, and an acpidump in bug 5767. The system nowadays is a from-scratch-ish pure 64-bit thing where I'm in full control/understanding, meaning no udev, hal, dbus etc., with a static /dev. No desktop, just a WM.

_Problem_

When an external USB 2.0 HD is plugged in and ehci_hcd gets hold of it, the core CPU temperature climbs 2-4 degrees centigrade. This temperature rise is not due to any obvious processor activity, and not because of the increased power drain through the USB ports (if uhci_hcd is in charge of the HD, no anomaly exists). After disconnecting the HD, the temperature remains at the high level until an "rmmod ehci_hcd" is executed. Then the temperature immediately begins to drop to normal.

Related issue(?), where ehci_hcd prevents "AMD K7 CPU Disconnect Control":
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172592

_Consequence_

My notebook's fan control is strongly tied to CPU temperature, especially at boot time. If the machine is cold (e.g. first-time boot) there's no problem. But a reboot (cold or warm) with a connected external HD is aggravating. The fan has four states - Off, Low, Medium, High - and at boot time the CPU must drop to 49C for the fan to enter Low speed. The different situations look like this:

Room temperature 26C (summer is closing in)

* No external HD connected *
Passing BIOS - High fan
Loading kernel - dropping to Medium fan.
Log in on text console - CPU 800MHz
After ca 2 minutes CPU temp. falls to 49C - dropping to Low fan.
CPU temp. climbs to 50C due to the lower fan speed.

* External HD connected *
Passing BIOS - High fan
Loading kernel - dropping to Medium fan.
Log in on text console - CPU 800MHz
After 10 minutes the CPU temp. shows 52C - still Medium fan.
Disconnecting HD
Another 10 minutes and still at 52C CPU temp. - Medium fan...
"rmmod ehci_hcd"
Immediate temp. drop.
After ca 1 minute CPU temp. has reached 49C - dropping to Low fan.
CPU temp. climbs to 50C due to the lower fan speed.

And a "Medium" fan speed on this notebook is serious noise!

_Thoughts_

The above temperatures came from a clean kernel.org 2.6.17-rc4, but even when I undervolt the CPU (which I usually do) through a patch like
http://dev.gentoo.org/~morfic/powernow-k8-vcore_list-2.6.16-rc2-v2.diff
it is a close shave reaching the 49C boot temp. Today for example it failed once, and when summer hits with full force it will be futile.

If I re-enable the C1-state Linus stole - see bug 6072 for my patch - there is an increase of ca 1250 switches from C0 to C1 per 10 minutes if the HD is connected (not mounted) as opposed to disconnected, but that seems unrelated since the unpatched temp. situation is identical. And older kernels like 2.6.11 equally had no C-state handling on this machine.

_Extra Info_

Connecting the HD it looks like this (why does it seem to connect twice?):

usb 1-3: new high speed USB device using ehci_hcd and address 4
usb 1-3: configuration #1 chosen from 1 choice
scsi0 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 4
usb-storage: waiting for device to settle before scanning
  Vendor: IC25N080 Model: ATMR04-0 Rev: MO4O
  Type: Direct-Access ANSI SCSI revision: 00
SCSI device sda: 156301487 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: assuming drive cache: write through
SCSI device sda: 156301487 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: assuming drive cache: write through
 sda: sda1
sd 0:0:0:0: Attached scsi disk sda
usb-storage: device scan complete

And here's a cat of /proc/bus/usb/devices :

T: Bus=04 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.17-rc4 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:10.2
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=03 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 93/900 us (10%), #Int= 1, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.17-rc4 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:10.1
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=1.5 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=04b4 ProdID=0033 Rev= 1.00
S: Product=RF Mouse
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=10ms

T: Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc=236/900 us (26%), #Int= 2, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.17-rc4 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:10.0
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=1.5 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=046d ProdID=c50c Rev=22.40
S: Manufacturer=Logitech
S: Product=USB Receiver
C:* #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr= 98mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=01 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=10ms
I: If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=10ms

T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh= 6
B: Alloc= 0/800 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.17-rc4 ehci_hcd
S: Product=EHCI Host Controller
S: SerialNumber=0000:00:10.3
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=256ms

T: Bus=01 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#= 4 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=067b ProdID=3507 Rev= 0.01
S: Manufacturer=Prolific
S: Product=PL-3507C USB Storage Device
S: SerialNumber=013023EC
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

If more info is needed, you know where to reach me.
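
Since the symptom depends on whether ehci_hcd or uhci_hcd is in charge of the disk, it can help to reduce the listing above to one line per attached device. The following is only a rough sketch: the function name bus_drivers is made up for illustration, and it assumes, as in the dump above, that each root hub entry precedes the devices on its bus and that the root hub's Manufacturer string ends with the host controller driver name.

```sh
#!/bin/sh
# bus_drivers (hypothetical helper): reads text in /proc/bus/usb/devices
# format on stdin and prints, for every non-root device, its bus number,
# the host controller driver of that bus, and the product string.
# Assumption: a root hub's "S: Manufacturer=Linux ..." line ends with the
# driver name, as seen in the listing above.
bus_drivers() {
    awk '
        /^T:/ {
            # Lev=00 marks a root hub; remember the bus number either way.
            root = ($0 ~ /Lev=00/)
            bus = $0
            sub(/.*Bus=/, "", bus)
            sub(/ .*/, "", bus)
        }
        /^S:/ && /Manufacturer=Linux/ && root { hcd[bus] = $NF }
        /^S:/ && /Product=/ && !root {
            product = $0
            sub(/.*Product=/, "", product)
            printf "bus %s (%s): %s\n", bus, hcd[bus], product
        }
    '
}
```

It would be used as "bus_drivers </proc/bus/usb/devices". A device that reports no Product string is simply not listed.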
Created attachment 8299 [details]
experimental ehci unlink patch

Yeech, VIA again. We know there are hardware issues with this: it doesn't issue some IRQs it's supposed to issue. Try the patch I've attached here, which changes how those hardware issues get worked around ... maybe it will help, maybe not.

Also, with CONFIG_USB_DEBUG enabled, while the overheating is happening, please look at /sys/class/usb_host/.../registers for that controller (the file will say inside that it's EHCI). Look at it several times, see whether its contents change while the overheating lasts, and please attach a copy of it (plus a description of any changes you noticed).
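
One way to "look at it several times" without comparing by eye is to sample the file at intervals and print a diff whenever it changes. This is a minimal sketch, not part of the patch or of any kernel tooling: watch_changes is a made-up name, and the file path, sample count and interval are all supplied by the caller.

```sh
#!/bin/sh
# watch_changes FILE COUNT INTERVAL (hypothetical helper)
# Samples FILE COUNT times, INTERVAL seconds apart. Prints a diff each time
# a sample differs from the previous one, then a one-line summary.
watch_changes() {
    file=$1
    count=$2
    interval=$3
    prev=$(mktemp) || return 1
    cur=$(mktemp) || { rm -f "$prev"; return 1; }
    changed=0

    cat "$file" >"$prev"
    i=1
    while [ "$i" -lt "$count" ]; do
        sleep "$interval"
        cat "$file" >"$cur"
        if ! cmp -s "$prev" "$cur"; then
            changed=$((changed + 1))
            echo "--- change at sample $((i + 1)), $(date)"
            # diff exits non-zero when the files differ; that is expected.
            diff "$prev" "$cur" || true
            cp "$cur" "$prev"
        fi
        i=$((i + 1))
    done

    rm -f "$prev" "$cur"
    echo "$changed of $((count - 1)) comparisons differed"
}
```

For example, "watch_changes /sys/class/usb_host/.../registers 10 2" (with the real path for the EHCI controller filled in) would take ten samples two seconds apart.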
Compiled two 2.6.17-rc6-git4 kernels with CONFIG_USB_DEBUG and used the patch on one of them. Wrote a script to capture the data you requested - I'm in no position to 'notice' changes in this area, except the temperature. Unfortunately the temp didn't stay put with the patched kernel.

Testing procedure was:
Cold boot.
Wait for the core CPU temp to reach 49C.
Start script.
Wait 1 minute.
Plug in HD (no mount).
Wait 1 minute.
Unplug HD.

Search for "plugged" to find the crossover data points.

#!/bin/sh

if ! grep -q ehci /proc/modules; then
    modprobe ehci_hcd
fi

echo "" >usb-test.txt
echo "USB Test Begin" >>usb-test.txt
echo "**********" >>usb-test.txt

touch /root/.usb

while (true)
do
    if [ -e /root/.usb ]; then
        if grep -q Prolific /proc/bus/usb/devices; then
            rm -f /root/.usb
            echo "" >>usb-test.txt
            echo "**********" >>usb-test.txt
            echo "HD plugged!" >>usb-test.txt
            echo "**********" >>usb-test.txt
        fi
    fi

    echo "----------" >>usb-test.txt
    date >>usb-test.txt
    echo "----------" >>usb-test.txt
    cat /proc/acpi/thermal_zone/*/temperature >>usb-test.txt
    cat /sys/class/usb_host/usb_host1/registers >>usb-test.txt

    if ! [ -e /root/.usb ]; then
        if ! grep -q Prolific /proc/bus/usb/devices; then
            echo "" >>usb-test.txt
            echo "**********" >>usb-test.txt
            echo "HD unplugged!" >>usb-test.txt
            echo "**********" >>usb-test.txt
            for i in 1 2 3 4 5; do
                echo "----------" >>usb-test.txt
                date >>usb-test.txt
                echo "----------" >>usb-test.txt
                cat /proc/acpi/thermal_zone/*/temperature >>usb-test.txt
                cat /sys/class/usb_host/usb_host1/registers >>usb-test.txt
                sleep 2s
            done
            rmmod ehci_hcd
            echo "" >>usb-test.txt
            echo "**********" >>usb-test.txt
            echo "ehci driver unloaded..." >>usb-test.txt
            echo "**********" >>usb-test.txt
            for i in 1 2 3 4 5; do
                echo "----------" >>usb-test.txt
                date >>usb-test.txt
                echo "----------" >>usb-test.txt
                cat /proc/acpi/thermal_zone/*/temperature >>usb-test.txt
                sleep 2s
            done
            exit
        fi
    fi
    sleep 2s
done
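
To find the crossover points without paging through the register dumps, the log can be reduced to its event markers and temperature readings. The sketch below is illustrative only: summarize_log is a made-up name, and it assumes the thermal zone file prints lines of the form "temperature: 49 C", which may differ on other kernels.

```sh
#!/bin/sh
# summarize_log FILE (hypothetical helper)
# Keeps only the event markers written by the test script and the
# temperature readings; register dumps and dates are dropped.
# Assumption: temperature lines look like "temperature:   49 C".
summarize_log() {
    awk '
        /HD plugged!|HD unplugged!|ehci driver unloaded/ { print "== " $0; next }
        /^temperature:/ { print $2, $3 }
    ' "$1"
}
```

Running "summarize_log usb-test.txt" then shows the temperature trend with the plug, unplug and rmmod events inline.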
Created attachment 8303 [details] usb-test-proper.txt unpatched kernel
Created attachment 8304 [details] usb-test-patched.txt patched kernel
Still broken in 2.6.18 final
Mats, any updates on the problem? How are the new releases working for you? Thanks, --Natalie
Natalie, linux-2.6.22-rc5-git3 under Ubuntu 7.04. No change. Running my test-script above shows the core CPU temperature rising from 43C to 45C eight seconds after the HD was plugged in. For me this is no longer a problem. I've done surgery on the notebook and installed passive cooling through various fins and plates on all hotspots, drilled extra ventilation holes and, most importantly, attached a variable resistor to the fan. The machine is whisper quiet.
This is a great workaround, and should be offered as a patch ;) But seriously, this way the test system is no longer available, Mats! Can you please put it all back as it was before... Are there known errata for this chipset? Maybe this problem needs to be brought to the attention of the ACPI people?
Eh... the test system is exactly as before in terms of _symptom_ (2 to 4 degrees core CPU temp rise on HD engagement through ehci_hcd), it's only the _consequence_ (high fan speed == noise) that has been mitigated through my hardware modifications. The changes are irrevocable, unless you want me to desolder resistors and plug forty one mm ventilation holes etc. (it's a work of engineering art ;-) I don't know about chipset errata, but as you see in comment #1 the USB people know about VIA... From what I've seen on the net, VIA is not particularly friendly vis-à-vis open source developers. According to kernel sources, and other evidence, people worked under an NDA when e.g. doing IDE stuff for this southbridge (the VT8235). As for involving ACPI, I can't see a future there. The fan speeds seem to be controlled purely through hardware, reacting to certain temperature thresholds (nothing in /proc/acpi controls it).
Copying to Alan, to help sort out the problem with EHCI and overheating (and to decide whether to keep this bug open).
I have no idea what's wrong, other than the fact that some VIA EHCI chips are known to configure themselves incorrectly. It would be good to try 2.6.25-rc6; that kernel includes a fix for a problem known to affect lots of EHCI controllers (including VIA's). Also, a patch was submitted last week to prevent some of them from hogging the PCI bus (not applicable to the vt8235, unfortunately). Maybe something similar is needed to prevent the overheating. FYI, the bus-hogging patch is <http://marc.info/?l=linux-usb&m=120599996404777&w=2>