Bug 5186

Summary: performance drop down 300% and more with high cpu load
Product: Drivers Reporter: Pascal Cavy (p92)
Component: Hardware Monitoring Assignee: Jean Delvare (jdelvare)
Status: CLOSED DUPLICATE    
Severity: high CC: davej, jdelvare, nacc, zwane
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.13 Subsystem:
Regression: --- Bisected commit-id:
Attachments: 2.6.10 data
2.6.12 data
2.6.13 data
some infos including mtrr for ubuntu kernel 2610-5 which works well
kernel.org 2.6.11.1 just after boot
kernel.org 2.6.11.1 after trying to work on kde
kernel.org 2.6.11.1 just after boot
2.6.11 .config
2.6.10 .config
kernel log of different boots
2.6.11-rc3 good perf
2.6.11-rc4 BAD PERFs
2.6.13.1 bad perf
2.6.13.2 GOOD PERF at boot - BAD after running kde
diffs between 2.6.11 rc3 and rc4
2.6.11-rc3 to rc4 diffstat
2.6.11-rc4 without FB nor RADEON DRM
Zwane's 2.6 megaconfig
2.6.11-rc4 megaconfig + corrections
results with corrected megaconfig
diffs between 2.6.11 rc3 and rc4 configs
tests in single user mode
2.6.14 config
data and bad perf with 2.6.14
current diffs between megaconfig and mine
small diff causing perf loss
bad perf with this config on 2.6.11-rc4
good perf on 2.6.11-rc4 with this config
good perf on 2.6.14 with this config - all other options are re enabled (drm framebuffer p4 etc)

Description Pascal Cavy 2005-09-04 16:57:39 UTC
Most recent kernel where this bug did not occur:  ubuntu kernel 2.6.10-5-686 
 
Distribution: ubuntu 
 
Hardware Environment: CPU Intel P4 1.7 GHz - 1.256 GB RAM - Asus P4B motherboard
 
Software Environment:  
 
Problem Description:  
 
I encounter serious performance problems with all kernels newer than ubuntu
kernel 2.6.10-5-686. Even the latest kernel.org 2.6.13 has this problem.
 
The first symptom that I noticed was that the KDM login screen took more than
30sec to appear when it took an average of 3sec before. Then all applications
were clearly slower than on 2.6.10.
 
I wrote a little bash shell loop program (testit, attached in the zzxxx files) to
compare performance on the same basis. This bash script, which sets a variable
100000 times, takes 8sec on the 2.6.10 kernel and 28sec on 2.6.12 or 2.6.13 !  So
I think there is a serious problem in these kernels.
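
The attached testit script is not reproduced in this report. As a minimal
sketch only, assuming it is a plain variable assignment repeated in a bash
loop (the function name assign_loop is hypothetical), such a test could look
like this:

```shell
#!/bin/bash
# Hypothetical stand-in for the attached testit script, NOT the original.
# It assigns a variable N times in a pure-bash loop, so the run time is
# dominated by user CPU time rather than by I/O.
assign_loop() {
	local n=$1 i=0 v=
	while [ "$i" -lt "$n" ]
	do
		v=$i            # the assignment being timed
		i=$((i + 1))
	done
	echo "$v"           # print the last value so the loop has a visible result
}

assign_loop 100000
```

Saved as an executable file, it can be timed with "time ./testit" right after
boot and again after a period of high CPU load; the real/user/sys figures are
then comparable between kernels.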
 
Then I compiled my own 2.6.12 kernel with a very reduced set of options adapted
to my particular system. I'll attach the .config for reference.
 
But even with this kernel the performance problem is EXACTLY THE SAME !   
 
time ./testit still shows 28sec realtime on K2.6.12 instead of 8sec  on 
K2.6.10. 
 
Booting this kernel with acpi=off noacpi noapic nolapic    does not solve this 
problem either. 
 
Then I felt relieved when I saw this:
 
$ time ./testit  
  
real    0m6.474s  
user    0m6.211s  
sys     0m0.126s  
$ uname -a  
Linux cha92-7-82-230-174-61 2.6.12-5-686 #1 Thu Jul 28 09:25:12 UTC 2005 i686 
GNU/Linux  
  
Well, the problem seemed to be solved by the latest kernel 2.6.12-5-686.

But it was not. Now the problem does not occur immediately after boot.
Performance degrades quickly only if some processes push the CPU to high load,
like kmail + spamd on lots of incoming messages, or compiling kde etc...

After such CPU loads occur, any process tends to take 5 to 10% CPU ! So you
can easily understand that the system quickly becomes so slow that it is
absolutely unusable.
breezy kernel Linux version 2.6.12-7-686 (buildd@terranova) (gcc version 3.4.5 
20050809 (prerelease) (Debian 3.4.4-6ubuntu4)) #1 Fri Aug 19 13:08:28 UTC 2005  
  
After several weeks of running the different new kernels I still have this
problem. All I can say about it is the following:
- after boot everything is fine now (before, kdm was horribly slow to start,
now it starts in 4 sec instead of 2 minutes)
- residual CPU when not doing anything is around 1 to 2 %
- after some time of use OR if I use 100% CPU for a couple of minutes,
residual CPU stays near 30% all the time. In fact, top shows each running
process taking around 10% CPU. After that the machine of course becomes unusable.
- a reboot fixes the problem
  
I thought of a cpu temperature problem (cpu is between 56-61 
Comment 1 Pascal Cavy 2005-09-04 17:01:05 UTC
Created attachment 5889 [details]
2.6.10 data

some data with this kernel 2.6.10 that works well

xorg data are there because I thought it was xorg problem at first.
Comment 2 Pascal Cavy 2005-09-04 17:02:12 UTC
Created attachment 5890 [details]
2.6.12 data

perf problems when using 2.6.12
Comment 3 Nishanth Aravamudan 2005-09-04 17:02:38 UTC
First of all, distro kernels are not going to be too helpful to reference. Does
kernel.org 2.6.10 display the same problem? What was the last *kernel.org*
kernel which did not display the performance issues? Does anything show up in
the logs? What's your .config? Any binary drivers?

Thanks,
Nish
Comment 4 Pascal Cavy 2005-09-04 17:09:06 UTC
Created attachment 5891 [details]
2.6.13 data

perf problem with vanilla 2.6.13

more data to come
Comment 5 Andrew Morton 2005-09-05 02:49:00 UTC
Odd.  Initial blame always falls to mtrr settings.   Can you please
generate copies of /proc/mtrr for both good and bad kernels?

I wonder if this could be caused by cpufreq going bananas?
Comment 6 Pascal Cavy 2005-09-07 03:14:30 UTC
Created attachment 5922 [details]
some infos including mtrr for ubuntu kernel 2610-5 which works well

I am currently compiling and testing kernel.org kernels 2.6.10 2.6.11 ... to
find where the problem begins. please be patient
Comment 7 Pascal Cavy 2005-09-07 05:10:38 UTC
Created attachment 5923 [details]
kernel.org 2.6.11.1 just after boot

2.6.11.1 just after boot when everything idles (X11 is started and kdm login
screen too)
perf is correct  

now I start KDE ...
Comment 8 Pascal Cavy 2005-09-07 05:15:10 UTC
Created attachment 5924 [details]
kernel.org 2.6.11.1 after trying to work on kde

now kde tries to startttt  to staaaaarrt to staaaaaaaarrrrrtttttttt
after 10 minutes it has started but is entirely unusable
you can see the windows refresh draw...

I log out kde, wait for everything to be idle like in the preceding comment
(kdm login screen) and take a report

you can see that perf has dropped by nearly 500%
nothing runs and testit takes 28sec instead of 6sec after boot.

so for the moment something weird happens between kernel.org 2.6.10 and
2.6.11.1.
Comment 9 Zwane Mwaikambo 2005-09-07 06:57:15 UTC
Thanks for narrowing it down! I presume that 2.6.11 is also buggy?
Comment 10 Pascal Cavy 2005-09-07 08:01:53 UTC
Created attachment 5925 [details]
kernel.org 2.6.11.1 just after boot

well this is worse with 2.6.11 as the perf problems appear immediately after
boot
Comment 11 Pascal Cavy 2005-09-07 08:03:38 UTC
Created attachment 5926 [details]
2.6.11 .config
Comment 12 Pascal Cavy 2005-09-07 08:04:28 UTC
Created attachment 5927 [details]
2.6.10 .config
Comment 13 Pascal Cavy 2005-09-07 08:10:10 UTC
Created attachment 5928 [details]
kernel log of different boots
Comment 14 Pascal Cavy 2005-09-07 08:18:48 UTC
Summary as of today :

k2.6.10 perf stable
k2.6.11 bad perf starts at boot
k2.6.11.1 - 2.6.13 perf ok at boot but degrades after high CPU load

- no binary driver
- no significant msg in logs afaik
- noacpi acpi=off noapic nolapic boot options do not solve the problem
- how can cpufreq affect me since I have no process setting it (the packages
have been removed from my system since 2.6.11, that was the first thing I
thought of)
- am I the only one to see this ! it cannot go unnoticed ! is it MB arch
dependent ?
 
 
** I will try 2.6.11rc1 
please suggest me what to do to narrow search down more. thanks. 
Comment 15 Pascal Cavy 2005-09-08 02:10:32 UTC
kernel.org 2.6.11rc1 works 
 
will try rc3 
Comment 16 Pascal Cavy 2005-09-09 00:59:41 UTC
rc3 works 
 
rc5 have perf problem 
 
trying rc4... 
Comment 17 Pascal Cavy 2005-09-09 03:18:04 UTC
OK the culprit has been hiding since 2.6.11-rc4 !
I will attach my 2 reports for a final comparison :
zz2.6.11-rc3 
zz2.6.11-rc4 
 
another example : 
in rc3   top takes 1% cpu 
in rc4   top takes 5% cpu ! 
 
 
Comment 18 Pascal Cavy 2005-09-09 03:18:54 UTC
Created attachment 5945 [details]
2.6.11-rc3 good perf
Comment 19 Pascal Cavy 2005-09-09 03:19:40 UTC
Created attachment 5946 [details]
2.6.11-rc4 BAD PERFs
Comment 20 Pascal Cavy 2005-09-15 15:51:30 UTC
Created attachment 6039 [details]
2.6.13.1 bad perf
Comment 21 Pascal Cavy 2005-09-20 17:08:01 UTC
Created attachment 6063 [details]
2.6.13.2 GOOD PERF at boot - BAD after running kde

Don't know what was wrong  but this kernel is perfect, same good perfs for me 
as 2.6.10
Comment 22 Pascal Cavy 2005-09-21 17:12:47 UTC
no no no  forget my joy on 2.6.13.2  - after using kde on it  performance is
badly impacted also even after stopping kde and xorg

So i'm still stuck with 2.6.11-rc3 !
Comment 23 Zwane Mwaikambo 2005-09-24 20:21:52 UTC
Could you please try disabling CONFIG_DRM_RADEON?
Comment 24 Pascal Cavy 2005-09-25 13:36:07 UTC
I tried this suggestion with kernel 2.6.13.1 and 2.6.13.2. No drm module is 
compiled and no drm module is loaded. 
 
The result is unchanged, very bad perf, going from testit taking 6 sec
immediately after boot to 13mn20 (yes 13 minutes !) under kde loaded.
stopping kde and xorg takes this test down to 30secs. I can have 20secs if I
go to single user mode with minimal processes. but I cannot return back to the
excellent 6secs perf.

with 2.6.11-rc3 perf is always around 9secs regardless of the system load.
 
very weird... 
Comment 25 Pascal Cavy 2005-09-25 13:39:21 UTC
I meant no radeon drm module...  
 
This is the timing under kde, notice the great difference between user and 
real time: 
$ time ./testit 
real    13m20.337s 
user    0m54.559s 
sys     0m1.684s 
 
Comment 26 Pascal Cavy 2005-09-25 13:49:09 UTC
$ time ./testit  
  
real    1m41.005s  
user    0m44.523s  
sys     0m1.192s  
  
under kde but with no load from other kde applications.  
Comment 27 Pascal Cavy 2005-10-09 03:41:17 UTC
Created attachment 6262 [details]
diffs between 2.6.11 rc3 and rc4

This is the result of the following shell script, which shows diff -u between
the .c and .h files in 2.6.11-rc3 and 2.6.11-rc4

it might suggest ideas...

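#!/bin/bash
# Print a unified diff for every .c/.h file that differs between the two trees.
find linux-2.6.11-rc3 -name '*.[ch]' | while read -r a
do
	# Strip the leading tree directory to get the path relative to it.
	f=$(echo "$a" | cut -d'/' -f2-)
	# diff exits 0 when the files are identical: nothing more to print.
	if diff -u "$a" "linux-2.6.11-rc4/$f"
	then
		continue
	fi
	echo '_______________________________________________________________________________'
	echo
done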
Comment 28 Zwane Mwaikambo 2005-10-16 00:42:45 UTC
Could you please try turning off CONFIG_FB as well as Radeon DRM.
Comment 29 Zwane Mwaikambo 2005-10-16 00:43:57 UTC
Created attachment 6309 [details]
2.6.11-rc3 to rc4 diffstat
Comment 30 Pascal Cavy 2005-10-16 11:54:54 UTC
Created attachment 6314 [details]
2.6.11-rc4 without FB nor RADEON DRM 

No more luck with this - still 30 sec immediately after boot. the slowness was
visible during boot, where the boot steps are really slower than with rc3. I don't
feel this has to do with graphics or video...
Comment 31 Zwane Mwaikambo 2005-10-16 21:45:18 UTC
Created attachment 6322 [details]
Zwane's 2.6 megaconfig
Comment 32 Zwane Mwaikambo 2005-10-16 21:46:37 UTC
The diffstat isn't exactly revealing and DRM/FB were two suspects, that's why i
singled them out. Please test the config called "Zwane's 2.6 megaconfig".
Comment 33 Pascal Cavy 2005-10-20 14:01:37 UTC
Created attachment 6349 [details]
2.6.11-rc4 megaconfig + corrections 

make xconfig corrected some options in your megaconfig
I added bttv and usbmouse too, after verifying the initial corrected config was
working right.

Result is perfect.
Comment 34 Pascal Cavy 2005-10-20 14:03:32 UTC
Created attachment 6350 [details]
results with corrected megaconfig

after boot  perf is OK 6 seconds for my testit program
usage under KDE is perfect.
Comment 35 Pascal Cavy 2005-10-20 14:40:52 UTC
Created attachment 6351 [details]
diffs between 2.6.11 rc3 and rc4 configs

now where is the culprit ?  pentiumII instead of pentiumIV ?
Comment 36 Zwane Mwaikambo 2005-10-20 19:18:38 UTC
Hmm, can you try turn off CONFIG_AUDIT?
Comment 37 Pascal Cavy 2005-10-24 15:38:17 UTC
Created attachment 6382 [details]
tests in single user mode

I tried removing the following options in order, one by one with no success
beginning from the rc3 config

- pentiumII
- no drm
- no fb
- no audit
- no up_apic

at this point I only see an improvement in perf after boot
as is demonstrated in the attached file
after boot in single user mode, to avoid graphics and the kde environment, my test
loop takes 4 secs
just compiling a kernel for 2 minutes makes this same loop take FIVE times longer
ie 20 secs !

Maybe we can take your config as a starting point and you can suggest options for
me to add one by one to see at what point it breaks perf ?
Comment 38 Zwane Mwaikambo 2005-10-24 17:42:40 UTC
Can you try without CONFIG_HPET_TIMER?
Comment 39 Pascal Cavy 2005-10-25 14:58:28 UTC
not better with :  
- pentiumII  
- no drm  
- no fb  
- no audit  
- no up_apic  
- no hpet timer  
Comment 40 Zwane Mwaikambo 2005-10-25 15:06:04 UTC
Without CONFIG_X86_PM_TIMER?
Comment 41 Pascal Cavy 2005-10-25 17:46:06 UTC
not better with :   
- pentiumII   
- no drm   
- no fb   
- no audit   
- no up_apic   
- no hpet_timer  
- no hpet 
Comment 42 Pascal Cavy 2005-10-25 17:47:13 UTC
  
not better with :    
- pentiumII    
- no drm    
- no fb    
- no audit    
- no up_apic    
- no hpet_timer   
- no hpet 
- no x86_pm_timer  
 
Comment 43 Zwane Mwaikambo 2005-10-25 18:41:47 UTC
How about disabling CONFIG_HIGHMEM
Comment 44 Pascal Cavy 2005-10-30 17:35:17 UTC
  
not better with :    
- pentiumII    
- no drm    
- no fb    
- no audit    
- no up_apic    
- no hpet_timer   
- no hpet 
- no x86_pm_timer  
- no highmem

I also gave a try to 2.6.14 with my initial config : perf is bad after boot (29
sec).
Comment 45 Pascal Cavy 2005-10-30 17:39:07 UTC
Created attachment 6424 [details]
2.6.14 config
Comment 46 Pascal Cavy 2005-10-30 17:40:49 UTC
Created attachment 6425 [details]
data and bad perf with 2.6.14
Comment 47 Zwane Mwaikambo 2005-10-30 23:21:11 UTC
Could you take my mega config, make your necessary changes to boot/use your
system (only select the options you really require here please) and then diff
the changes between my mega config and your new config.
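
One possible way to produce such a diff, as a minimal sketch: compare only the
option lines of the two files, sorted, so that reordering and ordinary comments
do not add noise. The function name config_diff and the file names in the usage
note are illustrative assumptions, not part of any kernel tooling.

```shell
#!/bin/bash
# config_diff A B: show the options whose setting differs between two kernel
# .config files. Lines of the form "# CONFIG_FOO is not set" are kept so that
# disabled options show up; all other comments are dropped.
config_diff() {
	local a b rc=0
	a=$(mktemp)
	b=$(mktemp)
	grep -E '^(# )?CONFIG_' "$1" | sort > "$a"
	grep -E '^(# )?CONFIG_' "$2" | sort > "$b"
	# diff exits 1 when the files differ; remember that for the caller.
	diff -u "$a" "$b" || rc=$?
	rm -f "$a" "$b"
	return "$rc"
}
```

For example, "config_diff megaconfig .config > config.diff" would write the
differences to a file that can be attached here.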
Comment 48 Pascal Cavy 2005-10-31 12:43:01 UTC
Created attachment 6429 [details]
current diffs between megaconfig and mine

you will notice that I also tried with SMP + SMT and MAXCPU=2 as in your
megaconfig, with no improvement.
anyway this is the current diff between our 2 configs.
Comment 49 Pascal Cavy 2005-11-04 07:01:23 UTC
Created attachment 6473 [details]
small diff causing perf loss

this is my latest smallest diff of 2.6.11-rc4 configs that is causing perf loss
on my machine.
So it seems related to either I2C or SENSORS modules.

I have configured the same on kernel 2.6.14 and I can obtain a viable
performant kernel when removing all i2c and sensors modules.
Comment 50 Pascal Cavy 2005-11-04 07:02:18 UTC
Created attachment 6474 [details]
bad perf with this config on 2.6.11-rc4
Comment 51 Pascal Cavy 2005-11-04 07:02:59 UTC
Created attachment 6475 [details]
good perf on 2.6.11-rc4 with this config
Comment 52 Pascal Cavy 2005-11-04 07:04:35 UTC
Created attachment 6476 [details]
good perf on 2.6.14 with this config - all other options are re enabled (drm framebuffer p4 etc)
Comment 53 Pascal Cavy 2005-11-04 07:09:21 UTC
can my problem be related to this ? 
http://www2.lm-sensors.nu/~lm78/cvs/lm_sensors2/prog/hotplug/README.p4b  
Comment 54 Pascal Cavy 2005-11-04 07:33:13 UTC
Or is this patch harmful to me ? (I use this chip as a sensor on my MB)

--- linux-2.6.11-rc3/drivers/i2c/chips/w83781d.c	2005-02-03 02:54:37.000000000 +0100
+++ linux-2.6.11-rc4/drivers/i2c/chips/w83781d.c	2005-02-13 04:04:47.000000000 +0100
Comment 55 Zwane Mwaikambo 2005-11-04 08:14:33 UTC
Does backing out that patch make a difference?
Comment 56 Pascal Cavy 2005-11-04 14:19:40 UTC
the latest tests I have done show that kernel slowness on my motherboard is
triggered ONLY if w83781d is activated DURING the BOOT phase (from /etc/modules).

Modprobing it manually afterwards does not produce the bad effects.

This is very weird.

Concerning the patch, I just looked at my diff listing from comment #27 for
w83781d and found it there, but I don't know where the corresponding patch is
or how I can keep it from being compiled in. It is beyond my skills ...
  
Comment 57 Nishanth Aravamudan 2005-11-04 14:42:16 UTC
Jean Delvare thinks this may be a duplicate of a bug
(http://bugzilla.kernel.org/show_bug.cgi?id=4332) reported in the 2.6.11 cycle,
with similar h/w and symptoms. I'm going to go ahead and make him the owner, so
he can narrow down if it's an i2c issue.

Thanks,
Nish
Comment 58 Jean Delvare 2005-11-04 14:42:37 UTC

*** This bug has been marked as a duplicate of 4332 ***
Comment 59 Jean Delvare 2005-11-04 14:54:50 UTC
Pascal:

The performance drop is most likely caused by CPU throttling, itself due to your
hardware monitoring chip (Asus AS99127F) erroneously asserting a CPU
overheating alert condition. This condition is itself triggered by "sensors -s"
which must be run by one of your initialization scripts right after loading the
w83781d driver.

"sensors -s" programs the temperature limits according to data it finds in
/etc/sensors.conf. This file requires an update due to a fix which was done to
the w83781d driver in 2.6.11-rc4. Pick a fresh copy of that file from the
lm_sensors project and your performance problem should belong to the past.

As a quick test, you can simply move /etc/sensors.conf away and reboot your
system with any kernel. If my guess is correct, your system should run just fine.
Comment 60 Pascal Cavy 2005-11-05 00:19:08 UTC
hummm damn it, 6 months searching for a problem already in the database :(

yes it is true, now I remember looking in sensors.conf because my cpu temp
was bad and choosing this line to correct it under as99127f-* :
 compute temp2 (@*30/43)+25, (@-25)*43/30

then this line stayed as is during ubuntu lm-sensors upgrades...
  
commenting this line fixed the problem :)   
  
thanks all for your time.   
I updated the pending bug I opened on ubuntu  
https://bugzilla.ubuntu.com/show_bug.cgi?id=12641  
Comment 61 Jean Delvare 2005-11-05 01:10:20 UTC
Rather than commenting out the line, you can use this one instead for 2.6.11+
kernels:

   compute temp2 (@*15/43)+25, (@-25)*43/15

Basically the same, with "15" instead of "30". This should give you correct
temperature readings again.
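
To illustrate why the factor matters, here is a small arithmetic sketch. In a
"compute" line the first expression converts the raw chip value to the
displayed temperature and the second converts a displayed value (such as a
limit) back to raw. The function names and the 60 degree limit below are
assumptions chosen for the example, not values taken from this report.

```shell
#!/bin/bash
# Illustrative arithmetic only; to_raw and to_displayed are hypothetical
# helper names, and the 60 degree limit is an assumed example value.

# to_raw FACTOR DISPLAYED: second expression, (@-25)*43/FACTOR
to_raw() {
	awk -v f="$1" -v t="$2" 'BEGIN { printf "%.2f\n", (t - 25) * 43 / f }'
}

# to_displayed FACTOR RAW: first expression, (@*FACTOR/43)+25
to_displayed() {
	awk -v f="$1" -v r="$2" 'BEGIN { printf "%.2f\n", (r * f / 43) + 25 }'
}

# A 60 degree limit converted to a raw value with the old factor 30 ...
raw=$(to_raw 30 60)
# ... corresponds to this temperature when the correct factor is 15.
to_displayed 15 "$raw"
```

With these numbers the script prints 42.50: a limit written with the stale
"30" line ends up well below the intended 60 degrees, which would fit the
erroneous overheating alert described in comment 59.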