Bug 9745 - CPU is waking up too often
Status: REJECTED INSUFFICIENT_DATA
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: All Linux
Importance: P1 normal
Assignee: platform_i386
 
Reported: 2008-01-14 04:43 UTC by Patrick Schoenfeld
Modified: 2009-03-24 07:33 UTC
Kernel Version: 2.6.24-rc7


Attachments
kernel config (78.65 KB, application/octet-stream)
2008-01-14 05:45 UTC, Patrick Schoenfeld
bootlog (26.07 KB, application/octet-stream)
2008-01-14 05:45 UTC, Patrick Schoenfeld
output of cat /proc/timer_list (2.13 KB, application/octet-stream)
2008-01-14 05:46 UTC, Patrick Schoenfeld
output file of cat /proc/interrupts >irqs.txt; sleep 10; cat /proc/interrupts >>irqs.txt (1.76 KB, text/plain)
2008-01-14 06:06 UTC, Patrick Schoenfeld
kernel config (the right one :o) (84.03 KB, application/octet-stream)
2008-01-14 08:43 UTC, Patrick Schoenfeld
irqs with max_cstate=1 (1.68 KB, text/plain)
2008-01-14 23:59 UTC, Patrick Schoenfeld
irqs with max_cstate=2 (1.68 KB, text/plain)
2008-01-15 00:00 UTC, Patrick Schoenfeld
DIFF between failing and non-failing configuration (105.02 KB, application/octet-stream)
2008-09-12 01:19 UTC, Patrick Schoenfeld
diff between loaded modules in single- and multiuser-mode (3.94 KB, application/octet-stream)
2008-09-12 01:23 UTC, Patrick Schoenfeld

Description Patrick Schoenfeld 2008-01-14 04:43:38 UTC
Latest working kernel version: 2.6.22
Earliest failing kernel version: 2.6.24-rc7
Distribution: Debian GNU/Linux Sid (Unstable)
Hardware Environment:
CPU: Mobile AMD Sempron(tm) Processor 3300+
System is a Fujitsu Siemens K7610 mobile system.
lspci:
00:00.0 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:09.0 CardBus bridge: ENE Technology Inc CB1410 Cardbus Controller (rev 01)
00:0a.0 Ethernet controller: Atheros Communications, Inc. AR5005G 802.11abg NIC (rev 01)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: VIA Technologies, Inc. S3 Unichrome Pro VGA Adapter (rev 01)

Software Environment:
Debian Sid, Xorg with openchrome drivers from openchrome.org, Software typical for a Desktop system

Problem Description:
With 2.6.22, according to powertop, my system wakes up between 55 and 200 times a second, with an average of about 80 times a second.
When I boot 2.6.24-rc7, this number increases to an average of 2500 wakeups/second, with rare peaks above 10000 wakeups/second (up to 28000 wakeups/second!). As a result, the CPU spends almost all of its time in C0 (99.2%).

Top causes for wakeups (according to powertop):
  42.6% ( 17.6)       <interrupt> : extra timer interrupt 
  21.8% (  9.0)       <interrupt> : ide1
   4.8% (  2.0)       <interrupt> : acpi
   4.1% (  1.7)           xfsbufd : schedule_timeout (process_timeout)
   3.4% (  1.4)              Xorg : do_setitimer (it_real_fn)
   2.4% (  1.0)            dhcdbd : schedule_timeout (process_timeout)
   2.4% (  1.0)         nm-applet : schedule_timeout (process_timeout)
   1.5% (  0.6)     <kernel core> : neigh_table_init_no_netlink (neigh_periodic_timer)
   1.2% (  0.5)          ifconfig : __netdev_watchdog_up (dev_watchdog)
   1.2% (  0.5)   hald-addon-stor : schedule_timeout (process_timeout)
   1.2% (  0.5)    NetworkManager : schedule_timeout (process_timeout)
   1.2% (  0.5)       gnome-panel : schedule_timeout (process_timeout)
   1.2% (  0.5)   <kernel module> : neigh_table_init_no_netlink (neigh_periodic_timer)
   1.2% (  0.5)     <kernel core> : queue_delayed_work_on (delayed_work_timer_fn)
   0.7% (  0.3)       <interrupt> : eth0
   0.7% (  0.3)     <kernel core> : neigh_update (neigh_timer_handler)
   0.7% (  0.3)   <kernel module> : register_sound_midi (irlmp_discovery_timer_expired)
   0.7% (  0.3)   gnome-cups-icon : schedule_timeout (process_timeout)
   0.7% (  0.3)   gnome-power-man : schedule_timeout (process_timeout)
   0.5% (  0.2)       <interrupt> : ide0
   0.5% (  0.2)    mapping-daemon : schedule_timeout (process_timeout)
   0.5% (  0.2)   gnome-cups-icon : futex_wait (hrtimer_wakeup)
   0.5% (  0.2)              init : schedule_timeout (process_timeout)
   0.5% (  0.2)             cupsd : schedule_timeout (process_timeout)
   0.5% (  0.2)          nautilus : schedule_timeout (process_timeout)
   0.5% (  0.2)   gnome-settings- : schedule_timeout (process_timeout)
   0.5% (  0.2)   <kernel module> : ide_do_rw_disk (ledtrig_ide_timerfunc)
   0.2% (  0.1)            urxvtd : schedule_timeout (process_timeout)
   0.2% (  0.1)           syslogd : do_setitimer (it_real_fn)
   0.2% (  0.1)          gconfd-2 : schedule_timeout (process_timeout)
   0.2% (  0.1)     <kernel core> : page_writeback_init (wb_timer_fn)

Steps to reproduce:

Boot system with 2.6.24-rc7
Comment 1 Thomas Renninger 2008-01-14 05:12:08 UTC
This has nothing to do with cpufreq, but with timers.
I also saw a similar bug report somewhere else.

I cannot help here (but would like to learn how to debug this...). With some luck, Ingo Molnar or Thomas Gleixner can give you some hints on how to debug this.

I expect this has been introduced (just a guess, git should help) by the clockevents/highres/tickless timer patches. If you could prove this with git, you should get some help from the author of the patchsets...
Comment 2 Thomas Gleixner 2008-01-14 05:32:08 UTC
Patrick, can you please upload your .config, a full boot log and the output of /proc/timer_list ?

Thanks,
       tglx
Comment 3 Patrick Schoenfeld 2008-01-14 05:45:15 UTC
Created attachment 14450 [details]
kernel config
Comment 4 Patrick Schoenfeld 2008-01-14 05:45:43 UTC
Created attachment 14451 [details]
bootlog
Comment 5 Patrick Schoenfeld 2008-01-14 05:46:09 UTC
Created attachment 14452 [details]
output of cat /proc/timer_list
Comment 6 Patrick Schoenfeld 2008-01-14 05:48:12 UTC
Thomas Renninger,
yeah, you are right. At first it looked like cpufreq scaling wasn't happening, and I
didn't really know which category would fit. Nor do I really know now.
(The subsections of Timers are quite confusing to me.)

Thomas Gleixner,
I'm not exactly sure what do you mean by "full boot log"? A full boot log of
the problematic kernel or a log including boots of working kernels? I've added
the part of /var/log/kern.log where I booted the kernel the last time.

Hope that helps.

Regards,
Patrick
Comment 7 Thomas Gleixner 2008-01-14 06:01:02 UTC
> I'm not exactly sure what do you mean by "full boot log"? A full boot log of
> the problematic kernel or a log including boots of working kernels? I've
> added
> the part of /var/log/kern.log where I booted the kernel the last time.

boot log of the problematic kernel. kern.log is fine. The output of
dmesg is easier to read though.

According to the output of /proc/timer_list there is no fast rearming
timer, so it's likely to be something else.

Can you please do the following:

# cat /proc/interrupts >irqs.txt; sleep 10; cat /proc/interrupts >>irqs.txt

and upload irqs.txt 

Thanks,
	tglx
Comment 8 Patrick Schoenfeld 2008-01-14 06:06:29 UTC
Created attachment 14453 [details]
output file of cat /proc/interrupts >irqs.txt; sleep 10; cat /proc/interrupts >>irqs.txt

Here you are.
Comment 9 Thomas Gleixner 2008-01-14 06:27:51 UTC
Hmm. The deltas are:

   0:   742
  14:     4
  15:    90
  18:     1
 LOC:   724
-----------
       1561

1561/10 ~= 156 interrupts/sec

Not amazing, but not really in the range of 2500+.
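
The per-IRQ deltas above can also be computed mechanically from irqs.txt. The following is only a sketch, not the method used in this comment: the function name irq_deltas is made up for illustration, and it assumes the file holds exactly two /proc/interrupts snapshots written one after the other, each starting with its "CPU0 ..." header line.

```shell
#!/bin/sh
# irq_deltas FILE [SECONDS]  (hypothetical helper, not part of any tool)
# FILE holds two /proc/interrupts snapshots, one appended after the other.
# SECONDS is the time between the snapshots (default 10, as in the command above).
irq_deltas() {
    awk -v secs="${2:-10}" '
        # A header line ("CPU0 CPU1 ...") marks the start of each snapshot.
        $1 ~ /^CPU[0-9]+$/ { snap++; next }
        $1 ~ /:$/ {
            sum = 0
            # Add up the per-CPU counters; stop at the first non-numeric field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
            if (snap == 1) { first[$1] = sum; order[++n] = $1 }
            else if (snap == 2) last[$1] = sum
        }
        END {
            for (k = 1; k <= n; k++) {
                irq = order[k]
                if (!(irq in last)) continue   # IRQ vanished between snapshots
                d = last[irq] - first[irq]
                if (d != 0) { printf "%5s %6d\n", irq, d; total += d }
            }
            printf "total %d (~%d interrupts/sec)\n", total, total / secs
        }
    ' "$1"
}
```

Calling `irq_deltas irqs.txt 10` prints one line per interrupt source whose counter changed, followed by the total and the truncated per-second rate.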
Comment 10 Patrick Schoenfeld 2008-01-14 06:38:52 UTC
Hmm. So this probably really is a powertop issue. However, why isn't it triggered by 2.6.22? Btw. you can see [1] for an example of powertop output where the value is above 16000.

[1] http://just-imho.net/16000wakeupspersecond.png
Comment 11 Thomas Renninger 2008-01-14 06:46:05 UTC
Maybe this is related to:
https://bugzilla.novell.com/show_bug.cgi?id=335086
and
https://bugzilla.novell.com/show_bug.cgi?id=334170 (there only after s2ram case)
and
https://bugs.launchpad.net/ubuntu/+bug/145377

There, powertop also shows extreme interrupt activity, and blacklisting the yenta_socket driver helps.
Comment 12 Patrick Schoenfeld 2008-01-14 06:51:38 UTC
Yeah, indeed it seems related to 335086 @ Novell Bugzilla. If I do rmmod processor (I need to force it, but that's beside the point) then the situation is relatively normal, with an average of 100-200 wakeups. That's better than 2500, but still worse than with 2.6.22 (there the average is 80 and sometimes less than 50).
Comment 13 Thomas Renninger 2008-01-14 07:01:50 UTC
Then a workaround for you should be the processor.max_cstate=2 boot parameter or, if that does not help, processor.max_cstate=1.
But it would be great to know more about the root cause (and get this fixed to fully support the C-states).
Could you try keeping the processor driver loaded without the max_cstate workaround and blacklisting yenta_socket instead, e.g. by adding the line:
install yenta_socket /bin/echo "Yenta socket blacklisted, it will not get loaded"
to /etc/modprobe.conf
I already asked Arjan about the yenta_socket problem and he said it would be the yenta driver's fault... As there seem to be quite a few people affected, it would be great to bring them together and find out a bit more about the problem (Did this machine ever work correctly with C-states? If yes, one might want to start a git bisect...).
Comment 14 Patrick Schoenfeld 2008-01-14 07:19:49 UTC
Thomas Renninger,

the current workaround for me is to use 2.6.22 which seems not to be affected by the problem. However in my case it does help nothing to blacklist yenta_socket.
Well, I haven't yet tried the max_cstate thing as it requires me to reboot again.
Comment 15 Thomas Gleixner 2008-01-14 07:40:09 UTC
> the current workaround for me is to use 2.6.22 which seems not to be affected
> by the problem. However in my case it does help nothing to blacklist
> yenta_socket.
> Well, I haven't yet tried the max_cstate thing as it requires me to reboot
> again.

If you decide to reboot, could you please try to capture something,
which shows us the large number of interrupts (via the shell command I
gave you before).

I looked at your .config again. That's the .22 one. Is the .23
significantly different? Can you please upload it as well.

Thanks,
	tglx
Comment 16 Patrick Schoenfeld 2008-01-14 08:42:03 UTC
(In reply to comment #15)
> If you decide to reboot, could you please try to capture something,
> which shows us the large number of interrupts (via the shell command I
> gave you before).

You mean with the workarounds stated above?

> I looked at your .config again. That's the .22 one. Is the .23
> significantly different? Can you please upload it as well.

Oh. You are right. I'll attach the one for .24. Sorry, my mistake.
However I don't have a .23 version ;)
Comment 17 Patrick Schoenfeld 2008-01-14 08:43:04 UTC
Created attachment 14454 [details]
kernel config (the right one :o)
Comment 18 Patrick Schoenfeld 2008-01-14 23:58:18 UTC
Okay I did some additional testing. With the cstate options from above I experience the following result (NOK = NOT OK)

max_cstate = 2 = NOK
max_cstate = 1 = NOK (increased (!) value in powertop!)

I have created files with the cat /proc/interrupts ... command again and will attach them in a minute.
Comment 19 Patrick Schoenfeld 2008-01-14 23:59:54 UTC
Created attachment 14460 [details]
irqs with max_cstate=1
Comment 20 Patrick Schoenfeld 2008-01-15 00:00:23 UTC
Created attachment 14461 [details]
irqs with max_cstate=2
Comment 21 Patrick Schoenfeld 2008-01-15 00:08:02 UTC
BTW. interesting: If I do cat /proc.. on a 2.6.22 system the interrupts/sec is even higher. So is this probably really a bug in powertop?
Comment 22 Arjan van de Ven 2008-01-15 06:35:40 UTC
max_cstate=1 is a red herring btw
the kernel doesn't count wakeups for C1, so powertop doesn't get to see them either; that makes it a counting game, not something reflecting what is actually happening on the system
Comment 23 Natalie Protasevich 2008-06-03 22:28:53 UTC
What is the status of this problem? Does it still exist with a recent kernel?
Thanks.
Comment 24 Dionisus Torimens 2008-06-13 20:10:05 UTC
I seem to have the same problem on a completely different system,
Acer Extensa 5220, Centrino M530, x64 vanilla kernels.
tested in 2.6.24, 2.6.25.6 and 2.6.26-rc6.

* rmmod -fw processor helps
lsmod after removing thermal and rmmod still shows me:
"processor              26864  1"
but the wakes go down from ~50000 to > 200.

* blacklisting yenta_socket, pcmcia, rsrc does not help

The interesting thing for my system is that the problem *stops* after suspending and resuming from S3. It then remains fixed until the next real reboot (not kexec).

Just let me know if I should post some things here or open a new bug report.
Comment 25 Dionisus Torimens 2008-06-13 22:16:54 UTC
Oh and the problem starts immediately with loading the processor module and it doesn't matter what max_cstate parameter I give. And it always gets stuck while trying to remove it, lsmod shows it's being used, but not by whom.

On full load(unpacking to ramdisk) the wakes go down to around 1000-2000.

While top shows ~95 % idle, Powertop often tells me 
C2                0.0ms (80.3%)
Wakeups-from-idle per second : 47272.8  interval: 5.0s.

I guess it must be something generic in the processor code?
Comment 26 Dionisus Torimens 2008-06-14 17:01:55 UTC
Btw. the problem for me does *not* start immediately after loading the processor module, but within one or two minutes afterwards. Thus it may well be that it is in fact triggered by another component.

But it is then triggered only while the processor module is loaded. I could also notice a spike of 2+ Watts in battery usage when the wakes/s went up from 50-200 to 40000+, so something really does change.

And while the bug is not yet triggered, powertop shows 100% C0.
Comment 27 Natalie Protasevich 2008-06-14 20:47:51 UTC
Dionisus, since this is an Intel processor, you should file another bug.
Please also attach your boot trace and /proc/interrupts.
Comment 28 Dionisus Torimens 2008-06-15 01:34:51 UTC
Posted my (probably related) bug at http://bugzilla.kernel.org/show_bug.cgi?id=10914
Comment 29 Thomas Gleixner 2008-09-04 11:11:53 UTC
Patrick, can you confirm http://bugzilla.kernel.org/show_bug.cgi?id=10914#c74 ?
Comment 30 Dionisus Torimens 2008-09-04 12:57:39 UTC
@Thomas: Well, that patch wouldn't work on his system. It's specifically for my system.

Patrick, please try booting 2.6.26 with the kernel command line option "idle=nomwait" and see if that helps.

Knowing if that fixes it might help to find the root cause of the problem.
Comment 31 Thomas Gleixner 2008-09-04 15:07:09 UTC
Sorry my bad.
Comment 32 Patrick Schoenfeld 2008-09-09 03:11:18 UTC
Uh, sorry that I didn't get back to you. I thought that the Debian kernel maintainers would notify you that the bug is fixed for me when I closed the bug I opened there. Regardless: my fault. However, the problem disappeared with some version of 2.6.26. I don't know which one, sorry. So feel free to close this issue, unless there are still other people affected by something similar.
Comment 33 Dionisus Torimens 2008-09-09 10:35:54 UTC
Hi Patrick,

actually there are still people affected by very similar problems.
And it is very hard to tell what the root cause of the problems is, even if they can be fixed. You can mark the bug as resolved for now, but please don't close it yet.

If you could do a git bisect to find out which patch fixed the problem for you that might be very helpful. If you could further narrow down the versions of 2.6.26 in which it happened, the bisect shouldn't take too long. 

I could speed up the process by 
1. setting all drivers needed to boot the kernel as builtin (not modules) in make menuconfig
2. then doing just "make bzImage" instead of the whole make (leaving out all the modules)
3. and then booting with init=/bin/bash.

That should speed up the testing process a lot.

I will ask ykzhao to confirm if a kernel git bisect would be useful.
Comment 34 ykzhao 2008-09-09 18:39:11 UTC
Hi, Patrick
    Your system is based on an AMD processor, which is different from the Acer 5220. On the Acer 5220, mwait is supported by the Intel processor. I am not sure whether mwait is also supported by the AMD processor.
    If it is not supported, the boot option "idle=nomwait" will have no effect.
    What Dionisus said is right. Maybe the issue can be fixed by the patch (for example, the issue on the Acer 5220 can be fixed by adding the boot option "idle=nomwait"). But I can't identify the root cause of so many break events.

    Of course the kernel git bisect would be useful.
    Thanks.
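
Whether a given CPU advertises MONITOR/MWAIT can be checked from user space before testing "idle=nomwait". This is a rough sketch, not something taken from this report: the helper name has_cpu_flag is hypothetical, and it relies on the feature showing up as "monitor" on the flags line of /proc/cpuinfo on x86.

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO_FILE]  (hypothetical helper)
# Succeeds if FLAG appears as a whole word on a "flags" line.
has_cpu_flag() {
    awk -v flag="$1" '
        # "flags : fpu vme ..." -> the flag names start at field 3.
        $1 == "flags" { for (i = 3; i <= NF; i++) if ($i == flag) found = 1 }
        END { exit !found }
    ' "${2:-/proc/cpuinfo}" 2>/dev/null
}

if has_cpu_flag monitor; then
    echo "monitor flag present: idle=nomwait may change idle behaviour"
else
    echo "no monitor flag found: idle=nomwait is unlikely to have an effect"
fi
```

The second argument exists only so that a saved copy of /proc/cpuinfo from another machine can be checked.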
Comment 35 Patrick Schoenfeld 2008-09-10 00:07:04 UTC
Is there some documentation available regarding the git bisect process? I'm not familiar with git myself at all, so this isn't that easy for me.

Besides that: I'm currently running a different system (distro) on that laptop, so as soon as I know how to use git I need to check whether the problem is reproducible at all on this system with the kernel I used (2.6.24-rc7 according to the original bug report).

Apart from this I'm up to doing this, but I cannot give a timeframe, as I guess the process is still time intensive (building a kernel on my system needs time) and I can't work on it all and every day.
Comment 36 Dionisus Torimens 2008-09-10 01:00:13 UTC
Hi Patrick,

you will find a quick intro here http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/ 
and the man page http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html

of course you first need to get the repository:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

The compiling takes most of the time. And yes, please first confirm that the machine does trigger the bug in that kernel.

Thanks
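
For reference, a bisect session inside the cloned tree could look roughly like the sketch below. The wrapper name bisect_begin is made up for illustration, and the tags are only an assumption based on the versions named in the original report (2.6.22 working, 2.6.24-rc7 failing).

```shell
#!/bin/sh
# bisect_begin GOOD BAD  (hypothetical wrapper)
# Starts a bisection between a revision that behaves and one that does not.
bisect_begin() {
    git bisect start &&
    git bisect good "$1" &&
    git bisect bad "$2"
}

# Example, run inside the cloned kernel tree (tags assumed from the report):
#   bisect_begin v2.6.22 v2.6.24-rc7
#
# git then checks out a revision in between. Build it, boot it, test it,
# and record the verdict with one of:
#   git bisect good     # this build behaves
#   git bisect bad      # this build shows the problem
#   git bisect skip     # this build cannot be tested (e.g. it does not compile)
# Repeat until git names the first bad commit, then clean up with:
#   git bisect reset
```

When the goal is to find the commit that fixed a problem rather than the one that introduced it, the labels are inverted: the still-broken builds are marked "good" and the fixed ones "bad".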
Comment 37 Thomas Gleixner 2008-09-10 01:32:34 UTC
> The compiling takes most of the time. And yes, please first confirm that the
> machine does trigger the bug in that kernel.

First thing we want to know is whether the problem still persists with
latest mainline (2.6.27-rc6).

Thanks,

	tglx
Comment 38 Dionisus Torimens 2008-09-10 05:09:14 UTC
(In reply to comment #37)
> First thing we want to know is whether the problem still persists with
> latest mainline (2.6.27-rc6).
As he wrote in #32, it disappeared during 2.6.26.
Comment 39 Patrick Schoenfeld 2008-09-10 05:10:35 UTC
Hmm. I cannot help with this confirmation. As I said the issue is fixed for me with 2.6.26.5.

However I'm trying to bi-sect the issue, but I'm running into trouble as I cannot get the old kernel where it happened to compile.

kernel/built-in.o: In function `getnstimeofday':
(.text+0x15f49): undefined reference to `__umoddi3'
kernel/built-in.o: In function `do_gettimeofday':
(.text+0x15fff): undefined reference to `__udivdi3'
kernel/built-in.o: In function `do_gettimeofday':
(.text+0x16022): undefined reference to `__umoddi3'
kernel/built-in.o: In function `timekeeping_resume':
timekeeping.c:(.text+0x16187): undefined reference to `__udivdi3'
timekeeping.c:(.text+0x161aa): undefined reference to `__umoddi3'
kernel/built-in.o: In function `update_wall_time':
(.text+0x16791): undefined reference to `__udivdi3'
kernel/built-in.o: In function `update_wall_time':
(.text+0x167b4): undefined reference to `__umoddi3'
kernel/built-in.o: In function `update_wall_time':
(.text+0x1684c): undefined reference to `__udivdi3'
kernel/built-in.o: In function `update_wall_time':
(.text+0x16876): undefined reference to `__umoddi3'
make: *** [.tmp_vmlinux1] Error 1

I think this is due to gcc 4.3 (I found a bug report for Debian with similar problem where it was found that this was fixed in a later version, but this does not help me much :) Any suggestions?
Comment 40 Patrick Schoenfeld 2008-09-10 05:12:37 UTC
Ah, to clear-up any confusions:
Probably the problem was already fixed with some version of 2.6.24. As this bug was reported with 2.6.24-rc7, it's even likely that it was fixed for me with 2.6.24 final. However, I first need to reproduce the bug with 2.6.24-rc7, which is giving me trouble because of the problems stated above.
Comment 41 Dionisus Torimens 2008-09-10 06:15:01 UTC
It looks like you need to first do a make clean in #39 and then make bzImage.
Otherwise you can skip the commit that doesn't compile for you with git bisect skip. git then does some magic to work around it and still does what it can to find the patch.

(the problem is a linkage problem, the kernel make system does not always detect what files have to be compiled newly and sometimes patches simply forget something and produce such a situation.)
Comment 42 Patrick Schoenfeld 2008-09-10 06:36:32 UTC
Well, no that is not the root of the problem. I tried that before I posted here.
But I found the information that the ABI interface of gcc changed with gcc 4.3 and linux <= 2.6.25 wasn't compatible with it so possibly I just need to compile with a gcc < 4.2. I'm trying that currently.
Comment 43 Patrick Schoenfeld 2008-09-10 06:52:19 UTC
That worked. Unfortunately I cannot reproduce the problem currently. It doesn't even happen on 2.6.24-rc7. Now IMHO there are two possible reasons:
Either my kernel contained some obscure option earlier, when I had the problem, which it now doesn't (I stripped down my .config a whole lot, because I hoped to decrease waiting time this way) or the problem was fixed with a newer powertop version. For now I will set the bug on UNREPRODUCIBLE, however I'll try to reproduce it by going with option 1. That means I will compare the two .configs and see if there are ACPI-related differences.
Comment 44 Patrick Schoenfeld 2008-09-10 08:20:21 UTC
Okay, I have now figured out that it's somehow related to the config, because I built another kernel with the .config. That takes a long time, but at least I'm now able to start a bisect, because I have a version that's not working.
Comment 45 Dionisus Torimens 2008-09-10 09:25:15 UTC
Dear Patrick,

do you think you could find out what part of the config makes the difference? That would probably make the bisect much quicker, as you could pass the path of the part of the kernel touched by the configuration change to git bisect, and you would also have much less to try out.

Thanks for investing the time and effort of a bisection.
Comment 46 Patrick Schoenfeld 2008-09-10 10:44:07 UTC
Possibly. Unfortunately we are talking about quite a big difference:

[root@fixit]~# diff -u kernel-config kernel-config-old|diffstat
 kernel-config | 3604 ++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 2872 insertions(+), 732 deletions(-)

Any chance to reduce that?
Comment 47 D. Jansen 2008-09-10 11:21:47 UTC
diff  -w -B --suppress-common-lines ?
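
Another way to shrink the output, sketched here under the assumption that only options that are actually set matter, is to drop comments and "is not set" lines and compare the sorted remainder. The function names config_options and config_diff are hypothetical.

```shell
#!/bin/sh
# config_options FILE: print the options that are set, sorted, one per line.
config_options() {
    grep -E '^CONFIG_[A-Za-z0-9_]+=' "$1" | sort
}

# config_diff OLD NEW: show only the set options that differ.
# Lines starting with "<" come from OLD, lines starting with ">" from NEW.
config_diff() {
    old=$(mktemp) && new=$(mktemp) || return 1
    config_options "$1" > "$old"
    config_options "$2" > "$new"
    diff "$old" "$new" | grep '^[<>]'
    rm -f "$old" "$new"
}

# Example:
#   config_diff kernel-config-old kernel-config
```

Sorting first keeps options that merely moved within the file from showing up as differences.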

Comment 48 Patrick Schoenfeld 2008-09-11 01:29:05 UTC
I don't see where this would help. More likely I thought about something like.. hmm.. components that clearly can be excluded. I'm not a kernel hacker, so my knowledge about internal kernel structures is limited.
Comment 49 Thomas Gleixner 2008-09-11 03:39:25 UTC
Can you please upload the diff ?
Comment 50 Patrick Schoenfeld 2008-09-12 01:19:16 UTC
Created attachment 17741 [details]
DIFF between failing and non-failing configuration

Yep.
Attached is the diff between the failing and the non-failing kernel config. Note that kernel-config-old is the kernel config originally attached to this bug report (i.e. the large config), while kernel-config is the config I created a few days ago, where the problem does not happen.
Comment 51 Patrick Schoenfeld 2008-09-12 01:21:35 UTC
BTW. I made another observation that might help to reduce the scope a little bit. The problem does not happen in single user mode. I guess that is because only a small subset of kernel modules is loaded in this mode. So I think the candidates are limited to these differences. I'm attaching the diff in a moment.
Comment 52 Patrick Schoenfeld 2008-09-12 01:23:44 UTC
Created attachment 17742 [details]
diff between loaded modules in single- and multiuser-mode

As a note: No X was started. So X is most likely unrelated.
Comment 53 Thomas Gleixner 2008-09-12 06:12:52 UTC
 powernow_k8            14304  0 
-processor              36552  2 powernow_k8

Hmm, why is the processor module unloaded in multiuser mode ?
Comment 54 Patrick Schoenfeld 2008-09-12 06:19:18 UTC
Uhmm, it isn't. I just forgot to remove the columns that aren't relevant from the files, so the diff also shows cases where the set of modules using a given module differs. In this case the module is additionally in use by the thermal module in multi-user mode, while it isn't in single-user mode.
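
To keep the size and "Used by" columns out of such a comparison, the saved lsmod output can be reduced to module names first. This is a minimal sketch; module_names and modules_only_in are made-up helper names, and the input files are assumed to be plain lsmod output saved in each mode.

```shell
#!/bin/sh
# module_names FILE: extract just the module names from saved lsmod output.
module_names() {
    # Skip the "Module  Size  Used by" header and keep column 1 only.
    awk 'NR > 1 { print $1 }' "$1" | sort
}

# modules_only_in A B: modules listed in A but not in B.
modules_only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    module_names "$1" > "$a"
    module_names "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (file names are placeholders):
#   modules_only_in lsmod-multiuser.txt lsmod-singleuser.txt
```

A module such as processor, which is loaded in both modes but with a different use count, no longer appears in the result.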
Comment 55 Thomas Gleixner 2008-09-12 06:20:39 UTC
(In reply to comment #50)
DIFF between failing and non-failing configuration

The non failing one is a UP kernel which does not have APIC support, that
affects interrupt routing and the local apic timer is not used.

Please enable CONFIG_X86_UP_APIC in your minimal working one. That should make
the problem come back. If the problem shows again then please add
"nolapic_timer" to the kernel command line.
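
Both conditions can be confirmed on the running test kernel before reading anything into the powertop numbers. The helpers below are only a sketch with hypothetical names (config_enabled, cmdline_has); the location of the .config is an assumption and depends on how the kernel was built.

```shell
#!/bin/sh
# config_enabled OPTION [CONFIG_FILE]: succeed if OPTION is set to y or m.
config_enabled() {
    grep -Eq "^$1=(y|m)\$" "${2:-.config}"
}

# cmdline_has PARAM [CMDLINE_FILE]: succeed if PARAM appears as a whole
# word on the kernel command line.
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qx -- "$1"
}

# Example:
#   config_enabled CONFIG_X86_UP_APIC /path/to/.config && echo "UP APIC support enabled"
#   cmdline_has nolapic_timer && echo "booted with nolapic_timer"
```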
Comment 56 Thomas Gleixner 2008-09-12 06:24:56 UTC
(In reply to comment #54)
Can you please provide a non-edited diff? That's confusing the hell out of me :)
Comment 57 Alan 2009-03-24 07:33:52 UTC
Closing out dead bug (no further diff provided)
